16 award-winning analytical reports written in Python on Kaggle's 16GB Kernel.
Name: Satoru Shibata / 柴田 怜
Job: Sr. Data Scientist
Titles

| Department | Top Level | Highest Rank | Awarded Medals |
|---|---|---|---|
| Code | 0.2% | 317 / 161,898 | 3 Silver, 13 Bronze |
| Discussion | 0.3% | 588 / 188,433 | 100 Bronze |
| Datasets | 1% | 354 / 34,643 | 3 Bronze |
| Competitions | 20-30% | | |
Optimized LightGBM with Optuna adding SAKT Model
Lead sentences
Submitted code combining two Ensemble Learning methods.
Predicted on 100 million rows of training data within a 16GB Kernel by removing unnecessary objects, as sketched below.
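A minimal sketch of the kind of memory management this implies, assuming illustrative Riiid-style column names and dtypes; the submission's exact schema and deletions are not reproduced here.

```python
import gc

import numpy as np
import pandas as pd

# Downcast dtypes at read time so ~100 million rows fit in a 16GB Kernel.
# Column names and dtypes are illustrative, not the submission's exact schema.
train = pd.read_csv(
    "train.csv",
    dtype={
        "user_id": np.int32,
        "content_id": np.int16,
        "answered_correctly": np.int8,
    },
)

target = train["answered_correctly"]
features = train.drop(columns=["answered_correctly"])

# Remove the unnecessary object and reclaim its memory immediately.
del train
gc.collect()
```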
Issue
Algorithms for TOEIC Learning Applications
Significance
Predict a user's percentage of correct answers from their behavioral history.
A user's percentage of correct answers increases with the number of problems solved.
Purpose
Optimize Binary Classification for AUC.
Methodology
Ensemble Learning of LightGBM and SAKT.
Hyperparameter Optimization with Optuna.
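A minimal sketch of the Optuna search on synthetic data, assuming a recent LightGBM and Optuna; the search space, trial count, and blend weight are assumptions, and the SAKT predictions are stubbed out.

```python
import lightgbm as lgb
import optuna
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=0)

def objective(trial):
    # Illustrative search space; the submission's space is not reproduced here.
    params = {
        "objective": "binary",
        "metric": "auc",
        "verbosity": -1,
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 31, 255),
    }
    booster = lgb.train(
        params,
        lgb.Dataset(X_tr, label=y_tr),
        valid_sets=[lgb.Dataset(X_va, label=y_va)],
        callbacks=[lgb.early_stopping(50, verbose=False)],
    )
    return roc_auc_score(y_va, booster.predict(X_va))

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)

# Ensemble step: blend the tuned LightGBM probabilities with the SAKT
# model's output. sakt_pred would come from the trained SAKT network;
# the 0.5 weight is an illustrative assumption.
# final_pred = 0.5 * lgb_pred + 0.5 * sakt_pred
```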
Results
Score: AUC = 0.781.
Code: 31 Points.
Considerations
The focus stayed on the Models, so Feature Engineering remains a challenge.
Systematizing multiple Models is also a challenge for the future.
Conclusion
LightGBM Classifier and Logistic Regression Report
Lead sentences
Optimized classification of anonymized raw stock-market data on a 16GB Kernel.
Contributed code that systematizes Ensemble Learning and Logistic Regression.
Issue
Utility Function Optimization of Supply and Demand Forecasting in Securities Markets.
Significance
Calculations are based on indicators of the presence and magnitude of stock returns.
Optimize the decision of whether or not to trade.
Purpose
Develop AI for profit maximization.
Methodology
Optimized classification with LightGBM.
Logit Transformation of the target variables based on their probability distributions, as sketched below.
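A minimal sketch of the logit step, with simulated probabilities standing in for the LightGBM outputs; the stacked LogisticRegression calibrator is one plausible reading of the method, not a verbatim reproduction of the submission.

```python
import numpy as np
from scipy.special import logit
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
p = rng.uniform(0.01, 0.99, size=(1_000, 1))           # stand-in LightGBM probabilities
y = (rng.uniform(size=1_000) < p.ravel()).astype(int)  # stand-in trade labels

# Clip away exact 0/1 so logit(p) = log(p / (1 - p)) stays finite,
# then map the probabilities onto the real line.
z = logit(np.clip(p, 1e-6, 1 - 1e-6))

# Logistic Regression stacked on the logit-scale feature recalibrates
# the trade probability.
calibrator = LogisticRegression().fit(z, y)
trade_probability = calibrator.predict_proba(z)[:, 1]
```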
Results
Score: 3741.118 (Outside of Medal Zone).
Code: 33 Points.
Considerations
The utility function was not fully deciphered,
which left open issues for a survey of papers.
The report was appreciated by other Kagglers.
Conclusion
Optimize LightGBM HyperParameter with Optuna and GPU
Lead sentences
LightGBM Hyperparameter Optimization on GPU, for which there was little precedent.
The appended procedure was highly rated.
Issue
A preliminary study for "LightGBM Classifier and Logistic Regression Report".
Significance
Hyperparameter Optimization.
There were few precedents for LightGBM on GPU.
Purpose
Code submission for optimizing LightGBM Hyperparameter on GPU.
Methodology
A survey of prior case studies using Optuna for LightGBM.
Procedure for submissions, with the GPU configuration sketched below.
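A minimal sketch of the GPU configuration, assuming a LightGBM build compiled with GPU support (as on Kaggle GPU Kernels); the other parameters are placeholders for the Optuna-tuned values.

```python
import lightgbm as lgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=10_000, n_features=50, random_state=0)

params = {
    "objective": "binary",
    "metric": "auc",
    "device_type": "gpu",  # requires a GPU-enabled LightGBM build
    "max_bin": 63,         # coarser histograms usually run faster on GPU
}
booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=100)
```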
Results
Run: 953.9s on GPU
Code: 31 Points.
Consideration
The hyperparameter optimization remains available for future work.
Conclusion
Optimized Logit LightGBM Classifier and CNN Models
Lead sentences
Submitted a simulation of Multiple Model Systematization.
Based on this failure, I was able to concentrate on LightGBM Optimization and Inference.
Issue
Exploring Optimization Models
Significance
Iterative simulation of the Optimization Model.
Purpose
Optimize Utility Function by systematizing Multiple Models.
Methodology
Apply the Logit Transform to LightGBM outputs.
Explore combining it with a CNN, as sketched below.
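A minimal sketch of such a combination, with simulated outputs in place of the two trained models; the blend weights and threshold are illustrative assumptions, not the submission's values.

```python
import numpy as np

rng = np.random.default_rng(0)
lgb_pred = rng.uniform(size=1_000)  # stand-in for logit-calibrated LightGBM output
cnn_pred = rng.uniform(size=1_000)  # stand-in for CNN output

# Weighted blend of the two models; 0.7 / 0.3 is an assumption.
blend = 0.7 * lgb_pred + 0.3 * cnn_pred
action = (blend > 0.5).astype(int)  # trade / no-trade decision
```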
Results
Score: 3344.738 (Outside of Medal Zone).
Code: 15 Points.
Considerations
This code ran LightGBM and the CNN at the same time, which was prone to memory overflow.
From now on, I will focus on optimizing one Model at a time.
Conclusion
Optimized LightGBM with Optuna
Lead sentences
Developed the Baseline Model for a Code Competition processing 100 million rows of training data.
Prediction had to fit within the 16GB Kernel limit.
Issue
100 million rows of training data must be predicted on a 16GB Kernel.
Significance
This is the cornerstone of the final submission model.
Preprocess and Feature Engineering were adjusted for further optimization.
Purpose
Build a Baseline Model on which the final submission can be developed.
Methodology
Binary Classification by LightGBM Optimization.
Results
Score: AUC = 0.774.
Code: 12 Points.
Considerations
Set the policy for additional development on the Baseline Model.
The improvement in AUC from the additional development was only 0.007, which left some issues.
Conclusion
LightGBM on GPU with Feature Engineering, Optuna, and Visualization
Lead sentence
Won a Code Bronze Medal on my first attempt at submitting code.
Issue
This was my first real effort at Kaggle.
Significance
Visualized the data in a timely manner and studied the features.
Optuna was also used for the first time here and applied in later work.
Purpose
Work on Feature Engineering.
Methodology
I read and referred to code posted by a Kaggle Grandmaster.
Results
Consideration
I gained experience implementing LightGBM with Optuna on GPU.
Conclusion
LightGBM with the Inference and Empirical Analysis
Lead sentences
In the first scored submission code, AUC = 0.76.
The challenges encountered became the cornerstone of later development.
Issue
Score by extending the code submitted for my first challenge.
Significance
A single process was limited to generating the Model Object.
Purpose
To further improve the performance of Prediction Model.
Methodology
Inference was added to improve the Score.
Empirical Analysis compared the raw data with the predicted results,
and detected significant differences between their Gaussian Distributions, as sketched below.
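One way such an empirical comparison could be run, with simulated samples standing in for the raw targets and predictions; Welch's t-test is an assumed choice of test, not necessarily the one used in the report.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
raw = rng.normal(loc=0.0, scale=1.0, size=5_000)        # stand-in raw data
predicted = rng.normal(loc=0.1, scale=0.9, size=5_000)  # stand-in predictions

# Welch's t-test for a difference in means between the two distributions.
t_stat, p_value = stats.ttest_ind(raw, predicted, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4g}")
if p_value < 0.05:
    print("Significant difference between raw data and predictions.")
```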
Results
Score: AUC = 0.76.
Code: 12 Points.
Consideration
This submission left an insufficient understanding of inference as an open issue.
Conclusion
Submission and the Inference of LightGBM
Lead sentences
My first prototype of a scoring submission code.
Despite few prior examples of Empirical Analysis, I won a Code Bronze Medal.
Issue
Prototype version of submission code for first scoring.
Significance
Implementing the scoring submission code.
Purpose
Gain development experience.
Methodology
Model objects were coded for scoring.
Empirical Analysis detected a significant difference in Gaussian Distribution.
Results
Considerations
The actual scoring submission code became a separate file.
This gave me a feel for the challenges of coding,
which I focused on afterwards.
Conclusion
Market Prediction XGBoost with GPU Modified
Lead sentences
Performance comparison of optimized XGBoost against LightGBM.
LightGBM took the cake.
Issue
I had sometimes seen good results with XGBoost.
Significance
Simulate Models other than LightGBM in search of the Optimized Model.
Purpose
Score improvement by XGBoost.
Methodology
GPU implementation of the XGBoost Optimization, as sketched below.
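A minimal sketch of the GPU setup, assuming the gpu_hist tree method of XGBoost 1.x builds (newer releases express the same thing as device="cuda" with tree_method="hist"); the other parameters are placeholders.

```python
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=10_000, n_features=50, random_state=0)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "binary:logistic",
    "eval_metric": "auc",
    "tree_method": "gpu_hist",  # histogram construction on the GPU
}
booster = xgb.train(params, dtrain, num_boost_round=100)
```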
Results
Score: 3308.824 (Outside of Medal Zone).
Code: 8 Points.
Considerations
XGBoost is easy to implement due to its many precedents.
LightGBM is superior in performance comparison, which led me to focus on LightGBM.
Conclusion
Cassava Leaf Disease Best Keras CNN Tuning
Lead sentences
I also participated in a competition on image analysis, challenging myself with raw data of various properties.
I was left with some issues on the theoretical side, which gave me an opportunity to work from theoretical books.
Issue
I would like to try my hand at image analysis and find out what I am good at.
Significance
I want to gain experience in Keras implementation.
Deepen my understanding of CNNs.
Purpose
Learn to understand and implement acoustic and image analysis.
Methodology
I built on and extended an advanced piece of submitted code, as sketched below.
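A minimal transfer-learning sketch in Keras (TensorFlow 2.3+); EfficientNetB0, the input size, and the head are assumptions rather than the report's exact architecture, though the five output classes match the Cassava labels.

```python
import tensorflow as tf

# Pretrained backbone; the report's exact architecture is not reproduced here.
base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3)
)
base.trainable = False  # train only the classification head first

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(5, activation="softmax"),  # 5 Cassava classes
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```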
Results
Score: Accuracy = 0.885.
Code: 18 Points.
Considerations
Theoretical aspects of acoustic and image analysis remained a challenge.
An opportunity to recognize the need to start with a survey of theoretical papers.
Conclusion
RFCX Residual Network with TPU Customized
Lead sentences
I also participated in a competition for acoustic analysis, and tried my hand at raw data of various properties.
I was left with some issues on the theoretical side, which gave me an opportunity to work from theoretical books.
Issue
I would like to try my hand at acoustic analysis and find out what I am good at.
Significance
I want to gain experience in Keras implementation.
Deepen my understanding of CNNs.
Purpose
Learn to understand and implement acoustic and image analysis.
Methodology
I built on and extended an advanced piece of submitted code, as sketched below.
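A minimal sketch of the TPU initialization and a Residual Network head in Keras, assuming TensorFlow 2.x on a Kaggle TPU Kernel; the 24 sigmoid outputs follow the RFCX species count, and the rest is illustrative.

```python
import tensorflow as tf

# Standard TPU bring-up on a Kaggle TPU Kernel (TensorFlow 2.x).
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    base = tf.keras.applications.ResNet50(
        include_top=False, weights="imagenet", input_shape=(224, 224, 3)
    )
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        # Multi-label output over the 24 RFCX species.
        tf.keras.layers.Dense(24, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
```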
Results
Score: 0.772.
Code: 12 Points.
Considerations
Theoretical aspects of acoustic and image analysis remained a challenge.
An opportunity to recognize the need to start with a survey of theoretical papers.
Conclusion
Research with Customized Sharp Weighted
Lead sentences
Worked on clarifying Custom Metrics and systematizing Hyperparameter Optimization in LightGBM.
Generating an optimization object at each milestone remains important.
Issue
Private Custom Metrics were used as an Evaluation Function.
Significance
Improve prediction accuracy by elucidating Private Custom Metrics.
Reproducibility will be determined based on the Evaluation Function.
Purpose
Custom Metrics Clarification.
Methodology
LightGBM Hyperparameter Optimization.
Systematization with examples of decoding the Custom Metrics, as sketched below.
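A minimal sketch of plugging a custom metric into LightGBM via feval; the Sharpe-like formula below is a stand-in for the competition's private metric, which is precisely what the report set out to decode.

```python
import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_classification

def sharpe_like(preds, eval_data):
    """Stand-in custom metric: mean/std of thresholded 'returns'.
    Not the actual private metric."""
    y = eval_data.get_label()
    returns = np.where(preds > 0.5, y, 0.0)
    score = returns.mean() / (returns.std() + 1e-9)
    return "sharpe_like", score, True  # name, value, higher-is-better

X, y = make_classification(n_samples=5_000, random_state=0)
dtrain = lgb.Dataset(X, label=y)
booster = lgb.train(
    {"objective": "binary", "verbosity": -1},
    dtrain,
    valid_sets=[dtrain],
    feval=sharpe_like,   # evaluate the custom metric each iteration
    num_boost_round=50,
)
```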
Results
Generated a Parameter Optimization Object at each milestone.
Code: 6 Points.
Consideration
Reaffirmed the importance of generating an optimization object at each milestone.
Conclusion
Optimize CatBoost HyperParameter with Optuna and GPU
Lead sentences
Performance comparison across optimized Ensemble Learning methods.
LightGBM won on prediction accuracy.
Issue
I was new to CatBoost and wanted to compare performance with LightGBM.
Significance
Performance comparison of Ensemble Learning: LightGBM, XGBoost, CatBoost, etc.
Purpose
Algorithm selection for Prediction Models.
Methodology
Hyperparameter optimization.
CatBoost implementation, as sketched below.
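A minimal sketch of the CatBoost setup; task_type="GPU" is the documented switch for GPU training, while the hyperparameter values are placeholders for those an Optuna search would return.

```python
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=10_000, n_features=50, random_state=0)

model = CatBoostClassifier(
    iterations=500,      # placeholder values; an Optuna study would tune these
    learning_rate=0.05,
    depth=6,
    eval_metric="AUC",
    task_type="GPU",     # train on the GPU
    verbose=100,
)
model.fit(X, y)
```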
Results
Score: AUC = 0.500.
Code: 17 Points.
Consideration
At the baseline model stage, I gave the edge to LightGBM.
Conclusion
LightGBM on Lyft Tabular Data added Inference and Tuning
Lead sentences
Regression Prediction with LightGBM using Grid Search and multiple evaluation functions.
A fruitful exercise that uncovered all sorts of challenges!
Issue
Regression Problem for Table Data Related to Automated Driving.
Significance
I want to work on Regression Prediction with LightGBM.
Gain further development experiences.
Implement multiple evaluation functions to improve accuracy.
Purpose
Improving accuracy of Regression Prediction.
Methodology
Set LightGBM's evaluation functions to MSE and RMSE.
Parameter search by grid search, as sketched below.
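A minimal sketch of the grid search on synthetic data; the parameter grid is illustrative, and RMSE enters through scikit-learn's negated scorer (MSE would use "neg_mean_squared_error" analogously).

```python
from lightgbm import LGBMRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=5_000, n_features=20, random_state=0)

grid = GridSearchCV(
    LGBMRegressor(),
    param_grid={
        "num_leaves": [31, 63, 127],
        "learning_rate": [0.01, 0.05, 0.1],
    },
    scoring="neg_root_mean_squared_error",  # RMSE, negated so higher is better
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_, -grid.best_score_)
```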
Results
Score: 356.084.
Code: 10 Points.
Considerations
Grid search showed that this style of hyperparameter optimization is inefficient.
I reaffirmed the need to use feature engineering and inference.
Conclusion
COVID-19 with H2OAutoML Baseline Model
Lead sentences
Experimented with AutoML performance, but found my original development to be more powerful.
This led to my own development of the LightGBM optimization.
Issue
The COVID-19 infection explosion posed new global challenges.
Significance
Improve coding techniques for anonymized tabular data.
Accumulate experience using AutoML.
Purpose
Optimize Regression Prediction with AutoML.
Methodology
Set RMSLE as the evaluation function for Regression Prediction with H2O.
Extract the optimized Regression Prediction Models: Deep Learning, XGBoost, GLM, GBM, etc., as sketched below.
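A minimal sketch of the H2O run; the file name and target column are placeholders, and sorting the leaderboard by RMSLE assumes an H2O version that supports it as a sort metric.

```python
import h2o
from h2o.automl import H2OAutoML

h2o.init()

# Placeholder file and column names, not the competition's exact schema.
frame = h2o.import_file("train.csv")
target = "ConfirmedCases"
features = [c for c in frame.columns if c != target]

# Rank candidate models (Deep Learning, XGBoost, GLM, GBM, ...) by RMSLE.
aml = H2OAutoML(max_models=20, sort_metric="RMSLE", seed=1)
aml.train(x=features, y=target, training_frame=frame)
print(aml.leaderboard.head())
best = aml.leader  # the optimized Regression Prediction model
```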
Results
Score: RMSLE = 0.086.
Code: 6 Points.
Considerations
My original development was more powerful than H2OAutoML.
An opportunity to work on Optimized Regression Prediction with LightGBM.
Conclusion
Optimized Predictive Model with H2OAutoML
Lead sentences
Even in Binary Classification, AutoML was found to be inferior to my own development.
The difference is thought to come from Preprocessing and Feature Engineering.
Issue
Regression Prediction by H2OAutoML had been inferior to my original development.
Significance
It was unclear whether Binary Classification results would mirror the Regression Prediction case.
Purpose
Experiment on H2OAutoML in Binary Classification.
Methodology
Set AUC as the evaluation function for Binary Classification with H2O.
Extract the Optimized Binary Classification Models: Deep Learning, XGBoost, GLM, GBM, etc.
Results
Score: AUC = 0.850.
Code: 5 Points.
Considerations
Performance was higher than in the Regression Prediction case.
Preprocessing and Feature Engineering themselves are not automated;
they have to be developed independently.
Conclusion