KDnuggets Home » News » 2015 » Mar » Opinions, Interviews, Reports » 10 Steps to Success in Kaggle Data Science Competitions ( 15:n09 )

10 Steps to Success in Kaggle Data Science Competitions


The author, ranked in the top 10 in five Kaggle competitions, shares his 10 steps for success. These also apply to any well-defined predictive analytics or modeling problem with a closed dataset.



Step 7: Do your research

For any given problem, it’s likely that there are people dedicating their lives to its solution. Those people (often academics) have probably published papers, benchmarks and code, which you can learn from. This has always worked well for me, as I’ve learned something new and applied it successfully in every competition I’ve worked on. Unlike actually winning, which is not only dependent on you, gaining deeper knowledge and understanding is the only sure reward of a competition.

Step 8: Apply the basics rigorously

While playing with obscure methods can be a lot of fun, it’s often the case that the basics will get you very far. Common algorithms have good implementations in most major languages, so there’s no reason not to try them. However, when you experiment with any method, you must do some minimal tuning of the main parameters (e.g., number of trees in a random forest or the regularisation of a linear model). Running a method without minimal tuning is worse than not running it at all, because you may get a false negative: giving up on something that would actually work very well.

Step 9: Ensemble all the things

Not to be confused with ensemble methods (which are also very important), the idea here is to combine models that were developed independently. In high-profile competitions, it is often the case that teams merge and gain a significant boost from combining their models. This is worth doing even when competing alone, because almost no competition is won by a single model.

Step 10: Win

Typically, steps 1-5 would happen once per competition or problem, while steps 6-9 would be repeated in a loop or occur in parallel until you run out of time. Often, overall performance in a competition is a function of the time invested. If you persist, keep trying things and learning from your experience, you will do well. The most important step is step 0: commit to working on a competition. If you do, you’re guaranteed to win the reward of learning and growing as a data scientist.

About the author

Yanir Seroussi is a data scientist from Sydney, Australia. He has participated in five Kaggle competitions, ranking in the top ten in all cases. Yanir has a PhD from Monash University and a BSc from Technion. He’s currently working on his own projects, while doing part-time data science consulting for various clients.
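
Illustration for Step 8 (minimal tuning)

To make the point in Step 8 concrete, here is a minimal sketch of what "minimal tuning" might look like. It is not taken from any competition code. The one-feature ridge model, the synthetic data, the helper names (fit_ridge_1d, mse, tune_alpha) and the grid of alpha values are all assumptions chosen for illustration. The idea is simply to score a handful of regularisation strengths on a held-out split before deciding whether a method works.

```python
import random


def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge fit of y ~ w * x (one feature, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)


def mse(w, xs, ys):
    """Mean squared error of the prediction w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def tune_alpha(train, valid, alphas):
    """Score each alpha on the validation split; return the best and all scores."""
    scores = {}
    for alpha in alphas:
        w = fit_ridge_1d(train[0], train[1], alpha)
        scores[alpha] = mse(w, valid[0], valid[1])
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic data, made up for illustration: y = 3x + Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = [3.0 * x + rng.gauss(0.0, 0.3) for x in xs]
train = (xs[:150], ys[:150])
valid = (xs[150:], ys[150:])

# A coarse grid over orders of magnitude (an assumed starting point,
# not a recommendation for any particular problem).
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
best_alpha, scores = tune_alpha(train, valid, alphas)

for alpha in alphas:
    print(f"alpha={alpha:<6} validation MSE={scores[alpha]:.3f}")
print("best alpha:", best_alpha)
```

On data like this, a single run at a heavily regularised setting such as alpha=100 makes the model look much worse than the best setting on the grid, which is the kind of false negative Step 8 warns about.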
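
Illustration for Step 9 (combining independent models)

The simplest way to combine independently developed models, as Step 9 suggests, is to average their predictions. The sketch below assumes two hypothetical regression models whose predictions on the same five rows are already available. The numbers, the names model_a and model_b, and the helpers blend and rmse are invented for illustration, and equal-weight averaging is only one of several possible combination schemes, not necessarily the one used in any particular competition.

```python
def blend(pred_lists, weights=None):
    """Weighted average of several models' predictions, row by row."""
    n_models = len(pred_lists)
    if weights is None:
        # Default to a plain average when no weights are supplied.
        weights = [1.0 / n_models] * n_models
    return [
        sum(w * p for w, p in zip(weights, row))
        for row in zip(*pred_lists)
    ]


def rmse(preds, truth):
    """Root mean squared error between predictions and true values."""
    return (sum((p - t) ** 2 for p, t in zip(preds, truth)) / len(truth)) ** 0.5


# Toy numbers, made up for illustration. The two models make
# different mistakes on different rows.
truth = [3.0, 1.0, 4.0, 1.0, 5.0]
model_a = [3.5, 0.5, 4.5, 1.5, 4.0]
model_b = [2.5, 1.5, 4.5, 0.0, 5.5]

blended = blend([model_a, model_b])

print("model A RMSE:", round(rmse(model_a, truth), 3))
print("model B RMSE:", round(rmse(model_b, truth), 3))
print("blend RMSE:  ", round(rmse(blended, truth), 3))
```

In this toy case the blend scores better than either model alone because their errors partly cancel. The gain depends on the models making different mistakes, which is why models developed independently tend to be good candidates for combining.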