Top 4 tricks for competing on Kaggle and why you should start
If you aren't familiar with Kaggle, you should be. Hear why from two expert Kagglers in this article.
Kaggle is the world's biggest data science competition platform, and it has huge relevance to real-world data science. Competing on Kaggle means staying on top of the state of the art.
On Kaggle, you'll also meet and get the chance to team up with data scientists, data enthusiasts, and field specialists from across the globe, learning from an amazing community as you go. Consistently participating in Kaggle competitions is also a smart way to show interest and passion during your data science job search. It helps you sharpen the specific skills that differentiate you as a data scientist and keep you from being made obsolete by AutoML solutions.
And it's about far more than just the competitions – the resources available to you on Kaggle are second-to-none:
- Notebooks full of example code from experienced competitors that you can run with free Kaggle compute
- Datasets for you to explore to your heart's content
- Discussion forums filled with valuable back and forth
After playing with data and models on Kaggle for a while, you’ll have had the chance to see enough different datasets, problems, and ways to deal with them under time pressure that when faced with similar problems in real settings, you’ll be skilled in finding solutions quickly and effectively.
Hear from the experts
Luca Massaron and Konrad Banachewicz are two Kaggle Grandmasters who, between them, have competed for over 20 years in 330 competitions.
And here's how they got into Kaggle way back when:
Konrad: "I started 12 years ago when I was working at a bank. At that time, R was the primary tool used in financial modeling and, one day, searching for some mundane detail of a random forest implementation, I found a script that did exactly what I needed – on Kaggle. That caught my attention, so I created an account and started looking around. My early exploits were not exactly my finest work ever ;-) but thanks to the fantastic culture of sharing I had a chance to learn from the best: how to do exploratory analysis, set up a validation scheme properly, handle missing values – all the things I wish the university statistics curriculum included. Before I knew it, Kaggle competitions became a regular component of my intellectual workout routine."
Luca: "I found Kaggle just by chance 10 years ago. Reluctant to start, I finally got into a competition because it was dealing with psychometrics and NLP, my interests at the time. I immediately learned the hard reality of not properly cross-validating and got how Kaggle could have been important for me to develop my career. I never stopped since then; even when I do not have time and resources to participate in a competition in full, I still start it and give it a try because there is something new that I can learn in every competition. In competitions I ended up at the 7th position in the worldwide rankings. At the moment, I’m a 1xGrandmaster and 2xMaster."
Tell me how I can be a better competitor!
Luca and Konrad know better than anyone that it's easy to get discouraged on Kaggle, especially when you're just starting out and everything seems overwhelming. That's why in their new book, The Kaggle Book (https://amzn.to/3K57Wdn), they have collected all the wisdom they've built up over their long Kaggle journeys into a guide for anyone looking to get the most out of the platform. There’s even a foreword from the Kaggle Founder & CEO himself, Anthony Goldbloom.
Below are 4 suggestions, all snippets taken from the book, for getting the best out of Kaggle:
1. Design a good validation scheme
"Having a proper validation strategy is the great discriminator between successful Kaggle competitors and those who just overfit the leaderboard and end up in lower-than-expected rankings after a competition."
The book dedicates a full chapter to designing a good validation scheme, in the detail the topic deserves.
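To make the idea concrete, here is a minimal sketch of shuffled k-fold cross-validation in plain Python. It is not taken from the book: the helper names (kfold_indices, mae), the toy target values, and the "predict the training mean" model are all assumptions made for illustration. The point is only that every row is held out exactly once, so the averaged score does not depend on a single lucky split.

```python
import random
from statistics import mean


def kfold_indices(n_samples, n_splits=5, seed=0):
    """Yield (train_idx, valid_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding through the shuffled list gives n_splits disjoint folds
    # that together cover every index exactly once.
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for k in range(n_splits):
        valid_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, valid_idx


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean([abs(t - p) for t, p in zip(y_true, y_pred)])


# Toy target values, made up for this example.
y = [3.0, 5.0, 2.5, 7.0, 4.5, 6.0, 3.5, 5.5, 4.0, 6.5]

fold_scores = []
for train_idx, valid_idx in kfold_indices(len(y), n_splits=5):
    # Placeholder "model": predict the mean of the training fold.
    # A real model would be fitted on train_idx at this point.
    prediction = mean([y[i] for i in train_idx])
    fold_scores.append(mae([y[i] for i in valid_idx], [prediction] * len(valid_idx)))

cv_score = mean(fold_scores)
print("per-fold MAE:", [round(s, 3) for s in fold_scores])
print("CV MAE:", round(cv_score, 3))
```

In a real competition the split itself often needs more thought (grouped, stratified, or time-based folds), which this sketch deliberately leaves out.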
2. Understand the competition metric
"You cannot escape the fact that the basic principle at the core of both real-world projects and Kaggle competitions is the same. Your work will be evaluated according to some criteria, and understanding the details of such criteria, optimizing the fit of your model in a smart way, or selecting its parameters according to the criteria, will bring you success. If you can learn more about how model evaluation occurs in Kaggle, your real-world data science job will also benefit from it."
Chapter 5, Competition Tasks and Metrics, tackles how to deal with all the common metrics you'll see in ML tasks, as well as never-before-seen metrics that have only shown up in Kaggle competitions.
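As a small illustration of why the metric matters, the sketch below compares two constant predictions under RMSE and under RMSLE (root mean squared logarithmic error), a metric that appears in some regression competitions. The toy data and function names are assumptions for this example, not material from the book. The constant that minimizes RMSE is the arithmetic mean, while the constant that minimizes RMSLE is the mean taken in log1p space, so the "best" answer changes with the metric.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error (inputs must be greater than -1)."""
    return math.sqrt(
        sum((math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred))
        / len(y_true)
    )


# Skewed toy target with one large value, made up for this example.
y = [1.0, 2.0, 3.0, 4.0, 100.0]

# Best constant under RMSE: the arithmetic mean.
mean_guess = sum(y) / len(y)
# Best constant under RMSLE: average in log1p space, then transform back.
log_guess = math.expm1(sum(math.log1p(v) for v in y) / len(y))

for name, guess in [("mean", mean_guess), ("log-mean", log_guess)]:
    preds = [guess] * len(y)
    print(f"{name:8s} guess={guess:.2f} RMSE={rmse(y, preds):.2f} RMSLE={rmsle(y, preds):.2f}")
```

The same reasoning is why a model for a log-based metric is often trained on a log-transformed target rather than on the raw values.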
3. Build simple baselines, then iterate quickly
In the book, Luca and Konrad interview over 30 Kaggle Grandmasters and Masters about their time on Kaggle, and there are clear common threads in their answers. One is the importance of keeping things simple. In Grandmaster Giuliano Janson's words:
"One of the lessons learned that I always share with people new to ML is to “never get over-enamored with overly complex ideas.” When facing a new complex problem, it is easy to be tempted to build complex solutions. Complex solutions usually require time to develop. But the main issue is that complex solutions are often of marginal value, conditional on robust baselines. [...] My advice is to stick to Occam’s razor and try easy things before being tempted by more complex approaches."
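One possible way to act on this advice is to score a couple of trivial baselines before anything else, so that every later idea has a number to beat. The sketch below is an illustration only: the data, the single feature, and the helpers fit_majority and fit_threshold are hypothetical and do not come from the book or the interview.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_majority(y_train):
    """Baseline 0: always predict the most frequent training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return lambda x: label


def fit_threshold(x_train, y_train):
    """Baseline 1: predict 1 when the single feature exceeds the best cut-off."""
    best_cut = max(
        sorted(set(x_train)),
        key=lambda cut: accuracy(y_train, [int(x > cut) for x in x_train]),
    )
    return lambda x: int(x > best_cut)


# Toy one-feature binary classification data, made up for this example.
x_train = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 1.8, 2.2]
y_train = [0, 0, 0, 0, 1, 0, 1, 1]
x_valid = [0.3, 0.8, 1.6, 2.0]
y_valid = [0, 0, 1, 1]

models = [
    ("majority", fit_majority(y_train)),
    ("threshold", fit_threshold(x_train, y_train)),
]

results = {}
for name, model in models:
    results[name] = accuracy(y_valid, [model(x) for x in x_valid])
    print(f"{name:10s} validation accuracy = {results[name]:.2f}")
```

A more complex model earns its place only if it clearly beats these numbers on the same validation data.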
4. Don't go it alone!
"Teaming has its own advantages because it can multiply efforts to find a better solution. A team can spend more time on the problem together and different skills can be of great help; not all data scientists will have the same skills or the same level of skill when it comes to different models and data manipulation."
Many Kagglers have met future friends, colleagues, and employers as a result of their time on the site – whether that's through directly interacting with them on the platform, competing with them, or participating in networking events like Kaggle Days. Kaggle's competition rules also favor teams over solo competitors, in terms of how points are distributed.
For more insights like these, as well as sample code, end-to-end pipelines, and competition analysis, The Kaggle Book is available on Amazon at the following link: https://amzn.to/3K57Wdn