Lessons From My First Kaggle Competition

How I chose my first Kaggle competition to enter and what I learned from doing it.

By Shruti Turner, Prosthetics PhD Researcher at Imperial College London


Image by Johnson Martin from Pixabay


A little background

I find starting out in a new area of programming a somewhat daunting experience. I have been programming for the past 8 years, but have only recently developed a keen interest in Data Science. I want to share my experience to encourage you to take the plunge too!

I started out by dipping my toe in the ocean of this vast topic with a couple of the Kaggle mini-courses. I didn’t need to learn how to write Python, but I needed to equip myself with the tools to do the programming that I wanted. First up was Intro to Machine Learning, which seemed like a good place to start. As part of this course you contribute to an in-course competition, but even after completing it, I didn’t feel prepared to enter a public competition. Cue Intermediate Machine Learning, where I learned to use a new model and how to think more deeply about a data problem.


Choosing a competition

I spent some time choosing the right competition. I didn’t want to just have a go at doing something with a dataset because I wanted to see my progress and evaluate my success, but I also didn’t want to feel bad about not being able to achieve anything. Having guidance on what to aim for felt like a nice security blanket too.

There is a competition using the same data as the in-course competition, but I wanted something a bit different that still used the skills I had learned, which mainly centred around using numerical data to predict an outcome. Titanic: Machine Learning from Disaster looked like a good one, and it was labelled “Getting Started”. I don’t win anything from doing well at it, which might be a turn-off. But you have to ask yourself: why are you doing the competition? At this point in my Data Science journey, I was looking to improve my knowledge and apply what I had learned.


What did I learn?

Short answer: quite a bit. Both about myself and how to think about a data science problem.
Long answer:

  1. I can apply what I’ve learned: this is perhaps an obvious one, but it was nice to be able to prove to myself that I could do it without following step-by-step instructions.
  2. I can spot errors and use my skills to solve them: the first submission I made scored 0.0000, i.e. it predicted nothing correctly. My first reaction was to feel demoralised and upset. I really thought that I was going to be able to achieve something. It turned out that my output was of float type rather than integer. One small change and suddenly I had an approximately 70% success rate.
  3. The Kaggle community is full of knowledge: at first I didn’t want to look at the other notebooks that had been shared, as I wanted to make an attempt on my own first. I still think this was a good approach, but looking at them after I had submitted a couple of solutions myself allowed me to learn about what I didn’t already know, including approaches to the data and new algorithms. I even shared my notebook publicly in case it is of use to anyone else, or in case I receive feedback or tips to improve it.
  4. Some planning might help before starting next time: I was so keen to get stuck in that I looked at the data given, made some quick observations and jumped into writing the code. Having looked at other solutions and tutorial notebooks, I’ve come to understand more of what to look for in the data and different ways to solve issues (and why!). The things I found weren’t too different from what I had done, but they were more in-depth solutions: the next steps on from what I had done.
  5. I enjoy solving these types of problems and want to do more: an important one if this is something I’m thinking about pursuing a career in. I was only going to spend an hour on it, but the next time I looked up, 4 hours had passed.
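
To make point 2 concrete, here is a minimal sketch of the kind of fix involved: converting float predictions to integers before writing the submission file. This is not my actual notebook code. The passenger IDs and predictions are made up for illustration, the helper name build_submission is hypothetical, and I am assuming the submission uses PassengerId and Survived columns.

```python
import csv
import io


def build_submission(passenger_ids, predictions):
    """Return CSV text with integer 0/1 labels in the Survived column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])
    for passenger_id, prediction in zip(passenger_ids, predictions):
        # A model may return floats such as 1.0; written as-is, the file
        # would contain "1.0" instead of "1". Round first so that a value
        # like 0.9999 becomes 1 instead of being truncated to 0.
        writer.writerow([passenger_id, int(round(prediction))])
    return buffer.getvalue()


# Made-up IDs and float predictions, for illustration only.
ids = [892, 893, 894]
float_predictions = [0.0, 1.0, 0.0]

submission_text = build_submission(ids, float_predictions)
print(submission_text)
```

The only change that matters here is the int(round(...)) call; everything else is scaffolding so the example runs on its own.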


Next steps

Going forward, I’m keen to develop my data skills and I think Kaggle is a good place to start with that. I have by no means perfected my solution for the Titanic competition, but I have learned a lot from the experience.

I’d like to attempt another competition on Kaggle, but at a more measured pace. I’ll sit down with pen and paper first to decide how each column of data is relevant and how I can deal with the gaps in the data effectively and efficiently.
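
As a starting point for that kind of planning, a quick count of the gaps in each column shows where the effort is needed. The snippet below is only a sketch using Python's standard library: the sample rows and column names are made up for illustration, and count_missing is a hypothetical helper, not part of any library.

```python
import csv
import io

# Made-up sample data, for illustration only. Empty fields are the gaps.
SAMPLE_CSV = """PassengerId,Age,Cabin,Fare
1,22,,7.25
2,,C85,71.28
3,26,,
"""


def count_missing(csv_text):
    """Count the empty values in each column of a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            # Treat absent or whitespace-only fields as missing.
            if value is None or value.strip() == "":
                missing[name] += 1
    return missing


missing_counts = count_missing(SAMPLE_CSV)
for column, count in missing_counts.items():
    print(f"{column}: {count} missing")
```

A column with only a few gaps might be worth filling in, while one that is mostly empty might be better dropped. That is exactly the sort of decision I want to make on paper before writing any model code.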

I would like to expand my knowledge too, not only going into more depth with the approaches to numerical data and predictions, but also into text analysis and image recognition. This will open so many more doors for me, both in terms of exploring new things and being able to attempt more competitions.

If you’re thinking about getting into Data Science, or maybe you’re already into it and want more practice, I’d really recommend giving the Kaggle competitions a go.

Bio: Shruti Turner is a Prosthetics PhD Researcher at Imperial College London. She is passionate about using her engineering and programming skills to improve quality of life.

Original. Reposted with permission.