5 steps to actually learn data science

Data science is a broad and varied field, so the path to becoming a “unicorn” (a data scientist who is strong across the whole field) is rarely well lit. To help light the way, here are 5 simple steps to follow.

2. Learn by doing

Learning about neural networks, image recognition, and other cutting-edge techniques is important. But most data science doesn’t involve any of it:

  • 90% of your work will be data cleaning.
  • Knowing a few algorithms really well is better than knowing a little about many algorithms.
    • If you know linear regression, k-means clustering, and logistic regression well, can explain and interpret their results, and can actually complete a project from start to finish with them, you’ll be much more employable than if you know every single algorithm but can’t use any of them.
  • Most of the time, when you use an algorithm, it will be a version from a library (you’ll rarely be coding your own SVM implementations – it takes too long).
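To make the data cleaning point concrete, here is a minimal sketch using only Python’s standard library. The `RAW` data, the column names, and the `clean_rows` helper are all made up for illustration. Real cleaning rules depend entirely on your dataset, so treat this as one possible shape for the work rather than a recipe.

```python
import csv
import io

# Hypothetical raw export: stray whitespace, inconsistent casing,
# a missing value, a non-numeric value, and a duplicate row.
RAW = """city,temp_c
 Boston ,21.5
boston,21.5
Chicago,
Denver,n/a
Austin,30.1
"""


def clean_rows(text):
    """Return de-duplicated (city, temp) tuples with valid numeric temperatures."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        city = row["city"].strip().title()  # normalize whitespace and casing
        try:
            temp = float(row["temp_c"])
        except (TypeError, ValueError):
            continue  # drop rows whose temperature is missing or not a number
        key = (city, temp)
        if key in seen:
            continue  # drop exact duplicates after normalization
        seen.add(key)
        cleaned.append(key)
    return cleaned


rows = clean_rows(RAW)
print(rows)
```

Even in this toy case, the cleaning logic is longer than any analysis you would run on the two rows that survive.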

What all of this means is that the best way to learn is to work on projects. By working on projects, you gain skills that are immediately applicable and useful. You also have a nice way to build a portfolio.

One way to start a project is to find a dataset you like and answer an interesting question about it. Rinse and repeat.

Here are some good places to find datasets to get you started:

Another technique (and the one I used) is to find a deep problem that can be broken down into small steps. Mine was predicting the stock market. I first connected to the Yahoo Finance API and pulled down daily price data. I then created some indicators, like the average price over the past few days, and used them to predict the future (no real algorithms here, just technical analysis). This didn’t work so well, so I learned some statistics and then used linear regression. Then I connected to another API, scraped minute-by-minute data, and stored it in a SQL database. And so on, until the algorithm worked well.

The great thing about this is that I had context for my learning. I didn’t just learn SQL syntax – I used it to store price data, and thus learned 10x as much as I would have by just studying syntax. What you learn without applying it isn’t retained very well, and it won’t prepare you to do actual data science work.
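The steps in that stock market project (price data, a moving-average indicator, a linear regression, SQL storage) can be sketched in a few lines. This is not the original code: it swaps the API call for a synthetic random walk, fits the regression by hand with ordinary least squares, and uses an in-memory SQLite database. The table name `daily_price` and the helper names are assumptions made for the example.

```python
import random
import sqlite3

# Synthetic daily closing prices (a random walk) standing in for real API data.
random.seed(0)
prices = [100.0]
for _ in range(59):
    prices.append(prices[-1] + random.gauss(0, 1))


def moving_average(values, window):
    """Mean of the previous `window` values, for each day with enough history."""
    return [sum(values[i - window:i]) / window for i in range(window, len(values))]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


WINDOW = 5
indicator = moving_average(prices, WINDOW)  # feature: mean of the previous 5 closes
target = prices[WINDOW:]                    # label: the close on that day
slope, intercept = fit_line(indicator, target)

# Store the raw prices in SQL (in memory here; a file path would persist them).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_price (day INTEGER PRIMARY KEY, close REAL)")
conn.executemany("INSERT INTO daily_price VALUES (?, ?)", enumerate(prices))
stored = conn.execute("SELECT COUNT(*) FROM daily_price").fetchone()[0]

print(f"close ~ {slope:.3f} * ma{WINDOW} + {intercept:.3f}; rows stored: {stored}")
```

The fit here is in-sample on made-up data, so it says nothing about real markets. The point is how each small step (indicator, regression, storage) gives you a concrete reason to learn one more tool.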

Fig. 3: This guy’s trying to predict the stock market, but apparently needs some data science (via DailyMail)

3. Learn to communicate insights

Data scientists constantly need to present the results of their analysis to others. Skill at doing this can be the difference between an okay and a great data scientist.

Part of communicating insights is understanding the topic and theory well. Another part is understanding how to clearly organize your results. The final piece is being able to explain your analysis clearly.

It’s hard to get good at communicating complex concepts effectively, but here are some things you should try:

  • Start a blog. Post the results of your data analysis.
  • Try to teach your less tech-savvy friends and family about data science concepts. It’s amazing how much teaching can help you understand concepts.
  • Try to speak at meetups.
  • Use GitHub to host all your analyses.
  • Get active in communities like Quora, DataTau, and the machine learning subreddit.