5 steps to actually learn data science
Data science is a broad and varied field, so the path to becoming a unicorn can feel like wandering in the dark. To light up your path and guide you there, here are 5 simple steps to follow.
4. Learn from peers
It’s amazing how much you can learn from working with others. In data science, teamwork can also be very important in a job setting.
Some ideas here:
 Find people to work with at meetups.
 Contribute to open source packages.
 Message people who write interesting data analysis blogs to see if you can collaborate.
 Try out Kaggle, a machine learning competition site, and see if you can find a teammate.
5. Constantly increase the degree of difficulty
Are you completely comfortable with the project you’re working on? Was the last time you used a new concept a week ago? It’s time to work on something more difficult. Data science is a steep mountain to climb, and if you stop climbing, it’s easy to never make it.
If you find yourself getting too comfortable, here are some ideas:
 Work with a larger dataset. Learn to use Spark.
 See if you can make your algorithm faster.
 How would you scale your algorithm to multiple processors? Can you do it?
 Understand the theory behind the algorithm you’re using more deeply. Does this change your assumptions?
 Try to teach a novice to do the same things you’re doing now.
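As one way to take on the multiple-processor question above, here is a minimal Python sketch (Python because it is the language recommended later in this post). It splits a CPU-bound job into chunks, runs each chunk in a separate process using the standard library's `concurrent.futures.ProcessPoolExecutor`, and adds up the partial results. It assumes the work breaks into independent pieces whose results can simply be summed; the names `chunk`, `partial_sum_of_squares`, and `sum_of_squares` are hypothetical, chosen for illustration.

```python
from concurrent.futures import ProcessPoolExecutor
import os


def chunk(items, n_chunks):
    """Split a list into at most n_chunks contiguous slices of similar size."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def partial_sum_of_squares(part):
    """The per-chunk work. It is top-level so worker processes can unpickle it."""
    return sum(x * x for x in part)


def sum_of_squares(items, n_workers=None, executor_cls=ProcessPoolExecutor):
    """Compute sum(x * x) by fanning chunks out to a pool of workers.

    Assumption (hypothetical example workload): each item is processed
    independently, so the partial results can simply be added together.
    """
    n_workers = n_workers or os.cpu_count() or 1
    parts = chunk(list(items), n_workers)
    with executor_cls(max_workers=n_workers) as pool:
        # map() submits one task per chunk and yields results in input order.
        return sum(pool.map(partial_sum_of_squares, parts))
```

In a script, call something like `sum_of_squares(range(10_000_000))` under an `if __name__ == "__main__":` guard, because on platforms that start workers with the spawn method each worker re-imports the main module. For a workload this small, process start-up and data transfer may cost more than the parallelism saves; the split only pays off when each chunk carries enough work. The `executor_cls` parameter lets you pass a `ThreadPoolExecutor` to check the chunking logic, but threads will not speed up pure-Python CPU-bound code in standard CPython builds because of the GIL.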
The bottom line
This is less a roadmap of exactly what to do than it is a rough set of guidelines to follow as you learn data science. If you do all of these things well, you’ll find that you’re naturally developing data science expertise.
I generally dislike the “here’s a big list of stuff” approach, because it makes it extremely hard to figure out what to do next. I’ve seen a lot of people give up learning when confronted with a giant list of textbooks and MOOCs.
I personally believe that anyone can learn data science if they approach it with the right frame of mind.
I’m also the founder of Dataquest, a site that helps you learn data science in your browser. It encapsulates a lot of the ideas discussed in this post to create a better learning experience. You learn by analyzing interesting datasets like CIA documents and NBA player stats. You also complete projects and build a portfolio. It’s not a problem if you don’t know how to code – we teach you Python. We teach Python because it’s the most beginner-friendly language, is used in a lot of production data science work, and can be used for a variety of applications.
Some helpful resources
As I worked on projects, I found these resources helpful. Remember, resources on their own aren’t useful – find a context for them:
 Dataquest – learn data science in your browser, complete projects, and build a portfolio.
 Khan Academy – good basic statistics and linear algebra content.
 Introduction to Linear Algebra, 4th Edition – Great linear algebra book by Gilbert Strang.
 Calculus Online Textbook – also by Gilbert Strang, great calculus book.
 The Elements of Statistical Learning – good machine learning book.
 Andrew Ng’s Machine Learning Class – the original Coursera machine learning class. Mostly video-based.
 OpenIntro Statistics – Good basic stats book.
 Google Scholar – A paper can be a great way to learn about a topic. For example, here’s Breiman’s original random forest paper.
 StatSoft statistics textbook – Good for looking up statistics concepts.
This post is adapted from my Quora answer on how to become a data scientist.
Bio: Vik Paruchuri is a self-taught data scientist, and the founder of Dataquest.io, a platform for learning data science in your browser.