5 steps to actually learn data science
Data science is a broad and varied field, so the path to becoming a unicorn is rarely well lit. To light the way, here are 5 simple steps to follow.
4. Learn from peers
It’s amazing how much you can learn from working with others. In data science, teamwork can also be very important in a job setting.
Some ideas here:
- Find people to work with at meetups.
- Contribute to open source packages.
- Message people who write interesting data analysis blogs to see if you can collaborate.
- Try out Kaggle, a machine learning competition site, and see if you can find a teammate.
5. Constantly increase the degree of difficulty
Are you completely comfortable with the project you’re working on? Was the last time you used a new concept a week ago? It’s time to work on something more difficult. Data science is a steep mountain to climb, and if you stop climbing, it’s easy to never make it.
If you find yourself getting too comfortable, here are some ideas:
- Work with a larger dataset. Learn to use Spark.
- See if you can make your algorithm faster.
- How would you scale your algorithm to multiple processors? Can you do it?
- Understand the theory behind the algorithm you’re using in more depth. Does this change your assumptions?
- Try to teach a novice to do the same things you’re doing now.
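As one illustration of the "make your algorithm faster" idea above, here is a minimal sketch in Python. The problem (checking whether any two numbers in a list add up to a target) and the function names `has_pair_naive` and `has_pair_fast` are made up for this example and are not from any particular library. The point is only that changing the data structure can take an approach from checking every pair to a single pass, and that it's worth measuring the difference rather than guessing.

```python
import random
import timeit


def has_pair_naive(values, target):
    """Check every pair of values: O(n^2) comparisons."""
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            if values[i] + values[j] == target:
                return True
    return False


def has_pair_fast(values, target):
    """Single pass with a set of values seen so far: O(n) on average."""
    seen = set()
    for v in values:
        if target - v in seen:
            return True
        seen.add(v)
    return False


if __name__ == "__main__":
    random.seed(0)
    data = [random.randint(0, 10**6) for _ in range(2000)]
    target = -1  # impossible sum, so both functions scan everything

    # Confirm the faster version gives the same answer before timing it.
    assert has_pair_naive(data, target) == has_pair_fast(data, target)

    for fn in (has_pair_naive, has_pair_fast):
        seconds = timeit.timeit(lambda: fn(data, target), number=3)
        print(f"{fn.__name__}: {seconds:.4f}s for 3 runs")
```

Exact timings will vary by machine, so try it with a few list sizes and watch how the gap grows. Checking that both versions agree before timing them is a good habit: a faster algorithm that gives different answers isn't an improvement.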
The bottom line
This is less a roadmap of exactly what to do than it is a rough set of guidelines to follow as you learn data science. If you do all of these things well, you’ll find that you’re naturally developing data science expertise.
I generally dislike the “here’s a big list of stuff” approach, because it makes it extremely hard to figure out what to do next. I’ve seen a lot of people give up learning when confronted with a giant list of textbooks and MOOCs.
I personally believe that anyone can learn data science if they approach it with the right frame of mind.
I’m also the founder of Dataquest, a site that helps you learn data science in your browser. It encapsulates a lot of the ideas discussed in this post to create a better learning experience. You learn by analyzing interesting datasets like CIA documents and NBA player stats. You also complete projects and build a portfolio. It’s not a problem if you don’t know how to code – we teach you Python. We teach Python because it’s the most beginner-friendly language, is used in a lot of production data science work, and can be used for a variety of applications.
Some helpful resources
As I worked on projects, I found these resources helpful. Remember, resources on their own aren’t useful – find a context for them:
- Dataquest – learn data science in your browser, complete projects, and build a portfolio.
- Khan Academy – good basic statistics and linear algebra content.
- Introduction to Linear Algebra, 4th Edition – great linear algebra book by Gilbert Strang.
- Calculus Online Textbook – also by Gilbert Strang, great calculus book.
- The Elements of Statistical Learning – good machine learning book.
- Andrew Ng’s Machine Learning Class – the original Coursera machine learning class. Mostly video-based.
- OpenIntro Statistics – good basic stats book.
- Google Scholar – a paper can be a great way to learn about a topic. For example, here’s Breiman’s original random forest paper.
- Statsoft statistics textbook – good for looking up statistics concepts.
This post is adapted from my Quora answer on how to become a data scientist.
Bio: Vik Paruchuri is a self-taught data scientist, and the founder of Dataquest.io, a platform for learning data science in your browser.