# 5 steps to actually learn data science

Data science is a broad and varied field, so the path to becoming a unicorn can be hard to see. To light up that path, here are 5 simple steps to follow.

**4. Learn from peers**

It’s amazing how much you can learn from working with others. In data science, teamwork can also be very important in a job setting.

Some ideas here:

- Find people to work with at meetups.
- Contribute to open source packages.
- Message people who write interesting data analysis blogs to see if you can collaborate.
- Try out Kaggle, a machine learning competition site, and see if you can find a teammate.

**5. Constantly increase the degree of difficulty**

Are you completely comfortable with the project you’re working on? Was the last time you used a new concept a week ago? It’s time to work on something more difficult. Data science is a steep mountain to climb, and if you stop climbing, it’s easy to never make it.

If you find yourself getting too comfortable, here are some ideas:

- Work with a larger dataset. Learn to use Spark.
- See if you can make your algorithm faster.
- How would you scale your algorithm to multiple processors? Can you do it?
- Understand the theory behind the algorithm you’re using in more depth. Does this change your assumptions?
- Try to teach a novice to do the same things you’re doing now.
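As one way to start on the multiple-processor question, here is a minimal sketch in Python using the standard library's `multiprocessing.Pool`. The function names (`split_into_chunks`, `sum_of_squares`, `parallel_sum_of_squares`) and the sum-of-squares workload are made up for illustration. They stand in for whatever algorithm you are actually working on. The pattern is: split the data, do the work on each piece in a separate process, then combine the partial results.

```python
from multiprocessing import Pool


def split_into_chunks(values, n_chunks):
    """Split a list into n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def sum_of_squares(values):
    """Per-chunk work; a hypothetical stand-in for a heavier computation."""
    return sum(v * v for v in values)


def parallel_sum_of_squares(values, n_workers=4):
    """Run the per-chunk work in worker processes, then combine the results."""
    chunks = split_into_chunks(values, n_workers)
    with Pool(processes=n_workers) as pool:
        partials = pool.map(sum_of_squares, chunks)
    return sum(partials)


if __name__ == "__main__":
    data = list(range(100_000))
    # The parallel answer should match the single-process answer.
    assert parallel_sum_of_squares(data) == sum_of_squares(data)
```

For a workload this small, the cost of starting processes and copying data will probably outweigh any gain, so time both versions before assuming the parallel one is faster. Working out when it does pay off is a good part of the exercise.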

**The bottom line**

This is less a roadmap of exactly what to do than it is a rough set of guidelines to follow as you learn data science. If you do all of these things well, you’ll find that you’re naturally developing data science expertise.

I generally dislike the “here’s a big list of stuff” approach, because it makes it extremely hard to figure out what to do next. I’ve seen a lot of people give up learning when confronted with a giant list of textbooks and MOOCs.

I personally believe that anyone can learn data science if they approach it with the right frame of mind.

I’m also the founder of Dataquest, a site that helps you learn data science in your browser. It encapsulates a lot of the ideas discussed in this post to create a better learning experience. You learn by analyzing interesting datasets like CIA documents and NBA player stats. You also complete projects and build a portfolio. It’s not a problem if you don’t know how to code – we teach you Python. We teach Python because it’s the most beginner-friendly language, is used in a lot of production data science work, and can be used for a variety of applications.

**Some helpful resources**

As I worked on projects, I found these resources helpful. Remember, resources on their own aren’t useful – find a context for them:

- Dataquest – learn data science in your browser, complete projects, and build a portfolio.
- Khan Academy – good basic statistics and linear algebra content.
- Introduction to Linear Algebra, 4th Edition – Great linear algebra book by Gilbert Strang.
- Calculus Online Textbook – also by Gilbert Strang, great calculus book.
- The Elements of Statistical Learning – good machine learning book.
- Andrew Ng’s Machine Learning Class – the original Coursera machine learning class. Mostly video-based.
- OpenIntro Statistics – Good basic stats book.
- Google Scholar – A paper can be a great way to learn about a topic. For example, here’s Breiman’s original random forest paper.
- StatSoft statistics textbook – Good for looking up statistics concepts.

*This post is adapted from my Quora answer on how to become a data scientist.*

**Bio:** Vik Paruchuri is a self-taught data scientist, and the founder of Dataquest.io, a platform for learning data science in your browser.
