Learn Data Science in 8 (Easy) Steps

Want to learn data science? Check out these 8 (easy) steps to set out in the right direction!

Step 3. Understand Databases

When you start out learning data science, you see that a lot of tutorials focus on retrieving data from flat files. However, when you start working or get in touch with the industry itself, you see that most of the work happens through a connection with one or more databases.

And there are a lot of databases out there. Companies might work with commercial ones like Oracle or they might opt for open-source alternatives. The key to seeing the forest for the trees here is to understand how databases work. Learn about the why and how of databases and the what will come. Concepts that you should grasp and know your way around are Relational Database Management Systems (RDBMS) and data warehousing. That means that relational versus dimensional modeling should not hold any secrets for you, nor should SQL or the Extract-Transform-Load (ETL) process surprise you.
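To make the step from flat files to databases concrete, here is a minimal ETL sketch in Python. It uses only the standard library and an in-memory SQLite database; the CSV contents, the `orders` table and the function names are made up for illustration and do not come from any particular course or tool.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file contents; in practice you would read this from disk.
RAW_CSV = """customer,amount
alice, 10.50
bob,3.00
alice,4.50
,7.00
"""


def extract(text):
    """Extract: parse the CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without a customer and convert amounts to float."""
    cleaned = []
    for row in rows:
        customer = row["customer"].strip()
        if not customer:
            continue  # skip incomplete records
        cleaned.append((customer, float(row["amount"])))
    return cleaned


def load(conn, records):
    """Load: write the cleaned records into a relational table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", records)
    conn.commit()


def total_per_customer(conn):
    """Query the loaded data with SQL: total order amount per customer."""
    cursor = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
totals = total_per_customer(conn)
print(totals)
```

In a real setting the connection would point to a database server rather than to memory, but the extract, transform, load and query stages stay recognizable.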

If you want to understand how databases work, you should check out MongoDB University, the “Introduction to Databases” class at Stanford Online and the tutorials at DataStax and TutorialsPoint.

Step 4. Explore The Data Science Workflow

A next phase in the learning process would be to explore the data science workflow. A lot of tutorials or courses focus on only one or two aspects of it, but lose the general overview of the process that you will need to go through once you’re working as a data scientist or in a data science team. It’s essential not to lose sight of the iterative process that data science is.

For data science beginners who know how to program, the easiest way to discover how the data science workflow works is by practicing your coding skills: get started on your journey with R or Python. There are several packages and libraries that are designed to make your coding life easier. Check out the infographic snippet below:

[Infographic snippet: 8 Easy Steps]

For those beginners who still feel that their hacking skills are lacking, it’s worth checking out the open-source alternatives that don’t require you to code everything. These tools allow you to carry out several steps of the data science workflow in one place. For example, RapidMiner allows you to import or collect your data, clean it, and then model and evaluate it. Note that it’s good to know how to work with these tools, but you should keep working on your coding skills!
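If you do want to code the workflow yourself, the steps named above (import, clean, model, evaluate) can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions: the data is synthetic, the model is an ordinary least-squares line, and all names are hypothetical. In practice you would probably reach for dedicated packages.

```python
import statistics

# Import: hypothetical observations (hours studied, exam score).
# None marks a missing value that has to be cleaned away.
RAW = [(1, 52), (2, 55), (3, None), (4, 61), (5, 64), (6, 67), (7, 70), (8, 73)]


def clean(rows):
    """Clean: keep only the rows where both values are present."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def split(rows, n_test):
    """Hold out the last n_test rows for evaluation."""
    return rows[:-n_test], rows[-n_test:]


def fit_line(rows):
    """Model: fit y = slope * x + intercept with ordinary least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in rows) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def evaluate(model, rows):
    """Evaluate: mean absolute error of the model on unseen rows."""
    slope, intercept = model
    return statistics.mean(abs(y - (slope * x + intercept)) for x, y in rows)


data = clean(RAW)
train, test = split(data, n_test=2)
model = fit_line(train)
mae = evaluate(model, test)
print(model, mae)
```

The workflow is iterative: if the evaluation disappoints, you go back to the cleaning or modeling step and try again.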

Step 5. Level Up with Big Data

Many learners are so concerned with what they call “the fundamentals” of data science that they forget the bigger picture out there. Literally. The previous sections have already hinted at this, but there is a discrepancy. Just like the gap between the flat files that you use in many tutorials and the databases that are used in the industry, there is a gap between small tutorial datasets and the velocity, variety and volume of the data that is out there. It’s a reality that you cannot and should not ignore.

Big data might have been a hype, but it’s definitely out there, and it’s important to realize this and understand what it encompasses. Three things to learn about big data are:

  1. See why big data requires a different approach to data processing. The best way to do this is probably by looking at big data use cases. You can read up on some here.
  2. Get familiar with the Hadoop framework: it’s widely used for distributed data storage and processing.
  3. Don’t forget about Spark. Getting the hang of Spark in combination with Python or Scala is the way to go. And, even better, you kill two birds with one stone: you practice your coding skills and widen your view on data science.
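To get a feeling for why big data calls for a different approach, it can help to look at the map-and-reduce idea that underlies Hadoop's MapReduce model. The sketch below is plain Python on a single machine, not the Hadoop or Spark API. The partitions and function names are hypothetical. It only shows that each partition can be processed independently before the partial results are merged.

```python
from collections import Counter
from functools import reduce

# Hypothetical input that is already split into partitions,
# the way a cluster would spread it over several machines.
PARTITIONS = [
    ["big data needs a different approach"],
    ["data is split across many machines", "each machine counts its own data"],
]


def map_partition(lines):
    """Map step: count the words in one partition, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


# On a cluster, every map_partition call could run on a different machine.
partial_counts = [map_partition(partition) for partition in PARTITIONS]
word_counts = reduce(merge_counts, partial_counts, Counter())
print(word_counts.most_common(3))
```

Because no map step needs to see the whole dataset, the same logic keeps working when the data no longer fits on one machine.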

Step 6. Grow, Connect and Learn

Grow. Once you have gotten to the point where you master the fundamentals, it’s time to grow: practice as much as you can by doing data science challenges, like the ones you find on Kaggle or DrivenData. They will definitely challenge you to put the theory into practice. You should also let your intuition grow.

Connect. As a data science learner, you might fall into the trap of staying occupied with your own learning and that of other learners, but it is equally important to connect with those who already have more experience in the field. This way, you build up a network to fall back on in case you have questions or need advice or tips. These people will motivate you to keep up the good learning and will challenge you to go even further.

Learn. Continuous learning and data science could be synonyms. The Kaggle and DrivenData challenges mentioned above will teach you a thing or two about how data science is done in practice. Apart from these relatively small exercises, you might consider starting a pet project and exploring some things on an even deeper level.

Step 7. Immerse Yourself Completely

Just like a language bath, you’re in need of a data science bath. Depending on the skills and knowledge that you already have, you might consider a bootcamp, an internship or a job. A bootcamp is an amazing way of kickstarting or boosting your data science learning. As a plus, you meet a lot of people, and you have an opportunity to build or extend your network. Are you having trouble finding one? Check out Galvanize and Metis, and don’t forget that your Meetup Groups might also organize bootcamps and workshops for the community!

Secondly, when you have the basics of data science under control, you should consider getting an internship. A lot of the big companies like Facebook, Quora and Amazon have looked for interns before, so they are a great place to start your search. You can also use your social channels or your network to get first-hand information on open internship positions. Lastly, take a look at startups: these smaller companies can be willing to let you learn on the job as long as you learn quickly. AngelList is worth checking out for startup jobs.

The last immersion option is where most learners experience a bottleneck, as the recent search trend in “Data Science Interviews” confirms. Even though you might be very enthusiastic about a job as a data scientist, it’s essential to keep a couple of things in mind when you’re looking for a job:

  • The job postings don’t always have the roles right. They might post for a “Data Scientist” position, but in reality, they’re looking for a data engineer or business analyst. Check out DataCamp’s The Data Industry: Who Does What infographic to see what companies look for when they post open positions.
  • Set your expectations straight: starting in a data scientist or analytics position is not realistic if you haven’t had any real-life experience with the data science workflow, databases or end-to-end development. Make sure you have relevant experience to show when you’re applying.

Don’t get discouraged if you can’t get the job immediately. Instead, make sure that you keep busy and build your experience, and keep an eye out for the companies that have posted data science positions before, like Google, Microsoft and Twitter.

Step 8. Engage with The Community

This last step is one that can be overlooked sometimes. Even when you have a job in data science or as a data scientist, you still need to remember that data science equals continuous learning. There are new advancements all the time, and it’s of key importance to stay informed and curious about what’s happening around you. So don’t hesitate to contribute to discussions on social media, subscribe to a newsletter, follow the key people of the data science industry or listen to a podcast: do whatever you can to engage with the community!

To stay up to date with the latest news, you can subscribe to the following newsletters: the bimonthly KDnuggets newsletter, Data Elixir or Data Science Weekly. Next, follow some of the key people in the data science industry on Twitter. This will also keep you up to speed with the latest developments. Just some of the people that might interest you are DJ Patil, Andrew Ng, and Ben Lorica.

Join some communities online. LinkedIn, Facebook, Reddit, ... They all offer the possibility to connect with peers. You should take the opportunity to become a member of one of these groups:

  • On LinkedIn, make sure to take a look at the “Big Data, Analytics, Business Intelligence”, “Big Data Analytics”, “Data Scientists” or “Data Mining, Statistics, Big Data, Data Visualization, and Data Science” groups.
  • On Facebook, the “Beginning Data Science, Analytics, Machine Learning, Data Mining, R, Python” and “Learn Python” groups might interest you.
  • Subreddits that you can keep an eye on are “/r/datascience”, “/r/rstats” and “/r/python”, among many others!

This list is just meant as a pointer and isn’t exhaustive! If you would like to see an overview with even more resources, go here.

Lastly, don’t forget to contribute to the communities for which you have signed up!

On DataCamp

DataCamp is an online interactive education platform that focuses on building the best learning experience specifically for Data Science. Our courses on R, Python and Data Science are each built around a specific topic, and combine video instruction with in-browser coding challenges so that you can learn by doing. You can start every course for free, whenever you want, wherever you want.

Bio: Karlijn Willems is a data science journalist and writes for the DataCamp community, focusing on data science education, the latest news and the hottest trends. She holds degrees in Literature and Linguistics and Information Management.