How To Decide What Data Skills To Learn

Read this article to learn about getting the most valuable skills on the job market.

By Brandon Walker, Data Scientist at IBM, Cloud Pak Acceleration Team

If you Google “how to learn <skill>”, you’re probably going to find at least one online course, YouTube tutorial, book, or article covering it well. Many of these resources will even be free. There are many opinions about where to learn a skill, but the people with these opinions haven’t tried every single educational product (and may even be trying to sell you something), so it’s hard to say what the best resource is. When it comes to picking a resource, I have no recommendation other than this: don’t stick with it if you don’t like it. There is almost always a better resource to switch to. The real question is: what skill are you going to put into your Google search?


Don’t Do What The Job Descriptions Say

Well, this sounds counterintuitive. If you don’t learn what’s in the job descriptions, how will you look attractive to employers? I have two reasons why job descriptions are not a good barometer. First, the top skills asked for include things like Excel and Tableau, and jobs that have these as the main requirements aren’t really data science jobs to me. Sure, you may use them occasionally as a data scientist, but this is more the job of a data analyst, business intelligence person, or business analyst. Companies don’t all share the same definition of what constitutes a data scientist and what constitutes a data analyst, which makes job descriptions a less effective gauge.

My main reason not to use job descriptions as an indicator is that they are usually written by HR and/or recruiters, not by whoever is going to be your manager. I’ve seen jobs that ask for 5 years of experience with PyTorch, which is hard to find considering PyTorch is around 3 years old. Should you really be taking your cue for what to learn from people who don’t do data science work? I don’t think so. The people who know which skills are valuable are other data scientists. Additionally, you’re more than likely going to be interviewed by a senior data scientist at some point. Since they are likely the toughest filter to get past, you should optimize for impressing the data scientist, not the recruiter.


Don’t Just Do What Other Data Scientists Do

But I just told you to do that! Well, I would add some conditions. First, you need the core skills that everyone has. This really means Python, including matplotlib, pandas, sklearn, and numpy. But if you’re reading this, you’ve likely already got that covered. From there, I would use the plot below to help decide what to learn next. The data for this plot is from a KDnuggets poll that asked data scientists which skills they had and which skills they wanted. The point I’m making here is that you can’t just ask data scientists what tools they use and then learn those same tools. You must distinguish yourself from other data scientists. To do so, you have to learn the skills that few of them have. Even better, pick the most wanted skills out of the ones few people have. Essentially, you need to try to get skills in the top left of this graph.
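If you have poll results like these in hand, the selection rule can be roughed out in a few lines of Python. This is only a minimal sketch: the `Skill` class, the `rank_skills` helper, the 50% cutoff for what counts as a core skill, and every number in `EXAMPLE_SKILLS` are hypothetical and made up for illustration. They are not the KDnuggets poll results, and ranking by the want-to-have ratio is just one plausible way to approximate "top left of the graph".

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    pct_have: float  # share of respondents who already have the skill
    pct_want: float  # share of respondents who want to learn it


def rank_skills(skills, core_threshold=50.0):
    """Rank non-core skills by want-to-have ratio, highest first.

    Skills held by at least `core_threshold` percent of respondents are
    treated as core skills and dropped (the cutoff is an assumption).
    """
    candidates = [s for s in skills if s.pct_have < core_threshold]
    return sorted(
        candidates,
        # max() guards against dividing by zero when nobody has the skill
        key=lambda s: s.pct_want / max(s.pct_have, 1e-9),
        reverse=True,
    )


# Hypothetical numbers for illustration only, NOT the actual poll data.
EXAMPLE_SKILLS = [
    Skill("Python", 70.0, 20.0),
    Skill("Excel", 65.0, 5.0),
    Skill("Deep Learning", 20.0, 50.0),
    Skill("NLP", 18.0, 40.0),
    Skill("Spark", 15.0, 30.0),
]

if __name__ == "__main__":
    for skill in rank_skills(EXAMPLE_SKILLS):
        ratio = skill.pct_want / skill.pct_have
        print(f"{skill.name}: want/have = {ratio:.2f}")
```

With these made-up numbers, the widely held skills (Python, Excel) are filtered out as core skills, and what remains is ordered by how much demand outstrips supply. A ratio is a crude score, so treat the output as a starting point for your own judgment rather than a verdict.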


I’d break the top left of the graph down into 4 things, loosely ranked below:

  1. Deep Learning: learn the theory, then how to implement it in TensorFlow or PyTorch (not super important which one you pick)
  2. NLP: ranks higher than the two below because tools go in and out of fashion, but NLP knowledge is useful no matter how you implement it
  3. Big Data Tools: Hadoop is a good skill to have for making use of Big Data, but Big Data is by no means limited to Hadoop
  4. Spark

Again, I don’t know the best way to learn these skills, and I don’t think anyone else does either (hopefully someone has an exciting dataset that can prove me right or wrong). I do believe, though, that this is the best framework for selecting what to learn. Learning what other data scientists don’t know but wish they did is a powerful advantage on the job market.

Bio: Brandon Walker is a data scientist at IBM in Austin, Texas. He works on the Cloud Pak Acceleration Team (CPAT), where he consults with clients on the best way to make use of IBM AI/data science products, particularly Watson and the Cloud Pak for Data. His best skills are machine learning, data wrangling, Python, and R, but he also has a strong interest in reinforcement learning and DevOps.

Original. Reposted with permission.