How to Acquire the Most Wanted Data Science Skills

We recently surveyed KDnuggets readers to determine the "most wanted" data science skills. Since these appear to be the skills most in demand among practitioners, here is a collection of resources for getting started with learning them.



KDnuggets recently conducted a survey to find out which data science skills our readers currently have, and which skills they hope to add or improve upon.

The analysis of the results shows that the top 10 most wanted skills from our list of 50 (in decreasing order of the percentage of respondents who wanted them) were:

  1. Reinforcement Learning
  2. TensorFlow
  3. Deep Learning Algorithms
  4. PyTorch
  5. AWS (Amazon Web Services)
  6. Natural Language Processing
  7. Apache Spark
  8. Docker
  9. NoSQL Databases
  10. Computer Vision

This article presents some no-cost options for getting started with each of these skills. Rather than throwing a bunch of resources at the wall for each skill, I will point out one or two which I believe have proven their merit, some of which in turn point to additional vetted resources.

Keep in mind that these are approaches to learning the basics of a given topic. In many cases, gaining the expertise needed to use any of these skills on a daily basis, or to understand them at an expert level, could take many hours or even years of practice and study. But don't let that discourage you; get started learning now.

Figure 1: Top 10 Most Wanted Data Science/Machine Learning Skills
Plotted by percentage of respondents who have each skill against percentage of respondents who want to add or improve upon it

 

 
1. Reinforcement Learning

Reinforcement learning topped our list of wanted skills, with 51.9% of respondents stating that they hoped to add it to their skills portfolio. Two comprehensive reinforcement learning resources with which I am familiar are:

 
2. TensorFlow

51.2% of respondents said they would like to either improve upon their TensorFlow skills or add it to their repertoire. This is for good reason, as TensorFlow remains one of the most prominent and widely used Python machine learning libraries out there. A great resource for learning practical TensorFlow is:

 
3. Deep Learning Algorithms

Deep learning algorithms were named as a desirable skill to acquire by 50.8% of respondents. The passage of time is solidifying the notion that deep learning isn't a fad that's on its way out; just recently, Geoffrey Hinton was quoted as saying that "Deep learning is going to be able to do everything." That seems like good reason to at least have some understanding of what these algorithms can do. Good places to turn to understand deep learning on a fundamental level continue to be:

 
4. PyTorch

In fourth place was PyTorch, with 50.1% of respondents noting that they would be interested in adding or increasing their knowledge of the deep learning library. The best place to learn the basics of PyTorch continues to be the PyTorch official tutorials, available here:

 
5. AWS (Amazon Web Services)

48.8% of respondents reported their interest in learning AWS, which is a very wide array of related services from Amazon. Given this breadth, the place to go to learn the basics of any of these services remains the official AWS training site, available here:

 
6. Natural Language Processing

Natural language processing remains a popular skill (or set of skills), as evidenced by its being in demand among 48.7% of our respondents. Learning NLP is a lot more time intensive than taking a few courses, and approaching it casually from solely a data science, computer science, or AI background will only get you so far. However, gaining an understanding of the very basics can be accomplished by a course such as the one from Amazon's Machine Learning University:

 
7. Apache Spark

45.3% of respondents are interested in knowing more about Apache Spark. Big data is no longer worthy of note, since so much of the data we work with is big, and it's just assumed that we know how to process it. This is where Apache Spark comes in. The following article provides some insight into where to start with learning Spark given differing objectives, and notes 5 free sources for doing so.

 
8. Docker

Docker is increasingly becoming a must-know for those in the data development world, especially those in data DevOps and data engineering. It should come as no surprise that 44.9% of respondents are interested in knowing more about the technology in order to add it to their list of skills. Here is an article which addresses approaching Docker from the point of view of a data scientist:

 
9. NoSQL Databases

NoSQL is a broad term encompassing a wide variety of database engines and technologies which do not fit into the traditional SQL/relational database mold. As such, the term functionally includes database types such as graph databases, key-value stores, columnar databases, and document stores. These are all different, but have in common that they are employed when relational models are not feasible. 43.0% of respondents want to know more, and this article can help scratch the surface and point them in the right direction to learn more.
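To make the differences between these database types a little more concrete, here is a minimal sketch that uses plain Python structures as stand-ins for the four data models mentioned above. It does not use any real database API; the sample records and the helper names (`get_user`, `find_by_lang`, `column_sum`, `neighbors`) are hypothetical and exist only for illustration.

```python
import json

# Illustrative only: plain Python stand-ins for four NoSQL data models.
# All names and sample data are hypothetical; no real database API is used.

# Key-value store: an opaque value retrieved by exactly one key.
key_value = {
    "user:1": '{"name": "Ada", "langs": ["Python", "SQL"]}',
    "user:2": '{"name": "Lin", "langs": ["Scala"]}',
}

# Document store: nested records whose fields may differ from one to the next.
documents = [
    {"_id": 1, "name": "Ada", "langs": ["Python", "SQL"]},
    {"_id": 2, "name": "Lin", "langs": ["Scala"], "team": "platform"},
]

# Column-oriented layout: values grouped per column rather than per row.
columns = {
    "_id": [1, 2],
    "name": ["Ada", "Lin"],
    "n_langs": [2, 1],
}

# Graph: relationships stored explicitly as (source, relation, target) edges.
edges = [("Ada", "mentors", "Lin")]


def get_user(store, key):
    """Key-value access: fetch by key, then decode the opaque value."""
    return json.loads(store[key])


def find_by_lang(docs, lang):
    """Document access: filter on a field inside each record."""
    return [d["name"] for d in docs if lang in d.get("langs", [])]


def column_sum(cols, name):
    """Columnar access: aggregate one column without touching the others."""
    return sum(cols[name])


def neighbors(edge_list, node, relation):
    """Graph access: follow edges of a given type out of a node."""
    return [dst for src, rel, dst in edge_list if src == node and rel == relation]
```

Each helper mirrors the access pattern its model is typically chosen for: lookup by key, filtering on nested fields, aggregating a single column, and traversing relationships. Real engines add indexing, persistence, and distribution on top of these ideas.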

 
10. Computer Vision

Finally, in the number 10 position, computer vision is a skill (or, again, a related set of skills) desired by 42.7% of respondents. As with NLP, mastering computer vision takes much more than a single course, but a rudimentary understanding of the basics can likewise be achieved via an Amazon Machine Learning University course:

 