Which Data Science Skills are core and which are hot/emerging ones?
Tags: Career, Data Science Skills, Data Visualization, Deep Learning, Excel, Machine Learning, Poll, Python, PyTorch, Scala, Skills, Statistics, TensorFlow
We identify two main groups of Data Science skills: A: 13 core, stable skills that most respondents have and B: a group of hot, emerging skills that most do not have (yet) but want to add. See our detailed analysis.
The poll asked two questions:
1. Which skills / knowledge areas do you currently have (at the level you can use in work or research)?
2. Which skills do you want to add or improve?
We selected a list of 30 skills based on a number of previous KDnuggets articles and polls - see useful links at the end of this post, as well as external sources.
Altogether(*), this poll received over 1,500 votes - a large enough sample to make meaningful inferences. An average voter reported having 10 skills and wanted to add or improve 6.5 skills.
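As a rough check on the sample-size claim, the sketch below computes a normal-approximation margin of error for a reported percentage. It assumes voters are a simple random sample, which an open web poll is not, so treat the result as an optimistic lower bound on the uncertainty. The function name `margin_of_error` and the 95% z-value of 1.96 are choices made for this illustration, not part of the original analysis.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation interval for a proportion p with n votes."""
    return z * math.sqrt(p * (1.0 - p) / n)


# The post reports "over 1,500 votes"; 1500 is used here as a round lower figure.
n_votes = 1500

# The worst case is p = 0.5, where p * (1 - p) is largest.
worst_case = margin_of_error(0.5, n_votes)
print(f"worst-case margin: +/-{100 * worst_case:.1f} percentage points")
```

Under these assumptions the worst-case margin is about 2.5 percentage points, so differences of only a point or two between skills should not be over-interpreted.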
Fig. 1 below shows key findings, with X-axis showing % Have Skill - answers to the first poll question, and Y-axis showing % Want Skill - answers to the 2nd poll question. The size of each circle is proportional to the percent of voters that have that skill, while color depends on the ratio of Want/Have (red is high - more than 1, blue is low - less than 1).
Note: The "Other Big Data Tools" entry covers Big Data tools other than Hadoop or Spark.
Fig. 1: Data Science-related Skills, Have skill vs Want to add or improve skill
We note two main clusters in this chart.
Cluster 1, in the blue dashed rectangle on the right side of the chart, includes skills that over 40% of all voters have and for which the Want/Have ratio is less than 1. We call them Core Data Science Skills. They are listed in Table 1.
Table 1: Core Data Science Skills, in decreasing order of %Have
| Skill | %Have | %Want | %Want/%Have |
|---|---|---|---|
| ETL - Data Preparation | 48.3% | 14.1% | 0.29 |
Of these, the skills that the most voters want to add or improve are Machine Learning (41%) and Python (37%). Excel is growing the least: only 7% want to add or improve their Excel skills.
The second cluster, on the left in Fig. 1 and marked with a red border, includes skills that are currently less popular (%Have < 30%) but growing, with a %Want/%Have ratio over 1 (see Table 2). We call them Hot / Emerging Data Science Skills.
Table 2: Hot / Emerging Data Science Skills, in decreasing order of %Want/%Have
| Skill | %Have | %Want | %Want/%Have |
|---|---|---|---|
| Other Big Data Tools | 8.9% | 27.4% | 3.08 |
| NLP - Text Processing | 25.0% | 33.8% | 1.35 |
Interestingly, despite opinions that Hadoop is declining, in this poll more people want to learn Hadoop than already know it, so it may still grow in popularity.
We did not include Julia among the hot/emerging skills despite its high Want/Have ratio of 3.4 because, with only 2% of voters selecting it, it doesn't yet have enough support.
The remaining skills (XGBoost, Software Engineering, Java, MATLAB, and SAS) are held by between 10% and 30% of voters but are not growing: their Want/Have ratio is below 1.
Table 3: Other Data Science Skills, in decreasing order of %Have
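The grouping rules described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used for the analysis. The `SkillStat` class, the `classify` function, and the 5% minimum-support cutoff are assumptions: the post does not state a cutoff, only that 2% was too low. The Julia %Want figure of 6.8% is back-computed from its reported 2% and 3.4 ratio, so it is approximate. The other rows are the ones shown in Tables 1 and 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillStat:
    name: str
    have_pct: float  # % of voters who have the skill
    want_pct: float  # % of voters who want to add or improve it

    @property
    def want_have_ratio(self) -> float:
        return self.want_pct / self.have_pct


def classify(skill: SkillStat, min_support_pct: float = 5.0) -> str:
    """Assign a skill to a group using the thresholds described in the text.

    min_support_pct is a hypothetical cutoff chosen for this sketch.
    """
    ratio = skill.want_have_ratio
    if skill.have_pct < min_support_pct:
        return "too few voters"
    if skill.have_pct > 40.0 and ratio < 1.0:
        return "core"
    if skill.have_pct < 30.0 and ratio > 1.0:
        return "hot/emerging"
    if skill.have_pct < 30.0 and ratio < 1.0:
        return "other"
    return "unclassified"  # combinations the text does not cover


skills = [
    SkillStat("ETL - Data Preparation", 48.3, 14.1),
    SkillStat("Other Big Data Tools", 8.9, 27.4),
    SkillStat("NLP - Text Processing", 25.0, 33.8),
    SkillStat("Julia", 2.0, 6.8),  # approximate, derived from 2% and ratio 3.4
]

for s in skills:
    print(f"{s.name}: ratio={s.want_have_ratio:.2f}, group={classify(s)}")
```

The ratios computed this way match the last column of the table rows above, which is a quick way to confirm that the columns are %Have, %Want, and %Want/%Have in that order.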
Here is more detail on the poll. Fig. 2 ranks all the skills in decreasing order of %Have.
Fig. 2: Data Science Skills KDnuggets readers have
Fig. 3 shows the skills readers want to add or improve, overlaid with the skills they have.
Fig. 3: Data Science Skills KDnuggets readers want to add or improve (red) and have (blue)
We see that the top skills current and aspiring Data Scientists want to add are Deep Learning, TensorFlow, Machine Learning, and Python.
The poll also asked about employment type:
- Industry/Self-employed, 64.4%
- Government/non-profit, 7.2%
- Academia/University, 7.0%
- Student, 14.3%
- Other/NA, 7.1%
and region:
- US/Canada, 37.9%
- Europe, 28.3%
- Asia, 19.3%
- Latin America, 6.1%
- Africa/Middle East, 4.8%
- Other, 3.5%
(*) Note: We originally launched this poll using Google Forms, and it was attacked by bots that cast over 50,000 votes each for Julia and MATLAB. We removed the bot votes, kept the other votes, and relaunched the poll on another platform, this time without Julia and MATLAB to avoid another attack. The final Julia and MATLAB results are estimated from the valid votes in the first version of the poll.
- Python leads the 11 top Data Science, Machine Learning platforms: Trends and Analysis
- Python eats away at R: Top Software for Analytics, Data Science, Machine Learning in 2018: Trends and Analysis
- Top 13 Skills To Become a Rockstar Data Scientist
- The Most in Demand Skills for Data Scientists
- I wasn't getting hired as a Data Scientist. So I sought data on who is.