Modern Data Science Skills: 8 Categories, Core Skills, and Hot Skills
We analyze the results of the Data Science Skills poll, including 8 categories of skills, 13 core skills that over 50% of respondents have, the emerging/hot skills that data scientists want to learn, and the top skill that data scientists want to learn.
The poll asked two questions:
1. Which skills / knowledge areas do you currently have (at the level you can use in work or research)?
2. Which skills do you want to add or improve?
The classical Data Science Venn Diagram, which Drew Conway proposed in 2013, has 3 main areas: Hacking (Programming), Math & Statistics, and Business/Domain Knowledge. However, the Data Science field has been evolving at such speed that these 3 areas are no longer sufficient. Data Science now includes additional areas, such as Deep Learning algorithms and Cloud Computing Platforms. More Math knowledge (especially Algebra and Calculus) is needed for Deep Learning. The COVID pandemic added demand for Survival Analysis and Epidemiology. Deploying Data Science requires an understanding of software development and DevOps, and the use of GitHub, Docker, and similar tools.
We reviewed many blogs and articles on Data Science skills, and updated and expanded the list of skills/knowledge areas from 30 items last year to 50 in this poll. To better organize this list, we divided it into 8 categories, adding 5 more to the ones in Conway's Venn diagram:
- Programming Languages: Python, R, Java, C++, MATLAB, SAS, Scala, Julia
- Math & Stats: Algebra & Calculus, Probability & Stats, Survival Analysis, Epidemiology
- Business & Communication: Business Understanding, Critical Thinking, Communications Skills, Excel, Data Visualization, Tableau, PowerBI
- Data Science / ML Tools/Methods: Data Cleaning / Prep, ML Algorithms, Scikit-learn, Text Processing, XGBoost, Unstructured Data, Kaggle, Reinforcement Learning
- Software Development: GitHub, Software Engineering, Docker, DevOps, Kubernetes
- SQL / Databases: SQL/Database Coding, No-SQL Databases, Graph Databases
- Big Data / Cloud: AWS, Apache Spark, Dask, Microsoft Azure, Google Cloud, Hadoop, Other Big Data Tools, Other Cloud Computing Platforms
- Deep Learning: DL algorithms, Keras, NLP, TensorFlow, Computer Vision, PyTorch, Other DL frameworks
This poll received nearly 1000 votes. An average respondent had 16 skills (vs 10 in 2019) and wanted to add or improve 18 skills (vs 6.5 in 2019).
Fig. 1 below shows a radar chart of skills by category, with the blue line indicating skills respondents have and the orange line indicating skills they want. Since there are many entries in each category, we used the maximum percentage (the most popular entry) to represent that category.
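The max-per-category summary described above can be sketched as follows. This is a minimal illustration rather than the poll's actual processing: the function name `max_by_category` is hypothetical, and the sample rows are a handful of % Have values taken from Table 2.

```python
# (skill, category, % Have) for a few skills, values taken from Table 2
have_pct = [
    ("Probability & Statistics", "Math & Stats", 73.4),
    ("Math (Algebra & Calculus)", "Math & Stats", 70.7),
    ("Data Visualization", "Business & Communication", 71.6),
    ("Excel", "Business & Communication", 69.4),
    ("Data Cleaning / Data Preparation / ETL", "Data Science & ML Tools", 70.1),
    ("Scikit-learn", "Data Science & ML Tools", 52.3),
]


def max_by_category(rows):
    """Return {category: (skill, pct)} keeping the most popular entry per category."""
    best = {}
    for skill, category, pct in rows:
        if category not in best or pct > best[category][1]:
            best[category] = (skill, pct)
    return best


category_max = max_by_category(have_pct)
for category, (skill, pct) in category_max.items():
    # Rounded to whole percent, as in the per-category summary
    print(f"{category}: {pct:.0f}% ({skill})")
```

Using the maximum means each category is represented by its single most popular skill, so a category with one widely held skill and several rare ones still scores high.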
Fig. 1: 8 Categories of Modern Data Science-related Skills, Have vs Want
We note that a typical Data Scientist does well on 5 of those categories: Programming, Math & Stats, Business & Comm, DS/ML Tools, and SQL/Databases, with % Have ranging from 79% to 69%. Only 54% have Software Development skills. The two most modern areas, Big Data & Cloud and Deep Learning, show gaps in current skills, with % wanting the skill exceeding % having it.
Table 1: % Have vs % Want by Category
|Category||Max %Have||Max %Want|
|Math & Stats||73%||39%|
|Business & Communication||72%||38%|
|Data Science & ML Tools/Methods||70%||52%|
|Big Data & Cloud||20%||49%|
Modern Data Science is not done by unicorns that have all needed skills, but by teams of people, and it would be useful to examine the skills of different job profiles - Researcher, Data Scientist, Data Analyst, Machine Learning Engineer, Business Analyst, etc. We leave this until a future time.
Next, we examine the popularity of individual Data Science skills in this poll.
Fig. 2: Modern Data Science-related Skills, Have vs Want
The X-axis shows % Have Skill (answers to the first poll question), and the Y-axis shows % Want Skill (answers to the second poll question).
Shape represents the category (see below). Shape size is proportional to the % of voters that have that skill. The color depends on the ratio of Want/Have: red is high (more than 1.2), grey is between 0.8 and 1.2, and blue is low (less than 0.8).
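The color rule can be written as a small function. This is a sketch under stated assumptions: the name `ratio_color` is hypothetical, and the text does not say which color applies at exactly 0.8 or 1.2, so treating both boundaries as grey is an assumption.

```python
def ratio_color(pct_want, pct_have):
    """Map a skill's Want/Have ratio to the chart color described in the text."""
    if pct_have <= 0:
        raise ValueError("pct_have must be positive to compute Want/Have")
    ratio = pct_want / pct_have
    if ratio > 1.2:
        return "red"   # demand clearly exceeds current adoption
    if ratio < 0.8:
        return "blue"  # widely held already, relatively fewer want it
    return "grey"      # assumption: boundary values 0.8 and 1.2 count as grey


# Values taken from Tables 2, 3, and 4
print(ratio_color(38.7, 73.4))  # Probability & Statistics
print(ratio_color(51.9, 13.8))  # Reinforcement Learning
print(ratio_color(39.9, 37.5))  # Text Processing
```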
As in last year's Data Science skills poll, we can see two main clusters.
Cluster 1, in the blue dashed rectangle on the right side of the chart, includes all the skills that over 50% of respondents have. The color of all shapes in this cluster is blue, indicating that Want/Have is less than 0.8. As last year, we call this set Core Data Science Skills. They are listed in Table 2.
Table 2: 13 Core Data Science Skills, in decreasing order of %Have
|Skill||Category||%Have||%Want||Want/Have|
|Probability & Statistics||Math & Stats||73.4%||38.7%||0.53|
|Data Visualization||Business & Communication||71.6%||37.7%||0.53|
|Math (Algebra & Calculus)||Math & Stats||70.7%||28.6%||0.40|
|Critical Thinking||Business & Communication||70.3%||28.8%||0.41|
|Data Cleaning / Data Preparation / ETL||Data Science & ML Tools||70.1%||31.9%||0.45|
|Communications Skills||Business & Communication||69.4%||33.4%||0.48|
|Excel||Business & Communication||69.4%||15.0%||0.22|
|Machine Learning Techniques||Data Science & ML Tools||61.9%||42.2%||0.68|
|Business Understanding||Business & Communication||60.9%||34.9%||0.57|
|Scikit-learn||Data Science & ML Tools||52.3%||37.6%||0.72|
The core skills are almost the same as in the 2019 poll, with two exceptions. R declined in popularity from 45% to 40% this year and is no longer among the core skills. One new skill was added: GitHub (not in the 2019 poll).
The most common categories among core skills are Business & Communication (5) and Data Science & ML Tools (3).
The poll also allowed people to select both "have" and "want to add or improve" the skill (which explains why for some skills %Have + %Want > 100%). Among the core skills, the ones people most want to improve are
- Python, 33% of those that have it want to improve it
- Machine Learning Techniques, 33%
- Probability & Statistics, 31%
- Data Visualization, 30%
- Scikit-learn, 29%
- Excel, 11%
- SQL, 18%
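Because a respondent can mark the same skill as both "have" and "want", the two percentages are not exclusive. The sketch below shows how %Have + %Want can exceed 100% and how the "of those that have it" share is a conditional proportion. The responses are invented for illustration (they are not poll data), and the helper names `share` and `improve_share` are hypothetical.

```python
# Hypothetical responses, invented for illustration (not poll data)
responses = [
    {"have": {"Python", "SQL"}, "want": {"Python"}},
    {"have": {"Python"}, "want": {"SQL"}},
    {"have": {"Python", "SQL"}, "want": set()},
    {"have": set(), "want": {"Python", "SQL"}},
]


def share(responses, skill, key):
    """Fraction of all respondents listing `skill` under `key` ("have" or "want")."""
    return sum(1 for r in responses if skill in r[key]) / len(responses)


def improve_share(responses, skill):
    """Among respondents who have `skill`, the fraction who also want to improve it."""
    havers = [r for r in responses if skill in r["have"]]
    if not havers:
        return 0.0
    return sum(1 for r in havers if skill in r["want"]) / len(havers)


have = share(responses, "Python", "have")
want = share(responses, "Python", "want")
print(f"%Have + %Want = {100 * (have + want):.0f}%")
print(f"Improve share among those who have it = {100 * improve_share(responses, 'Python'):.0f}%")
```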
The second cluster, on the left in Fig. 2, marked with a red border, includes skills that fewer people currently have (%Have < 30%) but more people want to add them, with %Want/%Have > 1.2, and with at least 15% of respondents wanting them.
We call them Hot / Emerging Data Science Skills, and they are listed in Table 3. We see that the hottest skills, with the highest percentage that want to learn them, are Reinforcement Learning, TensorFlow, Deep Learning Algorithms, and PyTorch.
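The three conditions for a hot skill can be expressed as a single filter. This is a minimal sketch: the function name `is_hot` and its parameter names are hypothetical, the default thresholds are the ones stated in the text, and the sample values are (%Have, %Want) pairs taken from Tables 2, 3, and 4.

```python
def is_hot(pct_have, pct_want, max_have=30.0, min_want=15.0, min_ratio=1.2):
    """Apply the stated rule: few have it, enough want it, and Want/Have is high."""
    if pct_have <= 0:
        return False  # assumption: ratio is undefined, so the skill is not classified
    return (
        pct_have < max_have
        and pct_want >= min_want
        and pct_want / pct_have > min_ratio
    )


# (%Have, %Want) for a few skills from Tables 2, 3, and 4
skills = {
    "Reinforcement Learning": (13.8, 51.9),
    "Dask": (3.4, 21.2),
    "XGBoost": (29.5, 34.6),
    "R Language": (40.6, 34.8),
    "Excel": (69.4, 15.0),
}

# Keep the hot skills, sorted by decreasing %Want
hot = sorted(
    (name for name, (have, want) in skills.items() if is_hot(have, want)),
    key=lambda name: -skills[name][1],
)
print(hot)
```

XGBoost shows why all three conditions matter: fewer than 30% have it and well over 15% want it, but its Want/Have ratio of about 1.17 falls just short of 1.2.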
Table 3: Hot / Emerging Modern Data Science Skills, in decreasing order of %Want
|Skill||Category||%Want||%Have||Want/Have|
|Reinforcement Learning||Data Science & ML Tools||51.9%||13.8%||3.8|
|Deep Learning Algorithms||Deep Learning||50.8%||34.0%||1.5|
|AWS (Amazon Web Services)||Big Data & Cloud||48.8%||20.1%||2.4|
|Apache Spark||Big Data & Cloud||45.3%||17.8%||2.5|
|Computer Vision||Deep Learning||42.7%||20.7%||2.1|
|Unstructured Data||Data Science & ML Tools||40.8%||29.4%||1.4|
|Survival Analysis||Math & Stats||37.7%||19.8%||1.9|
|Google Cloud Computing||Big Data & Cloud||37.4%||14.7%||2.5|
|Microsoft Azure||Big Data & Cloud||37.3%||15.3%||2.4|
|Kaggle||Data Science & ML Tools||36.0%||25.9%||1.4|
|PowerBI||Business & Communication||33.6%||25.1%||1.3|
|Big Data Tools other than Hadoop or Spark||Big Data & Cloud||32.6%||9.5%||3.4|
|Hadoop||Big Data & Cloud||32.5%||13.1%||2.5|
|Other DL frameworks||Deep Learning||30.0%||6.0%||5.0|
|Epidemiology||Math & Stats||27.4%||8.2%||3.3|
|Dask||Big Data & Cloud||21.2%||3.4%||6.2|
|Other Cloud Computing Platforms||Big Data & Cloud||18.4%||4.4%||4.2|
The most common categories among emerging Data Science skills are:
- Big Data & Cloud, 8
- Deep Learning, 7
- Data Science & ML Tools, 3
- Programming Lang, 3
- Software Development, 3
The remaining skills are those where demand is not growing strongly (Want/Have is below 1.2) and current popularity is below 50%. They can still be very useful in many areas. This group is shown in Table 4.
Table 4: Useful / Other Data Science Skills, in decreasing order of %Have
|Skill||Category||%Have||%Want||Want/Have|
|R Language||Programming Lang||40.6%||34.8%||0.86|
|Text Processing||Data Science & ML Tools||37.5%||39.9%||1.1|
|Software Engineering||Software Development||33.9%||31.8%||0.94|
|Tableau||Business & Communication||31.8%||35.5%||1.1|
|XGBoost||Data Science & ML Tools||29.5%||34.6%||1.2|
Let us know what we missed and what you think - comment below!