Modern Data Science Skills: 8 Categories, Core Skills, and Hot Skills

We analyze the results of the Data Science Skills poll, including 8 categories of skills, 13 core skills that over 50% of respondents have, the emerging/hot skills that data scientists want to learn, and the top skill that Data Scientists want to learn.

The latest KDnuggets Poll was a follow-up on last year's very popular poll on Data Science Skills, and asked the same two questions:
1. Which skills / knowledge areas do you currently have (at the level you can use in work or research)? and
2. Which skills do you want to add or improve?

The classic Data Science Venn Diagram, which Drew Conway proposed in 2010, has 3 main areas: Hacking (Programming), Math & Statistics, and Business/Domain Knowledge. However, the Data Science field has been evolving so quickly that these 3 areas are no longer sufficient. Data Science now includes additional areas, such as Deep Learning algorithms and Cloud Computing Platforms. Deep Learning also requires more Math knowledge (especially Algebra and Calculus). The COVID pandemic increased demand for Survival Analysis and Epidemiology. Deploying Data Science requires an understanding of software development and DevOps, and the use of tools such as GitHub and Docker.

We reviewed many blogs and articles on Data Science skills, and updated and expanded the list of skills/knowledge areas from 30 items last year to 50 in this poll. To better organize this list, we divided it into 8 categories, adding 5 more to the 3 in Conway's Venn diagram:
  • Programming Languages: Python, R, Java, C++, MATLAB, SAS, Scala, Julia
  • Math & Stats: Algebra & Calculus, Probability & Stats, Survival Analysis, Epidemiology
  • Business & Communication: Business Understanding, Critical Thinking, Communications Skills, Excel, Data Visualization, Tableau, PowerBI
  • Data Science / ML Tools/Methods: Data Cleaning / Prep, ML Algorithms, Scikit-learn, Text Processing, XGBoost, Unstructured Data, Kaggle, Reinforcement Learning
  • Software Development: Github, Software Engineering, Docker, DevOps, Kubernetes
  • SQL / Databases: SQL/Database Coding, No-SQL Databases, Graph Databases
  • Big Data / Cloud: AWS, Apache Spark, Dask, Microsoft Azure, Google Cloud, Hadoop, Other Big Data Tools, Other Cloud Computing Platforms
  • Deep Learning: DL algorithms, Keras, NLP, TensorFlow, Computer Vision, PyTorch, Other DL frameworks
The above list and categorization are not complete or perfect, but they are a useful way to understand the current state of skills of Data Scientists, as the poll results show.

This poll received nearly 1000 votes. An average respondent had 16 skills (vs 10 in 2019) and wanted to add or improve 18 skills (vs 6.5 in 2019).
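These averages are tied to the per-skill percentages reported below: if every respondent is counted in the same base, the mean number of skills per respondent equals the sum of the %Have values over all skills, divided by 100 (and likewise for %Want). The sketch below checks this identity on a handful of made-up responses; the `responses` data is purely hypothetical, since the poll's raw answers are not reproduced here.

```python
# Hypothetical responses (made-up data, NOT the poll's raw answers):
# each respondent's set of skills they "have".
responses = [
    {"Python", "SQL", "Excel"},
    {"Python", "R"},
    {"Python", "SQL", "Tableau", "Excel"},
    {"R"},
]

n = len(responses)

# Average number of skills per respondent, computed directly.
avg_direct = sum(len(r) for r in responses) / n

# The same average from per-skill percentages: sum of %Have divided by 100.
skills = set().union(*responses)
pct_have = {s: 100 * sum(s in r for r in responses) / n for s in skills}
avg_from_pct = sum(pct_have.values()) / 100

print(avg_direct, avg_from_pct)  # both 2.5 for this toy data
```

The identity holds because each skill a respondent selects contributes exactly once to both sums.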

Fig. 1 below shows a radar chart of skills by category, with the blue line indicating skills respondents have and the orange line indicating skills they want. Since each category has many entries, we used the maximum percentage (that of the most popular entry) to represent the category.

Fig. 1: 8 Categories of Modern Data Science-related Skills, Have vs Want
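A minimal sketch of the aggregation behind Fig. 1, assuming the category value is simply the maximum over that category's skills, taken separately for %Have and %Want. The input rows are copied from Table 3 for the two categories whose skills all appear there; the function name `category_max` is illustrative, not part of any published code.

```python
# (category, skill, %have, %want), copied from Table 3 for the two
# categories whose skills are all listed there.
rows = [
    ("Deep Learning", "TensorFlow", 26.0, 51.2),
    ("Deep Learning", "Deep Learning Algorithms", 34.0, 50.8),
    ("Deep Learning", "PyTorch", 12.5, 50.1),
    ("Deep Learning", "NLP", 27.3, 48.7),
    ("Deep Learning", "Computer Vision", 20.7, 42.7),
    ("Deep Learning", "Keras", 28.2, 41.1),
    ("Deep Learning", "Other DL frameworks", 6.0, 30.0),
    ("Big Data & Cloud", "AWS", 20.1, 48.8),
    ("Big Data & Cloud", "Apache Spark", 17.8, 45.3),
    ("Big Data & Cloud", "Google Cloud Computing", 14.7, 37.4),
    ("Big Data & Cloud", "Microsoft Azure", 15.3, 37.3),
    ("Big Data & Cloud", "Other Big Data Tools", 9.5, 32.6),
    ("Big Data & Cloud", "Hadoop", 13.1, 32.5),
    ("Big Data & Cloud", "Dask", 3.4, 21.2),
    ("Big Data & Cloud", "Other Cloud Computing Platforms", 4.4, 18.4),
]


def category_max(rows):
    """Return {category: (max %have, max %want)}; each max is taken independently."""
    result = {}
    for category, _skill, have, want in rows:
        best_have, best_want = result.get(category, (0.0, 0.0))
        result[category] = (max(best_have, have), max(best_want, want))
    return result


for category, (have, want) in category_max(rows).items():
    print(f"{category}: max %Have {have:.0f}%, max %Want {want:.0f}%")
```

Rounded to whole percents, this reproduces the Deep Learning (34%, 51%) and Big Data & Cloud (20%, 49%) rows of Table 1. Note that the two maxima can come from different skills: for Deep Learning, the top %Have is Deep Learning Algorithms while the top %Want is TensorFlow.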

We note that a typical Data Scientist does well in the first 5 of those categories: Programming, Math & Stats, Business & Comm, DS/ML Tools, and SQL/Databases, with % Have ranging from 79% to 69%. Only 54% have Software Development skills. The two most modern areas, Big Data & Cloud and Deep Learning, show gaps in current skills, with the % wanting the skill exceeding the % having it.

Table 1: % Have vs % Want by Category
| Category | Max %Have | Max %Want |
|---|---|---|
| Programming Languages | 79% | 43% |
| Math & Stats | 73% | 39% |
| Business & Communication | 72% | 38% |
| Data Science & ML Tools/Methods | 70% | 52% |
| Software Development | 54% | 45% |
| Big Data & Cloud | 20% | 49% |
| Deep Learning | 34% | 51% |
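The gap described above can be read straight off Table 1. A short sketch, with the table values typed in by hand, that keeps only the categories where max %Want exceeds max %Have and reports the difference in percentage points:

```python
# Max %Have and max %Want per category, as listed in Table 1.
table1 = {
    "Programming Languages": (79, 43),
    "Math & Stats": (73, 39),
    "Business & Communication": (72, 38),
    "Data Science & ML Tools/Methods": (70, 52),
    "Software Development": (54, 45),
    "Big Data & Cloud": (20, 49),
    "Deep Learning": (34, 51),
}

# Categories where more respondents want a skill than have one,
# mapped to the gap in percentage points.
gaps = {cat: want - have for cat, (have, want) in table1.items() if want > have}
print(gaps)  # {'Big Data & Cloud': 29, 'Deep Learning': 17}
```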

Modern Data Science is not done by unicorns who have all the needed skills, but by teams of people, and it would be useful to examine the skills of different job profiles: Researcher, Data Scientist, Data Analyst, Machine Learning Engineer, Business Analyst, etc. We leave this analysis for a future article.

Next, we examine the popularity of individual Data Science skills in this poll.

Fig. 2: Modern Data Science-related Skills, Have vs Want
The X-axis shows % Have Skill (answers to the first poll question), and the Y-axis shows % Want Skill (answers to the second poll question).
Shape represents the skill category (shown in the chart legend). Shape size is proportional to the % of voters that have that skill. Color depends on the Want/Have ratio: red is high (more than 1.2), grey is between 0.8 and 1.2, and blue is low (less than 0.8).
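The color rule for Fig. 2 can be written as a small function. This is a sketch: the text does not say which band the exact boundary values 0.8 and 1.2 belong to, so here they are assumed to be grey, and `ratio_color` is a hypothetical name. It also assumes %Have is greater than zero.

```python
def ratio_color(pct_have: float, pct_want: float) -> str:
    """Map a skill's %Want/%Have ratio to the Fig. 2 color bands.

    Assumes pct_have > 0 and that the boundary values 0.8 and 1.2
    fall in the grey band (the text does not specify this).
    """
    ratio = pct_want / pct_have
    if ratio > 1.2:
        return "red"
    if ratio < 0.8:
        return "blue"
    return "grey"


print(ratio_color(78.8, 43.1))  # Python: ratio 0.55 -> blue
print(ratio_color(5.8, 41.3))   # Kubernetes: ratio ~7.1 -> red
print(ratio_color(22.2, 22.3))  # Java: ratio ~1.0 -> grey
```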

As in last year's Data Science Skills poll, we can see two main clusters.

Cluster 1, in the blue dashed rectangle on the right side of the chart, includes all the skills that over 50% of respondents have. All shapes in this cluster are blue, indicating that Want/Have is less than 0.8. As last year, we call this set Core Data Science Skills. They are listed in Table 2.

Table 2: 13 Core Data Science Skills, in decreasing order of %Have
| Skill | Category | %Have | %Want | %Want/%Have |
|---|---|---|---|---|
| Python | Programming Lang | 78.8% | 43.1% | 0.55 |
| Probability & Statistics | Math & Stats | 73.4% | 38.7% | 0.53 |
| Data Visualization | Business & Communication | 71.6% | 37.7% | 0.53 |
| Math (Algebra & Calculus) | Math & Stats | 70.7% | 28.6% | 0.40 |
| Critical Thinking | Business & Communication | 70.3% | 28.8% | 0.41 |
| Data Cleaning / Data Preparation / ETL | Data Science & ML Tools | 70.1% | 31.9% | 0.45 |
| Communications Skills | Business & Communication | 69.4% | 33.4% | 0.48 |
| Excel | Business & Communication | 69.4% | 15.0% | 0.22 |
| Machine Learning Techniques | Data Science & ML Tools | 61.9% | 42.2% | 0.68 |
| Business Understanding | Business & Communication | 60.9% | 34.9% | 0.57 |
| Github | Software Development | 54.2% | 41.1% | 0.76 |
| Scikit-learn | Data Science & ML Tools | 52.3% | 37.6% | 0.72 |

The core skills are almost the same as in the 2019 poll, with two changes. R declined in popularity from 45% to 40% this year, so it is no longer a core skill. One new skill was added: Github (which was not included in the 2019 poll).

The most common categories among core skills are Business & Communication (5) and Data Science & ML Tools (3).
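The category tally above is a simple count over the Category column of Table 2. A sketch with `collections.Counter`, using the rows as listed in the table:

```python
from collections import Counter

# (skill, category) pairs as listed in Table 2.
core_skills = [
    ("Python", "Programming Lang"),
    ("Probability & Statistics", "Math & Stats"),
    ("Data Visualization", "Business & Communication"),
    ("Math (Algebra & Calculus)", "Math & Stats"),
    ("Critical Thinking", "Business & Communication"),
    ("Data Cleaning / Data Preparation / ETL", "Data Science & ML Tools"),
    ("Communications Skills", "Business & Communication"),
    ("Excel", "Business & Communication"),
    ("Machine Learning Techniques", "Data Science & ML Tools"),
    ("Business Understanding", "Business & Communication"),
    ("Github", "Software Development"),
    ("Scikit-learn", "Data Science & ML Tools"),
]

# Count how many core skills fall into each category.
counts = Counter(category for _skill, category in core_skills)
print(counts.most_common(2))
# [('Business & Communication', 5), ('Data Science & ML Tools', 3)]
```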

The poll also allowed people to select both "have" and "want to add or improve" for the same skill, which explains why %Have + %Want > 100% for some skills. Among the core skills, the ones people most want to improve are:
  • Python, 33% of those that have it want to improve it
  • Machine Learning Techniques, 33%
  • Probability & Statistics, 31%
  • Data Visualization, 30%
  • Scikit-learn, 29%
The skills with the lowest desire for improvement are:
  • Excel, 11%
  • SQL, 18%
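The percentages in the lists above are conditional: among respondents who have a skill, the share who also ticked it as a skill to add or improve. The article does not show its computation, so the sketch below assumes that definition and uses made-up respondents (`respondents` and `share_wanting_to_improve` are hypothetical). It also shows how %Have + %Want for one skill can exceed 100% when people may select both.

```python
# Hypothetical per-respondent answers (made-up data for illustration):
# each respondent has a set of skills they "have" and a set they "want".
respondents = [
    {"have": {"Python", "Excel"}, "want": {"Python", "PyTorch"}},
    {"have": {"Python"}, "want": {"Docker"}},
    {"have": {"Python", "Excel"}, "want": set()},
    {"have": {"R"}, "want": {"Python"}},
]


def share_wanting_to_improve(skill, respondents):
    """Percentage of respondents who have `skill` and also selected it as 'want'."""
    holders = [r for r in respondents if skill in r["have"]]
    if not holders:
        return 0.0
    improvers = sum(skill in r["want"] for r in holders)
    return 100 * improvers / len(holders)


print(share_wanting_to_improve("Python", respondents))  # 1 of 3 holders -> 33.33...
print(share_wanting_to_improve("Excel", respondents))   # 0.0

# Unconditional percentages over all respondents can sum past 100%.
n = len(respondents)
pct_have = 100 * sum("Python" in r["have"] for r in respondents) / n
pct_want = 100 * sum("Python" in r["want"] for r in respondents) / n
print(pct_have + pct_want)  # 75.0 + 50.0 = 125.0
```

Respondent 4 wants Python without having it, so they raise %Want but do not enter the conditional share.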

The second cluster, on the left in Fig. 2 and marked with a red border, includes skills that fewer people currently have (%Have < 30%, except Deep Learning Algorithms at 34%) but that more people want to add, with %Want/%Have > 1.2 and at least 15% of respondents wanting them.
We call them Hot / Emerging Data Science Skills; they are listed in Table 3. The hottest skills, with the highest percentage of respondents wanting to learn them, are Reinforcement Learning, TensorFlow, Deep Learning Algorithms, and PyTorch.

Table 3: Hot / Emerging Modern Data Science Skills, in decreasing order of %Want
| Skill | Category | %Want | %Have | %Want/%Have |
|---|---|---|---|---|
| Reinforcement Learning | Data Science & ML Tools | 51.9% | 13.8% | 3.8 |
| TensorFlow | Deep Learning | 51.2% | 26.0% | 2.0 |
| Deep Learning Algorithms | Deep Learning | 50.8% | 34.0% | 1.5 |
| PyTorch | Deep Learning | 50.1% | 12.5% | 4.0 |
| AWS (Amazon Web Services) | Big Data & Cloud | 48.8% | 20.1% | 2.4 |
| NLP | Deep Learning | 48.7% | 27.3% | 1.8 |
| Apache Spark | Big Data & Cloud | 45.3% | 17.8% | 2.5 |
| Docker | Software Development | 44.9% | 17.0% | 2.6 |
| No-SQL Databases | SQL/Databases | 43.0% | 25.5% | 1.7 |
| Computer Vision | Deep Learning | 42.7% | 20.7% | 2.1 |
| Kubernetes | Software Development | 41.3% | 5.8% | 7.2 |
| Keras | Deep Learning | 41.1% | 28.2% | 1.5 |
| Unstructured Data | Data Science & ML Tools | 40.8% | 29.4% | 1.4 |
| Graph Databases | SQL/Databases | 39.4% | 14.2% | 2.8 |
| Survival Analysis | Math & Stats | 37.7% | 19.8% | 1.9 |
| Google Cloud Computing | Big Data & Cloud | 37.4% | 14.7% | 2.5 |
| Microsoft Azure | Big Data & Cloud | 37.3% | 15.3% | 2.4 |
| DevOps | Software Development | 36.2% | 14.9% | 2.4 |
| Kaggle | Data Science & ML Tools | 36.0% | 25.9% | 1.4 |
| PowerBI | Business & Communication | 33.6% | 25.1% | 1.3 |
| Big Data Tools other than Hadoop or Spark | Big Data & Cloud | 32.6% | 9.5% | 3.4 |
| Hadoop | Big Data & Cloud | 32.5% | 13.1% | 2.5 |
| Other DL frameworks | Deep Learning | 30.0% | 6.0% | 5.0 |
| Julia | Programming Lang | 29.1% | 2.0% | 14.9 |
| Scala | Programming Lang | 28.4% | 5.9% | 4.8 |
| Epidemiology | Math & Stats | 27.4% | 8.2% | 3.3 |
| Dask | Big Data & Cloud | 21.2% | 3.4% | 6.2 |
| Other Cloud Computing Platforms | Big Data & Cloud | 18.4% | 4.4% | 4.2 |
| SAS | Programming Lang | 17.6% | 11.6% | 1.5 |

The most common categories among emerging Data Science skills are:
  • Big Data & Cloud, 8
  • Deep Learning, 7
  • Data Science & ML Tools, 3
  • Programming Lang, 3
  • Software Development, 3

The remaining skills are those where demand is not growing strongly (%Want/%Have < 1.2) and current popularity is below 50% (%Have < 50%). They can still be very useful in many areas. This group is shown in Table 4.
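Taken together, the thresholds in this section sort every skill into one of three groups. The sketch below encodes them as stated in the text (core: more than 50% have it; hot/emerging: %Want/%Have > 1.2 and at least 15% want it; otherwise useful/other). Checking the core rule first is an assumption, and `classify` is an illustrative name. It applies no %Have ceiling to the hot group, since Deep Learning Algorithms (34% have) appears in Table 3.

```python
def classify(pct_have: float, pct_want: float) -> str:
    """Assign a skill to one of the three groups described in the text.

    Assumed precedence: the core rule is checked first. Assumes pct_have > 0.
    """
    if pct_have > 50:
        return "core"
    if pct_want / pct_have > 1.2 and pct_want >= 15:
        return "hot"
    return "other"


# (%have, %want) for a few skills taken from Tables 2-4.
samples = {
    "Python": (78.8, 43.1),
    "Kubernetes": (5.8, 41.3),
    "Deep Learning Algorithms": (34.0, 50.8),
    "XGBoost": (29.5, 34.6),
    "R Language": (40.6, 34.8),
}

for skill, (have, want) in samples.items():
    print(f"{skill}: {classify(have, want)}")
```

With these inputs the function places each skill in the same table the article does: Python is core, Kubernetes and Deep Learning Algorithms are hot, and XGBoost and R fall into the useful/other group.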

Table 4: Useful / Other Data Science Skills, in decreasing order of %Have
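| Skill | Category | %Have | %Want | %Want/%Have |
|---|---|---|---|---|
| R Language | Programming Lang | 40.6% | 34.8% | 0.86 |
| Text Processing | Data Science & ML Tools | 37.5% | 39.9% | 1.1 |
| Software Engineering | Software Development | 33.9% | 31.8% | 0.94 |
| Tableau | Business & Communication | 31.8% | 35.5% | 1.1 |
| XGBoost | Data Science & ML Tools | 29.5% | 34.6% | 1.2 |
| Java | Programming Lang | 22.2% | 22.3% | 1.0 |
| C++ | Programming Lang | 21.0% | 24.9% | 1.2 |
| MATLAB | Programming Lang | 18.8% | 16.1% | 0.86 |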

Let us know what we missed and what you think - comment below!