Most Demanded Data Science and Data Mining Skills
Our analysis of the most in-demand data scientist skills shows that data science is a team effort focused on business analytics, with the top 5 platform skills being SQL, Python, R, SAS, and Hadoop.
Many articles have been written about data science skills; see also the links at the bottom of this post.
The KDnuggets job board (www.kdnuggets.com/jobs/) has received an all-time high number of job postings in 2014 (about 245 so far), a sample large enough to quantify the skills in demand.
About 85% of the KDnuggets 2014 jobs are from the US, but there are also jobs from 14 other countries: Canada, China, Estonia, Germany, India, Israel, Luxembourg, Malta, Portugal, Serbia, Singapore, Switzerland, the Netherlands, and the UK.
About 33% of the jobs had the title "Data Scientist", up from about 25% in 2013 and 19% in 2012.
The second most common job title was Engineer, in many different versions such as BI Engineer, Machine Learning Engineer, and Software Engineer (see the word cloud below).
Next we looked at the skills most in demand by examining the most frequent keywords in job descriptions.
The most common were general terms:
- team, in 88% of the job ads
- business, 73%
- analytics, 64%
- design, 63%
- development, 62%
- statistics, 61%
- statistical, 61%
- research, 61%
- machine learning, 53%
- data mining, 52%
- modeling, 49%
- solutions, 47%
This suggests that a data scientist job is a team effort focused on business analytics, with research, design, and development playing a major role. Statistics, Machine Learning, and Data Mining are used almost synonymously.
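The percentages above are shares of job ads that mention a term at least once. As a minimal sketch of how such a share could be computed (the function name `keyword_share` and the sample ads are hypothetical, and this is not necessarily how the KDnuggets analysis was done):

```python
import re
from collections import Counter

def keyword_share(ads, keywords):
    """Return the fraction of ads that mention each keyword at least once."""
    counts = Counter()
    for text in ads:
        lowered = text.lower()
        for kw in keywords:
            # Word-boundary match, so "team" does not match "steam".
            pattern = r"\b" + re.escape(kw.lower()) + r"\b"
            if re.search(pattern, lowered):
                counts[kw] += 1
    return {kw: counts[kw] / len(ads) for kw in keywords}

# Hypothetical ads for illustration, not taken from the KDnuggets job board.
sample_ads = [
    "Join our analytics team to build machine learning models.",
    "Business analytics role; statistics background required.",
    "Research team seeks data mining expert.",
    "Software design and development for a steam plant.",
]
shares = keyword_share(sample_ads, ["team", "analytics", "machine learning"])
```

Each ad is counted once per term no matter how often the term appears, which matches the "in N% of the job ads" framing. The word-boundary pattern works for plain words; names that end in punctuation, such as C++, would need separate handling.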
Looking at terms corresponding to more specific skills and languages, we have:
- SQL, 54% of all job ads
- Python, 46%
- R, 44%
- SAS, 36%
- Hadoop, 35%
- Java, 32%
- optimization, 23%
- C++, 21%
- visualization, 20%
- MATLAB, 18%
- BI or Business Intelligence, 17%
- distributed, 16%
- regression, 16%
- unstructured, 16%
- Hive, 16%
- mobile, 15%
NoSQL was mentioned in 11% of job ads.
The US map below shows the distribution of US-based jobs, with the size of each circle corresponding to the number of jobs and the color corresponding to the log of the ratio of SAS (blue) to R (red) jobs.
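As a rough sketch of the color scale described above, the log of the SAS-to-R ratio can be computed per city as follows. The natural log and the `pseudocount` smoothing (added so that a city with zero SAS or zero R jobs does not produce an undefined value) are assumptions made here, not details stated in the post, and the city names and counts are made up:

```python
import math

def log_ratio(sas_jobs, r_jobs, pseudocount=0.5):
    """Log of the SAS-to-R job ratio; positive leans SAS, negative leans R."""
    # The pseudocount is an assumption to keep the ratio finite for zero counts.
    return math.log((sas_jobs + pseudocount) / (r_jobs + pseudocount))

# Hypothetical (SAS, R) job counts per city, for illustration only.
city_counts = {"City A": (10, 2), "City B": (5, 5), "City C": (0, 3)}
scores = {city: log_ratio(sas, r) for city, (sas, r) in city_counts.items()}
```

Taking the log makes the scale symmetric around zero: a city with twice as many SAS jobs as R jobs sits as far from zero as a city with twice as many R jobs as SAS jobs.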
The next figure shows the distribution of SAS vs R for US cities.
We see that New York, San Diego, Rochester, Portland, and Dallas have more SAS jobs than R jobs, while Seattle, Boston, Redmond, and San Francisco are more R-oriented. Chicago, Cupertino, and Palo Alto have about equal numbers of R and SAS jobs.
We also looked at the interaction between top languages/systems by measuring the lift:
lift(X, Y) = (actual number of jobs mentioning both X and Y) / (expected number of such jobs if X and Y were independently distributed).
We see that the strongest pairing is between R and Python (1.61 lift), followed by R and SAS. The only lift below 1 is between SAS and Hadoop, which are less likely to be required together.
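The lift defined above can be sketched in a few lines. Under independence, the expected number of jobs mentioning both X and Y is n(X) * n(Y) / N, where N is the total number of jobs. The function name `lift` and the sample job data are hypothetical and only illustrate the arithmetic:

```python
def lift(jobs, x, y):
    """Lift of skills x and y over a list of per-job skill sets."""
    n = len(jobs)
    n_x = sum(1 for skills in jobs if x in skills)
    n_y = sum(1 for skills in jobs if y in skills)
    n_xy = sum(1 for skills in jobs if x in skills and y in skills)
    # Expected co-occurrences if x and y were independently distributed.
    expected = n_x * n_y / n
    if expected == 0:
        raise ValueError("lift is undefined when a skill never appears")
    return n_xy / expected

# Hypothetical skill sets, one per job ad, for illustration only.
jobs = [
    {"R", "Python"},
    {"R", "Python", "SQL"},
    {"SAS", "SQL"},
    {"Hadoop", "Python"},
    {"SAS", "R"},
    {"Hadoop", "Java"},
]
r_python = lift(jobs, "R", "Python")
sas_hadoop = lift(jobs, "SAS", "Hadoop")
```

A lift above 1 means the pair appears together more often than independence would predict, and a lift below 1 means less often. In this toy data, R and Python each appear in 3 of 6 ads and together in 2, so the expected count is 1.5 and the lift is 2 / 1.5.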
Finally, we looked at Education.
Almost all of the jobs required a graduate degree (Master's), and 48% of the jobs required or preferred a PhD.
What do you see as most-demanded skills for Data Scientists?
Related:
- 9 Must-Have Skills You Need to Become a Data Scientist
- Australia Analytics Professionals Skills and Salary Report
- Hiring Data Scientists: What to look for?
- 2015 Predictions - What's Next for Data Scientists?
- Data Science Skills and Business Problems