Most Demanded Data Science and Data Mining Skills

Our analysis of the most in-demand data scientist skills shows that Data Science is a team effort focused on business analytics, with the top 5 platform skills being SQL, Python, R, SAS, and Hadoop.

This year saw continued strong demand for Data Scientists. Although "data scientist" job trends from Indeed show the keyword peaking in 2013 and remaining flat in 2014, this may reflect changing job titles and Data Scientist becoming more of a mainstream job.

Data Scientist job trends from Indeed, 2006-2014

Many articles have been written about data science skills (see also the bottom of this post).

The KDnuggets job board received an all-time high number of job postings in 2014 (about 245 so far), a sample large enough to quantify the skills in demand.

About 85% of the KDnuggets 2014 jobs are from the US, but there are also jobs from 14 other countries: Canada, China, Estonia, Germany, India, Israel, Luxembourg, Malta, Portugal, Serbia, Singapore, Switzerland, the Netherlands, and the UK.

About 33% of the jobs had the title "Data Scientist", up from about 25% in 2013 and 19% in 2012.

The second most common job title was Engineer - with many different versions, such as BI Engineer, Machine Learning Engineer, Software Engineer, and more - see word cloud below.

KDnuggets 2014 Jobs Titles, word cloud

Next we looked at the skills most in demand by examining the most frequent keywords in job descriptions.

The most common were general terms:
  • team, in 88% of the job ads
  • business, 73%
  • analytics, 64%
  • design, 63%
  • development, 62%
  • statistics, 61%
  • statistical, 61%
  • research, 61%
  • machine learning, 53%
  • data mining, 52%
  • modeling, 49%
  • solutions, 47%

This suggests that a data scientist job is a team effort focused on business analytics, with research, design, and development playing a major role. Statistics, Machine Learning, and Data Mining are used almost synonymously.

Looking at terms that correspond to more specific skills and languages, we have:
  • SQL, 54% of all job ads
  • Python, 46%
  • R, 44%
  • SAS, 36%
  • Hadoop, 35%
  • Java, 32%
  • optimization, 23%
  • C++, 21%
  • visualization, 20%
  • MATLAB, 18%
  • BI or Business Intelligence, 17%
  • distributed, 16%
  • regression, 16%
  • unstructured, 16%
  • Hive, 16%
  • mobile, 15%

NoSQL was mentioned in 11% of job ads.
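The percentages in the two lists above are shares of job ads that mention a keyword at least once. A minimal sketch of that kind of count is shown below. The function name `keyword_share`, the matching rule, and the toy ads are assumptions made for illustration; they are not the method or data used for this analysis.

```python
import re

def keyword_share(ads, keywords):
    """Return the percentage of ads that mention each keyword at least once."""
    shares = {}
    for kw in keywords:
        # Match the keyword as a whole token, case-insensitively. The
        # lookarounds keep "R" from matching inside "research" while still
        # allowing keywords such as "C++".
        pattern = re.compile(
            r"(?<![\w+#])" + re.escape(kw) + r"(?![\w+#])", re.IGNORECASE
        )
        hits = sum(1 for ad in ads if pattern.search(ad))
        shares[kw] = 100.0 * hits / len(ads) if ads else 0.0
    return shares

# Toy ads, invented for illustration only.
ads = [
    "Join our team: SQL and Python required, R a plus.",
    "Research role using SAS and SQL.",
    "C++ engineer for our analytics team.",
    "Hadoop and Python developer.",
]
shares = keyword_share(ads, ["SQL", "Python", "R", "C++", "team"])
for kw, pct in shares.items():
    print(f"{kw}: {pct:.0f}% of ads")
```

Each ad is counted once per keyword no matter how often the keyword appears, which matches the "percent of job ads" reading of the lists. Short names such as R need whole-token matching, otherwise nearly every ad would count as a hit.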

The US map below shows the distribution of US-based jobs. The size of each circle corresponds to the number of jobs, and the color corresponds to the log of the ratio of SAS (blue) to R (red) jobs.

KDnuggets US Jobs for 2014

The next figure shows the distribution of SAS vs R for US cities.

KDnuggets US Jobs for 2014, SAS vs R

We see that New York, San Diego, Rochester, Portland, and Dallas have more SAS jobs than R jobs, while Seattle, Boston, Redmond, and San Francisco are more R-oriented. Chicago, Cupertino, and Palo Alto have about equal numbers of R and SAS jobs.
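The SAS vs R comparison above is based on the log of the ratio of SAS to R jobs. A small sketch of such a score follows. The function name `log_ratio`, the sign convention (positive means more SAS), the pseudo-count that avoids division by zero, and the example counts are all assumptions for illustration, not details taken from the analysis.

```python
import math

def log_ratio(sas_jobs, r_jobs, pseudo=0.5):
    """Log of the SAS-to-R job ratio for one city.

    Positive values mean more SAS jobs, negative values mean more R jobs,
    and 0 means the two are balanced. The pseudo-count is an assumption
    added here so that a city with zero jobs of one kind stays finite.
    """
    return math.log((sas_jobs + pseudo) / (r_jobs + pseudo))

# Hypothetical counts, for illustration only.
balanced = log_ratio(10, 10)
more_sas = log_ratio(20, 5)
more_r = log_ratio(5, 20)
print(balanced, more_sas, more_r)
```

Taking the log makes the scale symmetric: a city with four times as many SAS jobs as R jobs sits as far from zero as a city with four times as many R jobs as SAS jobs, which suits a two-color scale.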

We also looked at the interaction between the top languages/systems by measuring the lift for each pair X, Y:

lift = (actual number of jobs with the pair X, Y) / (expected number of jobs with the pair if X and Y were independently distributed)

We see that the strongest pairing is between R and Python (1.61 lift), with R and SAS also paired more often than expected. The only pair with lift below 1 (a negative association) is SAS and Hadoop, which are less likely to be required together.

Interaction between data science systems required in 2014 job ads
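The lift defined above can be sketched in a few lines. Here each ad is represented as a set of required skills; the function name `lift`, that representation, and the toy ads are assumptions for illustration, not the actual job data.

```python
def lift(ads_skills, x, y):
    """Lift of skills x and y over a list of per-ad skill sets.

    lift = actual number of ads with both x and y
           / expected number if x and y were independent,
    where expected = n * (n_x / n) * (n_y / n) = n_x * n_y / n.
    """
    n = len(ads_skills)
    n_x = sum(1 for skills in ads_skills if x in skills)
    n_y = sum(1 for skills in ads_skills if y in skills)
    n_xy = sum(1 for skills in ads_skills if x in skills and y in skills)
    if n == 0 or n_x == 0 or n_y == 0:
        return float("nan")  # lift is undefined when either skill never appears
    expected = n_x * n_y / n
    return n_xy / expected

# Toy ads, invented for illustration only.
ads_skills = [
    {"R", "Python"},
    {"R", "Python"},
    {"SAS"},
    {"Hadoop", "Python"},
    {"SAS", "R"},
    {"Hadoop"},
]
r_python = lift(ads_skills, "R", "Python")      # 2 actual vs 1.5 expected
sas_hadoop = lift(ads_skills, "SAS", "Hadoop")  # 0 actual vs about 0.67 expected
print(round(r_python, 2), sas_hadoop)
```

A lift above 1 means the pair appears together more often than independence would predict, and a lift below 1 means less often. As a rough check against the figures above: with R in 44% and Python in 46% of about 245 ads, independence would predict roughly 50 ads requiring both, so a lift of 1.61 corresponds to roughly 80 such ads.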

Finally, we looked at Education.

Almost all of the jobs required a graduate degree (Master's), and 48% of the jobs required or preferred a PhD.

What do you see as most-demanded skills for Data Scientists?