The 2011 McKinsey report on Big Data said that “The United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of Big Data.”
Demand for data scientists is off the charts ... data science skills shortages are present in almost every large U.S. city. Nationally, we have a shortage of 151,717 people with data science skills, with particularly acute shortages in New York City (34,032 people), the San Francisco Bay Area (31,798 people), and Los Angeles (12,251 people).
Note that LinkedIn reports shortages for people with "Data Science Skills", not necessarily people with the "Data Scientist" title.
We can estimate the demand for "Data Scientists" from two popular job search sites: Indeed and Glassdoor.
A search on indeed.com for “data scientist” (in quotes) in the USA finds only about 4,800 jobs.
Note: using quotes is important for searches on Indeed. A search for data scientist without quotes finds about 30,000 jobs, but we are not sure how many of those jobs are for scientists in other areas.
The US is the largest but not the only market for Data Scientists. We can also see strong demand for Data Scientists elsewhere, for example by checking regional Indeed sites (indeed.co.uk, indeed.fr, indeed.de, indeed.co.in, etc.):
UK: 1,100 jobs
France: 718 jobs
Germany: 900 jobs
India: 500 jobs
A Glassdoor search for "Data Scientist" finds about 26,000 jobs in the USA (same results if quotes are removed).
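To put these search counts side by side, here is a minimal sketch that simply tallies the figures quoted above. The numbers are the approximate counts from the text; the variable names are ours and purely illustrative, and no job site is actually queried.

```python
# Approximate job counts quoted in the text (not fetched from any site).
indeed_quoted = {"USA": 4800, "UK": 1100, "France": 718, "Germany": 900, "India": 500}
indeed_usa_unquoted = 30000   # Indeed USA search without quotes
glassdoor_usa = 26000         # Glassdoor USA search

# Total across the five regional Indeed sites listed above.
total_indeed_quoted = sum(indeed_quoted.values())

# Share of that total coming from the USA.
usa_share = indeed_quoted["USA"] / total_indeed_quoted

# Fraction of the unquoted Indeed USA results that match the exact phrase.
quoted_fraction = indeed_quoted["USA"] / indeed_usa_unquoted

# How many times larger the Glassdoor USA count is than the quoted Indeed USA count.
glassdoor_ratio = glassdoor_usa / indeed_quoted["USA"]

print(f"Indeed (quoted), five countries: {total_indeed_quoted:,} jobs")
print(f"USA share of that total: {usa_share:.0%}")
print(f"Exact-phrase share of unquoted Indeed USA results: {quoted_fraction:.0%}")
print(f"Glassdoor USA vs. Indeed USA (quoted): {glassdoor_ratio:.1f}x")
```

The gap between the quoted Indeed count and the Glassdoor count suggests that the two sites match the search phrase quite differently, so neither number should be read as an exact measure of demand.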
2. How many "Data Scientists" are there?
Google search defines a data scientist as
“a person employed to analyze and interpret complex digital data, such as the usage statistics of a website, especially in order to assist a business in its decision-making.”
There are many people in industry and academia who do this work without having the formal title of data scientist, since Data Science is an interdisciplinary field at the intersection of Statistics, Computer Science, Machine Learning, and Business. We can estimate the current population of Data Scientists by examining popular data science platforms.
Kaggle (now part of Google) is a platform for data science and analytics competitions. It claims to be the world’s largest community of active data scientists. While not all Data Scientists take part in Kaggle competitions or have a Kaggle account, and not all Kagglers do data science work, it is reasonable to assume a large overlap. In June 2017, the Kaggle community crossed 1 million members, and a Kaggle email on Sep 19, 2018 says they surpassed 2 million members in August 2018. Since not all Kaggle members are active, Kaggle membership is probably a global upper bound for people engaged in data science.
KDnuggets is now reaching over 500,000 unique visitors per month, and given our focus on helping Data Scientists and Machine Learning Engineers do their jobs better, we think it is also a reasonable estimate that the majority of our visitors work in the Data Science / Machine Learning area, regardless of their job title. While visitors may stumble on KDnuggets randomly, we can look at subscribers / followers, a more active subset.
KDnuggets now has about 240,000 subscribers/followers over Twitter, LinkedIn, Facebook, RSS, and email, and while there is some overlap, about 200,000 seems a reasonable lower bound for the number of Data Scientists globally.
On LinkedIn, there are many groups dedicated to data science, and although the engagement in those groups has been falling, we can use their membership as a rough estimate. Here are three of the largest groups
Examining the titles of members, we see great diversity. The titles include Data Scientist, Data Analyst, Statistician, Bioinformatician, Neuroscientist, Marketing executive, Computer scientist, etc. It is safe to say that any person who does the tasks that a conventional data scientist does can be considered in this category. With the growing need to analyze data to derive insights or make key decisions, people with traditionally different job titles and responsibilities are keen to learn new techniques of data analysis to suit their domains. This doesn’t make them primarily data scientists, but they do possess knowledge and skills from the field.
Fig. 1: LinkedIn Data Scientist profile, by industry and by location.
Searching LinkedIn for “data scientist” (quotes are important) we find over 100,000 people with that actual title. So if globally between 200,000 and 1,000,000 people are doing some Data Science related work, then the majority of them do not have a Data Scientist title.
We can also estimate the size of the larger data analysis/visualization/statistics community by looking at activities related to the languages and platforms most connected to Data Science: R, Python, Machine Learning libraries, Spark, and Jupyter. Apache Spark Meetups had 225K members recently and are growing every month. Intel Capital estimated that there are 1 million R programmers worldwide. Based on public data on the python.org website, there have been around 2.75 million downloads. The Jupyter project has around 3 million users at present. These numbers can give us a rough upper limit on the number of data analysts/data scientists around the world.
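The arithmetic behind that statement can be sketched as follows. This is only an illustration using the rounded figures from the text; it treats "over 100,000" as exactly 100,000, which is our simplifying assumption, and the variable names are hypothetical.

```python
# Rounded figures from the text; "over 100,000" is treated as exactly 100,000 here.
titled = 100_000              # LinkedIn profiles with the "data scientist" title
low_estimate = 200_000        # lower estimate of people doing data science work
high_estimate = 1_000_000     # upper estimate

# Share of practitioners holding the title at each end of the range.
share_at_low = titled / low_estimate
share_at_high = titled / high_estimate

print(f"Share with the title at the low estimate:  {share_at_low:.0%}")
print(f"Share with the title at the high estimate: {share_at_high:.0%}")
```

Under these assumptions the titled share ranges from about half (at the low estimate) down to about a tenth (at the high estimate), so the "majority without the title" reading becomes stronger the closer the true population is to the upper end of the range.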
3. Future Prospects for Data Scientists
The near-term future for Data Scientists looks bright.
Fig. 2: Top 10 emerging jobs on LinkedIn and their growth from 2012 to 2017.
Job growth in the next decade is expected to outstrip growth during the previous decade, creating 11.5M jobs in the Data Science/Analytics area by 2026, according to the U.S. Bureau of Labor Statistics.
IBM recently claimed that by 2020 the number of Data Science and Analytics job listings is projected to grow by nearly 364,000 listings to approximately 2,720,000. No matter what the true number of data professionals is currently, their number is likely to grow in the near future.
Long-term, however, automation will be replacing many jobs in the industry, and the Data Scientist job will not be an exception. Already today companies like DataRobot and H2O offer automated solutions to Data Science problems.
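As a quick sanity check on the IBM figures, the implied starting point and growth rate can be derived from the two numbers quoted above. This is a sketch based only on those rounded figures ("nearly 364,000" and "approximately 2,720,000"), so the result is approximate; the variable names are ours.

```python
# Rounded figures from the IBM projection quoted in the text.
projected_listings = 2_720_000   # projected listings by 2020
projected_growth = 364_000       # projected increase in listings

# Implied number of listings before the projected growth.
implied_baseline = projected_listings - projected_growth

# Implied relative growth over the projection period.
implied_growth_rate = projected_growth / implied_baseline

print(f"Implied baseline: {implied_baseline:,} listings")
print(f"Implied growth:   {implied_growth_rate:.1%}")
```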