The first mention of the ‘data scientist’ profession takes us back to 2008. This is when Jeff Hammerbacher and DJ Patil built the first formal data science teams at Facebook and LinkedIn. Given the rapid growth of the field, they met to discuss what to call the professionals doing this type of work.
Nowadays, data scientist ranks as the hottest job title in the US. Still, there is no established approach to building up the skill set needed for this career.
The typical data scientist is male, with a median of 2 years of experience on the job. His competency profile is dominated by the programming languages R and Python, followed closely by SQL. He has usually completed at least a second-cycle (Master’s-level) academic degree: approximately 75% of professionals hold a Master’s (48%) or a PhD (27%). The median time needed to earn the title was 4.5 years, which is remarkably short given the level of seniority and responsibility this position entails.
To reach these conclusions, the study investigated 1,001 data scientist LinkedIn profiles. Unlike many previous publications, the data were not collected from job ads, which present the employers’ point of view. Instead, the research relied on information posted by professionals on LinkedIn, which is a good proxy for their resumes.
Company and country quotas were assigned to limit bias. The cohort was divided into two groups depending on whether or not a person was employed by a Fortune 500 company. In addition, the sample comprised data scientists working in the US (40%), the UK (30%), India (15%), and other countries (15%). Convenience sampling was employed due to data accessibility limitations.
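To make the country quotas concrete, here is a minimal sketch of how percentage quotas could be turned into whole-number profile counts for a sample of 1,001. The function name `allocate_quotas` and the largest-remainder rounding rule are assumptions made for illustration; the study does not say how its quotas were rounded.

```python
def allocate_quotas(total, shares_pct):
    """Split `total` into integer quotas using the largest-remainder method.

    `shares_pct` maps group name -> whole-number percentage (must sum to 100).
    This rounding rule is an assumption, not the study's documented procedure.
    """
    if sum(shares_pct.values()) != 100:
        raise ValueError("shares must sum to 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    parts = {g: divmod(total * pct, 100) for g, pct in shares_pct.items()}
    quotas = {g: q for g, (q, _) in parts.items()}
    leftover = total - sum(quotas.values())
    # Give the remaining units to the groups with the largest remainders.
    by_remainder = sorted(parts, key=lambda g: parts[g][1], reverse=True)
    for g in by_remainder[:leftover]:
        quotas[g] += 1
    return quotas


# Country shares as reported in the text; 1,001 is the reported sample size.
country_shares = {"US": 40, "UK": 30, "India": 15, "Other": 15}
quotas = allocate_quotas(1001, country_shares)
print(quotas)  # {'US': 401, 'UK': 300, 'India': 150, 'Other': 150}
```

Under these assumptions the one profile left over after rounding down goes to the US, which has the largest fractional remainder.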
Any research aiming to identify what it takes to become a data scientist should focus on technical skills. The study examined the top 3 tools each data scientist relies on, and its results confirm prior research: programming languages are the main tool in the data scientist toolkit (as opposed to software like SPSS, SAS, and Tableau).
The traditional leaders Java and C/C++ continue to lose ground to R and Python. In fact, 53% of the data scientists in the sample indicated R and/or Python as one of their top 3 skills. The sample size was insufficient to determine a winner between the two, but the insight is clear: R and/or Python are currently a must for data scientists. SQL comes in third at 40%, while MATLAB (19%), Java (18%), and C/C++ (18%) lag behind, and their usage is declining, as predicted by previous research.
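As an illustration of how such shares can be computed, the sketch below counts how many profiles list a skill among their top 3. The four profiles are made-up toy data, not records from the study, and the helper names `skill_shares` and `share_with_any` are hypothetical. Note that a figure like "R and/or Python" counts each profile once, even if it lists both languages.

```python
from collections import Counter

# Toy, made-up profiles (NOT data from the study): each is a set of top-3 skills.
profiles = [
    {"Python", "R", "SQL"},
    {"Python", "SQL", "Java"},
    {"SAS", "SPSS", "Tableau"},
    {"R", "MATLAB", "C/C++"},
]


def skill_shares(profiles):
    """Return the percentage of profiles that list each individual skill."""
    counts = Counter(skill for top3 in profiles for skill in top3)
    return {skill: 100 * n / len(profiles) for skill, n in counts.items()}


def share_with_any(profiles, skills):
    """Percentage of profiles listing at least one of the given skills."""
    wanted = set(skills)
    hits = sum(1 for top3 in profiles if top3 & wanted)
    return 100 * hits / len(profiles)


print(skill_shares(profiles)["SQL"])             # 50.0
print(share_with_any(profiles, {"R", "Python"}))  # 75.0
```

Because each profile can list up to three skills, the per-skill percentages are not expected to sum to 100.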
SKILL SET BY INDUSTRY
It is even more revealing to look at the popularity of the top programming languages by industry. The companies in the sample are organized into 4 clusters: Industrial (FMCG, Aerospace, Automotive, etc.), Healthcare, Financial, and Technology/IT.
Technology/IT leads in the adoption of Python and R. This is unsurprising, as tech companies are where innovative practices are created and most widely implemented. In contrast, the top 6 programming languages are least represented in Healthcare, as data scientists in that industry rely on tools like SPSS, SAS, and Tableau.
Interestingly, the Financial sector is the leader in R and Java. One explanation is the conservative attitude banking organizations have toward innovation and the implementation of new technology. The fact that R and Java are relatively old languages is consistent with this interpretation.
Finally, the Industrial cluster is characterized by a lower usage of older languages and significant adoption of the top three coding languages (R, Python, and SQL).
INDUSTRY OF EMPLOYMENT
So, which industries are the biggest employers of data scientists? The data show that most data science professionals are employed in Technology/IT (42%). Following closely is the Industrial sector (37%), while the Financial and Healthcare industries account for 16% and 5%, respectively.
The study breaks down employment data to identify differences across countries. In the US, UK, and ‘Other countries’, the two most popular sectors are Industrial and Technology/IT.
A major exception to this trend is India. Data scientists working in India are mostly employed by Tech/IT companies (68%). Previous research identifies a 100% year-on-year rise in data science job openings across IT companies in India, which is well aligned with these findings.
Another interesting observation is that the financial sector employs a larger share of data scientists in the UK than in other countries. The City of London can be considered the financial capital of Europe (at least until Brexit occurs). Hence, it makes sense that the UK's financial sector employs more data scientists, given the abundance of financial, trading, and brokerage firms.
One major tip for aspiring data scientists: consider fine-tuning your skill set according to the country where you would like to be employed.
For example, for a UK job, a good grasp of Python or R for finance will probably come in handy, as nearly 20% of the data scientists in the UK are employed in the Financial sector. In India, on the other hand, it is a good idea to keep up with the most contemporary programming tools, since the tech industry dominates there.
This study sheds some additional light on what it takes to become a data scientist. Approached from the perspective of the data, the programming languages required are becoming increasingly clear. Nonetheless, there are still significant differences across countries and industries, which provide a variety of opportunities for aspiring data scientists.