Analyzing the Analyzers – A Survey of Data Scientists – free ebook

Clustering data scientists by their skills, activities, education, and self-identification reveals interesting patterns. The report combines analytic results with the authors' insights and argues for clearer communication about roles, teams, and careers.

Analyzing the Analyzers: An Introspective Survey of Data Scientists and their Work applies the methods of data science to our own professional community. It was written by Harlan D. Harris, Sean Murphy, and Marck Vaisman, who run professional Meetup groups for statistics and analytics professionals in the Washington, DC area. The report is based on a survey of several hundred data science professionals and is freely available from O'Reilly.

The goal was to find underlying explanatory structure in the survey results that could improve communication, expectations, and opportunities for and about data scientists.

The survey asked respondents to rate their skills, which were then mapped into five general Skills Groups: Business, ML/Big Data, Math/OR, Programming, and Statistics.

[Figure: Skills Groups]

For the clustering, each respondent's answers were "compressed": their eleven self-identification (Self-ID) ratings were replaced by their loadings on four Self-ID Groups, and their skill ratings were likewise replaced by their Skills Group loadings.
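The post does not describe how the loadings were computed. As an illustration only, the sketch below assumes a non-negative matrix factorization (NMF), a common way to turn a respondents × items matrix of non-negative ratings into a few group loadings per respondent. It uses the multiplicative update rules of Lee and Seung. The helper names (`transpose`, `matmul`, `nmf`, `frobenius_error`) and the random toy ratings are hypothetical, not survey data.

```python
import random

def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def nmf(X, k, n_iter=300, seed=0, eps=1e-9):
    """Factor a non-negative n x m matrix X into W (n x k) and H (k x m), X ~ W H.

    Uses Lee-Seung multiplicative updates for the squared (Frobenius) error.
    Row i of W holds respondent i's k group loadings; row g of H shows how
    strongly each original item contributes to group g.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), elementwise; eps avoids division by zero
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[g][j] * num[g][j] / (den[g][j] + eps) for j in range(m)] for g in range(k)]
        # W <- W * (X H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][g] * num[i][g] / (den[i][g] + eps) for g in range(k)] for i in range(n)]
    return W, H

def frobenius_error(X, W, H):
    """Square root of the summed squared differences between X and W H."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry)) ** 0.5

# Toy input: 20 respondents rating 11 Self-ID statements on a 0-4 scale (random, not survey data).
rng = random.Random(1)
ratings = [[rng.randint(0, 4) for _ in range(11)] for _ in range(20)]
loadings, groups = nmf(ratings, k=4)
# loadings[i] is the 4-number "compressed" Self-ID profile of respondent i.
```

Each row of `loadings` plays the role of one respondent's four Self-ID Group loadings; applying the same function to skill ratings with `k=5` would give Skills Group loadings. NMF is assumed here because non-negative loadings read naturally as "how much of each group" a respondent expresses; other factorizations would also work.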

The clustering analysis identified four major categories of data scientists:

  • Data Businesspeople are the product- and profit-focused data scientists. They're leaders, managers, and entrepreneurs, but with a technical bent. A common educational path is an engineering degree paired with an MBA.
  • Data Creatives are eclectic jacks-of-all-trades, able to work with a broad range of data and tools. They may think of themselves as artists or hackers, and excel at visualization and open source technologies.
  • Data Developers focus on writing software for analytic, statistical, and machine learning tasks, often in production environments. They frequently have computer science degrees and tend to work with so-called "big data".
  • Data Researchers apply their scientific training, and the tools and techniques they learned in academia, to organizational data. They may have PhDs, and their creative applications of mathematical tools yield valuable insights and products.
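The post does not name the clustering algorithm. The following is a minimal sketch, assuming plain k-means (Lloyd's algorithm) applied to each respondent's compressed vector (for example, four Self-ID loadings concatenated with five Skills loadings) with `k=4` for the four categories. The function names and the deterministic farthest-first seeding are illustrative choices, not the authors' method.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from its nearest already-chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(points, k, n_iter=100):
    """Lloyd's k-means on a list of vectors; returns (labels, centroids)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids
```

Farthest-first seeding is used so the result does not depend on a random seed; k-means++ or several random restarts are common alternatives. Naming the resulting clusters (Data Businesspeople, Data Creatives, and so on) remains a human step based on inspecting each centroid.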

Combining skills with Self-ID results in the following mosaic plot. Several strong associations stand out: Data Businesspeople are strong in business (not surprisingly), Data Researchers are weak in programming, and Data Creatives are strong all-around.

[Figure: Skills vs. Self-ID mosaic plot]
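A mosaic plot draws one column per row category, with width proportional to that category's share of respondents, split into tiles whose heights are the within-category shares. Tiles are often shaded by Pearson residuals, (observed − expected) / √expected, to flag combinations that are over- or under-represented relative to independence. The sketch below computes those quantities, assuming each respondent is tallied once as a (cluster, strongest Skills Group) pair; that tallying and the function name `mosaic_table` are hypothetical, and the report's own plot may be built differently.

```python
from collections import Counter

def mosaic_table(pairs):
    """Given (row, col) category pairs, return (widths, heights, residuals).

    widths[r]          share of all observations in row category r (column width)
    heights[(r, c)]    share of row r's observations that fall in column c (tile height)
    residuals[(r, c)]  Pearson residual (observed - expected) / sqrt(expected),
                       where expected assumes rows and columns are independent
    """
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    n = len(pairs)
    row_tot = {r: sum(counts[(r, c)] for c in cols) for r in rows}
    col_tot = {c: sum(counts[(r, c)] for r in rows) for c in cols}
    widths = {r: row_tot[r] / n for r in rows}
    heights = {(r, c): counts[(r, c)] / row_tot[r] for r in rows for c in cols}
    residuals = {}
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n
            residuals[(r, c)] = (counts[(r, c)] - expected) / expected ** 0.5
    return widths, heights, residuals
```

A positive residual marks a combination that appears more often than independence would predict. To draw the tiles rather than compute them, statsmodels provides a `mosaic` function in `statsmodels.graphics.mosaicplot`.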

The authors also wrote a blog post about the report:

There's More Than One Kind of Data Scientist.

The ebook is available at