Gold Blog, Sep 2017
Python vs R – Who Is Really Ahead in Data Science, Machine Learning?

We examine Google Trends, job trends, and more, and note that while Python has only a small advantage among current Data Science and Machine Learning jobs, this advantage is likely to increase in the future.



My recent analysis of KDnuggets Poll results (Python overtakes R, becomes the leader in Data Science, Machine Learning platforms) has attracted a lot of attention and generated a tremendous number of comments, much discussion, and the inevitable critique from proponents of both languages.

Some have complained that the poll is not scientific and that voters represent a self-selected sample. That is obviously true. But KDnuggets has conducted polls since 2001 and reaches a large audience of several hundred thousand visitors each month. In our experience, KDnuggets polls have been a good indicator of trends and developments in Data Mining and Data Science. We have tracked the R vs Python debate for several years, so unlike other sites we can compare the latest poll results with those of several previous years.

Let's examine other measures of Python vs R popularity among Data Scientists.

First, we analyze Google Trends (this was also done by DSC after the publication of our poll results).

Python is a much more popular language overall, and it is the IEEE Spectrum No. 1 language of 2017 (thanks to Martin Skarzynski @marskar for the link), so it is unfair to compare Python and R searches directly. We can, however, compare Google Trends for the search terms "Python data science" vs "R data science".

Here is the chart since Jan 1, 2012. Note that if you select the range that includes full months, and start in 2012, then you get smoothed monthly trends, rather than more chaotic weekly trends.
Fig. 1: Google Trends, Jan 2012 - Aug 2017, "Python data science" vs "R data science".

We note that R was slightly ahead in 2014 and 2015, as Data Science was gathering popularity, but "Python data science" searches moved ahead of "R data science" in late 2016 and have been clearly ahead since January 2017.

Note: the statistics are the same regardless of how Data Science is capitalized: "Data Science" or "data science", but Google autocomplete suggests "data science" for both Python and R.

However, recently Machine Learning has become very popular - see my post Machine Learning overtaking Big Data? (May 2017), so let's examine Python vs R for "Machine Learning" in Google Trends.

Fig. 2: Google Trends, Jan 2012 - Aug 2017, "Python Machine Learning", "R Machine Learning", "Python data science", and "R data science".

We see that "Python Machine Learning" is way ahead of "Python data science", and both are significantly ahead of "R data science" and "R Machine Learning".

Relative search volume for Aug 2017 is:
  • Python Machine Learning: 100
  • Python data science: 49
  • R data science: 33
  • R Machine Learning: 32
(Note: while Google autocomplete suggests search term "Python data science", with lower-case "data science", it suggests Capitalized search term "Python Machine Learning". There is probably some deep meaning here ... )
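To put these four numbers in perspective, here is a minimal sketch that turns them into Python-to-R ratios. The values are copied from the list above; the names `volumes`, `ratio`, and `totals_by_language` are illustrative only. Google Trends values are relative to the peak term, and searches for different terms may overlap, so the per-language totals are only a rough indication.

```python
# Relative search volumes for Aug 2017, copied from the list above.
# Google Trends scales the most-searched term in the comparison to 100.
volumes = {
    "Python Machine Learning": 100,
    "Python data science": 49,
    "R data science": 33,
    "R Machine Learning": 32,
}


def ratio(term_a, term_b, data=volumes):
    """Return how many times larger term_a's relative volume is than term_b's."""
    return data[term_a] / data[term_b]


def totals_by_language(data=volumes):
    """Sum relative volumes per language, keyed by the first word of each term."""
    totals = {}
    for term, value in data.items():
        language = term.split()[0]
        totals[language] = totals.get(language, 0) + value
    return totals


ml_ratio = ratio("Python Machine Learning", "R Machine Learning")
ds_ratio = ratio("Python data science", "R data science")
totals = totals_by_language()

print(f"Machine Learning, Python vs R: {ml_ratio:.2f}x")
print(f"data science, Python vs R: {ds_ratio:.2f}x")
print(f"Combined relative volume: {totals}")
```

By this rough measure, Python's lead in Aug 2017 is about 3.1x for "Machine Learning" searches and about 1.5x for "data science" searches.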

Next, let's look at job ads on indeed.com. All numbers are for jobs in the USA as of Sep 11, 2017. We represent the overlap between Data Scientist ads that mention Python and those that mention R in a Venn diagram (Fig. 3).
Fig. 3: Snapshot of indeed.com Data Scientist job ads in the USA that also include Python and/or R, Sep 2017

Indeed job trends below also show that demand for Data Scientists who know Python and for those who know R was very close until very recently, and that together these represent a significant portion of all Data Scientist jobs.
Fig. 4: Indeed "Data Scientist", "Data Scientist" Python, and "Data Scientist" R Job Trends, 2014-2017

These job ad counts suggest that current employers see most Data Scientists as able to use both Python and R as needed, but Python has a small advantage at the moment.

Google Trends results suggest that Python's advantage will grow, and that Python-related Data Science and Machine Learning jobs will grow faster than those related to R.

Note: with indeed.com you need to specify the search string carefully. A search for [Data Scientist Python] will include many jobs that have either Data or Scientist but not necessarily both.
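One way to avoid this is to quote the phrase, as in the Fig. 4 search terms ("Data Scientist" Python). The sketch below only builds and URL-encodes such a search string; the helper name `build_query` is hypothetical, and the code makes no assumption about indeed.com's URL format and does not call the site.

```python
from urllib.parse import quote_plus


def build_query(phrase, *extra_terms):
    """Wrap a multi-word phrase in double quotes so it is searched as one unit,
    then append any additional single-word terms."""
    return " ".join([f'"{phrase}"', *extra_terms])


# Unquoted: may match ads containing "Data" or "Scientist" separately.
loose_query = "Data Scientist Python"

# Quoted phrase plus language: the form used in the Fig. 4 search terms.
strict_query = build_query("Data Scientist", "Python")

# URL-encoded form, suitable for use as a query-string parameter value.
encoded_query = quote_plus(strict_query)

print(loose_query)
print(strict_query)
print(encoded_query)
```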

Finally, among the many comments on my original post Python overtakes R in Data Science, I want to highlight two observations:
  • Stanislav Seltser notes that among the top 15 languages on GitHub (https://octoverse.github.com), Python is no. 3 while R is not on the list.
  • Stanislav also noted the Kaggle 2016 Year Summary, which says:
    In past years, R was the language of choice on Kaggle, but 2016 has seen Python emerge as a clear winner when it came to the number of kernels written.
    Kaggle Python Vs R Kernels 2016


Related: