R, Python users show surprising stability, but strong regional differences
R remains the dominant language, with Python slowly catching up, and other languages shrinking. We also found surprising stability, with about 90% of R and Python users staying with that language, and strong regional differences.
The results of the latest KDnuggets Poll, "Your primary programming language for Analytics, Data Mining, Data Science tasks", show a surprisingly high level of stability among users of R and Python, with interesting second-level flows, which we analyze below.
The 2015 KDnuggets Data Mining Software Poll indicated that data scientists use 4.8 tools on average, with R and Python among the most popular.
We also looked at Which Big Data, Data Mining, and Data Science Tools go together? and found that R and Python are used together 41% more frequently than indicated by chance.
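One common way to read "more frequently than indicated by chance" is as a lift ratio: the observed number of people using both tools, divided by the number expected if the two tools were chosen independently. The sketch below assumes that reading; the formula used in the original analysis is not given here, and the counts are hypothetical, picked only so that the result comes out to a lift of 1.41.

```python
def lift(n_both: int, n_a: int, n_b: int, n_total: int) -> float:
    """Observed co-usage divided by co-usage expected under independence."""
    # If A and B were independent, we would expect n_a * n_b / n_total joint users.
    expected_both = n_a * n_b / n_total
    return n_both / expected_both


# Hypothetical counts for illustration only (NOT taken from the poll):
n_total, n_r, n_python, n_both = 1000, 500, 400, 282

value = lift(n_both, n_r, n_python, n_total)
print(f"lift = {value:.2f}, i.e. {100 * (value - 1):.0f}% more often than chance")
```

A lift of 1.0 would mean the two tools co-occur exactly as often as chance predicts; values above 1.0 mean they go together more often.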
In this poll (Primary programming language for Analytics, Data Mining, Data Science tasks: R, Python, or Other), we wanted to find out which language was primary and whether there were significant changes between 2014 and 2015.
Fig. 1: Primary Analytics, Data Mining, Data Science Languages in 2014 and 2015.
Circle sizes correspond to the percentage of voters who chose that language as primary in 2014. Among those who chose None in 2014, 55% switched to R, 15% to Python, and 7.5% to Other, while 22.5% stayed with None; the last 3 arrows were omitted from the graph for readability.
Compared to the 2013 Poll Results (R has a big lead, but Python is gaining), the 2015 results show much higher stability: about 88% of those whose primary language was R in 2014 stayed with R, and 91% of Python users stayed with Python. The percentages of primary R and primary Python users have grown, while the percentages of users who chose Other or None have declined.
We do observe that Python has a stronger inflow than outflow from both R and Other language users.
Regional participation was:
- US/Canada, 44%
- Europe, 35%
- Asia, 9.0%
- Latin America, 5.3%
- Australia/NZ, 3.5%
- Africa/Middle East, 2.9%
Fig. 2: R vs Python vs Other share by region in 2014 and 2015.
Green arrows indicate a significant increase, black arrows a significant decline.
Looking by region, we note significant differences:
- In the US, there was no change in R share (it stayed at 48%), but a significant increase in Python share, from 21% to 28%.
- In Europe, R share increased from 46% to 55% and Python share increased from 23% to 31%, while the share of other tools dropped from 21% to 11%.
- In Asia, both R and Python shares increased slightly, while the share of other tools stayed constant.
- In other regions (Africa, Middle East, Latin America, Australia/NZ), R share grew the most, from 49% to 63%, Python share stayed the same, and the share of other tools dropped.
Here is the table with votes.
Your primary programming language for Analytics, Data Mining, Data Science tasks [512 voters]

|Year|Primary programming language|Votes|Share of that year's votes|
|---|---|---|---|
|2015|R (and its packages)|263|51%|
|2015|Python (including scikit-learn and other libraries)|151|29%|
|2015|Other (Java, MATLAB, SAS, Scala, etc.)|89|17%|
|2014|R (and its packages)|237|46%|
|2014|Python (including scikit-learn and other libraries)|117|23%|
|2014|Other (Java, MATLAB, SAS, Scala, etc.)|118|23%|
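The percentages in the table can be reproduced from the vote counts. The short sketch below assumes each share is the vote count divided by all 512 voters and rounded to the nearest whole percent; the variable and function names are ours, for illustration. It also reports how many voters are not covered by the three listed rows for each year. Treating that remainder as the None group is an assumption, since the table does not list None explicitly.

```python
TOTAL_VOTERS = 512  # from the table header

# Vote counts copied from the table above.
votes = {
    2015: {"R": 263, "Python": 151, "Other": 89},
    2014: {"R": 237, "Python": 117, "Other": 118},
}


def shares(counts, total=TOTAL_VOTERS):
    """Share of all voters per language, rounded to a whole percent."""
    return {lang: round(100 * n / total) for lang, n in counts.items()}


def unlisted(counts, total=TOTAL_VOTERS):
    """Voters not covered by the listed rows (assumed here to be 'None')."""
    return total - sum(counts.values())


for year in (2014, 2015):
    print(year, shares(votes[year]), "unlisted:", unlisted(votes[year]))
```

Under that assumption, the remainder is 40 voters for 2014 and 9 for 2015, which is consistent with the flows quoted under Fig. 1: 22.5% of 40 is 9.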
- R vs Python for Data Science: The Winner is …
- R vs Python, why each is better
- The Grammar of Data Science: Python vs R