R, Python Duel As Top Analytics, Data Science software – KDnuggets 2016 Software Poll Results
R remains the leading tool, with 49% share, but Python grows faster and almost catches up to R. RapidMiner remains the most popular general Data Science platform. Big Data tools used by almost 40%, and Deep Learning usage doubles.
The 17th annual KDnuggets Software Poll asked:
What software you used for Analytics, Data Mining, Data Science, Machine Learning projects in the past 12 months?
The poll got tremendous participation from the analytics and data science community and vendors, attracting 2,895 voters, who chose from a record number of 102 different tools. Here we give a first overview. See also a follow-up analysis here: What Big Data, Data Science, Deep Learning software goes together?
R remains the leading tool, with 49% share (up from 46.9% in 2015), but Python usage grew faster and it almost caught up to R with 45.8% share (up from 30.3%). RapidMiner remains the most popular general platform for data mining/data science, with about 33% share. Notable tools with the most growth in popularity include Dato, Dataiku, MLlib, H2O, Amazon Machine Learning, scikit-learn, and IBM Watson.
The increased choice of tools is reflected in wider usage. The average number of tools used was 6.0, vs 4.8 in 2015.
The usage of Hadoop/Big Data tools grew to 39%, up from 29% in 2015 (and 17% in 2014), driven by Apache Spark, MLlib (Spark Machine Learning Library) and H2O.
See also
- KDnuggets interview with Spark Creator Matei Zaharia
- KDnuggets interview with Arno Candel, H2O.ai on How to Quick Start Deep Learning with H2O
Top Analytics/Data Science Tools
The next table shows the top 10 most popular tools in the 2016 poll.
Tool | 2016 % share | % change | % alone |
---|---|---|---|
R | 49% | +4.5% | 1.4% |
Python | 45.8% | +51% | 0.1% |
SQL | 35.5% | +15% | 0% |
Excel | 33.6% | +47% | 0.2% |
RapidMiner | 32.6% | +3.5% | 11.7% |
Hadoop | 22.1% | +20% | 0% |
Spark | 21.6% | +91% | 0.2% |
Tableau | 18.5% | +49% | 0.2% |
KNIME | 18.0% | -10% | 4.4% |
scikit-learn | 17.2% | +107% | 0% |
In this table, 2016 % share is the percentage of voters who used this tool, % change is the relative change in share vs the 2015 poll, and % alone is the percentage of voters who used only the reported tool among all voters who used that tool. For example, 4.4% of KNIME voters reported using only KNIME and nothing else. We note a decrease in such lone voting, with only 9 tools having 5% or more lone votes.
Fig 1: KDnuggets Analytics/Data Science 2016 Software Poll: top 10 most popular tools in 2016
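The three columns defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ballots and the previous-year shares below are made up, and the function name `tool_stats` is hypothetical, not something used by the poll itself.

```python
from collections import Counter


def tool_stats(ballots, prev_share):
    """Compute share, relative change, and 'alone' rate per tool.

    ballots: list of sets, one set of tool names per voter (hypothetical data).
    prev_share: dict mapping tool -> previous-year share as a fraction.
    """
    n_voters = len(ballots)
    # Number of voters who selected each tool.
    users = Counter(tool for ballot in ballots for tool in ballot)
    # Number of voters who selected that tool and nothing else.
    alone = Counter(next(iter(b)) for b in ballots if len(b) == 1)

    stats = {}
    for tool, n_users in users.items():
        share = n_users / n_voters
        prev = prev_share.get(tool)
        stats[tool] = {
            "share": share,                                 # "% share"
            "change": (share / prev - 1) if prev else None,  # "% change"
            "alone": alone[tool] / n_users,                 # "% alone"
        }
    return stats


# Toy example: four voters, three tools (made-up data).
ballots = [{"R", "Python"}, {"R"}, {"Python", "SQL"}, {"R", "SQL", "Python"}]
prev_share = {"R": 0.5, "Python": 0.5, "SQL": 0.25}
stats = tool_stats(ballots, prev_share)

for tool, s in sorted(stats.items()):
    print(f"{tool}: share {s['share']:.0%}, change {s['change']:+.0%}, alone {s['alone']:.1%}")
```

Note that % alone uses the tool's own voters as the denominator, not all voters, which is why a widely used tool can still show a value near 0%.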
Compared to the 2015 KDnuggets Analytics/Data Science Poll results, the only newcomer in the top 10 was scikit-learn, displacing SAS.
Tools with the highest growth (among tools with at least 15 users in 2015) were:
Tool | % change | 2016 % share | 2015 % share |
---|---|---|---|
Dato | 377% | 2.4% | 0.5% |
Dataiku | 292% | 7.8% | 2.0% |
MLlib | 253% | 11.6% | 3.3% |
H2O | 233% | 6.7% | 2.0% |
Amazon Machine Learning | 171% | 1.9% | 0.7% |
scikit-learn | 107% | 17.2% | 8.3% |
IBM Watson | 99% | 4.2% | 2.1% |
Splunk/ Hunk | 98% | 2.2% | 1.1% |
Spark | 91% | 21.6% | 11.3% |
Scala | 79% | 6.2% | 3.5% |
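As a rough check, the % change column can be recomputed from the two share columns. The sketch below uses the rounded shares printed in the table; the published changes were presumably computed from unrounded shares (an assumption, since the poll does not say), so small mismatches are expected, especially for tools with very small shares such as Dato.

```python
def pct_change(share_now, share_before):
    """Relative change in share, in percent."""
    return (share_now / share_before - 1) * 100


# (2016 share, 2015 share, published % change), copied from the table above.
rows = {
    "Dato": (2.4, 0.5, 377),
    "MLlib": (11.6, 3.3, 253),
    "scikit-learn": (17.2, 8.3, 107),
    "Spark": (21.6, 11.3, 91),
}

for tool, (now, before, published) in rows.items():
    recomputed = pct_change(now, before)
    print(f"{tool}: recomputed {recomputed:.0f}%, published {published}%")
```

With a 2015 share as small as 0.5%, a rounding error of a few hundredths of a point in either share can move the relative change by tens of percentage points.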
This year, 86% of voters used commercial software and 75% used free software. About 25% used only commercial software, and 13% used only open source/free software. A majority of 61% used both free and commercial software, similar to 64% in 2015.
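The figures in this paragraph can be cross-checked with simple arithmetic, since "only commercial", "only free", and "both" are disjoint groups. The sketch below assumes the quoted percentages are rounded to whole numbers, which would explain totals that are off by one point.

```python
# Percentages of voters as quoted in the text (rounded values).
only_commercial = 25
only_free = 13
both = 61

# Anyone who used commercial software used it either exclusively or with free tools.
used_commercial = only_commercial + both
used_free = only_free + both
total = only_commercial + only_free + both

print(f"used commercial: {used_commercial}% (reported 86%)")
print(f"used free:       {used_free}% (reported 75%)")
print(f"all three groups: {total}%")
```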
Tools new to this poll that received at least a 1% share of votes in 2016 were:
- Anaconda, 16%
- Microsoft other ML/Data Science tools, 1.6%
- SAP HANA, 1.2%
- XLMiner, 1.2%
Tools with the largest decline in share were:
- Ayasdi, down 85%, to 0.3% share from 2.0%
- Actian, down 83%, to 0.3% share from 2.0%
- Datameer, down 52%, to 0.4% share from 0.9%
- SAP Analytics, down 51%, to 1.5% share from 3.0%
- SAS Enterprise Miner, down 49%, to 5.6% from 10.9%
- Alteryx, down 46%, to 3.0% share from 5.6%
- F#, down 42%, to 0.4% share from 0.7%
- TIBCO Spotfire, down 36%, to 2.8% share from 4.3%
- JMP, down 36%, to 2.0% share from 3.1%
Hadoop/Big Data Tools
The usage of Hadoop/Big Data tools grew to 39%, up from 29% in 2015 (and 17% in 2014), driven mainly by big growth in Apache Spark, MLlib (Spark Machine Learning Library) and H2O, which we included among Big Data tools.
Here are the Big Data tools and their shares in 2016 and 2015, with the % change.
Tool | 2016 % share | 2015 % share | % change |
---|---|---|---|
Hadoop | 22.1% | 18.4% | +20.5% |
Spark | 21.6% | 11.3% | +91% |
Hive | 12.4% | 10.2% | +21.3% |
MLlib | 11.6% | 3.3% | +253% |
SQL on Hadoop tools | 7.3% | 7.2% | +1.6% |
H2O | 6.7% | 2.0% | +234% |
HBase | 5.5% | 4.6% | +18.6% |
Apache Pig | 4.6% | 5.4% | -16.1% |
Apache Mahout | 2.6% | 2.8% | -7.2% |
Dato | 2.4% | 0.5% | +338% |
Datameer | 0.4% | 0.9% | -52.3% |
Other Hadoop/HDFS-based tools | 4.9% | 4.5% | +7.5% |
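The individual shares in the table add up to far more than the 39% overall figure, because one voter can use several Big Data tools. A short sketch of that arithmetic follows. It assumes the 39% counts voters who used at least one of the tools listed above, and it works with the rounded shares, so the result is approximate.

```python
# 2016 shares (% of all voters), copied from the table above.
big_data_shares = {
    "Hadoop": 22.1, "Spark": 21.6, "Hive": 12.4, "MLlib": 11.6,
    "SQL on Hadoop tools": 7.3, "H2O": 6.7, "HBase": 5.5, "Apache Pig": 4.6,
    "Apache Mahout": 2.6, "Dato": 2.4, "Datameer": 0.4,
    "Other Hadoop/HDFS-based tools": 4.9,
}
any_big_data_tool = 39.0  # % of voters using at least one Big Data tool

# The sum counts a voter once per tool selected, so it can exceed 100%.
sum_of_shares = sum(big_data_shares.values())

# Dividing by the share of Big Data users gives the average number of
# Big Data tools selected per Big Data user.
tools_per_big_data_user = sum_of_shares / any_big_data_tool

print(f"sum of shares: {sum_of_shares:.1f}%")
print(f"average Big Data tools per Big Data user: {tools_per_big_data_user:.1f}")
```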
Deep Learning Tools
For the second year, the KDnuggets poll included Deep Learning tools. This year, 18% of voters used Deep Learning tools, double the 9% in 2015. Google Tensorflow jumped to first place, displacing last year's leader, the Theano/Pylearn2 ecosystem.
Top tools:
- Tensorflow, 6.8%
- Theano ecosystem (including Pylearn2), 5.1%
- Caffe, 2.3%
- MATLAB Deep Learning Toolbox, 2.0%
- Deeplearning4j, 1.7%
- Torch, 1.0%
- Microsoft CNTK, 0.9%
- Cuda-convnet, 0.8%
- mxnet, 0.6%
- Convnet.js, 0.3%
- darch, 0.1%
- Nervana, 0.1%
- Veles, 0.1%
- Other Deep Learning Tools, 3.7%
Programming Languages
Python, Java, Unix tools, and Scala grew in popularity, while C/C++, Perl, Julia, F#, Clojure, and Lisp declined.
Here are the programming languages sorted by popularity.
- Python, 45.8% share (was 30.3%), 51% increase
- Java, 16.8% share (was 14.1%), 19% increase
- Unix shell/awk/gawk, 10.4% share (was 8.0%), 30% increase
- C/C++, 7.3% share (was 9.4%), 23% decrease
- Other programming/data languages, 6.8% share (was 5.1%), 34.1% increase
- Scala, 6.2% share (was 3.5%), 79% increase
- Perl, 2.3% share (was 2.9%), 19% decrease
- Julia, 1.1% share (was 1.1%), 1.6% decrease
- F#, 0.4% share (was 0.7%), 41.8% decrease
- Clojure, 0.4% share (was 0.5%), 19.4% decrease
- Lisp, 0.2% share (was 0.4%), 33.3% decrease
The next page has the table with the full poll results, including a 3-year share comparison for each tool.