
Gold Blog, May 2017
New Leader, Trends, and Surprises in Analytics, Data Science, Machine Learning Software Poll


Python caught up with R and (barely) overtook it; Deep Learning usage surges to 32%; RapidMiner remains top general Data Science platform; Five languages of Data Science.



Full Results and 3-year trends


% alone is the percentage of a tool's voters who used only that tool; it is shown only for tools where at least 5% of voters used that tool alone. For example, 13.6% of RapidMiner users used only RapidMiner.
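To make the two percentages concrete, here is a minimal Python sketch. It assumes that "% users" is a tool's votes divided by the total number of voters (2881), and that "% alone" is the share of a tool's own voters who selected nothing else. The function names are illustrative and not part of the poll.

```python
TOTAL_VOTERS = 2881  # number of poll participants reported with the table


def pct_users(tool_votes, total_voters=TOTAL_VOTERS):
    """Share of all voters who selected the tool, in percent (one decimal)."""
    return round(100 * tool_votes / total_voters, 1)


def alone_voters(tool_votes, pct_alone):
    """Approximate count of voters who selected only this tool."""
    return round(tool_votes * pct_alone / 100)


# Vote counts and the "% alone" figure are taken from the results table.
print("Python share of voters:", pct_users(1516))
print("RapidMiner share of voters:", pct_users(946))
print("RapidMiner-only voters (approx.):", alone_voters(946, 13.6))
```

Because the published "% alone" figure is itself rounded, the recovered count of single-tool voters is only approximate.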

What Analytics, Big Data, Data mining, Data Science software you used in the past 12 months for a real project? [2881 voters]
Tool (number of votes) | % users in 2017 | % users in 2016 | % users in 2015
Python (1516) | 52.6% | 45.8% | 30.3%
R language (1502) | 52.1% | 49.0% | 46.9%
SQL language (1006) | 34.9% | 35.5% | 30.9%
RapidMiner (946), 13.6% alone | 32.8% | 32.6% | 31.5%
Excel (810) | 28.1% | 33.6% | 22.9%
Spark (654) | 22.7% | 21.6% | 11.3%
Anaconda (629) | 21.8% | 16.0% | na
Tensorflow (581) | 20.2% | 6.8% | na
scikit-learn (561) | 19.5% | 17.2% | 8.3%
Tableau (560) | 19.4% | 18.5% | 12.4%
KNIME (551) | 19.1% | 18.0% | 20.0%
Hadoop: Open Source Tools (431) | 15.0% | 22.1% | 18.4%
Java (399) | 13.8% | 16.8% | 14.1%
Microsoft SQL Server (334) | 11.6% | 10.8% | 9.7%
SQL on Hadoop tools (298) | 10.3% | 7.3% | 7.2%
Microsoft Power BI (295) | 10.2% | 5.6% | 3.6%
Weka (281) | 9.8% | 10.9% | 11.2%
Unix shell/awk/gawk (278) | 9.6% | 10.4% | 8.0%
Keras (274) | 9.5% | na | na
PyCharm (260) | 9.0% | na | na
Dataiku (235), 12.8% alone | 8.2% | 7.8% | 2.0%
Hadoop: Commercial Tools (218) | 7.6% | na | na
Scala (214) | 7.4% | 6.2% | 3.5%
MATLAB (214) | 7.4% | 9.1% | 8.8%
SAS Base (204) | 7.1% | 7.8% | 11.3%
Other programming and data languages (196) | 6.8% | 6.8% | 5.1%
IBM SPSS Statistics (196) | 6.8% | 8.4% | 7.7%
Microsoft Azure Machine Learning (184) | 6.4% | 5.1% | 3.7%
IBM SPSS Modeler (182) | 6.3% | 7.7% | 7.1%
C/C++ (181) | 6.3% | 7.3% | 9.4%
H2O.ai (179) | 6.2% | 6.7% | 2.0%
Theano (167) | 5.8% | 5.1% | 3.8%
SAS Enterprise Miner (162) | 5.6% | 5.6% | 10.9%
Alteryx (152) | 5.3% | 3.0% | 5.6%
Other free analytics/data mining tools (139) | 4.8% | 6.8% | 5.0%
Other Deep Learning Tools (138) | 4.8% | 3.7% | 3.8%
MLlib (130) | 4.5% | 11.6% | 3.3%
Microsoft R Server (125) | 4.3% | na | na
IBM Watson / Watson Analytics (125) | 4.3% | 4.2% | 2.1%
QlikView (121) | 4.2% | 5.3% | 4.2%
Orange (115), 6.1% alone | 4.0% | 3.1% | 1.9%
Microsoft CNTK (98) | 3.4% | 0.9% | na
Caffe (89) | 3.1% | 2.3% | 1.1%
IBM DSX (87), 6.9% alone | 3.0% | na | na
PyTorch (86) | 3.0% | na | na
Rattle (74) | 2.6% | 3.6% | 4.2%
TIBCO Spotfire (72) | 2.5% | 2.8% | 4.3%
Teradata (69) | 2.4% | na | na
Gnu Octave (69) | 2.4% | 3.1% | 2.3%
Other paid analytics/data mining/data science software (66) | 2.3% | 2.5% | 2.4%
Microsoft other ML/Data Science tools (64) | 2.2% | 1.6% | na
DL4J (62) | 2.2% | 1.7% | 0.4%
IBM Cognos (61) | 2.1% | 2.2% | 1.8%
DataRobot (54), 9.3% alone | 1.9% | 0.5% | na
JMP (53) | 1.8% | 2.0% | 3.1%
Pentaho (52) | 1.8% | 2.3% | 2.7%
mxnet (51) | 1.8% | 0.6% | na
Oracle Adv. Analytics (51), 11.8% alone | 1.8% | 1.1% | 0.8%
Amazon Machine Learning (49) | 1.7% | 1.9% | 0.7%
Perl (49) | 1.7% | 2.3% | 2.9%
Minitab (42) | 1.5% | na | na
DataScience.com (40), 10.0% alone | 1.4% | na | na
Mathematica (40) | 1.4% | 1.8% | 1.9%
C4.5/C5.0/See5 (36) | 1.2% | 2.0% | 1.3%
Torch (34) | 1.2% | 1.0% | 1.0%
SAP HANA (34) | 1.2% | 1.2% | na
Stata (33) | 1.1% | 1.3% | 1.3%
Julia (32) | 1.1% | 1.1% | 1.1%
MicroStrategy (32) | 1.1% | 1.6% | 0.9%
Vowpal Wabbit (32) | 1.1% | 1.6% | 1.3%
SAP BusinessObjects Predictive Analytics (31), 6.5% alone | 1.1% | 1.5% | 3.0%
Angoss (29), 34.5% alone | 1.0% | 0.1% | 0.4%
BigML (29) | 1.0% | 0.9% | 0.8%
Lasagne (27) | 0.9% | na | na
XLMiner (19) | 0.7% | 1.2% | na
Domino Data Labs (18), 11.1% alone | 0.6% | na | na
F# (16), 12.5% alone | 0.6% | 0.4% | 0.7%
Quest (formerly Statistica/ Dell/ StatSoft) (13), 7.7% alone | 0.5% | 1.2% | 1.7%
Lisp (11), 9.1% alone | 0.4% | 0.2% | 0.4%
BayesiaLab (11) | 0.4% | 0.6% | 0.6%
Salford SPM/CART/RF/MARS/TreeNet (11) | 0.4% | 3.5% | 2.3%
Clojure (8) | 0.3% | 0.4% | 0.5%
RapidInsight/Veera (7) | 0.2% | 3.0% | 0.2%
FICO (6) | 0.2% | 0.2% | 0.0%
Ontotext GraphDB (6) | 0.2% | 0.2% | 0.0%
Ayasdi (5) | 0.2% | 0.3% | 2.0%
Lavastorm (5) | 0.2% | 0.4% | 0.4%
Turi (former Dato/GraphLab) (5) | 0.2% | 2.4% | 0.5%
Alpine Data Labs (4) | 0.1% | 0.6% | 0.5%
Birst (3) | 0.1% | 0.2% | 0.1%
Skytree (3) | 0.1% | 0.3% | 0.1%
Actian (3) | 0.1% | 0.3% | 2.0%
Sisense (2) | 0.1% | 0.2% | 0.2%
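One way to read the year-over-year columns is to separate the change in percentage points from the relative change. The following Python sketch does this for a handful of rows copied from the table; the `shares` dictionary and the `change` helper are illustrative names, and the choice of rows is arbitrary.

```python
# (2017 share, 2016 share) in percent of voters, copied from the table.
shares = {
    "Python": (52.6, 45.8),
    "R language": (52.1, 49.0),
    "Tensorflow": (20.2, 6.8),
    "Excel": (28.1, 33.6),
    "Hadoop: Open Source Tools": (15.0, 22.1),
}


def change(share_2017, share_2016):
    """Return (percentage-point change, relative change in percent)."""
    points = round(share_2017 - share_2016, 1)
    relative = round(100 * (share_2017 - share_2016) / share_2016, 1)
    return points, relative


# Rank the selected tools from largest gain to largest loss in points.
ranked = sorted(shares.items(), key=lambda kv: change(*kv[1])[0], reverse=True)
for tool, (y2017, y2016) in ranked:
    points, relative = change(y2017, y2016)
    print(f"{tool}: {points:+.1f} points ({relative:+.1f}% relative)")
```

The two measures can tell different stories: a tool with a small 2016 base can show a very large relative change from a moderate gain in points, so it helps to report both.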


Comments

AM, Fair and Square
Unlike many other "review" sites, this is indeed a good way of seeing how a specific software package or programming language is positioned within the Data Science community. I have seen, for example, some sites where people clearly give 10/10 ratings to a tool, only for it to turn out that they work for that vendor (doesn't make sense, right?).
In my opinion, there isn't any "negative" bias, as they are not asking for an opinion, a grade, or anything of that sort. The poll simply asks whether we have used a specific tool. As a matter of fact, I was redirected here from the RapidMiner email from Ingo, which was totally non-directive and fine. I simply stated the fact that I had used RapidMiner, but I also ended up specifying that I had used others from the list, like Hadoop and Qlik (for the sake of exemplifying my statements). (If you want to call that up-selling of votes, go ahead ;)).

The number of votes is also interesting, but above all I thank Gregory for this poll because it lets us see how the industry is evolving when we compare the polls year to year.

JP, Vendor requests
I also got an email from RapidMiner asking me to complete the survey. Another comment suggests this is "a fair request", but it's only fair if all vendors do it. It doesn't help that some "vendors" literally can't do it.
As a registered voter of this survey, I would be okay with a followup survey asking who contacted me to ask me to fill out the survey.
Just a note: it's not necessarily a negative thing for a company to encourage users to vote. It could simply be a reflection that they are tapped into the community and proud of their product. Both of those are good things.

Here are the results of the previous KDnuggets Polls on Analytics, Data Mining, Data Science Software: