KDnuggets Home » Polls » What Analytics, Data Mining, Data Science software/tools you used in the past 12 months for a real project Poll (Jun 2014)

What Analytics, Data Mining, Data Science software/tools you used in the past 12 months for a real project Poll


 
  
The 15th annual KDnuggets Software Poll attracted huge attention from the analytics and data mining community and vendors, drawing over 3,000 voters.

For full analysis and comments, see:
KDnuggets 2014 Software Poll: RapidMiner continues to lead.

Many vendors asked their users to vote in the poll, but RapidMiner was especially successful and had the most votes.
One vendor, alas, created a special page hardcoded to vote only for its software. In a fair campaign, it is normal to advocate for your candidate, but it is not OK to give voters a ballot with only one option. Voters should be able to consider all the choices. The invalid votes from this vendor were removed from the poll, leaving 3285 valid votes used for this analysis.

The average number of tools used was 3.7, significantly higher than 3.0 in 2013.

The boundary between commercial and free software is blurring. (Note: since RapidMiner introduced a commercial version relatively recently, we counted RapidMiner as free software for the analysis below.)

This year, 71% of voters used commercial software and 78% used free software. About 25% used only commercial software, down from 29% in 2013. About 28.5% used only free software, slightly down from 30% in 2013. 49% used both free and commercial software, up from 41% in 2013.
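The totals, the overlap, and the "only" shares above are linked by inclusion-exclusion. The sketch below is an illustration only: it assumes every voter used at least one free or commercial tool (an assumption not stated in the poll), it works from the rounded totals, and the function name `split_usage` is ours. Under that assumption the overlap comes out at 49%, matching the quoted figure, while the implied "only" shares (22% and 29%) differ somewhat from the quoted 25% and 28.5%, which may reflect rounding or a different counting base.

```python
def split_usage(commercial_pct: float, free_pct: float, any_pct: float = 100.0):
    """Split usage into (both, commercial-only, free-only) percentages.

    Assumes any_pct of voters used at least one tool of either kind
    (100% by default, which is our assumption, not a poll figure).
    """
    both = commercial_pct + free_pct - any_pct   # inclusion-exclusion
    commercial_only = commercial_pct - both
    free_only = free_pct - both
    return both, commercial_only, free_only


# Rounded 2014 totals quoted in the text: 71% commercial, 78% free
both, commercial_only, free_only = split_usage(71.0, 78.0)
print(f"both: {both:.1f}%")
print(f"commercial only: {commercial_only:.1f}%")
print(f"free only: {free_only:.1f}%")
```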

About 17% of voters report using Hadoop or other Big Data tools, compared to 14% in 2013 (and 3% in 2011).

This implies that Big Data usage is growing slowly and is still primarily the domain of a select group of analysts in web giants, government agencies, and very large enterprises. Most data analysis is still done on "medium" and small data.

The top 10 tools by share of users were

  1. RapidMiner, 44.2% share (39.2% in 2013)
  2. R, 38.5% (37.4% in 2013)
  3. Excel, 25.8% (28.0% in 2013)
  4. SQL, 25.3% (na in 2013)
  5. Python, 19.5% (13.3% in 2013)
  6. Weka, 17.0% (14.3% in 2013)
  7. KNIME, 15.0% (5.9% in 2013)
  8. Hadoop, 12.7% (9.3% in 2013)
  9. SAS base, 10.9% (10.7% in 2013)
  10. Microsoft SQL Server, 10.5% (7.0% in 2013)
Among tools with at least 2% share, the highest increase in 2014 was for
  • Alteryx, 1079% up, to 3.1% share in 2014, from 0.3% in 2013
  • SAP (including BusinessObjects/Sybase/Hana), 377% up, to 6.8% from 1.4%
  • BayesiaLab, 310% up, to 4.1% from 1.0%
  • KNIME, 156% up, to 15.0% from 5.9%
  • Oracle Data Miner, 117% up in 2014, to 2.2% from 1.0%
  • KXEN (now part of SAP), 104% up, to 3.8% from 1.9%
  • Revolution Analytics R, 102% up, to 9.1% from 4.5%
  • TIBCO Spotfire, up 100%, to 2.8%, from 1.4%
  • Salford SPM/CART/Random Forests/MARS/TreeNet, up 61%, to 3.6% from 2.2%
  • Microsoft SQL Server, up 50%, to 10.5% from 7.0%
Revolution Analytics, Salford Systems, and Microsoft SQL Server have shown strong increases for 2 years in a row.
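The "up" percentages in the list above are relative changes in share, not percentage-point differences. The sketch below assumes the figure is computed as (2014 share - 2013 share) / 2013 share; the function name `relative_change` is ours. Because the published shares are rounded to one decimal, recomputing from them only approximates the quoted figures: KNIME, for example, comes out near 154% rather than the 156% quoted.

```python
def relative_change(share_prev: float, share_now: float) -> float:
    """Relative change in percent when moving from share_prev to share_now."""
    if share_prev <= 0:
        raise ValueError("previous share must be positive")
    return (share_now - share_prev) / share_prev * 100.0


# Rounded shares (% of voters) as published in the list: (2013, 2014)
shares = {
    "KNIME": (5.9, 15.0),
    "TIBCO Spotfire": (1.4, 2.8),
    "Microsoft SQL Server": (7.0, 10.5),
}

for tool, (prev, now) in shares.items():
    print(f"{tool}: {relative_change(prev, now):+.0f}%")
```

A tool with a very small 2013 share, such as Alteryx at 0.3%, is especially sensitive to rounding, so its recomputed change can differ noticeably from the published one.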

The growing analytics market was also reflected in the larger number of tools in the poll (over 70).
New analytics tools (not counting languages like Perl or SQL) that received at least 1% share in 2014 were

  • Pig, 3.5%
  • Alpine Data Labs, 2.7%
  • Pentaho, 2.6%
  • Spark, 2.6%
  • Mahout, 2.5%
  • MLlib, 1.0%

Among tools with at least 2% share, the largest decline in 2014 was for

  • StatSoft Statistica (now part of Dell), down 81%, to 1.7% share in 2014, from 9.0% in 2013 (partly due to lack of campaigning for Statistica, now that it is part of Dell)
  • Stata, down 32%, to 1.4% from 2.1%
  • IBM Cognos, down 24%, to 1.8% from 2.4%
  • MATLAB, down 15%, to 8.4% from 9.9%

Statistica's share has now declined for 2 years in a row (it was 14% in 2012).

The following table shows results of the poll.
"% alone" is the percentage of a tool's voters who used only that tool. For example, just 1% of Python users used only Python, while 35% of RapidMiner users indicated they used that tool alone.
For tools not included last year, there are no 2013 numbers.

What Analytics, Big Data, Data mining, Data Science software you used in the past 12 months for a real project? [3285 voters]
Legend: Red: Free/Open Source tools; Green: Commercial tools

Tool (votes) | % alone | % users in 2014 | % users in 2013
RapidMiner (1453) | 35.1% | 44.2% | 39.2%
R (1264) | 2.1% | 38.5% | 37.4%
Excel (847) | 0.1% | 25.8% | 28.0%
SQL (832) | 0.1% | 25.3% | na
Python (639) | 0.9% | 19.5% | 13.3%
Weka (558) | 0.4% | 17.0% | 14.3%
KNIME (492) | 10.6% | 15.0% | 5.9%
Hadoop (416) | 0% | 12.7% | 9.3%
SAS base (357) | 0% | 10.9% | 10.7%
Microsoft SQL Server (344) | 0% | 10.5% | 7.0%
Revolution Analytics R (300) | 13.3% | 9.1% | 4.5%
Tableau (298) | 1.3% | 9.1% | 6.3%
MATLAB (277) | 0% | 8.4% | 9.9%
IBM SPSS Statistics (253) | 0.4% | 7.7% | 8.7%
SAS Enterprise Miner (235) | 1.3% | 7.2% | 5.9%
SAP (including BusinessObjects/Sybase/Hana) (225) | 0% | 6.8% | 1.4%
Unix shell/awk/gawk (190) | 0% | 5.8% | na
IBM SPSS Modeler (187) | 3.2% | 5.7% | 6.1%
Other free analytics/data mining tools (168) | 1.8% | 5.1% | 3.4%
Rattle (161) | 0% | 4.9% | 4.5%
BayesiaLab (136) | 23.5% | 4.1% | 1.0%
Other Hadoop/HDFS-based tools (129) | 0% | 3.9% | na
Gnu Octave (128) | 0% | 3.9% | 2.9%
JMP (125) | 3.2% | 3.8% | 4.1%
KXEN (now part of SAP) (125) | 0% | 3.8% | 1.9%
Predixion Software (122) | 47.5% | 3.7% | 2.7%
Salford SPM/CART/Random Forests/MARS/TreeNet (118) | 31.4% | 3.6% | 2.2%
Pig (116) | 0% | 3.5% | na
Orange (112) | 0% | 3.4% | 3.6%
Alteryx (103) | 50.5% | 3.1% | 0.3%
Perl (100) | 2.0% | 3.0% | na
Other languages for analytics (98) | 0% | 3.0% | na
QlikView (97) | 1.0% | 3.0% | 2.4%
TIBCO Spotfire (91) | 25.3% | 2.8% | 1.4%
Alpine Data Labs (88) | 52.3% | 2.7% | na
Pentaho (87) | 0% | 2.6% | na
Spark (87) | 0% | 2.6% | na
Mahout (81) | 0% | 2.5% | na
Mathematica (74) | 0% | 2.3% | 2.1%
Oracle Data Miner (72) | 5.6% | 2.2% | 1.0%
Other paid analytics/data mining/data science software (62) | 0% | 1.9% | 2.4%
IBM Cognos (60) | 0% | 1.8% | 2.4%
StatSoft Statistica (now part of Dell) (56) | 14.3% | 1.7% | 9.0%
C4.5/C5.0/See5 (49) | 0% | 1.5% | 1.1%
Stata (46) | 0% | 1.4% | 2.1%
XLSTAT (38) | 0% | 1.2% | 0.9%
MLlib (33) | 0% | 1.0% | na
Graphlab (29) | 0% | 0.9% | na
BigML (28) | 14.3% | 0.9% | na
Miner3D (28) | 14.3% | 0.9% | 1.8%
Julia (27) | 0% | 0.8% | na
Datameer (26) | 34.6% | 0.8% | na
Zementis (26) | 15.4% | 0.8% | 0.9%
Splunk/Hunk (24) | 0% | 0.7% | na
F# (17) | 5.9% | 0.5% | 0.7%
Clojure (16) | 0% | 0.5% | na
Actian (15) | 0% | 0.5% | na
RapidInsight/Veera (15) | 0% | 0.5% | 0.5%
Angoss (13) | 0% | 0.4% | 0.3%
Lisp (10) | 0% | 0.3% | na
Lavastorm (9) | 0% | 0.3% | 0.4%
WPS: World Programming System (8) | 0% | 0.2% | na
FICO Model Builder (7) | 0% | 0.2% | na
WordStat (7) | 0% | 0.2% | 0.4%
0xdata and H2O (5) | 0% | 0.2% | na
SciDB from Paradigm4 (5) | 0% | 0.2% | na
Megaputer Polyanalyst/TextAnalyst (4) | 0% | 0.1% | 0.1%
SiSense (4) | 50.0% | 0.1% | na
GoodData (3) | 0% | 0.1% | na
