
KDnuggets 15th Annual Analytics, Data Mining, Data Science Software Poll: RapidMiner Continues To Lead

          

With over 3,000 data miners taking part in the KDnuggets 15th Annual Software Poll, RapidMiner continues to lead. Free software is used much more outside the US, and Hadoop usage is growing fastest in Asia.

By Gregory Piatetsky, @kdnuggets, Jun 7, 2014.

The 15th annual KDnuggets Software Poll got huge attention from the analytics and data mining community and vendors, attracting over 3,000 voters.

The poll measures both how widely a data mining tool is used and, given the increased popularity of KDnuggets, how strongly the vendors advocate for their tool. Many vendors asked their users to vote in this poll, but one vendor created a special page hardcoded to vote only for its software. In a fair campaign, it is normal to advocate for your candidate, but it is not OK to give voters a ballot with only one option. Voters should be able to consider all the choices. The invalid votes from this vendor were removed from the poll, leaving 3,285 valid votes used for this analysis.

We do have advertising from many of the vendors, but those vendors appear in the top, middle, and bottom parts of the poll, and advertising has absolutely no effect on poll results. We are more interested in overall trends that are revealed by this poll - see analysis below.

The average number of tools used was 3.7, significantly higher than 3.0 in 2013.

The separation between commercial and free software continues to shrink. (Note: since RapidMiner introduced a commercial version relatively recently, we counted RapidMiner as free software for the analysis below.)

This year, 71% of voters used commercial software and 78% used free software. About 22% used only commercial software, down from 29% in 2013 (part of the change was probably due to confusion between votes for the RapidMiner commercial and free versions in 2013). About 28.5% used only free software, slightly down from 30% in 2013. 49% used both free and commercial software, up from 41% in 2013.

About 17.5% of voters report using Hadoop or other Big Data tools, a measurable increase from 14% in 2013 (was 15% in 2012, 3% in 2011).
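
The figures above fit together as a simple partition of voters. The following is a minimal sketch in Python that recomputes the totals from the three reported groups; the variable names are illustrative, and reading the small remainder as voters who used neither category (for example, only Hadoop tools) is an assumption, not something the poll states at this point.

```python
# Shares reported for the 2014 poll, in percent of voters (rounded values).
only_commercial = 22.0
only_free = 28.5
both = 49.0

# Anyone who used commercial software used either only commercial or both.
used_commercial = only_commercial + both   # reported as 71%
used_free = only_free + both               # reported as 78% after rounding

# Remainder not covered by the three groups (assumed: neither category).
neither = 100.0 - (only_commercial + only_free + both)

print(f"used commercial: {used_commercial:.1f}%")
print(f"used free:       {used_free:.1f}%")
print(f"neither:         {neither:.1f}%")
```

Because the inputs are rounded, the recomputed totals can differ from the published ones by up to about half a percentage point.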

This suggests that Big Data usage is growing slowly, and still is primarily the domain of a select group of analysts in web giants, government agencies, and very large enterprises. Most data analysis is still done on "medium" and small data.

The following word cloud represents the votes for tools.
[Figure: Word Cloud for KDnuggets 2014 Software Poll]

The top 10 tools by share of users were

  1. RapidMiner, 44.2% share (39.2% in 2013)
  2. R, 38.5% (37.4% in 2013)
  3. Excel, 25.8% (28.0% in 2013)
  4. SQL, 25.3% (na in 2013)
  5. Python, 19.5% (13.3% in 2013)
  6. Weka, 17.0% (14.3% in 2013)
  7. KNIME, 15.0% (5.9% in 2013)
  8. Hadoop, 12.7% (9.3% in 2013)
  9. SAS base, 10.9% (10.7% in 2013)
  10. Microsoft SQL Server, 10.5% (7.0% in 2013)

  Among tools with at least 2% share, the highest increase in 2014 was for
  • Alteryx, 1079% up, to 3.1% share in 2014, from 0.3% in 2013
  • SAP (including BusinessObjects/Sybase/Hana), 377% up, to 6.8% from 1.4%
  • BayesiaLab, 310% up, to 4.1% from 1.0%
  • KNIME, 156% up, to 15.0% from 5.9%
  • Oracle Data Miner, 117% up in 2014, to 2.2% from 1.0%
  • KXEN (now part of SAP), 104% up, to 3.8% from 1.9%
  • Revolution Analytics R, 102% up, to 9.1% from 4.5%
  • TIBCO Spotfire, up 100%, to 2.8%, from 1.4%
  • Salford SPM/CART/Random Forests/MARS/TreeNet, up 61%, to 3.6% from 2.2%
  • Microsoft SQL Server, up 50%, to 10.5% from 7.0%
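
The "% up" figures in this list are relative changes in share from 2013 to 2014. The following is a minimal sketch of that calculation; the function name `relative_change` is hypothetical, and the inputs are the rounded shares printed in the list. Results from rounded shares can differ by a few points from the published figures (for KNIME, about 154% rather than 156%), presumably because the published figures were computed from unrounded data.

```python
def relative_change(new_share: float, old_share: float) -> float:
    """Return the percent change from old_share to new_share."""
    if old_share <= 0:
        raise ValueError("old_share must be positive")
    return (new_share - old_share) / old_share * 100.0


# (2014 share, 2013 share) in percent, taken from the list of increases.
shares = {
    "KNIME": (15.0, 5.9),
    "TIBCO Spotfire": (2.8, 1.4),
    "Microsoft SQL Server": (10.5, 7.0),
}

for tool, (share_2014, share_2013) in shares.items():
    change = relative_change(share_2014, share_2013)
    print(f"{tool}: {change:.0f}% up")
```

The same function returns a negative value for a decline, so it also applies to tools that lost share.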

 
Revolution Analytics, Salford Systems, and Microsoft SQL Server have shown strong increases for 2 years in a row.

The growing analytics market was also reflected in more tools (over 70).
New analytics tools (not counting languages like Perl or SQL) that received at least 1% share in 2014 were
  • Pig, 3.5%
  • Alpine Data Labs, 2.7%
  • Pentaho, 2.6%
  • Spark, 2.6%
  • Mahout, 2.5%
  • MLlib, 1.0%

 


Among tools with at least 2% share, the largest decline in 2014 was for
  • StatSoft Statistica (now part of Dell), down 81%, to 1.7% share in 2014, from 9.0% in 2013 (partly due to lack of campaigning for Statistica, now that it is part of Dell)
  • Stata, down 32%, to 1.4% from 2.1%
  • IBM Cognos, down 24%, to 1.8% from 2.4%
  • MATLAB, down 15%, to 8.4% from 9.9%

 
Statistica share has now declined for 2 years in a row (was 14% in 2012). With the recent acquisition by Dell, it seems likely that Statistica will continue to lose market and mind share.

The following table shows results of the poll, with Tool (User-votes), % alone.
% alone is the percentage of a tool's voters who used only that tool. For example, just 0.9% of Python users used only Python, while 35.1% of RapidMiner users indicated they used that tool alone.
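
To make the two columns concrete, here is a minimal sketch that derives a tool's share from its vote count and converts "% alone" back into an approximate number of voters. The helper names `share` and `alone_count` are hypothetical; the vote counts (1453 for RapidMiner, 639 for Python) and the total of 3285 voters come from the poll results.

```python
TOTAL_VOTERS = 3285  # valid votes in the 2014 poll


def share(votes: int, total: int = TOTAL_VOTERS) -> float:
    """Percent of all voters who selected the tool."""
    return votes / total * 100.0


def alone_count(votes: int, pct_alone: float) -> int:
    """Approximate number of voters who selected only this tool."""
    return round(votes * pct_alone / 100.0)


print(f"RapidMiner share: {share(1453):.1f}%")
print(f"RapidMiner-only voters: about {alone_count(1453, 35.1)}")
print(f"Python-only voters: about {alone_count(639, 0.9)}")
```

The counts are approximate because "% alone" is published to one decimal place.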

What Analytics, Big Data, Data mining, Data Science software you used in the past 12 months for a real project? [3285 voters]
Legend: Red: Free/Open Source tools
Green: Commercial tools
Fuchsia: Hadoop-related tools
na - not included in poll in 2013.

Tool (user votes) | % alone | % users in 2014 | % users in 2013
RapidMiner (1453) | 35.1% | 44.2% | 39.2%
R (1264) | 2.1% | 38.5% | 37.4%
Excel (847) | 0.1% | 25.8% | 28.0%
SQL (832) | 0.1% | 25.3% | na
Python (639) | 0.9% | 19.5% | 13.3%
Weka (558) | 0.4% | 17.0% | 14.3%
KNIME (492) | 10.6% | 15.0% | 5.9%
Hadoop (416) | 0% | 12.7% | 9.3%
SAS base (357) | 0% | 10.9% | 10.7%
Microsoft SQL Server (344) | 0% | 10.5% | 7.0%
Revolution Analytics R (300) | 13.3% | 9.1% | 4.5%
Tableau (298) | 1.3% | 9.1% | 6.3%
MATLAB (277) | 0% | 8.4% | 9.9%
IBM SPSS Statistics (253) | 0.4% | 7.7% | 8.7%
SAS Enterprise Miner (235) | 1.3% | 7.2% | 5.9%
SAP (including BusinessObjects/Sybase/Hana) (225) | 0% | 6.8% | 1.4%
Unix shell/awk/gawk (190) | 0% | 5.8% | na
IBM SPSS Modeler (187) | 3.2% | 5.7% | 6.1%
Other free analytics/data mining tools (168) | 1.8% | 5.1% | 3.4%
Rattle (161) | 0% | 4.9% | 4.5%
BayesiaLab (136) | 23.5% | 4.1% | 1.0%
Other Hadoop/HDFS-based tools (129) | 0% | 3.9% | na
Gnu Octave (128) | 0% | 3.9% | 2.9%
JMP (125) | 3.2% | 3.8% | 4.1%
KXEN (now part of SAP) (125) | 0% | 3.8% | 1.9%
Predixion Software (122) | 47.5% | 3.7% | 2.7%
Salford SPM/CART/Random Forests/MARS/TreeNet (118) | 31.4% | 3.6% | 2.2%
Pig (116) | 0% | 3.5% | na
Orange (112) | 0% | 3.4% | 3.6%
Alteryx (103) | 50.5% | 3.1% | 0.3%
Perl (100) | 2.0% | 3.0% | na
Other languages for analytics (98) | 0% | 3.0% | na
QlikView (97) | 1.0% | 3.0% | 2.4%
TIBCO Spotfire (91) | 25.3% | 2.8% | 1.4%
Alpine Data Labs (88) | 52.3% | 2.7% | na
Pentaho (87) | 0% | 2.6% | na
Spark (87) | 0% | 2.6% | na
Mahout (81) | 0% | 2.5% | na
Mathematica (74) | 0% | 2.3% | 2.1%
Oracle Data Miner (72) | 5.6% | 2.2% | 1.0%
Other paid analytics/data mining/data science software (62) | 0% | 1.9% | 2.4%
IBM Cognos (60) | 0% | 1.8% | 2.4%
StatSoft Statistica (now part of Dell) (56) | 14.3% | 1.7% | 9.0%
C4.5/C5.0/See5 (49) | 0% | 1.5% | 1.1%
Stata (46) | 0% | 1.4% | 2.1%
XLSTAT (38) | 0% | 1.2% | 0.9%
MLlib (33) | 0% | 1.0% | na
Graphlab (29) | 0% | 0.9% | na
BigML (28) | 14.3% | 0.9% | na
Miner3D (28) | 14.3% | 0.9% | 1.8%
Julia (27) | 0% | 0.8% | na
Datameer (26) | 34.6% | 0.8% | na
Zementis (26) | 15.4% | 0.8% | 0.9%
Splunk/Hunk (24) | 0% | 0.7% | na
F# (17) | 5.9% | 0.5% | 0.7%
Clojure (16) | 0% | 0.5% | na
Actian (15) | 0% | 0.5% | na
RapidInsight/Veera (15) | 0% | 0.5% | 0.5%
Angoss (13) | 0% | 0.4% | 0.3%
Lisp (10) | 0% | 0.3% | na
Lavastorm (9) | 0% | 0.3% | 0.4%
WPS: World Programming System (8) | 0% | 0.2% | na
FICO Model Builder (7) | 0% | 0.2% | na
WordStat (7) | 0% | 0.2% | 0.4%
0xdata and H2O (5) | 0% | 0.2% | na
SciDB from Paradigm4 (5) | 0% | 0.2% | na
Megaputer Polyanalyst/TextAnalyst (4) | 0% | 0.1% | 0.1%
SiSense (4) | 50.0% | 0.1% | na
GoodData (3) | 0% | 0.1% | na


Additional tools not in this poll but mentioned in comments were
 
The following table shows breakdown by region and tool type: commercial/free/both. There were only about 12 people who only used Hadoop tools, and they are excluded from analysis below.

While the share of analysts who used both free and commercial tools was about 50% in all regions, the US was the only region where more data miners used only commercial tools than only free tools (2/1 ratio). In Europe, Asia, and Latin America, the ratio was reversed, with 2-4 times as many data miners using only free tools vs only commercial tools.

Region | Avg N tools | % users who only use commercial tools | % users who only use free tools | % users who use both
US/Canada (39%) | 3.8 | 33% | 17% | 49%
Europe (36%) | 3.5 | 14% | 38% | 47%
Asia (12%) | 4.2 | 15% | 31% | 53%
Latin America (6.1%) | 3.8 | 8.5% | 39% | 53%
Africa/MidEast (3.2%) | 3.6 | 12% | 40% | 48%
Australia/NZ (3.0%) | 4.2 | 23% | 19% | 58%


We also examined the use of Hadoop-related tools (including Spark) across regions, and note that Hadoop usage is growing fastest outside the US, especially in Asia.

Region | % using Hadoop-related tools in 2014 | % using Hadoop-related tools in 2013
US/Canada | 18% | 17%
Europe | 13% | 12%
Asia | 31% | 19%
Latin America | 16% | 8%
Africa/MidEast | 17% | 11%
Australia/NZ | 12% | 9%
All | 18% | 14%
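
The regional comparison can be reproduced from the table above with a short calculation. This is a minimal sketch that measures growth as the change in percentage points between 2013 and 2014; that choice of measure is an assumption, since the article does not state which one it uses, and a relative measure could rank the regions differently. The names `hadoop_usage` and `point_change` are illustrative.

```python
# Percent of voters using Hadoop-related tools: (2014, 2013), from the table.
hadoop_usage = {
    "US/Canada": (18, 17),
    "Europe": (13, 12),
    "Asia": (31, 19),
    "Latin America": (16, 8),
    "Africa/MidEast": (17, 11),
    "Australia/NZ": (12, 9),
}

# Change in percentage points from 2013 to 2014 for each region.
point_change = {
    region: y2014 - y2013 for region, (y2014, y2013) in hadoop_usage.items()
}

# Region with the largest percentage-point gain.
fastest = max(point_change, key=point_change.get)

for region, change in sorted(point_change.items(), key=lambda kv: -kv[1]):
    print(f"{region}: +{change} points")
print(f"Largest gain: {fastest}")
```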


Here is additional analysis of the KDnuggets Software Poll, including how to download anonymized poll data.

