KDnuggets Home » Polls » Analytics, Data mining, Big Data software used (May 2012)

What Analytics, Data Mining, or Big Data software have you used in the past 12 months for a real project?


 
  
The 13th annual KDnuggets Software Poll attracted excellent participation. For the first time, the number of users of free/open source software exceeded the number of users of commercial software.

Among voters, 28% used commercial software but not free software, 30% used free software but not commercial software, and 41% used both.

The usage of big data tools grew five-fold: 15% used them in 2012, vs about 3% in 2011.

R, Excel, and RapidMiner were the most popular tools. StatSoft Statistica became the most popular commercial tool, getting more votes than SAS (in part due to a more active campaign by StatSoft users and the lack of such a campaign from SAS).

Among those who wrote analytics code in lower-level languages, R, SQL, Java, and Python were most popular.

This poll also had a very large number of participants and used email verification and other measures to remove unnatural votes (*see note below).


What Analytics, Data Mining, or Big Data software have you used in the past 12 months for a real project (not just evaluation)? [798 voters]

Tool (votes) | % users in 2012 | % users in 2011
R (245) | 30.7% | 23.3%
Excel (238) | 29.8% | 21.8%
Rapid-I RapidMiner (213) | 26.7% | 27.7%
KNIME (174) | 21.8% | 12.1%
Weka / Pentaho (118) | 14.8% | 11.8%
StatSoft Statistica (112) | 14.0% | 8.5%
SAS (101) | 12.7% | 13.6%
Rapid-I RapidAnalytics (83) | 10.4% | not asked in 2011
MATLAB (80) | 10.0% | 7.2%
IBM SPSS Statistics (62) | 7.8% | 7.2%
IBM SPSS Modeler (54) | 6.8% | 8.3%
SAS Enterprise Miner (46) | 5.8% | 7.1%
Orange (42) | 5.3% | 1.3%
Microsoft SQL Server (40) | 5.0% | 4.9%
Other free analytics/data mining software (39) | 4.9% | 4.1%
TIBCO Spotfire / S+ / Miner (37) | 4.6% | 1.7%
Oracle Data Miner (35) | 4.4% | 0.7%
Tableau (35) | 4.4% | 2.6%
JMP (32) | 4.0% | 5.7%
Other commercial analytics/data mining software (32) | 4.0% | 3.2%
Mathematica (23) | 2.9% | 1.6%
Miner3D (19) | 2.4% | 1.3%
IBM Cognos (16) | 2.0% | not asked in 2011
Stata (15) | 1.9% | 0.8%
Bayesia (14) | 1.8% | 0.8%
KXEN (14) | 1.8% | 1.4%
Zementis (14) | 1.8% | 3.7%
C4.5/C5.0/See5 (13) | 1.6% | 1.9%
Revolution Computing (11) | 1.4% | 1.4%
Salford SPM/CART/MARS/TreeNet/RF (9) | 1.1% | 10.6%
Angoss (7) | 0.9% | 0.8%
SAP (including BusinessObjects/Sybase/Hana) (7) | 0.9% | not asked in 2011
XLSTAT (7) | 0.9% | 0.9%
RapidInsight/Veera (5) | 0.6% | not asked in 2011
11 Ants Analytics (4) | 0.5% | 5.6%
Teradata Miner (4) | 0.5% | not asked in 2011
Predixion Software (3) | 0.4% | 0.5%
WordStat (3) | 0.4% | 0.5%

Among tools with at least 10 users, the tools with the highest increase in "usage percent" were

  • Oracle Data Miner, 4.4% in 2012, up from 0.7% in 2011, 505% increase
  • Orange, 5.3% from 1.3%, 315% increase
  • TIBCO Spotfire / S+ / Miner, 4.6% from 1.7%, 169% increase
  • Stata, 1.9% from 0.8%, 130% increase
  • Bayesia, 1.8% from 0.8%, 115% increase
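The increases listed above are relative changes between the 2011 and 2012 usage shares. A minimal Python sketch of that arithmetic follows. The function name `relative_increase` is hypothetical, and the inputs are the rounded percentages from the results table, so the results differ somewhat from the published figures (for example, roughly 529% rather than 505% for Oracle Data Miner); the published values were presumably computed from unrounded shares.

```python
def relative_increase(old_pct: float, new_pct: float) -> float:
    """Relative change from old_pct to new_pct, expressed in percent."""
    if old_pct <= 0:
        raise ValueError("old_pct must be positive")
    return (new_pct - old_pct) / old_pct * 100.0


# Rounded shares taken from the poll table: tool -> (2011 %, 2012 %).
shares = {
    "Oracle Data Miner": (0.7, 4.4),
    "Orange": (1.3, 5.3),
    "TIBCO Spotfire / S+ / Miner": (1.7, 4.6),
    "Stata": (0.8, 1.9),
    "Bayesia": (0.8, 1.8),
}

# Relative increase per tool, computed from the rounded shares above.
increases = {
    tool: relative_increase(old, new) for tool, (old, new) in shares.items()
}

# Print the tools from largest to smallest relative increase.
for tool, inc in sorted(increases.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{tool}: {inc:.0f}% increase")
```

Because small shares are sensitive to rounding, a change of 0.05 percentage points in a 2011 share below 1% shifts the computed increase noticeably.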

The three tools with the highest decrease in usage percent were 11 Ants Analytics, Salford SPM/CART/MARS/TreeNet/RF, and Zementis. Their dramatic decrease is probably due to these vendors doing much less (or nothing) to encourage their users to vote in 2012 compared to 2011.

Note: 3 tools received fewer than 3 votes and were not included in this table: Clarabridge, Megaputer Polyanalyst/TextAnalyst, Grapheur/LIONsolver.

Big Data

Use of big data tools grew five-fold, from about 3% of respondents in 2011 to about 15% in 2012.
Big Data software you used in the past 12 months
Apache Hadoop/Hbase/Pig/Hive (67) 8.4%
Amazon Web Services (AWS) (36) 4.5%
NoSQL databases (33) 4.1%
Other Big Data Data/Cloud analytics software (21) 2.6%
Other Hadoop-based tools (10) 1.3%

We also asked about the popularity of individual languages for data mining. Note that R is included in this table as well as among the higher-level tools.

Your own code you used for analytics/data mining in the past 12 months in:
R (245) 30.7%
SQL (185) 23.2%
Java (138) 17.3%
Python (119) 14.9%
C/C++ (66) 8.3%
Other languages (57) 7.1%
Perl (37) 4.6%
Awk/Gawk/Shell (31) 3.9%
F# (5) 0.6%
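Each percentage in these tables is the tool's vote count divided by the 798 voters. As a sketch of that calculation, the Python snippet below recomputes the language shares from the vote counts listed above; the names `TOTAL_VOTERS`, `share_pct`, and `language_votes` are illustrative and not part of the poll.

```python
TOTAL_VOTERS = 798  # number of voters reported for this poll


def share_pct(votes: int, total: int = TOTAL_VOTERS) -> float:
    """Share of voters who selected an option, in percent, to one decimal."""
    return round(100.0 * votes / total, 1)


# Vote counts from the "own code" table.
language_votes = {
    "R": 245,
    "SQL": 185,
    "Java": 138,
    "Python": 119,
    "C/C++": 66,
    "Other languages": 57,
    "Perl": 37,
    "Awk/Gawk/Shell": 31,
    "F#": 5,
}

# Percentage of voters per language, rounded to one decimal place.
language_shares = {lang: share_pct(v) for lang, v in language_votes.items()}

for lang, pct in language_shares.items():
    print(f"{lang} ({language_votes[lang]}) {pct}%")
```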


*Note on vote cleaning: To reduce multiple voting, this poll used email verification, which reduced the total number of votes compared to 2011 but made the results more representative.
Furthermore, some vendors were much more active than others in recruiting their users. To give a more objective picture of tool popularity, a large number (over 100) of "unnatural" votes were removed, leaving 798 votes. The decline in popularity of some tools, such as Salford and 11 Ants Analytics, is probably due to these vendors being less active in 2012 than in 2011 in asking their users to vote.

