KDnuggets Home » Polls » Data Mining Tools Used Poll (May 2009)

Data Mining Tools Used Poll


 
  
What data mining tools have you used for a real project (not just for evaluation) in the past 6 months? (364 voters)

SPSS PASW Modeler (formerly Clementine) (68 alone, 52 with other tools, 120 total)
RapidMiner (36 alone, 41 with other tools, 77 total)
SAS (39 alone or with SAS EM; 36 with other tools, 75 total)
Excel (1 alone, 68 total)
SAS Enterprise Miner (39 alone or with SAS; 28 with other tools, 67 total)
R (2 alone, 51 total)
Your own code (3 alone, 44 total)
KXEN (25 alone, 31 total)
Weka (now Pentaho) (0 alone, 31 total)
MATLAB (0 alone, 26 total)
Other commercial tools (0 alone, 19 total)
KNIME (1 alone, 18 total)
Other free tools (0 alone, 15 total)
Microsoft SQL Server (1 alone, 15 total)
Zementis (5 alone, 13 total)
Oracle DM (0 alone, 9 total)
Statsoft Statistica (0 alone, 8 total)
Orange (0 alone, 5 total)
Salford CART, Mars, other (1 alone, 5 total)
C4.5/C5.0 (0 alone, 4 total)
Angoss (0 alone, 4 total)
Inference for R (0 alone, 3 total)
Viscovery (0 alone, 2 total)
Megaputer (0 alone, 2 total)
Insightful Miner/S-Plus (now TIBCO) (0 alone, 2 total)
Bayesia (1 alone, 2 total)
Thinkanalytics (1 total)
Miner3D (1 total)
Clario Analytics (1 total)
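
For readers who want a tool's joint-use count or its share of voters, the arithmetic can be sketched in Python using a few rows copied from the list above. The helper names `joint_use` and `share_of_voters` are illustrative and not part of the poll. Because voters could select several tools, the shares do not add up to 100%.

```python
# Total number of voters reported in the poll.
VOTERS = 364

# (alone, total) counts copied from the list above for a few tools.
counts = {
    "SPSS PASW Modeler": (68, 120),
    "RapidMiner": (36, 77),
    "R": (2, 51),
    "KXEN": (25, 31),
}


def joint_use(alone, total):
    """Votes where the tool was selected together with other tools."""
    return total - alone


def share_of_voters(total, voters=VOTERS):
    """Percentage of all voters who selected the tool, to one decimal."""
    return round(100 * total / voters, 1)


for tool, (alone, total) in counts.items():
    print(f"{tool}: {joint_use(alone, total)} joint, "
          f"{share_of_voters(total)}% of voters")
```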

Comments

Gregory Piatetsky-Shapiro, KDnuggets Editor
Votes from tool vendors were removed, to make the results more representative. Since in the past there was a very strong correlation between use of SPSS Clementine and SPSS Statistics, there was no separate category for SPSS Statistics this year.

Compared with the 2008 KDnuggets Poll on data mining tools/software used, the big changes are growth in SPSS, RapidMiner, and R.

Furthermore, some vendors were more enthusiastic than others in asking their users to vote. Thus, to get a more representative picture, we compare only the number of votes in which the tool was not selected alone, since a great majority of data miners use more than one tool.

We also limited the comparison to tools with at least 4 votes (in either year). We note that the number of voters was comparable (347 in 2008, 364 in 2009).

With these caveats, the biggest changes are

Commercial tools

  • SPSS PASW Modeler (formerly Clementine) (21 to 52, joint use), up 148%
  • Statsoft Statistica (5 to 8), up 60%
  • SAS Enterprise Miner (18 to 28), up 56%
  • Oracle DM (7 to 9), up 29%
  • MATLAB (21 to 26), up 24%
  • SAS (49 to 36), down 27%
  • Angoss (8 to 4), down 50%
  • Viscovery (4 to 2), down 50%
  • Insightful Miner/S-Plus (5 to 2), down 60%
  • Salford CART, MARS, TreeNet (38 to 4), down 89%
New tool with a significant following in 2009: Zementis

Free / Open Source tools

  • RapidMiner (23 to 41), up 78%
  • Orange (3 to 5), up 67%
  • R (35 to 49), up 40%
  • C4.5/C5.0 (8 to 4), down 50%
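
The percentage changes in the lists above can be reproduced with a short Python sketch. It is only an illustration of the arithmetic, not the script used for the poll: the function name `percent_change` and the variable names are assumptions, and the 2008 count is assumed to be greater than zero. The vote pairs are joint-use counts copied from the lists, and the threshold of 4 votes in either year is the one stated in the comments.

```python
def percent_change(old, new):
    """Relative change from old to new as a whole-number percentage.

    Assumes old > 0 (every tool compared here had 2008 votes).
    """
    return round(100 * (new - old) / old)


# (2008, 2009) joint-use votes copied from the lists above.
joint_votes = {
    "SPSS PASW Modeler": (21, 52),
    "SAS": (49, 36),
    "RapidMiner": (23, 41),
    "R": (35, 49),
    "Orange": (3, 5),
}

MIN_VOTES = 4  # keep tools with at least 4 votes in either year

changes = {
    tool: percent_change(old, new)
    for tool, (old, new) in joint_votes.items()
    if max(old, new) >= MIN_VOTES
}

# Print the largest gains first.
for tool, change in sorted(changes.items(), key=lambda kv: kv[1], reverse=True):
    direction = "up" if change >= 0 else "down"
    print(f"{tool}: {direction} {abs(change)}%")
```

Orange passes the filter because its 2009 count (5) meets the threshold even though its 2008 count (3) does not.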

Karl Rexer, Tool poll is missing some key tools
I want to point out that this year's KDnuggets tool poll does not list some key tools. This may lead to odd results and make it hard to compare to the results of tool polls from previous years.

- Most importantly, SPSS (now PASW Statistics) is not on the KDnuggets list. In the 2007 and 2008 Rexer Analytics Data Miner Surveys, more data miners said they used SPSS than any other tool (45% in 2008), and it was the 5th most commonly cited "primary tool" in the 2008 survey.

- There are several other tools that are not on the KDnuggets tool list that were selected by more than 5% of respondents in the 2008 Rexer Analytics Data Miner Survey. They are: S-Plus, Teradata, Unica, Knowledge Miner, and Quadstone/Portrait Software.

More info about the annual Rexer Analytics Data Miner Surveys is available at http://www.rexeranalytics.com/Data-Miner-Survey-Results-2008.html.

