Rexer Analytics 2013 Data Miner Survey Highlights
The top 5 most used tools were R (used by 70% of data miners), IBM SPSS Statistics, RapidMiner, SAS, and Weka, while STATISTICA, KNIME, SAS JMP, IBM SPSS Modeler, and RapidMiner had the highest satisfaction. Big Data is actually used in only a small fraction of projects.
By Gregory Piatetsky, Oct 5, 2013.
Last week at Predictive Analytics World in Boston, Karl Rexer, the president of Rexer Analytics, presented the initial results of the very popular Data Miner Survey his company has conducted since 2007. I attended his talk and he kindly shared his findings for publication in KDnuggets.
Full results will be published later in 2013, and results of all past surveys are freely available at www.rexeranalytics.com/ .
This was the 6th survey since 2007, and over 1,200 data miners from 75 countries responded to 68 questions. The breakdown of respondents by occupation was:
- 35%, Corporate
- 26%, Consultants
- 18%, Vendors
- 15%, Academics
- 6%, NGO/Government
The geographic distribution was:
- 41% North America
- 41% Europe
- 11% Asia/Pacific
- 4% Central & South America
- 3% Middle East and Africa
Some of the highlights from the survey:
- Over 85% of data miners working in corporate and consulting settings foresee increases in the number of projects
- Data miner job satisfaction is high: highest among vendors, lowest in government/NGO settings
- The most common self-descriptions were Data Scientist, Researcher, Data Analyst, and Business Analyst
The average data miner reports using 5 different software tools. The top 10 most used tools were R (used by 70% of data miners), IBM SPSS Statistics, RapidMiner, SAS, Weka, Matlab, Microsoft SQL, IBM SPSS Modeler, SAS Enterprise Miner, and KNIME.
Here is the chart:
The top 10 tools ranked by usage as the primary tool were:
- RapidMiner
- IBM SPSS Modeler
- IBM SPSS Statistics
- SAS Enterprise Miner
The survey also measured tool satisfaction (with vendors excluded), and STATISTICA, KNIME, SAS JMP, IBM SPSS Modeler, and RapidMiner received the highest satisfaction ratings (see chart below).
The survey also looked at Big Data. While reported data volumes have increased since 2007, only about 8% of respondents work with really big data (over 100,000,000 records), vs. 7% in 2007. Only 13% report having an active Big Data program.
Full results will be freely available at www.rexeranalytics.com/ .