Poll Results: Top Algorithms for Analytics/Data Mining

Latest KDnuggets Poll shows that Decision Trees, Regression, and Clustering are the top algorithms; Uplift modeling has the highest industry affinity. Only 14% have used Cloud analytics so far.

The latest KDnuggets Poll asked:
Which methods/algorithms did you use for data analysis in 2011?

The average number of algorithms per voter was 5.6.

The 10 most popular algorithms, by percent of voters who used each algorithm, are:

Algorithm (number of voters)	Usage (% of voters)
Decision Trees/Rules (186)	59.8 %
Regression (180)	57.9 %
Clustering (163)	52.4 %
Statistics (descriptive) (149)	47.9 %
Visualization (119)	38.3 %
Time series/Sequence analysis (92)	29.6 %
Support Vector (SVM) (89)	28.6 %
Association rules (89)	28.6 %
Ensemble methods (88)	28.3 %
Text Mining (86)	27.7 %

Only 14% of voters used analytics in the cloud (Hadoop, EC2, etc.) in 2011.

The next table shows the breakdown by employment type.

Employment type (number of voters)	Percent of all voters	Avg number of algorithms
Industry analyst/consultant (172)	55.3%	6.3
Academic researcher (85)	27.3%	5.1
Student (37)	11.9%	4.3
Government/Other (17)	5.5%	5.0

We grouped Industry/Gov in one group and Academic researchers/Students into a second group, and computed the "affinity" of the algorithm to Industry/Gov as

N(Alg,Ind_Gov) / N(Alg,Aca_Stu)
----------------------------------
N(Ind_Gov) / N(Aca_Stu)

Thus an algorithm with affinity 1.5 is used 50% more in Industry/Government than by Academic researchers or Students, and an algorithm with affinity 0.6 is used only 60% as much in Industry/Government.
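The affinity ratio above can be sketched in a few lines of Python. This is only an illustration, not the code used for the poll: the function name `affinity` and the per-algorithm counts in the usage lines are hypothetical, while the group sizes come from the employment table (172 + 17 Industry/Gov voters, 85 + 37 Academic/Student voters). Returning infinity when no academic or student voter used the algorithm is an assumption that matches how "INF" is reported for such cases.

```python
def affinity(n_alg_ind_gov, n_alg_aca_stu, n_ind_gov, n_aca_stu):
    """Industry/Gov affinity of one algorithm.

    Ratio of the algorithm's Industry/Gov users to its Academic/Student users,
    divided by the same ratio over all voters.
    """
    if n_alg_aca_stu == 0:
        # Assumption: no academic/student users is reported as infinite affinity.
        return float("inf")
    return (n_alg_ind_gov / n_alg_aca_stu) / (n_ind_gov / n_aca_stu)


# Group sizes taken from the employment table.
N_IND_GOV = 172 + 17  # Industry analyst/consultant + Government/Other
N_ACA_STU = 85 + 37   # Academic researcher + Student

# Hypothetical split for some algorithm (NOT actual poll data):
# 120 Industry/Gov users and 60 Academic/Student users.
example = affinity(120, 60, N_IND_GOV, N_ACA_STU)
print(f"example affinity: {example:.2f}")

# An algorithm with no Academic/Student users gets infinite affinity.
print(affinity(5, 0, N_IND_GOV, N_ACA_STU))
```

Dividing by the overall Industry/Gov to Academic/Student ratio corrects for the fact that Industry/Gov voters outnumber the others, so an algorithm used equally often by both groups ends up with affinity 1.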

The most "industrial" algorithms (with the highest Industry/Gov "affinity") are:

Uplift modeling, INF (no academic users)
Survival Analysis, 2.47
Regression, 2.00

The most "academic" algorithms (with the lowest Industry/Gov "affinity") are:

Genetic algorithms, 0.60
Support Vector (SVM), 0.66
Association Rules, 0.83

Here are full results for KDnuggets 2011 Poll:
Which methods/algorithms did you use for data analysis in 2011?

Comments:

Jia Xin
The real 'top' algorithm is one that is 'garbage in, gold out'. I don't think that exists yet (let me just keep an open mind about our future and only look back).

GregoryPS
Garbage in still produces garbage out, most of the time!

Dr Jochen L Leidner
What about

Naive Bayes
HMMs
CRFs
TF-IDF retrieval?

GregoryPS
CRFs and TF-IDF can be used for Text Mining, which was used by only about a quarter of respondents. Bayesian algorithms were used a lot, and that includes Naive Bayes - see all details at www.kdnuggets.com/polls/2011/algorithms-analytics-data-mining.html

JV
Do you have any statistics on the usage % of spatial algorithms, and especially GKD algorithms?

GregoryPS
I did not have a special category for spatial algorithms - but it is a great idea for next time.