KDnuggets Home » Polls » Algorithms for Data Mining (Nov 2011)

Algorithms for data analysis / data mining


 
  
Which methods/algorithms did you use for data analysis in 2011? [311 voters]
Decision Trees/Rules (186) 59.8 %
Regression (180) 57.9 %
Clustering (163) 52.4 %
Statistics (descriptive) (149) 47.9 %
Visualization (119) 38.3 %
Time series/Sequence analysis (92) 29.6 %
Support Vector (SVM) (89) 28.6 %
Association rules (89) 28.6 %
Ensemble methods (88) 28.3 %
Text Mining (86) 27.7 %
Neural Nets (84) 27.0 %
Boosting (73) 23.5 %
Bayesian (68) 21.9 %
Bagging (63) 20.3 %
Factor Analysis (58) 18.7 %
Anomaly/Deviation detection (51) 16.4 %
Social Network Analysis (44) 14.2 %
Survival Analysis (29) 9.32 %
Genetic algorithms (29) 9.32 %
Uplift modeling (15) 4.82 %
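Voters could pick several methods, so each percentage is a share of the 311 voters and the column sums to well over 100%. The minimal Python sketch below uses only the counts listed above. It recomputes each share and the average number of methods picked per voter. Rounding to one decimal is this sketch's own choice.

```python
# Vote counts from the poll; each of the 311 voters could select several methods.
VOTERS = 311
counts = {
    "Decision Trees/Rules": 186,
    "Regression": 180,
    "Clustering": 163,
    "Statistics (descriptive)": 149,
    "Visualization": 119,
    "Time series/Sequence analysis": 92,
    "Support Vector (SVM)": 89,
    "Association rules": 89,
    "Ensemble methods": 88,
    "Text Mining": 86,
    "Neural Nets": 84,
    "Boosting": 73,
    "Bayesian": 68,
    "Bagging": 63,
    "Factor Analysis": 58,
    "Anomaly/Deviation detection": 51,
    "Social Network Analysis": 44,
    "Survival Analysis": 29,
    "Genetic algorithms": 29,
    "Uplift modeling": 15,
}

def share(count: int, voters: int = VOTERS) -> float:
    """Percentage of all voters who selected a method, rounded to one decimal."""
    return round(100 * count / voters, 1)

shares = {name: share(n) for name, n in counts.items()}

# Total selections across all methods, and the mean number of methods per voter.
total_checks = sum(counts.values())
print(shares["Decision Trees/Rules"])
print(total_checks, round(total_checks / VOTERS, 2))
```

A direct one-decimal rounding gives 18.6% for Factor Analysis and 14.1% for Social Network Analysis. Both are 0.1 below the published figures. This presumably reflects the poll rounding to two decimals first (as in 9.32 %) and then to one.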


Did you use analytics in the cloud, Hadoop, EC2, etc. in 2011?
Yes  14%
No  86%


Employment type (voters)            % of all voters   Avg. number of algorithms
Industry analyst/consultant (172)   55.3%             6.3
Academic researcher (85)            27.3%             5.1
Student (37)                        11.9%             4.3
Government/Other (17)               5.5%              5.0
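The group percentages and averages can be cross-checked with a little arithmetic. The Python sketch below uses only the four rows of the employment table. Weighting each group's average by its voter count gives about 5.66 algorithms per voter. That is close to the 1755 total selections / 311 voters (about 5.64) implied by the poll results. The small gap is presumably due to the group averages being rounded to one decimal.

```python
# Rows of the employment table: (employment type, voters, avg. number of algorithms).
groups = [
    ("Industry analyst/consultant", 172, 6.3),
    ("Academic researcher", 85, 5.1),
    ("Student", 37, 4.3),
    ("Government/Other", 17, 5.0),
]

total_voters = sum(n for _, n, _ in groups)  # should equal the 311 poll voters

# Share of all voters in each employment type, rounded to one decimal.
pct = {name: round(100 * n / total_voters, 1) for name, n, _ in groups}

# Voter-weighted mean of the per-group averages.
weighted_avg = sum(n * avg for _, n, avg in groups) / total_voters

print(pct)
print(round(weighted_avg, 2))
```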

Regional breakdown of voters:

  1. US/Canada, 40.2%
  2. Europe, 37.6%
  3. Asia, 10.3%
  4. Latin America, 5.8%
  5. Africa/Middle East, 3.2%
  6. Australia/NZ, 2.9%
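
We grouped Industry analysts/consultants and Government/Other voters into one group (Ind_Gov), and Academic researchers and Students into a second group (Aca_Stu). We then computed the "affinity" of each algorithm to Industry/Gov as

$$\text{affinity}(\text{Alg}) = \frac{N(\text{Alg}, \text{Ind\_Gov}) \,/\, N(\text{Alg}, \text{Aca\_Stu})}{N(\text{Ind\_Gov}) \,/\, N(\text{Aca\_Stu})}$$

where N(Alg, G) is the number of voters in group G who reported using the algorithm, and N(G) is the total number of voters in group G.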
An algorithm with affinity 1.5 is therefore used 50% more often per voter in Industry/Government than by Academic researchers and Students. An algorithm with affinity 0.6 is used only 60% as often per voter in Industry/Government.
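The affinity formula is easy to check numerically. The Python sketch below assumes the two groups have 189 voters (172 + 17) and 122 voters (85 + 37), taken from the employment table. The poll does not publish per-group counts for each algorithm. The 23 vs 6 split of the 29 Survival Analysis voters used here is therefore back-solved: it is the only integer split that reproduces the published 2.47. Returning `math.inf` when no Academic/Student voter picked an algorithm mirrors the INF reported for Uplift modeling.

```python
import math

# Group sizes from the employment table (assumed grouping described in the text).
N_IND_GOV = 172 + 17   # Industry analyst/consultant + Government/Other
N_ACA_STU = 85 + 37    # Academic researcher + Student

def affinity(n_alg_ind_gov: int, n_alg_aca_stu: int,
             n_ind_gov: int = N_IND_GOV, n_aca_stu: int = N_ACA_STU) -> float:
    """Industry/Gov affinity: ratio of an algorithm's users in the two groups,
    normalised by the ratio of the group sizes. Returns math.inf when no
    Academic/Student voter selected the algorithm."""
    if n_alg_aca_stu == 0:
        return math.inf
    return (n_alg_ind_gov / n_alg_aca_stu) / (n_ind_gov / n_aca_stu)

# Back-solved, illustrative split for Survival Analysis (not published in the poll).
print(round(affinity(23, 6), 2))
# Uplift modeling: all 15 users in Industry/Gov, none in Academic/Student.
print(affinity(15, 0))
```

The same value is the ratio of per-voter usage rates, (23 / 189) / (6 / 122). Roughly 12.2% of Industry/Gov voters versus 4.9% of Academic/Student voters would have used Survival Analysis under this split.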

The most "industrial" algorithms (with the highest Industry/Gov affinity) are:

  1. Uplift modeling, INF (no academic users)
  2. Survival Analysis, 2.47
  3. Regression, 2.00

The most "academic" algorithms (with the lowest Industry/Gov affinity) are:

  1. Genetic algorithms, 0.60
  2. Support Vector (SVM), 0.66
  3. Association Rules, 0.83
The following table ranks all algorithms by Industry/Gov affinity. Academic/Student affinity is the inverse of this value (1 / Industry/Gov affinity).

Algorithm                       Industry/Gov affinity
Uplift modeling                 INF
Survival Analysis               2.47
Regression                      2.00
Visualization                   1.55
Statistics (descriptive)        1.54
Boosting                        1.50
Time series/Sequence analysis   1.48
Bagging                         1.39
Factor Analysis                 1.32
Anomaly/Deviation detection     1.29
Text Mining                     1.27
Decision Trees/Rules            1.20
Neural Nets                     1.16
Clustering                      1.14
Ensemble methods                1.08
Social Network Analysis         0.93
Bayesian                        0.92
Association rules               0.83
Support Vector (SVM)            0.66
Genetic algorithms              0.60
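The top-three and bottom-three lists earlier on this page are the two ends of this ranking. The Python sketch below reads the affinity values straight from the table and reproduces both lists. It also derives Academic/Student affinity as the inverse. Encoding INF as `math.inf` is an assumption of this sketch; Python evaluates `1 / math.inf` as `0.0`.

```python
import math

# Industry/Gov affinity values copied from the table.
industry_affinity = {
    "Uplift modeling": math.inf,  # reported as INF (no academic users)
    "Survival Analysis": 2.47,
    "Regression": 2.00,
    "Visualization": 1.55,
    "Statistics (descriptive)": 1.54,
    "Boosting": 1.50,
    "Time series/Sequence analysis": 1.48,
    "Bagging": 1.39,
    "Factor Analysis": 1.32,
    "Anomaly/Deviation detection": 1.29,
    "Text Mining": 1.27,
    "Decision Trees/Rules": 1.20,
    "Neural Nets": 1.16,
    "Clustering": 1.14,
    "Ensemble methods": 1.08,
    "Social Network Analysis": 0.93,
    "Bayesian": 0.92,
    "Association rules": 0.83,
    "Support Vector (SVM)": 0.66,
    "Genetic algorithms": 0.60,
}

# Rank from most "industrial" to most "academic".
ranked = sorted(industry_affinity.items(), key=lambda kv: kv[1], reverse=True)
most_industrial = [name for name, _ in ranked[:3]]
most_academic = [name for name, _ in reversed(ranked[-3:])]  # lowest affinity first

# Academic/Student affinity is the inverse of Industry/Gov affinity.
academic_affinity = {name: 1 / a for name, a in industry_affinity.items()}

print(most_industrial)
print(most_academic)
print(round(academic_affinity["Genetic algorithms"], 2))
```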
