*Latest KDnuggets Poll shows that Decision Trees, Regression, and Clustering are the top algorithms; Uplift modeling has the highest industry affinity. Only 14% have used Cloud analytics so far.*

The latest KDnuggets Poll asked:

**Which methods/algorithms did you use for data analysis in 2011?**

The average number of algorithms per voter was 5.6.

The 10 most popular algorithms, by percent of voters who used each algorithm, are:

Algorithm (number of voters) | Usage (% of voters) |
---|---|
Decision Trees/Rules (186) | 59.8% |
Regression (180) | 57.9% |
Clustering (163) | 52.4% |
Statistics (descriptive) (149) | 47.9% |
Visualization (119) | 38.3% |
Time series/Sequence analysis (92) | 29.6% |
Support Vector (SVM) (89) | 28.6% |
Association rules (89) | 28.6% |
Ensemble methods (88) | 28.3% |
Text Mining (86) | 27.7% |

Only 14% of voters used analytics in the cloud, Hadoop, EC2, etc. in 2011.

The next table shows the breakdown by employment type.

Employment type (number of voters) | Percent of all voters | Avg. number of algorithms |
---|---|---|
Industry analyst/consultant (172) | 55.3% | 6.3 |
Academic researcher (85) | 27.3% | 5.1 |
Student (37) | 11.9% | 4.3 |
Government/Other (17) | 5.5% | 5.0 |
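The percentages in both tables can be reproduced from the counts in parentheses. The sketch below assumes that every voter chose exactly one employment type, so the total number of voters is the sum of the four employment counts (311); the post does not state this total explicitly. The helper name `percent_of_voters` is illustrative only.

```python
# Counts in parentheses from the employment-type table.
employment_counts = {
    "Industry analyst/consultant": 172,
    "Academic researcher": 85,
    "Student": 37,
    "Government/Other": 17,
}

# Assumption: each voter picked exactly one employment type,
# so the total number of voters is the sum of these counts.
total_voters = sum(employment_counts.values())


def percent_of_voters(count: int, total: int = total_voters) -> float:
    """Share of voters as a percentage, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# A few algorithm counts taken from the top-10 table.
algorithm_counts = {
    "Decision Trees/Rules": 186,
    "Regression": 180,
    "Clustering": 163,
}

for name, count in {**algorithm_counts, **employment_counts}.items():
    print(f"{name}: {percent_of_voters(count)}%")
```

Under that assumption, the computed shares match the published figures to one decimal place.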

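We grouped Industry/Gov into one group and Academic researchers/Students into a second group,
and computed the "affinity" of an algorithm to Industry/Gov as

$$
\mathrm{affinity}(Alg) = \frac{N(Alg, Ind\_Gov) \,/\, N(Alg, Aca\_Stu)}{N(Ind\_Gov) \,/\, N(Aca\_Stu)}
$$

where N(Alg, G) is the number of voters in group G who used the algorithm and N(G) is the total number of voters in group G.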

Thus an algorithm with affinity 1.5 is used 50% more in Industry/Government than by Academic researchers or Students, and an algorithm with affinity 0.6 is used only 60% as much in Industry/Government.
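As a rough illustration of the affinity ratio, here is a minimal Python sketch. The group sizes come from the employment table (172 + 17 and 85 + 37). The per-algorithm counts in the usage lines are hypothetical, since the post does not publish the per-group split, and the function name `affinity` is an illustrative choice.

```python
import math


def affinity(alg_ind_gov: int, alg_aca_stu: int,
             n_ind_gov: int, n_aca_stu: int) -> float:
    """Ratio of Industry/Gov to Academic/Student users of an algorithm,
    normalized by the ratio of the two group sizes."""
    if alg_aca_stu == 0:
        # No academic users: reported as INF in the post.
        # If there are no users in either group, the ratio is undefined.
        return math.inf if alg_ind_gov > 0 else math.nan
    return (alg_ind_gov / alg_aca_stu) / (n_ind_gov / n_aca_stu)


# Group sizes from the employment table.
N_IND_GOV = 172 + 17  # Industry analyst/consultant + Government/Other
N_ACA_STU = 85 + 37   # Academic researcher + Student

# Hypothetical counts, for illustration only (not poll data).
print(affinity(93, 40, N_IND_GOV, N_ACA_STU))  # used more in industry
print(affinity(5, 0, N_IND_GOV, N_ACA_STU))    # no academic users
```

An algorithm used in exact proportion to the group sizes has affinity 1, which is why values above 1 read as "industrial" and values below 1 as "academic".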

The most "industrial" algorithms (with the highest Industry/Gov "affinity") are:

- Uplift modeling, INF (no academic users)
- Survival Analysis, 2.47
- Regression, 2.00

The most "academic" algorithms (with the lowest Industry/Gov "affinity") are:

- Genetic algorithms, 0.60
- Support Vector (SVM), 0.66
- Association Rules, 0.83

Here are the full results for the KDnuggets 2011 Poll:

**Which methods/algorithms did you use for data analysis in 2011?**

**Jia Xin**

The real 'top' algorithm is one that is 'garbage in, gold out'. I don't think that exists yet (let me just keep an open mind about our future and only look back).

**GregoryPS**

Garbage in still produces garbage out, most of the time!

**Dr Jochen L Leidner**

What about

- Naive Bayes
- HMMs
- CRFs
- TF-IDF retrieval?

**GregoryPS**

CRFs and TF-IDF can be used for Text Mining, which was used by only about a quarter of respondents. Bayesian algorithms were used a lot, and that includes Naive Bayes; see all details at www.kdnuggets.com/polls/2011/algorithms-analytics-data-mining.html

**JV**

Do you have any statistics on the usage % of spatial algorithms, especially GKD algorithms?

**GregoryPS**

I did not have a special category for spatial algorithms, but it is a great idea for next time.