# Top Algorithms and Methods Used by Data Scientists

The latest KDnuggets poll identifies the top algorithms actually used by Data Scientists and finds some surprises, including which algorithms are the most academic and which are the most industry-oriented.


- US/Canada, 40%
- Europe, 32%
- Asia, 18%
- Latin America, 5.0%
- Africa/Middle East, 3.4%
- Australia/NZ, 2.2%

$$
\text{IG affinity}(\text{Alg}) = \frac{N(\text{Alg}, \text{Ind\_Gov}) \,/\, N(\text{Alg}, \text{Aca\_Stu})}{N(\text{Ind\_Gov}) \,/\, N(\text{Aca\_Stu})} - 1
$$

Thus an algorithm with affinity 0 is used equally in Industry/Government and by Academic Researchers or Students. The higher the IG affinity, the more "industrial" the algorithm; the lower it is, the more "academic" the algorithm.
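As a rough illustration of the affinity formula above, here is a minimal Python sketch. The function name `ig_affinity`, its parameter names, and the counts in the example are assumptions made for illustration; they are not taken from the poll data.

```python
def ig_affinity(n_alg_ind_gov, n_alg_aca_stu, n_ind_gov, n_aca_stu):
    """Industry/Government affinity of one algorithm.

    n_alg_ind_gov: Industry/Government respondents who used the algorithm
    n_alg_aca_stu: Academic/Student respondents who used the algorithm
    n_ind_gov:     all Industry/Government respondents
    n_aca_stu:     all Academic/Student respondents
    """
    if min(n_alg_aca_stu, n_ind_gov, n_aca_stu) <= 0:
        raise ValueError("counts used as denominators must be positive")
    usage_ratio = n_alg_ind_gov / n_alg_aca_stu   # ratio among users of the algorithm
    population_ratio = n_ind_gov / n_aca_stu      # ratio among all respondents
    return usage_ratio / population_ratio - 1


# Hypothetical counts, NOT from the poll: users of the algorithm split 3:1,
# while respondents overall split 2:1, so the algorithm leans "industrial".
print(round(ig_affinity(60, 20, 500, 250), 2))   # 0.5

# Same 2:1 split among users and among all respondents gives affinity 0.
print(round(ig_affinity(100, 50, 500, 250), 2))  # 0.0
```

Subtracting 1 centers the score at 0, so positive values lean toward Industry/Government and negative values lean toward Academia/Students.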

The most "industrial" algorithms were:

- Uplift modeling, 2.01
- Anomaly Detection, 1.61
- Survival Analysis, 1.39
- Factor Analysis, 0.83
- Time series/Sequences, 0.69
- Association Rules, 0.5

The most "academic" algorithms were:

- Neural networks - regular, -0.35
- Naive Bayes, -0.35
- SVM, -0.24
- Deep Learning, -0.19
- EM, -0.17

**Fig. 3. KDnuggets Poll: Top Algorithms used by Data Scientists: Industry vs Academia**

The next table gives details on the algorithms: the % of respondents who used them in the 2016 and 2011 polls, the change (%2016 / %2011 - 1), and the Industry affinity as explained above.

**Table 3: KDnuggets 2016 Poll: Algorithms Used by Data Scientists**

The next table gives the details on the algorithms, with these columns:

- N: rank according to share of usage
- Algorithm: algorithm name
- Type: S - Supervised, U - Unsupervised, M - Meta, Z - Other
- 2016 % used: % of respondents who used this algorithm in the 2016 poll
- 2011 % used: % of respondents who used this algorithm in the 2011 poll
- % Change: change (%2016 / %2011 - 1)
- Industry Affinity: as explained above
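The change column defined above is a plain relative change, which can be sketched in Python as follows. The function name `relative_change` and the choice to return `None` when there is no 2011 figure are assumptions for illustration. The inputs are the rounded shares from the table, so the result can differ slightly from the published % Change values, which were presumably computed from unrounded shares.

```python
def relative_change(pct_2016, pct_2011):
    """Return %2016 / %2011 - 1, or None when no 2011 share is available."""
    if pct_2011 is None or pct_2011 == 0:
        return None  # no 2011 figure to compare against
    return pct_2016 / pct_2011 - 1


# Regression: 67% in 2016 vs 58% in 2011 (rounded shares)
print(f"{relative_change(67, 58):.1%}")   # 15.5%

# An algorithm with no 2011 share has no defined change
print(relative_change(46, None))          # None
```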

**Table 4: KDnuggets 2016 Poll: Algorithms Used by Data Scientists**

N | Algorithm | Type | 2016 % used | 2011 % used | % Change | Industry Affinity |
---|---|---|---|---|---|---|
1 | Regression | S | 67% | 58% | 16% | 0.21 |
2 | Clustering | U | 57% | 52% | 8.7% | 0.05 |
3 | Decision Trees/Rules | S | 55% | 60% | -7.3% | 0.21 |
4 | Visualization | Z | 49% | 38% | 27% | 0.44 |
5 | K-nearest neighbors | S | 46% | | | 0.32 |
6 | PCA | U | 43% | | | 0.02 |
7 | Statistics | Z | 43% | 48% | -11.0% | 1.39 |
8 | Random Forests | S | 38% | | | 0.22 |
9 | Time series/Sequence analysis | Z | 37% | 30% | 25.0% | 0.69 |
10 | Text Mining | Z | 36% | 28% | 29.8% | 0.01 |
11 | Ensemble methods | M | 34% | 28% | 18.9% | -0.17 |
12 | SVM | S | 34% | 29% | 17.6% | -0.24 |
13 | Boosting | M | 33% | 23% | 40% | 0.24 |
14 | Neural networks - regular | S | 24% | 27% | -10.5% | -0.35 |
15 | Optimization | Z | 24% | | | 0.07 |
16 | Naive Bayes | S | 24% | 22% | 8.9% | -0.02 |
17 | Bagging | M | 22% | 20% | 8.8% | 0.02 |
18 | Anomaly/Deviation detection | Z | 20% | 16% | 19% | 1.61 |
19 | Neural networks - Deep Learning | S | 19% | | | -0.35 |
20 | Singular Value Decomposition | U | 16% | | | 0.29 |
21 | Association rules | Z | 15% | 29% | -47% | 0.50 |
22 | Graph / Link / Social Network Analysis | Z | 15% | 14% | 8.0% | -0.08 |
23 | Factor Analysis | U | 14% | 19% | -23.8% | 0.14 |
24 | Bayesian networks | S | 13% | | | -0.10 |
25 | Genetic algorithms | Z | 8.8% | 9.3% | -6.0% | 0.83 |
26 | Survival Analysis | Z | 7.9% | 9.3% | -14.9% | -0.15 |
27 | EM | U | 6.6% | | | -0.19 |
28 | Other methods | Z | 4.6% | | | -0.06 |
29 | Uplift modeling | S | 3.1% | 4.8% | -36.1% | 2.01 |

**Related:**

- The 10 Algorithms Machine Learning Engineers Need to Know
- 10 Algorithm Categories for A.I., Big Data, and Data Science
- Why Implement Machine Learning Algorithms From Scratch?
