Top Algorithms and Methods Used by Data Scientists

The latest KDnuggets poll identifies the top algorithms actually used by Data Scientists and finds some surprises, including which algorithms are the most academic and which are the most industry-oriented.

Regional distribution of poll participants.
  • US/Canada, 40%
  • Europe, 32%
  • Asia, 18%
  • Latin America, 5.0%
  • Africa/Middle East, 3.4%
  • Australia/NZ, 2.2%
As in the 2011 poll, we combined Industry/Government into one group and Academic researchers/Students into a second group, and computed the "affinity" of each algorithm to Industry/Gov as
\[
\text{affinity}(\text{Alg}) = \frac{N(\text{Alg}, \text{Ind\_Gov}) \,/\, N(\text{Alg}, \text{Aca\_Stu})}{N(\text{Ind\_Gov}) \,/\, N(\text{Aca\_Stu})} - 1
\]

Thus an algorithm with affinity 0 is used equally by Industry/Government and by Academic researchers/Students. The higher the Industry/Government (IG) affinity, the more "industrial" the algorithm; the lower it is, the more "academic" the algorithm.
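The affinity formula above can be sketched in a few lines of Python. This is only an illustration: the function name `industry_affinity` and all respondent counts below are hypothetical and are not taken from the poll data.

```python
def industry_affinity(n_alg_ind_gov, n_alg_aca_stu, n_ind_gov, n_aca_stu):
    """Industry/Government affinity of one algorithm.

    n_alg_ind_gov: Industry/Gov respondents who use the algorithm
    n_alg_aca_stu: Academic/Student respondents who use the algorithm
    n_ind_gov:     all Industry/Gov respondents
    n_aca_stu:     all Academic/Student respondents
    """
    if n_alg_aca_stu == 0 or n_ind_gov == 0 or n_aca_stu == 0:
        raise ValueError("affinity is undefined when a denominator count is zero")
    usage_ratio = n_alg_ind_gov / n_alg_aca_stu   # Industry vs Academia among users
    baseline_ratio = n_ind_gov / n_aca_stu        # Industry vs Academia among all voters
    return usage_ratio / baseline_ratio - 1


# Hypothetical counts, chosen only to illustrate the three cases.
N_IND_GOV = 600
N_ACA_STU = 200

balanced = industry_affinity(120, 40, N_IND_GOV, N_ACA_STU)    # same mix as all voters
industrial = industry_affinity(90, 10, N_IND_GOV, N_ACA_STU)   # skewed toward industry
academic = industry_affinity(30, 20, N_IND_GOV, N_ACA_STU)     # skewed toward academia

print(balanced, industrial, academic)
```

With these made-up counts the first algorithm has affinity 0, the second a positive affinity, and the third a negative one, matching the interpretation given above.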

The most "industrial" algorithms were:
  • Uplift modeling, 2.01
  • Anomaly Detection, 1.61
  • Survival Analysis, 1.39
  • Factor Analysis, 0.83
  • Time series/Sequences, 0.69
  • Association Rules, 0.5
While uplift modeling was again the most "industrial" algorithm, the surprising finding is that it is used by so few respondents (only 3.1%, the lowest share of any algorithm in this poll).

The most "academic" algorithms were:
  • Neural networks - regular, -0.35
  • Naive Bayes, -0.35
  • SVM, -0.24
  • Deep Learning, -0.19
  • EM, -0.17
The next figure shows all the algorithms and their Industry/Academic affinity.

Fig. 3. KDnuggets Poll: Top Algorithms used by Data Scientists: Industry vs Academia

Table 3: KDnuggets 2016 Poll: Algorithms Used by Data Scientists
The next table has the details on the algorithms, with these columns:
  • N: rank by share of usage
  • Algorithm: algorithm name
  • Type: S - Supervised, U - Unsupervised, M - Meta, Z - Other
  • 2016 % used: % of respondents who used this algorithm in the 2016 poll
  • 2011 % used: % of respondents who used this algorithm in the 2011 poll
  • % Change: relative change in share (%2016 / %2011 - 1)
  • Industry Affinity: as explained above
Table 4: KDnuggets 2016 Poll: Algorithms Used by Data Scientists
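| N | Algorithm | Type | 2016 % used | 2011 % used | % Change | Industry Affinity |
|---|---|---|---|---|---|---|
| 3 | Decision Trees/Rules | S | 55% | 60% | -7.3% | 0.21 |
| 5 | K-nearest neighbors | S | 46% | | | 0.32 |
| 8 | Random Forests | S | 38% | | | 0.22 |
| 9 | Time series/Sequence analysis | Z | 37% | 30% | 25.0% | 0.69 |
| 10 | Text Mining | Z | 36% | 28% | 29.8% | 0.01 |
| 11 | Ensemble methods | M | 34% | 28% | 18.9% | -0.17 |
| 14 | Neural networks - regular | S | 24% | 27% | -10.5% | -0.35 |
| 16 | Naive Bayes | S | 24% | 22% | 8.9% | -0.02 |
| 18 | Anomaly/Deviation detection | Z | 20% | 16% | 19% | 1.61 |
| 19 | Neural networks - Deep Learning | S | 19% | | | -0.35 |
| 20 | Singular Value Decomposition | U | 16% | | | 0.29 |
| 21 | Association rules | Z | 15% | 29% | -47% | 0.50 |
| 22 | Graph / Link / Social Network Analysis | Z | 15% | 14% | 8.0% | -0.08 |
| 23 | Factor Analysis | U | 14% | 19% | -23.8% | 0.14 |
| 24 | Bayesian networks | S | 13% | | | -0.10 |
| 25 | Genetic algorithms | Z | 8.8% | 9.3% | -6.0% | 0.83 |
| 26 | Survival Analysis | Z | 7.9% | 9.3% | -14.9% | -0.15 |
| 28 | Other methods | Z | 4.6% | | | -0.06 |
| 29 | Uplift modeling | S | 3.1% | 4.8% | -36.1% | 2.01 |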
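
The change column, defined as %2016 / %2011 - 1, can be sketched as a small helper. The function name `relative_change` is hypothetical. The shares printed in the table are rounded, so recomputing the change from them gives only approximate values that need not match the published change column, which was presumably computed from unrounded shares.

```python
def relative_change(share_2016, share_2011):
    """Relative change in usage share, as a fraction (0.25 means +25%)."""
    if share_2011 == 0:
        raise ValueError("change is undefined when the 2011 share is zero")
    return share_2016 / share_2011 - 1


# Rounded shares for Decision Trees/Rules as printed in the table (55% and 60%).
trees_change = relative_change(55, 60)
print(f"Decision Trees/Rules: {trees_change:.1%}")
```

Algorithms with no 2011 share in the table have no change value, so the helper would simply not be called for them.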