Top Algorithms and Methods Used by Data Scientists
The latest KDnuggets poll identifies the top algorithms actually used by Data Scientists and finds some surprises, including the most academic and the most industry-oriented algorithms.
The latest KDnuggets Poll asked:
Which methods/algorithms you used in the past 12 months for an actual Data Science-related application?
Here are the results, based on 844 voters.
The top 10 algorithms (and methods) and their share of voters are:
Fig. 1: Top 10 algorithms & methods used by Data Scientists.
See the full table of all algorithms and methods at the end of the post.
(Note: The goal of the poll was to find the top tools used by Data Scientists, but the word "tools" is ambiguous, so for simplicity I originally called this table top 10 "algorithms". Of course, as many of you justifiably pointed out, Statistics or Visualization (and several other options) are not algorithms, but are better described as methods or approaches. I stand corrected and renamed this post to "Top 10 algorithms and methods".)
The average respondent used 8.1 algorithms/methods, a big increase compared with a similar poll in 2011.
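Since each voter could tick any number of options, a method's "share of voters" and the per-respondent average are two views of the same tallies. Below is a minimal Python sketch of how such multi-choice results can be tallied; the ballots are made up for illustration (not KDnuggets data), and the function names `method_shares` and `average_methods_used` are hypothetical.

```python
from collections import Counter

# Made-up ballots for illustration only (NOT KDnuggets poll data):
# each voter may tick any number of methods.
responses = [
    {"Regression", "Clustering", "Visualization"},
    {"Regression", "Decision Trees/Rules"},
    {"Clustering", "PCA", "Regression", "Deep Learning"},
    {"Visualization"},
]


def method_shares(responses):
    """Fraction of voters who ticked each method."""
    counts = Counter(method for ballot in responses for method in ballot)
    n_voters = len(responses)
    return {method: count / n_voters for method, count in counts.items()}


def average_methods_used(responses):
    """Mean number of methods ticked per voter."""
    return sum(len(ballot) for ballot in responses) / len(responses)


shares = method_shares(responses)
# Print methods from most to least popular (ties broken alphabetically).
for method, share in sorted(shares.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{method}: {share:.0%}")
print(f"Average methods per voter: {average_methods_used(responses):.2f}")
```

Every tick counts once toward some method's share and once toward the per-voter average, so the shares across all methods add up to the average count (here 2.5, i.e. 250%). This is why the shares in a multi-choice poll sum to far more than 100%.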
Comparing with the 2011 poll "Algorithms for data analysis / data mining", we note that the top methods are still Regression, Clustering, Decision Trees/Rules, and Visualization. The biggest relative increases, measured by (pct2016 / pct2011 - 1), are for:
- Boosting, up 40% to 32.8% share in 2016 from 23.5% share in 2011
- Text Mining, up 30% to 35.9% from 27.7%
- Visualization, up 27% to 48.7% from 38.3%
- Time series/Sequence analysis, up 25% to 37.0% from 29.6%
- Anomaly/Deviation detection, up 19% to 19.5% from 16.4%
- Ensemble methods, up 19% to 33.6% from 28.3%
- SVM, up 18% to 33.6% from 28.6%
- Regression, up 16% to 67.1% from 57.9%
The following methods are listed with their 2016 share only:
- K-nearest neighbors, 46% share
- PCA, 43%
- Random Forests, 38%
- Optimization, 24%
- Neural networks / Deep Learning, 19%
- Singular Value Decomposition, 16%
Usage declined for:
- Association rules, down 47% to 15.3% from 28.6%
- Uplift modeling, down 36% to 3.1% from 4.8% (a surprise, given the strong published results)
- Factor Analysis, down 24% to 14.2% from 18.6%
- Survival Analysis, down 15% to 7.9% from 9.3%
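The relative-change measure pct2016 / pct2011 - 1 used for these lists can be checked with a few lines of Python. This is a minimal sketch applied to some of the 2016/2011 shares quoted above; the function name `relative_change` is ours, not part of the poll.

```python
def relative_change(pct_2016: float, pct_2011: float) -> float:
    """Relative change pct2016 / pct2011 - 1 (0.40 means +40%)."""
    return pct_2016 / pct_2011 - 1


# (2016 share %, 2011 share %) as quoted in the lists above.
shares = {
    "Boosting": (32.8, 23.5),
    "Text Mining": (35.9, 27.7),
    "Regression": (67.1, 57.9),
    "Association rules": (15.3, 28.6),
}

for method, (p2016, p2011) in shares.items():
    # Show both views: relative change and absolute change in percentage points.
    print(f"{method}: {relative_change(p2016, p2011):+.0%} "
          f"({p2016 - p2011:+.1f} points)")
```

The relative measure favors methods with a small 2011 base: Boosting and Regression both gained roughly 9 percentage points, but Boosting's lower starting share turns that into a 40% relative increase versus 16% for Regression.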
Table 1: Algorithm usage by Employment Type
Employment Type | % Voters | Avg. Num. Algorithms Used | % Used Supervised | % Used Unsupervised | % Used Meta-algorithms | % Used Other Methods
Industry | 59% | 8.4 | 94% | 81% | 55% | 83%
Government/Nonprofit | 4.1% | 9.5 | 91% | 89% | 49% | 89%
Student | 16% | 8.1 | 94% | 76% | 47% | 77%
Academia | 12% | 7.2 | 95% | 81% | 44% | 77%
All | | 8.3 | 94% | 82% | 48% | 81%
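The "% Used" columns in Table 1 presumably count a voter once if they ticked at least one method in that category; the post does not define them, so treat this as an assumption. Below is a minimal Python sketch of that reading, grouped by employment type. The `CATEGORY` mapping, the helper names, and the voters are all hypothetical illustrations, not the poll's actual classification or data.

```python
from collections import defaultdict

# Hypothetical mapping of poll options to categories; the post does not
# publish the mapping it used, so this is an illustrative assumption.
CATEGORY = {
    "Regression": "supervised",
    "Decision Trees/Rules": "supervised",
    "SVM": "supervised",
    "Clustering": "unsupervised",
    "PCA": "unsupervised",
    "Boosting": "meta",
    "Ensemble methods": "meta",
    "Visualization": "other",
    "Statistics": "other",
}
CATEGORIES = sorted(set(CATEGORY.values()))


def category_usage(ballots):
    """Share of voters who ticked at least one method in each category."""
    n_voters = len(ballots)
    return {
        cat: sum(any(CATEGORY.get(m) == cat for m in ballot) for ballot in ballots)
        / n_voters
        for cat in CATEGORIES
    }


def usage_by_group(voters):
    """voters: (employment_type, set_of_methods) pairs -> per-group usage."""
    groups = defaultdict(list)
    for employment, ballot in voters:
        groups[employment].append(ballot)
    return {group: category_usage(ballots) for group, ballots in groups.items()}


# Made-up voters for illustration only.
voters = [
    ("Industry", {"Regression", "Boosting", "Visualization"}),
    ("Industry", {"Clustering", "Statistics"}),
    ("Academia", {"SVM", "PCA"}),
]
for group, usage in usage_by_group(voters).items():
    print(group, {cat: f"{share:.0%}" for cat, share in usage.items()})
```

Under this "at least one" reading, a voter counts toward a category as soon as they tick any one of its methods, which is how a category share such as 94% for supervised methods can exceed the share of any single method in it (Regression, the most popular overall, is at 67%).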
We note that almost everyone uses supervised learning algorithms (91% to 95% in every employment type).
Government and Industry Data Scientists used more different types of algorithms than students or academic researchers (9.5 and 8.4 on average, versus 8.1 and 7.2), and Industry Data Scientists were more likely to use meta-algorithms (55% versus 48% overall).
Next, we analyzed the usage of the top 10 algorithms plus Deep Learning by employment type.
Table 2: Top 10 Algorithms + Deep Learning usage by Employment Type
Algorithm | Industry | Government/Nonprofit | Academia | Student | All
Regression | 71% | 63% | 51% | 64% | 67%
Clustering | 58% | 63% | 51% | 58% | 57%
Decision Trees/Rules | 59% | 63% | 38% | 57% | 55%
Visualization | 55% | 71% | 28% | 47% | 49%
KNN | 46% | 54% | 48% | 47% | 46%
PCA | 43% | 57% | 48% | 40% | 43%
Statistics | 47% | 49% | 37% | 36% | 43%
Random Forests | 40% | 40% | 29% | 36% | 38%
Time series | 42% | 54% | 26% | 24% | 37%
Text Mining | 36% | 40% | 33% | 38% | 36%
Deep Learning | 18% | 9% | 24% | 19% | 19%
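To make the differences easier to see, we compute the algorithm bias for a particular employment type relative to overall algorithm usage as Bias(Alg, Type) = Usage(Alg, Type) / Usage(Alg, All) - 1. For example, Government/Nonprofit usage of Visualization is 71% against 49% overall, giving a bias of 71/49 - 1 ≈ +45%.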
Fig. 2: Algorithm usage bias by Employment.
We note that Industry Data Scientists are more likely to use Regression, Visualization, Statistics, Random Forests, and Time Series. Government/Nonprofit Data Scientists are more likely to use Visualization, PCA, and Time Series. Academic researchers are more likely to use PCA and Deep Learning. Students generally use fewer algorithms, but do more Text Mining and Deep Learning.
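The bias values behind Fig. 2 can be approximated from the rounded percentages in Table 2. Below is a minimal Python sketch of the Bias(Alg, Type) formula; the helper names `bias`, `bias_table`, and `top_biases` are ours, and because Table 2 is rounded to whole percents the results may differ slightly from the figure.

```python
# Usage shares (%) copied from Table 2. Column order in each tuple:
# Industry, Government/Nonprofit, Academia, Student, All.
TYPES = ("Industry", "Government/Nonprofit", "Academia", "Student")
USAGE = {
    "Regression": (71, 63, 51, 64, 67),
    "Clustering": (58, 63, 51, 58, 57),
    "Decision Trees/Rules": (59, 63, 38, 57, 55),
    "Visualization": (55, 71, 28, 47, 49),
    "KNN": (46, 54, 48, 47, 46),
    "PCA": (43, 57, 48, 40, 43),
    "Statistics": (47, 49, 37, 36, 43),
    "Random Forests": (40, 40, 29, 36, 38),
    "Time series": (42, 54, 26, 24, 37),
    "Text Mining": (36, 40, 33, 38, 36),
    "Deep Learning": (18, 9, 24, 19, 19),
}


def bias(usage_type, usage_all):
    """Bias(Alg, Type) = Usage(Alg, Type) / Usage(Alg, All) - 1."""
    return usage_type / usage_all - 1


def bias_table(usage):
    """Map algorithm -> {employment type -> bias}, with 'All' as the baseline."""
    return {
        alg: {t: bias(u, row[-1]) for t, u in zip(TYPES, row[:-1])}
        for alg, row in usage.items()
    }


def top_biases(table, employment_type, k=3):
    """The k algorithms with the highest bias for one employment type."""
    return sorted(table, key=lambda alg: table[alg][employment_type], reverse=True)[:k]


table = bias_table(USAGE)
for t in TYPES:
    print(t, [f"{alg} {table[alg][t]:+.0%}" for alg in top_biases(table, t)])
```

On Table 2, the three highest biases are Time series, Visualization, and Statistics for Industry, and Time series, Visualization, and PCA for Government/Nonprofit, while Deep Learning and PCA lead for Academia, in line with the observations above. With these rounded percentages, Student usage of Deep Learning (19%) equals the overall figure, so its Student bias computes to exactly zero; the unrounded data behind Fig. 2 may differ.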
Next, we look at regional participation, which was representative of overall KDnuggets visitors.