Top Data Science and Machine Learning Methods Used in 2017

The most used methods are Regression, Clustering, Visualization, Decision Trees/Rules, and Random Forests; Deep Learning is used by only 20% of respondents; we also analyze which methods are most "industrial" and most "academic".



Industry vs Academic Affinity

The next chart ranks all methods by their affinity to Industry vs Academia (students and researchers combined), computed as:
IndustryAffinity(Method) = Share(Method,Industry)/Share(Method,Academia) - 1
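The affinity formula above can be sketched in a few lines of Python. This is a minimal illustration, not the poll's actual computation: the function names, the respondent counts, and the share values are hypothetical. Pooling students and researchers by respondent count is also an assumption, since the text does not say how the two groups were combined.

```python
def industry_affinity(share_industry: float, share_academia: float) -> float:
    """IndustryAffinity = Share(Method, Industry) / Share(Method, Academia) - 1."""
    if share_academia <= 0:
        raise ValueError("academia share must be positive")
    return share_industry / share_academia - 1.0


def combined_share(share_a: float, n_a: int, share_b: float, n_b: int) -> float:
    """Share of a method in two groups pooled together, weighted by group size.

    Assumption: academia is pooled by respondent count.
    """
    if n_a + n_b <= 0:
        raise ValueError("group sizes must sum to a positive number")
    return (share_a * n_a + share_b * n_b) / (n_a + n_b)


# Hypothetical inputs, for illustration only (not poll data).
n_students, n_researchers = 100, 100
academia = combined_share(0.30, n_students, 0.50, n_researchers)  # about 0.40
affinity = industry_affinity(0.60, academia)                      # about 0.5
print(f"academia share: {academia:.3f}, industry affinity: {affinity:.3f}")
```

A positive value means the method is used relatively more in industry, a negative value means it is used relatively more in academia, and 0 means equal shares.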


The most industry-oriented methods are
  • Uplift modeling (for the second year in a row)
  • Anomaly/Deviation detection
  • Gradient Boosted Machines
The most "academic" methods are advanced topics related to Deep Learning:
  • Generative Adversarial Networks (GAN)
  • Reinforcement Learning
  • Recurrent Neural Networks (RNN)
  • Convolutional Nets
Fig. 4: Data Science Methods and their Industry/Academia affinity

Bar width corresponds to share of usage. Color corresponds to Industry vs Academia affinity.

Finally, regional participation was:
  • Europe, 39%
  • US/Canada, 33%
  • Asia, 14%
  • Latin America, 6.0%
  • Australia/NZ, 4.8%
  • Africa/Middle East, 3.8%
The following table shows the data for all methods, sorted by overall share of usage.

The columns are:
  • Method: Data Science method
  • %Change 2017 vs 2016: how much the share of usage changed vs 2016 Poll
  • %Usage All: % of respondents who used this method
  • %Usage Industry: % of Industry respondents who used this method
  • %Usage Student: % of Student respondents who used this method
  • %Usage Researcher: % of Researcher respondents who used this method
Table 1: Data Science Methods usage
| Method | % Change 2017 vs 2016 | % Usage All | % Usage Industry | % Usage Student | % Usage Researcher |
|---|---|---|---|---|---|
| Regression | -2.4% | 60.4% | 66.4% | 46.4% | 51.9% |
| Clustering | 5.2% | 55.5% | 60.0% | 43.6% | 56.8% |
| Visualization | 9.2% | 51.0% | 57.4% | 38.2% | 38.3% |
| Decision Trees/Rules | -1.2% | 50.8% | 57.0% | 28.2% | 50.6% |
| Random Forests | 31.7% | 46.2% | 53.1% | 23.6% | 39.5% |
| Statistics - Descriptive | 2.0% | 41.0% | 46.3% | 27.3% | 40.7% |
| K-nn | -9.7% | 38.9% | 40.6% | 37.3% | 34.6% |
| PCA | -14% | 34.7% | 36.5% | 35.5% | 33.3% |
| Text Mining | -4.7% | 31.8% | 36.5% | 21.8% | 22.2% |
| Time series | -11.3% | 30.5% | 35.6% | 19.1% | 34.6% |
| Ensemble methods | -5.8% | 29.9% | 31.7% | 11.8% | 38.3% |
| Support Vector Machine (SVM) | -8.6% | 28.7% | 27.3% | 22.7% | 44.4% |
| Boosting | -20% | 24.6% | 24.5% | 19.1% | 24.7% |
| Deep Learning | 20.1% | 20.6% | 19.4% | 20.0% | 25.9% |
| Gradient Boosted Machines | new | 20.4% | 23.6% | 9.1% | 13.6% |
| Neural networks - not DL | -8.9% | 20.1% | 19.7% | 24.5% | 16.0% |
| Bagging | -2.7% | 19.9% | 21.0% | 12.7% | 19.8% |
| Anomaly/Deviation detection | 5.7% | 19.5% | 24.2% | 6.4% | 14.8% |
| Bayesian | 49.1% | 17.5% | 16.8% | 11.8% | 23.5% |
| Optimization | -26% | 17.2% | 17.2% | 14.5% | 23.5% |
| Conv Nets | new | 15.8% | 14.0% | 18.2% | 19.8% |
| Association rules | 7.7% | 15.4% | 17.0% | 12.7% | 14.8% |
| Factor Analysis | -6.5% | 11.7% | 12.2% | 9.1% | 13.6% |
| Recurrent Neural Networks (RNN) | new | 10.5% | 9.2% | 14.5% | 9.9% |
| Survival Analysis | 13.5% | 8.5% | 10.3% | 3.6% | 9.9% |
| Graph / Link / Social Network Analysis | -42% | 8.1% | 7.6% | 6.4% | 13.6% |
| Singular Value Decomposition (SVD) | -48% | 8.1% | 7.4% | 10.0% | 8.6% |
| Other methods | 40% | 6.1% | 7.4% | 1.8% | 7.4% |
| Genetic algorithms/Evolutionary methods | -42% | 4.8% | 5.2% | 2.7% | 7.4% |
| Hidden Markov Models (HMM) | new | 4.6% | 4.6% | 3.6% | 6.2% |
| Reinforcement Learning | new | 4.2% | 3.5% | 2.7% | 8.6% |
| EM | -36% | 4.1% | 4.4% | 3.6% | 6.2% |
| Uplift modeling | 0.3% | 2.7% | 3.5% | 1.8% | 0.0% |
| Markov Logic Networks | new | 2.5% | 2.6% | 1.8% | 2.5% |
| Generative Adversarial Networks (GAN) | new | 2.3% | 1.5% | 2.7% | 4.9% |

