Top Data Science and Machine Learning Methods Used in 2017


The most used methods are Regression, Clustering, Visualization, Decision Trees/Rules, and Random Forests; Deep Learning is used by only 20% of respondents; we also analyze which methods are most "industrial" and most "academic".



Industry vs Academic Affinity

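The next chart ranks all methods by their affinity to Industry vs Academia (where Academia is students and researchers combined), computed as
IndustryAffinity(Method) = Share(Method,Industry)/Share(Method,Academia) - 1
A positive value means the method is used relatively more in Industry; a negative value means it is used relatively more in Academia.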
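As a rough illustration, the affinity formula above can be sketched in Python. The function names are hypothetical, and the article does not say how many students and researchers responded, so the group sizes used to pool the two academic groups below are placeholder assumptions, not poll data.

```python
def pooled_share(share_students, n_students, share_researchers, n_researchers):
    """Pool student and researcher usage shares into one Academia share,
    weighting each group by its number of respondents."""
    total = n_students + n_researchers
    return (share_students * n_students + share_researchers * n_researchers) / total


def industry_affinity(share_industry, share_academia):
    """IndustryAffinity = Share(Industry) / Share(Academia) - 1."""
    if share_academia <= 0:
        raise ValueError("Academia share must be positive")
    return share_industry / share_academia - 1


# GAN row of Table 1: Industry 1.5%, Student 2.7%, Researcher 4.9%.
# ASSUMPTION: equal group sizes (100 each); the real counts are not given.
academia = pooled_share(2.7, 100, 4.9, 100)
gan_affinity = industry_affinity(1.5, academia)
print(round(gan_affinity, 2))  # -0.61 under the equal-size assumption
```

With real respondent counts the pooled Academia share, and therefore the affinity, would shift somewhat, but the sign for a method like GAN would still come out negative because both academic groups report higher usage than Industry.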


The most "industrial" methods, those used far more in Industry than in Academia, are:
  • Uplift modeling (for the second year in a row)
  • Anomaly/Deviation detection
  • Gradient Boosted Machines
The most "academic" methods are advanced topics related to Deep Learning:
  • Generative Adversarial Networks (GAN)
  • Reinforcement Learning
  • Recurrent Neural Networks (RNN)
  • Convolutional Nets
Fig. 4: Data Science Methods and their Industry/Academia affinity

Bar width corresponds to share of usage. Color corresponds to Industry vs Academia affinity.

Finally, regional participation was:
  • Europe, 39%
  • US/Canada, 33%
  • Asia, 14%
  • Latin America, 6.0%
  • Australia/NZ, 4.8%
  • Africa/Middle East, 3.8%
The following table shows the data for all methods, sorted by overall share of usage.

The columns are:
  • Method: Data Science method
  • %Change 2017 vs 2016: how much the share of usage changed vs 2016 Poll
  • %Usage All: % of respondents who used this method
  • %Usage Industry: % of Industry respondents who used this method
  • %Usage Student: % of Student respondents who used this method
  • %Usage Researcher: % of Researcher respondents who used this method
Table 1: Data Science Methods usage
| Method | %Change 2017 vs 2016 | % Usage All | % Usage Industry | % Usage Student | % Usage Researcher |
|---|---|---|---|---|---|
| Regression | -2.4% | 60.4% | 66.4% | 46.4% | 51.9% |
| Clustering | 5.2% | 55.5% | 60.0% | 43.6% | 56.8% |
| Visualization | 9.2% | 51.0% | 57.4% | 38.2% | 38.3% |
| Decision Trees/Rules | -1.2% | 50.8% | 57.0% | 28.2% | 50.6% |
| Random Forests | 31.7% | 46.2% | 53.1% | 23.6% | 39.5% |
| Statistics - Descriptive | 2.0% | 41.0% | 46.3% | 27.3% | 40.7% |
| K-nn | -9.7% | 38.9% | 40.6% | 37.3% | 34.6% |
| PCA | -14% | 34.7% | 36.5% | 35.5% | 33.3% |
| Text Mining | -4.7% | 31.8% | 36.5% | 21.8% | 22.2% |
| Time series | -11.3% | 30.5% | 35.6% | 19.1% | 34.6% |
| Ensemble methods | -5.8% | 29.9% | 31.7% | 11.8% | 38.3% |
| Support Vector Machine (SVM) | -8.6% | 28.7% | 27.3% | 22.7% | 44.4% |
| Boosting | -20% | 24.6% | 24.5% | 19.1% | 24.7% |
| Deep Learning | 20.1% | 20.6% | 19.4% | 20.0% | 25.9% |
| Gradient Boosted Machines | new | 20.4% | 23.6% | 9.1% | 13.6% |
| Neural networks - not DL | -8.9% | 20.1% | 19.7% | 24.5% | 16.0% |
| Bagging | -2.7% | 19.9% | 21.0% | 12.7% | 19.8% |
| Anomaly/Deviation detection | 5.7% | 19.5% | 24.2% | 6.4% | 14.8% |
| Bayesian | 49.1% | 17.5% | 16.8% | 11.8% | 23.5% |
| Optimization | -26% | 17.2% | 17.2% | 14.5% | 23.5% |
| Conv Nets | new | 15.8% | 14.0% | 18.2% | 19.8% |
| Association rules | 7.7% | 15.4% | 17.0% | 12.7% | 14.8% |
| Factor Analysis | -6.5% | 11.7% | 12.2% | 9.1% | 13.6% |
| Recurrent Neural Networks (RNN) | new | 10.5% | 9.2% | 14.5% | 9.9% |
| Survival Analysis | 13.5% | 8.5% | 10.3% | 3.6% | 9.9% |
| Graph / Link / Social Network Analysis | -42% | 8.1% | 7.6% | 6.4% | 13.6% |
| Singular Value Decomposition (SVD) | -48% | 8.1% | 7.4% | 10.0% | 8.6% |
| Other methods | 40% | 6.1% | 7.4% | 1.8% | 7.4% |
| Genetic algorithms/Evolutionary methods | -42% | 4.8% | 5.2% | 2.7% | 7.4% |
| Hidden Markov Models (HMM) | new | 4.6% | 4.6% | 3.6% | 6.2% |
| Reinforcement Learning | new | 4.2% | 3.5% | 2.7% | 8.6% |
| EM | -36% | 4.1% | 4.4% | 3.6% | 6.2% |
| Uplift modeling | 0.3% | 2.7% | 3.5% | 1.8% | 0.0% |
| Markov Logic Networks | new | 2.5% | 2.6% | 1.8% | 2.5% |
| Generative Adversarial Networks (GAN) | new | 2.3% | 1.5% | 2.7% | 4.9% |
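
The %Change 2017 vs 2016 column is not defined precisely in this section. Assuming it is a relative change in share (2017 share divided by 2016 share, minus 1), which is an assumption rather than something stated here, the 2016 share implied by a row can be backed out as in the sketch below. The function names are hypothetical.

```python
def relative_change_pct(share_now, share_prior):
    """Relative change in share, in percent (ASSUMED definition of %Change)."""
    return (share_now / share_prior - 1) * 100


def implied_prior_share(share_now, pct_change):
    """Invert the assumed definition to recover the earlier share."""
    return share_now / (1 + pct_change / 100)


# Random Forests row: 46.2% usage in 2017, %Change of 31.7%.
rf_2016 = implied_prior_share(46.2, 31.7)
print(round(rf_2016, 1))  # 35.1, i.e. about 35% usage in 2016 if the assumption holds
```

Rows marked "new" have no 2016 share to compare against, so this calculation does not apply to them.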

