Top Data Science and Machine Learning Methods Used in 2017
The most used methods are Regression, Clustering, Visualization, Decision Trees/Rules, and Random Forests. Deep Learning is used by only about 20% of respondents. We also analyze which methods are most "industrial" and most "academic".
Industry vs Academic Affinity
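The next chart ranks all methods by their affinity to Industry vs Academia (defined as students + researchers combined), computed as
IndustryAffinity(Method) = Share(Method, Industry) / Share(Method, Academia) - 1
A positive value means the method is relatively more common among Industry respondents; a negative value means it is relatively more common in Academia.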
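As a rough illustration of the affinity formula above, the following Python sketch computes the value for one method. The function names `pooled_share` and `industry_affinity` are hypothetical, and the group sizes used to combine students and researchers are an assumption made only for illustration, since the poll's actual respondent counts are not given here.

```python
import math


def pooled_share(share_a: float, n_a: int, share_b: float, n_b: int) -> float:
    """Combine two group shares (in %) into one share, weighted by group size."""
    return (share_a * n_a + share_b * n_b) / (n_a + n_b)


def industry_affinity(share_industry: float, share_academia: float) -> float:
    """IndustryAffinity = Share(Industry) / Share(Academia) - 1."""
    if share_academia == 0:
        # No academic usage at all: the ratio is unbounded.
        return math.inf
    return share_industry / share_academia - 1


# Gradient Boosted Machines, using the shares reported in the results table:
# 23.6% of Industry, 9.1% of Student, 13.6% of Researcher respondents.
# ASSUMPTION: equal group sizes (100 each), chosen only for illustration.
academia = pooled_share(9.1, 100, 13.6, 100)
gbm_affinity = industry_affinity(23.6, academia)
print(f"academia share: {academia:.2f}%, affinity: {gbm_affinity:+.2f}")
```

With real respondent counts the pooled Academia share, and therefore the affinity, would differ somewhat. A method with zero academic usage has no finite affinity under this formula; the sketch returns infinity in that case, which is a choice of this example rather than something the poll specifies.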
The most "industry" methods are
- Uplift modeling (for the second year in a row)
- Anomaly/Deviation detection
- Gradient Boosted Machines
The most "academic" methods are
- Generative Adversarial Networks (GAN)
- Reinforcement Learning
- Recurrent Neural Networks (RNN)
- Convolutional Nets
Fig. 4: Data Science Methods and their Industry/Academia affinity
Bar width corresponds to share of usage. Color corresponds to Industry vs Academia affinity.
Finally, regional participation was:
- Europe, 39%
- US/Canada, 33%
- Asia, 14%
- Latin America, 6.0%
- Australia/NZ, 4.8%
- Africa/Middle East, 3.8%
The table below gives the detailed results. The columns are:
- Method: Data Science method
- %Change 2017 vs 2016: how much the share of usage changed vs the 2016 Poll ("new" marks methods not included in 2016)
- % Usage All: % of all respondents who used this method
- % Usage Industry: % of Industry respondents who used this method
- % Usage Student: % of Student respondents who used this method
- % Usage Researcher: % of Researcher respondents who used this method
Method | %Change 2017 vs 2016 | % Usage All | % Usage Industry | % Usage Student | % Usage Researcher |
---|---|---|---|---|---|
Regression | -2.4% | 60.4% | 66.4% | 46.4% | 51.9% |
Clustering | 5.2% | 55.5% | 60.0% | 43.6% | 56.8% |
Visualization | 9.2% | 51.0% | 57.4% | 38.2% | 38.3% |
Decision Trees/Rules | -1.2% | 50.8% | 57.0% | 28.2% | 50.6% |
Random Forests | 31.7% | 46.2% | 53.1% | 23.6% | 39.5% |
Statistics - Descriptive | 2.0% | 41.0% | 46.3% | 27.3% | 40.7% |
K-nn | -9.7% | 38.9% | 40.6% | 37.3% | 34.6% |
PCA | -14% | 34.7% | 36.5% | 35.5% | 33.3% |
Text Mining | -4.7% | 31.8% | 36.5% | 21.8% | 22.2% |
Time series | -11.3% | 30.5% | 35.6% | 19.1% | 34.6% |
Ensemble methods | -5.8% | 29.9% | 31.7% | 11.8% | 38.3% |
Support Vector Machine (SVM) | -8.6% | 28.7% | 27.3% | 22.7% | 44.4% |
Boosting | -20% | 24.6% | 24.5% | 19.1% | 24.7% |
Deep Learning | 20.1% | 20.6% | 19.4% | 20.0% | 25.9% |
Gradient Boosted Machines | new | 20.4% | 23.6% | 9.1% | 13.6% |
Neural networks - not DL | -8.9% | 20.1% | 19.7% | 24.5% | 16.0% |
Bagging | -2.7% | 19.9% | 21.0% | 12.7% | 19.8% |
Anomaly/Deviation detection | 5.7% | 19.5% | 24.2% | 6.4% | 14.8% |
Bayesian | 49.1% | 17.5% | 16.8% | 11.8% | 23.5% |
Optimization | -26% | 17.2% | 17.2% | 14.5% | 23.5% |
Conv Nets | new | 15.8% | 14.0% | 18.2% | 19.8% |
Association rules | 7.7% | 15.4% | 17.0% | 12.7% | 14.8% |
Factor Analysis | -6.5% | 11.7% | 12.2% | 9.1% | 13.6% |
Recurrent Neural Networks (RNN) | new | 10.5% | 9.2% | 14.5% | 9.9% |
Survival Analysis | 13.5% | 8.5% | 10.3% | 3.6% | 9.9% |
Graph / Link / Social Network Analysis | -42% | 8.1% | 7.6% | 6.4% | 13.6% |
Singular Value Decomposition (SVD) | -48% | 8.1% | 7.4% | 10.0% | 8.6% |
Other methods | 40% | 6.1% | 7.4% | 1.8% | 7.4% |
Genetic algorithms/Evolutionary methods | -42% | 4.8% | 5.2% | 2.7% | 7.4% |
Hidden Markov Models (HMM) | new | 4.6% | 4.6% | 3.6% | 6.2% |
Reinforcement Learning | new | 4.2% | 3.5% | 2.7% | 8.6% |
EM | -36% | 4.1% | 4.4% | 3.6% | 6.2% |
Uplift modeling | 0.3% | 2.7% | 3.5% | 1.8% | 0.0% |
Markov Logic Networks | new | 2.5% | 2.6% | 1.8% | 2.5% |
Generative Adversarial Networks (GAN) | new | 2.3% | 1.5% | 2.7% | 4.9% |
Related:
- Top Algorithms and Methods Used by Data Scientists, 2016 KDnuggets Poll
- Python overtakes R, becomes the leader in Data Science, Machine Learning platforms
- New Leader, Trends, and Surprises in Analytics, Data Science, Machine Learning Software Poll