Gold BlogTop Data Science and Machine Learning Methods Used in 2018, 2019

Once again, the most used methods are Regression, Clustering, Visualization, Decision Trees/Rules, and Random Forests. The greatest relative increases this year are overwhelmingly Deep Learning techniques, while SVD, SVMs and Association Rules show the greatest decline.



In the latest KDnuggets poll, readers were asked:

Which Data Science / Machine Learning methods and algorithms did you use in 2018/2019 for a real-world application?

The results, based on 833 voters, show that the top 17 methods are the same as in the 2017 poll, although in slightly different order:

Figure

Fig. 1: Top Data Science, Machine Learning Methods Used, 2018/2019

 

The top 17 methods by respondent usage this year (spot #16 was a tie, by percentage) are:

  1. Regression
  2. Decision Trees / Rules
  3. Clustering
  4. Visualization
  5. Random Forests
  6. Statistics - Descriptive
  7. K-Nearest Neighbours
  8. Time Series
  9. Ensemble Methods
  10. Text Mining
  11. PCA
  12. Boosting
  13. Neural Networks - Deep Learning
  14. Gradient Boosted Machines
  15. Anomaly / Deviation Detection
  16. Neural Networks - Convolutional Neural Networks (CNNs)
  17. Support Vector Machine (SVM)

This, in turn, mirrors the results of the 2017 poll, which found that the top 10 methods remained unchanged from the 2016 poll (although, again, they were in a different order).

The average respondent used 7.4 methods/algorithms, which is in-line with both the 2017 and 2016 results.

Below is a comparison of the top methods and algorithms in this year's poll with their 2017 shares.

Figure

Fig. 2: Top Data Science, Machine Learning Methods Used, 2018/9 vs 2017

 

The most notable increases this year were found in the usage of various neural network technologies, including GANs, RNNs, CNNs, reinforcement learning, and vanilla deep neural networks. Genetic and evolutionary algorithms have also found an increase, along with the catch-all "Other Methods."

The methods and algorithms with the greatest relative increases measured by (share2019 / share2017 - 1) over 2017 are:

  • Generative Adversarial Networks (GAN), 101.8% up, from 2.3% share in 2017 to 4.7% share in 2018/9
  • Neural Networks - Recurrent Neural Networks (RNN), 56.5% up, from 10.5% to 16.5%
  • Reinforcement Learning, 56.1% up, from 4.2% to 6.6%
  • Neural Nets - Convolution, 38.8% up, from 15.8% to 22%
  • Other Methods, 27.1% up, from 6.1% to 7.8%
  • Genetic / Evolutionary Algorithms & Methods, 25.7% up, from 4.8% to 6.0%
  • Neural Networks - Deep Learning, 23.5% up, from 20.6% to 25%

The greatest declines this year were:

  • SVD, 25.4% down, from 8.1% share in 2017 to 6.0% share in 2018/9
  • Support Vector Machine (SVM), 23.3% down, from 28.7% to 22%
  • Factor Analysis, 21.2% down, from 11.7% to 9.3%
  • Association Rules, 20.6% down, from 15.4% to 12.3%
  • PCA, 19.3% down, from 34.7% to 28%
  • Survival Analysis, 19.1% down, from 8.5% to 6.9%

 

Affiliation

 
Participation by affiliation was:

  • Industry / Self employed, 64.9% (541 responses), 7.8 avg. tools used
  • Student, 15.8% (132 responses), 6.1 avg. tools used
  • Academia, 9.2% (77 responses), 7.9 avg. tools used
  • Government / Non-profit, 7.3% (61 responses), 5.8 avg. tools used

Note that only 22 respondents (<3%) selected "Other", 7.5 avg. tools used.
 

Regional Participation

 
Finally, regional participation was:

Europe, 35.4%
US/Canada, 30.1%
Asia, 18.8%
Latin America, 6.1%
Australia/NZ, 2%
Africa/Middle East, 4.2%
Other, 3.2%

Figure

Fig. 3: Regional Participation, 2018/2019

 

Thanks to everyone who participated in the poll, and for making this compilation and analysis of trends possible.

Here are the results of past polls: