
Top Data Science and Machine Learning Methods Used in 2017


The most used methods are Regression, Clustering, Visualization, Decision Trees/Rules, and Random Forests; Deep Learning is used by only 20% of respondents; we also analyze which methods are most "industrial" and most "academic".



The latest KDnuggets Poll asked:
Which Data Science / Machine Learning methods and tools you used in the past 12 months for a real-world application?


The results, based on 732 voters, show that the top 10 methods are the same as in the 2016 poll, although in a slightly different order:

Fig. 1: Top 10 Data Science, Machine Learning Methods Used, 2017


The average respondent used 7.7 tools/methods, similar to the 2016 poll.

Next, we compared the top 16 methods in this year's poll with their share last year - see Fig. 2.

Fig. 2: Top 16 Data Science, Machine Learning Methods Used, 2017 vs 2016


We note a significant increase in the usage share of Random Forests, Visualization, and Deep Learning, and a decline in K-nn, PCA, and Boosting. Gradient Boosting Machines was a new entry in 2017.

Deep Learning, despite its amazing successes, is reported as used by only about 20% of KDnuggets readers.

The biggest relative increases, measured by (share2017 / share2016 - 1), are for
  • Bayesian methods, 49% up, from 11.7% share in 2016 to 17.5% share in 2017
  • Random Forests, 32% up, from 35.1% to 46.2%
  • Deep Learning, 20% up, from 17.2% to 20.6%
  • Survival Analysis, 13.5% up, from 7.5% to 8.5%
  • Visualization, 9% up, from 46.7% to 51.0%
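
The relative-change measure used above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `relative_change` and the `shares` dictionary are illustrative, not part of the poll, and the inputs are the rounded shares quoted in the list. Because the published percentages were presumably computed from unrounded vote counts, the results may differ from them by a few tenths of a point.

```python
# Shares (% of respondents) as quoted in the list above: (2016, 2017).
shares = {
    "Bayesian methods": (11.7, 17.5),
    "Random Forests": (35.1, 46.2),
    "Deep Learning": (17.2, 20.6),
    "Survival Analysis": (7.5, 8.5),
    "Visualization": (46.7, 51.0),
}


def relative_change(share_2016, share_2017):
    """Return share2017 / share2016 - 1 as a fraction (0.2 means 20% up)."""
    return share_2017 / share_2016 - 1


# Compute the change for each method and list the largest increases first.
changes = {method: relative_change(s16, s17) for method, (s16, s17) in shares.items()}
for method, change in sorted(changes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{method}: {change:+.1%}")
```

A relative measure is used rather than the difference in percentage points so that methods with small shares, such as Survival Analysis, can be compared with widely used ones, such as Visualization.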
We also added new methods and here is their share in 2017:
  • Gradient Boosted Machines, 20.4%
  • Conv Nets, 15.8%
  • Recurrent Neural Networks (RNN), 10.5%
  • Hidden Markov Models (HMM), 4.6%
  • Reinforcement Learning, 4.2%
  • Markov Logic Networks, 2.5%
  • Generative Adversarial Networks (GAN), 2.3%
The largest decline in share of usage was for
  • Singular Value Decomposition (SVD), 48% down, from 15.4% share in 2016 to 8.1% share in 2017
  • Graph / Link / Social Network Analysis, 42% down, from 14.0% to 8.1%
  • Genetic algorithms/Evolutionary methods, 42% down, from 8.3% to 4.8%
  • EM, 36% down, from 6.4% to 4.1%
  • Optimization, 26% down, from 23.2% to 17.2%
  • Boosting, 20% down, from 30.6% to 24.6%
  • PCA, 14% down, from 40.5% to 34.7%

Affiliation

Participation by affiliation was
  • Industry/Self-Employed, 63%, 8.3 avg. tools used
  • Student, 15%, 5.7 avg. tools used
  • Researcher/Academia, 11%, 7.8 avg. tools used
  • Other, 11%, 7.1 avg. tools used
Note: Only about 35 voters selected Government/Non-profit affiliation - too small a sample to analyze separately, so we merged them with the affiliation "other".
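
As a rough consistency check, the per-affiliation averages can be weighted by each group's share of voters to see whether they agree with the overall average of 7.7 tools/methods reported earlier. The sketch below uses only the rounded figures from the list above; the variable names are illustrative.

```python
# (share of voters, average number of tools used), as quoted in the list above.
groups = {
    "Industry/Self-Employed": (0.63, 8.3),
    "Student": (0.15, 5.7),
    "Researcher/Academia": (0.11, 7.8),
    "Other": (0.11, 7.1),
}

# Normalize by the total share in case the rounded shares do not sum to exactly 1.
total_share = sum(share for share, _ in groups.values())
overall_avg = sum(share * avg for share, avg in groups.values()) / total_share

print(f"Weighted average tools used: {overall_avg:.1f}")
```

With these rounded inputs the weighted average comes out at about 7.7, matching the overall figure.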

Here are the top 16 methods and their bias by affiliation, computed as
Bias(Method,Affiliation) = Share(Method,Affiliation)/Share(Method) - 1


If Bias is positive, the method is used more by this group than average. If it is negative, the method is used less by this group than average.

For example, support vector machines (SVM) are used by 28.7% of all respondents, but by 44.4% of Researchers, so Bias(SVM,Researcher)=44.4%/28.7% - 1 = 54.9%.
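
The Bias formula can be sketched in Python as follows. The function name `bias` is illustrative, and the inputs are the rounded SVM shares from the example above. With those rounded inputs the result is about 54.7%; the small gap from the 54.9% quoted in the text is presumably because the published figure was computed from unrounded shares.

```python
def bias(share_in_group, share_overall):
    """Bias(Method, Affiliation) = Share(Method, Affiliation) / Share(Method) - 1."""
    return share_in_group / share_overall - 1


# SVM: used by 44.4% of Researchers versus 28.7% of all respondents.
svm_bias = bias(44.4, 28.7)
print(f"Bias(SVM, Researcher) = {svm_bias:.1%}")

# A group that uses a method at exactly the overall rate has zero bias,
# and a group that uses it at half the overall rate has a bias of -50%.
print(f"{bias(28.7, 28.7):.1%}")
print(f"{bias(14.35, 28.7):.1%}")
```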

Fig. 3: Top 16 Data Science Methods and their bias by Affiliation


Next, we examine all methods and their affinity to Industry vs Academia.