Ensemble Learning to Improve Machine Learning Results

Ensemble methods are meta-algorithms that combine several machine learning techniques into one predictive model in order to decrease variance (bagging), decrease bias (boosting), or improve predictions (stacking).



Stacking

Stacking is an ensemble learning technique that combines multiple classification or regression models via a meta-classifier or a meta-regressor. The base-level models are trained on the complete training set, and the meta-model is then trained on the outputs of the base-level models as features.

The base level typically consists of different learning algorithms, so stacking ensembles are often heterogeneous. The algorithm below summarizes stacking.
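As a minimal sketch of this procedure (an illustrative two-model setup on scikit-learn's iris data, not the article's original pseudocode or notebook code), the base models are fit on the training set and their predictions then serve as features for the meta-model:

  import numpy as np
  from sklearn.datasets import load_iris
  from sklearn.model_selection import train_test_split
  from sklearn.neighbors import KNeighborsClassifier
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.linear_model import LogisticRegression

  X, y = load_iris(return_X_y=True)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

  # Level 0: fit each base model on the complete training set.
  base_models = [KNeighborsClassifier(n_neighbors=3),
                 RandomForestClassifier(n_estimators=100, random_state=42)]
  for model in base_models:
      model.fit(X_train, y_train)

  # Level 1: the base models' predictions become features for the meta-model.
  meta_features = np.column_stack([m.predict(X_train) for m in base_models])
  meta_model = LogisticRegression().fit(meta_features, y_train)

  # Prediction: run the base models first, then the meta-model on their outputs.
  meta_features_test = np.column_stack([m.predict(X_test) for m in base_models])
  print("stacking accuracy:", meta_model.score(meta_features_test, y_test))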

The following accuracies are visualized in the top-right plot of the figure above:

  Accuracy: 0.91 (+/- 0.01) [KNN]
  Accuracy: 0.91 (+/- 0.06) [Random Forest]
  Accuracy: 0.92 (+/- 0.03) [Naive Bayes]
  Accuracy: 0.95 (+/- 0.03) [Stacking Classifier]

The stacking ensemble is illustrated in the figure above. It consists of k-NN, Random Forest, and Naive Bayes base classifiers whose predictions are combined by a Logistic Regression meta-classifier. We can see the blending of decision boundaries achieved by the stacking classifier. The figure also shows that stacking achieves higher accuracy than the individual classifiers and, based on the learning curves, shows no signs of overfitting.
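A sketch of how such an ensemble can be assembled and evaluated, here using mlxtend's StackingClassifier on the iris dataset (an assumed setup; the article's own notebook may differ in data and parameters, so the exact scores will vary):

  from sklearn.datasets import load_iris
  from sklearn.model_selection import cross_val_score
  from sklearn.neighbors import KNeighborsClassifier
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.naive_bayes import GaussianNB
  from sklearn.linear_model import LogisticRegression
  from mlxtend.classifier import StackingClassifier

  X, y = load_iris(return_X_y=True)

  clf1 = KNeighborsClassifier(n_neighbors=1)
  clf2 = RandomForestClassifier(random_state=1)
  clf3 = GaussianNB()
  lr = LogisticRegression()  # meta-classifier combining the base predictions
  sclf = StackingClassifier(classifiers=[clf1, clf2, clf3], meta_classifier=lr)

  # Cross-validated accuracy for each base classifier and the stacked ensemble.
  for clf, label in zip([clf1, clf2, clf3, sclf],
                        ['KNN', 'Random Forest', 'Naive Bayes', 'Stacking Classifier']):
      scores = cross_val_score(clf, X, y, cv=3, scoring='accuracy')
      print("Accuracy: %0.2f (+/- %0.2f) [%s]" % (scores.mean(), scores.std(), label))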

Stacking is a commonly used technique for winning Kaggle data science competitions. For example, first place in the Otto Group Product Classification challenge was won by a stacking ensemble of over 30 models whose output was used as features for three meta-classifiers: XGBoost, a Neural Network, and AdaBoost. See the following link for details.


Code

To view the code used to generate all of the figures, have a look at the following IPython notebook.


Conclusion

In addition to the methods studied in this article, it is common to use ensembles in deep learning by training diverse and accurate classifiers. Diversity can be achieved by varying architectures, hyper-parameter settings, and training techniques.
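As a toy illustration of that idea (an assumption for illustration, not code from the article; small scikit-learn MLPs stand in for deep networks), several networks with different architectures and regularization settings can be trained and their predicted probabilities averaged:

  import numpy as np
  from sklearn.datasets import make_classification
  from sklearn.model_selection import train_test_split
  from sklearn.neural_network import MLPClassifier

  X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

  # Diversity through different architectures and hyper-parameter settings.
  configs = [dict(hidden_layer_sizes=(64,), alpha=1e-4),
             dict(hidden_layer_sizes=(32, 32), alpha=1e-3),
             dict(hidden_layer_sizes=(128, 64), alpha=1e-2)]
  models = [MLPClassifier(max_iter=500, random_state=i, **cfg).fit(X_train, y_train)
            for i, cfg in enumerate(configs)]

  # Average the predicted class probabilities across the ensemble members.
  avg_proba = np.mean([m.predict_proba(X_test) for m in models], axis=0)
  print("ensemble accuracy:", np.mean(avg_proba.argmax(axis=1) == y_test))
  for m in models:
      print("single-model accuracy:", m.score(X_test, y_test))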

Ensemble methods have been very successful in setting record performance on challenging datasets and are among the top winners of Kaggle data science competitions.

Recommended reading

Bio: Vadim Smolyakov is passionate about data science and machine learning. Check out his GitHub.

Original. Reposted with permission.
