When Would Ensemble Techniques be a Good Choice?

When would ensemble techniques be a good choice? When you want to improve the performance of machine learning models - it’s that simple.

By Nisha Arya, Contributing Editor & Marketing and Client Success Manager on July 18, 2022 in Machine Learning

Brett Jordan via Unsplash

In any decision making process, you need to gather all the different types of data you have, all the information, pull it apart, learn more about it, get experts in, and more before making a solid decision.

This is similar in the Machine Learning process with Ensemble techniques. Ensemble models combine a variety of models together to help the prediction (decision making) process. A single model may not have the capabilities of producing the right prediction for a specific data set - this raises the chance of high variance, low accuracy and noise and bias. By combining multiple models, we effectively have a higher chance to improve the level of accuracy.

The easiest example is Decision Trees - a probability tree-like structure model that continuously splits data to make predictions based on the previous set of questions that were answered.

Why Would You Use An Ensemble Technique?

To answer the question of this article, “When would ensemble techniques be a good choice?” When you want to improve the performance of machine learning models - it’s that simple.

For example, if you’re working on a classification task and you wish to increase the accuracy of your model - use ensemble techniques. If you want to reduce your mean error for your regression task - use ensemble techniques.

The main 2 reasons for using an ensemble learning algorithms is:

Improve predictions - you will achieve better predictive skill rather than just using a single model.
Improve robustness - you will achieve better stable predictions rather than just using a single model.

Your overall aim when using ensemble techniques should be to reduce the generalization error of the prediction. Therefore, using a variety of base models which are diverse will automatically decrease your prediction error.

It is essentially building a more stable, reliable and accurate model that you trust.

There are 3 types of ensemble modeling techniques:

Bagging
Boosting
Stacking

Bagging

Short for Bootstrap Aggregation, as the ensemble modeling technique combines Bootstrapping and Aggregation to form one ensemble model. It is based on creating multiple sets of the original training data, creating tree-like structure probability models which then aggregate to conclude to a final prediction.

Each model learns about the errors produced in the previous model and uses a different subset of the training data set. Bagging aims to avoid overfitting of data and reduce the variance in the predictions and can be used for both regression and classification models.

Random Forest

Random Forest is an algorithm of Bagging but with a slight difference. It uses a subset of samples of the training data and a subset of features to build multiple trees that split. You can see it as multiple decision trees which fit each training set in a random mode.

The decision on the split is based on a random selection of features causing a differentiation between each tree. This produces a more accurate aggregated result and final prediction.

Other example algorithm are:

Bagged Decision Trees
Extra Trees
Custom Bagging

Boosting

Boosting is the act of converting weak learners to strong learners. A weak learner fails to make accurate predictions due to their capabilities. A new weak prediction rule is generated by applying base learning algorithms. This is done by taking a random sample of data which is then inputted into a model and then trained sequentially which aims to train the weak learners and try to correct its predecessor.

An example of Boosting is AdaBoost and XGBoost

AdaBoost

AdaBoost is short for Adaptive Boosting and is used as a technique to boost the performance of a machine learning algorithm. It takes the notion of Random Forests weak learners and builds models on top of several weak learners.

class sklearn.ensemble.AdaBoostClassifier(base_estimator=None, *, n_estimators=50, learning_rate=1.0, algorithm='SAMME.R', random_state=None)

XGBoost

XGboosts stands for Extreme Gradient Boosting and is one of the most popular boosting algorithms that can be used for both regression and classification tasks. It is a type of supervised learning machine learning algorithm which aims to accurately predict a target variable by combining a set of weaker models.

Other example algorithms are:

Gradient Boosting Machine
Stochastic Gradient Boosting
LightGBM

When should I use Bagging vs Boosting?

The simplest way to determine when to use bagging or boosting is:

If the classifier is unstable and has high variance - use Bagging
If the classifier is stable, however has high bias - use Boosting

Stacking

Stacking is short for Stacked Generalization and is similar to boosting; with the aim to produce more robust predictors. This is done by taking the predictions from weak learners and using that to create a strong model. It does this by figuring out how to best combine the predictions from multiple models on the same dataset.

It is basically asking you ‘If you had a variety of machine learning models that perform well on a specific problem, how do you choose which model is the best to trust?’

Voting

Voting is an example of Stacking, however it is different for both classification and regression tasks.

For regression, the prediction is made based on the average of other regression models.

class sklearn.ensemble.VotingRegressor(estimators, *, weights=None, n_jobs=None, verbose=False)

For classification, there can either be hard voting or soft voting. Hard voting is essentially picking the prediction with the highest number of votes, whereas soft voting is combining the probabilities of each prediction in each of the models and then picking the prediction with the highest total probability.

class sklearn.ensemble.VotingClassifier(estimators, *, voting='hard', weights=None, n_jobs=None, flatten_transform=True, verbose=False)

Other example algorithms are:

Weighted Average
Blending
Stacking
Super Learner

What’s the Difference Between Stacking and Bagging/Boosting?

Bagging uses decision trees, where stacking uses different models. Bagging takes samples from the training dataset, where stacking fits on the same dataset.

Boosting uses a sequence of models that converts weak learners to strong learners to correct the prior models predicting, whereas stacking uses a single model to learn how to combine the predictions from the contributing models in the best way.

Aggregating everything

You will always need to understand what you’re trying to achieve before you attempt to solve a task. Once you do that, you will be able to determine if your task is a classification or regression task - in which you can then choose which ensemble algorithm will be the best to use to improve your models predictions and robustness.

Nisha Arya is a Data Scientist and Freelance Technical Writer. She is particularly interested in providing Data Science career advice or tutorials and theory based knowledge around Data Science. She also wishes to explore the different ways Artificial Intelligence is/can benefit the longevity of human life. A keen learner, seeking to broaden her tech knowledge and writing skills, whilst helping guide others.