Ensemble Methods: Elegant Techniques to Produce Improved Machine Learning Results
Get a handle on ensemble methods from voting and weighting to stacking and boosting, with this well-written overview that includes numerous Python-style pseudocode examples for reinforcement.
Bootstrap Aggregating
The name Bootstrap Aggregating, also known as “Bagging”, summarizes the key elements of this strategy. In the bagging algorithm, the first step is creating multiple models. These models are generated with the same algorithm, each trained on a random sub-sample drawn from the original dataset with the bootstrap sampling method. In bootstrap sampling, some original examples appear more than once and some original examples are not present in the sample at all. If you want to create a sub-dataset with m elements, you select a random element from the original dataset m times, with replacement. If the goal is to generate n datasets, you repeat this step n times.
In the end, we have n datasets, each containing m elements. The following Python-esque pseudocode shows bootstrap sampling:
import random

def bootstrap_sample(original_dataset, m):
    # draw m elements uniformly at random, with replacement
    sub_dataset = []
    for _ in range(m):
        sub_dataset.append(random.choice(original_dataset))
    return sub_dataset
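Generating n such sub-datasets is then just a matter of repeating the call; a small usage example:

original_dataset = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
n, m = 3, 10
sub_datasets = [bootstrap_sample(original_dataset, m) for _ in range(n)]
# each sub-dataset has m elements; some originals repeat, others are left out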
The second step in bagging is aggregating the generated models. Well-known methods, such as voting and averaging, are used for this purpose.
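Neither aggregation function is spelled out in the pseudocode below, so here is one plausible minimal sketch, assuming predictions holds one row per test instance with one entry per model:

from collections import Counter

def voting(predictions):
    # majority vote across the models for each test instance (classification)
    return [Counter(row).most_common(1)[0][0] for row in predictions]

def averaging(predictions):
    # mean of the models' outputs for each test instance (regression)
    return [sum(row) / len(row) for row in predictions]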
The overall pseudocode looks like this:
def bagging(n, m, base_algorithm, train_dataset, target, test_dataset):
    predictions = matrix(row_length=len(test_dataset), column_length=n)
    for i in range(n):
        # draw a bootstrap sample of m training instances (and their targets)
        sample_indexes = bootstrap_sample(range(len(train_dataset)), m)
        model = base_algorithm.fit(train_dataset[sample_indexes], target[sample_indexes])
        predictions[:, i] = model.predict(test_dataset)
    final_predictions = voting(predictions)      # for classification
    final_predictions = averaging(predictions)   # for regression
    return final_predictions
In bagging, each sub-sample can be generated independently of the others, so sub-sample generation and model training can be done in parallel.
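A sketch of that idea in the same pseudocode style, assuming the names from the bagging function above (bootstrap_sample, base_algorithm, train_dataset, target, n) are already defined:

from concurrent.futures import ProcessPoolExecutor

def train_one_model(_):
    # each worker independently draws its own bootstrap sample and fits one model
    sample_indexes = bootstrap_sample(range(len(train_dataset)), len(train_dataset))
    return base_algorithm.fit(train_dataset[sample_indexes], target[sample_indexes])

with ProcessPoolExecutor() as executor:
    models = list(executor.map(train_one_model, range(n)))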
You can also find the bagging strategy built into existing algorithms. For example, the Random Forest algorithm uses the bagging technique with some differences: it adds random feature selection, and its base algorithm is a decision tree.
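For reference, scikit-learn ships ready-made implementations of both ideas; a minimal usage sketch on synthetic data, just for illustration:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# plain bagging: bootstrap samples with a decision tree as the default base estimator
bagging = BaggingClassifier(n_estimators=50, random_state=42).fit(X, y)

# random forest: bagging plus random feature selection at each split
forest = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

print(bagging.predict(X[:5]), forest.predict(X[:5]))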
Boosting: Converting Weak Models to Strong Ones
The term “boosting” is used to describe a family of algorithms which are able to convert weak models to strong models. A model is weak if it has a substantial error rate but still performs better than random guessing (which would give an error rate of 0.5 for binary classification). Boosting incrementally builds an ensemble by training each model on the same dataset, but with the weights of instances adjusted according to the error of the last prediction. The main idea is to force the models to focus on the instances that are hard to predict. Unlike bagging, boosting is a sequential method, so you cannot use parallel operations here.
The general procedure of the boosting algorithm is defined as follows:
def adjust_dataset(_train, errors):
    # create a new dataset that emphasizes the hardest instances
    ix = get_highest_errors_index(errors)
    return concat(_train[ix], random_select(train))

models = []
_train = random_select(train)
for i in range(n):  # n rounds
    model = base_algorithm.fit(_train)
    predictions = model.predict(_train)
    models.append(model)
    errors = calculate_error(predictions, _train)  # per-instance error vs. the true labels
    _train = adjust_dataset(_train, errors)

final_predictions = combine(models, test)
The adjust_dataset function returns a new dataset containing the hardest instances, which the base algorithm is then forced to learn from in the next round.
AdaBoost is a widely known boosting method; its founders won the Gödel Prize for their work. A decision tree is the most commonly used base algorithm for AdaBoost, and in the scikit-learn library the default base estimator for AdaBoostRegressor and AdaBoostClassifier is a decision tree. As discussed in the previous paragraph, the same incremental method applies to AdaBoost: information gathered at each step about the ‘hardness’ of each training sample is fed into the next model. The ‘adjust dataset’ step differs from the one described above, and the ‘combining models’ step uses weighted voting.
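A minimal usage sketch with scikit-learn, again on synthetic data:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# 50 sequentially trained weak learners (decision trees by default), combined by weighted voting
model = AdaBoostClassifier(n_estimators=50, random_state=42).fit(X, y)
print(model.predict(X[:5]))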
Conclusion
Although ensemble methods can help you win machine learning competitions by devising sophisticated algorithms and producing results with high accuracy, they are often not preferred in industries where interpretability is more important. Nonetheless, the effectiveness of these methods is undeniable, and their benefits in appropriate applications can be tremendous. In fields such as healthcare, even the smallest improvement in the accuracy of machine learning algorithms can be truly valuable.
Bio: Necati Demir is a PhD candidate studying machine learning. He has 10 years of industry experience and provides freelance consultancy for machine learning/data mining projects.
Original. Reposted with permission.