What are the Assumptions of XGBoost?

In this article, you will learn: how boosting relates to XGBoost; the features of XGBoost; how it reduces the loss function value and overfitting.



Image credit: Faye Cornish via Unsplash

 

Before we get into the assumptions of XGBoost, let's start with an overview of the algorithm.

XGBoost stands for Extreme Gradient Boosting. It is a supervised learning algorithm that falls under the gradient-boosted decision tree (GBDT) family of machine learning algorithms.

Let’s dive into boosting first.

 

From Boosting to XGBoost

 

Boosting is the method of combining a set of weak learners into a single strong learner to reduce training error. Because each new learner focuses on the mistakes of the previous ones, boosting also helps to deal with the bias-variance trade-off, making it more effective. There are different types of boosting algorithms, such as AdaBoost (Adaptive Boosting), Gradient Boosting, XGBoost, and more.

Now let’s go into XGBoost.

As mentioned before, XGBoost is an extension of gradient-boosted decision trees (GBDT) and is well known for its speed and performance.

Predictions are made by combining a set of simpler, weaker models: decision trees that are built in sequential form. Each tree makes its predictions through if-then-else (true/false) questions about the features, and each new tree is trained to correct the errors of the trees built before it, improving the overall estimate.

It consists of these three elements (see the sketch after this list):

  1. A loss function that is to be optimized.
  2. A weak learner to make predictions.
  3. An additive model that adds weak learners to reduce errors.
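To make these elements concrete, here is a minimal sketch using the scikit-learn wrapper of the xgboost library on a synthetic dataset (the data and parameter values are assumptions for illustration only): the objective is the loss function being optimized, each shallow decision tree is a weak learner, and n_estimators sets how many trees the additive model adds.

import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic data, purely for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = xgb.XGBClassifier(
    objective="binary:logistic",  # 1. the loss function to be optimized
    max_depth=3,                  # 2. each weak learner is a shallow decision tree
    n_estimators=100,             # 3. the additive model adds 100 trees, one at a time
)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))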

 

Features of XGBoost

 

There are 3 features of XGBoost:

 

1. Gradient Tree Boosting

 

The tree ensemble model needs to be trained in an additive manner, meaning it is an iterative and sequential process in which decision trees are added one step at a time. A fixed number of trees is added, and with each iteration there should be a reduction in the loss function value.
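As a rough sketch of what this looks like in practice (reusing the synthetic train/test split from the earlier example), the native training API can print the evaluation loss after each boosting round, so you can watch it fall as trees are added:

import xgboost as xgb

# Reusing X_train, y_train, X_test, y_test from the sketch above (illustrative only)
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

params = {"objective": "binary:logistic", "eval_metric": "logloss"}
booster = xgb.train(
    params,
    dtrain,
    num_boost_round=50,        # a fixed number of trees, added one step at a time
    evals=[(dtest, "test")],   # the test log-loss is printed after every round
)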

 

2. Regularized Learning 

 

Regularized learning adds a penalty term to the loss function that helps to smooth the final learnt weights and prevent overfitting.
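For instance, the library exposes L1 and L2 penalties on the leaf weights and a minimum loss reduction required for a split; the values below are illustrative assumptions, not recommendations:

import xgboost as xgb

model = xgb.XGBClassifier(
    reg_lambda=1.0,  # L2 penalty on leaf weights, smoothing the final learnt weights
    reg_alpha=0.1,   # L1 penalty on leaf weights
    gamma=1.0,       # minimum loss reduction required to make a further split
)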

 

3. Shrinkage and Feature Subsampling

 

These two techniques are used to further prevent overfitting. 

Shrinkage (controlled by the learning rate) scales down the influence each tree has on the overall model, leaving room for future trees to improve it.

Feature subsampling is something you may have seen in the Random Forest algorithm. Instead of using every column (feature) of the data to build each tree, a random subset is used; this not only prevents overfitting but also speeds up the computations of the parallel algorithm.
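A brief sketch of how both techniques are typically switched on (the values are illustrative assumptions only):

import xgboost as xgb

model = xgb.XGBClassifier(
    learning_rate=0.1,     # shrinkage: scales down each tree's contribution
    subsample=0.8,         # each tree sees a random 80% of the rows
    colsample_bytree=0.8,  # and a random 80% of the columns (feature subsampling)
)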

 

XGBoost Hyperparameters

 

import xgboost as xgb


XGBoost hyperparameters are divided into 4 groups:

  1. General parameters
  2. Booster parameters
  3. Learning task parameters
  4. Command line parameters

General parameters, Booster parameters and Task parameters are set before running the XGBoost model. The Command line parameters are only used in the console version of XGBoost.
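As a rough illustration of how the first three groups fit together in the native API (parameter values are assumptions, and dtrain is the DMatrix from the earlier sketch):

import xgboost as xgb

params = {
    "booster": "gbtree",             # general parameter: which booster to use
    "max_depth": 4,                  # booster parameter: depth of each tree
    "eta": 0.1,                      # booster parameter: shrinkage / learning rate
    "objective": "binary:logistic",  # learning task parameter: the loss to optimize
    "eval_metric": "logloss",        # learning task parameter: evaluation metric
}
booster = xgb.train(params, dtrain, num_boost_round=100)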

If the parameters are not tuned properly, the model can easily overfit. However, tuning the parameters of an XGBoost model is not straightforward.
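One common (if computationally expensive) approach is a simple grid search; this is only a sketch with assumed grid values, reusing the training data from the first example:

import xgboost as xgb
from sklearn.model_selection import GridSearchCV

param_grid = {                     # illustrative grid, not a recommendation
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.1],
    "n_estimators": [100, 300],
}
search = GridSearchCV(xgb.XGBClassifier(), param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)       # X_train, y_train from the earlier sketch
print(search.best_params_)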

Stay tuned for an upcoming article about Tuning XGBoost Hyperparameters.

 

What Are the Assumptions of XGBoost?

 

The main assumptions of XGBoost are:

  • XGBoost may assume that encoded integer values for each input variable have an ordinal relationship
  • XGBoost assumes that your data may not be complete (i.e. it can deal with missing values)

As it does NOT assume that all values are present, the algorithm can handle missing values by default. In tree-based algorithms like XGBoost, the best way to handle missing values at each split is learned during the training phase. This then leads to the fact that:

  • XGBoost can handle sparsity

XGBoost works only with numeric vectors, so if you have categorical variables, they will need to be converted into numeric ones.

One-hot encoding, for example, will transform a dense dataframe with few zeros into a very sparse matrix with lots of zeros, and XGBoost is able to take such a sparse matrix format as an input.
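Here is a small sketch of both points, using a made-up toy dataframe: missing numeric values can be passed straight through, and a one-hot-encoded categorical column can be supplied as a sparse matrix:

import numpy as np
import pandas as pd
import xgboost as xgb
from scipy import sparse

# Hypothetical toy data with a missing value and a categorical column
df = pd.DataFrame({
    "age": [25, np.nan, 47],               # the missing value can stay as NaN
    "city": ["london", "paris", "london"],
    "label": [0, 1, 0],
})

encoded = pd.get_dummies(df[["age", "city"]])               # one-hot encode 'city'
X_sparse = sparse.csr_matrix(encoded.astype(float).values)  # mostly zeros -> sparse format

dtrain = xgb.DMatrix(X_sparse, label=df["label"])  # accepts sparse input; NaN treated as missing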

 

Conclusion

 

In this blog, you have come to understand: how boosting relates to XGBoost; the features of XGBoost; and how it reduces the loss function value and overfitting. We have also briefly gone over the four groups of hyperparameters, which will be followed up by an article that focuses solely on Tuning XGBoost Hyperparameters.

 
 
Nisha Arya is a Data Scientist and Freelance Technical Writer. She is particularly interested in providing Data Science career advice or tutorials and theory based knowledge around Data Science. She also wishes to explore the different ways Artificial Intelligence is/can benefit the longevity of human life. A keen learner, seeking to broaden her tech knowledge and writing skills, whilst helping guide others.