Fast Gradient Boosting with CatBoost
In this piece, we’ll take a closer look at a gradient boosting library called CatBoost.
In gradient boosting, predictions are made by an ensemble of weak learners. Unlike a random forest, which builds each decision tree independently on a bootstrapped sample of the data, gradient boosting creates trees one after the other. Previous trees in the model are not altered; instead, the results of the previous trees are used to improve the next one.
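The idea can be illustrated with a toy pure-Python example (an illustration, not CatBoost's actual algorithm): each stage fits a "weak learner", here just a constant correction, to the residuals left by the stages before it.

```python
# Toy illustration of gradient boosting with constant "stumps":
# each stage fits the mean of the current residuals, so the
# residuals shrink and the next stage corrects what is left.
y = [3.0, 5.0, 7.0, 9.0]
learning_rate = 0.5
predictions = [0.0] * len(y)

for stage in range(20):
    residuals = [t - p for t, p in zip(y, predictions)]
    stump = sum(residuals) / len(residuals)       # the "weak learner"
    predictions = [p + learning_rate * stump for p in predictions]

print([round(p, 2) for p in predictions])  # converges toward the mean of y (6.0)
```

With real trees, each stage can correct different examples differently; the constant stump here only recovers the mean, but the residual-fitting loop has the same shape.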
CatBoost is a gradient boosting library developed by Yandex. It uses oblivious decision trees to grow a balanced tree: on each level of the tree, the same feature and threshold are used for every left and right split.
Compared to classic trees, oblivious trees are more efficient to implement on CPU and are simpler to fit.
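To see why oblivious trees are so cheap to evaluate, here is a small pure-Python sketch (illustrative, not CatBoost internals): since every level shares one split, the leaf index is just a binary number assembled from the per-level comparisons.

```python
# Sketch: evaluating an oblivious (symmetric) tree of depth 3.
# Each level tests the SAME (feature, threshold) pair in every node,
# so a depth-d tree is fully described by d splits and 2**d leaf values.

splits = [(0, 0.5), (2, 1.0), (1, -0.2)]    # (feature_index, threshold) per level
leaf_values = [float(i) for i in range(8)]  # 2**3 leaves

def predict(x):
    index = 0
    for feature, threshold in splits:
        index = (index << 1) | (x[feature] > threshold)
    return leaf_values[index]

print(predict([0.9, 0.0, 2.0]))  # all three comparisons true -> leaf 0b111 -> 7.0
```

No pointer chasing or per-node branching is needed, which is what makes these trees fast to apply on CPU.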
Dealing with Categorical Features
The common ways of handling categorical features in machine learning are one-hot encoding and label encoding. CatBoost allows you to use categorical features without the need to preprocess them.
When using CatBoost, we shouldn't one-hot encode categorical features, as this hurts both the training speed and the quality of predictions. Instead, we simply specify the categorical features using the cat_features parameter.
Advantages of using CatBoost
Here are a few reasons to consider using CatBoost:
- CatBoost allows for training on several GPUs.
- It provides great results with default parameters, hence reducing the time needed for parameter tuning.
- It offers improved accuracy due to reduced overfitting.
- CatBoost's model applier enables fast prediction.
- Trained CatBoost models can be exported to Core ML for on-device inference (iOS).
- It can handle missing values internally.
- It can be used for both regression and classification problems.
Training Parameters
Let’s look at the common parameters in CatBoost:
- loss_function (alias: objective) — The metric optimized during training, such as root mean squared error (RMSE) for regression and Logloss for classification.
- eval_metric — The metric used for detecting overfitting.
- iterations — The maximum number of trees to be built; defaults to 1000. Its aliases are num_boost_round, n_estimators, and num_trees.
- learning_rate (alias: eta) — The learning rate, which determines how fast or slow the model will learn. The default is usually 0.03.
- random_seed (alias: random_state) — The random seed used for training.
- l2_leaf_reg (alias: reg_lambda) — Coefficient of the L2 regularization term of the cost function. The default is 3.0.
- bootstrap_type — Determines the sampling method for the weights of the objects, e.g. Bayesian, Bernoulli, MVS, or Poisson.
- depth — The depth of the tree.
- grow_policy — Determines how the greedy search algorithm is applied. It can be SymmetricTree (the default), Depthwise, or Lossguide. With SymmetricTree, the tree is built level by level until the specified depth is reached; on every step, all leaves on a level are split with the same condition. With Depthwise, the tree is built step by step until the specified depth is reached; on each step, all non-terminal leaves from the last tree level are split, each using the condition that leads to the best loss improvement. With Lossguide, the tree is built leaf by leaf until the specified number of leaves is reached; on each step, the non-terminal leaf with the best loss improvement is split.
- min_data_in_leaf (alias: min_child_samples) — The minimum number of training samples in a leaf. This parameter is used only with the Lossguide and Depthwise growing policies.
- max_leaves (alias: num_leaves) — The maximum number of leaves in the tree. Used only with the Lossguide policy.
- ignored_features — The features to be ignored during training.
- nan_mode — The method for dealing with missing values. The options are Forbidden, Min, and Max; the default is Min. With Forbidden, the presence of missing values raises an error. With Min, missing values are treated as the minimum value for that feature; with Max, as the maximum.
- leaf_estimation_method — The method used to calculate values in leaves. Classification uses 10 Newton iterations, regression with quantile or MAE loss uses one Exact iteration, and multiclass classification uses one Newton iteration.
- leaf_estimation_backtracking — The type of backtracking used during gradient descent. The default is AnyImprovement, which decreases the descent step until the loss function value is smaller than it was in the last iteration. Armijo reduces the descent step until the Armijo condition is met.
- boosting_type — The boosting scheme. It can be Plain, the classic gradient boosting scheme, or Ordered, which offers better quality on smaller datasets.
- score_function — The score type used to select the next split during tree construction. Cosine is the default; the other available options are L2, NewtonL2, and NewtonCosine.
- early_stopping_rounds — Sets the overfitting detector type to Iter and stops training after the specified number of iterations without improvement of the optimal metric.
- classes_count — The number of classes for multiclass classification problems.
- task_type — Whether to train on CPU or GPU. CPU is the default.
- devices — The IDs of the GPU devices to be used for training.
- cat_features — The array of categorical columns.
- text_features — Used to declare text columns in classification problems.
Regression Example
CatBoost follows the scikit-learn API conventions in its implementation. Let's see how we can use it for regression.
The first step — as always — is to import the regressor and instantiate it.
from catboost import CatBoostRegressor
cat = CatBoostRegressor()
When fitting the model, CatBoost also enables us to visualize training by setting plot=True:
cat.fit(X_train, y_train, verbose=False, plot=True)
It also allows you to perform cross-validation and visualize the process:
Similarly, you can also perform grid search and visualize it:
We can also use CatBoost to plot a tree. Here's the plot for the first tree. As you can see from the tree, the leaves on every level are split on the same condition, e.g. feature 297, value > 0.5.
cat.plot_tree(tree_idx=0)
CatBoost also gives us a dictionary with all the model parameters. We can print them by iterating through the dictionary:
for key, value in cat.get_all_params().items():
    print('{}, {}'.format(key, value))
Final Thoughts
In this piece, we've explored the benefits of CatBoost, along with its primary training parameters. Then, we worked through a simple regression implementation using its scikit-learn-compatible API. Hopefully this gives you enough information on the library so that you can explore it further.
Bio: Derrick Mwiti is a data analyst, a writer, and a mentor. He is driven by delivering great results in every task, and is a mentor at Lapid Leaders Africa.
Original. Reposted with permission.
Related:
- LightGBM: A Highly-Efficient Gradient Boosting Decision Tree
- Understanding Gradient Boosting Machines
- Mastering Fast Gradient Boosting on Google Colaboratory with free GPU