Generalized and Scalable Optimal Sparse Decision Trees(GOSDT)

A simple method to solve complex real-life problems.

Image by fabrikasimf on Freepik

I often talk about explainable AI(XAI) methods and how they can be adapted to address a few pain points that prohibit companies from building and deploying AI solutions. You can check my blog if you need a quick refresher on XAI methods.

One such XAI method is Decision Trees. They have gained significant traction historically because of their interpretability and simplicity. However, many think that decision trees cannot be accurate because they look simple, and greedy algorithms like C4.5 and CART don’t optimize them well.

The claim is partially valid as some variants of decision trees, such as C4.5 and CART, have the following disadvantages:

1. Prone to overfitting, particularly when the tree becomes too deep with too many branches. This can result in poor performance on new, unseen data.
2. It can be slower to evaluate and make predictions with large datasets because they require making multiple decisions based on the values of the input features.
3. It can be difficult for them to deal with continuous variables as they require the tree to split the variable into multiple, smaller intervals, which can increase the complexity of the tree and make it difficult to identify meaningful patterns in the data.
4. Often known as the “greedy” algorithm, it makes the locally optimal decision at each step without considering the consequences of those decisions on future steps. Sub Optimal Trees are an output of CART, but no “real” metric exists to measure it.

More sophisticated algorithms, such as Ensemble Learning Methods, are available to address these issues. But often can be considered a “black box” because of the underlined functioning of the algorithms.

However, recent work has shown that if you optimize decision trees (rather than using greedy methods like C4.5 and CART), they can be surprisingly accurate, in many cases, as accurate as the black box. One such algorithm that can help optimize and address some of the disadvantages mentioned above is GOSDT. GOSDT is an algorithm for producing sparse optimal decision trees.

The blog aims to give a gentle introduction to GOSDT and present an example of how it can be implemented on a dataset.

This blog is based on a research paper published by a few fantastic folks. You can read the paper here. This blog is not a substitute for this paper, nor will it touch on extremely mathematical details. This is a guide for data science practitioners to learn about this algorithm and leverage it in their daily use cases.

In a nutshell, GOSDT addresses a few major issues:

1. Handle Imbalanced datasets well and optimize various objective functions (not just accuracy).
2. Fully optimizes trees and does not greedily construct them.
3. It is almost as fast as greedy algorithms as it solves NP-hard optimization problems for decision trees.

How do GOSDT trees solve the above issues?

1. GOSDT trees use a dynamic search space through hash trees to improve the model’s efficiency. By limiting the search space and using bounds to identify similar variables, GOSDT trees can reduce the number of calculations needed to find the optimal split. This can significantly improve the computation time, mainly when working with continuous variables.
2. In GOSDT trees, the bounds for splitting are applied to partial trees, and they are used to eliminate many trees from the search space. This allows the model to focus on one of the remaining trees (which can be a partial tree) and evaluate it more efficiently. By reducing the search space, GOSDT trees can quickly find the optimal split and generate a more accurate and interpretable model.
3. GOSDT trees are designed to handle imbalanced data, a common challenge in many real-world applications. GOSDT trees address imbalanced data using a weighted accuracy metric that considers the relative importance of different classes in the dataset. This can be particularly useful when there is a pre-determined threshold for the desired level of accuracy, as it allows the model to focus on correctly classifying samples that are more critical to the application.

Summarizing the observations from GOSDT

1. These trees directly optimize the trade-off between training accuracy and the number of leaves.
2. Produces excellent training and test accuracy with a reasonable number of leaves
3. Perfect for highly non-convex problems
4. Most effective for small or medium number of features. But it can handle up to tens of thousands of observations while maintaining its speed and accuracy.

Time to see it all in action!! In my previous blog, I solved a loan application approval problem using Keras Classification. We will use the same dataset to build a classification tree using GOSDT.

Code Example

Code by Author

Supreet Kaur is an AVP at Morgan Stanley. She is a fitness and tech enthusiast. She is the founder of community called DataBuzz.