Decision Trees — An Intuitive Introduction

An extensive introduction including a look at decision tree classification, data distribution, decision tree regression, decision tree learning, information gain, and more.

By Prateek Karkare, Research Associate at Nanyang Technological University.

Imagine you are out to buy a cell phone for yourself.

The shopkeeper asks, “How can I help you, ma’am?”
“I am looking for a cell phone.”
“You are at the right place. We have over 300 different types of cell phones. What kind of phone would you like to buy today?”
Decision paralysis hits you. Totally confused by so many choices, you go blank!


“Let me help you choose a phone, ma’am. What screen size would you like?”
“Umm… larger than 5.9 inches.”
“Perfect, and how about the camera?”
“Definitely more than 14 megapixels.”
“Alright, and any preferences on the processor?”
“I want a quad-core processor with at least 1.2 GHz speed.”
“Sure, ma’am. I’ve got the perfect phone for you. I am sure you will like this one.” And he hands a phone over to you.
“Oh, thank you so much, this one is good.”
“Great choice, ma’am. Congratulations on your new phone!”

What the shopkeeper just did was to help you walk through a decision tree to narrow down your choices. Pictorially it would look like —

Mobile phone buying decision tree

The “Yes” and “No” arrows are the branches, the questions are the nodes, and the leaves are the final “buy” or “don’t buy” decisions. Hence we call it a decision tree.

Decision trees are everywhere. Knowingly or unknowingly we use them every day. When you say, “If it’s raining, I will bring an umbrella,” you’ve just constructed a simple decision tree like this —

It is a very simplistic decision tree and doesn’t account for too many situations. For instance, what if it is windy and raining outside? You would rather take a rain jacket instead of an umbrella and your decision tree would look like —

What if it is snowing, or the wind is too strong? You can keep adding conditions to this tree, and it keeps getting bigger (deeper), with more branches to handle more situations. Simple enough, right? Well, that is the power of decision trees! Decision trees are very simple and understandable. They allow you to see exactly how a particular decision is reached.
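The umbrella tree can be written down as a couple of nested conditions. This is only a minimal sketch: the function name `what_to_carry` is made up for illustration, and the "nothing" leaf for dry weather is an assumption, since the text only says what to do when it rains.

```python
def what_to_carry(raining: bool, windy: bool) -> str:
    """Walk the umbrella decision tree from the root question to a leaf."""
    if not raining:
        # Assumed leaf: the text does not say what to do in dry weather.
        return "nothing"
    if windy:
        # Raining and windy: an umbrella is hard to hold, so take a jacket.
        return "rain jacket"
    # Raining but calm.
    return "umbrella"


print(what_to_carry(raining=True, windy=False))  # umbrella
print(what_to_carry(raining=True, windy=True))   # rain jacket
```

Each extra situation (snow, very strong wind) would become one more `if` on the way down, which is exactly how the tree grows deeper.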

Decision trees can be used for both classification and regression tasks. Like neural networks, they are a supervised learning algorithm. Let’s see how classification and regression work for decision trees.

Decision Tree Classification

Let’s consider a classification task where you have to classify the flowers in the Iris data set into 3 different categories based on the values of some attributes: Iris Setosa, Iris Versicolor and Iris Virginica, which are different varieties of Iris flowers. Some attribute values differ among these varieties. We will use petal length and petal width as the attributes, or features, of interest.

Data distribution for Iris flowers

We can separate these varieties of Iris flowers with two different lines into 3 boxy regions. We can separate Setosa from the other two with a horizontal line which corresponds to petal length = 2.45 cm. An Iris flower with petal length less than 2.45 cm is a Setosa. A flower above this line is either a Versicolor or a Virginica.

Let’s draw another line to separate the other two varieties. Versicolor and Virginica can be separated by a vertical line corresponding to petal width = 1.75 cm. Note that this condition is on top of the earlier condition on petal length.

Now we have our lines which divide the data into 3 different parts. The decision tree for this looks like —

Now when you see a new Iris flower, just measure its petal length and width and run them down the decision tree to find out which variety of Iris it is. For example, suppose an Iris flower you found while on a hike had petal length 2.48 cm and petal width 1.86 cm. Your decision tree tells you that this is Iris Virginica, since petal length > 2.45 cm and petal width > 1.75 cm.
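Running a flower down this tree amounts to two comparisons. Here is a small sketch using the two thresholds from the text. The function name `classify_iris` is hypothetical, and how to treat a measurement that lands exactly on a threshold is an assumption, since the text does not cover that case.

```python
def classify_iris(petal_length_cm: float, petal_width_cm: float) -> str:
    """Classify an Iris flower with the two-split tree described above."""
    # First split: petal length separates Setosa from the other two.
    if petal_length_cm < 2.45:
        return "Iris Setosa"
    # Second split (only reached when petal length is not below 2.45 cm):
    # petal width separates Versicolor from Virginica.
    # Assumption: values exactly on a threshold go to the right-hand branch.
    if petal_width_cm < 1.75:
        return "Iris Versicolor"
    return "Iris Virginica"


# The flower found on the hike.
print(classify_iris(2.48, 1.86))  # Iris Virginica
```

Note that the petal width check is nested inside the petal length check, which mirrors the remark that the second condition sits on top of the first.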

Decision trees are good at classification but, as I said earlier, they can also be used for regression tasks. How does a decision tree do regression?

Decision Tree Regression

Regression, unlike classification, predicts continuous values. You can read about a simple regression algorithm called linear regression for a primer on what regression is.

Regression works similarly to classification in decision trees. We choose the values that partition our data set, but instead of assigning a class to each partitioned region, we return the average of all the data points in that region. Among all the constant values we could return for a region, the average is the one that minimizes the squared prediction error. An example will make this clearer.
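To get a feel for why the average is a sensible value to return, here is a quick numerical check. It assumes the error is measured as the sum of squared differences, which is a common choice but not the only one, and the three target values are made up for illustration.

```python
from statistics import mean


def sum_squared_error(values, prediction):
    """Total squared error when one constant is predicted for every point."""
    return sum((v - prediction) ** 2 for v in values)


# Made-up target values for the data points that fall in one region.
region = [10.0, 12.0, 17.0]
best = mean(region)  # 13.0

# Compare the average against a slightly lower and a slightly higher guess.
for candidate in (best - 1.0, best, best + 1.0):
    print(candidate, sum_squared_error(region, candidate))
# 12.0 29.0
# 13.0 26.0
# 14.0 29.0
```

Moving the prediction away from the average in either direction increases the squared error, so the average is the best single number for that region under this error measure.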

Predicting rainfall for a particular season is a regression problem, since rainfall is a continuous quantity. Given rainfall stats like those in the figure below, how can a decision tree predict the rainfall value for a specific season?

Rainfall for each quarter

A decision tree to predict rainfall for a quarter would just return the average value for that quarter.

Average values of rainfall for a quarter

Decision tree for predicting the rainfall for a quarter
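The quarter-based regression tree can be sketched in a few lines. The rainfall readings below are hypothetical and are not the values from the figures. The names `rainfall_mm`, `leaf_value` and `predict_rainfall` are also made up for this example.

```python
from statistics import mean

# Hypothetical rainfall readings (in mm) for each quarter.
rainfall_mm = {
    "Q1": [20.0, 30.0, 25.0],
    "Q2": [80.0, 100.0, 90.0],
    "Q3": [150.0, 170.0, 160.0],
    "Q4": [40.0, 60.0, 50.0],
}

# "Training": each leaf of the tree stores the average of its own region.
leaf_value = {quarter: mean(values) for quarter, values in rainfall_mm.items()}


def predict_rainfall(quarter: str) -> float:
    """Route the input to its quarter's leaf and return the stored average."""
    return leaf_value[quarter]


print(predict_rainfall("Q2"))  # 90.0
```

Whatever day within a quarter you ask about, the prediction is the same flat value, which is why the fitted curve looks like a staircase.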

A variant of a decision tree can also fit a line to each section (quarter), which looks like this:

Linear fit for rainfall values

Decision tree for a linear approximation of rainfall

Decision trees which return a linear fit are usually more prone to overfitting, especially in regions with fewer data points.
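A tiny example hints at why sparse regions are risky for a linear fit. The sketch below fits an ordinary least-squares line to a section that contains only two made-up readings. The helper `fit_line` and the data are assumptions for illustration, not part of any particular library.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical sparse section: only two readings (month index, rainfall in mm).
xs = [1.0, 2.0]
ys = [40.0, 70.0]

slope, intercept = fit_line(xs, ys)

# The line passes through both points exactly, noise included,
# so a prediction just outside the data follows that noise.
print(slope * 3.0 + intercept)  # 100.0

# A leaf that returns the plain average is far less sensitive.
print(sum(ys) / len(ys))  # 55.0
```

With two points the line has zero training error no matter how noisy the readings are, so any noise ends up in the slope. The averaging leaf cannot chase individual points in the same way.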

So far so good. Our decision tree can predict continuous values and classify as well. But since it is a supervised learning algorithm, how does it learn to do so? In other words, how do we build a decision tree? Who tells the tree to pick a particular attribute first, then another attribute, and then yet another? How does the decision tree know when to stop branching further? Just as we train a neural network before using it to make predictions, we have to train (build) a decision tree before prediction.