Structure
A decision tree is a greedy algorithm that recursively splits the data, minimizing the error at each decision node. The most common form is a binary tree.
- feature/variable: an input attribute the tree can split on
- target/class attribute: the variable the tree predicts (the goal of the classification)
- root node: the entire sample that gets split into two or more homogeneous sets.
- decision nodes: nodes that split into further sub-nodes
- leaves (terminal nodes): nodes that do not split
- error: proportion of points misclassified
- for example, a region with 4 blue points and 2 red points should be classified as blue (to maximize purity); the 2 red points are then misclassified → error = 2/6 ≈ 0.33 (see the sketch after this list)
- depth of a tree (vertical depth) = the length of the longest path from the root to a leaf (number of splits, NOT nodes)
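A minimal sketch of the majority-vote rule and the resulting misclassification error for the region in the example above (the colour labels are just the example's):

```python
from collections import Counter

# Labels of the points that fall in one region of a split
region = ["blue", "blue", "blue", "blue", "red", "red"]

# Classify the region by its majority class (maximizes purity)
majority_class, majority_count = Counter(region).most_common(1)[0]

# Error = proportion of points in the region that are misclassified
error = 1 - majority_count / len(region)
print(majority_class, error)  # blue 0.333...
```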
Usage
Decision trees work with structured (tabular) data, for both classification and regression tasks.
Strength
- Fast
- Easy to understand/interpret by humans
- Useful in data exploration
- Require less pre-processing (e.g. normalization, scaling)
- Handle both quantitative (numerical) and qualitative (categorical) variables
- Non-parametric: no assumptions about the data distribution & classifier structure
Weakness
- Computationally expensive relative to other algorithms like Naive Bayes
- Highly sensitive to small changes in the data ➡️ use an ensemble method
- Likely to overfit ➡️ tree pruning, ensemble trees such as Random Forest or XGBoost
Process
- Pre-processing
- Categorical/Qualitative features: encode them (label encoding or one-hot encoding)
- Continuous variables: generate candidate split thresholds within the feature's value range (see below)
- Splitting: How to choose what feature to split on at each node?
- Tree pruning: When do we stop splitting?
Pre-processing
Work with categorical features
- label encoding
- one hot encoding
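A minimal sketch of both encodings with pandas and scikit-learn; the color column and its values are made up:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"color": ["red", "blue", "green", "blue"]})

# Label encoding: map each category to an integer (implies an ordering)
df["color_label"] = LabelEncoder().fit_transform(df["color"])

# One-hot encoding: one binary column per category (no implied ordering)
one_hot = pd.get_dummies(df["color"], prefix="color")
print(pd.concat([df, one_hot], axis=1))
```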
Work with continuous-valued features
(ML Spec)
Say we want to make a binary split on a continuous feature:
- Sort all training examples by that feature
- Take the midpoints between consecutive values in the sorted list as candidate thresholds, and choose the threshold that gives the best purity (e.g. highest information gain)
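A minimal sketch of generating candidate thresholds for one continuous feature (the example values are made up):

```python
import numpy as np

# One continuous feature, e.g. animal weight
x = np.array([8.4, 9.2, 7.6, 10.2, 9.2])

# Sort the distinct values, then take midpoints of consecutive pairs
values = np.unique(x)                      # sorted unique values
thresholds = (values[:-1] + values[1:]) / 2
print(thresholds)                          # candidate split points x <= t
```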
Splitting
The general goal of a split is for each sub-node to contain mostly data points of one class (correctly classified). In other words, we want to maximize the purity of each node.
There are several possible metrics:
- minimize entropy (maximize information gain)
- minimize gini impurity (i.e. maximize Gini purity)
Information Gain
Transclude of entropy#formula
Example
Before a split, the entropy of a group of 5 cats and 5 non-cats is $H(0.5) = 1$. After splitting on a particular feature, one group of animals (4 of which are cats) has an entropy of $H(p_1)$; the other group of animals (1 of which is a cat) has an entropy of $H(p_2)$. What is the expression for information gain? It is the root entropy minus the weighted average entropy of the two branches: $H(0.5) - \left( w_1 H(p_1) + w_2 H(p_2) \right)$, where $w_1$ and $w_2$ are the fractions of examples sent to each branch.
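A minimal sketch of the computation, assuming (hypothetically) that each branch receives 5 of the 10 animals:

```python
import numpy as np

def entropy(p):
    """Binary entropy H(p) in bits; H(0) = H(1) = 0."""
    if p in (0, 1):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

# Root node: 5 cats out of 10 animals
h_root = entropy(5 / 10)            # = 1.0

# Hypothetical split: 5 animals per branch (4 cats left, 1 cat right)
w_left, w_right = 5 / 10, 5 / 10
h_left, h_right = entropy(4 / 5), entropy(1 / 5)

info_gain = h_root - (w_left * h_left + w_right * h_right)
print(info_gain)                    # ≈ 0.28
```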
Gini (Binary tree)
Transclude of gini-impurity
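For a binary node, the Gini impurity is $G = 1 - p^2 - (1-p)^2$, where $p$ is the fraction of one class; a split is scored by the weighted average of its children's impurities. A minimal sketch, reusing the hypothetical branch sizes from the entropy example:

```python
def gini(p):
    """Gini impurity of a binary node, p = fraction of one class."""
    return 1 - p**2 - (1 - p)**2

# Weighted Gini impurity of the hypothetical split (5 animals per branch)
w_left, w_right = 5 / 10, 5 / 10
split_gini = w_left * gini(4 / 5) + w_right * gini(1 / 5)
print(gini(5 / 10), split_gini)     # 0.5 at the root vs 0.32 after the split
```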
Tree pruning
To avoid overfitting, we stop splitting when the improvement in purity becomes insignificant, typically by setting constraints on one or more of these hyperparameters:
- min_samples_split: minimum number of samples required in a node for it to be considered for splitting
- min_samples_leaf: minimum number of samples per leaf
- max_depth: maximum depth of the tree
- max_leaf_nodes: maximum number of leaves
- max_features: maximum number of features to consider when searching for a split. As a rule of thumb, the square root of the total number of features, but it is worth checking up to 30-40% of the total.
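A minimal sketch of how these constraints cap tree growth, using a synthetic dataset from make_classification (the constraint values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(
    max_depth=5, min_samples_split=20, min_samples_leaf=10,
    max_leaf_nodes=16, max_features="sqrt", random_state=0,
).fit(X, y)

print(full.get_depth(), full.get_n_leaves())      # unconstrained tree
print(pruned.get_depth(), pruned.get_n_leaves())  # much smaller tree
```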
Tree ensemble
Advantages of tree-based models
- High accuracy
- Require minimal pre-processing (e.g. scaling, normalizing)
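For instance, a random forest averages many trees grown on bootstrapped samples with random feature subsets, which reduces variance. A minimal sketch with scikit-learn's RandomForestClassifier (the data and parameter values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 100 trees, each limited in depth; averaging reduces variance/overfitting
forest = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0)
print(cross_val_score(forest, X, y, cv=5).mean())
```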
Code
Decision Tree sklearn GridSearch
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# Read the data.
data = pd.read_csv('data.csv', header=None)
# Assign the features to the variable X, and the labels to the variable y.
X = data.iloc[:, 0:-1]
y = data.iloc[:, -1]
# Create & fit the model
model = DecisionTreeClassifier(criterion='entropy', max_depth=5, min_samples_leaf=5)
model.fit(X, y)
# Make predictions
y_pred = model.predict(X)
# Calculate the accuracy on the training data (in-sample accuracy)
acc = accuracy_score(y, y_pred)
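Since the heading mentions GridSearch, here is a minimal sketch of tuning the same hyperparameters with scikit-learn's GridSearchCV, assuming the X and y loaded above (the grid values are illustrative):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

param_grid = {
    "criterion": ["entropy", "gini"],
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 5, 10],
}

# 5-fold cross-validated search over the hyperparameter grid
grid = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```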