
Iterative Dichotomiser 3 (ID3) Algorithm From Scratch

In the realm of machine learning and data mining, decision trees stand as versatile tools for classification and prediction tasks. The ID3 (Iterative Dichotomiser 3) algorithm is one of the foundational pillars upon which decision tree learning is built. Developed by Ross Quinlan in the 1980s, ID3 remains a fundamental algorithm: it is the direct ancestor of C4.5 and shares its core ideas with related tree-based methods such as CART (Classification and Regression Trees).

Introduction to Decision Trees

Decision trees are machine learning models that recursively partition the input data according to feature values in order to reach a decision. Every internal node represents a test on a feature, and every branch denotes a possible outcome of that test. The tree structure makes the model simple to interpret and visualize. Every leaf node produces a class label or prediction. At each step of construction, the best feature is chosen so as to maximize information gain (equivalently, to minimize impurity). Decision trees are versatile and can be used for both classification and regression tasks. Although they can overfit, this is frequently mitigated by techniques such as pruning.



Decision Trees

Before delving into the intricacies of the ID3 algorithm, let’s grasp the essence of decision trees. Picture a tree-like structure where each internal node represents a test on an attribute, each branch signifies an outcome of that test, and each leaf node denotes a class label or a decision. Decision trees mimic human decision-making processes by recursively splitting data based on different attributes to create a flowchart-like structure for classification or regression.

ID3 Algorithm

The Iterative Dichotomiser 3 (ID3) algorithm is a well-known decision tree method in machine learning. It recursively builds a tree by choosing, at each node, the attribute that best partitions the data according to information gain. The goal is to make the resulting subsets as homogeneous as possible: ID3 grows the tree by repeatedly selecting the feature that yields the greatest reduction in entropy (uncertainty). The process continues until a stopping criterion is met, such as a minimum subset size or a maximum tree depth. Although ID3 is a foundational method, later algorithms such as C4.5 and CART have addressed several of its limitations.



How ID3 Works

The ID3 algorithm is specifically designed for building decision trees from a given dataset. Its primary objective is to construct a tree that best explains the relationship between attributes in the data and their corresponding class labels.

1. Selecting the Best Attribute: at each node, ID3 evaluates every candidate attribute and picks the one with the highest information gain as the splitting criterion.

2. Creating Tree Nodes: the chosen attribute becomes an internal node, and one branch is created per attribute value (or per side of a threshold for numeric features), with the data partitioned accordingly.

3. Stopping Criteria: recursion stops when a subset is pure (all instances share one class), no attributes remain, or no split yields positive information gain; the node then becomes a leaf labelled with the (majority) class.

4. Handling Missing Values: classic ID3 has no native support for missing values, so in practice they are imputed or filtered out before training (C4.5 later added built-in handling).

5. Tree Pruning: after (or during) construction, branches that add little predictive value can be pruned to reduce overfitting. A compact sketch of how the first three steps fit together appears right after this list.
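To see how steps 1-3 interact before diving into the full implementation, here is a compact, self-contained sketch of the classic ID3 recursion for purely categorical attributes. The names (id3_outline, class_entropy) and the dictionary-based tree representation are illustrative choices for this sketch only; missing-value handling and pruning are intentionally omitted, and the binary, threshold-based implementation developed later in this article does not depend on it.

from collections import Counter
import math

def class_entropy(labels):
    # Entropy of a list of class labels
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def id3_outline(rows, labels, attributes):
    # Stopping criteria: pure subset, or no attributes left to split on
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:
        return Counter(labels).most_common(1)[0][0]  # majority class

    # Selecting the best attribute: the one with the highest information gain
    def info_gain(attr):
        remainder = 0.0
        for v in set(row[attr] for row in rows):
            subset = [l for row, l in zip(rows, labels) if row[attr] == v]
            remainder += (len(subset) / len(rows)) * class_entropy(subset)
        return class_entropy(labels) - remainder

    best = max(attributes, key=info_gain)

    # Creating tree nodes: one branch per observed value of the chosen attribute
    tree = {best: {}}
    for v in set(row[best] for row in rows):
        keep = [(row, l) for row, l in zip(rows, labels) if row[best] == v]
        sub_rows = [row for row, _ in keep]
        sub_labels = [l for _, l in keep]
        tree[best][v] = id3_outline(sub_rows, sub_labels,
                                    [a for a in attributes if a != best])
    return tree

Here each row is assumed to be a dictionary mapping attribute names to values, and the returned tree is a nested dictionary whose leaves are class labels.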

Mathematical Concepts of ID3 Algorithm

Now let’s examine the formulas linked to the main theoretical ideas in the ID3 algorithm:

1. Entropy

Entropy is a measure of disorder or uncertainty in a set of data. ID3 uses entropy to quantify the impurity of a dataset; the objective is to reduce entropy by dividing the data into subsets that are as homogeneous as possible.

For a set S with classes {c1, c2, …, cn}, the entropy is calculated as:

Entropy(S) = -\sum_{i=1}^{n} p_i \log_2(p_i)

where p_i is the proportion of instances of class c_i in the set.
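As a quick worked example (the counts are illustrative, borrowed from the well-known play-tennis dataset), a set S of 14 instances with 9 positive and 5 negative examples has entropy:

Entropy(S) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 0.940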

2. Information Gain

Information Gain measures how well a given attribute reduces uncertainty. At each step, ID3 splits the data on the attribute that maximizes Information Gain, computed as the difference between the entropy before the split and the weighted entropy after the split.

For a set S and an attribute A with possible values Values(A), the Information Gain is:

Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|}\, Entropy(S_v)

where |S_v| is the size of the subset of S for which attribute A has value v, and |S| is the size of S.
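Continuing the same illustrative numbers, suppose an attribute Wind splits the 14 instances into 8 "Weak" examples (6 positive, 2 negative, entropy ≈ 0.811) and 6 "Strong" examples (3 positive, 3 negative, entropy = 1.0). Then:

Gain(S, Wind) = 0.940 - \frac{8}{14}(0.811) - \frac{6}{14}(1.0) \approx 0.048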

3. Gain Ratio

Gain Ratio is a refinement of Information Gain that accounts for attributes with many distinct values. Plain Information Gain is biased toward such attributes (in the extreme, an identifier-like attribute splits the data into singletons and appears maximally informative), and Gain Ratio corrects for this by normalizing the gain.
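The usual definition, as popularized by C4.5, normalizes the Information Gain by the split information of the attribute:

GainRatio(S, A) = \frac{Gain(S, A)}{SplitInformation(S, A)}

SplitInformation(S, A) = -\sum_{v \in Values(A)} \frac{|S_v|}{|S|}\log_2\frac{|S_v|}{|S|}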

Iterative Dichotomiser 3 (ID3) Implementation using Python

Let’s create a simplified version of the ID3 algorithm from scratch using Python.

Importing Libraries

Importing the necessary libraries:

from collections import Counter
import numpy as np


Defining Node Class

class Node:
    def __init__(self, feature=None, value=None, results=None, true_branch=None, false_branch=None):
        self.feature = feature  # Feature to split on
        self.value = value      # Value of the feature to split on
        self.results = results  # Stores class labels if node is a leaf node
        self.true_branch = true_branch  # Branch for values that are True for the feature
        self.false_branch = false_branch  # Branch for values that are False for the feature


The provided Python code defines a class called Node for constructing nodes in a decision tree. Each node encapsulates information crucial for decision-making within the tree. The feature attribute signifies the feature used for splitting, while value stores the specific value of that feature for the split. In the case of a leaf node, results holds class labels. The node also has branches, with true_branch representing the path for values evaluating to True for the feature, and false_branch for values evaluating to False. This class forms a fundamental building block for creating decision trees, enabling the representation of decision points and outcomes in a hierarchical structure.
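As a small illustration (the values here are hypothetical, chosen to mirror the toy dataset used later in this article), a leaf node and an internal node could be created like this:

leaf_no = Node(results=0)              # leaf predicting class 0
leaf_yes = Node(results=1)             # leaf predicting class 1
root = Node(feature=0, value=0,        # test: is feature 0 <= 0 ?
            true_branch=leaf_no,       # samples with feature 0 <= 0
            false_branch=leaf_yes)     # samples with feature 0 > 0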

Entropy Calculation Function

def entropy(data):
    # data is expected to be a 1-D array of non-negative integer class labels
    counts = np.bincount(data)
    probabilities = counts / len(data)
    # Sum -p * log2(p) over the non-zero class probabilities
    entropy = -np.sum([p * np.log2(p) for p in probabilities if p > 0])
    return entropy


The entropy function calculates the entropy of a given dataset using the information-entropy formula defined above. It first computes the count of occurrences of each unique label with np.bincount, converts these counts into probabilities, and then evaluates -Σ p_i log2(p_i). The list comprehension skips zero probabilities so the logarithm is never taken of zero, avoiding mathematical errors. The result is the entropy of the input labels, reflecting their degree of disorder or uncertainty.
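A quick sanity check of this function (assuming numpy has been imported as above): a perfectly balanced binary label array has entropy 1 bit, while a pure array has entropy 0.

print(entropy(np.array([1, 1, 0, 0])))  # 1.0  -> maximum uncertainty for two classes
print(entropy(np.array([1, 1, 1, 1])))  # -0.0 -> a pure set has (effectively) zero entropy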

Splitting Data Function

def split_data(X, y, feature, value):
    true_indices = np.where(X[:, feature] <= value)[0]
    false_indices = np.where(X[:, feature] > value)[0]
    true_X, true_y = X[true_indices], y[true_indices]
    false_X, false_y = X[false_indices], y[false_indices]
    return true_X, true_y, false_X, false_y


The split_data function divides a dataset into two subsets based on a specified feature and threshold value. It uses NumPy to identify indices where the feature values satisfy the condition (<= value for the true branch and > value for the false branch). Then, it extracts the corresponding subsets for features (true_X and false_X) and labels (true_y and false_y). The function returns these subsets, enabling the partitioning of data for further use in constructing a decision tree.
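For instance, applying split_data to the toy dataset used at the end of this article, with feature 0 and threshold 0, separates the two classes completely:

X = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])
y = np.array([1, 1, 0, 0])

true_X, true_y, false_X, false_y = split_data(X, y, feature=0, value=0)
print(true_y)   # [0 0] -> rows where feature 0 <= 0
print(false_y)  # [1 1] -> rows where feature 0 > 0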

Building the Tree Function

def build_tree(X, y):
    # Base case: all samples share one label, so create a leaf node
    if len(set(y)) == 1:
        return Node(results=y[0])
 
    best_gain = 0
    best_criteria = None
    best_sets = None
    n_features = X.shape[1]
 
    current_entropy = entropy(y)
 
    for feature in range(n_features):
        feature_values = set(X[:, feature])
        for value in feature_values:
            true_X, true_y, false_X, false_y = split_data(X, y, feature, value)
            true_entropy = entropy(true_y)
            false_entropy = entropy(false_y)
            p = len(true_y) / len(y)
            gain = current_entropy - p * true_entropy - (1 - p) * false_entropy
 
            if gain > best_gain:
                best_gain = gain
                best_criteria = (feature, value)
                best_sets = (true_X, true_y, false_X, false_y)
 
    if best_gain > 0:
        true_branch = build_tree(best_sets[0], best_sets[1])
        false_branch = build_tree(best_sets[2], best_sets[3])
        return Node(feature=best_criteria[0], value=best_criteria[1], true_branch=true_branch, false_branch=false_branch)
 
    # Fallback leaf: no split yields positive gain, so predict the majority class
    return Node(results=Counter(y).most_common(1)[0][0])


The build_tree function recursively constructs a decision tree using the ID3 algorithm. It first checks if the labels in the current subset are homogenous; if so, it creates a leaf node with the corresponding class label. Otherwise, it iterates through all features and values, calculating information gain for each split and identifying the one with the highest gain. The function then recursively calls itself to build the true and false branches using the best split criteria. The resulting decision tree is constructed and returned. The process continues until further splits do not yield positive information gain, resulting in the creation of leaf nodes.
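To inspect the structure that build_tree produces, a small helper like the following can be handy. This print_tree function is an illustrative addition rather than part of the original implementation; it relies only on the Node attributes defined earlier.

def print_tree(node, indent=""):
    # Leaf node: print the stored class label
    if node.results is not None:
        print(f"{indent}Predict: {node.results}")
        return
    # Internal node: print the splitting rule, then recurse into both branches
    print(f"{indent}Is feature[{node.feature}] <= {node.value}?")
    print(f"{indent}--> True branch:")
    print_tree(node.true_branch, indent + "    ")
    print(f"{indent}--> False branch:")
    print_tree(node.false_branch, indent + "    ")

For the toy dataset built below, this prints a single split on feature 0 followed by two leaves.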

Prediction Function

def predict(tree, sample):
    if tree.results is not None:
        return tree.results
    else:
        branch = tree.false_branch
        if sample[tree.feature] <= tree.value:
            branch = tree.true_branch
        return predict(branch, sample)


The predict function uses a trained decision tree to predict the class label for a given sample. It recursively traverses the tree: if the current node is a leaf (indicated by a non-None results attribute), it returns the stored class label. Otherwise, it compares the sample's value for the node's splitting feature against the node's threshold to choose the true or false branch, and calls itself on that branch until a leaf is reached, yielding the final prediction for the input sample.

Dataset and Tree Building

X = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])
y = np.array([1, 1, 0, 0])
 
# Building the tree
decision_tree = build_tree(X, y)


The code creates a dataset X with binary features and their corresponding labels y. Then, it constructs a decision tree using the build_tree function, which recursively builds the tree using the ID3 algorithm based on the provided dataset. The resulting decision_tree is the root node of the constructed decision tree.

Prediction

sample = np.array([1, 0])
prediction = predict(decision_tree, sample)
print(f"Prediction for sample {sample}: {prediction}")


Output:

Prediction for sample [1 0]: 1


Advantages and Limitations of ID3

Advantages

- Simple to understand and implement; the resulting tree is easy to interpret and visualize.
- Requires little data preparation and handles categorical attributes naturally.
- Trains quickly on small to medium datasets, and the greedy entropy-based splits usually produce compact trees.

Limitations

- Prone to overfitting, especially on noisy data, unless the tree depth is limited or the tree is pruned.
- Information Gain is biased toward attributes with many distinct values.
- Classic ID3 does not natively handle continuous attributes or missing values (extensions such as C4.5 address this), and its greedy search can miss globally better trees.

Conclusion

The ID3 algorithm laid the groundwork for decision tree learning, providing a robust framework for understanding attribute selection and recursive partitioning. Despite its limitations, ID3’s simplicity and interpretability have paved the way for more sophisticated algorithms that address its drawbacks while retaining its essence.

As machine learning continues to evolve, the ID3 algorithm remains a crucial piece in the mosaic of tree-based methods, serving as a stepping stone for developing more advanced and accurate models in the quest for efficient data analysis and pattern recognition.

