
Sklearn | Iterative Dichotomiser 3 (ID3) Algorithms

Last Updated : 26 Dec, 2023

The ID3 algorithm is a popular decision tree algorithm used in machine learning. It aims to build a decision tree by iteratively selecting the best attribute to split the data based on information gain. Each node represents a test on an attribute, and each branch represents a possible outcome of the test. The leaf nodes of the tree represent the final classifications. In this article, we will learn how to use the ID3 algorithm to build a decision tree to predict the output in detail.

What is a Decision Tree?

A decision tree is a flowchart-like structure in which internal nodes represent tests on features, branches represent the outcomes of those tests, and leaf nodes represent the final predictions. This versatile supervised machine-learning technique applies to both classification and regression problems. Decision trees are valued for their interpretability, as the rules they generate are easy to understand.

What is the Iterative Dichotomiser 3 Algorithm?

ID3, or Iterative Dichotomiser 3, is an algorithm used in machine learning for building decision trees from a given dataset. It was developed by Ross Quinlan in 1986. It is a greedy algorithm that builds a decision tree by recursively partitioning the dataset into smaller and smaller subsets until all data points in each subset belong to the same class. It employs a top-down approach, recursively selecting the feature to split the dataset on based on information gain.

The ID3 (Iterative Dichotomiser 3) algorithm is a classic decision tree algorithm used primarily for classification tasks. ID3 deals with categorical attributes, which means it can efficiently handle features that take a discrete set of values; this makes it well suited to problems where the input features are categorical rather than continuous. One of the strengths of ID3 is its ability to generate interpretable decision trees: the resulting tree structure is easy to understand and visualize, providing insight into the decision-making process. However, ID3 can be sensitive to noisy data and is prone to overfitting, capturing details of the training data that do not generalize to new, unseen data.

How does the ID3 Algorithm work?

The ID3 algorithm builds a decision tree, a hierarchical structure that classifies data points into different categories, by splitting the dataset into smaller subsets based on the values of its features. The tree is built top-down, starting with the root node, which represents the entire dataset. At each node, the algorithm selects the attribute that provides the highest information gain about the target variable, that is, the attribute that best separates the data points into different categories, and splits the data on it.

ID3 metrics

The ID3 algorithm utilizes metrics related to information theory, particularly entropy and information gain, to make decisions during the tree-building process.

Information Gain and Attribute Selection

The ID3 algorithm uses entropy, a measure of disorder (impurity) in a dataset, to calculate the information gain of each attribute. A dataset with high entropy is one where the data points are evenly distributed across the different categories; a dataset with low entropy is one where the data points are concentrated in one or a few categories.

H(S) = -\sum_{i} p_i \log_2(p_i)

  • where p_i is the proportion of instances in S that belong to class i,
  • S is the current dataset,
  • i ranges over the set of classes in S.

If entropy is low, the node is largely homogeneous and little additional information is needed; if it is high, more information is required to separate the classes. Preprocessing the data before using ID3 can enhance accuracy. In sum, ID3 seeks to reduce uncertainty and make informed decisions by picking the attributes that offer the most insight into the dataset.
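
For intuition, here is a small, hypothetical worked example (the class counts are invented for illustration and are not taken from the dataset used later in this article): a node containing 9 positive and 5 negative examples.

Python3

import math

# Hypothetical node with 9 positive and 5 negative examples (illustration only)
p_pos = 9 / 14
p_neg = 5 / 14

entropy = -(p_pos * math.log2(p_pos) + p_neg * math.log2(p_neg))
print(f"Entropy: {entropy:.3f}")  # ~0.940, a fairly impure node

# A node whose examples all belong to one class has entropy 0 (pure),
# while an even 50/50 split between two classes has the maximum entropy of 1.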

Information gain assesses how much valuable information an attribute can provide. We select the attribute with the highest information gain, which signifies its potential to contribute the most to understanding the data. If information gain is high, it implies that the attribute offers a significant insight. ID3 acts like an investigator, making choices that maximize the information gain in each step. This approach aims to minimize uncertainty and make well-informed decisions, which can be further enhanced by preprocessing the data.

IG(S, A) = H(S) - \sum_{v \in \text{values}(A)} \frac{|S_v|}{|S|} \, H(S_v)

  • where |S| is the total number of instances in the dataset S,
  • |S_v| is the number of instances for which attribute A takes the value v,
  • H(S) is the entropy of the dataset and H(S_v) is the entropy of the subset S_v.
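
To make the formula concrete, the following sketch uses a tiny, made-up dataset (ten samples, a binary feature A and a binary target; the values are invented purely for illustration) and computes IG(S, A) directly from the definitions above.

Python3

import math

def entropy(labels):
    # Entropy of a list of class labels
    total = len(labels)
    return -sum((labels.count(c) / total) * math.log2(labels.count(c) / total)
                for c in set(labels))

# Made-up dataset: feature A takes values 'x' or 'y', the target is 0/1
A      = ['x', 'x', 'x', 'x', 'y', 'y', 'y', 'y', 'y', 'y']
target = [1,   1,   1,   0,   0,   0,   0,   0,   1,   0]

H_S = entropy(target)  # entropy of the whole dataset

# Weighted average entropy of the subsets produced by splitting on A
weighted = sum(
    (A.count(v) / len(A)) * entropy([t for a, t in zip(A, target) if a == v])
    for v in set(A)
)

print(f"IG(S, A) = {H_S - weighted:.3f}")  # ~0.256 bits of uncertainty removed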

What are the steps in the ID3 algorithm?

  1. Determine the entropy of the overall dataset from its class distribution.
  2. For each feature:
    • Calculate the entropy of the subset of data associated with each of the feature's values.
    • Compute the feature's information gain as the dataset entropy minus the weighted average entropy of those subsets.
  3. Choose the feature that yields the highest information gain.
  4. Split the data on that feature and repeat the above steps recursively on each subset to build the decision tree.

Pseudocode of ID3

def ID3(D, A):
  if D is pure or A is empty:
    return a leaf node with the majority class in D
  else:
    A_best = argmax(InformationGain(D, A))
    root = Node(A_best)
    for v in values(A_best):
      D_v = subset(D, A_best, v)
      child = ID3(D_v, A - {A_best})
      root.add_child(v, child)
    return root
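
Note that scikit-learn does not ship a literal ID3 implementation, but DecisionTreeClassifier with criterion='entropy' uses the same information-gain idea to choose its splits. The sketch below is a rough approximation under that assumption; the toy data is invented, and the features are one-hot encoded because scikit-learn trees expect numeric inputs, whereas ID3 handles categorical values natively.

Python3

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Tiny made-up categorical dataset (illustration only)
data = pd.DataFrame({
    'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Overcast'],
    'Windy':   ['No', 'Yes', 'No', 'No', 'Yes', 'Yes'],
    'Play':    ['No', 'No', 'Yes', 'Yes', 'No', 'Yes'],
})

# One-hot encode the categorical features for scikit-learn
X = pd.get_dummies(data[['Outlook', 'Windy']])
y = data['Play']

# criterion='entropy' makes the splits information-gain based, as in ID3
clf = DecisionTreeClassifier(criterion='entropy', random_state=0)
clf.fit(X, y)

print(export_text(clf, feature_names=list(X.columns)))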

Advantages of ID3

  • Simple and easy to understand.
  • Requires little training data.
  • Works well with data that has discrete (categorical) attributes; continuous attributes need to be discretized first.

Disadvantages of ID3

  • Can lead to overfitting.
  • May not be effective with data with many attributes.

Applications of ID3

  1. Fraud detection: ID3 can be used to develop models that can detect fraudulent transactions or activities.
  2. Medical diagnosis: ID3 can be used to develop models that can diagnose diseases or medical conditions.
  3. Customer segmentation: ID3 can be used to segment customers into different groups based on their demographics, purchase history, or other factors.
  4. Risk assessment: ID3 can be used to assess risk in a variety of different areas, such as insurance, finance, and healthcare.
  5. Recommendation systems: ID3 can be used to develop recommendation systems that can recommend products, services, or content to users based on their past behavior or preferences.
    • Tree-based models descended from the ideas ID3 introduced are used in production systems of this kind, for example in product and content recommendation at large platforms such as Amazon, Netflix, and Spotify, in credit-risk scoring at lenders, and in clinical decision support in healthcare, although modern systems typically rely on more advanced successors (such as C4.5, CART, or tree ensembles) rather than on ID3 itself.

Python Implementation of the ID3 algorithm

Python libraries make it very easy for us to handle the data and perform typical and complex tasks with a single line of code. Python is a programming language that is widely used for machine learning, data analysis, and visualization. To use Python for the ID3 decision tree algorithm, we need to import the following libraries:

  • pandas: For data analysis and manipulation
  • scikit-learn: For machine learning algorithms (DecisionTreeClassifier and plot_tree)
  • matplotlib: For visualizing the resulting tree
  • math: For the logarithms used in the entropy calculations

Python3

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt
import math

                    

Importing Dataset

You can download the .csv file from: https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database

Python3

df = pd.read_csv('/content/diabetes.csv')
df.head()

                    

Output:

   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction  Age  Outcome
0            6      148             72             35        0  33.6                     0.627   50        1
1            1       85             66             29        0  26.6                     0.351   31        0
2            8      183             64              0        0  23.3                     0.672   32        1
3            1       89             66             23       94  28.1                     0.167   21        0
4            0      137             40             35      168  43.1                     2.288   33        1


Step 1: Calculating Entropy for the Dataset

The code defines a function, calculate_entropy, which computes entropy for a dataset based on a specified target column. It starts by determining the total number of rows and unique values in the target column. Then, it iterates through these values, calculating the proportion of instances for each value and updating the entropy accordingly.

Python3

def calculate_entropy(data, target_column):
    total_rows = len(data)
    target_values = data[target_column].unique()
 
    entropy = 0
    for value in target_values:
        # Calculate the proportion of instances with the current value
        value_count = len(data[data[target_column] == value])
        proportion = value_count / total_rows
        entropy -= proportion * math.log2(proportion)
 
    return entropy
 
entropy_outcome = calculate_entropy(df, 'Outcome')
print(f"Entropy of the dataset: {entropy_outcome}")

                    

Output:

Entropy of the dataset: 0.9331343166407831

Step 2: Calculating Entropy and Information Gain

Two functions are defined below for calculating entropy and information gain. The `calculate_entropy` function computes entropy for a given dataset and target column by iterating over unique values in the target column, determining the proportion of instances for each value, and using the log formula to update the entropy.

The `calculate_information_gain` function calculates the information gain for a specified feature by computing the weighted average entropy of the subsets created by splitting the data on that feature. The final information gain is obtained by subtracting this weighted entropy from the overall entropy of the dataset. This approach helps assess how effective a feature is at reducing uncertainty about the target variable.

Python3

def calculate_entropy(data, target_column):
    # Entropy of the target column within the given (sub)set of data
    total_rows = len(data)
    target_values = data[target_column].unique()

    entropy = 0
    for value in target_values:
        # Calculate the proportion of instances with the current value
        value_count = len(data[data[target_column] == value])
        proportion = value_count / total_rows
        entropy -= proportion * math.log2(proportion) if proportion != 0 else 0

    return entropy


def calculate_information_gain(data, feature, target_column):
    # Entropy of the dataset before the split
    total_entropy = calculate_entropy(data, target_column)

    # Weighted average entropy of the subsets created by splitting on the feature
    unique_values = data[feature].unique()
    weighted_entropy = 0

    for value in unique_values:
        subset = data[data[feature] == value]
        proportion = len(subset) / len(data)
        weighted_entropy += proportion * calculate_entropy(subset, target_column)

    # Information gain = entropy before the split minus weighted entropy after it
    information_gain = total_entropy - weighted_entropy

    return information_gain

                    

Step 3: Assessing the feature with the highest information gain

The code iterates over each column in the DataFrame, excluding the last column (‘Outcome’), and calculates both the entropy of that column and its information gain with respect to the target variable (‘Outcome’). For each iteration, it computes the entropy using the calculate_entropy function and the information gain using the calculate_information_gain function, which reveals the feature with the highest information gain.

Python3

for column in df.columns[:-1]:
    entropy = calculate_entropy(df, column)
    information_gain = calculate_information_gain(df, column, 'Outcome')
    print(f"{column} - Entropy: {entropy:.3f}, Information Gain: {information_gain:.3f}")

                    

Output:

Pregnancies - Entropy: 3.482, Information Gain: 0.062
Glucose - Entropy: 6.751, Information Gain: 0.304
BloodPressure - Entropy: 4.792, Information Gain: 0.059
SkinThickness - Entropy: 4.586, Information Gain: 0.082
Insulin - Entropy: 4.682, Information Gain: 0.277
BMI - Entropy: 7.594, Information Gain: 0.344
DiabetesPedigreeFunction - Entropy: 8.829, Information Gain: 0.651
Age - Entropy: 5.029, Information Gain: 0.141

Once the ID3 algorithm has selected the attribute with the highest information gain, it splits the data set on that attribute. The data points with each value of the attribute are placed in a separate subset. The ID3 algorithm is then recursively applied to each subset, until all of the subsets are pure, meaning that all of the data points in each subset belong to the same category.

Let’s plot the decision tree built so far:

Python3

# Feature selection for the first step in making decision tree
selected_feature = 'DiabetesPedigreeFunction'
 
# Create a decision tree
clf = DecisionTreeClassifier(criterion='entropy', max_depth=1)
X = df[[selected_feature]]
y = df['Outcome']
clf.fit(X, y)
 
plt.figure(figsize=(8, 6))
plot_tree(clf, feature_names=[selected_feature], class_names=['0', '1'], filled=True, rounded=True)
plt.show()

                    

Output:


[Decision tree plot: a single entropy-based split on DiabetesPedigreeFunction]

Step 4: Building the ID3 Algorithm

Python3

def id3(data, target_column, features):
    # If every instance has the same class, return that class as a leaf
    if len(data[target_column].unique()) == 1:
        return data[target_column].iloc[0]

    # If there are no features left to split on, return the majority class
    if len(features) == 0:
        return data[target_column].mode().iloc[0]

    # Pick the feature with the highest information gain
    best_feature = max(features, key=lambda x: calculate_information_gain(data, x, target_column))

    tree = {best_feature: {}}

    # Remove the chosen feature from the list of candidates
    features = [f for f in features if f != best_feature]

    # Recurse on the subset produced by each value of the best feature
    for value in data[best_feature].unique():
        subset = data[data[best_feature] == value]
        tree[best_feature][value] = id3(subset, target_column, features)

    return tree
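
The function above only defines the tree builder. To obtain a nested-dictionary tree like the output shown below, it has to be called with the DataFrame, the target column, and the list of candidate features; the exact call used to produce the original output is not shown, so the invocation below is an assumption.

Python3

# Assumed invocation: use every column except 'Outcome' as a candidate feature
features = list(df.columns[:-1])
tree = id3(df, 'Outcome', features)
print(tree)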

                    

Output:

{‘DiabetesPedigreeFunction’: {0.627: 1, 0.351: 0, 0.672: 1, 0.167: 0, 2.288: 1, 0.201: 0, 0.248: {‘Pregnancies’: {3: 1, 1: 0}}, 0.134: 0, 0.158: ………………………. {‘Pregnancies’: {0: 1, 2: 0}}, 0.133: 0, 0.155: 0, 1.162: 0, 1.292: 1, 0.182: 0, 1.394: 1, 0.217: 0, 0.631: 0, 0.88: 0, 0.614: 0, 0.332: 0, 0.366: 0, 0.181: 0, 0.828: 0, 0.335: 1, 0.856: 0, 0.886: 0, 0.439: {‘Pregnancies’: {7: 1, 5: 0}}, 0.253: 0, 0.598: 0, 0.904: 0, 0.483: 0, 0.565: 1, 0.118: 0, 0.177: 0, 0.176: 0, 0.295: 0, 0.441: 1, 0.352: 0, 0.826: 1, 0.97: 1, 0.595: 0, 0.317: 0, 0.265: 0, 0.646: 1, 0.426: 0, 0.56: 0, 0.515: 0, 0.453: 0, 0.785: 1, 0.734: 1, 1.174: 0, 0.488: 0, 0.358: 1, 1.096: 0, 0.408: 1, 1.182: 1, 0.222: 1, 1.057: 1, 0.766: 0, 0.171: 0}}

Conclusion

In conclusion, ID3 is a way of solving large problems by breaking them into small parts. It starts with a big puzzle, and in each step, it looks at the best clues to figure things out. For example, it may find that age is the best clue to predict if someone will buy a game. Then it goes deeper, looking at income or other clues to make the final decision. It keeps on doing this until it can’t find better clues. ID3 works by making sense of how messy the puzzle is. If a group of people has a clear pattern, like mostly saying “Yes” to buying a game, that becomes a rule for solving the mystery. So, ID3 helps us find answers in complex data.


