Implementing the AdaBoost Algorithm From Scratch

  • Last Updated : 16 Mar, 2021

AdaBoost models belong to a class of ensemble machine learning models. As the word ‘ensemble’ suggests, these models combine several different models into a single meta-model that is usually more accurate than any of its individual members. The working of ensemble models is covered in the article Ensemble Classifier | Data Mining.

The AdaBoost algorithm is an ensemble boosting technique. As discussed, it combines multiple models to produce more accurate results, and it does so in two phases:

  1. Multiple weak learners are trained on the training data, one after another.
  2. These models are combined into a meta-model, which aims to correct the errors made by the individual weak learners.
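
To make the two phases concrete, here is a minimal sketch of discrete AdaBoost written with NumPy only. It is not the implementation used by scikit-learn and it is not the code used in the rest of this article. It assumes a two-class problem with labels -1 and +1 and uses decision stumps (one-feature threshold rules) as weak learners. The function names `fit_stump`, `stump_predict`, `adaboost_fit` and `adaboost_predict`, as well as the toy data, are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_stump(X, y, w):
    """Return the one-feature threshold rule with the lowest weighted error."""
    best = {"error": np.inf}
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for polarity in (1, -1):
                # polarity 1 predicts +1 when x >= threshold,
                # polarity -1 predicts +1 when x <= threshold
                pred = np.where(polarity * (X[:, feature] - threshold) >= 0, 1, -1)
                error = w[pred != y].sum()
                if error < best["error"]:
                    best = {"error": error, "feature": feature,
                            "threshold": threshold, "polarity": polarity}
    return best


def stump_predict(stump, X):
    """Apply a fitted stump and return predictions in {-1, +1}."""
    diff = X[:, stump["feature"]] - stump["threshold"]
    return np.where(stump["polarity"] * diff >= 0, 1, -1)


def adaboost_fit(X, y, n_rounds=10):
    """Phase 1: train weak learners one after another on reweighted data."""
    n_samples = len(y)
    w = np.full(n_samples, 1.0 / n_samples)   # start with uniform weights
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        error = max(stump["error"], 1e-10)     # guard against log of infinity
        if error >= 0.5:                       # no better than random guessing
            break
        alpha = 0.5 * np.log((1.0 - error) / error)   # say of this learner
        pred = stump_predict(stump, X)
        w = w * np.exp(-alpha * y * pred)      # raise weights of mistakes
        w = w / w.sum()
        ensemble.append((alpha, stump))
    return ensemble


def adaboost_predict(ensemble, X):
    """Phase 2: combine the weak learners through a weighted vote."""
    scores = np.zeros(X.shape[0])
    for alpha, stump in ensemble:
        scores += alpha * stump_predict(stump, X)
    return np.where(scores >= 0, 1, -1)


# Toy data (assumed for illustration): the -1 class sits between two +1 groups
X_toy = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y_toy = np.array([1, 1, -1, -1, 1, 1])

single = adaboost_fit(X_toy, y_toy, n_rounds=1)
boosted = adaboost_fit(X_toy, y_toy, n_rounds=3)
print("Accuracy with 1 stump :", (adaboost_predict(single, X_toy) == y_toy).mean())
print("Accuracy with 3 stumps:", (adaboost_predict(boosted, X_toy) == y_toy).mean())
```

On this toy data no single stump can fit the pattern, while the weighted vote of three stumps classifies all six points correctly. The Iris problem below has three classes, so it needs a multi-class variant, which is what the library classifier provides.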

Note: For more information, refer to Boosting ensemble models.

In this article, we are going to walk through a practical implementation of an AdaBoost classifier on a dataset.

In this problem, we are given a dataset containing 3 species of flowers along with four features of each flower: sepal length, sepal width, petal length, and petal width. We have to classify the flowers into these species. The dataset can be downloaded from here



Let’s begin by importing the libraries that we will need for our classification task:

Python




import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
import warnings
warnings.filterwarnings("ignore")

After importing the libraries, we will load our dataset using the pandas read_csv method:

Python




# Reading the dataset from the csv file
# (the file is comma-separated, which is the read_csv default)
data = pd.read_csv("Iris.csv")
  
# Printing the shape of the dataset
print(data.shape)
(150, 6)
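
If the Iris.csv file is not at hand, a similar table can be built from the copy of the Iris data bundled with scikit-learn. The sketch below is an assumption on our part and not part of the original workflow: the name `data_alt` is hypothetical, the column names are renamed by hand to mirror the CSV, and a few measurements in the bundled copy may differ slightly from the CSV version.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the bundled Iris data as pandas objects
iris = load_iris(as_frame=True)

# Rename the four feature columns to mirror the layout of Iris.csv
data_alt = iris.data.copy()
data_alt.columns = ["SepalLengthCm", "SepalWidthCm",
                    "PetalLengthCm", "PetalWidthCm"]

# Add an Id column in front and a Species column at the end
data_alt.insert(0, "Id", range(1, len(data_alt) + 1))
data_alt["Species"] = ["Iris-" + iris.target_names[t] for t in iris.target]

print(data_alt.shape)
```

The resulting frame has the same 150 rows and 6 columns, so the remaining steps can be followed with `data_alt` in place of `data`.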

We can see that our dataset contains 150 rows and 6 columns. Let us take a look at the actual content of the dataset using the head() method:

Python




data.head()
   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa

The first column is the Id column, which carries no information about the flowers, so we will drop it. The Species column is our target variable and tells us the species to which each flower belongs.

Python






data = data.drop('Id',axis=1)
X = data.iloc[:,:-1]
y = data.iloc[:,-1]
print("Shape of X is %s and shape of y is %s"%(X.shape,y.shape))
Shape of X is (150, 4) and shape of y is (150,)

Python




total_classes = y.nunique()
print("Number of unique species in dataset are: ",total_classes)
Number of unique species in dataset are: 3

Python




distribution = y.value_counts()
print(distribution)
Iris-virginica     50
Iris-setosa        50
Iris-versicolor    50
Name: Species, dtype: int64

Looking more closely at the dataset, the output above shows that our flowers are distributed across 3 classes. Each of the three species has 50 of the 150 samples, so there is no class imbalance.

Now we will split the dataset into training and validation sets. The validation set is 25% of the total dataset.

Python




X_train,X_val,Y_train,Y_val = train_test_split(X,y,test_size=0.25,random_state=28)
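
The split above is purely random, so the three species are not guaranteed to appear in equal proportions in the validation set. As an optional variation (not used in the rest of this article), the `stratify` argument of `train_test_split` keeps the class proportions roughly equal in both parts. The sketch below assumes the bundled scikit-learn copy of Iris so that it runs without the CSV file, and the `*_demo`, `*_tr` and `*_va` names are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Bundled copy of Iris; labels are the integers 0, 1 and 2
X_demo, y_demo = load_iris(return_X_y=True)

# stratify=y_demo preserves the class proportions in both parts
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=28, stratify=y_demo
)

print("Training samples:  ", len(y_tr))
print("Validation samples:", len(y_va))
print("Validation samples per class:", np.bincount(y_va))
```

With 150 samples and `test_size=0.25`, scikit-learn rounds the validation set up to 38 samples and leaves 112 for training.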

After creating the training and validation sets, we will build our AdaBoost classifier model and fit it on the training set.

Python




# Creating adaboost classifier model
adb = AdaBoostClassifier()
adb_model = adb.fit(X_train,Y_train)
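
Once fitted, the classifier exposes the weak learners it trained, which helps connect the library model back to the two phases described earlier. The sketch below is only an illustration: it assumes the bundled scikit-learn copy of Iris instead of Iris.csv, fixes `random_state=0` for repeatability, and uses the hypothetical name `demo_model`. The exact numbers depend on the installed scikit-learn version, so none are quoted here.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=28
)

demo_model = AdaBoostClassifier(random_state=0).fit(X_tr, y_tr)

# Phase 1: the individual weak learners and the weight given to each
print("Weak learners trained:", len(demo_model.estimators_))
print("Weights of the first three learners:", demo_model.estimator_weights_[:3])

# Phase 2: validation accuracy of the combined model after each boosting round
staged = list(demo_model.staged_score(X_va, y_va))
print("Accuracy after the first learner:", staged[0])
print("Accuracy after all learners:     ", staged[-1])
```

Comparing the first and last staged scores shows how much the combination adds over a single weak learner on this split.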

Now that the model is fitted on the training set, we will check its accuracy on the validation set.

Python




print("The accuracy of the model on validation set is", adb_model.score(X_val,Y_val))
The accuracy of the model on validation set is 0.9210526315789473

As we can see, the model reaches an accuracy of about 92% on the validation set (35 of the 38 validation samples are classified correctly), which is quite good given that we did no hyperparameter tuning or feature engineering.
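
For readers who want to try the hyperparameter tuning mentioned above, a small grid search over `n_estimators` and `learning_rate` is one possible starting point. This is a sketch under our own assumptions, not a step of the original workflow: the grid values are arbitrary, the data comes from the bundled scikit-learn copy of Iris, and the `search` name is hypothetical. Results vary with the scikit-learn version, so no scores are quoted.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=28
)

# Candidate values chosen only for illustration
param_grid = {
    "n_estimators": [25, 50, 100],
    "learning_rate": [0.1, 0.5, 1.0],
}

# 5-fold cross-validation on the training set for every combination
search = GridSearchCV(
    AdaBoostClassifier(random_state=0), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_tr, y_tr)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
print("Validation accuracy of the best model:", search.score(X_va, y_va))
```

Because the search only ever sees the training set, the validation set stays untouched until the final score is computed.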
