ML – Nearest Centroid Classifier

The Nearest Centroid (NC) Classifier is one of the most underrated and underutilised classifiers in Machine Learning. However, it is quite powerful and is highly efficient for certain Machine Learning classification tasks. The Nearest Centroid classifier is somewhat similar to the K-Nearest Neighbours classifier. To know more about the K-Nearest Neighbours (KNN) classifier, you can refer to the link below:
K-Nearest-Neighbours/

An often-overlooked principle in Machine Learning is to build simple algorithms on simple yet meaningful data that can do specific tasks efficiently, instead of using complex models. This is also called the principle of sufficiency in statistics. The Nearest Centroid classifier is arguably the simplest classification algorithm in Machine Learning. It works on a simple principle: given a data point (observation), the Nearest Centroid classifier simply assigns it the label (class) of the training class whose mean (centroid) is closest to it.
When applied to text classification, the Nearest Centroid classifier is also called the Rocchio classifier. The scikit-learn library in Python offers a simple implementation of the Nearest Centroid Classifier.
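
As a quick illustration of the basic usage, here is a minimal sketch on a tiny, made-up 2-D dataset (the data below is invented for this example and is separate from the iris example later in this article):

# A minimal sketch of NearestCentroid on a tiny, made-up 2-D dataset
import numpy as np
from sklearn.neighbors import NearestCentroid

# Two clusters of points: class 0 near the origin, class 1 near (5, 5)
X = np.array([[0, 0], [1, 1], [1, 0], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = NearestCentroid()
clf.fit(X, y)

print(clf.centroids_)         # per-class centroids learned from the data
print(clf.predict([[2, 2]]))  # [2, 2] is closer to class 0's centroid, so this prints [0]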
How does the Nearest Centroid Classifier work?
What the Nearest Centroid Classifier does can be explained in three steps (a minimal from-scratch sketch follows the list):

  • The centroid for each target class is computed during training.
  • After training, given any point, say ‘X’, the distances between X and each class’s centroid are calculated.
  • Out of all the calculated distances, the minimum is picked. The point X is assigned the class of the centroid to which its distance is minimum.
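
For illustration only, the three steps above can also be sketched from scratch with NumPy. This is a minimal sketch, not the scikit-learn implementation, and the helper names below are made up for this example:

# Illustrative from-scratch sketch of the three steps (hypothetical helper names)
import numpy as np

def fit_centroids(X_train, y_train):
    # Step 1: compute the centroid (mean) of each target class during training
    return {c: X_train[y_train == c].mean(axis=0) for c in np.unique(y_train)}

def predict_nearest_centroid(x, centroids):
    # Step 2: compute the distance from point x to each class centroid
    distances = {c: np.linalg.norm(x - centroid) for c, centroid in centroids.items()}
    # Step 3: pick the minimum distance and assign that centroid's class to x
    return min(distances, key=distances.get)

# Example usage on a tiny, made-up dataset
X = np.array([[0, 0], [1, 1], [5, 5], [6, 6]])
y = np.array([0, 0, 1, 1])
centroids = fit_centroids(X, y)
print(predict_nearest_centroid(np.array([2, 2]), centroids))  # closer to class 0, so this prints 0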

The Nearest Centroid Classifier is quite easy to understand and is one of the simplest classification algorithms.

Implementation of Nearest Centroid Classifier in Python:
For this example, we will be using the popular ‘iris’ dataset that is available in the scikit-learn library. After training the classifier, we will print its accuracy on the training and test sets. Then, we print the classification report.

Code: Python code implementing NearestCentroid classifier


# Importing the required libraries
from sklearn.neighbors import NearestCentroid
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
import pandas as pd
  
# Loading the dataset
dataset = load_iris()
  
# Separating data and target labels
X = pd.DataFrame(dataset.data)
y = pd.DataFrame(dataset.target)
  
# Splitting training and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True, random_state=0)
  
# Creating the Nearest Centroid Classifier
model = NearestCentroid()
  
# Training the classifier
model.fit(X_train, y_train.values.ravel())
  
# Printing Accuracy on Training and Test sets
print(f"Training Set Score : {model.score(X_train, y_train) * 100} %")
print(f"Test Set Score : {model.score(X_test, y_test) * 100} %")
  
# Printing the classification report of the classifier on the test set data
print(f"Model Classification Report : \n{classification_report(y_test, model.predict(X_test))}")



Output:

Training Set Score : 94.16666666666667 %
Test Set Score : 90.0 %
Model Classification Report : 
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        11
           1       0.86      0.92      0.89        13
           2       0.80      0.67      0.73         6

    accuracy                           0.90        30
   macro avg       0.89      0.86      0.87        30
weighted avg       0.90      0.90      0.90        30

So, we have managed to achieve accuracies of 94.17% and 90% on the training and test sets, respectively.
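
Because the trained classifier stores nothing more than one centroid per class, it is easy to inspect what it has learned and to classify new observations. Here is a small follow-up sketch, assuming the model variable from the code above (the sample measurement below is made up):

# Per-class centroids learned from the iris training data (one row per class)
print(model.centroids_)

# Classifying a new, made-up iris measurement (sepal length/width, petal length/width in cm)
print(model.predict([[5.0, 3.5, 1.3, 0.2]]))  # a measurement this small should land near class 0 (setosa)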
Conclusion:
Now that you know what a Nearest Centroid Classifier is and how to implement it, try using it the next time you have a simple classification task that requires a lightweight classifier.
