ML – Nearest Centroid Classifier
The Nearest Centroid (NC) classifier is one of the most underrated and underutilised classifiers in Machine Learning. It is nonetheless quite powerful and highly efficient for certain classification tasks. It is somewhat similar to the K-Nearest Neighbours (KNN) classifier, in that both assign a label based on distances in feature space.
An often-overlooked principle in Machine Learning is to build simple algorithms on simple yet meaningful data, so that they do specific tasks efficiently, instead of reaching for complex models. This is sometimes loosely referred to as the principle of sufficiency. The Nearest Centroid classifier is arguably the simplest classification algorithm in Machine Learning. It works on a simple principle: given a data point (observation), the classifier assigns it the label of the class whose centroid (the mean of that class's training samples) is closest to it.
When applied to text classification, the Nearest Centroid classifier is also called the Rocchio classifier. The scikit-learn library in Python provides the NearestCentroid class (in sklearn.neighbors) to implement the Nearest Centroid classifier.
How does the Nearest Centroid classifier work?
What the Nearest Centroid classifier does can be explained in three steps:
- During training, the centroid of each target class is computed.
- After training, given any point X, the distance between X and each class centroid is calculated.
- Out of all the calculated distances, the minimum is picked, and the class of the corresponding centroid is assigned to X.
The Nearest Centroid classifier is quite easy to understand and is one of the simplest classification algorithms.
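The three steps above can be sketched from scratch with NumPy. This is only an illustrative sketch, not how scikit-learn implements it: the helper names fit_centroids and predict, the choice of Euclidean distance, and the tiny dataset are all assumptions made for the example.

```python
import numpy as np


def fit_centroids(X, y):
    """Step 1: compute one centroid (feature-wise mean) per class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(X, classes, centroids):
    """Steps 2 and 3: measure the distance to every centroid, pick the closest."""
    # distances[i, j] = Euclidean distance from point i to centroid j
    diffs = X[:, None, :] - centroids[None, :, :]
    distances = np.linalg.norm(diffs, axis=2)
    # Index of the nearest centroid for each point, mapped back to its label
    return classes[np.argmin(distances, axis=1)]


# Made-up toy data: two well-separated classes in 2-D
X_train = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
                    [8.0, 8.0], [8.0, 9.0], [9.0, 8.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

classes, centroids = fit_centroids(X_train, y_train)
X_new = np.array([[1.5, 1.5], [7.0, 9.0]])
predictions = predict(X_new, classes, centroids)
print(predictions)  # [0 1]
```

Note that prediction only needs one centroid per class, so the training data itself does not have to be kept around, unlike in KNN.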
Implementation of Nearest Centroid Classifier in Python:
For this example, we will use the popular 'iris' dataset that ships with the scikit-learn library. After training the classifier, we will print its accuracy on the training and test sets. Then, we print the classification report.
Code: Python code implementing NearestCentroid classifier
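A minimal sketch of such a script follows. The 80/20 split is inferred from the 30 test samples in the report, while random_state=0 and the variable names are assumptions, so the exact scores you get may differ from the ones reported.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestCentroid

# Load the iris dataset: 150 samples, 4 features, 3 classes
iris = load_iris()
X, y = iris.data, iris.target

# Hold out 20% of the data (30 samples) for testing.
# random_state=0 is an assumption, used only to make the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Training computes one centroid per class
model = NearestCentroid()
model.fit(X_train, y_train)

# score() returns the mean accuracy as a fraction between 0 and 1
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)
print(f"Training Set Score : {train_score * 100} %")
print(f"Test Set Score : {test_score * 100} %")

# Per-class precision, recall and F1 on the held-out test set
print("Model Classification Report :")
print(classification_report(y_test, model.predict(X_test)))
```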
Output:
Training Set Score : 94.16666666666667 %
Test Set Score : 90.0 %
Model Classification Report :
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        11
           1       0.86      0.92      0.89        13
           2       0.80      0.67      0.73         6

    accuracy                           0.90        30
   macro avg       0.89      0.86      0.87        30
weighted avg       0.90      0.90      0.90        30
So, we have managed to achieve an accuracy of 94.17% and 90% on the training and test sets respectively.
Now that you know what a Nearest Centroid classifier is and how to implement it, try using it the next time you have a simple classification task that calls for a lightweight classifier.