Classifier Comparison in Scikit Learn
Last Updated: 09 Jan, 2023
In scikit-learn, a classifier is an estimator that predicts the label or class of an input sample. Scikit-learn provides many different classifiers, each with its own strengths and weaknesses.
Let's load the Iris dataset from sklearn.datasets and then train several types of classifiers on it, evaluating each with 10-fold cross-validation.
Python3
import numpy as np
from sklearn import datasets
from sklearn.model_selection import cross_val_score

# Load the Iris dataset: 150 samples, 4 features, 3 classes
iris = datasets.load_iris()
X = iris.data
y = iris.target
Support Vector Machines (SVMs)
SVMs are a popular classification algorithm that uses a hyperplane to separate classes in the feature space. They are effective for high-dimensional data and can handle non-linear boundaries.
Python3
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Evaluate an SVM (default RBF kernel) with 10-fold cross-validation
svm = SVC()
svm_scores = cross_val_score(svm, X, y, cv=10)
print('SVM score: %0.3f' % svm_scores.mean())
Output:
SVM score: 0.973
Naive Bayes Classifier
Naive Bayes is a simple but powerful classification algorithm that assumes independence between features. It is fast and efficient, making it a good choice for large datasets.
Python3
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Gaussian Naive Bayes assumes each feature is normally distributed per class
nb = GaussianNB()
nb_scores = cross_val_score(nb, X, y, cv=10)
print('Naive Bayes score: %0.3f' % nb_scores.mean())
Output:
Naive Bayes score: 0.953
Random Forest Classifier
Random forest is an ensemble method that uses multiple decision trees to make predictions. It is often more accurate than a single decision tree and can handle large datasets and complex boundaries.
Python3
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

# A random forest averages many decision trees trained on bootstrap samples
rf = RandomForestClassifier()
rf_scores = cross_val_score(rf, X, y, cv=10)
print('Random Forest score: %0.3f' % rf_scores.mean())
Output:
Random Forest score: 0.967
K-Nearest Neighbors (KNN)
KNN is a non-parametric classification algorithm that predicts the class of a point from the classes of its K nearest neighbours in the training data. It is simple to implement and makes no assumptions about the shape of the decision boundary, though because it relies on distances, features should be scaled to comparable ranges.
Python3
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# KNN with the default of 5 neighbours
knn = KNeighborsClassifier()
knn_scores = cross_val_score(knn, X, y, cv=10)
print('KNN score: %0.3f' % knn_scores.mean())
Output:
KNN score: 0.967
Overall, the best classifier will depend on the specific dataset and the desired outcome. It may be necessary to try multiple classifiers to find the most effective one for a given problem.
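As a sketch of how such a comparison can be organised, the four classifiers above can be evaluated in a single loop over the same cross-validation setup. The `random_state` passed to the random forest is an added assumption here, used only to make the run reproducible; the dictionary of models is likewise just one convenient way to structure the loop.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

# The four classifiers compared above, evaluated under identical conditions
classifiers = {
    'SVM': SVC(),
    'Naive Bayes': GaussianNB(),
    'Random Forest': RandomForestClassifier(random_state=42),
    'KNN': KNeighborsClassifier(),
}

for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=10)
    # Report the standard deviation as well: two models with similar means
    # can differ noticeably in how stable they are across folds
    print('%-13s mean=%0.3f std=%0.3f' % (name, scores.mean(), scores.std()))
```

Reporting the fold-to-fold standard deviation alongside the mean helps judge whether a small difference in mean accuracy is meaningful on a dataset this small.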