Pre-requisites: Data Mining
Data mining is also known as knowledge mining from data, knowledge extraction, data/pattern analysis, data archaeology, and data dredging. In this article, we look at techniques for evaluating the accuracy of classifiers.
HoldOut
In the holdout method, the available dataset is randomly divided into three subsets:
- The training set is the subset of the dataset used to build predictive models.
- The validation set is the subset used to assess the performance of the model built in the training phase. It provides a test platform for fine-tuning the model's parameters and selecting the best-performing model. Not all modeling algorithms need a validation set.
- The test set (unseen examples) is the subset used to assess the likely future performance of the model. If a model fits the training set much better than it fits the test set, overfitting is probably the cause.
Typically, two-thirds of the data is allocated to the training set and the remaining one-third to the test set.
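The holdout split described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `holdout_split` and the 2/3 default are choices made here, not part of any particular library's API.

```python
import random

def holdout_split(data, train_frac=2/3, seed=0):
    """Randomly shuffle `data` and split it into a training set and a test set."""
    rng = random.Random(seed)
    shuffled = data[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

train, test = holdout_split(list(range(30)))
print(len(train), len(test))  # 20 10
```

In practice, libraries such as scikit-learn provide a ready-made `train_test_split` that also handles features and labels together.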
Random Subsampling
- Random subsampling is a variation of the holdout method in which the holdout procedure is repeated K times.
- Each repetition randomly splits the data into a training set and a test set.
- The model is trained on the training set, and the mean squared error (MSE) is computed from its predictions on the test set.
- Because the MSE depends on the particular split, a single holdout estimate is unreliable: a new split can give a different MSE. Averaging over the K repetitions reduces this variability.
- The overall error is calculated as E = \frac{1}{K} \sum_{i=1}^{K} E_i, where E_i is the error from the i-th repetition.
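The procedure above can be sketched as follows. To keep the example self-contained, the "model" is just a mean predictor (it predicts the training-set average); this stand-in is an assumption of the sketch, not part of the method, and any real learner could be substituted.

```python
import random

def subsampling_error(ys, k=5, train_frac=2/3, seed=0):
    """Repeat the holdout split k times and average the test MSE (E = (1/K) * sum E_i)."""
    rng = random.Random(seed)
    n = len(ys)
    errors = []
    for _ in range(k):
        idx = list(range(n))
        rng.shuffle(idx)
        cut = int(n * train_frac)
        train, test = idx[:cut], idx[cut:]
        # Stand-in model: predict the mean of the training targets.
        pred = sum(ys[i] for i in train) / len(train)
        mse = sum((ys[i] - pred) ** 2 for i in test) / len(test)
        errors.append(mse)
    return sum(errors) / k

err = subsampling_error([float(y) for y in range(12)], k=5)
```

Each E_i here is the MSE on one random split; the returned value is their average.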
Cross-Validation
- K-fold cross-validation is used when only a limited amount of data is available, to achieve an unbiased estimate of the model's performance.
- The data is divided into K subsets of equal size.
- We build the model K times, each time leaving out one of the subsets from training and using it as the test set.
- If K equals the sample size, this is called "Leave-One-Out" cross-validation.
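The fold construction behind K-fold cross-validation can be sketched with index arithmetic alone. The helper name `kfold_indices` is an illustration, not a library function; it returns contiguous folds, whereas production implementations usually shuffle the data first.

```python
def kfold_indices(n, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Each of the n samples appears in exactly one test fold; the first
    n % k folds get one extra sample when n is not divisible by k.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

folds = list(kfold_indices(10, 5))
```

Setting `k = n` in this helper produces the Leave-One-Out case: each test fold holds a single sample.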
Bootstrapping
- Bootstrapping is a technique for estimating quantities from data by averaging estimates obtained from smaller data samples.
- The bootstrap method involves iteratively resampling the dataset with replacement.
- With resampling, instead of estimating a statistic only once on the complete data, we can estimate it many times.
- Repeating this many times yields a vector of estimates.
- From these estimates, bootstrapping can compute the variance, expected value, and other relevant statistics.
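The resampling loop above can be sketched as follows, using the sample mean as the statistic of interest. The function name `bootstrap_means` and the choice of statistic are assumptions of this sketch; any statistic (median, accuracy of a classifier, etc.) could be plugged in.

```python
import random

def bootstrap_means(data, n_resamples=1000, seed=0):
    """Resample `data` with replacement n_resamples times; return each resample's mean."""
    rng = random.Random(seed)
    n = len(data)
    return [sum(rng.choice(data) for _ in range(n)) / n
            for _ in range(n_resamples)]

estimates = bootstrap_means([1, 2, 3, 4, 5], n_resamples=200)
# The spread of `estimates` approximates the sampling variability of the mean.
```

The resulting vector of estimates is exactly what the bullet points describe: summary statistics such as variance and expected value are then computed over it.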
Last Updated: 30 Jan, 2023