Difference Between Classification and Prediction methods in Data Mining
Classification and prediction are the two main forms of data analysis used in data mining. Both techniques build a model from the data we already have and use it to draw conclusions about new, unseen data.
Classification is the process of finding a model that describes and distinguishes data classes or concepts. The purpose of classification is to use this model to predict the class of objects whose class label is unknown. In simple terms, classification assigns incoming new data to one of a set of known categories, based on what the model has learned from the labeled data we already have.
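As an illustration of the idea, the following is a minimal sketch that uses a 1-nearest-neighbor rule, which is only one of many possible classifiers and is not prescribed by this article. The training records, their two numeric attributes, and the labels `low_risk` and `high_risk` are hypothetical. The classifier assigns a new record the label of the closest labeled record.

```python
import math

# Hypothetical labeled training records: (attribute values, class label).
TRAINING_SET = [
    ((1.0, 1.0), "low_risk"),
    ((1.5, 2.0), "low_risk"),
    ((6.0, 7.0), "high_risk"),
    ((7.0, 6.5), "high_risk"),
]


def classify(record, training_set=TRAINING_SET):
    """Return the class label of the training record closest to `record`
    (1-nearest-neighbor rule with Euclidean distance)."""
    _, label = min(training_set, key=lambda item: math.dist(item[0], record))
    return label


if __name__ == "__main__":
    print(classify((1.2, 1.4)))  # -> low_risk
    print(classify((6.5, 6.8)))  # -> high_risk
```

The output is always one of the known class labels, never a value in between, which is what makes this a classification task.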
We can think of prediction as estimating something that is not yet known, such as a value that may occur in the future. In prediction, we estimate the missing or unavailable value for a new observation using a model built from the previous data that we have. In prediction, the output is a continuous (numeric) value rather than a class label.
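To contrast this with classification, here is a minimal sketch of a predictor, assuming simple linear regression fitted by ordinary least squares (one common choice of predictor, not the only one). The training pairs in `XS` and `YS` are hypothetical, and `fit_line` and `predict` are illustrative helper names.

```python
# Hypothetical training data: one numeric input attribute and a continuous target.
XS = [1.0, 2.0, 3.0, 4.0, 5.0]
YS = [32.0, 36.0, 40.0, 44.0, 48.0]


def fit_line(xs, ys):
    """Fit y = a + b * x by ordinary least squares and return (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope b = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(x, coefficients):
    """Estimate the continuous target value for a new observation x."""
    a, b = coefficients
    return a + b * x


if __name__ == "__main__":
    print(predict(6.0, fit_line(XS, YS)))  # -> 52.0
```

Unlike the classifier, the predictor can return any real number, including values never seen in the training data.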
Difference between Prediction and Classification:
| | Prediction | Classification |
|---|---|---|
| 1. | Prediction is about predicting a missing or unknown element (a continuous value) of a dataset. | Classification is about determining a categorical class (or label) for an element in a dataset. |
| 2. | Example: predicting the correct treatment for a particular disease for an individual patient can be thought of as prediction. | Example: grouping patients based on their medical records can be considered classification. |
| 3. | The model used to predict the unknown value is called a predictor. | The model used to assign a class label to an unknown record is called a classifier. |
| 4. | A predictor is constructed from a training set, and its accuracy refers to how well it can estimate the value of new data. | A classifier is also constructed from a training set, composed of database records and their corresponding class labels; its accuracy refers to how well it assigns the correct class label to new data. |
Comparison of Classification and Prediction Methods:
The following criteria are used to compare methods of classification and prediction:
- Accuracy: The accuracy of a classifier is its ability to predict class labels correctly, and the accuracy of a predictor is how well it can estimate the unknown value of new data.
- Speed: The speed of the method depends on the computational cost of generating and using the classifier/predictor.
- Robustness: Robustness is the ability of the classifier or predictor to make correct classifications or predictions even when the data is noisy or contains missing values.
- Scalability: Scalability is the ability to construct the classifier or predictor efficiently as the amount of data grows large.
- Interpretability: Interpretability refers to how readily we can understand the reasoning behind the predictions or classifications made by the predictor or classifier.
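The accuracy criterion can be made concrete with a small sketch. For a classifier, accuracy is taken here as the fraction of records labeled correctly; for a predictor, the error is measured with root mean squared error (RMSE). RMSE is an assumption on our part, since the article does not name a specific error measure, and the function names are illustrative. In practice, both measures should be computed on a held-out test set rather than on the training data.

```python
import math


def classification_accuracy(true_labels, predicted_labels):
    """Fraction of records whose predicted class label matches the true label."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def root_mean_squared_error(true_values, predicted_values):
    """Square root of the mean squared difference between true and predicted values."""
    if not true_values or len(true_values) != len(predicted_values):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(true_values, predicted_values))
    return math.sqrt(squared / len(true_values))


if __name__ == "__main__":
    print(classification_accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"]))  # -> 0.75
    print(round(root_mean_squared_error([3.0, 5.0], [2.0, 7.0]), 4))  # -> 1.5811
```

Higher accuracy is better for a classifier, whereas a lower RMSE is better for a predictor.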
Issues regarding Classification and Prediction:
There are two main operations we have to perform on the data before applying classification or prediction methods:
- Data Cleaning: In layman's terms, data cleaning is the preprocessing of the data: removing noise and handling missing or unknown values (for example, by replacing a missing value with the most common value or the mean of that attribute).
- Relevance Analysis: After cleaning the data, we analyze it to keep only the attributes that are relevant to the problem. For example, correlation analysis can identify attributes that are strongly related to each other, and therefore redundant, so that they can be removed. After cleaning and analyzing the data, we may also need to normalize it, because normalization can improve accuracy, particularly for methods that rely on distance measurements. Normalization can be achieved by scaling the values of each attribute into the range 0 to 1 (min-max normalization).
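The preprocessing steps above can be sketched in a few small functions. This is a minimal illustration under stated assumptions: missing values are represented as `None` and filled with the attribute mean (one simple cleaning strategy among several), relevance analysis uses the Pearson correlation coefficient between two attributes, and normalization uses min-max scaling. All function names are hypothetical helpers, not a library API.

```python
import math


def fill_missing_with_mean(values):
    """Data cleaning: replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("no known values to compute a mean from")
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def pearson_correlation(xs, ys):
    """Relevance analysis: correlation between two attributes, in [-1, 1].
    Values close to +1 or -1 suggest that one attribute may be redundant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant attribute")
    return cov / (sx * sy)


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Normalization: linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


if __name__ == "__main__":
    print(fill_missing_with_mean([10.0, None, 30.0]))  # -> [10.0, 20.0, 30.0]
    print(round(pearson_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), 6))  # -> 1.0
    print(min_max_normalize([10.0, 20.0, 30.0]))  # -> [0.0, 0.5, 1.0]
```

When normalizing, the minimum and maximum are usually taken from the training data and then reused to scale new records, so that new data is transformed in the same way as the data the model was built from.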