Tasks and Functionalities of Data Mining
Data mining functionalities specify the kinds of patterns, such as trends or correlations, that a data mining task is meant to find.
Data mining tasks can be divided into two categories:
- Descriptive Data Mining:
It summarizes what is happening within the data without any prior hypothesis, highlighting the general properties of the data set.
For example: count, average, etc.
- Predictive Data Mining:
It predicts unknown or missing attribute values, such as the label of an unlabeled record. Based on previously observed data, the model estimates the characteristics that are absent.
For example: judging from the results of a patient's medical examinations whether the patient is suffering from a particular disease.
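The two categories above can be illustrated with a minimal Python sketch. The patient records, the feature names and the helpers `describe` and `predict` are all hypothetical and made up for illustration. The predictive part uses a 1-nearest-neighbor rule, which is only one of many possible techniques and is not prescribed by the text.

```python
from statistics import mean

# Hypothetical examination results: (test_a, test_b) paired with a known diagnosis.
labeled_patients = [
    ((1.0, 1.0), "healthy"),
    ((1.5, 2.0), "healthy"),
    ((5.0, 6.0), "disease"),
    ((6.0, 5.5), "disease"),
]


def describe(values):
    """Descriptive mining: summarize the data with a count and an average."""
    return {"count": len(values), "average": mean(values)}


def predict(features, labeled):
    """Predictive mining: estimate a missing label from the closest known record."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = min(labeled, key=lambda record: squared_distance(record[0], features))
    return nearest[1]


# Summarize the first test value across all known patients.
summary = describe([record[0][0] for record in labeled_patients])

# Estimate the diagnosis of a new, unlabeled patient.
prediction = predict((5.5, 5.0), labeled_patients)

print(summary)
print(prediction)
```

The descriptive call only reports what is already in the data, while the predictive call fills in a value that was never observed for the new record.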
Data Mining Functionality:
1. Class/Concept Descriptions:
Data can be associated with classes or concepts. It is helpful to describe individual classes and concepts in summarized, concise, and yet precise terms.
Such descriptions of a class or concept are referred to as class/concept descriptions.
- Data Characterization:
This refers to the summary of the general characteristics or features of the class under study (the target class). For example, to study the characteristics of software products whose sales increased by 15% two years ago, one can collect the data related to such products by running SQL queries.
- Data Discrimination:
It compares the general features of the class under study with those of one or more contrasting classes. The output of this process can be represented in many forms, e.g., bar charts, curves, and pie charts.
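As a rough sketch of characterization and discrimination, the Python snippet below summarizes a target class and then compares it with a contrasting class. The product records, the 15% growth threshold used to form the classes, and the helpers `characterize` and `discriminate` are assumptions chosen for illustration. In practice the records would typically come from a database query.

```python
from statistics import mean

# Hypothetical product records: price and sales growth in percent.
products = [
    {"name": "A", "price": 40.0, "growth": 18.0},
    {"name": "B", "price": 60.0, "growth": 22.0},
    {"name": "C", "price": 90.0, "growth": -5.0},
    {"name": "D", "price": 110.0, "growth": 3.0},
]

# Target class: products whose sales grew by at least 15%.
target = [p for p in products if p["growth"] >= 15.0]
# Contrasting class: all remaining products.
contrast = [p for p in products if p["growth"] < 15.0]


def characterize(records):
    """Characterization: summarize the general features of one class."""
    return {
        "count": len(records),
        "avg_price": mean(r["price"] for r in records),
    }


def discriminate(target_records, contrast_records):
    """Discrimination: compare the same features across two classes."""
    t = characterize(target_records)
    c = characterize(contrast_records)
    return {
        "target_avg_price": t["avg_price"],
        "contrast_avg_price": c["avg_price"],
        "price_gap": t["avg_price"] - c["avg_price"],
    }


target_summary = characterize(target)
comparison = discriminate(target, contrast)

print(target_summary)
print(comparison)
```

The dictionary returned by `discriminate` is the kind of result that could then be drawn as a bar chart or pie chart.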
2. Mining Frequent Patterns, Associations, and Correlations:
Frequent patterns are patterns that occur often in the data.
Several kinds of frequent patterns can be observed in a dataset:
- Frequent item set:
This refers to a set of items that often appear together, e.g., milk and sugar.
- Frequent Subsequence:
This refers to a sequence of events that occurs often, such as purchasing a phone followed by a back cover.
- Frequent Substructure:
This refers to structural forms, such as trees and graphs, that occur often in the data and may be combined with itemsets or subsequences.
Association analysis is the process of uncovering relationships among data items and deriving association rules from them. For example, it can be used to determine which items are frequently purchased together.
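To make frequent itemsets and association rules concrete, here is a small Python sketch that counts item pairs by brute force and computes the support and confidence of a rule. The transactions, the 0.5 support threshold and the function names are hypothetical. Real systems usually rely on algorithms such as Apriori or FP-growth rather than this exhaustive count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"milk", "sugar", "bread"},
    {"milk", "sugar"},
    {"milk", "bread"},
    {"sugar", "tea"},
    {"milk", "sugar", "tea"},
]


def frequent_pairs(baskets, min_support):
    """Return item pairs whose support (share of baskets) meets the threshold."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(baskets)
    return {
        pair: count / total
        for pair, count in counts.items()
        if count / total >= min_support
    }


def confidence(baskets, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    with_both = sum(1 for b in with_antecedent if consequent in b)
    return with_both / len(with_antecedent)


pairs = frequent_pairs(transactions, min_support=0.5)
rule_confidence = confidence(transactions, "milk", "sugar")

print(pairs)
print(rule_confidence)
```

Support measures how often the items appear together across all transactions, while confidence measures how often the consequent appears in transactions that already contain the antecedent.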
Correlation analysis is a statistical technique that shows whether, and how strongly, pairs of attributes are related to each other. For example, taller people tend to weigh more.
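One common way to quantify such a relationship is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it from scratch. The height and weight values are made up for illustration, and the function name `pearson` is an assumption rather than a reference to any particular library.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical measurements for five people.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 67, 73, 85]

r = pearson(heights_cm, weights_kg)
print(r)  # a value close to 1 indicates a strong positive relationship
```

A value near 1 means the attributes rise together, a value near -1 means one falls as the other rises, and a value near 0 means there is little linear relationship.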