KDD Process in Data Mining
Data Mining – Knowledge Discovery in Databases (KDD).
Why do we need Data Mining?
The volume of information we have to handle grows every day: business transactions, scientific data, sensor data, pictures, videos, and so on. We therefore need a system that can extract the essence of the available information and automatically generate reports, views, or summaries of the data for better decision-making.
Why is Data Mining used in Business?
Data mining is used in business to make better managerial decisions by:
- Automatically summarizing data.
- Extracting the essence of stored information.
- Discovering patterns in raw data.
Data mining, also known as Knowledge Discovery in Databases (KDD), refers to the nontrivial extraction of implicit, previously unknown, and potentially useful information from data stored in databases.
Steps Involved in KDD Process:
- Data Cleaning: Data cleaning is defined as the removal of noisy and irrelevant data from the collection.
  - Cleaning in case of missing values.
  - Cleaning noisy data, where noise is a random error or variance in a measured variable.
  - Cleaning with data discrepancy detection and data transformation tools.
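The first two cleaning tasks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `fill_missing` and `smooth_by_bin_means`, the sample readings, and the choice of mean imputation and equal-size bins are hypothetical, not a prescribed method.

```python
from statistics import mean

def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size, and replace each value by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed

# Hypothetical sensor readings with two missing entries.
readings = [4, None, 8, 15, None, 21, 24, 25, 28]
filled = fill_missing(readings)
smoothed = smooth_by_bin_means(filled, bin_size=3)
```

Mean imputation and bin smoothing are among the simplest options; which strategy suits a given data set depends on how the values went missing and how much noise the analysis can tolerate.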
- Data Integration: Data integration is defined as combining heterogeneous data from multiple sources into a common source (data warehouse).
  - Data integration using data migration tools.
  - Data integration using data synchronization tools.
  - Data integration using the ETL (Extract, Transform, Load) process.
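A toy version of the ETL route might look like the sketch below. The two in-memory sources, their field names, and the dictionary standing in for the data warehouse are all assumptions made for illustration; a real pipeline would read from actual systems and write to a real store.

```python
# Two hypothetical sources that describe the same customers with different schemas.
crm_rows = [
    {"cust_id": "C1", "full_name": "Ada Lovelace"},
    {"cust_id": "C2", "full_name": "Alan Turing"},
]
billing_rows = [
    {"customer": "c1", "amount_cents": 1250},
    {"customer": "c2", "amount_cents": 300},
    {"customer": "c1", "amount_cents": 750},
]

def extract():
    """Extract: read the raw rows from each source (in-memory lists here)."""
    return crm_rows, billing_rows

def transform(crm, billing):
    """Transform: normalise keys and units so both sources share one schema."""
    customers = {
        row["cust_id"].upper(): {"name": row["full_name"], "total_spent": 0.0}
        for row in crm
    }
    for row in billing:
        key = row["customer"].upper()
        if key in customers:  # skip payments with no matching customer
            customers[key]["total_spent"] += row["amount_cents"] / 100
    return customers

def load(warehouse, customers):
    """Load: write the unified records into the target store."""
    warehouse.update(customers)
    return warehouse

# A plain dict stands in for the data warehouse.
warehouse = load({}, transform(*extract()))
```

The transform step is where the heterogeneity is resolved: customer keys are brought to one case and amounts to one unit before anything reaches the common store.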
- Data Selection: Data selection is defined as the process in which the data relevant to the analysis is identified and retrieved from the data collection.
  - Data selection using neural networks.
  - Data selection using decision trees.
  - Data selection using Naive Bayes.
  - Data selection using clustering, regression, etc.
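In its simplest form, selection is a filter on rows plus a projection onto the relevant attributes. The sketch below shows only that basic case, not the model-based methods listed above; the records, the `select` helper, and the relevance criterion are hypothetical.

```python
# Hypothetical integrated records; "fax" is an attribute irrelevant to the analysis.
records = [
    {"id": 1, "age": 34, "region": "north", "spend": 120.0, "fax": None},
    {"id": 2, "age": 51, "region": "south", "spend": 80.0, "fax": None},
    {"id": 3, "age": 27, "region": "north", "spend": 200.0, "fax": None},
]

def select(records, attributes, predicate):
    """Keep only rows that satisfy predicate, projected onto the chosen attributes."""
    return [{a: r[a] for a in attributes} for r in records if predicate(r)]

# Task-relevant data: age and spend of customers in the north region.
relevant = select(records, ["age", "spend"], lambda r: r["region"] == "north")
```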
- Data Transformation: Data transformation is defined as the process of transforming data into the form required by the mining procedure.
  Data transformation is a two-step process:
  - Data mapping: Assigning elements from the source base to the destination to capture transformations.
  - Code generation: Creating the actual transformation program.
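The two steps can be illustrated as follows. The mapping is written as a dictionary, and a function built from it stands in for the generated transformation program. The field names, the conversions, and the `build_transformer` helper are assumptions for the example; real tools typically emit SQL or scripts rather than a Python closure.

```python
# Data mapping: source field -> (destination field, conversion function).
# Field names and conversions are hypothetical.
FIELD_MAP = {
    "dob": ("birth_year", lambda s: int(s[:4])),
    "income": ("income_k", lambda v: round(float(v) / 1000, 1)),
    "gender": ("gender", str.upper),
}

def build_transformer(field_map):
    """Code generation step: turn the mapping into an executable transformation."""
    def transform(record):
        return {
            dest: convert(record[src])
            for src, (dest, convert) in field_map.items()
        }
    return transform

transform = build_transformer(FIELD_MAP)
row = transform({"dob": "1990-05-17", "income": "52300", "gender": "f"})
```

Keeping the mapping as data, separate from the code that executes it, means a changed source schema only requires editing `FIELD_MAP`.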
- Data Mining: Data mining is defined as the application of intelligent techniques to extract potentially useful patterns.
  - Transforms task-relevant data into patterns.
  - Decides the purpose of the model using classification or characterization.
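As one concrete instance of turning task-relevant data into patterns, the sketch below counts frequent itemsets in a handful of transactions. It is a brute-force count, not an optimized algorithm such as Apriori, and the transactions, the `frequent_itemsets` helper, and the support threshold are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions (task-relevant data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]

def frequent_itemsets(transactions, min_support, max_size=2):
    """Return itemsets of up to max_size items whose support is at least min_support."""
    counts = Counter()
    for basket in transactions:
        for size in range(1, max_size + 1):
            # Sorting gives each itemset one canonical tuple form.
            for itemset in combinations(sorted(basket), size):
                counts[itemset] += 1
    n = len(transactions)
    return {
        itemset: count / n
        for itemset, count in counts.items()
        if count / n >= min_support
    }

# Maps each frequent itemset (a sorted tuple) to its support.
patterns = frequent_itemsets(transactions, min_support=0.6)
```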
- Pattern Evaluation: Pattern evaluation is defined as identifying the truly interesting patterns that represent knowledge, based on given interestingness measures.
  - Finds the interestingness score of each pattern.
  - Uses summarization and visualization to make the data understandable to the user.
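Support, confidence, and lift are three commonly used interestingness measures for association rules. The sketch below computes them for a single rule; the baskets, the `rule_measures` helper, and the thresholds are hypothetical choices for illustration.

```python
def rule_measures(transactions, antecedent, consequent):
    """Compute support, confidence and lift for the rule antecedent -> consequent."""
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent <= t and consequent <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    cons = sum(1 for t in transactions if consequent <= t)
    support = both / n
    confidence = both / ante if ante else 0.0
    lift = confidence / (cons / n) if cons else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}

# Hypothetical baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
]

scores = rule_measures(baskets, {"butter"}, {"bread"})

# Keep the rule only if it clears hypothetical thresholds.
is_interesting = scores["support"] >= 0.5 and scores["confidence"] >= 0.8
```

A lift above 1 indicates that the antecedent and the consequent occur together more often than they would if they were independent.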
- Knowledge Representation: Knowledge representation is defined as a technique that uses visualization tools to represent data mining results.
  - Generate reports.
  - Generate tables.
  - Generate discriminant rules, classification rules, characterization rules, etc.
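A very small example of this step is rendering mined rules as a readable table. The rules, their confidence values, and the `format_rules` helper below are made up for illustration; real systems would more likely feed a reporting or charting tool.

```python
def format_rules(rules):
    """Render (antecedent, consequent, confidence) triples as a plain-text table."""
    header = f"{'Rule':<36}{'Confidence':>10}"
    lines = [header, "-" * len(header)]
    # Most confident rules first.
    for antecedent, consequent, confidence in sorted(rules, key=lambda r: -r[2]):
        rule = f"IF {antecedent} THEN {consequent}"
        lines.append(f"{rule:<36}{confidence:>10.0%}")
    return "\n".join(lines)

# Hypothetical classification rules produced by an earlier mining step.
rules = [
    ("age < 30", "buys = yes", 0.72),
    ("income = high", "buys = yes", 0.91),
]

report = format_rules(rules)
print(report)
```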
- KDD is an iterative process where evaluation measures can be enhanced, mining can be refined, new data can be integrated and transformed in order to get different and more appropriate results.
- Preprocessing of databases consists of Data cleaning and Data Integration.
Data Mining: Concepts and Techniques