Data Mining: Data Warehouse Process
A data warehouse is a collection of information gathered from multiple sources and stored under a unified schema at a single site. It is built through several techniques, including the following processes:
1. Data Cleaning:
Data cleaning is the process of preparing data for analysis by removing or correcting incorrect, incomplete, irrelevant, duplicate, or irregularly formatted records. Such data is usually unnecessary or unhelpful for analysis because it can disrupt the process or produce false results.
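The cleaning rules above can be sketched in plain Python. The field names (`id`, `email`) and the validity checks here are hypothetical examples, not part of any particular warehouse:

```python
# A minimal data-cleaning sketch using only the standard library.
# The field names ("id", "email") and the rules are illustrative.

def clean_records(records):
    """Drop duplicate, incomplete, and badly formatted records."""
    seen_ids = set()
    cleaned = []
    for rec in records:
        # Skip records missing required fields (incomplete data).
        if not rec.get("id") or not rec.get("email"):
            continue
        # Skip duplicates by primary key.
        if rec["id"] in seen_ids:
            continue
        # Skip irregularly formatted values.
        if "@" not in rec["email"]:
            continue
        seen_ids.add(rec["id"])
        # Normalize formatting: trim whitespace, lowercase the email.
        cleaned.append({"id": rec["id"], "email": rec["email"].strip().lower()})
    return cleaned

raw = [
    {"id": 1, "email": " Alice@Example.com "},
    {"id": 1, "email": "alice@example.com"},   # duplicate id
    {"id": 2, "email": "not-an-email"},        # bad format
    {"id": None, "email": "x@y.com"},          # missing id
]
print(clean_records(raw))  # [{'id': 1, 'email': 'alice@example.com'}]
```

In practice each rule (required fields, deduplication, format checks) would come from the project's own data-quality requirements.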
2. Data Integration:
Data integration is the process of combining data from different sources into a unified view. The integration process begins with ingestion and includes steps such as cleansing, ETL mapping, and transformation. Data integration ultimately enables analytics tools to produce effective, actionable business intelligence.
In a typical data integration process, the client sends a request for data to the master server. The master server gathers the required data from internal and external sources, extracts it, and integrates it into a single data set, which is then returned to the client for use.
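As a rough sketch of producing that unified view, the snippet below merges records from two hypothetical sources (a CRM and a billing system) into one record per key. The source names and fields are assumptions for illustration:

```python
# Hedged sketch: integrating rows from two hypothetical sources into
# a single unified record set, keyed by a shared "id" field.

crm_source = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
billing_source = [{"id": 1, "total": 120.0}, {"id": 2, "total": 45.5}]

def integrate(*sources):
    """Merge rows from every source into one record per key."""
    unified = {}
    for source in sources:
        for row in source:
            # Rows sharing an id are folded into the same record.
            unified.setdefault(row["id"], {}).update(row)
    return list(unified.values())

print(integrate(crm_source, billing_source))
# [{'id': 1, 'name': 'Alice', 'total': 120.0},
#  {'id': 2, 'name': 'Bob', 'total': 45.5}]
```

A real integration layer would also handle schema conflicts and missing keys, which this sketch ignores.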
3. Data Transformation:
Data transformation is the process of converting data from one format or structure to another. It is critical for tasks such as data integration and data management. Data transformation serves several purposes: you can change data types depending on the needs of your project, and enrich or aggregate the data by removing invalid or duplicate records.
Generally, the process consists of two stages.
In the first step, you should:
- Perform data discovery to identify sources and data types.
- Determine the structure of the data and the transformations that need to occur.
- Map the data to determine how individual fields are mapped, modified, joined, filtered, and stored.
In the second step, you must:
- Extract data from the original source. Sources can range from a connected device to a structured resource such as a database, or streaming sources such as telemetry or log files from clients using your web application.
- Send the data to the target site.
- The target may be a database or a data warehouse that handles structured and unstructured data.
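The two stages above can be sketched as follows: a field mapping decided during discovery, then an extract-transform-send loop. The mapping, field names, and in-memory "target" are all hypothetical stand-ins:

```python
# Sketch of the two stages: (1) a field mapping decided during data
# discovery, (2) extracting from the source and sending transformed
# rows to the target. All names are illustrative assumptions.

# Stage 1: mapping of source field names to target field names.
FIELD_MAP = {"cust_name": "name", "amt": "amount"}

def transform(row):
    """Rename fields per the mapping and change data types."""
    out = {FIELD_MAP.get(key, key): value for key, value in row.items()}
    out["amount"] = float(out["amount"])   # cast string to numeric for the target
    return out

# Stage 2: extract from the source, send each transformed row to the target.
source_rows = [{"cust_name": "Alice", "amt": "19.99"}]
target = []                                # stand-in for a warehouse table
for row in source_rows:
    target.append(transform(row))

print(target)  # [{'name': 'Alice', 'amount': 19.99}]
```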
4. Loading Data:
Data loading is the process of copying and loading data from a file, folder, or application into a database or similar system. It is usually done by copying digital data from the source and pasting or loading it into a data warehouse or processing tool.
Data loading is used in data extraction and loading methods. Typically, the data is loaded in a format different from the original format at the source.
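A minimal loading sketch, using an in-memory SQLite database as a stand-in for the warehouse (the table name and columns are assumptions):

```python
# Loading sketch: copying transformed records into a database table.
# An in-memory SQLite database stands in for the warehouse here.
import sqlite3

records = [("Alice", 19.99), ("Bob", 5.00)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (name TEXT, amount REAL)")
# Bulk-load the records into the target table.
conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])  # 2
```

Production loads typically batch much larger volumes and use the warehouse's native bulk-load path rather than row inserts.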
5. Data Refreshing:
In this process, the data stored in the warehouse is periodically refreshed so that it maintains its integrity.
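One common way to refresh is to re-apply the latest source rows with an upsert, so stale warehouse rows are replaced and new ones added. The table and rows below are illustrative assumptions:

```python
# Sketch of a periodic refresh: re-applying source rows with an upsert
# so the warehouse copy stays consistent with the source.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_product (id INTEGER PRIMARY KEY, price REAL)")
conn.execute("INSERT INTO dim_product VALUES (1, 9.99)")   # initial load

def refresh(conn, source_rows):
    """Replace stale rows with the latest values from the source."""
    conn.executemany(
        "INSERT OR REPLACE INTO dim_product VALUES (?, ?)", source_rows)
    conn.commit()

refresh(conn, [(1, 10.49), (2, 3.25)])   # price updated, new row added
print(conn.execute(
    "SELECT price FROM dim_product WHERE id = 1").fetchone()[0])  # 10.49
```

Real warehouses schedule such refreshes (nightly, hourly) and often load only changed rows rather than the full source.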
A data warehouse models multidimensional data structures known as "data cubes," in which every dimension represents an attribute or a set of attributes in the data schema, and each cell stores a value. Data is gathered from various sources such as hospitals, banks, and other organizations, and goes through a process called ETL (Extract, Transform, Load).
- Extract: Reads the data from the databases of the various sources.
- Transform: Converts the data from the source databases into data cubes so that it can be loaded into the warehouse.
- Load: Writes the transformed data into the data warehouse.
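The three ETL steps above can be sketched end to end. The source rows and the (organization, year) cube layout here are hypothetical:

```python
# The three ETL steps sketched end to end: extract rows from a source,
# transform them into data-cube cells, and load them into the warehouse.
# The source data and cube layout are illustrative assumptions.

source_db = [("Hospital A", 2023, 120), ("Bank B", 2023, 300)]

def extract(db):
    """Read the raw rows from the source database."""
    return list(db)

def transform(rows):
    """Shape rows into data-cube cells keyed by (organization, year)."""
    return {(org, year): value for org, year, value in rows}

def load(cube, warehouse):
    """Write the transformed cells into the warehouse."""
    warehouse.update(cube)

warehouse = {}
load(transform(extract(source_db)), warehouse)
print(warehouse[("Hospital A", 2023)])  # 120
```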
This process can be seen in the illustration below:
Features of Data Warehouse:
Please refer – Features of Data Warehouse.