Implementation and Components in Data Warehouse
Prerequisite – Architecture of Data Warehouse
Data Warehouse is used to store historical data which helps to make strategic decisions for the business. It is used for Online Analytical Processing (OLAP) which helps to analyze the data. The data warehouse contributes to business executives in systematically organizing, accepting, and using their data to make strategic decisions.
What exactly is a Data Warehouse?
Data Warehouse has been defined in many ways, making it difficult to formulate a rigorous definition. Gradually speaking, a data warehouse is a data repository that is kept separate from an organization’s operational database. Data warehouse systems allow the integration of a wide variety of application systems. They support information processing by providing a solid plan of aggregated historical data for analysis.
Data in a data warehouse comes from the organization’s operational systems as well as other external sources. These are collectively referred to as the source systems. The data extracted from the source systems is stored in an area called the data staging area, where the data is cleaned, transformed, assembled, and duplicated to prepare the data in the data warehouse.
The data staging area is usually a set of machines where simple activities like sorting and sequential processing take place. The data staging area dost not provide. As soon as possible a system provides query or presentation services, it is classified as a presentation server. A presentation server is the destination machine on which data is loaded from the data staging area and directly stored for query by end-users, report authors, and other applications.
There are three different types of systems required for a data warehouse –
- Source Systems
- Data Staging Area
- Presentation Server
The data moves from the data source area through the staging area to the presentation server. The entire process is better known as ETL (extract, transform, and load) or ETT (extract, transform, and transfer).
Components of Data Warehouse Architecture and their tasks :
1. Operational Source –
- An operational Source is a data source consists of Operational Data and External Data.
- Data can come from Relational DBMS like Informix, Oracle.
2. Load Manager –
- The Load Manager performs all operations associated with the extraction of loading data in the data warehouse.
- These tasks include the simple transformation of data to prepare data for entry into the warehouse.
3. Warehouse Manage –
- The warehouse manager is responsible for the warehouse management process.
- The operations performed by the warehouse manager are the analysis, aggregation, backup and collection of data, de-normalization of the data.
4. Query Manager –
- Query Manager performs all the tasks associated with the management of user queries.
- The complexity of the query manager is determined by the end-user access operations tool and the features provided by the database.
5. Detailed Data –
- It is used to store all the detailed data in the database schema.
- Detailed data is loaded into the data warehouse to complement the data collected.
6. Summarized Data –
- Summarized Data is a part of the data warehouse that stores predefined aggregations
- These aggregations are generated by the warehouse manager.
7. Archive and Backup Data –
- The Detailed and Summarized Data are stored for the purpose of archiving and backup.
- The data is relocated to storage archives such as magnetic tapes or optical disks.
8. Metadata –
- Metadata is basically data stored above data.
- It is used for extraction and loading process, warehouse, management process, and query management process.
9. End User Access Tools –
- End-User Access Tools consist of Analysis, Reporting, and mining.
- By using end-user access tools users can link with the warehouse.
Attention reader! Don’t stop learning now. Get hold of all the important CS Theory concepts for SDE interviews with the CS Theory Course at a student-friendly price and become industry ready.