Open In App

Implementation and Components in Data Warehouse

Improve
Improve
Like Article
Like
Save
Share
Report

Prerequisite –  Architecture of Data Warehouse

Data Warehouse is used to store historical data which helps to make strategic decisions for the business. It is used for Online Analytical Processing (OLAP) which helps to analyze the data. The data warehouse contributes to business executives in systematically organizing, accepting, and using their data to make strategic decisions.

What exactly is a Data Warehouse?

Data Warehouse has been defined in many ways, making it difficult to formulate a rigorous definition. Gradually speaking, a data warehouse is a data repository that is kept separate from an organization’s operational database. Data warehouse systems allow the integration of a wide variety of application systems. They support information processing by providing a solid plan of aggregated historical data for analysis.

Data in a data warehouse comes from the organization’s operational systems as well as other external sources. These are collectively referred to as the source systems. The data extracted from the source systems is stored in an area called the data staging area, where the data is cleaned, transformed, assembled, and duplicated to prepare the data in the data warehouse.

The data staging area is usually a set of machines where simple activities like sorting and sequential processing take place. The data staging area does not provide as soon as possible a system provides query or presentation services, it is classified as a presentation server. A presentation server is the destination machine on which data is loaded from the data staging area and directly stored for query by end-users, report authors, and other applications.

There are three different types of systems required for a data warehouse –

  1. Source Systems
  2. Data Staging Area
  3. Presentation Server
     

Data Warehouse Architecture

The data moves from the data source area through the staging area to the presentation server. The entire process is better known as ETL (extract, transform, and load) or ETT (extract, transform, and transfer).

Components of Data Warehouse Architecture and their tasks :

1. Operational Source –

  • An operational Source is a data source consists of Operational Data and External Data.
  • Data can come from Relational DBMS like Informix, Oracle.

2. Load Manager –

  • The Load Manager performs all operations associated with the extraction of loading data in the data warehouse.
  • These tasks include the simple transformation of data to prepare data for entry into the warehouse.

3. Warehouse Manage –

  • The warehouse manager is responsible for the warehouse management process.
  • The operations performed by the warehouse manager are the analysis, aggregation, backup and collection of data, de-normalization of the data.

4. Query Manager –

  • Query Manager performs all the tasks associated with the management of user queries.
  • The complexity of the query manager is determined by the end-user access operations tool and the features provided by the database.

5. Detailed Data –

  • It is used to store all the detailed data in the database schema.
  • Detailed data is loaded into the data warehouse to complement the data collected.

6. Summarized Data –

  • Summarized Data is a part of the data warehouse that stores predefined aggregations
  • These aggregations are generated by the warehouse manager.

7. Archive and Backup Data –

  • The Detailed and Summarized Data are stored for the purpose of archiving and backup.
  • The data is relocated to storage archives such as magnetic tapes or optical disks.

8. Metadata –

  • Metadata is basically data stored above data.
  • It is used for extraction and loading process, warehouse, management process, and query management process.

9. End User Access Tools –

  • End-User Access Tools consist of Analysis, Reporting, and mining.
  • By using end-user access tools users can link with the warehouse.

 Advantages and disadvantages of the components commonly found in data warehouses:

Data sources: Data sources are the systems or databases that provide data to the data warehouse. Advantages of using multiple data sources include increased data coverage and the ability to integrate diverse data types. However, disadvantages include potential data quality issues, data inconsistencies, and increased complexity in data integration.

ETL (Extract, Transform, Load) processes: ETL processes are used to extract data from source systems, transform it to conform to the data warehouse schema, and load it into the data warehouse. Advantages of ETL processes include efficient data integration and improved data quality. However, disadvantages include potential data loss or corruption, increased processing time and complexity, and potential data inconsistency due to data transformations.

Data storage: Data storage is the component of the data warehouse that stores the data. Advantages of data storage in a data warehouse include the ability to store large amounts of data in a single location, fast and efficient data retrieval, and improved data quality due to data cleansing and standardization. Disadvantages include the high cost of data storage, potential data loss or corruption, and potential security risks associated with storing large amounts of sensitive data in a single location.

Data modeling: Data modeling is the process of designing the structure of the data warehouse. Advantages of data modeling include the ability to organize and structure data in a way that is optimized for BI activities, improved data quality due to data cleansing and standardization, and increased scalability and flexibility. However, disadvantages include the potential for complex data relationships and the need for specialized skills and knowledge to design and implement an effective data model.

Data access tools: Data access tools are used to access and analyze data in the data warehouse. Advantages of data access tools include the ability to easily access and analyze data, improved data quality due to data cleansing and standardization, and increased speed and efficiency of BI activities. Disadvantages include the potential for user error, the need for specialized skills and knowledge to use the tools effectively, and potential security risks associated with data access.


Last Updated : 25 Apr, 2023
Like Article
Save Article
Previous
Next
Share your thoughts in the comments
Similar Reads