Building a Data Warehouse in DBMS

Last Updated : 25 Apr, 2023

A Data warehouse is a heterogeneous collection of different data sources organized under unified schema. Builders should take a broad view of the anticipated use of the warehouse while constructing a data warehouse. During the design phase, there is no way to anticipate all possible queries or analyses. Some characteristic of Data warehouse are:

Subject oriented
Integrated
Time Variant
Non-volatile

Building a Data Warehouse – Some steps that are needed for building any data warehouse are as following below:

To extract the data (transnational) from different data sources: For building a data warehouse, a data is extracted from various data sources and that data is stored in central storage area. For extraction of the data Microsoft has come up with an excellent tool. When you purchase Microsoft SQL Server, then this tool will be available at free of cost.
To transform the transnational data: There are various DBMS where many of the companies stores their data. Some of them are: MS Access, MS SQL Server, Oracle, Sybase etc. Also these companies saves the data in spreadsheets, flat files, mail systems etc. Relating a data from all these sources is done while building a data warehouse.
To load the data (transformed) into the dimensional database: After building a dimensional model, the data is loaded in the dimensional database. This process combines the several columns together or it may split one field into the several columns. There are two stages at which transformation of the data can be performed and they are: while loading the data into the dimensional model or while data extraction from their origins.
To purchase a front-end reporting tool: There are top notch analytical tools are available in the market. These tools are provided by the several major vendors. A cost effective tool and Data Analyzer is released by the Microsoft on its own.

For the warehouse there is an acquisition of the data. There must be a use of multiple and heterogeneous sources for the data extraction, example databases. There is a need for the consistency for which formation of data must be done within the warehouse. Reconciliation of names, meanings and domains of data must be done from unrelated sources. There is also a need for the installation of the data from various sources in the data model of the warehouse. Conversion of the data might be done from object oriented, relational or legacy databases to a multidimensional model. One of the largest labor demanding component of data warehouse construction is data cleaning, which is one of the complex process. Before loading of the data in the warehouse, there should be cleaning of the data. All the work of loading must be done in warehouse for better performance. The only feasible and better approach for it is incremental updating. Data storage in the data warehouse:

Refresh the data
To provide the time variant data
To store the data as per the data model of the warehouse
Purging the data
To support the updating of the warehouse data

Some of the important designs for the data warehouse are:

Modular component design
Consideration of the parallel architecture
Consideration of the distributed architecture
Usage protection
Characteristics of available sources
Design of the metadata component
The fit of the data model

The major determining characteristics for the design of the warehouse is the architecture of the organizations distributed computing environment. The distributed warehouse and the federated warehouse are the two basic distributed architecture.There are some benefits from the distributed warehouse, some of them are:

Improved load balancing
Scalability of performance
Higher availability

Federated warehouse is a decentralized confederation of autonomous data warehouses. Each of them has its own metadata repository.Now a days large organizations start choosing a federated data marts instead of building a huge data warehouse.

Advantages:

Familiarity: Building a data warehouse in a DBMS that an organization is already using can be advantageous, as it allows developers to use existing skills and knowledge to build and maintain the data warehouse.

Scalability: DBMSs are designed to handle large amounts of data, making them well-suited for building data warehouses that require storage and analysis of large volumes of data.

Integration: A data warehouse built in a DBMS can be integrated with other databases and applications in the organization, allowing for seamless data flow between systems.

Security: DBMSs offer robust security features that can help ensure the confidentiality, integrity, and availability of data in the data warehouse.

Performance: DBMSs are optimized for performance, which can result in faster data retrieval and processing times.

Disadvantages:

Complexity: Building a data warehouse in a DBMS can be complex, as it involves designing and implementing a database schema that is optimized for analytical processing.

Cost: DBMSs can be expensive, particularly for large-scale data warehouses that require high levels of processing power and storage.

Learning curve: Building a data warehouse in a DBMS may require specialized skills and knowledge, which can result in a steep learning curve for developers who are not familiar with the technology.

Maintenance: Maintaining a data warehouse in a DBMS requires ongoing effort, including monitoring for performance issues, ensuring data quality, and making updates as needed.

Data silos: Building a data warehouse in a DBMS can result in data silos if the data warehouse is not integrated with other databases and applications in the organization.

Suggest improvement

Attributes of Data Warehouse

Share your thoughts in the comments