Open In App

Building a Data Warehouse in DBMS

Last Updated : 25 Apr, 2023
Improve
Improve
Like Article
Like
Save
Share
Report

A Data warehouse is a heterogeneous collection of different data sources organized under unified schema. Builders should take a broad view of the anticipated use of the warehouse while constructing a data warehouse. During the design phase, there is no way to anticipate all possible queries or analyses. Some characteristic of Data warehouse are:

  • Subject oriented
  • Integrated
  • Time Variant
  • Non-volatile

Building a Data Warehouse – Some steps that are needed for building any data warehouse are as following below:

  1. To extract the data (transnational) from different data sources: For building a data warehouse, a data is extracted from various data sources and that data is stored in central storage area. For extraction of the data Microsoft has come up with an excellent tool. When you purchase Microsoft SQL Server, then this tool will be available at free of cost.
  2. To transform the transnational data: There are various DBMS where many of the companies stores their data. Some of them are: MS Access, MS SQL Server, Oracle, Sybase etc. Also these companies saves the data in spreadsheets, flat files, mail systems etc. Relating a data from all these sources is done while building a data warehouse.
  3. To load the data (transformed) into the dimensional database: After building a dimensional model, the data is loaded in the dimensional database. This process combines the several columns together or it may split one field into the several columns. There are two stages at which transformation of the data can be performed and they are: while loading the data into the dimensional model or while data extraction from their origins.
  4. To purchase a front-end reporting tool: There are top notch analytical tools are available in the market. These tools are provided by the several major vendors. A cost effective tool and Data Analyzer is released by the Microsoft on its own.

For the warehouse there is an acquisition of the data. There must be a use of multiple and heterogeneous sources for the data extraction, example databases. There is a need for the consistency for which formation of data must be done within the warehouse. Reconciliation of names, meanings and domains of data must be done from unrelated sources. There is also a need for the installation of the data from various sources in the data model of the warehouse. Conversion of the data might be done from object oriented, relational or legacy databases to a multidimensional model. One of the largest labor demanding component of data warehouse construction is data cleaning, which is one of the complex process. Before loading of the data in the warehouse, there should be cleaning of the data. All the work of loading must be done in warehouse for better performance. The only feasible and better approach for it is incremental updating. Data storage in the data warehouse:

  • Refresh the data
  • To provide the time variant data
  • To store the data as per the data model of the warehouse
  • Purging the data
  • To support the updating of the warehouse data

Some of the important designs for the data warehouse are:

  • Modular component design
  • Consideration of the parallel architecture
  • Consideration of the distributed architecture
  • Usage protection
  • Characteristics of available sources
  • Design of the metadata component
  • The fit of the data model

The major determining characteristics for the design of the warehouse is the architecture of the organizations distributed computing environment. The distributed warehouse and the federated warehouse are the two basic distributed architecture.There are some benefits from the distributed warehouse, some of them are: 

  • Improved load balancing
  • Scalability of performance
  • Higher availability

Federated warehouse is a decentralized confederation of autonomous data warehouses. Each of them has its own metadata repository.Now a days large organizations start choosing a federated data marts instead of building a huge data warehouse.

Advantages:

Familiarity: Building a data warehouse in a DBMS that an organization is already using can be advantageous, as it allows developers to use existing skills and knowledge to build and maintain the data warehouse.

Scalability: DBMSs are designed to handle large amounts of data, making them well-suited for building data warehouses that require storage and analysis of large volumes of data.

Integration: A data warehouse built in a DBMS can be integrated with other databases and applications in the organization, allowing for seamless data flow between systems.

Security: DBMSs offer robust security features that can help ensure the confidentiality, integrity, and availability of data in the data warehouse.

Performance: DBMSs are optimized for performance, which can result in faster data retrieval and processing times.

Disadvantages:

Complexity: Building a data warehouse in a DBMS can be complex, as it involves designing and implementing a database schema that is optimized for analytical processing.

Cost: DBMSs can be expensive, particularly for large-scale data warehouses that require high levels of processing power and storage.

Learning curve: Building a data warehouse in a DBMS may require specialized skills and knowledge, which can result in a steep learning curve for developers who are not familiar with the technology.

Maintenance: Maintaining a data warehouse in a DBMS requires ongoing effort, including monitoring for performance issues, ensuring data quality, and making updates as needed.

Data silos: Building a data warehouse in a DBMS can result in data silos if the data warehouse is not integrated with other databases and applications in the organization.


Like Article
Suggest improvement
Previous
Next
Share your thoughts in the comments

Similar Reads