Open In App

Data Warehousing

Improve
Improve
Improve
Like Article
Like
Save Article
Save
Share
Report issue
Report

A Database Management System (DBMS) stores data in the form of tables and uses an ER model and the goal is ACID properties. For example, a DBMS of a college has tables for students, faculty, etc. 

A Data Warehouse is separate from DBMS, it stores a huge amount of data, which is typically collected from multiple heterogeneous sources like files, DBMS, etc. The goal is to produce statistical results that may help in decision-making. For example, a college might want to see quick different results, like how the placement of CS students has improved over the last 10 years, in terms of salaries, counts, etc. 

Issues Occur while Building the Warehouse

  • When and how to gather data: In a source-driven architecture for gathering data, the data sources transmit new information, either continually (as transaction processing takes place), or periodically (nightly, for example). In a destination-driven architecture, the data warehouse periodically sends requests for new data to the sources. Unless updates at the sources are replicated at the warehouse via two phase commit, the warehouse will never be quite up to-date with the sources. Two-phase commit is usually far too expensive to be an option, so data warehouses typically have slightly out-of-date data. That, however, is usually not a problem for decision-support systems.
  • What schema to use: Data sources that have been constructed independently are likely to have different schemas. In fact, they may even use different data models. Part of the task of a warehouse is to perform schema integration, and to convert data to the integrated schema before they are stored. As a result, the data stored in the warehouse are not just a copy of the data at the sources. Instead, they can be thought of as a materialized view of the data at the sources.
  • Data transformation and cleansing: The task of correcting and preprocessing data is called data cleansing. Data sources often deliver data with numerous minor inconsistencies, which can be corrected. For example, names are often misspelled, and addresses may have street, area, or city names misspelled, or postal codes entered incorrectly. These can be corrected to a reasonable extent by consulting a database of street names and postal codes in each city. The approximate matching of data required for this task is referred to as fuzzy lookup.
  • How to propagate update: Updates on relations at the data sources must be propagated to the data warehouse. If the relations at the data warehouse are exactly the same as those at the data source, the propagation is straightforward. If they are not, the problem of propagating updates is basically the view-maintenance problem.
  • What data to summarize: The raw data generated by a transaction-processing system may be too large to store online. However, we can answer many queries by maintaining just summary data obtained by aggregation on a relation, rather than maintaining the entire relation. For example, instead of storing data about every sale of clothing, we can store total sales of clothing by item name and category.

Need for Data Warehouse 

An ordinary Database can store MBs to GBs of data and that too for a specific purpose. For storing data of TB size, the storage shifted to the Data Warehouse. Besides this, a transactional database doesn’t offer itself to analytics. To effectively perform analytics, an organization keeps a central Data Warehouse to closely study its business by organizing, understanding, and using its historical data for making strategic decisions and analyzing trends. 

Benefits of Data Warehouse

  • Better business analytics: Data warehouse plays an important role in every business to store and analysis of all the past data and records of the company. which can further increase the understanding or analysis of data for the company.
  • Faster Queries: The data warehouse is designed to handle large queries that’s why it runs queries faster than the database.
  • Improved data Quality: In the data warehouse the data you gathered from different sources is being stored and analyzed it does not interfere with or add data by itself so your quality of data is maintained and if you get any issue regarding data quality then the data warehouse team will solve this.
  • Historical Insight: The warehouse stores all your historical data which contains details about the business so that one can analyze it at any time and extract insights from it.

Data Warehouse vs DBMS 

Database

Data Warehouse

A common Database is based on operational or transactional processing. Each operation is an indivisible transaction.

A data Warehouse is based on analytical processing.

Generally, a Database stores current and up-to-date data which is used for daily operations.

A Data Warehouse maintains historical data over time. Historical data is the data kept over years and can used for trend analysis, make future predictions and decision support.

A database is generally application specific.

Example – A database stores related data, such as the student details in a school.

A Data Warehouse is integrated generally at the organization level, by combining data from different databases.

Example – A data warehouse integrates the data from one or more databases , so that analysis can be done to get results , such as the best performing school in a city.

Constructing a Database is not so expensive.

Constructing a Data Warehouse can be expensive.

Example Applications of Data Warehousing 

Data Warehousing can be applied anywhere where we have a huge amount of data and we want to see statistical results that help in decision making. 

  • Social Media Websites: The social networking websites like Facebook, Twitter, Linkedin, etc. are based on analyzing large data sets. These sites gather data related to members, groups, locations, etc., and store it in a single central repository. Being a large amount of data, Data Warehouse is needed for implementing the same.
  • Banking: Most of the banks these days use warehouses to see the spending patterns of account/cardholders. They use this to provide them with special offers, deals, etc.
  • Government: Government uses a data warehouse to store and analyze tax payments which are used to detect tax thefts.

Features of Data Warehousing

Data warehousing is essential for modern data management, providing a strong foundation for organizations to consolidate and analyze data strategically. Its distinguishing features empower businesses with the tools to make informed decisions and extract valuable insights from their data.

  • Centralized Data Repository: Data warehousing provides a centralized repository for all enterprise data from various sources, such as transactional databases, operational systems, and external sources. This enables organizations to have a comprehensive view of their data, which can help in making informed business decisions.
  • Data Integration: Data warehousing integrates data from different sources into a single, unified view, which can help in eliminating data silos and reducing data inconsistencies.
  • Historical Data Storage: Data warehousing stores historical data, which enables organizations to analyze data trends over time. This can help in identifying patterns and anomalies in the data, which can be used to improve business performance.
  • Query and Analysis: Data warehousing provides powerful query and analysis capabilities that enable users to explore and analyze data in different ways. This can help in identifying patterns and trends, and can also help in making informed business decisions.
  • Data Transformation: Data warehousing includes a process of data transformation, which involves cleaning, filtering, and formatting data from various sources to make it consistent and usable. This can help in improving data quality and reducing data inconsistencies.
  • Data Mining: Data warehousing provides data mining capabilities, which enable organizations to discover hidden patterns and relationships in their data. This can help in identifying new opportunities, predicting future trends, and mitigating risks.
  • Data Security: Data warehousing provides robust data security features, such as access controls, data encryption, and data backups, which ensure that the data is secure and protected from unauthorized access.

Advantages of Data Warehousing

  • Intelligent Decision-Making: With centralized data in warehouses, decisions may be made more quickly and intelligently.
  • Business Intelligence: Provides strong operational insights through business intelligence.
  • Historical Analysis: Predictions and trend analysis are made easier by storing past data.
  • Data Quality: Guarantees data quality and consistency for trustworthy reporting.
  • Scalability: Capable of managing massive data volumes and expanding to meet changing requirements.
  • Effective Queries: Fast and effective data retrieval is made possible by an optimized structure.
  • Cost reductions: Data warehousing can result in cost savings over time by reducing data management procedures and increasing overall efficiency, even when there are setup costs initially.
  • Data security: Data warehouses employ security protocols to safeguard confidential information, guaranteeing that only authorized personnel are granted access to certain data.

Disadvantages of Data Warehousing

  • Cost: Building a data warehouse can be expensive, requiring significant investments in hardware, software, and personnel.
  • Complexity: Data warehousing can be complex, and businesses may need to hire specialized personnel to manage the system.
  • Time-consuming: Building a data warehouse can take a significant amount of time, requiring businesses to be patient and committed to the process.
  • Data integration challenges: Data from different sources can be challenging to integrate, requiring significant effort to ensure consistency and accuracy.
  • Data security: Data warehousing can pose data security risks, and businesses must take measures to protect sensitive data from unauthorized access or breaches.

There can be many more applications in different sectors like E-Commerce, telecommunications, Transportation Services, Marketing and Distribution, Healthcare, and Retail. 

Conclusion

Data warehousing in database management systems (DBMS) enables integrated data management, providing scalable solutions for enhanced business intelligence and decision-making within businesses. Its advantages in data quality, historical analysis, and scalability highlight its critical role in deriving important insights for a competitive edge, even in the face of implementation problems.
 



Last Updated : 04 Feb, 2024
Like Article
Save Article
Previous
Next
Share your thoughts in the comments
Similar Reads