A data-warehouse is a heterogeneous collection of different data sources organised under a unified schema. There are 2 approaches for constructing data-warehouse: Top-down approach and Bottom-up approach are explained as below.
1. Top-down approach:
The essential components are discussed below:
- External Sources –
External source is a source from where data is collected irrespective of the type of data. Data can be structured, semi structured and unstructured as well.
- Stage Area –
Since the data, extracted from the external sources does not follow a particular format, so there is a need to validate this data to load into datawarehouse. For this purpose, it is recommended to use ETL tool.
- E(Extracted): Data is extracted from External data source.
- T(Transform): Data is transformed into the standard format.
- L(Load): Data is loaded into datawarehouse after transforming it into the standard format.
- Data-warehouse –
After cleansing of data, it is stored in the datawarehouse as central repository. It actually stores the meta data and the actual data gets stored in the data marts. Note that datawarehouse stores the data in its purest form in this top-down approach.
- Data Marts –
Data mart is also a part of storage component. It stores the information of a particular function of an organisation which is handled by single authority. There can be as many number of data marts in an organisation depending upon the functions. We can also say that data mart contains subset of the data stored in datawarehouse.
- Data Mining –
The practice of analysing the big data present in datawarehouse is data mining. It is used to find the hidden patterns that are present in the database or in datawarehouse with the help of algorithm of data mining.
This approach is defined by Inmon as – datawarehouse as a central repository for the complete organisation and data marts are created from it after the complete datawarehouse has been created.
Advantages of Top-Down Approach –
- Since the data marts are created from the datawarehouse, provides consistent dimensional view of data marts.
- Also, this model is considered as the strongest model for business changes. That’s why, big organisations prefer to follow this approach.
- Creating data mart from datawarehouse is easy.
Disadvantages of Top-Down Approach –
- The cost, time taken in designing and its maintainence is very high.
2. Bottom-up approach:
- First, the data is extracted from external soures (same as happens in top-down approach).
- Then, the data go through the staging area (as explained above) and loaded into data marts instead of datawarehouse. The data marts are created first and provide reporting capability. It addresses a single business area.
- These data marts are then integrated into datawarehouse.
This approach is given by Kinball as – data marts are created first and provides a thin view for analyses and datawarehouse is created after complete data marts have been created.
Advantages of Bottom-Up Approach –
- As the data marts are created first, so the reports are quickly generated.
- We can accomodate more number of data marts here and in this way datawarehouse can be extended.
- Also, the cost and time taken in designing this model is low comparatively.
Disadvantage of Bottom-Up Approach –
- This model is not strong as top-down approach as dimensional view of data marts is not consistent as it is in above approach.
Don’t stop now and take your learning to the next level. Learn all the important concepts of Data Structures and Algorithms with the help of the most trusted course: DSA Self Paced. Become industry ready at a student-friendly price.
- Data Mining
- Data Warehousing
- Data Abstraction and Data Independence
- DBMS Architecture 2-Level, 3-Level
- Introduction of 3-Tier Architecture in DBMS | Set 2
- Data Replication in DBMS
- Types of Sources of Data in Data Mining
- KDD Process in Data Mining
- Snowflake Schema in Data Warehouse Model
- Difference between Row oriented and Column oriented data stores in DBMS
- Star Schema in Data Warehouse modeling
- Dimensional Data Modeling
- Characteristics and Functions of Data warehouse
- Difference between Structured, Semi-structured and Unstructured data
- Data Marts (storage component of HDFS)
- Difficulties of Implementing Data Warehouses
- Data Management issues in Mobile database
- Characteristics of Biological Data (Genome Data Management)
- Difference between Data Warehousing and Data Mining
- ETL Process in Data Warehouse
If you like GeeksforGeeks and would like to contribute, you can also write an article using contribute.geeksforgeeks.org or mail your article to firstname.lastname@example.org. See your article appearing on the GeeksforGeeks main page and help other Geeks.
Please Improve this article if you find anything incorrect by clicking on the "Improve Article" button below.