A data-warehouse is a heterogeneous collection of different data sources organised under a unified schema. There are 2 approaches for constructing data-warehouse: Top-down approach and Bottom-up approach are explained as below.
1. Top-down approach:
The essential components are discussed below:
- External Sources –
External source is a source from where data is collected irrespective of the type of data. Data can be structured, semi structured and unstructured as well.
- Stage Area –
Since the data, extracted from the external sources does not follow a particular format, so there is a need to validate this data to load into datawarehouse. For this purpose, it is recommended to use ETL tool.
- E(Extracted): Data is extracted from External data source.
- T(Transform): Data is transformed into the standard format.
- L(Load): Data is loaded into datawarehouse after transforming it into the standard format.
- Data-warehouse –
After cleansing of data, it is stored in the datawarehouse as central repository. It actually stores the meta data and the actual data gets stored in the data marts. Note that datawarehouse stores the data in its purest form in this top-down approach.
- Data Marts –
Data mart is also a part of storage component. It stores the information of a particular function of an organisation which is handled by single authority. There can be as many number of data marts in an organisation depending upon the functions. We can also say that data mart contains subset of the data stored in datawarehouse.
- Data Mining –
The practice of analysing the big data present in datawarehouse is data mining. It is used to find the hidden patterns that are present in the database or in datawarehouse with the help of algorithm of data mining.
This approach is defined by Inmon as – datawarehouse as a central repository for the complete organisation and data marts are created from it after the complete datawarehouse has been created.
Advantages of Top-Down Approach –
- Since the data marts are created from the datawarehouse, provides consistent dimensional view of data marts.
- Also, this model is considered as the strongest model for business changes. That’s why, big organisations prefer to follow this approach.
- Creating data mart from datawarehouse is easy.
Disadvantages of Top-Down Approach –
- The cost, time taken in designing and its maintainence is very high.
2. Bottom-up approach:
- First, the data is extracted from external soures (same as happens in top-down approach).
- Then, the data go through the staging area (as explained above) and loaded into data marts instead of datawarehouse. The data marts are created first and provide reporting capability. It addresses a single business area.
- These data marts are then integrated into datawarehouse.
This approach is given by Kinball as – data marts are created first and provides a thin view for analyses and datawarehouse is created after complete data marts have been created.
Advantages of Bottom-Up Approach –
- As the data marts are created first, so the reports are quickly generated.
- We can accomodate more number of data marts here and in this way datawarehouse can be extended.
- Also, the cost and time taken in designing this model is low comparatively.
Disadvantage of Bottom-Up Approach –
- This model is not strong as top-down approach as dimensional view of data marts is not consistent as it is in above approach.
Attention reader! Don’t stop learning now. Get hold of all the important CS Theory concepts for SDE interviews with the CS Theory Course at a student-friendly price and become industry ready.