Open In App

Why to Use Surrogate Keys and Slowly Changing Dimensions in Star Schema?

The surrogate key is a unique identifier of a row/record in a relational table. It is added to each dimension table which does not include a single unique column to support star modeling. Examples of a surrogate key in dimension table:

Dimension: Customer
Attributes:
Customer_id (Primary key and surrogate key)
Name
Age
Gender
Address
Email
Marital Status
city

Here, “Customer_id” is the surrogate key. It is not a piece of useful information about individual customers but gives a unique identification to every individual customer when stored in the databases.



 Use of surrogate keys in the Star Schema:

Therefore, we don’t use unique business keys as the primary keys. We prefer surrogate keys as the primary keys in each dimension table.

 Slowly Changing Dimensions (SCD):

In OLTP systems, overwriting of records is allowed. But in data warehouse systems historical data is stored for analysis purposes. So, this data can not be overwritten.



Types of SCDs:

Type-1 SCD: In Type -1 SCD Overwriting is allowed. Here, New data overwrites the existing data. Existing data is lost and not stored anywhere else. This is the default type of dimension we create. No need to specify any additional information to create them. Example: Email and Marital_status of the customer dimension. Marital status does not change frequently. It changes very slowly over time once or twice.

Disadvantage:

The dimensions will always contain the current values for each attribute and historical data is lost.

Type-2 SCD: It creates another dimension record. It retains the full history of values. When the value of a chosen attribute changes, the current record is closed. A new record is created with the changed data values and this new record becomes the current record. Each record contains the effective time and expiration time to identify the time period between which the record was active. Three extra columns should be added to the dimension table: 1. Start_Date. 2. End_Date, 3. Active_Flag. Example: Salary of the Employees Dimension table. If salary changes overtime then it is not overwritten rather an active time period is added to old salary information.

Type-3 SCD: Creating a current value field. Stores two versions of values for certain selected level attributes. Each record stores the previous value and the current value of the selected attribute. When the value of any selected attribute changes then the current value is stored in previous_column and the new value is stored in current_value_column. Example: City of the customer dimension. If a customer changes his city over time, then we neither overwrite the city name nor keep the active flag for that. We simply create two rows stating what was the old city and what is the new city.

Note: 
SCD-1 approach is commonly used for supplementary columns such as email id, 
mobile number of customers.
SCD-2 is not a good approach for Rapidly Changing Dimensions. 
Hence, separate rapidly changing attributes by implementing junk dimensions.

Junk dimension: The junk dimension can be defined as a place to store junk attributes.The collection of random transactional codes, flags, etc. that are not related to any particular dimensions. Attributes such as delivered, shipped, received, packed, and returned can have indicator flags like Yes/No, Y/N, True/False. It reduces the size of the fact table.

Factless table: A fact table that does not contain any fact or measure. It contains only the primary key from its dimensional tables. It is used to resolve many-to-many cardinality issues.

 

Article Tags :