Slowly Changing Dimensions

Slowly Changing Dimensions

Published: None

Source: https://www.linkedin.com/pulse/slowly-changing-dimensions-arabinda-mohapatra-xmezc?trackingId=XFuzrPv8SFqHvSecoJC58A%3D%3D


Slowly Changing Dimensions

Running Kafka streams after dark, diving into genetic code by daylight, and wrestling with Databricks and Tableflow in every spare moment—sleep is optional


What are slowly changing dimensions?

Slowly changing dimensions refer to how data in your data warehouse changes over time. Slowly changing dimensions have the same natural key but other data columns that may or may not change over time depending on the type of dimensions that it is.


Understanding Slowly Changing Dimensions (SCD) in Data Warehousing

Slowly Changing Dimensions (SCD) are a cornerstone in the field of data warehousing, acting like a time-travel agent for your data. Managing changes in dimension data is crucial for maintaining historical accuracy and supporting analytical needs. Here’s a breakdown of commonly used SCD types and their differences:

🔖 SCD Type 0 (Fixed Dimension): This type retains the original values of dimension attributes, meaning no changes are allowed after the initial load. It’s ideal for attributes that should never change, such as a birth date.

🔖 SCD Type 1 (Overwrite): This approach updates the dimension with new values, overwriting the old ones. It doesn’t preserve any history, making it suitable for correcting errors where historical tracking isn’t needed.

🔖 SCD Type 2 (Row Versioning): This method tracks changes by creating a new record for each change, maintaining both current and historical data. It uses additional columns like start and end dates to manage the history, allowing for comprehensive historical analysis.

🔖 SCD Type 3 (Previous Value Column): This type adds a new column to store the previous value of the attribute. It’s useful for tracking limited historical data, such as the last change.

🔖 SCD Type 4 (History Table): This approach involves maintaining a separate history table to store historical data. The main dimension table keeps only the current data.

🔖 SCD Type 6 (Hybrid): This combines SCD Types 1, 2, and 3 to provide a comprehensive solution for tracking changes, maintaining history, and storing previous values.

Real-Time Use Case

Imagine a retail company that needs to track customer address changes over time:

  • SCD Type 1: Used for correcting errors in customer addresses without maintaining history.
  • SCD Type 2: Used to track all historical address changes, creating a new record for each change.
  • SCD Type 3: Used to store the previous address in an additional column for quick reference.
  • SCD Type 4: Used to maintain a separate history table for all past addresses.
  • SCD Type 6: Used to combine the benefits of Types 1, 2, and 3 for a comprehensive solution
  • Reference : Wikipedia > Dimension (data warehouse)
  • Wikipedia > Slowly changing dimension

Comments