Posts so far:
Data Warehouse Design Considerations
Getting Started - Picking a Dataset
A Data Warehouse is a central repository of data that should be acting as the single source of truth for Business intelligence reporting throughout an organisation. In some places the Enterprise Data Warehouse is often the only place that some data sources have been integrated together and might be the easiest place for management to obtain the information they need to drive the business.
One of the key choices the Warehouse and Business teams need to make is how are they going to cope with changing dimensional data. If Betty has an address of 28 Street X for orders before 201910 and then moves to 5 Street Y, you would want the orders in the relevant time frames associated with the correct address at the time, well probably. but what about date of birth? If this had been changed from 1972 to 1992 then this would make a difference to metrics where you have been aggregating by age ranges, would you want your facts to be summed by incorrect data just because that is how it was at the time or would you want it reflecting the new and most accurate data?
There are different "types" of ways to cater for Slowly Changing Dimensions, SCD0, SCD1, SCD2 etc, they have different benefits and personally I have found using SCD2 processing with business modelling generating a latest view of the dimension a very flexible way to cater for this. Firstly lets discuss the different types and then I will explain the the use of SCD2 for the above example.
Types of SCD processing:
- Type 0 – No change or fixed Dimension
- Dimension never changes, so Betty will always have been born in 1972 and live at the first address she reported.
- Type 1 – Latest Only Dimension
- Historical Data is removed or update to reflect the latest, so Betty is always born in 1992 and lives at the last submitted address.
- Type 2 – Time framed dimension
- Track changes with to and from effective dates and often using current flag or latest flags with the active dates.
Why I rely on SCD2 processing:
From the source data we can create a dimension of the latest data, obviously this can change each month to reflect the latest data available and as such reporting would update accordingly.
No comments:
Post a Comment