Wednesday 31 July 2019

Slowly Changing Dimensions

Thought I would do a quick post on the different types of SCD i.e. Slowly Changing Dimensions.

SCDs are dimensions that change slowly but in an unexpected or unknown way. Say for example you have a list of contracts with various clients, this table could contain the clients delivery address, if they change address then this address will change in your dimension. If you have data that is aggregated by post-code region e.g. the first 3/4 characters of the post-code e.g. BN7, then you need this change reflected in your dimension for the aggregation of your fact data.

You have several different ways that you can handle this change in your slowly changing dimension. If you want a fairly in depth look then I suggest the Wikipedia page, although it can be changed it tends to be a very good source for this information.

Typically the main ones used are SCD0, SCD1 and my preference most of the time is SCD2.

In SCD0 you just retain the original, not ideal in most business scenarios. In the above example your column would like be called original address instead of just address.

SCD1 you replace the existing data, so the column would be current address as you know that it only contains the latest address.

SCD2 create a new row for the new information with additional columns for start and end dates of the row. So for the original address you have a start date of, for example, 1st Jan 2010 (when your Data Warehouse started) and you have an end date of 31st July 2018, as the new address came in during the month of August 2018. Your new row then has a start date of 1st August 2018 and no end date, at least until it also gets replace.

The reason that SCD2 is so good is that you keep the entire history of the data, this is essential for a data warehouse, you need to the data to be able to show the end user exactly what they want. There are various ways to control this, ODI, SSIS etc. have built in ways and others include things like merge templates and I would like to explore a way of using python for this as well.

Anyways, as always life is busy and that is all I have time to post now. Hopefully I will come back to this and post a picture.

No comments:

Post a Comment