tirsdag den 8. november 2011

SCD Type 1 , 2, 3 or 1 + 2 + 3

Question: When you define a SCD (slow changing dimension) as type 1, 2 or 3 ?

I experienced some confusion during the past, so that is why I'd like to put the 'selection' process into perspective, although the discussion may seem a little bit theoretical:
For each SCD dimension :

  • Define the business key, the key that identifies your identity, as for instance the Social Security Number ( US) or Cpr. nr (DK) for a customer dimension.
  • Pick-up each attribute you have to track changes over time. Do not include her the business key, the surrogate key, and focus only to the attributes that describes, in other words add pieces of information to your business key. ( Addresses, birth day, number of children, marital status , etc. for a customer.)
  • Start by diving the attributes you idendified from above in two categories first : attributes where you are not interested at all in tracking the changes (birth date for instance), let' call it category 1 and attributes where you are interested in tracking the changes ( name, adresses, ) , as category 2
  • For the attributes in category 2, make a clear decision from the start whether:
  • when an attribute value changes you are only interested in the last value. That is an in place override of the previous value and no posibility to track the historical changes. Such an attribute is a type 1 attribute and might be a name or address.
  • if you have to keep the historical chages, then you have to choose between two practical ways to implement it :
  • type 2 - the most common . You may add either ( StartDate, EndDate) or IsCurrent or even both of them.
  • type 3 - less common
Keep in mind that the business requirements decide whether your attribute falls into one category or type.

You will often end with a dimension having attributes of category 1 and category 2 and the latter having both type 1 and type 2 and in rarely cases a mix of type 1, 2 and 3.
So the answer to the original question may sound like:

If you keep all the attributes in your dimension, and your dimension has at least one type 2 attribute , than people call it a SCD type 2 dimension.
    
But what about if you have at least one type 2 and at least one type 3 attribute ?       
It's both a type 2 and type 3 and often not to mention type 1.
In that case, by only calling it type 2, you may hide the representation of the two other types of attributes.
So why not call it : 1 + 2 + 3 ......

Ingen kommentarer:

Send en kommentar