SSIS Dimension Merge SCD v1.6 – Duplicate Rows OuputOctober 18th, 2011 | Posted by in Integration Services
Recently, a couple of people I’m working with came across a strange issue using the new version of the Dimension Merge SCD component. During the first load all rows would be output as new records (as you’d expect), but when running the same load again all rows would be output as both unchanged and new (not as you’d expect!). The behavior seemed very strange, only a couple of packages exhibited the issue.
Initially I had little joy trying to replicate the issues with Adventure Works, but found the source of the problem after chatting to Bhavik Merchant about it. The problem arose due to two main reasons…
- Effective and expiry dates were being managed externally of the Dimension Merge SCD component.
- The default for the last records expiry date in the Dimension Merge SCD is the SQL Server datetime maximum value, while the warehouse was using 2199-12-31.
So, in the initial load the records where being inserted with an expiry date of 2199-12-31 which from the Dimension Merge SCD components point of view means the records aren’t current until the end of time. For the second load the Dimension Merge SCD would assign an expiry date of 9999-12-31 to the same source records, making them more recent than those inserted in the first load. So the second load would result in all the records being output again with the different active to date. Since the effective and expiry dates were then managed externally this resulted in two identical records being inserted into the destination table.
The “fix” was simple. Either change the default for the last records expiry date in the data warehouse to 9999-12-31, or set the last records expiry date in the Dimension Merge SCD to the same value being used in the data warehouse. The screenshot below shows where to change this value in the component.