Monday, November 09, 2009

In both SSIS and Informatica, the union transformation doesn't delete duplicate rows. In SSIS it's clear because the transformation is called "Union all", which reminds us the union all statement in SQL which doesn't remove duplicate rows. The confusing part is in Informatica, where the transformation is called Union although it doesn't remove duplicates. One of my teammates got confused this week so I thought it's worth blogging about.

One more tip about ETLs with union transformations: When you'll investigate, maintain or fix ETL it will be much help if you'll know where each row came from. That's why I recommend adding a column named "src" or something like that and store there the source of the row before the union happened.

Monday, November 09, 2009 11:41:31 PM (Jerusalem Standard Time, UTC+02:00)