This is the fourth post about Itay Braun's seminar in the SQL & BI conference. In this post, I'll write some notes I collected from the third part of the seminar which discussed SSIS.
- Log everything that happens inside the package. Use the SSIS log providers and the event handlers. The effect on performance is not significant.
- Analyize the log data. You can analize total execution time, SSAS partition processing time and much more.
- Log detailed error information about rejected rows. You can even log the actual error row.
- Monitor also the execution of the ETL jobs.
- Data Profiling is a new feature in SSIS 2008. It's used for analysis of the data's quality in the DWH. Go and learn.
- Configure everything you can in the package configuration, such as: tasks, containers, variables, connection managers and data flow components. You can use XML config file, config from parent package, registry configuration or even configuration from a table.
- The Cache Transformation saves data in the cache. It's very useful in lookups.
- Data types: Make them as "small" as possible (better performance). Be aware of precision issues when using money, float and decimal types.
- Don't sort in SSIS unless it's absolutely necessary.
- Sometimes using T-SQL will be faster then running it in SSIS.
Some of the tips are also relevant in Informatica. For example, I always let the DB make the sort whenever possible. Sometimes it's also works with joins. Running packages or procedures can be much more faster than running it in the ETL, but if you look at it from maintenance point of view you'll do it in the ETL. It depends on much factors, such as the time you can have to run the ETL, the knowledge in your team and more.