Friday, October 17, 2008

This month we're really busy with a very important project and a short schedule. This made me think of ideas for agile development for BI, but I'll leave it for other time for now. In order to make us better BI developers, I decided to take one Pragmatic Programmer principle and use it. I took one of the most important (for my opinion) principles - DRY (Don't Repeat Yourself). The DRY principle says that "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system". In classic programming it's simple to use: Use methods and generic classes to implements logic that will repeat itself in the project. But how do you do it in BI development? Here are some ideas I thought and even implemented some of them in my environment. Every layer/step in the BI development has it's own bulletin. I'll be happy to hear more from you.

  • First of all - use functions in your DataWarehouse's database. Do it as much as you can. Do not repeat any logic twice or more, no matter if it's in procedures, views or even CLR functions.
  • We all have much logic that repeats itself in the ETL process. For example, we found ourselves doing over and over the next process: When we build a fact table, we take every cell that points to a dimension table by a foreign key and "looking up" if it's found in the dimension table. If it's not there we replace it with Undefined, UD or null. That makes us feel very bad because we feel that we're doing the same all the time and it gives us the feeling of machines rather than programmers. The solution for this problem (and many other) is to build our own tasks (in SSIS) or transformations (in SSIS & Informatica). Alberto Ferrari did a beautiful work in this field in SSIS. I'll add some transformations of my own once I'll have release-ready versions of them.
  • My co-workers just loves the Calculated Member feature in the Data Source View in SSAS. In enables them to make a new column without making a view and with no touch in the underlying database. The problem here is that after a while we have a LOT of calculated members, many of them repeats themselves and when you look for logic you lost, you can look for hours in the never-ending DSV. The solution here is not using calculated members at all. Put all your logic in the database (and as I said - in functions). The only place where you should use calculated members is where you must - when you have no write permission to the DataWarehouse or when you build your DSV over an operative database and you don't have write permissions.
  • The same is with Names Queries in the Data Source View in SSAS. Don't use it.
  • There's much logic that you can do only in MDX. Here, the problem is that MDX scripts are defined over cubes and not over dimensions, meaning that if a dimension has MDX logic you have to repeat it in every cube's MDX script. The solution is to add the MDX programically using AMO. Every time the ETL process ends, it should run a program that takes the MDX script from a single file and place it in every relevant cube. I know it sounds a little bit wacky and I even didn't do it myself, but for what I know, it's the only solutions for DRY in MDX.

As I said, I'll love to hear your ideas about this topic.

Saturday, October 18, 2008 7:49:53 AM (Jerusalem Standard Time, UTC+02:00)
> Here, the problem
is that MDX scripts are defined over cubes and not over dimensions, meaning that if
a dimension has MDX logic you have to repeat it in every cube’s MDX script

Not quite correct - you can define calculations in dimensions using custom member formulas. The same calculations will be used in every cube this dimension is used.
Mosha
Monday, October 20, 2008 4:34:38 AM (Jerusalem Standard Time, UTC+02:00)
All good points, Miky, I 100% agree. I don't like custom member formulas precisely because you end up having some MDX in your dimension tables, and some in the MDX Script - I want all my MDX code in one place. I'd like to be able to have a global MDX Script and be able to do something like a #include to bring calculations into specific cubes. One to add to my wishlist for the next version...
Sunday, October 26, 2008 5:31:49 PM (Jerusalem Standard Time, UTC+02:00)
Hi there,

As I written in the post - I don't like to use the custom member formulas because it's a death sentence to the DRY principle in the project. I'm glad to see that Chris agrees with me with this point.
Chris - You don't have to add it to your with list. I think I might come with a solution in the next days. If I'll have one - you'll see it in the next post.
Name
E-mail
Home page

Comment (HTML not allowed)  

Enter the code shown (prevents robots):