Saturday, January 03, 2009

You should know that when you run a program by starting a job with a CmdExec step, the program's working directory will be c:\<windows dir>\system32. How can this affect you? For example, I created a .Net console application that ships with a settings file. When I ran it using the SQL Server Agent, it couldn't find the settings file (worse - it silently used the default settings, which made the problem much harder to track down). After some research, I found that it was looking for the file in the directory I mentioned.
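One common way around this is to resolve the settings file relative to the program's own location instead of the working directory. The sketch below shows the idea in Python rather than .Net; the function name `path_beside_program` and the file name `settings.xml` are made up for illustration.

```python
import sys
from pathlib import Path


def path_beside_program(file_name, program_path=None):
    """Return file_name resolved against the program's folder, not the working directory."""
    # sys.argv[0] is the script path as it was launched; callers may pass another path.
    program = Path(program_path if program_path is not None else sys.argv[0])
    return program.resolve().parent / file_name


if __name__ == "__main__":
    # Prints the same location no matter which directory the job was started from.
    print(path_beside_program("settings.xml"))
```

With this approach it no longer matters that the Agent starts the process in system32, because the lookup never consults the working directory.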

Sunday, January 04, 2009 6:24:58 AM (Jerusalem Standard Time, UTC+02:00)
 Wednesday, December 03, 2008

I've been thinking lately about the new Microsoft Chart controls, which are based on the Dundas acquisition (made in April 2007). What does this mean for us, the BI developers?

Until now we have always depended on the abilities of our BI products. Let's take Hyperion/Oracle Essbase for example. Let's say I want a special graph of a new type, or a new feature in a graph. I can't do it at all, because the product's code is closed (someone has to make money, right?) and I can't add any more graphs or features. There are some products where I can do things like this. For example, in Panorama NovaView I can build a new KPI type or a sophisticated visualization using JavaScript and the Panorama SDK, but that's a lot of coding.

Now we have the ability to build graphs in code without writing a large amount of it. We can customize them as we want and we're not limited by any product. The drawbacks are the maintenance and the knowledge that we need to have, but these are things that we need in every product anyway. I haven't learned this framework yet so I can't tell where the limits are, but they seem pretty far. Alex Gorev is writing about it in his blog (web, rss) so you can learn more about it there. It will take time to see if it affects the BI development world, so all that's left to do is to sit and wait.

Thursday, December 04, 2008 5:46:20 AM (Jerusalem Standard Time, UTC+02:00)

Today I had a very disturbing coincidence.
My friend Ariel worked on an SSAS solution with no version control (we're using VSS). Instead of using that, he developed by opening the database directly on the server. I told him that he must fix it and that we must have an up-to-date, version-controlled solution. In the past we asked Microsoft support how to do that (we had lost all our VSS files and had only the databases). They simply said that it's not possible. Ariel found today that it can be done very easily using File -> New Project -> Import Analysis Services Database, as you can see in the picture:

Thursday, December 04, 2008 5:09:58 AM (Jerusalem Standard Time, UTC+02:00)
 Tuesday, November 18, 2008

This is a little bit tricky. Unlike the AdomdClient assembly, the AdomdServer assembly doesn't have a descriptive name. It's called msmgdsrv.dll and it is located in Program Files\Microsoft SQL Server\MSSQL.2\OLAP\bin. Why isn't it documented anywhere?

Tuesday, November 18, 2008 11:17:16 PM (Jerusalem Standard Time, UTC+02:00)
 Monday, November 17, 2008

After announcing the MdxInjection program I got several requests for additional details and for the ability to run it without using Visual Studio. So, here are some important points:

  • When I published it I had developers in mind, because I'm sure that everyone will want to make their own little modifications before using it for their own needs. That's why I published it as a solution and not as an executable.
  • I wrote it using VS2008 but only with the .Net 2 framework. Those of you who use VS2005 won't be able to open the solution.
  • The program has only one public method - InjectMdx, which takes two arguments: the location of the CommonMdx file and the location of the xml configuration file.
  • The CommonMdx.mdx file contains the common MDX script. The relevant part has to start with /* Common MDX */, followed by the common MDX script. Anything written before that marker is ignored. That gives you the ability to save some data or comments for yourself in this file.
  • An example of the configuration xml file can be found in the Test library inside the solution. Basically, it enables you to define the servers, databases and cubes into which you want to inject the common script. Pay attention that you have to write the connection strings in this file.
  • Note that the program will detect cube dimensions whose names were changed and will know how to replace them. That means that if you mention the Time dimension in the common script and inject it into the AdventureWorks cube, the script will replace the string "Time" with the string "ShipmentDate", for example.
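
The marker and rename rules in the list above can be sketched roughly as follows. This is an illustration in Python, not the actual C#/AMO code of MdxInjection: the function names are hypothetical, and matching only bracketed names such as [Time] is my assumption to avoid touching unrelated text.

```python
MARKER = "/* Common MDX */"


def extract_common_script(file_text):
    """Return only the text that follows the marker; anything before it is ignored."""
    _, found, after = file_text.partition(MARKER)
    if not found:
        raise ValueError("marker not found: " + MARKER)
    return after.strip()


def rename_dimensions(script, renames):
    """Replace bracketed dimension names, e.g. [Time] -> [ShipmentDate] (assumed convention)."""
    for old, new in renames.items():
        script = script.replace("[" + old + "]", "[" + new + "]")
    return script


common_file = (
    "notes to myself, ignored by the program\n"
    "/* Common MDX */\n"
    "CREATE MEMBER CURRENTCUBE.[Measures].[Period Name] AS [Time].[Calendar].CurrentMember.Name;"
)
script = rename_dimensions(extract_common_script(common_file), {"Time": "ShipmentDate"})
print(script)
```

The rename map would come from the cube itself (cube dimension name versus database dimension name); here it is passed in by hand to keep the sketch self-contained.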

For those of you who want a simple executable file, I added a Windows console project to the solution.

Link to only executable program
Link to the solution with the added windows application project
Link to the solution without the windows application project
Tuesday, November 18, 2008 6:54:17 AM (Jerusalem Standard Time, UTC+02:00)

In the previous post I talked about the DRY principle in BI development. I mentioned that one of the major problems in implementing the principle is the common MDX code. Chris commented:

"I'd like to be able to have a global MDX Script and be able to do something like a #include to bring calculations into specific cubes. One to add to my wishlist for the next version..."

As I said there, I have a good temporary solution until we have this in the next SQL Server release (if someone from Microsoft is reading...).

The MdxInjection program takes your common MDX script and a very simple xml file that defines where to inject this script. It injects the script into your desired cubes and even replaces the dimensions' names where necessary (this is relevant when you put a dimension in a cube under a name different from the dimension's own name, or when you use Role Playing Dimensions). I couldn't stop myself from writing some test code, so it's also included in the project. The project is written in C# 2 and uses a lot of AMO code. All the little technical details are inside.

Enjoy.

Download Link

Monday, November 17, 2008 8:29:02 AM (Jerusalem Standard Time, UTC+02:00)
 Friday, October 17, 2008

This month we're really busy with a very important project and a short schedule. This made me think of ideas for agile development for BI, but I'll leave that for another time. In order to make us better BI developers, I decided to take one Pragmatic Programmer principle and use it. I took one of the most important (in my opinion) principles - DRY (Don't Repeat Yourself). The DRY principle says that "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system". In classic programming it's simple to apply: use methods and generic classes to implement logic that would otherwise repeat itself in the project. But how do you do it in BI development? Here are some ideas I thought of, some of which I have even implemented in my environment. Every layer/step in BI development has its own bullet. I'll be happy to hear more from you.

  • First of all - use functions in your DataWarehouse's database. Do it as much as you can. Do not write any piece of logic more than once, no matter if it's in procedures, views or even CLR functions.
  • We all have a lot of logic that repeats itself in the ETL process. For example, we found ourselves doing the following process over and over: when we build a fact table, we take every cell that points to a dimension table by a foreign key and look it up in the dimension table. If it's not there we replace it with Undefined, UD or null. That makes us feel very bad, because we're doing the same thing all the time and it makes us feel like machines rather than programmers. The solution for this problem (and many others) is to build our own tasks (in SSIS) or transformations (in SSIS & Informatica). Alberto Ferrari did beautiful work in this field in SSIS. I'll add some transformations of my own once I have release-ready versions of them.
  • My co-workers just love the Named Calculation feature in the Data Source View in SSAS. It enables them to create a new column without making a view and without touching the underlying database. The problem here is that after a while we have a LOT of named calculations, many of them repeat themselves, and when you look for a piece of logic you lost, you can search for hours in the never-ending DSV. The solution here is not to use named calculations at all. Put all your logic in the database (and as I said - in functions). The only place where you should use named calculations is where you must - when you have no write permission to the DataWarehouse, or when you build your DSV over an operational database and you don't have write permissions.
  • The same goes for Named Queries in the Data Source View in SSAS. Don't use them.
  • There's a lot of logic that you can do only in MDX. Here, the problem is that MDX scripts are defined over cubes and not over dimensions, meaning that if a dimension has MDX logic you have to repeat it in every cube's MDX script. The solution is to add the MDX programmatically using AMO. Every time the ETL process ends, it should run a program that takes the MDX script from a single file and places it in every relevant cube. I know it sounds a little bit wacky and I haven't even done it myself, but as far as I know, it's the only solution for DRY in MDX.
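
To make the ETL bullet above concrete: the repeated "look it up, otherwise mark it Undefined" step can be written once and reused for every fact table. The sketch below is plain Python rather than an SSIS or Informatica transformation, and the names `resolve_foreign_keys`, `UNDEFINED` and the sample columns are made up for illustration.

```python
UNDEFINED = "UD"


def resolve_foreign_keys(fact_rows, dimension_keys, default=UNDEFINED):
    """Return copies of fact_rows where keys missing from their dimension become `default`.

    dimension_keys maps a foreign-key column name to the set of keys
    that exist in the matching dimension table.
    """
    resolved = []
    for row in fact_rows:
        fixed = dict(row)  # copy, so the source rows are left untouched
        for column, known_keys in dimension_keys.items():
            if fixed.get(column) not in known_keys:
                fixed[column] = default
        resolved.append(fixed)
    return resolved


facts = [
    {"ProductKey": "P1", "StoreKey": "S9", "Amount": 10},
    {"ProductKey": "P7", "StoreKey": "S1", "Amount": 25},
]
dimensions = {"ProductKey": {"P1", "P2"}, "StoreKey": {"S1", "S2"}}
print(resolve_foreign_keys(facts, dimensions))
```

The point is only that the rule lives in one place; a custom transformation would play the same role inside the data flow.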

As I said, I'd love to hear your ideas about this topic.

Friday, October 17, 2008 9:46:22 PM (Jerusalem Standard Time, UTC+02:00)
 Monday, September 22, 2008
My friends were stuck with a totally weird bug this week. After a day of frustration they called me to the rescue. It took me some time to figure it out, and I think that every SSIS developer (and maybe every developer) can learn a thing or two from others' mistakes.

The mission: The data flow takes one table with duplicate rows and copies it to another table, making sure that every row appears only once. Along the way, the data flow also adds some additional fields. Among them are the Create_User and Create_Date fields, which tell who last ran the package and when.
How my friends did it: Again, it's a very simple flow. They only added a Derived Column transformation to add the new fields, and then they added an Aggregate transformation to make every row appear only once.

Note that this is not the real package. It's a sample I did on my machine to show it here.

The Bug: When I first saw this it seemed to me a very simple flow, and I asked myself how this could be happening:

As you can see, it seems that the Aggregate transformation is not deterministic. Sometimes it outputs 99 rows, sometimes 198, and at other times I get other results as well.
Investigating: I wanted to see the difference between the table that I got the first time (99 rows) and the table I got the second time (198 rows), so I changed the destination table and compared the two tables. I ran a "select * from A where Column1+Column2+... not in (select Column1+Column2+... from B)"-style query but it was no use - it showed me that there were no rows that appeared in only one of the tables. At this point I really started to think (as my friends did) that maybe the Aggregate transformation has something wrong inside... Instead of blaming Microsoft, I decided to think. I needed to see what could make the flow non-deterministic. Then, it hit me.


The only non-deterministic component in the flow is the Derived Column, because it uses the getdate() function (it may be simple to see here, but in the original package the Derived Column transformation had many fields). The results of this function may differ in the milliseconds, especially for large tables. Then I looked at the Aggregate transformation and saw that the Create_Date column was also in the Group by operation, meaning that if two rows get different milliseconds they will both be placed in the destination table, although they are the same in every other column. That's it, the bug was found. But still, one question remained: why did the query not show me this? The answer is also simple but tricky to find: in the comparison query I concatenated all the columns of the tables in order to compare the results. When I did this, I cast the Create_Date to nvarchar, which truncated the milliseconds.
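
The effect is easy to reproduce outside SSIS. The following is a small Python simulation, not the real package: `distinct_rows` stands in for an Aggregate transformation that only groups, the two sample rows and the 3 ms gap are invented, and `as_text` imitates the cast that dropped the milliseconds in my comparison query.

```python
from datetime import datetime, timedelta


def distinct_rows(rows, group_by):
    """Keep one row per distinct combination of the group_by columns."""
    seen = {}
    for row in rows:
        key = tuple(row[column] for column in group_by)
        seen.setdefault(key, row)  # the first row of each group wins
    return list(seen.values())


start = datetime(2008, 9, 22, 8, 0, 0)
# Two identical source rows stamped a few milliseconds apart.
rows = [
    {"Column1": "a", "Column2": 1, "Create_Date": start},
    {"Column1": "a", "Column2": 1, "Create_Date": start + timedelta(milliseconds=3)},
]

with_date = distinct_rows(rows, ["Column1", "Column2", "Create_Date"])
without_date = distinct_rows(rows, ["Column1", "Column2"])
print(len(with_date), len(without_date))  # 2 1


def as_text(row):
    # Formatting to whole seconds drops the milliseconds, hiding the difference.
    stamp = row["Create_Date"].strftime("%Y-%m-%d %H:%M:%S")
    return row["Column1"] + str(row["Column2"]) + stamp


print(as_text(rows[0]) == as_text(rows[1]))  # True
```

Grouping on the timestamp keeps both rows, while the text comparison claims they are identical, which is exactly the contradiction I was staring at.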

Conclusions:
  • Pay attention to non-deterministic elements in what you do, whether it's code or an ETL process.
  • When you do something mindless like checking all the checkboxes in a list, think about what the outcomes will be.
  • Call Miky when you're desperate.
Monday, September 22, 2008 8:10:48 AM (Jerusalem Daylight Time, UTC+03:00)
 Saturday, September 20, 2008
Something disturbing happened to me this week. When I installed Excel 2003 on the Panorama machine in order to use Excel functions in my MDX calculations, the NovaView Desktop stopped working. When I tried to load a view it threw a connection error message. When I called Panorama support, they told me that it's a known issue and that it's hard to find in the Panorama knowledge base. So here it is:

If you have connection issues in the Desktop program, open the registry editor (Start -> Run -> regedit). Look for HKEY_CLASSES_ROOT\MSOLAP\CLSID and make sure its value is the same as that of HKEY_CLASSES_ROOT\MSOLAP.3\CLSID. Remember - always copy from MSOLAP.3 to MSOLAP and not vice versa.

Sunday, September 21, 2008 6:38:48 AM (Jerusalem Daylight Time, UTC+03:00)