Wednesday, December 03, 2008

I've been thinking lately about the new Microsoft Chart controls, which are based on the Dundas acquisition (made in April 2007). What does this mean for us, the BI developers?

Until now we always depended on the capabilities of our BI products. Take Hyperion/Oracle Essbase, for example. Say I want a new type of graph, or a new feature in an existing graph. I can't do it at all, because the product's code is closed (someone has to make money, right?) and I can't add graphs or features. Some products do allow this kind of thing. In Panorama NovaView, for example, I can build a new KPI type or a sophisticated visualization using JavaScript and the Panorama SDK, but that takes a lot of coding.

Now we can build graphs in code without writing a large amount of it. We can customize them as we like, and we're not limited by any product. The drawbacks are the maintenance and the knowledge we need to have in-house, but we need those with every product anyway. I haven't learned this framework yet, so I can't tell where its limits are, but they seem pretty far away. Alex Gorev writes about it in his blog (web, rss), so you can learn more there. It will take time to see whether it affects the BI development world, so all that's left to do is sit and wait.

Thursday, December 04, 2008 5:46:20 AM (Jerusalem Standard Time, UTC+02:00)

Today I had a very disturbing coincidence.
My friend Ariel worked on an SSAS solution with no version control (we're using VSS). Instead of working from a version-controlled project, he developed by opening the database directly on the server. I told him that he must fix this and that we must have an up-to-date, version-controlled solution. In the past we had asked Microsoft support how to do that (we had lost all our VSS files and had only the databases). They simply said that it's not possible. Today Ariel found that it can be done very easily using File -> New Project -> Import Analysis Services Database, as you can see in the picture:

Thursday, December 04, 2008 5:09:58 AM (Jerusalem Standard Time, UTC+02:00)
 Tuesday, November 18, 2008

This is a little bit tricky. Unlike the AdomdClient assembly, the AdomdServer assembly doesn't have a descriptive name. It's called msmgdsrv.dll and it's located in Program Files\Microsoft SQL Server\MSSQL.2\OLAP\bin. Why isn't it documented anywhere?

Tuesday, November 18, 2008 11:17:16 PM (Jerusalem Standard Time, UTC+02:00)
 Monday, November 17, 2008

After announcing the MdxInjection program I got several requests for additional details and for the ability to run it without using Visual Studio. So, here are some important points:

  • When I published it I had developers in mind, because I'm sure that anyone will want to make his own little modifications before using it for his own needs. That's why I published it as a solution and not as an executable.
  • I wrote it in VS2008, but using only the .NET 2.0 Framework. Those of you who use VS2005 won't be able to open the solution.
  • The program has only one public method - InjectMdx, which takes two arguments: the location of the CommonMdx file and the location of the XML configuration file.
  • The CommonMdx.mdx file contains the common MDX script. The relevant part has to start with /* Common MDX */, followed by the common MDX script. Anything written before that marker is ignored, which lets you keep some data or comments for yourself in this file.
  • An example of the XML configuration file can be found in the Test library inside the solution. Basically, it lets you define the servers, databases and cubes into which you want to inject the common script. Note that you have to write the connection strings in this file.
  • Note that the program detects cube dimensions whose names were changed and knows how to replace them. That means that if you mention the Time dimension in the common script and inject it into the AdventureWorks cube, the script will replace the string "Time" with the string "ShipmentDate", for example.
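
The marker and rename rules above can be illustrated with a small sketch. This is not the MdxInjection code itself (which is C# and uses AMO); it is a minimal Python illustration, and the function names, the bracketed `[Time]` matching and the rename dictionary are assumptions made for the example.

```python
import re

# Marker described above: only the text after it is treated as the common script.
MARKER = "/* Common MDX */"


def extract_common_script(file_text):
    """Return the script that follows the marker; anything before it is ignored."""
    _, found, script = file_text.partition(MARKER)
    if not found:
        raise ValueError("marker %r not found" % MARKER)
    return script.strip()


def rename_dimensions(script, renames):
    """Replace [Name] references in one pass (hypothetical helper).

    `renames` maps the dimension name used in the common script to the name
    the dimension has in the target cube, e.g. {"Time": "ShipmentDate"}.
    Matching only bracketed names is an assumption of this sketch.
    """
    if not renames:
        return script
    names = "|".join(re.escape(name) for name in renames)
    pattern = re.compile(r"\[(" + names + r")\]")
    return pattern.sub(lambda m: "[" + renames[m.group(1)] + "]", script)


file_text = (
    "notes for myself, never injected\n"
    "/* Common MDX */\n"
    "CREATE MEMBER CURRENTCUBE.[Measures].[X] AS [Time].[Year].CurrentMember;"
)
script = extract_common_script(file_text)
injected = rename_dimensions(script, {"Time": "ShipmentDate"})
```

The sketch matches names case-sensitively and in a single pass, so a rename such as Time -> ShipmentDate cannot be picked up again by a second rule; a real implementation would probably need to be more forgiving about case and unbracketed names.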

For those of you who want a simple executable, I added a Windows console project to the solution.

Link to only executable program
Link to the solution with the added windows application project
Link to the solution without the windows application project
Tuesday, November 18, 2008 6:54:17 AM (Jerusalem Standard Time, UTC+02:00)

In the previous post I talked about the DRY principle in BI development. I mentioned that one of the major problems in implementing the principle is the common MDX code. Chris commented:

"I'd like to be able to have a global MDX Script and be able to do something like a #include to bring calculations into specific cubes. One to add to my wishlist for the next version..."

As I said there, I have a good temporary solution until we get this in the next SQL Server release (if someone from Microsoft is reading...).

The MdxInjection program takes your common MDX script and a very simple XML file that defines where to inject this script. It injects the script into the cubes you choose and even replaces the dimensions' names where necessary (this is relevant when a dimension appears in a cube under a name different from the dimension's own name, or when you use Role Playing Dimensions). I couldn't stop myself from writing some test code, so it's also included in the project. The project is written in C# 2 and uses a lot of AMO code. All the little technical details are inside.

Enjoy.

Download Link

Monday, November 17, 2008 8:29:02 AM (Jerusalem Standard Time, UTC+02:00)
 Friday, October 17, 2008

This month we're really busy with a very important project and a short schedule. This made me think of ideas for agile development in BI, but I'll leave that for another time. In order to make us better BI developers, I decided to take one Pragmatic Programmer principle and use it. I took one of the most important (in my opinion) principles - DRY (Don't Repeat Yourself). The DRY principle says that "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system". In classic programming it's simple to apply: use methods and generic classes to implement logic that repeats itself in the project. But how do you do it in BI development? Here are some ideas I thought of, some of which I have even implemented in my environment. Every layer/step in BI development has its own bullet. I'll be happy to hear more from you.

  • First of all - use functions in your data warehouse's database. Do it as much as you can. Do not write any piece of logic twice or more, no matter if it's in procedures, views or even CLR functions.
  • We all have a lot of logic that repeats itself in the ETL process. For example, we found ourselves doing the following process over and over: when we build a fact table, we take every cell that points to a dimension table by a foreign key and look it up in the dimension table. If it's not there we replace it with Undefined, UD or null. That makes us feel very bad, because we're doing the same thing all the time and it makes us feel like machines rather than programmers. The solution for this problem (and many others) is to build our own tasks (in SSIS) or transformations (in SSIS & Informatica). Alberto Ferrari has done beautiful work in this field in SSIS. I'll add some transformations of my own once I have release-ready versions of them.
  • My co-workers just love the Named Calculation feature in the Data Source View in SSAS. It enables them to create a new column without creating a view and without touching the underlying database. The problem here is that after a while we have a LOT of named calculations, many of them repeat themselves, and when you look for a piece of logic you lost, you can search for hours in the never-ending DSV. The solution here is not to use named calculations at all. Put all your logic in the database (and as I said - in functions). The only place where you should use named calculations is where you must - when you have no write permission to the data warehouse, or when you build your DSV over an operational database and you don't have write permissions.
  • The same goes for Named Queries in the Data Source View in SSAS. Don't use them.
  • There's a lot of logic that you can write only in MDX. Here, the problem is that MDX scripts are defined over cubes and not over dimensions, meaning that if a dimension has MDX logic you have to repeat it in every cube's MDX script. The solution is to add the MDX programmatically using AMO. Every time the ETL process ends, it should run a program that takes the MDX script from a single file and places it in every relevant cube. I know it sounds a little bit wacky and I haven't even done it myself, but as far as I know, it's the only solution for DRY in MDX.
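
To make the ETL bullet above concrete, here is a minimal sketch of the repeated "look up the key, fall back to Undefined" step written once as a reusable function. It is a plain Python illustration, not an SSIS or Informatica transformation; the function name, the row format (dictionaries) and the sample column names are assumptions.

```python
UNDEFINED = "Undefined"


def resolve_foreign_keys(fact_rows, dimension_keys, key_columns, default=UNDEFINED):
    """Return copies of the fact rows with unknown dimension keys replaced.

    fact_rows      - list of dicts, one per fact row (hypothetical format)
    dimension_keys - dict: column name -> set of keys that exist in the dimension
    key_columns    - the fact columns that point to dimension tables
    """
    resolved = []
    for row in fact_rows:
        fixed = dict(row)  # do not modify the caller's rows
        for column in key_columns:
            if fixed.get(column) not in dimension_keys[column]:
                fixed[column] = default
        resolved.append(fixed)
    return resolved


facts = [
    {"customer": "C1", "product": "P9", "amount": 10},
    {"customer": "C7", "product": "P1", "amount": 5},
]
dims = {"customer": {"C1", "C2"}, "product": {"P1"}}
clean = resolve_foreign_keys(facts, dims, ["customer", "product"])
```

The point is only that the rule lives in one place: every fact table calls the same function with its own list of key columns instead of rebuilding the lookup by hand.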

As I said, I'd love to hear your ideas about this topic.

Friday, October 17, 2008 9:46:22 PM (Jerusalem Standard Time, UTC+02:00)
 Monday, September 22, 2008
My friends were stuck with a totally weird bug this week. After a day of frustration they called me to the rescue. It took me some time to figure it out, and I think that every SSIS developer (and maybe every developer) can learn a thing or two from others' mistakes.

The mission: The data flow takes one table with duplicate rows, copies it to another table and makes sure that every row appears only once. On the way, the data flow also adds some fields that are irrelevant to the data itself. Among them are the Create_User and Create_Date fields, which tell who ran the package last and when.
How my friends did it: Again, it's a very simple flow. They added a Derived Column transformation to add the new fields, and then an Aggregate transformation to make every row appear only once.

Note that this is not the real package. It's a sample I did on my machine to show it here.

The Bug: When I first saw this it seemed like a very simple flow, and I asked myself how this could be happening:

As you can see, it seems that the Aggregate transformation is not deterministic. Sometimes it outputs 99 rows, sometimes 198, and at other times I get other results as well.
Investigating: I wanted to see the difference between the table I got the first time (99 rows) and the table I got the second time (198 rows), so I changed the destination table and compared the two tables. I ran a "select * from A where Column1+Column2+... not in (select Column1+Column2+... from B)"-style query, but it was no use - it showed that there were no rows that appeared in only one of the tables. At this point I really started to think (as my friends did) that maybe something was wrong inside the Aggregate transformation... Instead of blaming Microsoft, I decided to think. I needed to see what could make the flow non-deterministic. Then it hit me.


The only non-deterministic component in the flow is the Derived Column, because it uses the getdate() function (it may be easy to see here, but in the original package the Derived Column transformation had many fields). The results of this function may differ in the milliseconds, especially for large tables. Then I looked at the Aggregate transformation and saw that the Create_Date column was also part of the Group by operation, meaning that if two rows have different milliseconds they will both be placed in the destination table, although they are the same in every other column. That's it, the bug was found. But still, one question remained: why didn't the query show me this? The answer is also simple but tricky to find: in the comparison query I concatenated all the columns of the tables in order to compare the results. When I did this, I cast Create_Date to nvarchar, which truncated the milliseconds.
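
Both halves of the bug can be reproduced outside SSIS. The following is a small Python imitation, not the package itself: the sample rows, the 3-millisecond spacing and the seconds-only text format are assumptions chosen to mimic what happened.

```python
from datetime import datetime, timedelta


def distinct(rows):
    """Group by every column: keep one copy of each distinct tuple."""
    return sorted(set(rows))


start = datetime(2008, 9, 22, 8, 0, 0)
source = [("A", 1), ("A", 1), ("B", 2), ("B", 2)]

# The buggy order: stamp each row first (timestamps a few milliseconds apart),
# then group by ALL columns, including the timestamp.
stamped = [
    (name, value, start + timedelta(milliseconds=3 * i))
    for i, (name, value) in enumerate(source)
]
wrong = distinct(stamped)  # duplicates survive: the timestamps differ

# The fix: group by the business columns only, then add the timestamp.
right = [(name, value, start) for name, value in distinct(source)]


def as_text(row):
    """Concatenate the columns the way the comparison query did.
    The timestamp is formatted to whole seconds, so milliseconds are lost."""
    name, value, created = row
    return "%s%s%s" % (name, value, created.strftime("%Y-%m-%d %H:%M:%S"))


# The "not in" comparison on concatenated text finds no difference at all.
only_in_wrong = {as_text(r) for r in wrong} - {as_text(r) for r in right}
```

Here `wrong` keeps all four rows while `right` keeps two, yet `only_in_wrong` is empty, which is exactly why the comparison query claimed the two tables were the same.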

Conclusions:
  • Pay attention to non-deterministic elements in what you do, whether it's code or an ETL process.
  • When you do mindless things like checking all the checkboxes in a list - think about what the outcome will be.
  • Call Miky when you're desperate.
Monday, September 22, 2008 8:10:48 AM (Jerusalem Daylight Time, UTC+03:00)
 Saturday, September 20, 2008
This week something disturbing happened to me. When I installed Excel 2003 on the Panorama machine in order to use Excel functions in my MDX calculations, the NovaView Desktop stopped working. When I tried to load a view it showed a connection error message. When I called Panorama support, they told me that it's a known issue and that it's hard to find in the Panorama knowledge base. So here it is:

If you have connection issues in the Desktop program, open the registry editor (Start -> Run -> regedit). Look for HKEY_CLASSES_ROOT\MSOLAP\CLSID and make sure its value is the same as that of HKEY_CLASSES_ROOT\MSOLAP.3\CLSID. Remember - always copy from MSOLAP.3 to MSOLAP and not vice versa.

Sunday, September 21, 2008 6:38:48 AM (Jerusalem Daylight Time, UTC+03:00)
 Thursday, September 04, 2008
My blog was down for a couple of hours because of a bug which has been fixed in later versions of dasBlog. I couldn't find the solution in a Google search, so this post is for those who will search for it in the future (Google scans every post in this blog). The solution(s) can be found here (link target: dasBlog/Thread/View.aspx?ThreadId=34910&ANCHOR#Post115981).

Friday, September 05, 2008 4:03:37 AM (Jerusalem Daylight Time, UTC+03:00)
In recent years I've seen many astonishing BI web sites. I always asked myself what I need to do to bring my customers such beautiful web-based BI solutions. After gaining a lot of experience with Panorama NovaView, and especially the Panorama SDK, some questions started running through my mind: Why not build some re-usable puzzle pieces that can be joined together into a web site? These pieces can be web controls that use and even interact with Panorama views and Analysis Services. Why not publish them as open source and give them to the BI community?

The PanoramaBasedWebSite project is a toolkit that contains web controls you can easily use in your ASP.NET based web site. The project is written in ASP.NET 2.0 and C# 3.5. These web controls interact with Panorama views (using the Panorama SDK) and Analysis Services (using AMO).
The idea is that you can take these puzzle pieces, combine them as you like in your web site and create your good-looking BI web site with almost no programming. The project is only in its first steps, but I believe that publishing the design/idea is also important. This is why the first release is already published, although it has only two web controls so far. This is what we have so far and what I'm planning for the future. I'll be happy to hear your thoughts/ideas:

First Release Contents

  • PanoramaView web control - this is the main control of the project and it will probably carry a lot of the project's weight. The control simply shows a Panorama view. For now, it doesn't do much other than showing a view, so there's a lot of work to do on this control. It has two properties - BriefingBookName and ViewName. You can look at the TODO: comments in the code to see what future plans I have for this control.
  • UpdateDatePanel web control - this control shows the date and time when the cube was last processed. It can be used in two ways. You can set only the PanoramaViewID property; the control will then extract the cube and database names from the view and take the update date from the cube. The other way is to set the CubeName and DataBaseName properties.
Future Plans

  • KPIView - Already working on it. Similar to PanoramaView, but if the view shows a KPI then a drilldown will be performed when the user clicks on a gauge.
  • QueryList - Shows the result of an MDX query. For example, the list shows the top 10 employees of the month (by sales, for example). This list will be interactive, meaning that clicking on a row will perform a drilldown, drill to data or replace the list with the results of another query.
  • DimensionPicker - Gives the user the ability to pick members of a dimension/hierarchy. After the selection, the control will slice all the views on the page (or only a predefined set of views).
  • DatePicker - Same as DimensionPicker but for dates. It will show the user a calendar, and clicking on a date will slice the views.

The use of the controls in your aspx pages is very easy. You can see for yourself:

<PanoramaControls:UpdateDateLabel ID="UpdateDateLabel1" runat="server" PanoramaViewID="PanoramaView1" />
<PanoramaControls:PanoramaView ID="PanoramaView1" runat="server" Width="100%" Height="80%" BriefingBookName="MikysBook" ViewName="MyFirstView" />

I'll be happy to read your thoughts and ideas about this project. There will be more to come. Stay Tuned.

Friday, September 05, 2008 3:37:45 AM (Jerusalem Daylight Time, UTC+03:00)