Monday, September 22, 2008
My friends were stuck with a totally weird bug this week. After a day of frustration they called me for the rescue. It took me some time to figure it out and I think that every SSIS developer (and maybe every developer) can learn a thing or two from others' mistakes.

The mission: The data flow takes one table with duplicate rows and copies it to another table and makes sure that every row will appear only once. In the way, the data flow also adds some irrelevant fields. Among them is the Create_User and Create_Date fields which tells by who and when the package last ran.
How my friends did it: Again, it's a very simple flow. They only added Derived Column transformation to add the new fields and then they added an Aggregate transformation to make every row appear only once.

Note that this is not the real package. It's a sample I did on my machine to show it here.

The Bug: When I first seen this it seemed to me very simple flow and I asked myself how can it be that this is happening:

As you can see, it seems that the Aggregate transformation is not deterministic. Sometimes it outputs 99 rows, sometimes 198 and in some other times I get other results as well.
Investigating: I wanted to see what's the difference between the table that I got in the first time (99 rows) and the table I got in the second time (198 rows) so I changed the destination table and compared the two tables. I ran "select * from A where Column1+Column2+... not in (select Column1+Column2+... from B)"-style query but it was no use - it showed me that there were no rows that appeared only in one of the tables. In this step I really started to think (as my friends did) that maybe the Aggregate transformation has something wrong inside... Instead of blaming Microsoft, I decided to think. I needed to see what can make the flow non-deterministic. Then, it hit me.


The only non-deterministic component in the flow is the Derived Column because it has the getdate() function (it may be simple to see here, but in the original package the derived column transformation had many fields). The results of this function may differ in the milliseconds, especially for large tables. Then I looked in the Aggregate transformation and seen that the Create_Date column also was in the Group by operation, meaning that if two rows has different millisecond they will be placed twice in the destination table, although they are the same in every column. That's it, the bug was found. But still, one question remained: Why the query did not show me this? The answer is also simple but tricky to find: In the comparison query I concatenated all the columns in the tables in order to compare the results. When I did this, I casted the Create_Date to nvarchar which truncated the milliseconds.

Conclusions:
  • Pay attention to non-deterministic elements in what you do, whether it's code or ETL process.
  • When you do dummy stuff like checking all the checkboxes in a list - think what are the outcomes.
  • Call Miky when you're desperate.
Monday, September 22, 2008 8:10:48 AM (Jerusalem Daylight Time, UTC+03:00)
 Saturday, September 20, 2008
This week I had something disturbing. When I installed Excel 2003 on the Panorama machine in order to use Excel functions in my MDX calculations, the NovaView Desktop stopped working. When I tried to load a view it threw an error in connection message. Calling to Panorama support, they told me that it's a known issue and it's hard to find by using the Panorama knowledge base. So here it is:

If you have connection issues in the Desktop program, enter the registry editor (Start -> Run -> regedit). Look for HKEY_CLASSES_ROOT\MSOLAP\CLSID and make sure it's the same as HKEY_CLASSES_ROOT\MSOLAP.3\CLSID. Remember - always copy from MSOLAP.3 to MSOLAP and not vise versa.

 | 
Sunday, September 21, 2008 6:38:48 AM (Jerusalem Daylight Time, UTC+03:00)
 Thursday, September 04, 2008
My blog was down for couple of hours because of this bug which been fixed in the later versions of dasBlog. I haven't found the solution in google search so this post is for those who will search it in the future (google scans every post in this blog). The solution(s) can be found dasBlog/Thread/View.aspx?ThreadId=34910&ANCHOR#Post115981">here.

Friday, September 05, 2008 4:03:37 AM (Jerusalem Daylight Time, UTC+03:00)
In the last years I've seen many astonishing BI web sites. I always asked myself what I need to do to bring my customers such beautiful web-based BI solutions. After having much experience with Panorama NovaView and especially the Panorama SDK I started to run some questions in my mind: Why won't I build some re-usable puzzle pieces that can be joined together to a web site? These pieces can be web controls that using and even interacting Panorama views and Analysis services. Why won't publish it as open source and give it to the BI community?

The PanoramaBasedWebSite project is a toolkit that contains web controls you can easily use in your ASP.NET based web site. The project is written in ASP.NET 2.0 and C# 3.5. These web controls interacts with Panorama views (using Panorama SDK) and Analysis Services (using AMO).
The idea is that you can take these puzzle pieces, combine them as you like in your web site and create your good-looking BI web site with almost no programming. The project is only in its first steps, but I believe that publishing the design/idea is also important. This is why the first release is already published, although it has only two web controls so far. This is what we have so far and what I'm planning for the future. I'll be happy to hear your thoughts/ideas:

First Release Contents

  • PanoramaView web control - this is the main control of the project and it will probably take a lot of the project's weight. The control simply shows panorama view. For now, it doesn't do much rather then showing a view so there's a lot of work to do for this control. It gets two properties - BriefingBookName and ViewName. You can look at the TODO: comments in the code to see what future plans I have for this control.
  • UpdateDatePanel web control - this control shows the date and time when the last process of the cube was made. It can be used in two ways: You can only set the PanoramaViewID property. The control will extract the cube and the database name from the view and take the update date from the cube. The other way is to set the CubeName and DataBaseName properties.
Future Plans

  • KPIView - Already working on it. Similar to PanoramaView, but if the view shows KPI then a drilldown will be made when the user clicks on a gauge.
  • QueryList - Shows the result of MDX query. For example, the list shows the top 10 employees of the month (in sales perspective, for example). This list will be interactive, meaning that clicking on a row will make a drilldown, drill to data or replace the list with another query results.
  • DimensionPicker - Gives the user the ability to pick members of a dimension/hierarchy. After selecting, the control will slice all the views on the page (or only predefined set of views).
  • DatePicker - Same as DimensionPicker but for dates. It will show a calendar to the user and clicking on a date will perform a slice in the views.

The use of the controls in your aspx pages is very easy. You can see for yourself:

<PanoramaControls:UpdateDateLabel ID="UpdateDateLabel1" runat="server" PanoramaViewID="PanoramaView1" />
<PanoramaControls:PanoramaView ID="PanoramaView1" runat="server" Width="100%" Height="80%" BriefingBookName="MikysBook" ViewName="MyFirstView" />

I'll be happy to read your thoughts and ideas about this project. There will be more to come. Stay Tuned.

 |  |  | 
Friday, September 05, 2008 3:37:45 AM (Jerusalem Daylight Time, UTC+03:00)
 Sunday, August 17, 2008

There is one tiny new feature in SSAS 2008 that you can easily miss. It called Empty Cube. When you create a new cube using the wizard, you can create an empty cube, meaning that it has no measures, dimension relationships, etc. The original use of it is for "users (who) want to create everything manually, or when all dimensions are linked dimensions" (taken from the description in the wizard)

In the past, I written about using SSAS with Visual SourceSafe in order to have source and version control for the SSAS project. I mentioned that it has many disadvantages but the big advantage (source & version control) is bigger so in the bottom line I recommend using it. One of the problems we experienced were that every time that someone creates a new object (cube, dimension, etc.) he has to check out the .dwproj file. The result is that sometimes we have a fight in the team for that file and we shouting: "who taken out the dwproj???" (yeah, I know that we can check who did it inside VSS but shouting is more fun).

The empty cube feature is a nice solution for this problem: When you create a new project you can create all the (empty) cubes and then the .dwproj file is free and no longer needed. I'm assuming that you know which cubes you'll have when starting a new project. The only thing remained is the same solution for dimensions. I'll recommend it in the Connect site (it's not working now for some reason).

Monday, August 18, 2008 3:50:41 AM (Jerusalem Daylight Time, UTC+03:00)
 Tuesday, August 12, 2008
Found a great site for BI beginners. Learn Microsoft BI has some videos about BI and SSAS which can place you in a good position as a beginner.

Tuesday, August 12, 2008 9:36:52 PM (Jerusalem Daylight Time, UTC+03:00)
I've been asked to review Widgenie and since I'm a nice person - why not? Widgenie is basically a widget creator that takes data from variety of sources: excel and csv files for now and in the future Google docs and more data sources. After you declare your data source you can change the look of the widget and then you can publish it in variety of ways: Facebook, blog, comments, etc.

I'm writing this post while creating my first widget so these comments are from the first encounter with the product, meaning that I can miss few things, but I believe you readers will get the picture:

  • The limitations on the excel source are way too much: Why the maximum file size is 2M? "The sheet should contain only the data and column headers. Titles, notes and other text outside of the data table will impede the upload" ? Why can't I take my old familiar excel and use it as is? Com'n guys, write some VBA scripts and work this out.
  • When I start from "Create new Widget" and then moving straight to "Create new Data File" because it is my first time, I want to go straight from the end of the data file wizard to when I've been in the create widget wizard. I don't want to start over the create widget wizard.
  • Every step in the wizard has a little question mark with it that explains the current part. It is very intuitive, nicely done and nicely put.
  • The widgets are very beautiful. As a BI developer, I'd be happy to put some of these on the CEO's dashboard.
  • What about multilanguage support? The hebrew columns appears as jibrish in the widgets.
  • The publishing process is not simple enought. I don't want to get a script that I need to place in my blogger/facebook/iGoogle. I want that the process will end at the target of the widget. For example, let's say I want to put the widget in my facebook's profile. I would expect that facebook will be open in the end of the wizard and a new Widgenie application will be created on my profile asking me to choose one of the widgets I created.
  • The text cloud widget is very simple and powerful. It can be very useful for managers.
For conclusion, Widgenie is a very beautiful product that has a long way to do if it wants to stand with the big sharks of the BI world. It has to fix some issues, support more sources and targets and have more capabilities (snapshots support, SSAS integration, excel-style chart editing and more). I don't know why I can't embed here the widgets (it's just ain't working) so here are links to the bar chart and the text cloud widgets.

Tuesday, August 12, 2008 9:12:53 PM (Jerusalem Daylight Time, UTC+03:00)
 Sunday, August 03, 2008
update: Chris Webb and Mosha commented and made it clear that the reason for this error wrap is the NonEmptyCrossJoin function and nothing else. I also checked and I did not find any other function that wraps underlying errors.

This is something you need to be aware of when you're writing MDX. I don't know whether it's a bug or by-design. I'll be happy
to know (please comment if you know something that I don't).
Consider the following MDX:

SELECT
  NonEmptyCrossJoin
  (
    [Customer].[Customer Geography].[State-Province].&[NSW]&[AU].Children
   ,[Employee].[Employee Department].[Department].&[Sales]
  ) ON 0
FROM [Adventure Works];

The query will return with this error: The Set_Count argument of the NonEmptyCrossJoin function is either negative or larger than the number of sets provided. This is quite reasonable because I written the second argument as a member, where (NonEmpty)CrossJoin expects only sets. So, let's upgrade this member to a set:

SELECT
  NonEmptyCrossJoin
  (
    [Customer].[Customer Geography].[State-Province].&[NSW]&[AU].Children
   ,{[Employee].[Employee Department].[Department].&[Sales]}
  ) ON 0
FROM [Adventure Works];


All I did is wrapping the second argument with {} and we have a set. The query will return 19 columns.
Now, for the interesting part. Let's count the members of this CrossJoin before the we fix it:

WITH
  MEMBER [a] AS
    NonEmptyCrossJoin
    (
      [Customer].[Customer Geography].[State-Province].&[NSW]&[AU].Children
     ,[Employee].[Employee Department].[Department].&[Sales]
    ).Count
SELECT
  [a] ON 0
FROM [Adventure Works];


This return... 0.               
Where is the error?
My guess is that the Count function wraps the error. The NonEmptyCrossJoin returns null and the count of members in null is zero. The meaning of this is that if you'll ever forget to wrap the member with {} you'll always get zero and not an error. This can be very dangerous. Just for the check, running this query after the fix:

WITH
  MEMBER [a] AS
    NonEmptyCrossJoin
    (
      [Customer].[Customer Geography].[State-Province].&[NSW]&[AU].Children
     ,{[Employee].[Employee Department].[Department].&[Sales]}
    ).Count
SELECT
  [a] ON 0
FROM [Adventure Works];


will return 19. This been tested with both SSAS 2005 and 2008 (RC0). The examples here are from RC0.

Be careful with your MDX.

Monday, August 04, 2008 5:28:26 AM (Jerusalem Daylight Time, UTC+03:00)
 Saturday, July 26, 2008
OpenSearch is one of the beautiful things I discovered lately. If you're using Firefox 2 and above or Internet Explorer 7, look at the search field in the right-top corner of the browser. See the shiny little thingy there? Click on it and you can instantly add two new search engines for fast search through your browser. The first one is my blog's search and the second (and more important) one is the ability to search BiBlogs right from the browser. Yeah, now you can search the whole BI community's blogs with only one click.

I call all the BI bloggers to add this too. It's 5 minutes work and it can help lot of people out there. See here for instructions.

Saturday, July 26, 2008 7:26:41 AM (Jerusalem Daylight Time, UTC+03:00)
 Wednesday, July 23, 2008
When you practice on SQL Server on your local machine you don't want that its services will start up with the computer. As I mentioned before, you should declare the startup method of these services as manual (see here). After that, you can build two simple batch files that will start and stop the services. Believe me - it's very comfotable to start and stop the services with only one mouse click. The first batch file (I called it sql.bat) contains only two lines:

net start MSSQLSERVER
net start MSSqlServerOLAPService

The second one (sqlend.bat) looks like that:

net stop MSSQLSERVER
net stop MSSqlServerOLAPService

Note that I only start/stop the SQL Server and analysis services, but you can do whatever you like.

Have fun.

Thursday, July 24, 2008 5:19:08 AM (Jerusalem Daylight Time, UTC+03:00)