Monday, October 22, 2007
If you read my blog from my home page and not via RSS or RSS-based sites, you may see it in the right column of the web page. You can click there and I will get a call from you (if you have Skype installed). If you need any help or an explanation about BI stuff, just click and ask. Oh, by the way, you need to have some cash in your Skype account in order to pay.
I'm not trying to be greedy. I'm just trying to earn a little bit of money from my knowledge. If you need a little assistance with BI, SQL Server or Panorama - ask me. If the answer is quick I will not charge you at all.

So, pick up the phone... ;-)

Monday, October 22, 2007 7:51:52 AM (Jerusalem Standard Time, UTC+02:00)
I believe that every BI developer has seen this in many Data Warehouses: Boolean Dimensions. As you may guess, a boolean dimension is a dimension with only two members and, of course, no hierarchy. For example: cash/credit card in a sales cube, exists/not exists in an inventory cube, etc. If you haven't seen this phrase before - relax - I just invented it. :-)
Now, the question is what to do about these dimensions:
a. Include them in the ETL process or just leave them as they are?
b. If you put them in the ETL - how would you implement it?

Here's what I did in my project. You may disagree with me and I would like to see other approaches too.
a. Yes, I included them, for several reasons. As every Pragmatic Programmer knows, everything can change, so do not assume that anything is globally static. This rule applies here: boolean dimensions may grow and get more members. For example, in the sales cube I mentioned above, there may be another way to pay, such as the shop's own exclusive card (there is a chain here in Israel that has one). Even a male/female boolean dimension may have an Unknown member. So never exclude these dimensions from your ETL process. Wait - one more thing. You may think: why bother my ETL process with these silly dimensions? If they grow, I'll add them to the process then. As an answer, think about the timings: you can never know in advance how much time the dimension's ETL will take (although it will be very little), so in order to avoid surprises - include it in your ETL process. Just in case.
b. I implemented it as two hard-coded expressions and sent them to a union. The result of this union goes directly into the target table. In Informatica, the mapplet can't start without a source table, so just put in a dummy table with only one row and connect it to the expression items. Why only one row? If the table contains more than two rows, the Informatica server will consider the process a failed one.
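For readers who are not on Informatica, here is a rough sketch of the same idea (two hard-coded rows, unioned and loaded straight into the target table) in Python with an in-memory SQLite database. The table name dim_payment_type, its columns and the key values are assumptions made up for this illustration; they are not taken from my project. SQLite allows a SELECT without a source table, so no dummy table is needed here.

```python
import sqlite3


def load_payment_type_dimension(conn):
    """Rebuild a two-member ("boolean") dimension from hard-coded rows."""
    # Hypothetical target table for the cash/credit card example.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_payment_type ("
        "payment_type_key INTEGER PRIMARY KEY, "
        "payment_type_name TEXT NOT NULL)"
    )
    # Full reload: the dimension is tiny, so delete and re-insert on every run.
    conn.execute("DELETE FROM dim_payment_type")
    # Two hard-coded expressions combined with a union, sent to the target.
    conn.execute(
        "INSERT INTO dim_payment_type (payment_type_key, payment_type_name) "
        "SELECT 1, 'Cash' UNION ALL SELECT 2, 'Credit card'"
    )
    conn.commit()
    return conn.execute(
        "SELECT payment_type_key, payment_type_name "
        "FROM dim_payment_type ORDER BY payment_type_key"
    ).fetchall()


conn = sqlite3.connect(":memory:")
rows = load_payment_type_dimension(conn)
print(rows)
```

If a third member shows up one day, it is one more SELECT in the union and the process itself stays the same.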

As I said, I'll be happy to read approaches other than mine.

Monday, October 22, 2007 7:44:45 AM (Jerusalem Standard Time, UTC+02:00)
 Monday, October 15, 2007
I really think that the time dimension is the most complex dimension in 90% of DWHs. The complexity is in two places: in the DWH design and in Analysis Services (or any other BI tool).
First of all - why didn't we take the ready-made Server Time Dimension which exists in SSAS 2005? For two reasons. The first is that the Project Real guys do not recommend using it (you can find their SSAS article here). The second is that we wanted some features that are not available in the server time dimension, such as the Hebrew date. As a matter of fact, even if we didn't need such a feature we would still build the time dimension ourselves, because it gives you much more control over the dimension. For example, you can always add new attributes which the Microsoft developers didn't think about.
I started building the time dimension myself in Excel. I figured out that this mission is a little more complex than I thought it would be. Most of the functions I wrote were simple, but there were some complicated ones. So here are some tips for you if you want to build your time dimension using Excel:
  • If you want the week number for every date, do not write the function yourself... Excel has a function called WEEKNUM. If you don't have it, just add the function toolbox which has it (I can't recall its name right now; check the Excel help).
  • If you want to have records for every level in your hierarchy (not only for days), put every level in its own Excel file (not an Excel tab). It will help you later when you transfer it to your DB.
  • Check yourself. Pick some dates at random and check that each of their records has correct data.
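If you prefer a script over Excel, the day level can be sketched in a few lines of Python. This is only an illustration: the column names are made up, and the week number here is the ISO 8601 week, which can differ from what Excel's WEEKNUM returns with its default settings. The last lines do the "check yourself" step on one date.

```python
from datetime import date, timedelta


def build_day_rows(start, end):
    """Return one record per calendar day between start and end (inclusive)."""
    rows = []
    current = start
    while current <= end:
        iso_year, iso_week, _ = current.isocalendar()
        rows.append({
            # Hypothetical column names, chosen for this sketch only.
            "date_key": current.year * 10000 + current.month * 100 + current.day,
            "full_date": current,
            "year": current.year,
            "quarter": (current.month - 1) // 3 + 1,
            "month": current.month,
            "iso_year": iso_year,
            "iso_week": iso_week,                   # ISO 8601 week number
            "day_of_week": current.isoweekday(),    # 1 = Monday ... 7 = Sunday
        })
        current += timedelta(days=1)
    return rows


rows = build_day_rows(date(2007, 10, 1), date(2007, 10, 31))

# Spot check: October 22, 2007 was a Monday in the fourth quarter.
sample = next(r for r in rows if r["full_date"] == date(2007, 10, 22))
print(sample["date_key"], sample["quarter"], sample["day_of_week"])
```

Attributes such as the Hebrew date are not covered here and would need their own logic.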
After building the Excel files I needed to transfer them to my Oracle server. I used SSIS because I didn't want to wait for my DBA to copy these files to the Informatica server (Informatica can't use my local files, they have to be on its server; SSIS can use local files). This was also a little tricky. First of all, close Excel when running the SSIS packages, otherwise they will fail. Second, when moving the non-leaf levels, go into the columns section in the destination box and erase the irrelevant columns. It will reduce the chance of errors. Finally, click on the source box and click on "Show advanced editor". Go to the source's output columns options and define the columns' data types properly. This will also reduce the chance of errors.

I had a little bit of an argument with my DBA about how the time dimension should be handled. I think that the time dimension does not have to be processed at all. My time dimension runs from 1960 until 2020, so no daily ETL is required. She says that all the logic has to be in Informatica, so I need to develop a mapping for this dimension. I think that we are both right, because in an ideal world she is right: in every development team, all the BL has to be in one place. But we don't have much time (the deadline is very close), so I won't spend the time building another mapping in Informatica when I already have the time dimension made in Excel.

Maybe someday I will have the time to do this. Maybe not.

Monday, October 15, 2007 8:18:38 AM (Jerusalem Standard Time, UTC+02:00)
 Sunday, October 14, 2007
I guess that this will not be my last post on this subject, but I want to start sharing some thoughts and tips from my experience in designing and building a DWH. In this post I will focus on the relationship between the fact and dimension tables in terms of data completeness (if you wonder what that is, read on).

Before you start to design the DWH, sit and talk with the people who built the systems you take your data from, including the DBA. For every table, ask them what the primary key is (it's NOT always defined properly in the DB), then ask them again, and then ask them if they are sure. It has happened to me that I discovered that the systems guys were wrong about their own DB's primary keys.
The same goes for foreign keys, and here you should be even more careful. Even if they claim so, check for yourself that every foreign key in the fact table really exists in the dimension table, especially when the fact table holds old history records. Sometimes system developers or, even worse, system DBAs delete records that are no longer relevant from the dimension tables. As a result, these keys are still in the fact's history records but cannot be found in the dimension table, which makes the relationship between the fact and the dimension table incomplete.

So much for the part where you talk to and "investigate" the system developers (the DWH design). What do you do when you are actually developing the DWH? First, create your dimension and fact tables. Do not forget to add the primary keys in the dimension tables and the primary and foreign keys in the fact table. Then develop the ETL processes, and go for the dimensions first. If you know that a dimension has completeness problems with the fact table that you will develop later (you talked with the system developers, remember?), add an UNDEFINED (UD key) record to the dimension table. Later, when developing the fact table's ETL process, join with the dimension table and check that each record's foreign key exists there. If not - change the key to UD. In SSIS and Informatica (and I guess also in other products I don't know, such as DataStage) you can use a Lookup instead of a Joiner if the dimension table has fewer than 1G records. That will optimize the ETL process.
After you have developed all your ETLs, run the dimension processes. After they finish (assuming everything went OK), run the fact table's ETL process. If it succeeded you can go and have a drink. If not - check what went wrong. If you want to know which keys didn't show up in the dimension table and caused the incompleteness problem, you can disable (not delete) the foreign key on the fact table and run the process again. Then, with a simple SQL query, check which foreign keys don't exist in the dimension table. Go back to your ETL design and check what you did wrong. As I pointed out before, at this step you might be very angry at the system developers...
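To make the lookup and the "simple SQL query" concrete, here is a small sketch in Python with an in-memory SQLite database. All table names, column names and the value -1 for the UD key are assumptions invented for this example; in a real project the SQL would run against your own DWH and the lookup would normally live in the ETL tool.

```python
import sqlite3

UNDEFINED_KEY = -1  # hypothetical surrogate key of the UNDEFINED (UD) member

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE stg_sales (sale_id INTEGER, product_key INTEGER, amount REAL);
    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY,
        product_key INTEGER REFERENCES dim_product (product_key),
        amount REAL
    );
    -- The dimension already contains the UD record.
    INSERT INTO dim_product VALUES (-1, 'UNDEFINED'), (10, 'Widget'), (20, 'Gadget');
    -- Source rows; keys 99 and 77 were deleted from the source dimension.
    INSERT INTO stg_sales VALUES (1, 10, 5.0), (2, 99, 7.5), (3, 20, 1.0), (4, 77, 2.0);
""")

# Diagnostic query: which foreign keys don't exist in the dimension table?
orphans = [row[0] for row in conn.execute("""
    SELECT DISTINCT s.product_key
    FROM stg_sales AS s
    LEFT JOIN dim_product AS d ON d.product_key = s.product_key
    WHERE d.product_key IS NULL
    ORDER BY s.product_key
""")]

# Fact load: join (or look up) the dimension and fall back to the UD key.
conn.execute("""
    INSERT INTO fact_sales (sale_id, product_key, amount)
    SELECT s.sale_id, COALESCE(d.product_key, ?), s.amount
    FROM stg_sales AS s
    LEFT JOIN dim_product AS d ON d.product_key = s.product_key
""", (UNDEFINED_KEY,))

loaded = conn.execute(
    "SELECT sale_id, product_key FROM fact_sales ORDER BY sale_id"
).fetchall()
print(orphans)
print(loaded)
```

The list of orphan keys is also a useful thing to bring to the next conversation with the system developers.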

That is all for now. As I said, I assume that more ideas will come up in the future.

Sunday, October 14, 2007 7:11:49 AM (Jerusalem Standard Time, UTC+02:00)
 Sunday, September 30, 2007
This post is about Panorama because it is the UI tool I'm working with, but this can be done with any BI UI tool.

My customer wanted to get the effect shown by Analysis Services 2005 when browsing a dimension (see the picture below). He wanted to see some properties of the members shown in the rows, along with the usual measures. Unfortunately, Panorama (and I'm sure other tools too) does not have this option in the GUI. The solution is this code:

Create Member CurrentCube.[Measures].[MyProperty] as
  iif(IsLeaf([MyDimension].[MyHierarchy].CurrentMember),
     [MyDimension].[MyHierarchy].CurrentMember.Properties("MyProperty"),
     Null)

Note that declaring only the third line (the .Properties call, without the IsLeaf check) will cause an error for every member that is not a leaf, which is something we don't want the viewer to see. If the dimension has properties for members in other levels too, you can adjust this declaration. This member can be declared either in the database's script (after the CALCULATE expression) or inside the session/query (not recommended in Panorama). Now, all you have to do is show the dimension's members in the rows and this new measure in the columns (after or before the regular measures), and you'll get what you want.

Monday, October 01, 2007 4:47:12 AM (Jerusalem Daylight Time, UTC+03:00)
 Monday, September 24, 2007
My friend, Ilya, had a problem in SSIS. He had a .csv file with too many commas. That is, strings that started and ended with quotation marks (") and had commas inside them were split by SSIS into new columns. For example, the row:
"My name, is Miky", 200, 10 was recognised by SSIS as four columns instead of three. Ilya wrote some code for SSIS (in VB) that runs before the package begins its work. Here it is; I hope it will help whoever runs into this.

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic.FileIO

Public Class ScriptMain
  Public Sub Main()
    ' The input (.csv) and output file paths come from the package's connection managers.
    Dim csvFileFullPath As String = Dts.Connections("Your CSV Connection").ConnectionString
    Dim tabFileFullPath As String = Dts.Connections("Your Table Connection").ConnectionString

    Using tabStreamWriter As New StreamWriter(tabFileFullPath, False, Encoding.GetEncoding(1255))
      Using csvFileReader As New StreamReader(csvFileFullPath, Encoding.GetEncoding(1255), True)
        ' Copy the first row as is.
        Dim currentRow As String = csvFileReader.ReadLine()
        tabStreamWriter.WriteLine(currentRow)

        While Not csvFileReader.EndOfStream
          Dim tmp, tmp1 As String
          Dim offset As Int32 = 1
          Dim beginS, endS As Int32

          currentRow = csvFileReader.ReadLine()
          ' Find the opening quotation mark of the next quoted string.
          beginS = InStr(offset, currentRow, """")
          While beginS <> 0
            ' Find the closing quotation mark; stop if it is missing.
            endS = InStr(beginS + 1, currentRow, """")
            If endS = 0 Then Exit While
            ' Replace the commas inside the quoted string with spaces.
            tmp = Mid(currentRow, beginS, endS - beginS)
            tmp1 = Replace(tmp, ",", " ")
            currentRow = Replace(currentRow, tmp, tmp1)
            ' Continue the search after the closing quotation mark.
            offset = endS + 1
            beginS = InStr(offset, currentRow, """")
          End While
          tabStreamWriter.WriteLine(currentRow)
        End While
      End Using
    End Using
    Dts.TaskResult = Dts.Results.Success
  End Sub
End Class

The solution here is to search for any comma (,) that sits between two quotation marks (") and replace it with a space.
Although it is a good solution, I would take another one: replace any such comma with a special string, such as &Miky&, convert the csv file into a table, and after that go over the affected column(s) and replace any &Miky& with a comma.
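Outside of SSIS, a CSV parser that understands quoted fields makes both tricks unnecessary. The following is a minimal sketch using Python's standard csv module, not a replacement for Ilya's script task; the header row in the sample is an assumption added for the demonstration, and skipinitialspace is set because the example row has a space after each comma.

```python
import csv
import io


def csv_to_tab(csv_text):
    """Convert comma-separated text with quoted fields to tab-separated text."""
    # skipinitialspace=True ignores the space that follows each delimiter.
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        # The comma inside the quoted field stays part of that single field.
        writer.writerow(row)
    return out.getvalue()


sample = 'Name, Amount, Qty\n"My name, is Miky", 200, 10\n'
converted = csv_to_tab(sample)
print(converted)
```

Here the comma inside the name is kept rather than replaced, because the tab-delimited output no longer gives it any special meaning.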

Monday, September 24, 2007 7:31:56 PM (Jerusalem Daylight Time, UTC+03:00)
 Sunday, September 09, 2007
I was asked how to get to the SSIS log to see how much time the package took to run.
Well, that depends.
On Development:
When developing a new package, after running the process (click on the green arrow or press F5) there's a new tab called Progress. Clicking it will show you everything about the package's execution, including the time it started and the time it finished.

On Production:
When developing the package, open the SSIS menu (yes, there is a menu named after the product. Microsoft...) and click on Logging... There, you can define logs for your package. You can log in many ways: to SQL Server, an output file, an XML file and more. I recommend logging to SQL Server and logging only the big and "hard" parts of your data flow. In the Details tab, pick only the exceptional events, such as OnError, OnTaskFailed and OnWarning. If you wish to know how much time your package took to run, also pick OnProgress.

Follow this link to read about every event in SSIS.

Monday, September 10, 2007 6:09:23 AM (Jerusalem Daylight Time, UTC+03:00)
I won't cover here the topic of Exception handling in MDX, but show you a funny thing that I have never seen in any computer language. Consider this MDX code:

iif (1.0e+40 * 1.0e+40 = (1/0), "Overflowed", "Didn't Overflow")*

On some processors, this code will output "Overflowed". That's because this multiplication will overflow and (1/0) also overflows, so what we have here is two "overflow values" that are equal.

Where on earth have you seen something like this???
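For what it's worth, the same behaviour can be reproduced in any language that follows the IEEE 754 floating-point rules, where an overflowing result becomes a special "infinity" value and two infinities of the same sign compare as equal. Below is a small Python sketch; whether MDX follows these rules on a given processor is my assumption, not something the book states. Python floats are 64-bit, so 1.0e+40 * 1.0e+40 does not overflow there and the sketch uses larger operands. Python also raises an error on 1/0, so math.inf stands in for it.

```python
import math

# 1e200 * 1e200 is beyond the 64-bit float range, so the result is infinity.
product = 1.0e200 * 1.0e200

print("Overflowed" if product == math.inf else "Didn't Overflow")
print(math.isinf(product))   # the overflowed product is infinite
print(math.nan == math.nan)  # NaN, unlike infinity, is never equal to itself
```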


* Taken from the book "MDX Solutions" second edition, p. 136


Monday, September 10, 2007 5:54:32 AM (Jerusalem Daylight Time, UTC+03:00)
I'm almost done with my exams, so my writing can continue.

This post is not about how to customize your Dashboard (well, not only about it). Its purpose is to say it loud: Customize your Dashboard!
When the executives of your company (yeah, I guess you work in a company. Does anyone build a BI portal for himself?) see the customized gauges with their company logo on them, they'll love it. No matter what these gauges show them, you've got their attention and their sympathy for the Dashboards site you made. Now everything is easier. The bosses are in your hands.

For the Panorama NovaView users:
  1. Follow this link to learn how to do this.
  2. Do NOT start working before you back up your E-BI/KPI folder!
  3. I recommend using Notepad++ or another good XML editor when writing in the XML files. Otherwise, you can mess up the whole file and you'll have to start all over again.

Monday, September 10, 2007 5:30:52 AM (Jerusalem Daylight Time, UTC+03:00)
 Saturday, August 11, 2007
A good example of how not to use Ajax.

A few months ago, Blizzard Entertainment announced that Starcraft 2 is on the way and opened a web site to promote it. In the sections where you see details about units and buildings in the new game, clicking on a unit changes 90% of the page with Ajax (no refresh in the browser). The problem is that many heavy pictures and Flash files are loaded using this method, and this makes the browser freeze for a while. I tried it with different browsers and different computers but it didn't help. The browser always freezes.

Think twice when designing an Ajax-enabled web page. Downloading too much data with Ajax can cause trouble. And troubled users won't come back to your web site.
Sunday, August 12, 2007 6:09:56 AM (Jerusalem Daylight Time, UTC+03:00)
We've been working for a while to enable SSO in our Panorama Dashboard site. As a matter of fact, the responsibility for this was in the skilled hands of our system team. After a short time they succeeded and SSO was established in our site. We saw it when we entered the site: instead of the login page, we went directly into the dashboard page.
After a few days, when I entered into the settings section of the dashboard site, I saw this:



Yes, that's right. No security at all. This is why we went directly into the dashboard page instead of the login page...
The system team claims that they never said that the SSO succeeded, and we say they did. No one can prove he's right, so there's no one to blame. But blaming is not everything. The important thing here is to learn for the next time: when you think you've got a feature - check it. Things are not always what they seem to be.
Sunday, August 12, 2007 5:52:53 AM (Jerusalem Daylight Time, UTC+03:00)
 Wednesday, August 01, 2007
While reading the first chapter of the book "MDX Solutions With MS SQL Server Analysis Services 2005 And Hyperion Essbase", I wrote down some important notes, especially for MDX beginners. Even if you're an experienced user, check this out. You may find something useful.

  • If you were a code programmer in your past, you can relax: MDX doesn't care about capitalization.
  • Don't even try to skip an axis: it's impossible and it is meaningless. Use the predefined names for the axes, such as: columns, rows, pages, etc.
  • You're new to MDX and the whole OLAP gives you a headache? Try to imagine this as a hypercube. It can help you a lot.
  • When writing large queries, pay attention to the "readability" of your MDX. Use the Monospace fonts whenever possible.
  • Do NOT think of SQL when learning or working with MDX. Although the syntaxes may look alike, these languages are totally different when you get to know them.
  • .Members will give you all regular members. .AllMembers will also include calculated members.
  • An expression like [Time].Members won't work if the Time dimension has multiple hierarchies.
  • The asterisk (*) can replace the CrossJoin function. It may improve readability of the code.
  • When using the Order() function, you can specify a sorting criterion which is not shown in the result grid.

Thursday, August 02, 2007 3:04:47 AM (Jerusalem Daylight Time, UTC+03:00)