Monday, May 23, 2011
After six years of work with SSAS, I decided to start re-reading the best book on the subject - Microsoft SQL Server 2005 Analysis Services. Why read about 2005 when there are more recent versions? That's because this is the most comprehensive book on the subject and the only one that was written by the SSAS's development team. The book gives you a better understanding of the technlogy and shows you the motivation for every architectual decision of the creators. While reading, I come across subjects that I wanted to post about and I believe that along with reading the book I'll post more.

The book cover

At the university, I was taught that half crazy is more dangerous than crazy. The reason is simple: You know that you should always be careful from the crazy guy, but regarding the half crazy person you don't know when to be careful and when to learn from him. Information systems are the same and especially DWH: There is no half truth or not precise truth. If you know that there's a data problem - bubble it up and don't show you user half a truth (because it's also half a lie).

Every time we bind a DB column to a dimension attibute, a DataItem object is created. The DataItem defines the connection between the conceptual and the physical model. For example, it defines from which table and column the data is taken. Along with other properties, there is one important property called NullProcessing which defines how the server behaves with null data. The default is ZeroOrBlank, meaning that the null data will be shown to the end-user as zero (if the DataType is not string) or empty string (if the DataType is string). The problem is in case where the attribute is a parent in an hierarchy. This is what happened:
We defined "Sales Amount" measure under the hierarchy Category -> Sub-Category -> Product. By mistake, one of the members in the categoty level got null value and it was shown as zero in the cube. The user was used to see the Shoes sub-category under the Clothes category. One day, he seen that the shoes sales amount was decreased dramatically - He seen products under the Shoes sub-category which is under the Clothes category but the sub-total was decreased. In the bottom of the table other products were "hidden" with the Shoes sub-category but with the zero category. The reason is that some of the shoes products were given null categories and in the report the shoes products were split between the Clothes category and the zero category. The user got half truth and he interpreted it as wrong data and that should never happen in a BI system.

What can we do? In my opinion, in most cases we need to set NullProcessing as Error, which will throw an error and prevent us from showing wrong data. This is how we (the IT people) will know about the problem before the end-users.
There are more options to NullProcessing:

  • Preserve - keeps the null value with no replacement. As I shown, this is not recommended. Furthermore, preserve uses more system resources.
  • Unknown Member - replaces the null with the unkown member you defined in the dimension. I recommend using this option only if the users know that the unknown member exists and what it means.
  • Automatic - The server will select the most appropriate behaviour according to the context. I don't really understand what it means and the writers don't explain it, but generally it is recommended to define deterministic behavior to your system.

Does it mean that now we'll go over all our data bindings and change their NullProcessing proprety? Of course not ! We need to think where this behavior has a bad effect and where it doesn't matter. For example, in a key attribute of a dimension which is also a leaf level of an hierarchy, the ZeroOrBlank option is quite good.

Monday, May 23, 2011 8:56:43 PM (Jerusalem Daylight Time, UTC+03:00)