Wednesday, September 14, 2011

You must be wondering why this blog wasn't updated for a long time. The reason is that I started a new job. :-) Along with other stuff, I'm playing with QlikView and I'd like to share some interesting ideas.

In QlikView, The list boxes which filters the entire sheet (or document) is a very common thing. I thought that if we have a box list with a small amount of values, it might be more beautiful to replace each value with an image. For exmaple, let's say that we have three divisions in the company. Instead of putting a list-box with their names, we can put three images with their logos and make that a click on the image will filter on these values (very simple with the Actions tab). The problem is that the user won't know on which division the sheet is filtered on. That's why we need to make the other images more transparent. Using macro, that's very easy and the result is awsome:

sub SetTransparentImage
  set image = ActiveDocument.GetSheetObject("ImageIDHere")
  set imageProps = image.GetProperties
  imageProps.Layout.BkgAlpha = 50
  image.setProperties imageProps
end sub

Now, put that in the macro script inside your document. In order to add images to your sheet, add new Text boxes and set their background to the image you want. In the Actions tab, set the desired value on click and also run this subroutine.

Enjoy.
 | 
Thursday, September 15, 2011 12:23:04 AM (Jerusalem Daylight Time, UTC+03:00)
 Thursday, June 23, 2011

The problems: A. Storage costs money, much money. B. Today's storage is very slow and we want fast retrieve of data.

The Solution: Take the commonly-used data and put it in very fast and expensive storage. Take the less used data and put it on slower and cheaper storage. The overall budget should be much lower.

Link (pdf)

Friday, June 24, 2011 3:42:33 AM (Jerusalem Daylight Time, UTC+03:00)
 Wednesday, June 15, 2011

For the google search: Informatica service start and stops

Yeah, that's exactly what happened to me. Everything looked just fine until the last step of the installation, where the installer pings the new integration service, but there is no answer.
Looking at the services manager, I see that the InformaticaService is stopped. When I restart it, it stops after 5-10 seconds. I thought it's because of the service's account settings, but it's not. After some time of search and re-attempts to make it work, this is what I can say if you're stuck as I was:

  • While you're in the installation wizard, on the node configuration step, press the "Advances Settings" button. There, you'll see the min. and max. port numbers. Go back to the node configuration and change the port number so it will be in that range (why it wasn't in the first place? ...)
  • The Windows PATH environment variable must contain the informatica server binary folder: <INFORMATICA_INSTALL_DIR>\Server\bin
  • If you're still stuck, look after the service.log file in the informatica directory. Look at the error and google it.
Hope I saved somebody a few hours.

Thursday, June 16, 2011 6:57:43 AM (Jerusalem Daylight Time, UTC+03:00)
 Monday, May 23, 2011
After six years of work with SSAS, I decided to start re-reading the best book on the subject - Microsoft SQL Server 2005 Analysis Services. Why read about 2005 when there are more recent versions? That's because this is the most comprehensive book on the subject and the only one that was written by the SSAS's development team. The book gives you a better understanding of the technlogy and shows you the motivation for every architectual decision of the creators. While reading, I come across subjects that I wanted to post about and I believe that along with reading the book I'll post more.

The book cover

At the university, I was taught that half crazy is more dangerous than crazy. The reason is simple: You know that you should always be careful from the crazy guy, but regarding the half crazy person you don't know when to be careful and when to learn from him. Information systems are the same and especially DWH: There is no half truth or not precise truth. If you know that there's a data problem - bubble it up and don't show you user half a truth (because it's also half a lie).

Every time we bind a DB column to a dimension attibute, a DataItem object is created. The DataItem defines the connection between the conceptual and the physical model. For example, it defines from which table and column the data is taken. Along with other properties, there is one important property called NullProcessing which defines how the server behaves with null data. The default is ZeroOrBlank, meaning that the null data will be shown to the end-user as zero (if the DataType is not string) or empty string (if the DataType is string). The problem is in case where the attribute is a parent in an hierarchy. This is what happened:
We defined "Sales Amount" measure under the hierarchy Category -> Sub-Category -> Product. By mistake, one of the members in the categoty level got null value and it was shown as zero in the cube. The user was used to see the Shoes sub-category under the Clothes category. One day, he seen that the shoes sales amount was decreased dramatically - He seen products under the Shoes sub-category which is under the Clothes category but the sub-total was decreased. In the bottom of the table other products were "hidden" with the Shoes sub-category but with the zero category. The reason is that some of the shoes products were given null categories and in the report the shoes products were split between the Clothes category and the zero category. The user got half truth and he interpreted it as wrong data and that should never happen in a BI system.

What can we do? In my opinion, in most cases we need to set NullProcessing as Error, which will throw an error and prevent us from showing wrong data. This is how we (the IT people) will know about the problem before the end-users.
There are more options to NullProcessing:

  • Preserve - keeps the null value with no replacement. As I shown, this is not recommended. Furthermore, preserve uses more system resources.
  • Unknown Member - replaces the null with the unkown member you defined in the dimension. I recommend using this option only if the users know that the unknown member exists and what it means.
  • Automatic - The server will select the most appropriate behaviour according to the context. I don't really understand what it means and the writers don't explain it, but generally it is recommended to define deterministic behavior to your system.

Does it mean that now we'll go over all our data bindings and change their NullProcessing proprety? Of course not ! We need to think where this behavior has a bad effect and where it doesn't matter. For example, in a key attribute of a dimension which is also a leaf level of an hierarchy, the ZeroOrBlank option is quite good.

Monday, May 23, 2011 8:56:43 PM (Jerusalem Daylight Time, UTC+03:00)
 Tuesday, March 29, 2011

We're doing something very useful at our workplace and because I believe in collaboration I would like to share it with you, dear readers. I'm sure it will help a lot of system analysts and developers.

When we write down the design of the ETL process, especially regarding complex fact tables, we find it necessary to write it down as a flowchart. This creates a simple and common language between the designer/analyst and the developer and even with the customer. It's much easier for everyone to read a flowchart rather than reading a long and complex document. Therefore, we create diagrams such as these using Visio (this diagram was created using Google Docs, but it's preety much the same):

I only brought here the beginning of the diagram. The real one is 3 times bigger.
Note that every leaf in the tree has its own number. These numbers indicates the record's "case number". Now for the trick: In the target table (fact, dimension or anything else) we add a new column that contains the number of the case, according to the numbers shown in the diagram. This field is not exposed to the end-user. The pros of this trick are very significant:

  • Already in the early stages of the implementation we can check the statistics, meaning how many times each case happens and see if we got what we expected. This can help us find new things that we (and sometimes even the users) weren't aware of and find bugs in early stage of the development.
  • Each record can "tell its story" and tell us "why it was created". This saves us a lot of effort when we want to question the data and find the source of the problems.
  • During the tests we check two different things: Whether each record was "identified" correctly by the ETL (by checking the case number) and whether the values entered into the other fields are correct. While once we used to check only the values correctness and then find the problems, now we can detect the problems much faster.
After we came up with this idea, I thought of another one: Why won't we expose this new field to the users/analysts and let them "enjoy" this knowledge? Maybe these statistics can also help them with the data analysis? This requires us to build "cases" dimension and add a description for every case number. I don't think that this is a good idea in any circumstances, but you should know that it's an option.

Tuesday, March 29, 2011 11:20:39 PM (Jerusalem Daylight Time, UTC+03:00)
 Monday, March 07, 2011

This post is the fifth (and last, I think) in a series about HR BI systems. In the previous post I've shown how to make the BI system smarter by cleaning records and finding Super-Transactions. In this post I'll complete a few last points and introduce you to some interesting reports I built.

Finding Super Transactions

In the last post, I introduced you the concept of Super Transactions (ST). I explained how to make the ETL process find them in the raw data and how to model them in the DWH and let the end-users use them in their analysis. However, I didn't explain how to find them in the initial design phase.

Objective: Find the material processes that happens to many people.
The main problem: There are many people and we don't know where to start looking! We should think about a common denominator to all the processes in order to find them.
The key observation: People are boring. Meaning, in terms of their statuses or various trailers. Think about your personal diary (in terms of what we talked about in the previous posts), or the "log" of your life. Think how many times you changed your martial status, salary, workplace status, etc. Yes, you have no more than a few records. You change something only once in a couple of years (in the best case).
On the other hand, when an employee undergoes a material change (ST), there are many changes that happens (almost) at once in different appendixes (or subjects). When the student changes his position in the university, he changes in short time his position, salary, courses and more.
Let's show it in a simple graph. The X axis is the time line and the Y axis is the number of appendixes that changed:


This shows us that most of the time our life is static and suddenly you find there are "peaks" that indicates that something "big" happened. This is where we'll look for the STs.
How we'll find them? I'll take you back to the advanced modeling of the project. For every record in the diary we added end_effective_date column which has the date of the next record minus one day. That enable us to write between statements and make our life more simple. Back to our case: using analytic functions, I added a field that says how much time passed between the current and the previous record. As I said, the number are very large in most cases. The cases where there's some records with small numbers are the cases we are looking for. It says that something important happended. Using this method I found eight out of nine (!!) which implemented in the system.

Incremental Load

In an early stage of the project we realized that it will be very "unwise" to load all the people all the time. As I mentioned, most of the people don't change every week. The problem is that incremental load of the diary ("log") of people is not simple at all. The first reason is that we need to update the end_effective_date and ind_current of the last row, but that's not difficult. The second and main reason is that updating and adding one record is not enough: in case that the man was in a middle of a ST we need to re-calculate the ST. This is why we decided to completely delete the records of the people that changed (in one or more appendixes) and rebuild them from scratch. This is why we run the ETL only on for the people that changed. This solution is very simple and easy to implement. All we need to do is to delete these people's records from all three diaries and then run the ETL only on them. Thus, the ETL process is very short (half hour total) in every night and only the first run (on all people) is (very...) long.

Interesting Reports

I'll show here some interesting reports I built for the users in order to stimulate your thinking:

  • Organizational Pyramid - the ratio, over the years, between the number of different populations. For example, the ratio of senior lecturers, lecturers and students. The report is shown as stacked graph bar. The graph answers the basic question: Do we have enough students to make them enough teachers in the coming years? Do we have enough lecturers to make them enough senior lecturers in the coming future? In this picture I built a very similar graph in Excel that shows how many students of every year we have.

  • Students Rift - For every faculty, the gap between the number of students that the university wanted to have and the actual number of students. You can change the time parameter and see the screen as it was on another date. You can also drill-down on every faculty and see the raw numbers.
  • Number of Promotions & Retirements - For every year, how many promotions & retirements were made. You can drill-down the numbers and see the names and details of the people.
  • Grades Graph - Classic line graphs othat shows grades for a lecturer or faculty over the timeline. Excellent tool to discover interesting trends such as professors who write too difficult exams or decrease in the academic level of a faculty.

To conclude: I hope you enjoyed the posts about BI HR. I got some mails regarding these posts and I'm glad to hear that I'm helping people here in Israel and even over the world.

Monday, March 07, 2011 9:31:10 PM (Jerusalem Standard Time, UTC+02:00)
 Wednesday, January 05, 2011

This post is the fourth in a series about HR BI systems. In the previous post I've shown to advanced model of the solution, the ERD of the datawarehouse and I shown how to answer most of the business questions using that model. In this post I'll show a more complex issue which is smart records filtering and Super Transactions. I will introduce you to this subject and show you how I implemented it.

To better understand this post, I highly recommend you to read the previous post first. Let's summarize the last post: We reached to the conclusion that in order to answer most of the business question regarding HR, we need to build a very wide Fact table (meaning that it will have many fields) that contains all the relevant fields from the appendixes. I introduced you to the problem of irrelevance and I shown the way to solve it using the N/A value instead of null when the appendix hasn't started yet.

Actually, after doing what I described in the last post we finished 90% of the project. Now, only the other 90% remains... ;-) The thing is that another main requirement remains (although basically it was our idea). The reason for which I didn't mention it in the previous posts is that I didn't want to bring in this complex issue in the first steps. I wanted you, my reader, to stay focused on the main structure of the solution. The requirement is to clean and improve (from the data point-of-view) the personal diary records and identify personal processes. I will explain later what this means. Note that at this point the example of the university is not so relevant so I apologize for the weird examples.

Cleaning Records

For example, let's say that we have one lecturer (or more) that has swift changes in his salary: During the month his salary changes but in the end of the month it always goes back to the original salary, meaning that he always gets the same salary in his bank account. I know it sounds weird, but I've seen a lot more confusing cases in financial systems. If we'll bring the lecturer's data to the DWH as it is, we'll have a number of serious problems:

  1. In case that the user filter the report on the date where the lecturer has the temporary salary, the user will get a wrong result because he won't see the "real" salary ("real" is a problematic word in this case. We'll discuss it later).
  2. The rapid changes in the lecturer's salary appendix will cause great growth in its diary. This is problematic because of disk space, but the main problem is that the diary that will be shown to the end-user will be "junky" with a lot of unnecessary records which will prevent the end-user from seeing the true changes in the diary. In other words, we want a cleaner personal diary.
  3. On the other hand, there are other users (especially business analysts) that want to analyze exactly these cases so we can't just delete the records...

Super Transactions

Super Transaction, or ST, is a personal process or change in the man's life which reflects some appendixes and can happen on more than one day. For example, let's say that we have a tutor that got a promotion and becomes a lecturer. By doing that, he changes his Position (tutor -> lecturer), Courses (now he teaches other courses) and the most important of all - the Salary. These changes won't have to happen exactly on the same day. Its position can be changed when he got the announcement but the salary will change only in the end of the month. From the technical point of view, we can say that the man only had some changes in some of his appendixes. But from the business point of view, he has a serious career change aka he been over a process, or a ST.

The requirement is to identify and flood these STs and even let the users investigate and filter by them (You can read more about it in Kimball's book).

So, how we'll make it real? In particular, how we'll let the users see both the original and the cleaner data? First things first, so let's talk about the big picture - the model of the solution.

General Model

To fulfill these requirements we'll define three work modes. The user can switch between them at any time.

  1. Full Mode - In this mode, the user see exactly the data of the source systems. The records filtering and the ST detection won't take part here.
  2. Filtered Mode - In this mode the user will have cleaner records and he will see the ST.
  3. Integrated Mode - The user will see both the original data and the filtered records, so he'll can compare them and have a better understanding of how the system works.

Let's wear the hat of the DWH analyst. If we'll want to add a filter to the table according to the work mode that user has chosen, we'll have a major problem. Every record can be belonging to more than one mode. A record that won't be filtered belongs to every mode. A filtered record only belongs to the filtered and the integrated mode. ST record (we'll talk about it later) also belongs to the filtered and the integrated mode. The naive solution says that we'll add three indicators (boolean columns) that will say for which work mode this record belongs and we'll filter the table using these columns according to the work mode the user has chosen. The problem is it will cause great performance problems so we have to create indexes for these columns. But, creating indexes for boolean records is not very affective because the variance is extremely small.

The solution we found is to split the diary into three tables, one table for every work mode. When the user changes the work mode, the connection will change and the system will take the data from another table. The great benefit in this solution is the performance gain. The query is mapped on the appropriate table and we don't need to filter according to the work mode. In order to implement this, you need a reporting tool that can dynamically change the connection, but I believe that any reporting tool can do it after some workarounds. In Reporting Services, you can do it by defining a Stored Procedure as the Data Source. The SP will get the required schema/table as a parameter and use it in the query (you can see code here). In OBIEE, you can do it by using value-based fragmentation (instructions here). The disadvantage is that you take unnecessary disk space, but we prefer performance over space.

The ETL process now has three targets, one for each work mode. The filters inside the ETL process define in which table, or work mode every record will be loaded.

Cleaning Records

Let's start from the most important thing: It's very important to make the customers understand that our BI system is not a artificial intelligence system and it can't guess which records need to be filtered. We need to make clear that the system will work by predefined rules. This is why the preliminary step is the most difficult: We need to define the rules, test them, improve them, test again and so on. A good example is that: If a person changes its status (in almost any appendix) and then changes it back or change it into another status, this is a "dirty" record which needed to be cleaned. This simple rule can filter a lot of noise in the data. It happens a lot of times that a user of the source system changes something, see that he has mistaken and turning things back. The problem is that some of the system only add rows and don't update them is such cases. This is why it makes sense to filter these records.

After we make sure that we and the customers understand each other, we can start with the implementation. This can be very easy or very difficult, depends on the ETL tool you're using and especially on the complexity of the rules you defined. While the ETL tools usually look on one record at a time, we need to compare the record with its predecessor, the record after it and sometimes even more. In this step, I recommend you to stop, take a paper & pen and start to think how to implement it before you're opening your ETL tool. The most important thing is to check that the results satisfy your expectations. Maybe you'll have performance problems here, but you can come back and handle them in the final steps of the project (personally, I prefer dealing with data correctness in the first steps and performance in the last steps). As I said, after you decide (in the ETL process) whether it's a regular, filtered or ST record, you'll route it to the appropriate target(s).

Super Transactions (ST)

This is the most difficult part of the project. Take a deep breath.
I'll remind you what it is from the business view and this time also from the technical point of view. ST is a personal change, or process that happened to a man. This change will often be reflected in more than one appendix and can happen on more than one day. Remind yourself with the tutor that became lecturer. The salary change happened weeks after the position change. The requirement is to flood that a big change happened here and not only some changes in some appendixes.
Let's go on with the technical stuff. There are some basic questions regarding ST:

  1. What will happen with the records that are contained inside the ST? contained record is a record which is a part of the change (salary change or course change in our example). If all the changes happened on the same day, will we mark the record as a ST record or maybe we'll add a new ST record? If the changes happened on more than one day, will we add a new ST record or maybe we'll mark one of the records as the ST record? Which one? What will happen with the rest of the contained records?
  2. What will happen with records that occurred on the same time as the ST but don't belong to it? For example, let's say that when the tutor became lecturer he also became a father (Mazal Tov!). The diary will have a new record for this change, but it doesn't belong to the ST. What we'll do with this record?
  3. Will we run the ST detection before or after the record cleaning? It will change the results dramatically.

Enough with questions. let's start with the answers. We thought and got to the conclusion that there is a strong connection between the work modes and the ST.
We defined the full mode as the mode where the data is exactly the same as the source data. This is why we won't clean records here and we won't mark any ST records. We want to keep the balance with the source systems. Our greatest threat is that the users will say: "Hey, this is not what I see in my (source) system. This whole BI is crap!!!"

On the other hand, the filtered mode is where our system becomes smarter. Here we can add the record filtering and the ST detection. This is why here we'll mark ST records. Why mark and not add a new record? Because this will break our most basic rule - one man cannot have two records with the same date. This is not just a technical problem (we can add Is_ST column and add it to the table's primary key). This is a conceptual principal. We added a new column named ST_Name.(If it's null then the record is not a ST record).

Which record will we mark as the ST record? Very simple: As part of the design stage, we'll decide (along with the customers) for each ST which is the leading record and we'll mark it. Often, this is the record that will help us to detect the ST. Going back to our example, the ST record is the one where the tutor changed its position. In this record, we'll make ST_Name=Position_Change. Don't forget to mark the contained records as filtered and in the what_changed columns in the ST record, add the other contained appendixes. Back to the example, Write in the what_changed column that also the courses and salary changed as well. We'll also update these appendixes' column in the ST record with the new values.

What about not-contained records (such as the family status change)? Our solution completely covers that. The ST detection only handles with the appendixes that we defined as contained in the ST. Even if the change happened in the same date as the position change, it won't affect anything at all.
Regarding the order of the actions (ST detection or records filtering) - We thought about it and came to the conclusion that it is better to first clean the records because the ST detection process needs to see the best and most clear picture as it can for a better ST detection. We've been glad to see it's right in the tests we did.

Summary

The record cleaning and the ST detection that the BI system does is a quantum leap. The system does not only reflect the source data, but it also add an "artificial intelligence" layer which helps the user navigate the mixture of data. As a matter of fact, this makes the system turn data into knowledge. On the other hand, it doesn't forces itself on the end-user and it enables him to see the original source data. It's important to say that the design is a dynamic and "breathing" process that doesn't end on Day 1 of the system. As time goes by, things and business rules are getting clearer and better so we come back and sharp the rules.

In the next post (the last of the series in my opinion) I'll discuss even more interesting issues such as differential load, how to find ST in the design phase and some of the most interesting reports you can do with our HR DWH. Stay tuned.

Wednesday, January 05, 2011 6:29:09 PM (Jerusalem Standard Time, UTC+02:00)
 Sunday, November 14, 2010

This post is the third in a series about HR BI systems. In the previous post I've showed the basic design problems. In this post I'll show how we solved them. I recommend reading the previous post before reading this one.

I finished the previous post with a lot of questions and zero answers. I have shown that the classic star-schema won't work here because it can't answer the most basic questions in the HR business world. At this point we decided to stop and make the most reasonable move - use the wisdom of others which dealt with similar problems. We opened the classic book of Kimball, who's considered to be the father of DataWarehousing. I sat for a while and read the HR chapter. Between the lines, I recognized that Kimball had the same problems we did, which brought him to model the DWH as follows:

Instead of classic star-schema, we'll build here a very wide table which we'll call the Diary. This Diary will contain all the appendixes. For example, in our case (look at the previous post) the fields are: ID, start_effective_date, end_effective_date, first name, last name, more static fields (which don't change with time), department, more fields from the department appendix, position, more fields from the position appendix, course, more fields from the course appendix, payroll, more fields from the payroll appendix, IsCurrentRow. The basic logic of this idea is that each row was created because there's a record in one of this person's appendixes which was created in that date (or maybe in more than one appendix). We take the data from the record in the corresponding appendix and bring the relevant data for that date from the other appendixes. For a better understanding, I'll show a little example. Here are the four appendixes: (Note that for the sake of the example, a student/lecturer can have only one course at a time)

Department Appendix
012 01/01/2000 History
012 01/06/2005 Computers
Position Appendix
012 01/01/2000 Student
012 01/01/2006 Lecturer
Courses Appendix
012 01/01/2000 Middle Ages
012 01/06/2005 Computer Basics
Payroll Appendix
012 01/02/2006 Basic

After making the "merge" of the appendixes this is how the Diary will look like:

ID Start Date What Changed Name Payroll Department Position Courses End Date
012 01/01/2000 Department
Position
Courses
Miky Schreiber History Student Middle Ages 31/05/2005
012 01/06/2005 Department
Courses
Miky Schreiber Computers Student Computer Basics 31/12/2005
012 01/01/2006 Position Miky Schreiber Computers Lecturer Computer Basics 31/01/2006
012 01/02/2006 Payroll Miky Schreiber Basic Computers Lecturer Computer Basics 31/12/2999

So what have we done here?

We built a diary that combines all the other appendixes. For each record, or point in time, you can see what was the situation of the person from every appendix's perspective. That is, if on date X only the department appendix was changed, you can still see what was the situation in the position appendix. Hence, you can ask business questions that combines several appendixes even when these questions are aggregative. For example: how many students which have the lecturer position get paid basic salary? I invite the reader to read the business questions in the previous post and see how you can answer them using this model. You must understand that with this model you must always fetch only one row per person or otherwise you'll have wrong results. In order to do that you need to filter the table on a particular day (using expression like: where myDate between StartDate and EndDate) or use IsCurrentRow=1 (you can calculate this column in the ETL).

How did we do that?

I won't cover here the entire ETL design, but I'll explain the key principles: First, we build the avenue. The avenue is a small table with only two columns - the ID of the person and the StartDate. We make distinct query on all the appendixes and get, for every person, all the start date from all the appendixes and this is how we build the avenue. Once we have the avenue, we join it with every appendix, and this is how we bring the values from the appendixes themselves. Now, for each start date we have the data of the appendixes where this start date came from. In order to "complete" the data from the other appendixes, we "spread" the missing value from the previous row. For example, on 1/6/2005 the position status hasn't changed so after the join the Position field will be null, so we take the value from the previous row.

The problem of Irrelevance

After making the above, we had a problem that floated very quickly. As always, each column has a foreign key to its dimension table, so the values cannot be null. The problem is that sometimes the values must be null - it happens in the first rows of each person, when not all appendixes has their values yet. For example, look at the payroll column in the "diary" table above. It gets a value only in the last row. We can't have null in this field because we want to have foreign key but we don't have any value yet. What can we do? The solution we found is a "N/A" value which we added in such situations. Note that you need to add a new row with the "N/A" key in every dimension table.

Is that all? of course not! Using this "diary" table we can answer 90% of the business question we faced. This table is basically the whole project. But, there are more challenges we have to solve. I'll leave it for the future posts, where I'll show how we can make the HR BI system smarter.

Sunday, November 14, 2010 10:11:48 PM (Jerusalem Standard Time, UTC+02:00)
 Wednesday, October 13, 2010

This post is the second one in a series about HR BI systems. The previous post dealt with pricing and prioritization of the project. In this post I will describe the initial modeling process - familiarity with source systems, the main/major problems and their effect on the ERD model.

When starting with a new BI system design, the first question to be asked is: What are the business questions that interests the customer? It depends, of course, on who is the client, but that was covered in the previous post. Unfortunately I can't reveal the content world for whom I built the system and therefore to I'll use a University's content world as example. Almost all the questions that the customers wanted to get answered had totals that combine different subjects. For example:

  • What is the average salary in each department?
  • How many professors and students were in each department in each year in the last five years?
  • What is the ratio of students / lecturers in each department in the last years? (Graph)
  • How many transitions between departments (by students and lecturers) took place over the years? Which department is the most abandoned and which is the most popular?
  • Display events of lecturer / student from admission to the university until this day
This list is only a small part of what our customers wanted to know. Like always, there were no answers to these questions because it required integration between multiple sources. While getting familiar with the content world, I noticed several key points that will be treated later on in the project (and in future posts):
  • There's a very difficult problem of terminology in the organization ("one truth"). For example, the most basic terms - a student and lecturer are not consensus. For example, what happens when a student is also a lecturer? What is the lecturer's department if he lectures in more than one department? If a student moved to another department and returned after a semester is it considered to be one, two transitions or maybe none? Of course, different people had different answers to these questions.
  • All sources came in the same format - appendix. For those who not familiar, appendix is a table containing the ID, date and nature of change (kind of a diary on a given subject, "Nigreret" in Hebrew). That is, each record shows only the new / current situation (but the table is historical). When you want to ask a question relating to only one appendix (salary, for example) there is no problem. The problems arise when asking questions that combine several appendixes (salary and department, for example). There is a real need for carrying out the integration of the appendixes. Well, This is what Datawarehouse meant to do.
  • We did not find the Fact table. On one hand it seems that every appendix is a fact table because it describes changes / events that took place in the world, but ... Where are the measures? On the other hand, appendix looks like a Slowly Changing Dimension (SCD) but ... Where's the fact table? No doubt that this is a Factless Fact.

So we're got familiar with the main questions, the sources of information (same format - easier for us) and the unique attributes of this project. Now comes the hard question - how we'll design the data warehouse? Our initial solution looks like this:

In other words - classic Star Schema. The appendixes are SCD and the Fact table is in the center of the star. The fact will contain only one record per person and it will contain only static data of the person, such as ID, name and date of birth. This model takes us back to the questions about how the linkage between dimensions and fact will be made and what is the measure the fact. We tried to think how we answer the most basic questions required by the system (by the way, this is the way to see the quality of the ERD):

  1. For each lecturer, what is the salary, department and current position? - We'll take from every dimension/appendix the current record of the lecturer. The link between the dimensions will be carried out by the fact (lecturer's record) using the ID column
  2. What is the salary, department and position of the lecturer on date X? - From every dimension we'll take only the record which its starting date is from before X and nearest to X and again we'll link the dimensions through the fact table. Hence, we should calculate analytically (analytical functions) for each record its end date (the day before the start of the next record) and then it's easy to query with those kind of questions by using "between" statements.
  3. Average of the salaries in each department - we'll pull out the list of lecturers in each department (from the department appendix). We'll choose a date that interests us, and for each person we'll pull only the record which contains that date. Then we'll do group by. Ostensibly it's not trivial in every classical reporting tool, but let's think for a moment: what do we do here? Group by members of a particular dimension by another dimension counting, where dimensions are linked through the fact table. Looks fine. So - What's the problem? The problem is that we are trying to aggregate/sum by a dimension's attribute and not by a measure, which is not possible in every reporting tool. (a good solution will work with many tools and not just with dedicated tool for a particular ERD)
  4. Questions involving only one appendix - very easy and do not require datawarehouse at all
  5. Show events from the first day in the university until that day (of a certain person) - not possible in classic report tool. Let's say we want to see the "event log" of the lecturer Y, sorted by starting date in ascending order. This requires a union of all the appendixes / dimensions and sort of all records by starting date, where there may be several different appendix records that start in the same date (even very possible. change in the position usually leads to a raise)
At that point we realized that we needed to retreat. That looked like a good ERD at first but the last question is very basic and our model proved wrong. How do we show the event log which is a combination of the appendixes? How can we identify processes when they occurring, such as increase in the amount of courses and salary which leads to a better position? These are all questions that the system should give them an answer and a classic star schema can't help us here.

What have we done? Continued next post ...

Wednesday, October 13, 2010 11:39:33 PM (Jerusalem Standard Time, UTC+02:00)

Hey, look! A post about OBIEE with no Javascript!! ;-)

OBIEE lets you have either actual values or percent in the data labels in pie charts. I've found a trick to show both: add a new column (I called it concat) in the criteria tab which is the concatenation of the entity (shop name, for example) and its measure value. This is a string column, so we'll need to convert the measure into string. It'll look something like that: LOGICAL_TABLE.ENTITY_NAME || ' ' || cast(FACT_TABLE.MY_MEASURE as char). Edit the graph/pie view and make the concat column be the legend axis and the measure be the pie values axis. In the "Chart Type Special" menu (the button with the star) choose "Show values as Percentage of total". The result will be that the pie values will have both entity name, actual value (because now, for OBIEE, its part of the enitity name) and the percent.

Enjoy!

 |  | 
Wednesday, October 13, 2010 6:34:04 PM (Jerusalem Standard Time, UTC+02:00)
 Tuesday, October 12, 2010

After two months of writing only about OBIEE, it's time to write about what I did besides that. The main reason I have not written on anything else is a load of work on the HR BI system as a manager. In the future posts I will discuss the BI system that have taken my precious time lately and I will review the solution bottom up, i.e. from the pricing stage, going through the data warehouse structure to the implementation of the ETL and reports, of course. In this post I'll write about pricing and prioritization .

Part I - Pricing

I'll start with several leading questions in order to stimulate thinking.
When we start estimating prices of the BI project, how do we do it? The intuition of most of us (so I believe) is to get as much initial information about the project's content world as we can: What are the main problems, what is the result the customer wants to get, etc. Then we estimate the cost of the project, just like that. This rises the question of how accurate the assessment we've got? Furthermore, frankly - what is it based on? Past experience? Who's? Do we know how to explain it to our colleagues? Can update it when there are material changes to the requirements? If yes - how?
The purpose of this post is formalizing, even a little bit, the process of pricing the project. In the second part we will also prioritize.
Pessimist: How can it help? After all, in every case it is estimation. There are a lot of parameters in the project and we can't put every single one into the equation and even if it was possible, most are unknown parameters in the initial phase of pricing!
Miky: That's right, but the approximation can be done in a formal way and based on experience of others. It is much better than just a wild guess of numbers.
So we realized what we need this for. Let's start.

We want to price the project in terms of time (time can also be translated into money, of course). To start, we need to know and be familiar with the various processes in the organization that we can implement. To do this we will have to sit down with key people of the organization to its various parts (departments, sections, you name it) and just sit and learn. After we learned the various processes (converting these "conversations" to a list of processes is the result of experience and talent in information systems and BI systems in particular, So you may want to get help if you are new in this field) open Excel and write a list of all processes in one column. Then, we add the following columns:

  • Content world - column that contains the name of the process or content world
  • Short description - so those who work after us will understand what we meant
  • Times (that are required) - whether the analysis of process is required in the past, present or future (of course there could be a combination of them). This column is critical because any change in this could change dramatically the length of the project
  • Priority in HQ - how much they interested in this process. Number from 1 to 5.
  • Priority in field - how much they (intermediate level managers to the production floor) are interested in the implementation of the process. 1 to 5.
  • Source Systems - which systems are required to bring the information required to analyzing the process
  • Complexity of bringing this data - based on previous column, how complex is to "put your hands on data." The difficulties can be technological (legacy systems, systems with no interface exists, unknown technologies) or political (the DBA of the system does not cooperate, there is already a "mini-BI system" and they don't want to replace it) 1 to 5
  • Complexity of the process - with no relation to the previous column, how much the process is complex. To simplify the matter, what will be the length of the design document? 1 to 5
  • Additional notes
In this simple excel sheet we've gathered the raw data with which we will analyze the estimated duration of the project. It is quite possible that at a later stage of the project we'll go back and update this data sheet.

Now let's take a moment and get out from our project's scope and think big.
Objective: Build a costs matrix for each part / module of the project. Let's see an example:

Easy Medium Difficult
Preliminary Design 2 5 10
Detailed Design 2 5 12
ERD creation 1 1 1
ETL - Dimension 1 3 7
ETL - Fact 1 6 15
Reporting Model 1 6 10
OLAP Project 1 3 7
Reports 0.3 0.5 1
OLAP Reports 0.3 0.5 1.5
Web Site 2 10 20
Security 3 6 10

The numbers are in terms of days (not hours). For example, the construction of report in medium difficulty level will take about half a day. It is important to emphasize that this is actual time and not including any buffers, but it is including unit tests. The buffers will be added later (those who read the book "Critical Chain" know how catastrophic it is to add the buffers now). Let's go on: we'll take the first process that we consider implementing and write its quantities matrix:

Process #1
Easy Medium Difficult
Preliminary Design 1
Detailed Design 1 1
ERD creation 1
ETL - Dimension 1
ETL - Fact 1 1
Reporting Model 1
OLAP Project 1
Reports 1
OLAP Reports 1
Web Site 1
Security 1

Pessimist: Wait a minute. I learned about the WBS methodology in which I take any part / module in the project and break it into its atomic elements. What's the difference?
Miky: Nothing at all and very much. We also break the big problem into smaller parts, but here we utilize the fact that the components are identical. Every project is composed of those parts - design, ETL, dimensions, cubes, reports etc.

After we multiplied the matrixes and summarized the result we got the estimated cost for the first process. In our case we get 53 (after rounding) days. Let's call this number M. Now we add a few more fixed costs of our process:

  • Developer Tests - 6
  • User tests - 2
  • Installation - 2
In addition to these costs there are also constants to be added to the equation:
  • Gross - 1.3 (vacations, illnesses, etc.)
  • Management - 1.15 (time spent on the project manager's job)
  • Buffer - 1.1
These fixed values are the result of experience and of course they will be updated along time and the various projects. We'll multiply M with the constants, and then we add the fixed costs. Now we got 97.16 days. Divide by 20 (work days per month) and we get 4.8 months. It's the cost for the implementation of the first process. It should be emphasized that this number is in terms of work, which means theoretically that if you assign two people to work on the project it will last only 2.4 months. It is theoretically and not exact for the simple reason that there are constraints and dependencies in the development process (let's not go into that now).
Doing this for every process, we'll get the estimated time for every process to implement.
Pessimist: Sorry to bother, but... Is it not too much work just to get the pricing? A skilled manager will guess pretty much the same numbers, right?
Miky: First of all - not sure. Second and more important - this method gives the manager and the entire team a common language for schedule and pricing. So it's important to write the matrix with the developers or at least review it with them. It is important to make sure the team knows the matrix and its implications for the schedule. In addition, this method also provides a common language with the customer. If the client wants to understand why the project will take so much time / money you can sit down and explain. If the client wants to reduce costs you can talk with him in terms of Dimensions and Facts, make a simulation and see how much the reducting of a one difficult dimension will spare the entire project cost. Nice, isn't it?
 
Part Two - Prioritization
 
As before, we'll start with the basic questions: how we decide which process we'll implement first? Will it be the process that the CEO wants to see implemented first? Maybe the process that will provide the maximum benefit for the entire organization? Perhaps the easiest and the most simple?
Now that we have the data in frost of our eyes, we repeat the question from which we started: how we decide which one we'll implement first? Here's the answer:
Explanation: The numbers in parentheses is the work in terms of months. Axis X is ease of implementation (not difficulty, for convenience) and it is obtained by weighting the data reception complexity from the source systems and complexity of the process (you can weight these numbers with your own custom formula). The Y axis is the priority, where I gave a factor of 0.8 to the HQ and 0.2 to the field's priority, but even here you can weigh as appropriate for your organization. This graph/map is definitely a product you can show your client or decision-makers and sit down together to decide where to start. Conventional wisdom says to start from the upper-right quadrant (most important and easiest) but of course there are additional considerations that do not appear here. For example, there is a tiny and unimportant consideration such as ...How much time do we have???
 
Is it the end of our work?
Of course not! The costs matrix requires maintenance, but over time it will become more and more accurate and stable. The project manager should monitor the progress schedule to know how long or rather how much work spent in each stage of the development. Sometimes we'll find that the way we written the quantities-matrix is not good and in that case we will come and update it. In addition, over time we'll find that the values in the pricing matrix are decreasing due to specialization and skills acquisition of the development team.
 
Thanks for Arkady Gurevich, my team master who guided me in implementing this idea and helped the project in general.
 
Now that we decided with which processes we'll start the project, we can move forward. See you in future posts where we start to plan the HR BI system.
 |  | 
Tuesday, October 12, 2010 11:46:45 AM (Jerusalem Standard Time, UTC+02:00)
 Sunday, October 03, 2010

I can see that many developers asked on the forums how to dynamically hide column in OBIEE answers. You can find some answers there but they're obviously not elegant. In this post I'll show my implementation to solve this problem. As always, I must mention that I use version 10.1.3.4 of OBIEE and that I strongly recommend first reading my previous posts regarding OBIEE, especially the post where I'm explaining how to add Javascript to your OBIEE answers.

Read carefully the attached JS file and understand it. You must edit the numberOfResultTablesInPage variable and add your custom logic which will define which columns need to be hidden. I added my example there. The code is very simple: every time our logic determined that a columns need to be hidden, call the function hideColumn(colNumber, resultTable). Pay attention to the order of the calls to this function. Always start from the higher-numbered column.

Enjoy.

dynamicColumnHiding.js (1.77 KB)
 |  | 
Sunday, October 03, 2010 9:16:49 PM (Jerusalem Standard Time, UTC+02:00)
 Sunday, September 12, 2010
My next posts will discuss the project I just finished (at least its first phase) which is about HR (human resources) BI. In this post, I'll describe the data dictionary feature - why and how we did it.

From the first day of this project I've seen that data dictionary is extremely necessary. Different people uses the same term but they mean different things. There is no consensus about most basic term in the HR world - employee. I asked different people the simplest question: "What is an employee" and I've got different answers - Only full time employee; only one that gets the payment from me, meaning no outsourcing or contractors and so on...

The solution I thought about is a simple but strong data dictionary (aka DD) which contains all the terms and their definitions. The DD is maintained only by one person who is the main customer of the BI system. This is because letting it be edited by more than one person will make the mess rise up again. Now, let's continue to implementation:

Step 1 - Creating the DD
I created a new list in MOSS/SharePoint. The list has only the following columns: Term, Definition. That's it! As I mentioned, only one person has the right to edit this list. Using the help of our infrastructure team, I made a small ETL process which takes the DD from MOSS and writes it into an Oracle table in the DWH. This DD-ETL runs in parallel with the BI ETL and won't fail the whole process if it fails.

Step 2 - OBIEE Administration Tool
In the OBIEE Administration program, import the DD table to the physical layer. Now we have a slight problem - the DD table is not connected to any other table in the model because it's not belonging to the star schema and OBIEE doesn't accept it. To solve it, we'll create a new view (new table -> properties -> Table Type: Select) in the physical layer which will only contains the terms from the DD table (select term from DD_Tabel). We'll call this table/view DD_Dummy and connect it (foreign key) to the real DD table using the term column. Take these two tables to the logical layer (aka Business Model) and only pull the real DD table to the presentation layer. Now we have a working model with the DD table in the presentation layer.

Step 3 - OBIEE Dashboard
If you had read my previous OBIEE posts, you should know by now how to add Javascript code to your dashboard. Add the code in the attached js file and add the attached css too (edit the file res/s_oracle10/b_mozilla_4/custom.css in the OBIEE server).  Create a new report which shows exactly the DD table. In every page you want to have the DD feature, do the following:
  • Add a new section and hide it (Properties -> Format Section -> Use Custom CSS Style -> display:none)
  • Add the DD report into the hidden section
  • Below it, add a text element and inside it write:
    Mark the "Contains HTML Markup" checkbox
  • Add a new hidden text element anywhere you want which will contain the script (you can use the same hidden section if you want) and put there the following script:

That's it. If you did everything right, the magic will start immediately. Moving the mouse cursor above any of the column headers in every table in your page will popup a little tooltip with the definition of the column header. Note that if you already have my old javascripts in your OBIEE you might have some functions duplicated. Delete the duplicated functions manually.

DataDictionary.css (.36 KB) DataDictionary.js (2.42 KB)
Sunday, September 12, 2010 8:32:16 PM (Jerusalem Daylight Time, UTC+03:00)
 Monday, September 06, 2010
There are two ways to search for texts in mappings. The first one is to use the "Find in workspace" tool inside Informatica (CTRL+F). The problem with this tool is that it's looking for ports and transformation names. What can you do when you need to know in which transformations you used your unconnected lookup? For this sort of questions, you can use the other way: Export the mapping to XML file (Repository -> Export Objects) and search inside the file. There, you can search inside expressions, source qualifiers overrides, lookup filters and so on.
Monday, September 06, 2010 6:53:04 PM (Jerusalem Daylight Time, UTC+03:00)
When using the date picker in OBIEE, the date format that it works with is DD/MM/YYYY. In out organization, oracle is tuned to use another date format, so it causes trouble. What we did to solve this is very simple. In the OBIEE Administration tool, go to the physical layer and open the connection pool. In the connection script tab, enter the following into the "Execute before query" section: alter session set nls_date_format = 'DD/MM/YYYY'
This statement will execute before every sql the OBIEE sends to oracle and tell the DB to use the DD/MM/YYYY format.
 |  | 
Monday, September 06, 2010 6:46:05 PM (Jerusalem Daylight Time, UTC+03:00)
 Sunday, September 05, 2010

After a long break from blogging which caused by a lot of work in my organization, it's time to come back. I learned a lot in the last months and I have a lot to write. This time I'll write about working with static parameters in Informatica (version 8.6.0 HotFix 3).

Let's start from the beginning: Why should you work with parameters?
The answer (at least the main answer) is simple: It prevents hardcoding. For example, let's assume you write the following formula in your mapping: IIF(MyPort > 5, TRUE, FALSE). As we know, in the real world everything can change and now the formula needs to have 6 instead of 5. Changing this will make us open the mapping, edit it and worst of all - install it in the production environment, and as we know it can always make a lot of trouble. If the number 5 was stored in parameter file, all we needed to do is to edit the parameters file (in the production, of course) and that's it! Another good reason is that sometimes you can use the same mapping several times, each time with different parameters and that can ease the development for you.

How can we make it? We'll do it step by step (by the way, in order to learn this thing create a small workflow just for training):

Step 1 - Using Parameters in the Mapping
In your mapping, go to the Mappings menu and choose "Parameters and Variables...". There, you can create and edit your parameters (only within this mapping, of course). The name of the parameter should start with $$. (For example: $$MyParam). These are the properties you can edit for every parameter:

  • Name - Like I mentioned, should start with $$. Make the name as much descriptive as you can.
  • Type - Parameter can't be changed during the session and variable can. In our scenario we'll use parameters which are taken from parameters file, so choose Parameter.
  • Datatype - There's no much to explain.
  • Precision - If you're using string, make the precision at least 100 chars. It will prevent troubles when your parameters contains expression.
  • Scale - comes along with precision, where it relevant.
  • Aggregation - Relevant when you use multiple partitions in the pipeline. We'll leave it for now as the default.
  • IsExprVar - Very important. It determines if the parameter is an expression or a static value. Practically, if you use it inside an expression transformation, it should be True. Otherwise, it should be false.
After you defined your parameters, you can use it in your mapping in several ways:
  • Source Qualifier - you can override the whole SQ or add a source filter from parameter.
  • Lookup - you can override the lookup's sql or just its filter.
  • Expressions - Using a parameter as a placeholder inside expression is quite difficult because Informatica adds "" before and after the expression. Coming back to the example above, that's why we can't store 5 in the parameter. What we're doing to solve that is to store the whole expression in the parameter (the whole IIF(...)). After doing that, we've find out that this makes the parameter file clearer.
Step 2 - Build your parameter file
The parameters file, as its name suggest, holds the values of the parameters for the whole workflow. There are three types of lines in the param file:
  • Comment - starts with #. Very helpful for describing the other lines and for making order in the file.
  • Section headers - The first section contains the global parameters which are relevant for all the params in the workflow. The section starts with line contains only: [Global] (this should be the first line in your param file). After this (first) line, write all the global parameters. The other sections are session-specific, meaning that the params in these section will only affect one session. These section headers will look like this: [MyInformaticaFolder.WF:MyWorkflow.WT:MyWorklet.WT:MyInnerWorklet.ST:MySession]. The parameters that will come after this header will only affect the session called MySession which is inside the the worklet MyInnerWorklet and so on.
  • Parameters - each line in the file will contain only one parameter. The syntax is: $$MyParam=value. Don't write space between the equality sign (=) and the value because it will enter into the parameter itself.

As you can see, it's quite simple. Now, let's end this with the last step.

Step 3 - Attaching the parameter file to the workflow
Very easy. In the workflow manager, open the workflow and go to Workflows -> Edit -> Properties. Insert the parameter filename (with the path, of course) and that's it.
Now, for some important tips that will save you a lot of time:
  • The integration service reads the param file only when you run the workflow. Re-running a session or worklet after changing the file won't affect the results.
  • Unline what they write here, you can pass parameters to the mapplet. In order to do so, this is the correct syntax in the param file: MyMappletName.$$MyParam=TheValue. When this line is under the session header, it will only affect the mapplet called MyMappletName which is inside the session/mapping.
  • Don't use initial values for parameters. You won't know when the param file is not correctly attached to the session.

There's much more to discuss - using Informatica, you can use dynamic parameters and variables. You can get a value out from a session and use it in other sessions. I'll leave it for now for future posts.
Sunday, September 05, 2010 9:13:29 PM (Jerusalem Daylight Time, UTC+03:00)
 Wednesday, July 21, 2010

Note: In order to understand and implement this you need to know how to add JavaScript to OBIEE. I strongly suggest you to read my older posts about OBIEE manipulation. this feature has been tested on version 10.1.3.4

I've been asked how to create radio-buttons prompt in OBIEE dashboards. The purpose of this is to make the dashboard more beautiful, becasue radio buttons looks better than drop down list (aka ddl). Another nice thing you can do with this is to add description to every option (as I hope to implement in the near future). So, this is how you'll do it:

  • In the OBIEE Administration application, I have a session variable called MyProjectWorkingMode (replace MyProject with your project name). This variable controls the mode (or whatever) of the user (let's say it can be red, green or blue). The default value for this variable is red. This variable is important for us because we'll use it to "remember" what the user selected.
  • In OBIEE Answers, I created a dashboard prompt which is a drop-down-list. This prompt has the same default value as the variable (red, in our example) and it sets our variable. important: the caption of this prompt must be MyProjectWorkingMode.
  • In order for us (or the JS code) to know in which mode we are we'll create a report called MyProjectCurrentWorkingMode. Put there one column which will show our variable's value and one column from the presentation layer (all reports must go to the underlying DB). The expression in the first column will be ValueOf(NQ_SESSION.MyProjectWorkingMode). In the results tab, create a narrative view which will have a div with the id="MyProjectWorkingModeDiv" and inside it write @1. This will give us the current working mode inside the div. In the compound view, make that the report will only contain the narrative view. Save the report.
  • Create a new dashboard page. Add Text object and inside it put the javascript code from the file attached to this post and replace anywhere the "MyProjectWorkingMode" with the name you entered. Make sure the name is the same as in the prompt's caption and the div. Don't forget to mark the "contains HTML markup" checkbox. Add the report we created in the previous step in a seperate Section and hide that section (Properties -> Format Section -> Use Custom CSS style = display:none). Add the dashboard prompt.
  • That's it. If you did everything right, the JS code will make the ddl as radio buttons.

I suggest that you'll get over the JS code in the file attached in order to understand what I did. That will let you change the implementation if something is not as you wanted.

Last thing: Before you ask me why it doesn't work in your application, make sure you did everything right and that you have the same version as mine. If it still refuse to work, ask for help from a web/JS developer who can debug the code and see what failed. In my previous posts my features didn't work for some and they thought that this ain't working, where it worked for a lot of people who thanked me.

Enjoy.

obieeRadioButtons.txt (4.12 KB)
 |  | 
Wednesday, July 21, 2010 10:09:06 PM (Jerusalem Daylight Time, UTC+03:00)
 Thursday, July 15, 2010

No-one in my organization knew what is the difference between stop and abort in the Informatica Workflow Monitor. I googled it, and here is the answer in short term: Abort is killing the process in the underlying operating system, meaning that the memory that the session took for itself won't be released. Most of the OS don't have good-enough garbage collectors, meaning that this memory allocation won't be free until you'll restart the whole server(!). The stop command will ask the session to quit whenever it can, meaning that this will take more time.

Use the abort command only if you must stop the session right now. If you see that over time your informatica server goes slower and slower, restart the server in the weekend.

You can see the full answer here.

Thursday, July 15, 2010 10:37:52 PM (Jerusalem Daylight Time, UTC+03:00)
 Thursday, July 01, 2010

This is the fifth and last post regarding Itay Braun's seminar in the SQL & BI conference. In this post, I'll write some notes I collected from the last part of the seminar which discussed SSRS. As you can see, most of the tips are relevent for every reporting tool.

  • Scale-Out instead of Scale-In. Meaning that it's better to have multiple SSRS servers on one DB then bigger and better single SSRS server.
  • One report should not return all the data. Define and build your report to answer specific answer.
  • Use subscriptions. Minimile impact on performance.
  • Use cache executions or at least snapshots.
  • Visualization (see also my post about dashboard design)
    • Use same colors for for the same business units
    • Use images for better data recognition (e.g countries' flags)
  • Monitor the SSRS server. Look at the trace logs, the execution log and the performance counters. You can use Scrubs(free SSRS performance reports).
Thursday, July 01, 2010 7:31:33 PM (Jerusalem Daylight Time, UTC+03:00)
 Tuesday, June 22, 2010

This is the fourth post about Itay Braun's seminar in the SQL & BI conference. In this post, I'll write some notes I collected from the third part of the seminar which discussed SSIS.

  • Log everything that happens inside the package. Use the SSIS log providers and the event handlers. The effect on performance is not significant.
  • Analyize the log data. You can analize total execution time, SSAS partition processing time and much more.
  • Log detailed error information about rejected rows. You can even log the actual error row.
  • Monitor also the execution of the ETL jobs.
  • Data Profiling is a new feature in SSIS 2008. It's used for analysis of the data's quality in the DWH. Go and learn.
  • Configure everything you can in the package configuration, such as: tasks, containers, variables, connection managers and data flow components. You can use XML config file, config from parent package, registry configuration or even configuration from a table.
  • The Cache Transformation saves data in the cache. It's very useful in lookups.
  • Data types: Make them as "small" as possible (better performance). Be aware of precision issues when using money, float and decimal types.
  • Don't sort in SSIS unless it's absolutely necessary.
  • Sometimes using T-SQL will be faster then running it in SSIS.

Some of the tips are also relevant in Informatica. For example, I always let the DB make the sort whenever possible. Sometimes it's also works with joins. Running packages or procedures can be much more faster than running it in the ETL, but if you look at it from maintenance point of view you'll do it in the ETL. It depends on much factors, such as the time you can have to run the ETL, the knowledge in your team and more.

Tuesday, June 22, 2010 10:40:51 PM (Jerusalem Daylight Time, UTC+03:00)

This is the third post about Itay Braun's seminar in the SQL & BI conference. In this post, I'll write some notes I collected from the second part of the seminar which discussed SSAS design, but here I'll focus on near real-time cubes. I decided to write an independent post about near real-time cubes because this subject deserves its own place. By the way, I personally think that this subject is very interesting.

There are several solutions to implement near real-time cubes:

  1. Processing only the last partition. Pros: Trivial and simple. Cons: Might impact users in terms of resources and cache.
  2. Small real-time cube. Pros: Simple and elegant solution. Cons: It's small, so it can't contain much data even if you need to. This is the main reason why people usually don't do it.
  3. Backend processing server. Now it's getting interesting. The backend server is only used for processing. The front-end server(s) is used for querying. Pros: Good performance on query time. Cons: You need to take care of syncing the cubes, using XMLA sync, physical file sync (SAN snapshots) or any other technology. The cache is deleted with every sync.
  4. Flip-Flop. It's like the previous solution, only here we switch between the servers. Pros: It enables you to check your data/cube before making the switch. Good performance on query time. Cons: You need to take care of changing the connection string in the client side or implement NLB solution. It might disconnect users or you need to take care of waiting for the end of the session and only then change the connection string.
  5. ROLAP partition. Pros: No processing at all! Cons: Bad, bad performance.

If anyone have implemented one of these I'll be happy to hear.

Tuesday, June 22, 2010 10:17:34 PM (Jerusalem Daylight Time, UTC+03:00)
 Monday, June 14, 2010

This is the second post about Itay Braun's seminar in the SQL & BI conference. In this post, I'll write some notes I collected from the second part of the seminar which discussed SSAS design.

  • There's a very common misunderstanding about what is an entity. You can see it when there are a lot of dimensions. If there are "car color", "car manufacturer" and "car creation year" dimensions it seems you've got a problem. If there are dimensions with only key and description attributes (or worse - only one description attribute) then you've got a serious problem.
  • Design to Scale: Often the performance problem are caused by the dimensions and not by the fact/cubes.
  • Don't develop SSAS with hebrew (or any other right-to-left language). Use hebrew only in translations .I've personally got hurt with this. Maintaining MDX with hebrew is hell.
  • Avoid using linked measures. If you have some, consider joining the cubes to one cube.
  • There need to be a balance between the amount of the dimensions and the amount of the cubes. If they are equal, you've got a problem. BIDS will warn you (in the 2008 version) if you have two measure groups with the same dimensionality.
  • Keep the dimension keys as narrow as possible. Dimension keys based on more than one column is not optimal. Use member properties, such as AttributeHierarchyEnabled, AttributeHierarchyOptimized and GroupingBehaviour.
  • Data types is an issue in SSAS. Make sure the data types in the DSV are the same as in the dimension attributes.
  • The SSAS engine works best when the foreign key between the fact and the dimension is based on integer columns. Read more about dimension design here.
  • Attribute relationships and flexible vs. rigid are critical to performance. As I read more and more about SSAS I understand how much it's crucial to performance.
  • Automatic creation of partitions can be a good solution for a wide variety of problems. Note that when deploying your SSAS solution to the server you might override the partitions you created with your code. To solve this, you can use the deployment wizard or to import your SSAS database into solution.
  • Each partition should have up to 50M records or up to 250MB. Optimize the partition query. It's only plain SQL query. Get help from your DBA and get a better performance.
  • 99% Aggregations will not cover 99% of the aggregation possibilities/combinations. The engine chooses statistically-randomly which aggregation to save. Use the usage based wizard.
  • Most of the processing time is the time required for the relational DB to return the results of the SQL query. As before, get help from the DBA, improve this query (note that you can't change it) and get a shorter processing time.
  • To solve MDX performance issues, you need to learn what is the difference between the formula engine and the storage engine. I was suprised to see that many SSAS developers never heard of it.
  • In MDX Scripts, use scoping instead of conditional calculations when possible.
  • Avoid StrToMember or StrToSet functions.
Monday, June 14, 2010 9:34:43 PM (Jerusalem Daylight Time, UTC+03:00)

These are some important notes I written in the great seminar by Itay Braun. The seminar was held in the SQL & BI conference in Tel-Aviv two weeks ago. Sorry for the delay, believe me I was busy. The title of the seminar was "Solution - Oriented BI for developers". Actually, it was a collection of lectures and tips on general BI and on SSAS, SSIS and SSRS, mainly on the 2008 R2 version. In this post, I'll bring some notes I collected from the first part of the seminar, which was on general BI development and desing. I'll also bring my personal point of view.

  • The solution objective: Taking some reports and showing them is not BI. BI is on combining several sources and bring knowledge based on this integration. Only then it's getting interesting.
  • We need to aspire to answer 5 question through our BI solution: What have happened? Why did it happen? What is happening now? What will happen? What do I want to happen? Just for thinking, try to tell yourself which answers does your BI solution give. I think you'll find out (as I did) that you're far away from these five answers. We all are.
  • BI is a process, not a one-off project. I always say that BI design & development requires maturity. You need to know that you can't do everything in one shot. There are even things that you can do but it won't be right at the moment. The organization also needs to grow up with the solution.
  • You need to decide on an early stage of the project which client tools you going to have. You also need, of course, to decide if you'll have reporting, OLAP or both.
  • The ideal BI solution gives an infrastructure and let the customers do their own reporting. The big disadvantage here is performance. You can't know what kind of a killer-queries the user will run.
  • A good dashboard gives the manager all the knowledge he needs to have in front of him. It tells him what is the current status of his organization without one click. Do you have this kind of dashboard? I don't...
  • Microsoft has a lack of good visual client tools in the BI field. This is where third-side companies, such as Panorama comes.
Monday, June 14, 2010 7:21:47 PM (Jerusalem Daylight Time, UTC+03:00)
 Sunday, May 16, 2010
If you want to join fact and dimension table based on non-equality operator, such as between statement or <= and >=, you can't do it using simple join. It will throw you the error "All operators must be equality". To do it, you must use the complex join (the one with the yellow button).
 |  | 
Monday, May 17, 2010 1:42:16 AM (Jerusalem Daylight Time, UTC+03:00)
 Monday, May 10, 2010
From time to time I find myself writing a hierarchial query in Oracle DB. It helps me a lot when I need to understand hierarchial data, mostly resides in the source systems of the Datawarehouse. Yesterday I've seen that a lot of my colleagues haven't heard about these kind of queries, even the most experienced ones. So here is a tiny tutorial for this subject which you can go back to any time. The important thing here is that you'll know that these kind of queries exists even if you'll google it the next time you'll need it (search for "start with" or "connect by").
Let's say you have a table which looks like this:
Key Father Description
1 10 Renault
2 20 Jambo
3 20 Airbus
4 10 Hyundai
5 10 Toyota
6 10 Honda
7 20 F16
8 20 F15
9 10 Alpha Romeo
10 -1 Cars
20 -1 Airplanes
-1 -99 Transportation

You want to see the data in a tree form, meaning that you'll see Renault, its parent, its grandparent and so on. All you need to write is the following query:

select * from MyTable
start with KEY = 1
connect by prior FATHER = KEY

That's it. The result is:
Key Father Description
1 10 Renault
10 -1 Cars
-1 -99 Transportation

Note that the recursion stops only when it cannot find the next parent. In addition, many times you'll want only one row from the tree (the root, its son or some other criteria). In this case you can simply add a where clause to the query.
Unfortunately, doing the same with SQL Server is much more difficult (tell me if I'm wrong).

Enjoy.

Monday, May 10, 2010 10:57:55 PM (Jerusalem Daylight Time, UTC+03:00)

SQL Server 2008 R2 has been released not-so-long-ago. You can download the trial and check it for yourselves. Alternatively, you can enter the digital tour or read the free e-book. I started reading the e-book and I promise to update here about the new features of R2.

Monday, May 10, 2010 5:51:59 PM (Jerusalem Daylight Time, UTC+03:00)

 

"Tony Stark once saved his heart - and his life - with the Iron Man suit. Now he's found his heart and forged new life into his company. Stark Industries, by abandoning munitions manufacture for cutting-edge communications. Learn how Oracle helped with the transformation"

Have they lost their minds? Don't they have better ways to increase their revenues?

 | 
Monday, May 10, 2010 7:30:08 AM (Jerusalem Daylight Time, UTC+03:00)
 Wednesday, May 05, 2010

At the end of the month the Israeli SQL & BI conference will be held. The conference will last two days and there will be number of seminars, some of them will cover Microsoft BI issues. You can see here the list of the seminars. Luckily, I give away a free ticket and you, dear reader, can have it. All you have to do is to answer the following question by leaving a comment here and maybe you'll get a ticket. And here's the answer:

During 2007-2008, Microsoft made two important acquisitions. Two of the new features in SQL Server 2008 R2 are based on these acquisitions. Which are those acquisitions and what are those new features?

Your comments won't show here but don't worry, I can see them.

Good luck.

Wednesday, May 05, 2010 5:11:59 PM (Jerusalem Daylight Time, UTC+03:00)
 Saturday, May 01, 2010

I just opened a new blog in the Israeli Microsoft blogs site. The blog will be written in Hebrew and it will be focused on the israeli BI scene. If you're from Israel or know reading Hebrew, you're welcome to visit or the subscribe to the RSS feed.

Sunday, May 02, 2010 6:01:48 AM (Jerusalem Daylight Time, UTC+03:00)
 Sunday, April 25, 2010

Do you know how to add JS code to your OBIEE site?

It all started with a bug in OBIEE (version 10.1.3.4) and ended with a wonderful solution that can help us in future problems. Look at the following report table:

Customer

Day

Shop

A 1/1/2000 EBay
B 15/2/2002 AdventureWorks
C 28/5/2005 Office Depot

Building the Business Model layer and the report correctly will take care that clicking on the EBay link will take us to the Ebay shop page (in OBIEE, not ebay.com) which is prompted on 1/1/2000 and customer A. This happens because the Shop report is also prompted on Customer and Day. The problem starts when we sort the table by the shop name:

Customer

Day

Shop

B 15/2/2002 AdventureWorks
A 1/1/2000 EBay
C 28/5/2005 Office Depot

I would expect that clicking on the EBay link will make the same thing as before, but actually it prompts the Shop page on Customer B and 15/2/2002. It seems that the report saves the original location of the record instead of prompting on the actual record! In order to solve that I written a small JS code that enables us to navigate based on previous fields in the report (See the attached file). In order to use it we need to change the link. Edit the column properties, go to the Data Format tab, check "Override Default Data Format", Pick Treat Text As Custom Text Format and in the editbox enter the new link. A link example can be found in the attached file. Note that the third parameter for every field is how much fields to go back.
The great thing about this solution is that it doesn't only solves this particular issue. It enables you to prompt on every logical column, no matter on which column the original value was shown.
BIPortalPageNav.js (1.04 KB)
 |  | 
Monday, April 26, 2010 3:53:13 AM (Jerusalem Daylight Time, UTC+03:00)
 Monday, March 15, 2010

Note: all my OBIEE posts are based one upon the other. If you want to implement this, please read and do everything I mentioned in my previous posts.

I think that this is the most impressive development that we did over OBIEE. OBIEE is very lack in paging abilities. You can navigate only forward and backwards in the report. Here is how our google-like paging looks like:


In order to implement this in your environment, you should be familiar with how to add javascript into OBIEE . Just call the addPageNumbers() function in every page you'd like. Note that this is heavily based on the current OBIEE's DOM (version 10.1.3.4). In addition, this will only work for one report per dashboard page (this was the original requirement in our dashboard site).
The JS is attached below. Enjoy.

addPageNumbers.js (6.44 KB)

 |  | 
Monday, March 15, 2010 5:21:16 PM (Jerusalem Standard Time, UTC+02:00)
 Thursday, March 04, 2010

Our users had a weird requirement: They wanted to search all the columns in one action. We thought how to do it. In this post I'll describe the solution. Note that this solution is not good in terms of performance, but this is the best we've got.

  1. In every report (/answer/request) you develop, add a column and call it SearchColumn. Define its formula as the concatenation of all the columns in the report. You also need to cast the numeric columns to char in order to concatenate them. For example: FirstColumn || cast(NumericColumn as char) || LastColumn. To make things easier, I written a little C# program that takes the request XML and outputs this column's formula. It's a very simple program so I won't put it here (unless someone will ask for it). You can write this simple program in every language you know.
  2. Hide the column (Column Properties -> Column Format -> Hide).
  3. Create a filter based on this column. In the operator drop-down-list, choose "contains any". Then, click on Add -> Variable -> Presentation. Enter Term in the "Variable Expr" box (Term will be our variable name) and enter space in the (default) box.
  4. Now, we'll let the user to enter the search term. Create a new Dashboard Prompt based on any column you'll like. In the operator, choose "is equal to / is in". Control - "Edit Box". Show Default To - "Variable Expression" and enter Term. Set Variable - "Presentation Variable" and enter Term here too. Label - "Search", for example.
  5. All you have to do is to add the prompt in every page you want to let the user search.

Note that the search term will stay until the user will clear the prompt field. As I said, this solution is problematic for number of reason (performance, "why the other page filters too?" and maintenance).

 |  | 
Thursday, March 04, 2010 10:29:14 PM (Jerusalem Standard Time, UTC+02:00)
 Wednesday, March 03, 2010

This is quite simple, so I'll let the JavaScript talk by itself (attached). I assume that you know how to add your JS to OBIEE. You need to call the function hideTabs from every dashboard page and edit the hiddenTabs array.

This was written on version 10.1.3.4 of OBIEE. Note that if the DOM will be changed in a future version it won't work.

hideDashboardPages.txt (2.03 KB)
 |  | 
Thursday, March 04, 2010 3:51:32 AM (Jerusalem Standard Time, UTC+02:00)

Non Empty Behavior is a very good optimization method you can use with your calculated measures. It allows you to define a list of measures that their emptiness defines the emptiness of your calculated measure. To understand this, look at the picture (taken from the MSDN):

This is the calculation expressions pane from the Calculations tab in the cube designer. From MSDN:

"The measures you specify in the Non-empty behavior list are used to resolve NON EMPTY queries in MDX. When you specify one or more measures in the Non-empty behavior list, Analysis Services treats the calculated member as empty if all the specified measures are empty. If the Non-empty behavior property is blank, Analysis Services must evaluate the calculated member itself to determine whether the member is empty."

This is not new. NEB is quite old feature in SSAS. The new thing (at least for me) is that NEB can also be used in Panorama formulas! For example, M1+M2 /*NONEMPTY_BEHAVIOR={M1,M2}*/ will apply M1 & M2 as NEB. You can see more Panorama optimization methods here.

 |  | 
Thursday, March 04, 2010 2:01:11 AM (Jerusalem Standard Time, UTC+02:00)
One of the basic requirements of a modern BI system or Data Warehouse is to be non-volatile, meaning that the DWH will save the history. In this post I'll explain how to implement it in the DWH while assuming that the history can be changed and (the more interesting part) how to implement it in Analysis Services.

Before starting with history saving in the DWH, I need to say that this has nothing to do with Slowly Changing Dimensions. You can work with Type 1, 2 or 3. All you need to take care of is that the fact table will be consistent with the dimension tables. Assuming that the history can change (this is the scenario in my organization), we need to extract all the fact data every time. If you work with staging level/environment/layer you can use it to calculate which records has been changed or inserted and only upsert them to the DWH itself (it's more secure). In our case, we'll work without such staging level so we need to load all the fact data in every load to the DWH.

Loading the fact in the ETL is very simple - we'll save the datetime of the execution start, round it and put it in every row in the fact table. We can call this field CreateTime. You probably ask yourself why we need to round it. The answer is that we don't want the time dimension to have every possible value of datetime. It's huge and the users don't need that kind of accuracy. That's why I round it in my projects to half-hour, meaning that 14:27 becomes 14:00 and 14:50 becomes 14:30. Let's look at an example of such fact table:

RowNum

Dimension1

Dimension2

Measure1

Measure2

CreateTime

1

X

A

4

20

03/03/2010 15:00

2

X

B

56

19

03/03/2010 15:00

3

Y

B

2

3

03/03/2010 15:00

4

Y

B

1

10

03/03/2010 15:00

5

Y

C

7

5

03/03/2010 15:00

6

X

A

4

20

03/03/2010 15:30

7

X

B

56

40

03/03/2010 15:30

8

Y

B

2

3

03/03/2010 15:30

9

Y

B

10

10

03/03/2010 15:30

10

Y

C

7

5

03/03/2010 15:30

Here we can see two executions of the ETL: The first between 15:00 and 15:00, the second between 15:31 and 15:59. Note the differences in rows 2 vs. 7 and 4 vs. 9. This means that the source table(s) has changed between the executions.

In the DWH, we create foreign keys from Dimension1 and Dimension2 to their dimension tables. We'll also create FK between CreateTime and the time dimension. I won't get into the time dimension design here, you can read about it a past post. I assume that the time dimension has the hierarchy Year -> Month -> Day -> Hour -> Half-Hour. It seems that now we have a good fact table for use in Reporting Services (or any other reporting tool). Create a parameter called Extraction Time/ETL Time and assign it values from distinct on CreateTime. The user can pick a date and the report will filter on this date. Perfect.

The problematic part starts in Analysis Services. Only one extraction time can be the current member. There is no sense in summing or aggregating more than one extraction time. There are two ways to deal with it:

The wrong way - I'm writing this down so you'll learn from this design mistake. We created a calculated member which takes the last extraction time:

Create Member CurrentCube.[Extraction Time]. [Calendar].[All].[Last Data Time]
AS
Tail ([Extraction Time].[Calendar].[HALF HOUR].Members,1).Item(0),
Visible = 1;

In every Panorama view, we sliced the Extraction time on the [Last Data Time] member, which will show the user the data of the last ETL run. Our basic assumption was that the user wants to see the current data and when he'll want to see the historical data he will pick another member from the Extraction time dimension. This is the point we missed and it soon  enough bumped into our face.

We forgot that every system, even BI system which is built for analysts should be intuitive and idiot-proof. The users picked members from the day level in the Extraction time dimension. When we asked them why they did it they said that "we wanted to see the data from 03/03/2010 and suddenly the numbers got bigger!". That happened, of course, because the member 03/03/2010 is the sum of 03/03/2010 15:00 and 03/03/2010 15:30. The solution is that no matter which member the user picks in the Extraction time dimension, he should see only one leaf (half-hour level) member at a time. This takes us to

The right way - We set the [Last Data Time] member to be visible = 0 (I kept it for debug purposes) and I written MDX script which I call NonLastEmptyLeaf. This script makes sure that no matter which member in the Extraction time hierarchy will be picked, the user will be sliced only on the last non empty leaf. For example, if the user picks 03/03/2010 he will see the data of 03/03/2010 15:30. This is what will happen also when he'll slice on 03/2010, 2010 and even the [All] member. This is the MDX script:

Freeze([Extraction time].[Calendar].[All]);
Scope(
  Descendants (
   [Extraction time].[Calendar]. [All],
   0,
   SELF_AND_AFTER
  )
);
  this = Tail (
   Filter(
    Descendants(
     [Extraction time]. [Calendar].CurrentMember,
     [Extraction time].[Calendar].[HALF HOUR],
     LEAVES
    ),
    [Measures].CurrentMember > 0
   )
  ).Item(0);
End Scope;

Now, there is no meaning of [Last Data Time] member because slicing on the [All] member will show the current data, meaning that no slicing at all will show the current data because [All] is the default member.

To conclude, history browsing can be dangerous if we let the users the option to slice on more than one data time. The open question that I have is how can I prevent the user from picking more than one member in this hierarchy in Panorama views? Should I even try to do that or should I count on his minimal common sense?

Wednesday, March 03, 2010 9:34:48 PM (Jerusalem Standard Time, UTC+02:00)
 Sunday, February 14, 2010

I knew for a long time that measures can have text and not only numbers, but last week I had a scenario where it was the perfect solution for my problem.

Let's say that I work in number of jobs. In every job I got some tasks that I need to do. The fact table looks like this:

Job

Task

Tasks To Do

Tasks Completed

Developer

Develop BI System

1

1

Developer

Help Friends

1

0

Manager

Manage

1

1

Therapist

 

0

0

Summarizing the "Tasks To Do" column will get the total tasks that I need to do. Summarizing the "Tasks Completed" column will get the tasks that I did. Note that in the Therapist job I didn't get any tasks.
In the OLAP cube, I created a calculated measure which is the percent of the completed tasks (for every job or for any other dimension). The problem is: What is the answer in case that there are no any tasks to do? Some will say 0% and some will say 100%. That's why a text measure such as "There are no tasks" can be a perfect solution for this kind of problem. This is the calculation of the calculated measure:


Create Member CurrentCube.[Measures].[Completed Percentage] as
IIF(IsEmpty([Measures].[Tasks To Do]),
  null,
  IIF([Measures].[Tasks To Do] = 0),
    'There are no tasks',
    [Measures].[Completed Tasks] / [Measures].[Tasks To Do]
  )
)
This scenario can happen in many cubes. Tell me if you have any other solution for this kind of problem.

Update: Read the important comments below.

Sunday, February 14, 2010 9:49:00 PM (Jerusalem Standard Time, UTC+02:00)
 Tuesday, January 19, 2010
This is a basic post which I'll assume you know in the posts to come.
Basically, There are two ways of adding JS into the OBIEE dashboard:

If the script is for one page only - Enter the dashboard editor and edit your page. Add a new Text object from the bar on the left. It doesn't matter where you add it in the page. Click on its Properties button and write your script in the text box. Don't forget to start it with <script> and end it with </script>. Check the "Contains HTML Markup" box. There is no need to hide this object because it doesn't contain anything visual. Rename the text object and give it a descriptive name.

If the script is common to more than one page or may be in the future - Create a new request which contains only one column. It can be any column from any table, but I recommend you to use a column which doesn't have many rows (you can use a small dimension if you have one). Click on the Results tab and enter the Narrative view. Write your script in the Narrative text box. Like before, don't forget to start it with <script> and end it with </script>. Check the "Contains HTML Markup" box and enter zero in the "Rows to display" text box. You can save your request now. Every dashboard page that contains this request will run the JS you just written.

How to develop correctly with JS in OBIEE
Don't enter bunch of code in the request. It'll only get you messy once the length of the code will grow. Your code inside the dashboard should only call functions. For example, it can look like this:
<script>
if (myFunction)
{
  myFunction();
}
if (mySecondFunction)
{
  mySecondFunction();
}
</script>

You will write and maintain these functions in external JS file. Take the common.js file (search it under the OBIEE dir in the server) which is called in every page in OBIEE dashboards (don't forget to backup first!!) and add your code in the bottom of the file. Now you have a convenient place where you can edit your JS code. There is one last thing you need to know before proceeding - the browser caches the JS files so you'll need to clear the browser's cache every time you change your code. In Internet Explorer, you can do it in Tools -> Internet Options -> Delete Files -> Delete all offline content. You'll do it a lot while developing, so having a button that does that will be great. You can use Microsoft's IE developer toolbar or another developer toolbar.

Now we can apply new features to OBIEE.

 |  | 
Tuesday, January 19, 2010 11:46:07 PM (Jerusalem Standard Time, UTC+02:00)

This tip can save you a lot of time because when you don't follow it or you're not aware of it, finding the bug will take you a lot of time.

When you build your workflow or worklet in Informatica, you usually define dependencies between the tasks, meaning that task A will start only after task B finishes. You do it using the "Link Tasks" button in the toolbar. The issue is when you define for example that task C will run only after both task A and task B will finish. In that case, linking the tasks is not enough. You need to edit task C and choose "AND" in the "Treat the input links as" box. The default is "OR", meaning that the task will run when either task A or task B finishes. This can cause you a lot of trouble if task C is using data from task A and B (otherwise, why there's dependency between them?).

Tuesday, January 19, 2010 11:40:13 PM (Jerusalem Standard Time, UTC+02:00)
 Wednesday, January 06, 2010
While developing reports in OBIEE, I found myself making the same action over and over again: Styling a column, clicking on "Copy Cell Format", going through all other columns and pasting the format. A better way to do this kind of actions in one shot is by changing the Request XML. Let's say you applied style to the first column and you want to copy it to the other columns. This is how you'll do this:
  1. In the report editor, click on the Advanced tab.
  2. Copy the Request XML.
  3. You'll need to edit the xml in a text editor which supports macros. My favorite is Notepad++. Paste the xml in the text editor.
  4. Now you'll have to learn by yourself how the XML is built. Don't worry, it's very simple. Note that each column is a xml element called saw:column and the style is stored in its child element called saw:displayFormat.
  5. Start recording the macro and copy the desired elements to the next column. After that, stop the recording. Pay attention to where the keyboard cursor is before and after the recording.
  6. Place the keyboard cursor in the desired place and run the macro several times as much as the columns you need to update. The shortcut to do it in Notepad++ is Ctrl+Shift+P.
  7. Paste the xml back in the report: Go back to the page where you copied the xml from, paste the new xml and click on "Set XML". That's it.
Learning the schema of the Request XML, you can do many other customizations to your report in one click. You'll see another example in my future post about searching all columns in one click. Thinking about this approach, you may find that it can help you to make your work shorter and cleaner.
 |  | 
Wednesday, January 06, 2010 6:33:30 PM (Jerusalem Standard Time, UTC+02:00)
 Monday, January 04, 2010

One of the (many) missing features in OBIEE is record/row count. In this post I'll show how to implement record count and show it below the table, or report in the OBIEE dashboard. I know that you can find this in other blogs and forums, but I intend to show more sophisticated methods in my next posts that will be heavily based on this, so I recommend you to read it first.

  1. Create the record count column in your report: click on any column in the sidebar to add a new column in your report. Edit its formula using the fx button. In the formula text area, enter Max(RCount(XXX || YYY)), where XXX and YYY are two fields which define the key, or lowest granuality of the report. In other words: The combination of XXX and YYY will not show more than once in the report. You can add more columns if you need (column adding is made using the Column button in the "Edit Column Formula" window). If the type of the column is not char, wrap the column with cast(XXX as char).
  2. Check yourself: Preview the report. The last column should contain the record count. The value is the same in every row. If it's not working, go back and fix the formula.
  3. Hide the formula: Click the "Column Properties" button in the column (the button with the hand), go to the "Column Format" tab and check the Hide checkbox. Now, check and remember the location of the column in the report. In other words, remember if it's the fifth column or the tenth column, for example.
  4. Create the narrative: Click the Results tab and in the drop-down list choose Narrative. In the prefix, enter: "Row Count:" or something like that, and in the narrative itself, enter: <div id="RecordCount">@XX</div>, where XX is the location of the new column we created. In rows to display, enter 1. Return to the compound layout and look how the narrative below the table looks like. You can return to the narrative and change its style if you don't like it. Just leave the RecordCount div as it is. We'll use it a lot in the upcoming posts.

Problems with this implementation:

  1. The count is made in the BI Server after all records are coming from the database. If you'll check query time with and without the record count, you'll see a big difference.
  2. The record count is getting scrambled if the user sorts the table.

I'll post here any updates, if there will be any, about these problems.

 |  | 
Monday, January 04, 2010 7:07:41 PM (Jerusalem Standard Time, UTC+02:00)

For documentation purposes: When you get "Unexpected error occured: Error in Application" error in SSAS while trying to view the cube's calculations tab, you'll need to close BIDS and do some file copy tasks.

Monday, January 04, 2010 6:22:08 PM (Jerusalem Standard Time, UTC+02:00)
 Sunday, January 03, 2010

It's been a while since I last posted here. It's because it was the end of the year with annoying deadline and a new reporting tool we needed to learn - OBIEE. We learned a lot about it and even extended it, so the next posts will be about that. I'll start with a collection of tips & tricks we collected over the development process. Some tips will be helpful only to newbies, where the others will be also relevant to experienced developers. I divided it to two sections: The administration program and the webby part. Have fun.

The Administration program

  • There is no way to auto arrange to ERDs of the physical and the Business model (aka logical) layers. The best approach to make things more simple is to finish the tables import in the physical layer and arrange it by yourself: choose to show all tables, zoom to 50% and move the tables so it will create a nice and clear star schema. after that, just print it out and sent it to your co-developers.
  • Naming standards are, as always, very helpful, especially when working as a team. A foreign-key column (non measure) name should end the same for all columns in the project. Measure field name should end with _sum, _count, etc.
  • In order to develop in the administration program, the developer should be defined as administrator...
  • The "check out objects" dialog box will get annoying after few hours of development. To get rid of it, go to Tools -> Options -> General, and select "check out objects automatically". Pay attention to what you do from now on, because even viewing a table's properties will check it out.
  • You can always convert a table to a view in the physical layer. Right click on it, properties, and choose "Select" in the table type. Then, write the SQL query.
  • When working online against the repository and making an action against the DB (such as row count or view data), the connection to the DB is made using the BI server's data source. Only when working in offline mode the DB connection is made using the local ODBC.
  • When dragging a physical table into the logical layer it won't save the tables relationships. Only when dragging a number of tables into the logical layer the relationships will also be imported.
  • Circular relationships between the tables are not permitted in the logical layer but permitted in the physical layer.
  • Always use complex join in the logical layer. The BI Server supposes to understand in which physical relationship to use. Change it only when it doesn't.
  • When defining one-to-many join in the ERD you drag the line from the single to the many table.
  • In a dimension with a hierarchy (not flat one), the lowest level should contain the table's key column. Every column which won't be associated to a level will be automatically associated to the lowest level.
  • In 90% of the cases, all the measures will have sum as aggregation rule. You can define it in a single shot by selecting all the measure columns, right click on them and choose "Set aggregation".
  • The best practice is to build a dimension over any dimension-table, even if it's a flat hierarchy. Also, you should connect each fact table to its levels in the dimensions. You enter into the fact's source's properties and in the content tab you select the logical level for every dimension which is connected to the fact table.
  • You can quickly build a dimension by right-clicking the dimension-table and choose "Create dimension". It won't always be good enough, but it's not so bad.

The Webby part (Answers & Dashboard)

  • The error "Incorrectly defined logical table source (for fact table XXX) does not contain mapping for YYY" means that the BI Server cannot reach from table XXX to table YYY. It can happen because of two reasons: The first one is that there are no relationships defined in the logical or the physical layers (or maybe both). Note that the connections can be indirect. The second one is that the relationships doesn't makes sense. For example, going from a fact table to a dimension table through many-to-many table cannot happen. You'll see this error many times as a beginner and it can be quite frustrating at first. You need to understand that the BI Server can only understand simple and classic star schemas. More complex ERDs need to be split by using alias tables.
  • While developing a complex report (answer, request, you name it) with many columns, save it from time to time. Don't forget that you're working with a webby application. Closing the window won't save your work and won't alert you if you didn't.
  • After styling a column in the report, you can copy the style to other columns. In the styling window there's a "copy format" button. You can click on paste in the other columns' style windows.
  • In order to make a customized report, you can use the Narrative view. To insert a column's value, use @ and the column's location. For example, @2 is the placeholder for the second column. You can also apply javascript in the narrative to make dynamic columns and effects.
  • When you add a column in a fact-based report, bring it from the dimension table and not from the fact table. This will help you when you'll make navigation from this report to the dimension-based report.
  • The last button () in the navigation toolbar won't go to the last page as expected but will bring all the rows. In a case of a big table, this operation will take a long time and will make the browser freeze.
  • After a change in the RPD you need to update the answers site: Click on "Reload Server Metadata" below to side bar. Then, Settings -> Administrator -> Manage Sessions -> Close all cursors.

This is definitely not the last post about OBIEE. The next posts will be much more interesting and helpful. I promise!

 |  | 
Sunday, January 03, 2010 10:54:45 PM (Jerusalem Standard Time, UTC+02:00)
 Monday, November 09, 2009

In both SSIS and Informatica, the union transformation doesn't delete duplicate rows. In SSIS it's clear because the transformation is called "Union all", which reminds us the union all statement in SQL which doesn't remove duplicate rows. The confusing part is in Informatica, where the transformation is called Union although it doesn't remove duplicates. One of my teammates got confused this week so I thought it's worth blogging about.

One more tip about ETLs with union transformations: When you'll investigate, maintain or fix ETL it will be much help if you'll know where each row came from. That's why I recommend adding a column named "src" or something like that and store there the source of the row before the union happened.

Monday, November 09, 2009 11:41:31 PM (Jerusalem Standard Time, UTC+02:00)
 Tuesday, October 27, 2009

Lucky for me, I have the opportunity to work with more than one ETL tool in my daily work. In the far past, I worked much with DTS and PL/SQL packages (which is an ETL, but not exactly a tool). Nowadays I work both with SSIS and Informatica so I can compare them in several aspects. I don't think that one is better than the other. It's just that they have different approaches to the ETL mechanism. Anyway, I always expect that the data transformations will behave the same in every tool and I found one transformation that behaves differently between the tools - the Router.

The Informatica's Router, if you're not familiar with, receives a data row and routes it a specific data flow. Its parallel transformation in SSIS is the Conditional Split. As I said, I always thought that they behave the same. In both transformations you provide boolean expressions that determine where the row will be routed. The difference is that in SSIS, "each input row can be sent to only one output, that being the output for the first condition that evaluates to true" (from SSIS's documentation). In programmer's words, it behaves like switch-case statement with break in each condition. In Informatica, "If a row meets more than one group filter condition, the Integration Service passes this row multiple times" (from Informatica's documentation). Back to the programmer's language, it's like switch-case statement with no break inside the conditions.

If you know other transformation that behaves different between ETL tools, I'll be happy to know.

Tuesday, October 27, 2009 11:10:34 PM (Jerusalem Standard Time, UTC+02:00)
 Sunday, October 25, 2009

We're now starting to work with OBIEE (aka Siebel Analytics) as our main reporting tool. Our vision is to completely replace it with the old Oracle Discoverer, who will be soon out of Oracle's support. As I'll go on and on with OBIEE, I'll post here things you might want to know. Even if you don't work with this tool you can open your mind and get familiar with new stuff.

My first experience with OBIEE - the installation, was not a great fun. The main problem was the error: Oracle Business Intelligence is not supported on this Windows version. Oracle Business Intelligence is only supported on Windows XP x86, Windows 2003 x86, Windows 2003 AMD64, Windows 2003 EM64T, Windows Vista x86, Windows Vista AMD64, Windows Vista EM64T, and Windows 2000 x86.

The problem is that the installation can't find the windows version because systeminfo is not working. Try to run c:\windows\system32\systeminfo.exe. If you get an error about framedyn.dll, this is exactly the problem. Copy this file from the wbem sub-directory to the system32 directory. Try to run again systeminfo and you'll see it's running. Make sure the PATH environment variable contains the system32 directory and that's it.

 |  | 
Sunday, October 25, 2009 4:44:38 PM (Jerusalem Standard Time, UTC+02:00)
 Tuesday, October 13, 2009
Two weeks ago I did a talk about SCD. Here is the presentation.
I think that even if you're doing BI (design or development) for quite a while you'll find this presention resourceful and maybe you'll learn a thing or two...
Have a nice learning!
Tuesday, October 13, 2009 11:38:41 PM (Jerusalem Standard Time, UTC+02:00)
 Thursday, September 24, 2009

Last August, I've seen an Informatica webinar featuring Dr. Ralph Kimball (you can see it here). The topic of the webinar was "Reverse the flow of Data Warehouse Design to Ensure Success". In order to make long things short, what he says is that in the DWH's first steps you don't and can't know what are the full requirements. This is quite problematic, because the first steps of the DWH - its design - are the most crucial steps, where you take the most important decisions which will be hard to change later. His solution is revolutionary and sounds a little bit hallucinatory - Start with the website. After you have the front-end finished, whether it's a dashboard website or just a simple tabular report, and you have the customer's approval, you can start the DWH design and you know exactly what you should do and what you should not.

Thinking of that, we decided that he's right but it's too much. The cost of building a website with a data source behind it which will be changed 100 times in a short period is very high. Instead, we're working in a defferent method - We are working with Mockups.

Our mockups are PowerPoint presentations which mock the dashboard website that will be the project's front-end. We're using a lot of buttons and hyperlinks between the different slides in order to create a feeling of a real website with navigation and interactivity. When we show it to the customer and tell him that this is what his dashboard will look like, at first he's very happy and then the new requirement's phase begin: Can I have this button here? This one more slicer there? What about replacing this chart with this one? etc. In PowerPoint presentation, making these changes is very easy and after a couple of days we have a good understanding of how our front-end will look like. Threfore, we fully understand how our DWH will look like. This is a great method, I recommend you'll start using it in your next project. Try and see for yourself!

There are some more points you need to know before starting using it:
  • You're not have to use Power Point for this. Visio and even Paint Brush can assist you. Nevertheless, I recommend you to use a tool for this (remember that the mockup will be changing a lot). There are many free & open-source tools to do it.
  • You may consider to have the mockup to be your contract with the customer. On the other hand, the final result may look a little bit different. You can draw everything in PowerPoint but in the reality things may be hard to implement.
  • After finishing the design phase, don't go over and over to the mockup to keep it updated. Again - The reality is stronger than the first design. When the project will be finished, no-one will care about the mockup.
Thursday, September 24, 2009 9:49:44 PM (Jerusalem Daylight Time, UTC+03:00)
 Sunday, September 13, 2009

Panorama views with parameters are very common wherever mass Panorama-based development is made. Some use it within websites and others let the users change the parameters using Panorama WebAccess. When using such views within websites, a common thing that we do is to change those parameters ourselves using the Panorama SDK, where the values are based on the current status of the user, the current time or other parameters which come from outer source such as XML files, Databases, etc.

The main challenge when using parameters within web sites is that you must think of end cases for the parameters' values. When you send a bad parameter to a view using the SDK, the view will sometimes go off and it will stop responding to other things you may do with it in the website. I'll describe it with a life-experience example: We've built a website with panorama paramertized views. Some of the views had FromDate and ToDate parameters, which define the time period for the view to be sliced on. By the way, this is in fact an error by itself - we can use other dedicated functions to do that as described here. The source of these parameters came from a database. The values inside the DB were given from a power user which is the only one who can change them. In some cases, the power user changed these values and the views weren't sliced at all. Debugging it, we've seen that the period between FromDate and ToDate has no data at all so the view wasn't changed. The problem was that once the view was trying to slice on empty period and failed, it stopped responding to other actions (it can be also changed from other controls on the page). So again - think of the end cases when you create parameters-based views.

What I written so far is easy to say, but sometimes you have to do it. The views have to have parameters and sometimes they will get bad data from the users. We have no control over it and we can't deny it. What can we do? A simple approach I use from time to time is to create shadow views. A shadow view is a hidden view (or 1x1 pixel sized) that cannot be seen by the user. Everything that the user tries to do on the real view will be tested before that on runtime on the shadow view. Only if the action succeeded on the shadow view it will be made on the real view. Otherwise, no action will be made on the real view and instead the user will get a message (using Javascript's alert() function, for example) describing why his request cannot be made. The way to check if the user's action succeeded on the shadow view is using output from Panorama SDK functions, but this subject is a little bit wider for this post and it's very case-specific. There are no generic solutions for this problem that I know of, but if I'll find something I'll write another post about it.

To conclude: Think twice about end cases when using parameters inside your views. If you have no choice use the shadow view approach. Write and run tests to check the different values that you'll send to your views.

 | 
Sunday, September 13, 2009 7:13:01 AM (Jerusalem Daylight Time, UTC+03:00)
 Thursday, August 27, 2009

Make sure that all the Panorama developers has the same version of Panorama NovaView Desktop and that it's the same version of the Panorama server. In addition, check that all the Panorama servers (meaning development server, production server, etc.) also have the same Panorama version. It's very unpleasant when you copy the views from the development server to the production server while installing your new or updated BI project just to find out that some of the views don't show because of that. That's right - it won't cause all of the views to malfunction, it will only affect some. That's the dangerous thing and this is why you need to double check it immiedetly.

 | 
Thursday, August 27, 2009 11:05:43 PM (Jerusalem Daylight Time, UTC+03:00)

in my last post I recommended you to learn about the batch mode feature of Panorama. There's one more thing you've got to know: Check you're code flow very well because if you'll use batch mode within batch mode, meaning that you'll call the EnterBatchCommandMode function after you did it previously, the view will get stuck. Be sure that you don't have such a case.

 | 
Thursday, August 27, 2009 10:51:37 PM (Jerusalem Daylight Time, UTC+03:00)
 Wednesday, August 19, 2009
After going to production, we encountered a serious bug where a web page simply stucked and made the browser freeze. This page is a little bit complex: It contains three views with interaction between them, so we had much trouble with this page in the past. We weren't suprised to hear that this page causes us more trouble, so we went on to debugging.
One of the views in this page get two parameters from the web page (using Panorama's SDK) - fromDate and toDate, which define a time interval for the view to slice on. After debugging, we've found that the problem was that after updating the fromDate parameter and before updating the toDate parameter, the view had no rows left. When trying to update the toDate parameters in the view with no rows, it made the browser stuck.

The solution is to make the parameters update in one action. There are two ways of doing this:
The first one is to use the CallUpdateParametersEx with number of parameters. You can update many parameters in one function call (look in the SDK documentation). The problem with this solution is that the code is not readable.
The second and better solution is to use the EnterBatchCommandMode and LeaveBatchCommandMode functions to make the parameters update in one transaction. This way, the code is much more readable. You can update every parameter in its own CallUpdateParametersEx call. The usage of this function is very simple: Call the EnterBatchCommandMode function before the parameters update and call the LeaveBatchCommandMode after that.

I'm sure that there are more scenarios where this concept can be helpful, so it's important to get familiar with.

 | 
Wednesday, August 19, 2009 8:24:15 PM (Jerusalem Daylight Time, UTC+03:00)
 Sunday, July 26, 2009
This post is a collection of some tips we collected in the last week regarding Panorama views development:
  • You should always check that the view is loaded with data. Sometimes it will be loaded with no data because of lack of data in the DWH or because the view is sliced on a member with no data. It confuses the applet and from that point it won't always continue to function. From Dashboards developer point of view, if the view is loaded with no data, replace the view with a descriptive message. Dashboard page with "No Data" message will make the users think that something is very wrong with the BI system.
  • Panorama has some issues with formats. Define the formats in the OLAP. Panorama will use them as defined in the OLAP.
  • Avoid as much as you can from sending parameters to views. When adding parameter to the view, think twice if you have a way of getting this parameter's data from the OLAP. You'll be suprised to find out that in many cases you'll be able to do that.
  • If you slice your view as a result of a user event, try not to use parameters. Try to use the callFilterGridMembers function or callSliceByMember function.
  • Slicing a view on a member which has no data can cause the view to stop working. What you can do is to create a hidden view (1 pixel x 1 pixel) which is the same as the first view. Whenever you want to slice the view, make the slicing on the hidden view first. Using the Panorama's SDK, find if the view has data and only then slice the original view.
Thanks to Itay Segal for the tips and for the help.
Sunday, July 26, 2009 11:25:33 PM (Jerusalem Daylight Time, UTC+03:00)
What will happen when new members are added to the dimension which is shown in your Panorama chart?
The answer is one of the following:
A. Scroll bar will be added in the bottom side of the chart.
B. The chart will expand itself. If the chart is in a dashboards page, it may ruin the page because the view will extend beyond its area.

That's why Panorama's default setting for "Auto Select all X-axis Members to Chart" (and Y) is false, meaning that the chart won't expand even if new data arrives.

The other side of this default setting can be bad. Last week, we had a situation where one of our dashboard pages acted in a strange way: The page contained two views, where the first view is a crosstab and the second one is a chart. When the user clicked on a row in the first view, it sliced the second view on the desired member in the Y axis. What happened is that after the chart showed two members in the X axis, for example, It didn't show three or four members in the X axis, no matter what. We didn't underatand what went wrong and went home with sad faces on Friday. This morning, we realized that we forgot to enable the Auto-Select setting.
Sunday, July 26, 2009 11:01:16 PM (Jerusalem Daylight Time, UTC+03:00)
 Monday, July 20, 2009
The BI team and the source system team has a great bond between them. When the source system's team wants to change the system's logic or the physical tables, they must inform the BI team as soon as possible, because the BI team must change their system too. Otherwise, the ETL will fail and the users will see old or no data, or in the worse case the users will see incorrect data.

On the other hand, when the BI system shows data which is not the same as the data in the source system, the responsibility of showing that the BI system is OK is on the BI team's shoulders. It needs to investigate the problem and see if the error is on the BI side or in the source system side. Maybe there is no error at all and the difference is caused by difference in the logic.

Sometimes, we (the BI team, of course) investigates the data problems and find that the error resides in the source system itself. Then, we need to make sure that the error is fixed in the system, whether by the IT staff or by the power users. The important point here, and this is why I decided to write this post, is that the change must be done first in the development environment of the source system. Then, we'll run the ETL process and see that everything is fixed. This is important because sometimes we think that the change will make things right and it's not. That's why everything should be changed in the development environments first in both systems.

Only after we've seen that the data is fine, we can make the change in the production environments. Last tip: Make sure that the change is also made in the source system's production environment. Sometimes the time interval between these events is long and the source system's team just forgets all about it. Write it down as a task before going to production.

Monday, July 20, 2009 7:24:28 PM (Jerusalem Daylight Time, UTC+03:00)

Last Week we had a funny case which took us much time to understand.

In our development environment, we like to have the same source data for a long period of time. This way we're familiar with our data and we can expect the numeric results in the higher layers (OLAP, dashboard, etc.). When we have a stable environment and we think that our system gives the correct data, we ask our infrastructure team to refresh our sources, meaning that they bring new production data to the development environment.

The situation that we had last week is that we refreshed our sources and from that point the data didn't make any sense. We asked the infrastructure team if they're sure they refreshed the data and they said they are sure. We checked the system and the data many times and seen that the data is different from what we had but still it didn't make any sense.

After much frustration, one of us made a suggestion to write sql query in this form:
select max(some_date_field) from fact_table

Then we were all amazed. After much time of sisyphean work we realized that our data is not new at all. Moreover - our old data was old too. The maximum date in our main fact table was half a year ago. The reason was that we took our data from a legacy system which runs a job that exports their tables into files, and our infrastructure team imports these files into our source tables. The problem was that no-one ran the job so the files themselves were old, meaning that our refresh activity means nothing.

Conclusion: Never trust your sources. Always run a sql query like mentioned above to check that you have recent data and that the data you think you have is the data you have. You can think of more queries that validates that what you have is what you expect to have. Run them every morning to make sure you're ground is stable. It's only a matter of seconds and it's worth it.

May the source be with you.

Monday, July 20, 2009 5:50:02 PM (Jerusalem Daylight Time, UTC+03:00)
 Thursday, July 09, 2009

The error function in Informatica can be very useful when you want to know why you have failed rows without failing the whole process. The use of the function is ERROR(string). Calling this function will make the error appear in the session log and the row will be ignored. For example: IIF(my_date_field > sysdate, ERROR('Futuristic Date'), my_date_field). This will ignore futuristic dates and show them only in the session log. By the way, you can declare default value for the port and the process will replace the ERROR with the default output.

Thursday, July 09, 2009 9:19:26 PM (Jerusalem Daylight Time, UTC+03:00)
 Wednesday, June 17, 2009

A wonderful idea I heard of is to turn to full screen mode in Internet Explorer when entering the Dashboard site. It can make a better user experience. Try it yourself and you'll see the huge difference.
How will we do that? We will add a JavaScript code to the first page of the dashboards site and after that we'll ask our system administrator to enable this script for us. Let's get to work:

Step 1 - The JavaScript

Create a new HTML component in the dashboard page. Edit it and click on the "View Source" button (the one with the <>). Enter the following code:
<SCRIPT>
var wscript = new ActiveXObject("Wscript.shell");
wscript.SendKeys("{F11}");
</SCRIPT>

It will simulate the user hitting the F11 key which will turn the IE to fullscreen mode. The only problem is that when viewing the page, you'll see this message:

This takes us to step 2.

Step 2 - Enabling ActiveX

First, I'll show you how to do this on your local maching and then you'll ask your system administrator to enable it on all the machines in the organization using distribution. Enter the Tools menu in IE and hit Internet Options. Click on the Security tab and make sure the "Trusted Sites" zone is selected. Note that the Panorama Dashboards site is already defined as trusted site (if the initial installation of Panorama Dashboards made according to the installation manual. If it's not, you have a problem). Click on "Custom Level" and Enable the "Initialize and script ActiveX ..." option:

Now, you'll see that there's no promting for ActiveX controls. Show this to your sys admin and ask him to make this happen on every user's machine (using distribution, of course). As I said, the dashboards site is a trusted site so I can't see any problem to enable this. The result is very beautiful and can make a lot of users happy. Note that you can also add a button in your page that will call the same script in order to return to normal mode.

Enjoy.

 |  |  | 
Wednesday, June 17, 2009 6:08:44 PM (Jerusalem Daylight Time, UTC+03:00)
 Tuesday, June 09, 2009

When adding parameters to your view, you'll see that they appear in the upper-left corner of the grid/crosstab. In the NovaView Desktop program it can be tolerated, but in the Web Access or in the Dashboards web site it cannot be. It's very annoying and we can't let the users see our inside use of the parameters. What can we do?

The solution is very simple: We need to change the skin of the view/dashboards page/dashboards site (depends on how you work) and make the grid corner font's color identical to the color of the grid's background. That way, the users will not see the text in the grid's corner. The way of doing it is also not hard:

Remember: Always backup your files before modifying them. In the panorama folder, enter E-BI/Config/Skins and enter your skin's folder. In the classic way of work, you're using the default skin which can be changed in the Dashboards settings section. I recommend you to make a new skin out from the default one (see here), update the skin's name in the Dashboards settings section and not touching the default skin itself. In your new skin, change the GridCornerFont setting so that its color will be the color of the grid's corner's background. You can see the color of the grid's corner in the GridTopLeftBackground setting. For example, if GridTopLeftBackground=(194,210,226), then if you set GridCornerFont=((Arial,1,R),(194,210,226)) then no-one will see the text over there.

Enjoy.

 | 
Tuesday, June 09, 2009 10:55:27 PM (Jerusalem Daylight Time, UTC+03:00)
 Monday, June 01, 2009
Trying to build ASP.NET page with Panorama applets, I could not understand why the applets appeared blank when I put them in tables. After a while, I've found that this got something to do with the DOCTYPE declaration that each aspx has in its head (!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"). When I removed this line, it all seems to work fine. Digging into this DTD specification, I can't see anything that will prevent from applet tag to appear inside td tag. Strange.

 |  |  | 
Monday, June 01, 2009 9:32:34 PM (Jerusalem Daylight Time, UTC+03:00)
 Sunday, May 31, 2009

A new desire came from one of our customers. The request was to have the ability to search in Panorama Crosstab. The first solution I thought of was searching the grid in iterative way and it worked fine using the Panorama SDK. After that, my friend Boris came with another simple and elegant solution: We can use Parameter in the Panorama view and highlight the number which was defined in the parameter. In this post I'll explain how to implement this.

1. Create a new view using Panorama NovaView Desktop and make sure you see the grid in the view.

2. Define a new parameter: Click on View -> Paramaters, and click on the "Manage Parameters" button. Click on Add. The default type is Number and this is exactly what wee need (for now). In the name, type Highlight and in the Default Value type a number that you see in the grid (this is the number that will be highlighted later). Let's take 0 for example. Click on OK twice and close the little Parameters window.

3. Create a new Exception: Click on Data -> Exceptions -> Exceptions... -> Add. Click on Next and then choose "Custom Exception". Click on "Edit Exception" and there write the following formula: [Measures].CurrentMember = [[Highlight]]
This will simply select all the cells with the number that we defined earlier in the Highlight parameter. Click on OK and click Next. In this step, define the style of the highlighted cells. I picked red color and Bold font style. You can click on Finish now and then click OK. Open the small Parameters window (right click in the crosstab's corner and choose Parameters) and click on "Apply Changes". Now, you will see that all the cells with 0 are highlighted. If you don't see it, check that you did all the steps correctly.

4. When we will show the view to the user, we don't want to show him anything highlighed when the view is loaded. This is where a little trick takes place: open the small Parameters window (right click in the crosstab's corner and choose Parameters), double click on the Highlight parameter. Choose String as the parameter type (on the right part) and in the Default value, enter abc. Click twice on OK and then on the apply button and you'll see that now the highlighed cells are regular ones.

5. In the dashboards page, or in the web page you created using the Panorama SDK, create a button that will call the function searchGrid. Just add the button the property onclick="searchGrid('master')", where master is the applet's name. this is the code of the searchGrid function:

function searchGrid (applet) {
var reply = prompt('Please enter the number to search','');
eval(applet + '.CallUpdateParametersEx("P|~|Highlight|~|' + reply + '|~~|")');
}

Another tweaks I implemented and I didn't write in this post in order to make it simple (for advanced developers only):
  • You can search all the views in the current web/dashboard page. Just call the function for every applet, but make it in Batch mode.
  • You can search all the grid even if the user doesn't see all the rows. You can tell him if the number he searched for is in there or not.

That's all. Test your new page and enjoy. For every question about this and anything else, you can leave a comment or write my mail.

 | 
Sunday, May 31, 2009 6:11:47 PM (Jerusalem Daylight Time, UTC+03:00)
 Tuesday, April 07, 2009

When you develope a big Panorama Dashboards site, you'll have a lot of javascript code in the background. We use JS to call the Panorama SDK functions and methods, make the server side and the client side work together and to make the website dynamic and user-friendly. After few days, you'll see that you have a lot of code out there, so you must organize it (if you didn't do it in the first place). My friend Doron wrote a great post about JS development guidelines which can help Panorama Dashboards developers and any big website developers.

Wednesday, April 08, 2009 6:12:17 AM (Jerusalem Daylight Time, UTC+03:00)
Very easy. In Sql Server:
Select top n * from my_table
In Oracle:
Select * from my_table where rownum <= n

It can be very useful in many many cases. For example, you're designing DWH over a system and you're looking at a certain field in one of its tables. You want to know which values this field contains, but fetching "select distinct my_field from my_table" takes too much time. Instead, if you believe know that the data is well distributed, you can use "select distinct my_field from my_table where rownum <= n". Use 1000 for n in the first trial and add one zero in the end of the number n every time until you got a query that takes too much time than you want to wait. after you got the n you can live with, can use the values you have in your query result.

Wednesday, April 08, 2009 5:54:28 AM (Jerusalem Daylight Time, UTC+03:00)
 Sunday, March 22, 2009

We're now beginning a quick session of Panorama Dashboard development using the 5.5 version. The Dashboard site is written in ASP, so it seems very native to write our custom code also in ASP. On the other hand, writing in C# is much more fun, so we decided to try and see if we can make the server side code be in C#. We found that it is indeed possible. You can place an iframe with ASP.NET page and reference the dashboard page from it using JavaScript (start from page.parent and go on from there).

This way you can enjoy the two worlds. In the next posts I'll show some examples of what you can do using this method.

 | 
Monday, March 23, 2009 6:55:14 AM (Jerusalem Standard Time, UTC+02:00)

There are some things you can only learn in the hard way. It didn't happenned to me personally but to my team friends, so I consider it as it is my bad.

We upgraded our ETL tool - Informatica, from version 8.5 to 8.6. We had to run some tests to see that the results are the same. So, what we did is to save the result table from 8.5 in Excel, save the result table from 8.6 in Excel and then compare them using Excel-built-in functions. The only problem is we didn't pay attention to the places where zero and null interchanged. This happened because the two versions act differently where null values take places in aggregation functions. For example, when there is a sum function and it aggregates only on null values. In one version the output is zero and in the other the output is null...

This can also happen in other tools and technology. For example, in OLAP cubes the difference between zero and null is the difference between seeing the member of the dimension on the screen and not knowing of its existence.

For conclusion, always be aware to this point and don't forget to check it.

Monday, March 23, 2009 6:09:59 AM (Jerusalem Standard Time, UTC+02:00)
 Sunday, February 22, 2009

Here are some tips we collected over the years about dashboard design:

Page Layout

  • Less is more - don't put too many views in the page.
  • Rule of thumb - no more than five reports in one page.
  • Don't use scrolling - the average user won't scroll down the screen.
  • Position in screen - some researches made about this subject and here are the recommendations:
    • Top-Left - it's the part of the screen that the user looks at first. Put there the most important data.
    • Center of the screen - the part the user looks after the top-left. Put there the second-most important data.
    • Top-Right, Bottom-Left - Neutral parts.
    • Bottom-Right - The user won't pay attention to it, don't put there important data.
  • Fixed menus in every page.
  • Small amount of navigation targets in every page. Too much navigation paths will cause confusion.
  • Concentrate on the main page - in 90% of the cases the user will stay there.
  • Add graphic components and highlight them if necessary.
  • Blue color only to links (and underline, of course).

Views Layout

  • Two digits after the decimal point - in non-integer number, put only two digits after the decimal point. The human mind can't understand more than that.
  • Focus on the clarity of the data and not only on its beauty. For example, 3D pie charts are very beautiful, but flatting them will make them more clear to the user.
  • Measures have meaning only when compared to other data. Don't put stand-alone measure.
  • Pay attention to graphical change between the data and not only colors. Remember that there are color-blind users.
  • Text is more clear than icons.
  • Use the San Serif and Arial fonts. They are the most readable to the user.
  • Align the text only to one side and not to the middle. It seems better to programmers, but users want their text aligned to the left or to the right.
  • Colors - don't use too much color. The dashboard page is not a jungle. Use colors of the same family.
  • Put dark text on bright background the vise versa.

And to conclude - use CSS whenever you can. It will save you much time and effort.

Sunday, February 22, 2009 9:14:26 PM (Jerusalem Standard Time, UTC+02:00)
 Tuesday, February 10, 2009

In many cases, extraction of Panorama view's MDX code is necessary. For example, in order to check whether the performance bottleneck is in Panorama or in the OLAP server, you can take the MDX code, run it in SQL Server Management Studio and compare run times. You can take it further more but I'll leave it for future posts.

In Panorama Desktop program, click on Tools -> Direct MDX... and then CTRL + ALT + V. Then, you can copy it and use it in any way you want.

 | 
Tuesday, February 10, 2009 5:00:54 PM (Jerusalem Standard Time, UTC+02:00)
 Monday, February 09, 2009
I'm glad to announce that two new Panorama forums were opened in the last days:
The first one is Panorama's Technical forum (English only). From now on you can get answers using this forum and see another users' problems and answers.
The second one is independent Panorama forum (Hebrew only). This forum was created by Michael Ra'am, ex-Panorama consultor. Its purpose is to be a place for sharing knowledge and ideas.
I believe you'll see me in both forums. See ya!

 | 
Monday, February 09, 2009 10:34:32 PM (Jerusalem Standard Time, UTC+02:00)
 Thursday, February 05, 2009

In some of our projects, we develop the Panorama views in the development environment along with the Data Warehouse, the ETL, the Cubes, etc. That's because the customers want to see how their product will look like before we deploy the views in the production environment. So, how do you deploy Panorama views from one environment to the other?

  1. Create the new book - If it's a new briefing book, create it using the Panorama NovaView Administrator program. If it's already exist you can skip this step.
  2. Copy the content - The book's content is by default in c:\<Panorama Folder>\E-BI\books\<Book Name>. Copy the content of this directory from the dev machine to the production machine. This is not enought because the views are still looking at the dev environment, so:
  3. Change the view's properties - You need to have a very simple program (let's call it PanoramaDeployUtil) that iterates over all the views in the given folder (and its sub folders, recursively) and change its properties. I recommend opening the view's file using xml reader and change the element \pnView\Root\Cube\Properties. You need to set its properties CubeAddress, CubeName & CubeDB according to the new environment's values (CubeAddress is the server address). Just run this program and the views will look at the new environment.
  4. Check - check yourself and make sure everything is ok by opening Panorama Web Access or Panorama Desktop and see that the values shown are the production's values.

Know that you can always open the Panorama Desktop and change the views one by one by hand.

Enjoy.

 | 
Thursday, February 05, 2009 10:21:08 PM (Jerusalem Standard Time, UTC+02:00)
 Saturday, January 03, 2009

You should know that when you run a program by starting a job with a CmdExec step, the directory in which the program is running in will be c:\<windows dir>\system32. How can this affect you? For example, I created a .Net console application that has a settings file with it. When I ran it using the SQL Server Agent, it couldn't find the settings file (worse - it used the default settings and that caused many trouble finding the problem). After some research, I found that it's looking for it in the directory I mentioned.

Sunday, January 04, 2009 6:24:58 AM (Jerusalem Standard Time, UTC+02:00)
 Wednesday, December 03, 2008

I've been thinking lately about the new Microsoft Chart controls which are based on the Dundas acquisition (made in April 2007). What is the meaning of this to us, the BI developers?

Until now we were always counting on the abilities of our BI products. Let's take Hyperion/Oracle Essbase for example. Let's say I want to have a special graph of a new type or a new feature in a graph. I couldn't do it at all, because the product's code is closed (someone has to do money, doesn't it?) and I can't add any more graphs or features. There are some products where I can do things like this. For example, in Panorama NovaView I can build a new KPI type or doing a sophisticated visualization using JavaScript and Panorama SDK, but that's a lot of coding.

Now, we have the ability to write graphs by coding them without large amount of code. We can customize them as we want and we're not limited by any product. The drawbacks are maintenance and knowledge that we need to have here, but these are things that we need in every product anyway. I didn't learned this framework yet so I can't tell where are the limits, but they seems pretty far. Alex Gorev is writing about it in his blog (web, rss) so you can learn more about it there. It will take time to see if it affects the BI development world, so all what left to do is to sit and wait.

Thursday, December 04, 2008 5:46:20 AM (Jerusalem Standard Time, UTC+02:00)

Today I had a very disturbing coincidence.
My friend Ariel worked on a SSAS solution with no version control (we're using VSS). Instead of using that, he developed by opening the database on the server. I told him that he must fix it and we must have a recent version-controlled solution. In the past we asked Microsoft support how to do that (we lost all our vss files and had only the databases). They simply said that it's not possible. Ariel has found today that it can be made very easily using File -> New Project -> Import Analysis Services Database, as you can see in the picture:

Thursday, December 04, 2008 5:09:58 AM (Jerusalem Standard Time, UTC+02:00)
 Tuesday, November 18, 2008

This is a little bit tricky. Unlike the AdomdClient assembly, the AdomdServer assembly  doesn't have a descriptive name. It's called msmgdsrv.dll and it is located in Program Files\Microsoft SQL Server\MSSQL.2\OLAP\bin. Why it's not documented anywhere?

Tuesday, November 18, 2008 11:17:16 PM (Jerusalem Standard Time, UTC+02:00)
 Monday, November 17, 2008

After announcing the MdxInjection program I got several requests for additional details and for the ability to run it without using Visual Studio. So, here are some important points:

  • When I published it I had developers in mind because I'm sure than anyone will want to do his little modification before using it for his own needs. That's why I published it as a solution and not as executable.
  • I written it down using VS2008 but only with the .Net 2 framework. Those of you who uses VS2005 won't be able to open the solution.
  • The program has only one public method - InjectMdx, who takes two arguments: The location of the CommonMdx file and the location of the xml configuration file.
  • The CommonMdx.mdx file contains the common MDX script. The relevant part has to start with /* Common MDX */ and then the common mdx script. Anything written before it won't be treated. That gives you the ability to save some data or comments for yourself in this file.
  • Example of the configuration xml file can be found in the Test libary inside the solution. Basically, it enable you to define in which servers, databases and cubes you want to inject the common script. Pay attention that you have to write the connection strings in this file.
  • Note that the program will detect cube dimensions with their name changed and will know how to replace them. That means that if you mention the Time dimension in the common script and inject it to AdventureWorks cube, the script will replace the string "Time" with the "ShipmentDate" string, for example.

For those of you who want simple execution file, I added a windows console project in the solution.

Link to only executable program
Link to the solution with the added windows application project
Link to the solution without the windows application project
Tuesday, November 18, 2008 6:54:17 AM (Jerusalem Standard Time, UTC+02:00)

In the previous post I talked about the DRY principle in the BI Development. I mentioned that one of the major problems in the principle's implementation is in the common MDX code. Chris Commented:

"I'd like to be able to have a global MDX Script and be able to do something like a #include to bring calculations into specific cubes. One to add to my wishlist for the next version..."

And as I said there that I have a temporary good solution until we'll have it in the next SQL Server release (if someone from Microsoft is reading...).

The MdxInjection program takes your common MDX Script and a very simple xml file that defines where to inject this script. It injects the script into your desired cubes and even replaces the dimensions' names where necessary (it is relevant where you put dimension in a cube with a different name to thr dimension or when you use Role Playing Dimensions). I couldn't hold myself from writing some test code so it's also included in the project. The project is written in C# 2 using much AMO code. All the technical little details are inside.

Enjoy.

Download Link

Monday, November 17, 2008 8:29:02 AM (Jerusalem Standard Time, UTC+02:00)
 Friday, October 17, 2008

This month we're really busy with a very important project and a short schedule. This made me think of ideas for agile development for BI, but I'll leave it for other time for now. In order to make us better BI developers, I decided to take one Pragmatic Programmer principle and use it. I took one of the most important (for my opinion) principles - DRY (Don't Repeat Yourself). The DRY principle says that "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system". In classic programming it's simple to use: Use methods and generic classes to implements logic that will repeat itself in the project. But how do you do it in BI development? Here are some ideas I thought and even implemented some of them in my environment. Every layer/step in the BI development has it's own bulletin. I'll be happy to hear more from you.

  • First of all - use functions in your DataWarehouse's database. Do it as much as you can. Do not repeat any logic twice or more, no matter if it's in procedures, views or even CLR functions.
  • We all have much logic that repeats itself in the ETL process. For example, we found ourselves doing over and over the next process: When we build a fact table, we take every cell that points to a dimension table by a foreign key and "looking up" if it's found in the dimension table. If it's not there we replace it with Undefined, UD or null. That makes us feel very bad because we feel that we're doing the same all the time and it gives us the feeling of machines rather than programmers. The solution for this problem (and many other) is to build our own tasks (in SSIS) or transformations (in SSIS & Informatica). Alberto Ferrari did a beautiful work in this field in SSIS. I'll add some transformations of my own once I'll have release-ready versions of them.
  • My co-workers just loves the Calculated Member feature in the Data Source View in SSAS. In enables them to make a new column without making a view and with no touch in the underlying database. The problem here is that after a while we have a LOT of calculated members, many of them repeats themselves and when you look for logic you lost, you can look for hours in the never-ending DSV. The solution here is not using calculated members at all. Put all your logic in the database (and as I said - in functions). The only place where you should use calculated members is where you must - when you have no write permission to the DataWarehouse or when you build your DSV over an operative database and you don't have write permissions.
  • The same is with Names Queries in the Data Source View in SSAS. Don't use it.
  • There's much logic that you can do only in MDX. Here, the problem is that MDX scripts are defined over cubes and not over dimensions, meaning that if a dimension has MDX logic you have to repeat it in every cube's MDX script. The solution is to add the MDX programically using AMO. Every time the ETL process ends, it should run a program that takes the MDX script from a single file and place it in every relevant cube. I know it sounds a little bit wacky and I even didn't do it myself, but for what I know, it's the only solutions for DRY in MDX.

As I said, I'll love to hear your ideas about this topic.

Friday, October 17, 2008 9:46:22 PM (Jerusalem Standard Time, UTC+02:00)
 Monday, September 22, 2008
My friends were stuck with a totally weird bug this week. After a day of frustration they called me for the rescue. It took me some time to figure it out and I think that every SSIS developer (and maybe every developer) can learn a thing or two from others' mistakes.

The mission: The data flow takes one table with duplicate rows and copies it to another table and makes sure that every row will appear only once. In the way, the data flow also adds some irrelevant fields. Among them is the Create_User and Create_Date fields which tells by who and when the package last ran.
How my friends did it: Again, it's a very simple flow. They only added Derived Column transformation to add the new fields and then they added an Aggregate transformation to make every row appear only once.

Note that this is not the real package. It's a sample I did on my machine to show it here.

The Bug: When I first seen this it seemed to me very simple flow and I asked myself how can it be that this is happening:

As you can see, it seems that the Aggregate transformation is not deterministic. Sometimes it outputs 99 rows, sometimes 198 and in some other times I get other results as well.
Investigating: I wanted to see what's the difference between the table that I got in the first time (99 rows) and the table I got in the second time (198 rows) so I changed the destination table and compared the two tables. I ran "select * from A where Column1+Column2+... not in (select Column1+Column2+... from B)"-style query but it was no use - it showed me that there were no rows that appeared only in one of the tables. In this step I really started to think (as my friends did) that maybe the Aggregate transformation has something wrong inside... Instead of blaming Microsoft, I decided to think. I needed to see what can make the flow non-deterministic. Then, it hit me.


The only non-deterministic component in the flow is the Derived Column because it has the getdate() function (it may be simple to see here, but in the original package the derived column transformation had many fields). The results of this function may differ in the milliseconds, especially for large tables. Then I looked in the Aggregate transformation and seen that the Create_Date column also was in the Group by operation, meaning that if two rows has different millisecond they will be placed twice in the destination table, although they are the same in every column. That's it, the bug was found. But still, one question remained: Why the query did not show me this? The answer is also simple but tricky to find: In the comparison query I concatenated all the columns in the tables in order to compare the results. When I did this, I casted the Create_Date to nvarchar which truncated the milliseconds.

Conclusions:
  • Pay attention to non-deterministic elements in what you do, whether it's code or ETL process.
  • When you do dummy stuff like checking all the checkboxes in a list - think what are the outcomes.
  • Call Miky when you're desperate.
Monday, September 22, 2008 8:10:48 AM (Jerusalem Daylight Time, UTC+03:00)
 Saturday, September 20, 2008
This week I had something disturbing. When I installed Excel 2003 on the Panorama machine in order to use Excel functions in my MDX calculations, the NovaView Desktop stopped working. When I tried to load a view it threw an error in connection message. Calling to Panorama support, they told me that it's a known issue and it's hard to find by using the Panorama knowledge base. So here it is:

If you have connection issues in the Desktop program, enter the registry editor (Start -> Run -> regedit). Look for HKEY_CLASSES_ROOT\MSOLAP\CLSID and make sure it's the same as HKEY_CLASSES_ROOT\MSOLAP.3\CLSID. Remember - always copy from MSOLAP.3 to MSOLAP and not vise versa.

 | 
Sunday, September 21, 2008 6:38:48 AM (Jerusalem Daylight Time, UTC+03:00)
 Thursday, September 04, 2008
In the last years I've seen many astonishing BI web sites. I always asked myself what I need to do to bring my customers such beautiful web-based BI solutions. After having much experience with Panorama NovaView and especially the Panorama SDK I started to run some questions in my mind: Why won't I build some re-usable puzzle pieces that can be joined together to a web site? These pieces can be web controls that using and even interacting Panorama views and Analysis services. Why won't publish it as open source and give it to the BI community?

The PanoramaBasedWebSite project is a toolkit that contains web controls you can easily use in your ASP.NET based web site. The project is written in ASP.NET 2.0 and C# 3.5. These web controls interacts with Panorama views (using Panorama SDK) and Analysis Services (using AMO).
The idea is that you can take these puzzle pieces, combine them as you like in your web site and create your good-looking BI web site with almost no programming. The project is only in its first steps, but I believe that publishing the design/idea is also important. This is why the first release is already published, although it has only two web controls so far. This is what we have so far and what I'm planning for the future. I'll be happy to hear your thoughts/ideas:

First Release Contents

  • PanoramaView web control - this is the main control of the project and it will probably take a lot of the project's weight. The control simply shows panorama view. For now, it doesn't do much rather then showing a view so there's a lot of work to do for this control. It gets two properties - BriefingBookName and ViewName. You can look at the TODO: comments in the code to see what future plans I have for this control.
  • UpdateDatePanel web control - this control shows the date and time when the last process of the cube was made. It can be used in two ways: You can only set the PanoramaViewID property. The control will extract the cube and the database name from the view and take the update date from the cube. The other way is to set the CubeName and DataBaseName properties.
Future Plans

  • KPIView - Already working on it. Similar to PanoramaView, but if the view shows KPI then a drilldown will be made when the user clicks on a gauge.
  • QueryList - Shows the result of MDX query. For example, the list shows the top 10 employees of the month (in sales perspective, for example). This list will be interactive, meaning that clicking on a row will make a drilldown, drill to data or replace the list with another query results.
  • DimensionPicker - Gives the user the ability to pick members of a dimension/hierarchy. After selecting, the control will slice all the views on the page (or only predefined set of views).
  • DatePicker - Same as DimensionPicker but for dates. It will show a calendar to the user and clicking on a date will perform a slice in the views.

The use of the controls in your aspx pages is very easy. You can see for yourself:

<PanoramaControls:UpdateDateLabel ID="UpdateDateLabel1" runat="server" PanoramaViewID="PanoramaView1" />
<PanoramaControls:PanoramaView ID="PanoramaView1" runat="server" Width="100%" Height="80%" BriefingBookName="MikysBook" ViewName="MyFirstView" />

I'll be happy to read your thoughts and ideas about this project. There will be more to come. Stay Tuned.

 |  |  | 
Friday, September 05, 2008 3:37:45 AM (Jerusalem Daylight Time, UTC+03:00)
 Sunday, August 17, 2008

There is one tiny new feature in SSAS 2008 that you can easily miss. It called Empty Cube. When you create a new cube using the wizard, you can create an empty cube, meaning that it has no measures, dimension relationships, etc. The original use of it is for "users (who) want to create everything manually, or when all dimensions are linked dimensions" (taken from the description in the wizard)

In the past, I written about using SSAS with Visual SourceSafe in order to have source and version control for the SSAS project. I mentioned that it has many disadvantages but the big advantage (source & version control) is bigger so in the bottom line I recommend using it. One of the problems we experienced were that every time that someone creates a new object (cube, dimension, etc.) he has to check out the .dwproj file. The result is that sometimes we have a fight in the team for that file and we shouting: "who taken out the dwproj???" (yeah, I know that we can check who did it inside VSS but shouting is more fun).

The empty cube feature is a nice solution for this problem: When you create a new project you can create all the (empty) cubes and then the .dwproj file is free and no longer needed. I'm assuming that you know which cubes you'll have when starting a new project. The only thing remained is the same solution for dimensions. I'll recommend it in the Connect site (it's not working now for some reason).

Monday, August 18, 2008 3:50:41 AM (Jerusalem Daylight Time, UTC+03:00)
 Tuesday, August 12, 2008
Found a great site for BI beginners. Learn Microsoft BI has some videos about BI and SSAS which can place you in a good position as a beginner.

Tuesday, August 12, 2008 9:36:52 PM (Jerusalem Daylight Time, UTC+03:00)
I've been asked to review Widgenie and since I'm a nice person - why not? Widgenie is basically a widget creator that takes data from variety of sources: excel and csv files for now and in the future Google docs and more data sources. After you declare your data source you can change the look of the widget and then you can publish it in variety of ways: Facebook, blog, comments, etc.

I'm writing this post while creating my first widget so these comments are from the first encounter with the product, meaning that I can miss few things, but I believe you readers will get the picture:

  • The limitations on the excel source are way too much: Why the maximum file size is 2M? "The sheet should contain only the data and column headers. Titles, notes and other text outside of the data table will impede the upload" ? Why can't I take my old familiar excel and use it as is? Com'n guys, write some VBA scripts and work this out.
  • When I start from "Create new Widget" and then moving straight to "Create new Data File" because it is my first time, I want to go straight from the end of the data file wizard to when I've been in the create widget wizard. I don't want to start over the create widget wizard.
  • Every step in the wizard has a little question mark with it that explains the current part. It is very intuitive, nicely done and nicely put.
  • The widgets are very beautiful. As a BI developer, I'd be happy to put some of these on the CEO's dashboard.
  • What about multilanguage support? The hebrew columns appears as jibrish in the widgets.
  • The publishing process is not simple enought. I don't want to get a script that I need to place in my blogger/facebook/iGoogle. I want that the process will end at the target of the widget. For example, let's say I want to put the widget in my facebook's profile. I would expect that facebook will be open in the end of the wizard and a new Widgenie application will be created on my profile asking me to choose one of the widgets I created.
  • The text cloud widget is very simple and powerful. It can be very useful for managers.
For conclusion, Widgenie is a very beautiful product that has a long way to do if it wants to stand with the big sharks of the BI world. It has to fix some issues, support more sources and targets and have more capabilities (snapshots support, SSAS integration, excel-style chart editing and more). I don't know why I can't embed here the widgets (it's just ain't working) so here are links to the bar chart and the text cloud widgets.

Tuesday, August 12, 2008 9:12:53 PM (Jerusalem Daylight Time, UTC+03:00)
 Sunday, August 03, 2008
update: Chris Webb and Mosha commented and made it clear that the reason for this error wrap is the NonEmptyCrossJoin function and nothing else. I also checked and I did not find any other function that wraps underlying errors.

This is something you need to be aware of when you're writing MDX. I don't know whether it's a bug or by-design. I'll be happy
to know (please comment if you know something that I don't).
Consider the following MDX:

SELECT
  NonEmptyCrossJoin
  (
    [Customer].[Customer Geography].[State-Province].&[NSW]&[AU].Children
   ,[Employee].[Employee Department].[Department].&[Sales]
  ) ON 0
FROM [Adventure Works];

The query will return with this error: The Set_Count argument of the NonEmptyCrossJoin function is either negative or larger than the number of sets provided. This is quite reasonable because I written the second argument as a member, where (NonEmpty)CrossJoin expects only sets. So, let's upgrade this member to a set:

SELECT
  NonEmptyCrossJoin
  (
    [Customer].[Customer Geography].[State-Province].&[NSW]&[AU].Children
   ,{[Employee].[Employee Department].[Department].&[Sales]}
  ) ON 0
FROM [Adventure Works];


All I did is wrapping the second argument with {} and we have a set. The query will return 19 columns.
Now, for the interesting part. Let's count the members of this CrossJoin before the we fix it:

WITH
  MEMBER [a] AS
    NonEmptyCrossJoin
    (
      [Customer].[Customer Geography].[State-Province].&[NSW]&[AU].Children
     ,[Employee].[Employee Department].[Department].&[Sales]
    ).Count
SELECT
  [a] ON 0
FROM [Adventure Works];


This return... 0.               
Where is the error?
My guess is that the Count function wraps the error. The NonEmptyCrossJoin returns null and the count of members in null is zero. The meaning of this is that if you'll ever forget to wrap the member with {} you'll always get zero and not an error. This can be very dangerous. Just for the check, running this query after the fix:

WITH
  MEMBER [a] AS
    NonEmptyCrossJoin
    (
      [Customer].[Customer Geography].[State-Province].&[NSW]&[AU].Children
     ,{[Employee].[Employee Department].[Department].&[Sales]}
    ).Count
SELECT
  [a] ON 0
FROM [Adventure Works];


will return 19. This been tested with both SSAS 2005 and 2008 (RC0). The examples here are from RC0.

Be careful with your MDX.

Monday, August 04, 2008 5:28:26 AM (Jerusalem Daylight Time, UTC+03:00)
 Saturday, July 26, 2008
OpenSearch is one of the beautiful things I discovered lately. If you're using Firefox 2 and above or Internet Explorer 7, look at the search field in the right-top corner of the browser. See the shiny little thingy there? Click on it and you can instantly add two new search engines for fast search through your browser. The first one is my blog's search and the second (and more important) one is the ability to search BiBlogs right from the browser. Yeah, now you can search the whole BI community's blogs with only one click.

I call all the BI bloggers to add this too. It's 5 minutes work and it can help lot of people out there. See here for instructions.

Saturday, July 26, 2008 7:26:41 AM (Jerusalem Daylight Time, UTC+03:00)
 Wednesday, July 23, 2008
When you practice on SQL Server on your local machine you don't want that its services will start up with the computer. As I mentioned before, you should declare the startup method of these services as manual (see here). After that, you can build two simple batch files that will start and stop the services. Believe me - it's very comfotable to start and stop the services with only one mouse click. The first batch file (I called it sql.bat) contains only two lines:

net start MSSQLSERVER
net start MSSqlServerOLAPService

The second one (sqlend.bat) looks like that:

net stop MSSQLSERVER
net stop MSSqlServerOLAPService

Note that I only start/stop the SQL Server and analysis services, but you can do whatever you like.

Have fun.

Thursday, July 24, 2008 5:19:08 AM (Jerusalem Daylight Time, UTC+03:00)
 Sunday, July 20, 2008
Sometimes the uninstall process does not succeed or even worse - the "Add or Remove programs" interface does not allow you to uninstall the product because it already uninstalled / doesn't exist / you name it. The problem is that the uninstalled program can't be removed from the list, it can't be uninstalled and it prevents another installation or re-installation. This happens a lot with Microsoft's heavy products such as SQL Server and Visual Studio but it can also happen with other products too.
What can you do?

Here's a small tip: Open the registry editor (Start -> Run -> regedit) and go to the path: My Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall.
Under this folder you'll see many folders with GUI names such as {1268CDD4-0FED-3CE6-8A9D-C3B012ABCD8F}. To know what installation it is, look for the key named DisplayName under this GUI folder. In most of these folders you'll see a key named UninstallString. To uninstall this program, copy the value of UninstallString and paste it in the run dialog. This will start the uninstall process.

This trick will not always work, but it can help you a lot. Especially with broken installation of SQL Server.

Enjoy.

Monday, July 21, 2008 6:24:27 AM (Jerusalem Daylight Time, UTC+03:00)
 Tuesday, June 17, 2008
Well, I don't know what I expected but I'm a little disappointed. I'll split my review into two parts:
  1. The Analytics - This is the main issue for my organization, so here I expected to see some new & cool features, but all I've seen is only facelift. This is probably why the new version is 5.5 and not NovaView 6.
  2. The Google stuff - While this is not relevant for my organization, this was very cool and promising. I think that this relation between Panorama and Google will carry on and both sides will only benefit from it.
The greatest thing I got from the webinar is ideas of beautiful designs for sites containing Panorama applets views. The site that has been showed in the webinar was beautiful and intuitive. I just can't wait to give my customer a site that looks like it, completely sewed for him. Go on, look at the webinar and take some ideas for your site design.
If you haven't seen the webinar yet, you can download it or watch it.

 | 
Tuesday, June 17, 2008 7:24:21 AM (Jerusalem Daylight Time, UTC+03:00)
 Sunday, June 01, 2008
My customer wanted to have the ability to show the last update time of the data in the Panorama's views' titles. He knows that I know to deliver :-) so in a couple of hours he had it. Using AMO and the view's xml manipulation it's very simple. Just note that changing views without NovaView Desktop is not supported by Panorama so watch out before you execute this program. In your first trials, always save the books dir (by default in c:\program files\panorama\e-bi\books) before you start. Also, be aware that this won't work for automatic views. The user must enter the view's title himself and write the string "Correct For" and the program will know to write the last update time after it. This is the program's code:

  1 using System;
  2 using System.Collection.Generic;
  3 using System.Text;
  4 using AMO = Microsoft.AnalysisServices;
  5 using System.Xml;
  6 using System.IO;
  7
  8 namespace CubeUpdateDate
  9 {
 10     class Program
 11     {
 12         static void Main (string[] args)
 13         {
 14             CubeUpdateDate cud = new CubeUpdateDate();
 15             cud.Go();
 16         }
 17     }
 18     
 19     class CubeUpdateDate
 20     {
 21         public void Go ()
 22         {
 23             DateTime cubeUpdateDate = GetCubeUpdateDate(GetConfigData("ServerName"),GetConfigData("DataBaseName"));
 24             UpdateViews(GetConfigData("BookDir"),cubeUpdateDate);
 25         }
 26         
 27         private void UpdateViews (string dirName, DateTime cubeUpdateDate)
 28         {
 29             foreach (string subDirName in Directory.GetDirectories(dirName))
 30             {
 31                 UpdateViews(subDirName, cubeUpdateDate);
 32             }
 33             
 34             foreach (string fileName in Directory.GetFiles(dirName))
 35             {
 36                 UpdateFile(fileName, cubeUpdateDate);
 37             }
 38         }
 39         
 40         private void UpdateFile (string fileName, DateTime cubeUpdateDate)
 41         {
 42             try {
 43                 XmlDocument xmlDoc = new XmlDocument();
 44                 xmlDoc.Load(fileName);
 45                 XmlNodeList titleTags = xmlDoc.GetElementsByTagName("Title");
 46                 if (titleTags.Count > 0)
 47                 {
 48                     string viewTitle = titleTags[0].ChildNodes[0].Attributes[0].Value;
 49                     if (viewTitle.Contains(@"Correct For"))
 50                     {
 51                         viewTitle = viewTitle.Substring(0, viewTitle.IndexOf("Correct For") + 11);
 52                         viewTitle += " " + cubeUpdateDate.ToShortTimeString() + " " + cubeUpdateDate.ToShortDateString();
 53                         titleTags[0].ChildNodes[0].Attributes[0].Value = viewTitle;
 54                         titleTags[0].ChildNodes[0].Attributes[1].Value = viewTitle;
 55                         xmlDoc.Save(fileName);
 56                     }
 57                 }
 58             }
 59             catch (Exception e)
 60             {
 61                 Console.WriteLine("Error reading/writing file: " + fileName);
 62             }
 63         }
 64         
 65         private string GetConfigData (string whichData)
 66         {
 67             XmlDocument xmlDoc = GetConfigXml();
 68             return xmlDoc.GetElementsByTagName(whichData)[0].InnerText;
 69         }
 70         
 71         private XmlDocument GetConfigXml()
 72         {
 73             XmlDocument xmlDoc = new XmlDocument();
 74             xmlDoc.Load("config.xml");
 75             return xmlDoc;
 76         }
 77         
 78         private DateTime GetCubeUpdateDate (string serverName, string dbName)
 79         {
 80             using (AMO.Server server = new AMO.Server())
 81             {
 82                 server.Connect("Data Source=" + serverName);
 83                 AMO.Database db = server.Databases[dbName];
 84                 return db.Cubes[0].LastProcessed;
 85             }
 86         }
 87     }
 88 }

The program uses xml config file that looks like this:

<?xml version="1.0encoding="utf-8?>
<Config>
    <ServerName>MyOlapServer</ServerName>
    <DataBaseName>MyDBName</DataBaseName>
    <BookDir>MyBookDirPath</BookDir>
</Config>

The program assumes that all the database has the same update time so it takes the last process time of the first cube in the database. If it's not true in your case you can change it in the method GetCubeUpdateDate.
Enjoy.

update: If you're getting trouble with XmlDocument.Load method because of hexadecimal characters in the view's xml file, look here.
 | 
Sunday, June 01, 2008 7:20:09 AM (Jerusalem Daylight Time, UTC+03:00)
 Friday, May 30, 2008
Finally, that looks like the answer for our needs. IBM Business Glossary (BG) is a product that manages our business vocabulary. It enables users to create business terms (also called entities), edit them, share them and to customize them. We've seen the product in IBM, Israel and we liked it very much. Here is a brief summary of the meeting:

Managing meta data in the organization is a difficult task. First of all, you need to know what kind of MD you want to manage. There are three main types:
  • Business MD - The vocabulary that contains the terms of the business.
  • Technical MD - Names and attributes of data storages, tables, columns, etc.
  • Operational MD - How the information flows inside the organization.

The BG gives common language to the organization and connects the business to the IT. First of all, it creates a contract - everybody knows exactly what is a "high value customer" for example. That supposed to be the end of confusion about business terms. It also helps to understand things, exposes knowledge and connect all the technical details.
In BG, all the terms has the same common attributes, such as name, description, example, related entities, etc. The users can define more custom attributes if they want. The product also manages the Data Stewardship, meaning that every entity has a father/manager. It can also have two fathers - one from the business and one from the technical aspect (Update: Not in the current version). The terms are divided into subject areas/context. This way you can go to a subject and learn it all by going over all its entities. You can see and use its custom attributes. For example, you can have a link there to reports that contains/lists that entity.

There's much more to say about BG. All I wanted is to give a brief overview of what it is and you can see if it can help you. I'll give my pros and cons for this product:

Pros:

  • Making order in the organization - everybody knows what you talking about when you say a term. Every entity has a defined father/business-expert.
  • Manages business knowledge over time. You can take a new employee and instead of taking other's employee's precious time to teach him everything, just tell him to go over the business glossary. (I'm not naive, but it will reduce time)
  • Fast lookup time - I want to know in which tables in the databases an entity is placed. I can find it in seconds.

Cons:

  • Security - BG has almost no security module at all, meaning that everybody sees everything.
  • Doesn't support services yet. I would like to see which services exposes and which services consumes an entity. I want to call the service, provide it with input and see the output.
  • The stewardship module is still weak. In the meantime, there is only one father of an entity.
  • The custom attributes are the same for the entire vocabulary. What if I would like to have a custom attribute only for one subject area?
  • There isn't a hebrew interface yet. The interface can be only in English, Spanish and French (if I'm not wrong).

For conclusion, I think that the product is good, even very good. The problem is that its development has to go on several iterations before it can be used a variety of organizations. It just doesn't have all the features that a business vocabulary must have. Wait a year and you'll see a wonderful product.

Saturday, May 31, 2008 2:04:53 AM (Jerusalem Daylight Time, UTC+03:00)
On June 10th, Panorama will show us the new version of NovaView - 5.5.  The show will be only on the web (that's why it called a webinar). We will see the new reports, flash-based dashboards and the results of the cooporation with Google. You can see the brochure here. I would happy to say that I'll see you there. The only problem is that we won't see each other and that's why I think that a real conference is better than a webby one. On the other hand, it's much simpler and cheaper to do a webinar so I can understand that move. Never mind, I'll see you in other time.

 | 
Friday, May 30, 2008 6:57:21 PM (Jerusalem Daylight Time, UTC+03:00)
 Monday, May 26, 2008
One more tip about installing the database samples: I believe that installing them is not enough. In order to improve your skills you need to have a deep knowledge of them. Therefore, don't deploy the SSAS project to the server and that's it. Build it yourself. Yes - create a new project called MyAdventureWorks or something like that and build all the objects by yourself. Indeed, this will take time and strength but this is worth this. After you'll do all the tricky things yourself then you really got it in hands. Learn the AW project and be a master.

Monday, May 26, 2008 7:44:47 AM (Jerusalem Daylight Time, UTC+03:00)
MDM
 
Everybody is talking MDM so we decided to go to IBM and talk with Darren Cooper, which is an expert on this subject. Darren gave some sense into this term and explained us exactly what it is and what it is not. There's a lot of confusion out there about this, so it is important to know things before you deploy them or buying a new MDM product...
This sketch can explain a lot of it:


Following the arrows, you can understand what is going on in this picture and what it is all about:
  • The operational systems contains some common critical data which we're tired of duplicating and maintaining all the time. So, we push this data (red in the picture) to the MDM in real time. This is it. That's MDM. From now on, we play with this golden egg and gets all the benefits from it.
  • Hey, we have all the critical data in one place, so why shouldn't we push it to the clients whenever they need it? After we have MDM it doesn't make sense to give it to them through the Op. systems, is it?
  • Wait a second! A client is using an operational system. Will the critical data be saved in the Op. Systems? You guessed right. Be aware that now the client will send data to both places - MDM and Op. System.
  • MDM is not a replacement for the DataWarehouse. Their purposes are not the same and each one cannot perform what the other is doing. So they need each other. The DW is taking data from the MDM like it taking from any other system. On the other side, the MDM is taking data from the DW whenever he need it.
I believe that now you have more clear understanding about MDM. There are many points that should be discussed about this but it is too soon right now because we're only learning this, so I'll just point them out.

  • Security - We have all the critical data in one place. Very dangerous...
  • Flexibility - The MDM should react very quickly to every change in the other systems of the organization. Clients cannot wait long for the MDM to change for every movement in the organization.
  • Availability - It should be always up and cannot crash too much because everybody is relying on it.
  • Updated - The definition of MDM says that it should be always updated, but it's not always necessary. The IT architects should find these scenarios where they can ease on the MDM.
  • Formats - Every Op. systems has its own standards and formats, and the MDM has to support all of them.
  • Interation with other IT teams - You should build trust with them because you're taking their critical data from their hands. If your MDM will malfunction they will be happy to take the advantage of the moment and take their data back to them.
  • Implementation - Building MDM is a very long process. The IT architects has to design its different modules and build them one atop of the other.
  • Conflicting Data - Which system has the last word? How can we handle these cases? Oh yes, it will happen. It always do.
  • Viewer - Do you need MDM viewer? How should it look like?
  • Make sense - This is maybe the most difficult subject. BI is a bunch of attributes without any inner sense between them. MDM should fill the void by supplying knowledge given by its many critical attributes. How should you do it?
As you can see, there's a lot to talk about. If we'll decide to implement MDM in our company I'll be happy to share here. Good luck to us all.
Monday, May 26, 2008 7:04:12 AM (Jerusalem Daylight Time, UTC+03:00)
 Sunday, May 25, 2008
I thought that it will be a simple next,next,next installation, but it turned out that it is more complex than I thought. It is not something very hard to do, but there are some tricky points, especially when installing it on my PC and not on a dedicated strong server.
The installation starts as a simple wizard. Just go on with it but pay attention to this screen:



Here, you need to specify account for every service installed. Because it is a CTP installation and not a real server installation, you can make easy life for yourself and just use an administrator account for all the services because security is not an issue now. In the bottom of the screen, enter account and password of an administrator account and click on "Apply to all".
Now, for the really important note - the startup type. There are three startup types in windows services:
  1. Automatic - The service will wake up with the operation system.
  2. Manual - The service will start only by a process or an admin user.
  3. Disabled - The service can't start at all.
This choice is very important. If you're making the installation on a dedicated machine then you can choose Automatic because you'll need the service to be always running. But - if you installing this on a personal computer then you don't want these CPU & memory consuming services to be up all time long. In this case you need to choose Manual and start these services only when you need them. When you do, you start them by typing "services.msc" in the Start -> Run dialog and then find the service and click on start. I don't see any reason the choose the Disabled startup type in this screen. By the way, there's a new type in Vista called "Delayed", which starts the process only after the Automatic ones have been started. This option doesn't exist here and I don't see any reason to use it anyway.

Now for the big problem - installing the sample databases. The samples are not a part of the CTP so you'll need to download them from codeplex. Make sure that you download the samples that fit your CTP version. If you don't have the latest CTP then don't download the latest samples. Find your version in the releases section. After you have downloaded your samples, start the wizard. When you get to this screen:


you'll get stuck (if you haven't read this first, of course) with this message:

Error 27502. Could not connect to Microsoft SQL Server '(local)'. [DBNETLIB][ConnectionOpen (Connect()). SQL Server does not exist or access denied. (17) [I copied that for the ones who will find this by google search]

It got me a while to resolve this, so this is what you need to do before you install the samples:
Open the SQL Configuration manager (Start -> Programs -> Microsoft SQL Server 2008 -> Configuration tools) and enable TCP/IP protocal in the server:


That should solve it. After that, go to the directory "c:\Program Files\Microsoft SQL Server\100\Tools\Samples" and there you'll find the samples with a document that explain how to attach them to the server.

I hope this is helpful to those who got stuck and those who haven't got stuck with it yet. Enjoy.
Sunday, May 25, 2008 7:33:43 AM (Jerusalem Daylight Time, UTC+03:00)
 Tuesday, May 13, 2008
I started a long conversation about this subject in the MDSN SSAS forum. I think that it's a question and a principal that every advanced MDX programmer should be familiar with.

It all started with a customer that needed a standard deviation aggregation. I thought that it would be simple because there's a StdDev function in MDX, but it turned out that my customer had major plans for me: He wanted this aggregation to act for every dimension he puts on his axis. This means that the aggregation is not defined over a specific dimension (such as date), but the std-dev is defined over the current dimension in the axis.

The solution for this problem consists of a principle and an answer.

The Principle
Aggregation or a measure that is based on the current user's query is bad. This can and will result two users to see different results using the same measure. This will cause confusion and disinformation. The sacred principle of One Truth will be desecrated. Taken from the thread, in Chris Webb's words:

"I quite often see people wanting to write calculations that behave differently depending on the query that's being run, and I always tell them not to do it. You can hack something but it's almost impossible to get it work properly for every single possible query - MDX just doesn't work like that"

In the end I explained that to the user and he agreed. One more reason for his approval is that std-dev often doesn't really says something about the data. In other words, it isn't informative. "The standard deviation is 0.432. That means that... ???"


The Answer
If you (or the customer) still insists on that crazy measure, the following MDX will work.

With
Member [Measures].[RowSTDOrders] as
iif(Count(NonEmpty(StrToSet("Axis(1)").Item(0).Hierarchy.Children,
{[Measures].[Order Quantity]}) as ChildSet) < 2,
Null,
StDev(ChildSet, [Measures].[Order Quantity]))
 
select
[Date].[Calendar Year].[Calendar Year] on 0,
Non Empty [Product].[Product Categories].Members on 1
from [Adventure Works]
where [Measures].[RowSTDOrders]

Thanks for Deepak Puri for this code. Notice that the StrToSet function will cause performance degrade, but this is the only way that the code will also work in MDX script and not only in queries.

P.S
It doesn't matter if you write StDev or StdDev.
Wednesday, May 14, 2008 6:28:38 AM (Jerusalem Daylight Time, UTC+03:00)
 Thursday, May 08, 2008
This tiny thing cost me a minute today, but it may take longer time to some of you, so I'm writing this.

As some of you know, in order to sort a dimension's attribute you need to change to OrderBy property of the attribute. You can make the attribute to be sorted according to other attribute (it's a very common thing in SSAS). In order to do so, you set the OrderBy property to AttributeKey and in the OrderByAttribute property you pick up the desired attribute (the one you want to define the order).

Note that if the first attribute (the one you want to sort) doesn't have attribute-relationship to the second attribute, you won't be able to pick up the second attribute in the OrderByAttribute property. These properties must have an attribute-relationship.
One more thing: You don't have to show the end-user the attribute which defines the order. If you want to hide it just set the property AttributeHierarchyVisible to false. It is a common pattern to make an attribute which will sort another attribute and hide it from the user.

Thursday, May 08, 2008 7:32:59 AM (Jerusalem Daylight Time, UTC+03:00)
 Tuesday, May 06, 2008
Last May I started my new blog with many questions: What exactly I will write about? Will anyone read me? What do I have to apply to all those blogs out there? and a lot more.
After a year of blogging I'm happy with my choice of starting a blog and I believe that this blog is good. On the other hand, I know there's a lot of things I can make better. This is a list of what I like and dislike about my blog. In the dislike list I wrote down what can I do to make it better, whenever possible. This list is mostly for me to make order in my mind, but maybe one of you can find useful things in it.

Like List
  • Release. A place where I can toss away thoughts from my mind to the world.
  • Share. I love to share good ideas and implementations. I belive it helps the community and the good comments I get make me understand that's right.
  • Save. Over the last year I found that the blog can be a very good place to save knowledge. When I need a piece of my code and I'm in a customer's place and not in my office it can be very helpful.
  • Be a part. Owning a blog positioning me in a community of people with shared interests. This promotes me in knowledge and as a person.
Dislike List
  • Not enough. This is the thing that bugging me the most. I'm not writing enough, or at least not as much as I want to write. This is frustrating even more when I see that my posts help many people out there. Finding the time to write and managing the time between reading and writing is hard. I will do much effort in the future to write more.
  • Screen Shots. I work in a closed-network in my company so I can't get out code or screen shots that can be very effective and helpful for you, the readers. I hope to install in my computer some of the programs I'm using so I can show you the results of my work.
  • Respond. I didn't responded you in time when you commented me. I will configure my blog to send me mail whenever you comment and I promise I will respond you more quickly.
This is it. Just two more ideas I have in mind. One is already implemented, the second maybe will be in the future.
  • When I started this blog I thought I will write about jewish stuff as much as I write about BI stuff. I was completely wrong. I found that writing about jewish stuff in english is very hard for me and that writing deep and serious thoughts is even harder. I changed the title of the blog to "Business Intelligence, Analysis Services, MDX, DataWarehousing and more..." (you can see it up there in the banner). I will focus on these subjects, but I will continue to write about other things that make interest.
  • I thought to add a box in the right column of the page titled "Upcoming Posts". That's because I know about the subjects I going to write about much time before I do it. I think it can be a cool feature but the question is: Will it interest someone? Is there someone who's waiting for it? I thought not. :-)

Tuesday, May 06, 2008 7:06:04 AM (Jerusalem Daylight Time, UTC+03:00)
 Tuesday, April 22, 2008

One more thing about getting a file from the web/SharePoint and using it as a source in SSIS: If you need to authenticate just change the xml.open command to:

xml.open "GET", URL, false, "user", "password"

where user and password are the user & password that has permissions to the desired file. Note that it is VERY recommended to have an application user, so the password won't be changed in the future. If you don't have such user and you must change your password in the future, do not forget to change it in the script. My tip: add a reminder in your calendar to change the password in the script.

In this point I don't know if you can authenticate using SSL or stronger protocols using VB script.

Tuesday, April 22, 2008 10:20:05 PM (Jerusalem Daylight Time, UTC+03:00)
 Monday, March 31, 2008
We got many client requests for the ability to show in their web sites the "last updated" date of the data.
It doesn't matter how you show the data of the SSAS - the customers will always want to know for which date the data is true.
My solution includes a ASP.NET 2.0 web site that uses the AMO class libary. It takes the date from the server and shows it to the user.

What you need to do is:
1. Open a new ASP.NET web site using Visual Studio 2005/8.
2. Add the AMO dll (Microsoft.Analysis Services). You'll find it in the SSAS server.
3. In the already-made default.aspx page, just add one Label.
4. Add a configuration file which will hold the name of the SSAS server. That way, when you install the site from the development environment to the production environment, you'll only have to change this file. Call this file config.xml and write in it the following:
<?xml version="1.0" encoding="utf-8" ?>
<ServerName>YourServerFullNameHere</ServerName>

5. In the code-behind file (default.aspx.cs) write the following code instead of what you already have there:

using System;
using System.Data;
using System.Configuration;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Web.UI.HtmlControls;
using AMO = Microsoft.AnalysisServices;
using System.Xml;

public partial class _Default : System.Web.UI.Page
{
  protected void Load_Page(Object sender, EventArgs e)
  {
    Label1.Text = GetCubeUpdateDate(Request.QueryString["DBName"],Request.QueryString["CubeId"]);
  }

  private string GetCubeUpdateDate (string dbName, string cubeId)
  {
    using (AMO.Server asServer = new AMO.Server())
    {
      asServer.Connect("Data Source=" + GetAnalysisServerName());
      AMO.Database db = asServer.DataBases.FindByName(dbName);
      if (db == null)
      {
        return "DB Name not found";
      }

      AMO.Cube cube = GetCubeById(cubeId, db);
      if (cube == null)
      {
        return "Cube Name not found";
      }

      DateTime lastProcessed = cube.LastProcessed;
      return lastProcessed.Day.ToString() + "/" + lastProcessed.Month.ToString() + "/" + lastProcessed.Year.ToString();
    }
  }

  private string GetAnalysisServerName ()
  {
    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.Load(Request.PhysicalApplicationPath + "config.xml");
    return xmlDoc.GetElementsByTagName("ServerName").Item(0).InnetText;
  }

  private AMO.Cube GetCubeById (string cubeId, AMO.Database db)
  {
    foreach (AMO.Cube cube in db.Cubes)
    {
      if (cube.ID.Equals(cubeId))
      {
        return cube;
      }
    }
    return null;
  }
}

Eventhough the code is self-explained, here are some points referring it:
  • I chose not to include the server name in the web.config file because I like to seperate application-related configuration and web configuration.
  • If you want you can get the cube name from the user (in the query string) and then the code is even shorter - just get the cube like I got the database.
  • I wanted to show the date in the format DD/MM/YYYY, so that's why I did the long return statement in the GetCubeUpdateDate method. If you want to return the date in the MM/DD/YYYY format you can use the lastProcessed.GetShortDateFormat() method.
  • Note that when you publish the web site you need to create a dedicated virtual folder in the IIS.
  • The user uses this site in the following way: All he need to do is to create a frame with this site's address as its source and add it the DBName & CubeId in the query string. In SharePoint it's even easier - the uses only need to create a page shower web part.
enjoy.

Monday, March 31, 2008 11:05:20 PM (Jerusalem Daylight Time, UTC+03:00)
 Saturday, March 29, 2008
Two weeks ago I showed you the leds map. This time I'll describe how it is done.

The leds map is basically a web page with a lot of java-script and Panorama applets which together bring the user a feeling of Ajax & DHTML based web site.
Look at the picture of the map in the previous post. The leds are simply Panorama applets which show Panorama views. Each view shows only one led. Using the Panorama SDK, I did the following:
  • Take the led's value from the view and show it in the tooltip
  • Take the led's color from the view and let the user filter the map according to the desired color(s)
  • Take the led's view path and after clicking the view, show the related views (the departments' views)
The rest of it is just java-script games and tricks.
The leds map is a beautiful example of what you can do with imagination, thought and good will to give your customer a good working BI tool to work with.

 | 
Sunday, March 30, 2008 5:27:29 AM (Jerusalem Daylight Time, UTC+03:00)
 Wednesday, March 19, 2008
Yesterday, my friend Ilya asked me how to perform average for dates. I explained him that actually, the dates in SQL Server are represented as numbers, where zero is 01/01/1900. All you need to do is to cast the dates to numbers, make average on them and getting the result back as date. Assuming that the date column called MyDate, Here is the code:

Cast(Avg(cast(MyDate as float)) as datetime)

Thursday, March 20, 2008 12:04:23 AM (Jerusalem Standard Time, UTC+02:00)
 Sunday, March 16, 2008
A month ago I posted about the necessity of sharing ideas in the BI world. I really think that if we all share our smart ideas then we'll be better in our work.

I want to show you a work I finished few weeks ago. I'm very proud of this work as it will be in the desktop of our CEO and I got many compliments for it.
Note that what you see in the picture is not the real screenshot of the work (It's much more beautiful in the reality...). There's a problem getting out screenshots out of my company, so I did a sketch in Power Point.

This is the functionality of the leds map (my design, if you have any comments):
  • The leds map is simply a web site, meaning zero-footprint in the client's computer. Some computers in my company has java compatibility problems, so I added a parameter you can send with the site's URL which changes the applet's java version (see more in the next post, which will be more technical).
  • The leds map has to be small, about a quarter of the screen. That's because it's intended to be a part of the CEO's desktop.
  • When the map loads, a picture with a turning-around The Thinker statue is shown with a "Loading" message below (our CEO loves that statue...).
  • After the map has been loaded, the user sees two axis with the leds in them. The two axis can represent any Meta-Measures you'd like: Short-Term Profit Vs. Long-Term Profit, Client's Satisfaction Vs. Company's Profit, etc. This is a point that many people have difficulty to understand, so I'll give an example: The yellow led is in the top-right corner, so that says that the underlying measure is very important in both the meta-measures. Going on with the example, that says that this measure is very important for theClient's Satisfaction and for the Company's Profit. Note that the leds never move. Only their color changes.
  • When you move the cursor on a measure in the map, a small tooltip appears next to it. The tooltip shows the measure's name and its value (You can see it in the left-bottom led). Design Change: As my team master recommended, now each led has its measure's name above it. The tooltip shows only the value.
  • When the map loads, only the red leds are shown. In the top-left corner of the screen, there's three radio buttons which filters the shown leds by their colors. In the picture, all the leds are shown because all the radio buttons are enabled.
  • Clicking on a measure on the map drills-down to the different department's leds, as you can see in the left side of the picture.
  • Clicking on a department's led makes the map to vanish and instead of it there's a drill-down of the department, meaning that the measures of its sub-departments are shown instead of the map.
  • After the last drill-down was made, there are two possible actions: Close the new view and return to the map or open the new view in full screen, where you can slice-and-dice and play with the data.
In the next post I'll describe how the leds map was built using the Panorama SDK.

 | 
Sunday, March 16, 2008 7:55:46 AM (Jerusalem Standard Time, UTC+02:00)
 Saturday, March 15, 2008
It figures that using simple excel file as a source in SSIS is not so trivial, especially if your source is in the web or in your SharePoint portal. At first you'll think it's easy - just declare the excel source as a url (the url of the excel file, for example) and it will succeed. The problem is that Microsoft let you think it's working. Click on the excel source and you'll see in its properties that the source path is the local temporary internet files, meaning that the source is a local copy which is not up-to-date, so it's worth nothing.
Here's what I tried to do and the final (and successful solution):

1. Use the File System task. It won't work because you can't declare an URI there.
2. In the MSDN forum (I can't find the link right now) they say to write a script, so I also tried this. Using the Script task, I written a code in VB.NET which using the System.IO libary of the .NET framework and copies the excel file (using its URI) to the desired location in the local computer. Running it, I got an error saying that the script can't use URIs...
After trying this I understood that every code or action running in the SSIS context won't work with URIs. I'm not sure I know why Microsoft developers built it that way (or maybe it's just another bug). Anyway, the next step is the solution.
3. Build an executable file that performs the desired copy task. You can't use regular batch (.bat) file because DOS/CmdExec does not know how to work with URIs. So, there are two ways to perform this:
a. Download this and use it as a copier from the web.
b. Use the following code and save it as a Visual Basic Script file (*.vbs):

'GetRemoteBinaryFile.vbs
TheFile = "myExcelFile.xls"
DestFolder = "C:\SSIS_Sources"
URL = "http://mySite/myFolder/myExcelFile.xls"
Set xml = CreateObject("Microsoft.XMLHTTP")
xml.Open "GET", URL, False
xml.Send
set oStream = CreateObject("Adodb.Stream")
Const adTypeBinary = 1
Const adSaveCreateOverWrite = 2
Const adSaveCreateNotExist = 1
oStream.type = adTypeBinary
oStream.open
oStream.write xml.responseBody
' Overwrite an existing file
oStream.savetofile DestFolder & TheFile, adSaveCreateOverWrite
oStream.close
set oStream = nothing
Set xml = Nothing

After you have your file (vbs or exe) you can use the Execute Process Task in order to make the copy. In the task, declare that you want it to run your exe or vbs.  After that, just use a normal Data Flow Task, where the source excel file is in the local computer (the file that was copied in the previous task) and the destination is your desired DB.

Note that:
1. Before executing you must have the excel file already placed in your local computer, meaning that you must make the first copy before the first time you run the package. This is because SSIS performs integrity check before running the package and it checks that the file exists.
2. Even if the copy process is long (because it's coming from the web), don't worry. SSIS work synchroniously, meaning that the Data Flow task will not start until the Execute Process task which copies the file will end.

Enjoy.

Update: I added a post about authentication.

Sunday, March 16, 2008 6:47:26 AM (Jerusalem Standard Time, UTC+02:00)
 Monday, February 25, 2008
With Panorama SDK you can do cool stuff as I will show you in the future. Though, There are important (and undocumented) things you must know before you start. A very common task is to change the shown view views. Note that:

If you load the view using the Parameter "Alias" with the full view path (ends with  ".xml") you won't be able to change the view. later. Worse: The applet will not return an error. It will just won't respond. So, if you want to enable the dynamic change of the view, in the "Alias" parameter only enter the name of the Briefing Book where the desired view is. Next, add another parameter named "FirstView" and there enter the relative path of the view, meaning that you'll have to remove the name of the server and the briefing book's name. Don't forget to replace the back-slashes (\) in double-back-slashed (\\), otherwise... the applet won't respond. Some examples:

use: AttachParameter("Alias", "http://<myServer>//<ThePanoramaDirectoryPath>//<myBriefingBook>//<myDirectory>//myView.xml"); to show a view with no option to change it later (not recommended). Note that here you don't need to use back-slash because this is just a regual URL.

use:
AttachParameter("Alias","myBriefingBook");
AttachParameter("FirstView","\\<myDirectory>\\myView.xml");
to show a view with an option to replace it later using the CallShowView function.

I recommend always using the second method. That's because you can't know what will be the next demand of your customer. Remember that this is a very common thing to do in the BI world.

 | 
Monday, February 25, 2008 11:39:22 PM (Jerusalem Standard Time, UTC+02:00)
My team master always says that Oracle was left behind in the BI world because they don't have good visual tools over the OLAP cubes. Microsoft, for example, has good visual tools such as Panorama. Today I've seen that they have beautiful visualations over relational DBs which called ADF Data Visualization Components. The problem is that these tools can't look over OLAP cubes. In this link (look for Oracle OLAP Q&A) you can see what we'll see from Oracle in the future: These visual components will be able to show OLAP data. Maybe that will make Oracle really be able to fight Microsoft in the BI scene.

Monday, February 25, 2008 11:07:02 PM (Jerusalem Standard Time, UTC+02:00)
 Sunday, February 10, 2008
My boss called me today to ask me if some things can be achieved using our technologies (mainly Analysis Services and Panorama NovaView). These things were simple ideas of how BI can be shown to the end users. As we talked I thought of many great ideas that can be done. As you well know, one of the biggest problems in the IT world is that the user doesn't always know what he wants. Drilling down to the BI world, I can say that the problem is that the user can't dream. He can't know what he can ask for and sometimes - how easy it can be achieved. One of our many tasks as BI consultants/designers/developers is to help them dream. We can show them things that we've done and things that others have done. This is where you, the reader, can help.

Let's share ideas. In your blogs (or as comments in blogs) you can write about beautiful things you did in your organization. This can be idea or real UI that you can show. Don't worry, I'm not only-talking man. In the near future I will show here something very beautiful (and big ROI of course...) that I did. Stay tuned.

Monday, February 11, 2008 4:38:22 AM (Jerusalem Standard Time, UTC+02:00)
 Friday, February 01, 2008
This is a good one: When you build a flat file connection to a csv file, you can preview the data. There, there's an option to skip some rows (Data rows to skip). If you'll leave it with a number greater than zero - the process itself will skip these rows!! I still wonder if this bug is By Design or not. If you wish, you can track this bug in Microsoft Connect.

Friday, February 01, 2008 2:00:16 PM (Jerusalem Standard Time, UTC+02:00)
 Monday, January 21, 2008
Today I was in the Microsoft's Data Mining Conference which took place in the Sheraton City Tower, Ramat-Gan (Israel, of course). First of all - the food was good. :-) Now, seriously: All the lectures were great, although they were performed by one man - Rafal Lukaweichi, which is a very talented speaker. I think that I haven't seen such enthusiasm for many years in a lecture in the IT world. Anyway, what have I learned today?
  • The Data Mining world is very interesting indeed. Microsoft has a lot to offer in DM and it is all ready-to-use in BIDS.
  • Microsoft's approach is DM to the masses, which I don't believe it myself. Even though the tools are very simple and even the code (DMX) is easy (in contradiction to MDX), I don't think that an inexperienced developer can bring good results. The SAS approach says that you need to have deep knowledge in statistics (which is bad), but I don't think that DM can be made by the masses.
  • There are many different DM algorithms which you need to be introduced with before you start mining. As I mentioned, they tell you that all you need to know is what each algorithm does in general, but in fact there are many parameters which you need to adjust and play with, so in fact you need a good knowledge of these algorithms.
  • Visualization is very important in DM. Even after you have good results in your hands, you need good UI tools to show you the results in an efficient way or else you'll be lost in a jungle of data.
  • If you already have a datawarehouse, you're half way from mining models. The preparation of the data is a huge amount of the job in DM.
  • After you have good results and even after you got good visualization of the results, you need an expert from the company you work for (or in) that will look at the results and tell you whether they bring new knowledge or they trivial.
I don't believe there's a change I'll be mining in the near future, but maybe I'll play with it a little in my free time (which of course I don't have). Taking the data of our datawarehouse and mining it can bring some interesting stuff. Who knows.

Tuesday, January 22, 2008 3:06:14 AM (Jerusalem Standard Time, UTC+02:00)
 Tuesday, January 15, 2008

Although SSAS willl let you use them, some other application such as Panorama won't function properly. I'll give one example: When you perform Drillthrough in Panorama, on the fly the engine gererates a web page that will take the user to the next view. The next view will be sliced as the current view, so this web page needs to pass the dimensions parameter (meaning - the current slices). That's why this web page contains this line:

AttachParameters("Slicers","%Slicers%")

A big problem will occur if one of the sliced dimensions will be sliced on a member with a name that contains inverted commas ("). The JavaScript will result an error because there are three inverted commas in the second parameter of the line. This is only one example of what can happen in a BI consumer program if you'll use special characters in member names. So - be careful not to pass these characters from the DW (build the ETL so it will drop these characters) or giving those names in SSAS, such as the All member name.

Tuesday, January 15, 2008 11:08:32 PM (Jerusalem Standard Time, UTC+02:00)
 Tuesday, January 08, 2008

I feel like I don't have the right to write about it after so many bits of information were moving in the web about this subject but I would like to add my point of view (or in fact - my point of code). First of all, I must mention some of those who written about this subject before me. Mosha Pasumansky wrote a long post about it in last may. This post contains some ideas of how to come along with this problem, but none of them is perfect. In fact (as always) - there is no perfect solution for this problem. Another important source of knowledge can be found here in the MSDN forums, where Chris Webb, Thomas Pagel and others discussed it. Now, I would like to add my solution. Take it or leave it - your choice.

First of all, create a column in the time dimension that will be the current day indicator. Thanks to the Data Source View or the UDM approach of SSAS 2005 you don't have to change the relational table itself. Just add a named calculation in the DSV with expression that will be 1 for the row of the current day and null or zero for the others. The expression syntax itself depends on the underlying DB so I won't write it here, but it's very simple. Add this column as an attribute (let's call it CurrentDayInd) in the dimension structure and select its AttributeVisible propertiey to false. That's because we don't need such Attribute Hierarchy in our time dimension. After that, create a new User Hierarchy (you can call it Current Day), where the first level in it will be CurrentDayInd and after it place the day (key) attribute.

Now, what we have here? We have a hierarchy (Current Day) with two members - 1 and zero. The 1 member has only one child which is the current day. Link that member to your regular hierarchy (it's called YSQMD at my place) and here you have it. For example, you can use it that way in the MDX script:

Create Set [Last 30 Days] as
LinkMember([Time].[Current Day].[CurrentDayInd].&[1].Children.Item(0),[Time].[YSQMD])
:
LinkMember([Time].[Current Day].[CurrentDayInd].&[1].Children.Item(0),[Time].[YSQMD]).Lag(30);

I think that this solution is good and elegant. First of all, you don't have to use external functions such as Now(). The second pro is that we use the native OLAP mechanism which saves time and makes the queries run faster. Believe me, you'll feel the difference with big cubes. The last thing is that this solution is easy to understand (at least I think so) and it is easy for maintenance. The big con is that you have to process this dimension (and related cubes) every day. I don't think that it's so bad because most of the organizations do make process every day.


 

Wednesday, January 09, 2008 5:39:58 AM (Jerusalem Standard Time, UTC+02:00)
 Monday, January 07, 2008

In the previous parts (1, 2) I showed how to connect Informatica with MS-OLAP, meaning that a mapplet can process cube or dimension. The thing is that I focused on the side of MS-OLAP. In the second part I even wrote the T-SQL code itself. Now I want to close the loop by describing what's going on in the Informatica side. This part was made by my friend, Alex, who permitted me to post here about what he did.

First of all, there's a table which contains the parameters to call with to the MS-OLAP procedure (object id, type, user name, etc.). This table is the source (& source qualifier, of course) of the mapplet. Each row in this table calls the stored procedure in the MS-OLAP side (in fact, the procedure is part of the relational DB, but never mind now). The call to the SP is made with Informatica's Procedure block. The connection is a regular ODBC connection, as mentioned in the previous part. Now for the interesting part: In the mapplet, the result of the procedure (zero for success, one for failure) goes into a Java Transformation block. This java block will fail the mapplet if one or more procedure calls returned failure.

How to do this java block? Double click on it to enter its properties. Go to the "Java Code" tab. There you'll see tab for every event in this block's life cycle. Here is the code for every tab (only the relevant ones):

Helper Code:

static int errorCounter = 0;
static Object lock = new Object();

On Input Row:

if (returnValue != 0)
{
 synchronized(lock)
  {
   errorCounter++;
  }
}

On End of Data:

synchronized(lock)
{
 if (errorCounter > 0)
  {
   failSession("OLAP Objects failed");
  }
}

Note that:

  • I'm not sure that the lock mechanism is required here. sync, lock, semaphore, etc. mechanisms are often used when atomic write is needed in order to solve problems like deadlocks, mutual write, etc. Here I simply don't care. Even if two parallel threads will read the errorCounter as zero and they both will increase it to one (when in fact it needs to have the value of two) it won't be a bug because the session will fail anyway. Alex & I need to talk about this point...
  • failSession is a function which is part of Informatica's API. As you might guess, it will fail the whole mapplet.
  • Very important: Calling all the MS-OLAP objects at once will cause an error in the Analysis Services server and all the objects will be in the state of Unprocessed. The Informatica side has to call the dimensions first and only then the cubes. The cubes must not be called all at once if they have relationships between them. This will cause deadlock too.
Tuesday, January 08, 2008 5:35:17 AM (Jerusalem Standard Time, UTC+02:00)
 Saturday, January 05, 2008

The Panorama NovaView Desktop program can't always deal with huge crossjoins. The reason is that this program is written in VB6 - very old platform for client programs. One thing you can try is to go to the crosstab properties and in the Advanced tab, click on the "Optimize huge crossjoins". The problem is that this won't always help. The best solutions I've found so far is to go to the Web Access site (or click on the IE button in the desktop program) and there you can choose the size of the chunk of data you'll receive on every click. Starting with 100 rows in the first chunk, this may help you with huge crossjoins.

 | 
Sunday, January 06, 2008 5:48:11 AM (Jerusalem Standard Time, UTC+02:00)
 Monday, December 31, 2007

As many of you already know, installation on Microsoft Excel on the SSAS 2005 server is needed in order to use Excel functions in MDX. That's very helpful because MDX is lack of many important functions such as Round (!). Many organizations don't like it at all, but here's something that may help. In the SSAS 2005 server you don't need to install the whole program, only the .Net programmability support. In the installation, choose to manually pick up which components you wish to install and then choose the .Net programmability Support as seen in the picture:

Notice that this issue will not be fixed in SSAS 2008, so this tip will be relevant for a long time.

Monday, December 31, 2007 5:06:00 PM (Jerusalem Standard Time, UTC+02:00)
 Sunday, December 30, 2007

Just got home. Most of my day (and my co-worker's also) went on with a big installation of the second block of our BI project. In the morning we really thought that maybe this time, yeah - just this time things will go better. After more than 12 hours I laughing at myself: How could I be so naive? Many things that could go bad just did but after it all ended (with a happy ending, otherwise I wouldn't be here, writing in my home sweet home) I can say that the big blame is on Informatica PowerCenter. We're using version 8 of the software. It's not new software that started its way yesterday: It's a very old and familiar software. So how can it be that when we copy mapplets (ETL processes, for those of you who don't know Informatica) from one repository to another, some lines are just deleted from the mappings? After that you check your dimensions in MS-OLAP and you don't understand what happened there. A whole level in a big dimension that has only one member - 0 ?? Zero member is null. Yeah, we were right - the line in the mapping just been deleted by our precious Informatica so the column is all null.

Well, I happy we're through with this. Good night.

P.S.
Tomorrow I'm taking a day off... :-)

P.S 2

Although many things went wrong in the installation, I really think we had a good block this time. This block contains many beautiful things in MS-OLAP, MDX and Informatica. You'll see it here in the next few days, after I'll calm down. :-)

Monday, December 31, 2007 6:23:16 AM (Jerusalem Standard Time, UTC+02:00)
 Wednesday, December 19, 2007

My team master Yaron asked me to check some things in the Panorama Dashboards:

1. Can have two hands in one gauge.
2. Can I show two values in the text of every gauge.

Here are the answers. I think that the second answer is a beautiful one. In fact, I really enjoyed while I thought how to do this.

1. This is simple: Just use the Goal hand as the second hand. In the KPI Wizard go to the Define Goal step and choose Custom formula. Enter the measure you want to see in the second hand.

2. This is beautiful: In the KPI Wizard, go to the Finish step and to the Title part. Click on the little blue arrow and click on "Edit MDX...". Then, write this MDX:

[My Dimension].[My Hierarchy].CurrentMember.Name + '\n' +
[Measures].[First Measure].Name + ': ' +
Generate({[My Dimension].[My Hierarchy].CurrentMember},[Measures].[First Measure]) + '\n' +
[Measures].[Second Measure].Name + ': ' +
Generate({[My Dimension].[My Hierarchy].CurrentMember},[Measures].[Second Measure])

Note that:

  • This solution may apply to other BI applications, not only to Panorama.
  • This way you can show many values and data, not only two values.
  • What the Generate function doing there? The '+' operator needs to have two strings in both sides, so writing only the [Measures].[First Measure] or [Measures].[First Measure].Value will return a numeric value which will cause error. The Generate function used this way will return a string. It generates for the set (which contains only our member) the value of the measure (in the second argument of the formula) and as mentioned, returns it as string.
  • '\n' will jump to the next line
 |  | 
Thursday, December 20, 2007 4:18:35 AM (Jerusalem Standard Time, UTC+02:00)
 Sunday, December 09, 2007

Last week I participated in Microsoft's BI conference in Ra'anana, Israel. After the conference I asked myself: What have I really learned today? Well, here is what I remember:

  • Microsoft figured out that the eternal BI tool is and will be Excel. People just love their Excel sheets and they will stay there. This is why the mission is to bring the BI into their Excel sheets. Their new product - Excel Services, will manage our excel sheets in one central place which is connected to our Analysis Services cubes.
  • In my point of view, SQL Server 2008 is just a bunch of many performance issues and it is not really a new product. There are a lot of new "performance features". For example, most of our MDX queries will run faster, especially those who has null cells. The new Cell-By-Cell calculations performance improvements will cause these queries to run faster. I think that SS2008 could be one big Service Pack. If I'm wrong, please do comment me.
  • SQL Server 2005 has many products that we don't know good enough. Some products that I need to learn about are: Replication, SQL Server Agent, SQL CLR and more. I do know what they do and even played with them a little bit, but I want to know how they can help me and improve my work.
  • Many new features in SS2008 come from two old sources: BIDS Helper (SSAS open source addin) and of-course, Oracle...
  • My big wish - IntelliSense for Analysis Services will not be in SSAS2008 and maybe won't be at all. This is because the guessing is MDX is very hard. There are too many options in every statement you write.
  • We won't need to upgrade to Office 2007 in order to use Excel Services. Only the developers will need it.

This is what I remember for now. I'll update this post if something new will come around in my mind.

Sunday, December 09, 2007 9:26:17 PM (Jerusalem Standard Time, UTC+02:00)
 Sunday, December 02, 2007

In the last post, I explained the architecture of our BI project. The final part of the process is processing MS-OLAP object (cube/dimension) from Informatica mapplet. As explained earlier, the trick is to call Stored Procedure from the Informatica server. But first there is one more thing to do: How do you connect the Informatica server (Linux) with MS-OLAP (windows server)?

Informatica ships with number of drivers that can connect it with other servers. The drivers are called DataDirect and I'll discuss 4.20. You need to define this driver on the Informatica server (look in Informatica knowledge base for more information). This is an easy thing to do. Notice that you have to enter a full server name (including domain) and the password. Remember that if you'll change the password in the future the process will fail. You have to enable the protocol named "Named Pipes" in the MS-OLAP server. How to do this? Enter the Configuration Manager in the MS-OLAP server and in the section of MSSQLSERVER protocols enable the Named Pipes protocol. This will enable the connection from the Informatica server. On the Informatica server, make a regular ODBC connection.

Here is the code of the SP on the MS-OLAP side. This SP must be on the msdb Database on the Database engine.

ALTER PROCEDURE [dbo].[ProcessObject]
@databaseId varchar(100),
@objectType varchar(100),
@objectId varchar(100),
@login_name varchar(100),
@returnValue int output,
@errorMessage nvarchar(1024) output
AS
BEGIN
declare @jobName varchar(200)
declare @xmla varchar(1000)
declare @jobId binary(16)
declare @ReturnCode int
declare @stop int

--Set job name
set @jobName = 'Process' + @objectType + '_' + @objectId

--Delete the job if already exists
if exists (select * from msdb.dbo.sysjobs where name = @jobName)
exec msdb.dbo.sp_delete_job @job_name = @jobName

--Create the job
Exec msdb.dbo.sp_add_job @job_name=@jobName
@enabled=1,
@notify_level_eventlog=0,
@notify_level_email=0,
@notify_level_netsend=0,
@notify_level_page=0,
@delete_level=0,
@description=N'process OLAP object',
@category_name=N'[Uncategorized (Local)]',
@owner_login_name=@login_name, @job_id=@jobId output

exec msdb.dbo.sp_add_jobserver @job_name=@jobName, @server_name=@@SERVERNAME

--Declare XMLA for OLAP object
if (@objectType = 'Cube')
set @xmla = '

' + @dataBaseId + '
' + @objectId + '

ProcessFull
'
else if (@objectType = 'Dim')
set @xmla =



' + @dataBaseId + '
' + @objectId + '

ProcessFull


'
else
Begin
set @returnValue = 0
return @returnValue
End

--Add the job step
Exec msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Process Object',
@step_id=1,
@cmdexec_success_code=0,
@on_success_action=1,
@on_success_step_id=0,
@on_fail_action=2,
@on_fail_step_id=0,
@retry_attempts=0,
@retry_interval=0,
@os_run_priority=0, @subsystem=N'ANALYSISCOMMAND',
@command=@xmla,
@server=@@SERVERNAME,
@database_name=N'master',
@flags=0

--Run the job
Execute sp_start_job @jobName

Waitfor delay '00:00:05'
set @returnValue = (select run_status from dbo.sysjobhistory
where job_id = @jobId
and step_id = 1)

-- Loop until the job ends and return its result
set @stop = 0
if @returnValue is null
while @stop <> 1
Begin
set @returnValue = (select run_status from dbo.sysjobhistory
where job_id = @jobId
and step_id = 1)

if @returnValue is not null
set @stop = 1

waitfor delay '00:00:10'
End

--Return error message (if exists)
If @returnValue = 0 --failed
set @errorMessage = (select [message] from dbo.sysjobhistory
where job_id = @jobId
and step_id = 1)
End

update: I see that the xmla code went bad in the post because it is not recognised html code. It doesn't matter, I believe you got the point...

Sunday, December 02, 2007 5:59:42 PM (Jerusalem Standard Time, UTC+02:00)
 Saturday, December 01, 2007

In the past I mentioned some fragments of the architecture of our end-to-end BI solution. Now I'll discuss how it is done. I will only write about the things that I done (I mean, developed) but I'll describe the whole picture.

Our architecture goes like this: Control M -> Informatica -> MS-OLAP (Analysis Services 2005).

In words: ControlM is the most common scheduler in big companies. We use it to schedule our ETL processes in Informatica. Our system team made it possible to start Informatica processes from ControlM. I don't know exactly how it is done. All I know is that ControlM raises a flag in a table, and Informatica scans the table every X seconds and start the process if it finds the flag that was raised by ControlM. Don't ask me about the technical details - it wasn't my job.

The more interesting thing (because I did it...) is how Informatica calls MS-OLAP and tells it to process a cube. In this part I'll describe the big picture and in the next one I'll give some of the code and discuss some technical views of the process. First, the Informatica mapping moves the data from the source to the target, which is the dimension or fact table just like it always does. After that, Informatica calls a Stored Procedure on the MS-OLAP server which process the cube. Informatica calls this SP with some parameters, including the object type to process (cube/dimension), its ID and some more parameters. In return, the MS-OLAP returns return code (in order to point whether the process succeeded) and message describing the error if it occurred.

How the SP process the cube/dimension? Unfortunately, there is no SP that can process OLAP object so I needed to use the following steps in my SP:

  1. Delete any existing job that does the same action (read on, you'll understand)
  2. Create an empty job
  3. Add a step to that job that will process the object. This step contains XMLA code that contains the parameters that were given from Informatica
  4. Run the job
  5. Loop until the job (or process) ends and send back the return code and the error message, if exists.

In the next part I'll write some of the code and discuss some technical issues.

 

Sunday, December 02, 2007 5:19:47 AM (Jerusalem Standard Time, UTC+02:00)
 Wednesday, November 21, 2007

I never thought of it until one of my users said it. Sorting the KPI can be a very good idea. Instead of always having the same indicators (gauges, traffic lights, etc.) in the same position of the screen, sorting it can make the viewer expect that the most relevant indicator will be placed in the top-left corner of the screen, the second most-relevant will be placed after it, etc.

Sorting the KPI is a very easy thing. Every end user, even with no clue in MDX can do it by following this:

In the Define KPI wizard, go to the "Select Set" step. Copy the current set. For the example, let's say that the current set is [Products].Members and you want to sort it according to the Sales measure in descending order. Click on the advanced button on the right of the set (The button with the "..." on it) and enter the following MDX statement:

Order([Products].Members, [Measures].[Sales], DESC)

If you want to order in ascending order you can replace the DESC with ASC or not to mention it at all.

 |  | 
Thursday, November 22, 2007 6:24:44 AM (Jerusalem Standard Time, UTC+02:00)

If you see "No Data" after you entered a view with defined KPI, most chances that the reason has something to do with privileges, but today I've seen one more thing you can do that will make the KPI go crazy.

One of my users built a view and he removed all the measures but one. After that, when he defined the set of his KPI, he chose the set of the measures. That caused the "No Data" title when he wanted to see his KPI view.

 | 
Thursday, November 22, 2007 5:45:19 AM (Jerusalem Standard Time, UTC+02:00)

One of my users had a very weird problem today. When he entered a view with KPI gauges, he could see everything but the gauges. All was there: The titles, numbers, etc. but the gauges itself. The problem occurred in the Panorama Web Access site and also in the dashboard site. I checked with other users and they didn't have this problem (with the same views, of course).

After a few minutes I found the problem: The Explorer process in windows used too much memory and it caused visual problems in the browser. I ended the process, restarted it (Ctrl + Shift + Esc -> File -> New task -> explorer) and everything went back to normal.

 | 
Thursday, November 22, 2007 5:39:21 AM (Jerusalem Standard Time, UTC+02:00)

Important note: The user working with Panorama NovaView Desktop must have write privileges on the Panorama folder (The default is: C:\Program files\Panorama). The program saves its data there, so it will cause many troubles if it won't be able to save. For example, when you start the program and click on the globe (work on briefing book from the server) you'll have to enter the panorama's server name every time you start the program (if you don't have the mentioned privilege).

Don't worry: The user watching the views doesn't need to have any privilege on its computer. He only needs the right to see the view or the dashboard page.

 | 
Thursday, November 22, 2007 5:27:33 AM (Jerusalem Standard Time, UTC+02:00)

I had a little challenge back in work and I solved it (in a couple of hours). Here is the description of the problem and its solution:

Let's say that I have a Products dimension, time dimension and a fact table that describes all the faults which occurred in these products. The new requirement is that: Given a product-tree that describes all the parts of every product, I want to know how many faults occurred to every part. The problem is that the fact table points only to the products and the customer wants to know how many faults happened to the parts.

The two solutions I thought about are:

1. The trivial solution: Build a view above the fact table that takes every row and adds a row for each of its product's parts. That way, the fact will contain every fault that happened to every part. The problem with this solution is that the view is very long to compute.

2. The good solution: Build a parent-child dimension out of the parts table which will describe all the parts of every product. Notice that this dimension is not ragged, meaning that one member can have 2 children while another member in the same level can have 10 children. The next step is to add this MDX Script:

Calculate;


Scope([Measures].[Faults],
         Descendants([Products].[Products].[All],
                           1,
                           After));


   This = Ancestor([Products].[Products].CurrentMember,
                         [Products].[Products].[Level 02]);


End Scope;


Scope([Measures].[Faults],
         [Products].[Products].[Level 02].Members);


   This = ([Products].[Products].CurrentMember)
             -
             Sum([Products].[Products].CurrentMember.Children);


End Scope;

Explanation: The first level of the dimension is the [All] member. The second level is the products and the other levels contain the parts. The first block takes all the parts and inherits their amount of faults from their product ancestor. The second block solves the aggregation problem: Let's say that The bike product has 3 children. The bike had 4 faults in our slice of time, so according to the first block of the script every child has 4 faults. Now, the cube makes its aggregation and now the bikes has 16 faults - 4 of its own and more 4 for every of its parts. The second block decreases the sum of the product's children from the products and the result is that the products have their original number of faults.

 | 
Thursday, November 22, 2007 5:12:21 AM (Jerusalem Standard Time, UTC+02:00)
 Tuesday, November 06, 2007
This very useful option in Panorama can prevent much pain. Often, you don't want your CEO(s) to play with the views you created in Panorama. The beautiful dashboard page you created is what you want them to see and that's it.
In the dashboards site, select the component holding the desired view and in its Toolbar options, check the "Disable Analysis" checkbox. This will prevent the user from slicing and dicing with your view.

 | 
Wednesday, November 07, 2007 5:10:19 AM (Jerusalem Standard Time, UTC+02:00)
After our first MS-OLAP first installation, I started asking myself some questions: The way we moved our cubes to the production server is by the Visual Studio. We just deployed the cubes into the production server. The problem was that we forgot that we had some changes done in the XMLA code that lies behind one of our cubes. That caused some troubles in the installation and the result was that we did some changes in the XMLA script in the production server after the installation...
So, how we should make the installation? Should we export the whole database and import it in the production server, or should we generate XMLA scripts and run it in the production server? Is there really a difference between these choices?
If someone has an answer I would be very happy to read it. Thanks...

Wednesday, November 07, 2007 4:17:55 AM (Jerusalem Standard Time, UTC+02:00)
Yesterday, my team installed our first end-to-end BI project which includes many familiar technologies (for us, of course) such as Informatica, Oracle and Control-M along with new technologies (again - for us) such as Analysis Services 2005 and Panorama. This is the first project we have with SSAS 2005. Cheers for us.

We had some failures along the way, so we sat down today with our DBA team in order to investigate the good and bad things we had in the installation. The most important conclusion we got to is that the installation documentation just wasn't good enough. We did it quick and quite dirty because we thought that its purpose is to keep up with the QA checks. We forgot some important issues and wrote down the other things in a shoddily way. My conclusion is this: First, write an installation document and keep attached to it when you're doing the installation even if you sure that you know the drill 'cause you did it a hundred times. Second, write a good installation document for you, your mates, and the future co-workers to come. Third, write everything you did in the installation if it wasn't in the document.

Don't ignore the installation document. It can make the difference between success and failure in the installation.

Wednesday, November 07, 2007 4:09:55 AM (Jerusalem Standard Time, UTC+02:00)
It took me a while (more than 15 minutes!) to find something so elementary. The thing is how to call a sleep function in T-SQL in SS2005. The statement is: Waitfor Delay <DelayLength>, where <DelayLength> can be in the format: '00:00:10' for ten seconds. the <DelayLength> can be also a parameter of the type char(8).

Wednesday, November 07, 2007 3:40:13 AM (Jerusalem Standard Time, UTC+02:00)
 Monday, October 29, 2007

As you can see from the title, we use VSS with AS 2005. The reason for this is that we want to have backup and source control in our AS projects. There are some interesting points when working that way which I would like to share:

  • When you process a cube in a VSS-contolled AS project, the compiler writes the exact time and hour of the last process in the .database file in the solution. This file is a small and simple XML file. This piece of data is written in the <LastProcessed> element. That is the reason why VSS will ask to check-out this file every time you will process a cube. You can ignore it and cancel it twice and the cube will be processed anyway. Of course, the last processing time will not be correct.
  • The .dwproj file is the binder of the solution (as in regular code project). That means that Visual Studio will update this file every time you add, delete or rename an object to the solutions. When two (or more) developers doing these king of actions together it will cause a disaster. Only the objects of the developer who checked-in last will be saved in the project. What you need to do is manually edit this file and add the object's elements yourself. That's not very hard, but you'll need to concentrate. As always, I recommend using Notepad++.
  • When editing the .cube file yourself (when merging manually after two developers worked on the same cube), make sure that the dimensions ID's in the .cube file match the ID in the .dim files.
  • AS projects can be easily destroyed because of an error in a VSS-related decision. Think twice when clicking in its annoying dialog boxes. Make a label every day after building, deploying and processing the project.
Tuesday, October 30, 2007 4:03:45 AM (Jerusalem Standard Time, UTC+02:00)

We use VSS with Analysis Services 2005 in order to have source control and backup (I think this deserves a post of its own). We've been shocked to find out that our VSS didn't save our labes and our versions. The defect we experienced is that after clicking on the history button and then on the OK button in the "choose date range" dialog box, nothing happened. We were sure that no history was saved and that is the reason that nothing happened. After a long search in the web, I found this thread in the MSDN. The problem is a bug in VSS that has something to do with dates representation in Microsoft's code. The workaround is easy, but it seems to work only in Windows 2000: Open the VSS admin program, then Options -> Tools -> TimeZone, and set the time zone as none. That's it.

By the way, notice that my second post in my blog was about date formats and how tricky and dangerous they can be.

Tuesday, October 30, 2007 3:43:05 AM (Jerusalem Standard Time, UTC+02:00)

This is the second bug I found when working with AS and Oracle as my database (you can read about the first one here). Some background about our Datawarehouse architecture before I begin to complain about Microsoft:

We cannot afford risking that our users will experience faults or crashes while we refresh our DWH, process our cubes, etc. What we do to solve this problem is called a switch: We have two schemas in our Oracle DB. While our users watch the first schema, we update the other one. Only after we finish all our load process, we switch our user's tools to see the data from the second schema. In order to implement this architecture we use Oracle's synonyms. Let's say that the users watch the fact table "sales". We have a synonym which is called sales_syn. While it's pointing to the first schema (schema_a.sales_fact), we're loading into the second schema (schema_b.sales_fact). After that, we switch the synonym so it will point to the second schema (schema_b). The users always look on views that rely on synonyms. The views never change, only the synonyms do.

The problem starts when we make Named Queries in our Data Source Views in AS. Apparently, AS looks inside the view that we enterd into the DSV, finds the target of the synonym and saves it. Even after we make the switch (the synonym points on the other schema), the named query will be pointing on the first one. All our efforts to edit the named query have failed. When we open it again, we will always find that the first schema is "burned" there and cannot be changed.

The solution we chose is simply avoid using named queries. If we need a simple calculation we can add a Named Calculation in the Data Source View and if we need a complex view over our fact table, we write it ourself in the DB. This solutions breaks a little bit the main point of the data source view (designated place for all the logic of the DWH), but it is the best solution we could think of right now.

Dear Microsoft developers - it seems that you tried to be smart and look into our Oracle objects in order to enhance the multidimensional database's performance. Next time, please think twice before you do.

Tuesday, October 30, 2007 3:24:36 AM (Jerusalem Standard Time, UTC+02:00)

It seems that four days before I wrote my post about Panorama Hidden Settings, Panorama entered all the registry keys into their knowledge base. You can find them here. Strage: I seeked it for a long time and now I see that it was always there, under my nose.

 | 
Tuesday, October 30, 2007 2:58:13 AM (Jerusalem Standard Time, UTC+02:00)
 Monday, October 22, 2007
If you read my blog from my home page and not via RSS or RSS-based sites, you may see it in the right column of the web page. You can click there and I will have a call from you (if you have Skype installed). If you need any help or explanation about BI stuff, just click and ask. Oh, by the way, you need to have some cash in your Skype account in order to pay.
I'm not trying to be greedy. I'm just trying to earn a little bit of money from my knowledge. If you need a little assistance regarding to BI, SQL Server or Panorama - ask me. If the answer will be quick I will not charge you at all.

So, pick up the phone... ;-)

Monday, October 22, 2007 7:51:52 AM (Jerusalem Standard Time, UTC+02:00)
I believe that every BI developer seen this in many Data Warehouses: Boolean Dimensions. As you may guess, boolean dimension is a dimension with only two members and of course with no hierarchy. For example: cash/credit card in sales cube, exists/not exists in inventory cube, etc. If you haven't seen this phrase before - relax - I just invented it. :-)
Now, the question is what to do about these dimensions:
a. Include them in the ETL process or just leave it as is?
b. If you put it in the ETL - how would you implement it?

Here's what I did in my project. You may disagree with me and I would like to see other approaches too.
a. Yes, I included it for some reasons. As every Pragmatic Programmer knows, everything can be changed so do not assume anything as globally-static. This rule takes place in here: Boolean dimensions may grow and have more members. For example, in the sales cube I mentioned above, maybe there will be another way to pay such as exclusive card of the shop (There is a network here in Israel who has it). Even male/female boolean dimension may have an Unknown member. So never exclude these dimensions from your ETL process. Wait - one more thing. You may think: Why interrupt my ETL process with these silly dimensions? If they'll grow up I'll add them to the process. As an answer think about the timings: You can never know how much time the dimension's ETL will take (although it will be very small), so in order to stay away of surprises - include it in your ETL process. just for case.
b. I implemented it as two hard-coded expressions and sent them to union. The result of this union will enter directly to the target table. In Informatica, the mapplet can't start without source table so just put a dummy table with only one row and connect it to the expression items. Why only one row? If the table will contain more than two rows then the Informatica server will consider the process as failed one.

As I said, I'll be happy to read other approaches other than mine.

Monday, October 22, 2007 7:44:45 AM (Jerusalem Standard Time, UTC+02:00)
 Monday, October 15, 2007
I really think that the time dimension is the most complex dimension in 90% of the DWHs. The complexity is in two places: In the DWH design and also in Analysis Services (or any other BI tool).
First of all - why we didn't take the already-made Server Time Dimension which exists in SSAS 2005? For two reasons: The first is that the Project Real guys do not recommend using it (you can find their SSAS article here). The second is that we wanted to have some features that are not available in the server time dimension, such as Hebrew date. In a matter of fact, even if we didn't have such feature we still would build the time dimension ourselves because it's giving you much more control over the dimension. For example, you can always add some new attributes which Microsoft developers didn't think about.
I started myself to build the time dimension in excel. I figured out that this mission is little more complex that I thought it would. Most of the functions I wrote were simple, but there were some complicated ones. So here are some tips for you if you want to build your time dimension using Excel:
  • If you want to week number for every date, do not write the function yourself... Excel has function called weeknum. If you don't have it just add the function toolbox which has it (I can't recall its name right now. check in excel help).
  • If you want to have records for every level in your hierarchy (not only for days), put every level in its own excel file (not excel tab). It will help you later when you will transfer it to your DB.
  • Check yourself. Pick randomly some dates and check that all of its record has correct data.
After building the excel files I needed to transfer it to my Oracle server. I used SSIS because I didn't want to wait for my DBA to copy these files into the Informatica server (it can't use my the local files, it has to be in its server. SSIS can use local files). This also was a little tricky. First of all, close excel when running the SSIS packages, otherwise it will fail. Second, when moving the non-leaf levels, go into the columns section in the destination box and erase the irrelevant columns. It will reduce the chance for errors. Finally, click on the source box and click on "Show advanced editor". Enter the source's output columns options and define properly the columns' data types. This also will reduce the chance for errors.

I had a little bit of an argument with my DBA about how should the time dimension be. I think that the time dimension does not have to be processes at all. My time dimension is from 1960 until 2020, so no daily ETL is required. She says that all the logic has to be in Informatica so I need to develop a mapping for this dimension. I think that we both are right and that's because that in ideal world she is right. In every developers team, all the BL has to be in one place. But we don't have much time (the deadline is very close) so I won't spend the time building more mapping in Informatica when I have the time dimension already made in excel.

Maybe someday I will have the time to do this. Maybe not.

Monday, October 15, 2007 8:18:38 AM (Jerusalem Standard Time, UTC+02:00)
 Sunday, October 14, 2007
I guess that this will not be my last post on this subject, but I want to start sharing some thoughts and tips from my experience when designing and building DWH. In this post I will focus on the fact & dimensions tables relationship in terms of data completeness (if you wonder what it is, read on).

Before you start to design the DWH, sit and talk with the people who built the systems which you take your data from, including the DBA. For every table, ask them what is the primary key (it's NOT always defined properly in the DB), then ask them again and then ask them if they are sure. It happened to me that I discovered that the systems guys were wrong about their DB's primary keys.
The same thing is about Foreign keys and here you should be even more careful. Even if they claim so, check yourself that every foreign key in the fact table is placed correctly in the dimension table, especially when the fact table has far history records. Sometimes system developers or even worse - system DBAs delete records from the dimension tables that are not relevant. This will cause that these keys will still be in the fact's history records but will not be found in the dimension table, causing uncomplete relationship between the fact and the dimension table.

So far is about the part when you talk and "investigate" the system developers (the DWH design). What to do when you actually developing the DWH? First, develop the tables of your dimensions tables. Do not forget to add the primary keys in the dimensions tables and the primary and foreign keys in the fact table. Then develop the ETL processes and go for the dimensions first.  If you know that the dimension has completeness problems with the fact table that you will develop later (you talked with the system developers, remember?), add UNDEFINED (UD key) record for the dimension table. Later, when developing the fact table's ETL process, make Join with the dimension table and check that the records' foreign key exists there. If not - change the key to UD. In SSIS and Informatica (and I guess that also in other products I don't know, such as DataStage) you can use Lookup instead of Joiner if the dimension table is less that 1G records. That will optimize the ETL process. After you developed all your ETLs, run the dimension processes. After they finish (assuming everything went OK) run the fact table's ETL process. If it succeeded you can go and have a drink. If not - check what went wrong. If you want to know which keys didn't showed up in the dimension table and causes the incompleteness problem, you can disable (not delete) the foreign key from the fact table and run the process again. Then, with a simple SQL query, check which foreign keys don't exist in the dimension table. Go back to your ETL design and check what you did wrong. As I pointed before, in this step you might be very angry at the system developers...

That is all for now. As I said, I assume that more ideas will come on in the future.

Sunday, October 14, 2007 7:11:49 AM (Jerusalem Standard Time, UTC+02:00)
 Sunday, September 30, 2007
This post is about Panorama because it is the UI tool I'm working with, but this can be made with every BI UI tool.

My customer wanted to get the effect shown by Analysis Services 2005 when browsing a dimension (see the picture below). He wanted to see some properties of the members shown in the rows, along with the usual measures. Unfortunately, Panorama (and I'm sure that also other tools) does not have this option in the GIU. The solution is this code:

Create Member CurrentCube.[Measures].[MyProperty] as
  iif(IsLeaf([MyDimension].[MyHierarchy].CurrentMember),
     [MyDimension].[MyHierarchy].CurrentMember.Properties("MyProperty"),
     Null)

Note that declaring only the third row will cause that every member that is not a leaf will cause an error, which is something we don't want the viewer to see. If the dimension has properties for members in other levels too, you can adjust this decleration. This member can be declared either in the DataBase's Script (after the CALCULATE expression) or inside the session/query (not recommended in Panorama). Now, all you have to do is to show the dimension's members in the rows and this new measure in the columns (after or before the regular measures), and you'll get what you want.

Monday, October 01, 2007 3:47:12 AM (Jerusalem Standard Time, UTC+02:00)
 Monday, September 24, 2007
My friend, Ilya, had a problem in SSIS. He had a .csv file with too many commas. The meaning is that strings that started and ended with inverted commas (") and had commas inside it were recognized by SSIS as new column. For example, the row:
"My name, is Miky", 200, 10 was recognised by SSIS as four columns instead of three. Ilya wrote down a code for SSIS (in VB) that run before the package begin its work. Here it is, hope it will help who ever seen this.

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic.FileIO
Public Class ScriptMain
Public Sub Main()
  Dim csvFileFullPath As String
  Dim tabFileFullPath As String
  csvFileFullPath = Dts.Connections("Your CSV Connnection").ConnectionString
  tabFileFullPath = Dts.Connections("Your Table Connection").ConnectionString
  Using tabStreamWriter As New StreamWriter(tabFileFullPath, False, System.Text.Encoding.GetEncoding(1255))
  Using csvFileReader As New StreamReader(csvFileFullPath, System.Text.Encoding.GetEncoding(1255),True)
  Dim currentRow As String

  currentRow = csvFileReader.ReadLine()
  tabStreamWriter.WriteLine(currentRow)
  While Not csvFileReader.EndOfStream
    Dim outputRow As New Text.StringBuilder()
    Dim tmp, tmp1 as String
    Dim offset as Int32 = 1
    Dim beginS, endS As Int32

    beginS = 1
    currentRow = csvFileReader.ReadLine()
    beginS = InStr(offset, currentRow, """")
    While Not beginS = 0 Or offset > Len(currentRow)
      endS = InStr(beginS+1, currentRow, """")
      tmp = Mid(currentRow, beginS, endS - beginS)
      tmp1 = Replace(tmp, ",", " ")
      currentRow = Replace(currentRow, tmp, tmp1)
      offset = endS + 1
      beginS = InStr(offset, currentRow, """"")
    End While
    outputRow.Append(currentRow)
    tabStreamWriter.WriteLine(outputRow.ToString())
  End While
End Using
End Using
Dts.TaskResult = Dts.Result.Success
End Sub
End Class

The solution here is to search for any comma (,) that is between two inverted commas (") and replace it by space.
Although it is a good solution, I would take another solution: Replace any comma by special string, such as &Miky&, convert the csv file into table, and after that go over that column(s) and replace any &Miky& by comma.

Monday, September 24, 2007 7:31:56 PM (Jerusalem Daylight Time, UTC+03:00)
 Sunday, September 09, 2007
I was asked how to get to SSIS log to see how much time took for the package to run.
Well, that depends.
On Development:
When developing new package, after running the process (click on the green arrow or press F5) there's a new tab called Progress. Clicking it will show you everything about the package's execution, including the time it started and the time it finished.

On Production:
When developing the package, open the SSIS menu (Yes, there is a menu called as the product's name. Microsoft...) and click on Logging... There, you can define logs for your package. You can log in many ways: Writing to SQL sever, output file, XML file and more. I recommend logging into SQL server and logging only the big and "hard" parts in your data flow. In the Details tab, pick up only the exceptional events, such as onError, onTaskFailed and onWarning. If you wish to know how much time took for you package to run, also pick up onProgress.

Follow this link to read about every event in SSIS.

Monday, September 10, 2007 6:09:23 AM (Jerusalem Daylight Time, UTC+03:00)
I won't cover here the topic of Exception handling in MDX, but show you a funny thing that I have never seen in any computer language. Consider this MDX code:

iif (1.0e+40 * 1.0e+40 = (1/0), "Overflowed", "Didn't Overflow")*

On some processors, this code will output "Overflowed". That's because this multiplication will overflow and (1/0) also overflows, so what we have here is two "overflow values" that are equal.

Where on earth have you seen something like this???


* Taken from the book "MDX Solutions" second edition, p. 136


 | 
Monday, September 10, 2007 5:54:32 AM (Jerusalem Daylight Time, UTC+03:00)
I'm almost done with my exams, so my writing can continue.

This post is not about how to customize your Dashboard (well, not only about it). Its purpose is to say it loud: Customize your Dashboard!
When the executives of your company (Yeah, I guess you work in a company. Does someone building Bi Portal for himself?) see the customized gauges with their company logo on it, they'll love it. No matter what these gauges will show them, you got their attention and their sympathy for the Dashboards site you made. Now, everything is easier. The bosses are in your hands.

For the Panorama NovaView users:
  1. Follow this link to learn how to do this.
  2. Do NOT start working before you backup your E-BI/KPI folder !
  3. I recommend using Notepad++ or another good XML editor when writing in the XML files. Otherwise, you can mix the whole file and you'll have to start all over again.

 | 
Monday, September 10, 2007 5:30:52 AM (Jerusalem Daylight Time, UTC+03:00)
 Saturday, August 11, 2007
We've been working for a while to enable SSO in our Panorama's Dashboard site. In a matter of fact, the responsibility for this was under the skilled hands of our system team. After a short time they succeeded and SSO was established in our site. We saw it when we entered the site: Instead of login page we directly entered the dashboard page.
After a few days, when I entered into the settings section of the dashboard site, I saw this:



Yes, that's right. No security at all. This is why we entered directly to the dashboard page instead of the login page...
The system team claims that they never said that the SSO succeeded and we say they did. No one will prove he's right, so there's no one to blame. But blaming is not everything. The important thing here is to learn for the next time: When you think you got a feature - check it. Things not always as they seems to be.
Sunday, August 12, 2007 5:52:53 AM (Jerusalem Daylight Time, UTC+03:00)
 Wednesday, August 01, 2007
While reading the first chapter of the book "MDX Solutions With MS SQL Server Analysis Services 2005 And Hyperion Essbase", I wrote down some important notes, especially for the MDX beginners. Even if you're experienced user, check this out. You may find something useful.

  • If you were a code programmer in your past, you can relax: MDX don't care about capitalization.
  • Don't even try to skip an axis: It's impossible and it is meaningless. Use the predefined names for the axis, such as: columns, rows, pages, etc.
  • You're new to MDX and the whole OLAP gives you a headache? Try to imagine this as a hypercube. It can help you a lot.
  • When writing large queries, pay attention to the "readability" of your MDX. Use the Monospace fonts whenever possible.
  • Do NOT think of SQL when learning or working with MDX. Although the syntaxes may look alike, these languages are totally different when you get to know them.
  • .Members will give you all regular members. .AllMembers will also include calculated members.
  • An expression like [Time].Members won't work if the Time dimension has multiple hierarchies.
  • The asterisk (*) can replace the CrossJoin function. It may improve readability of the code.
  • When using Order() function, you can specify a sorting criteria which is not shown in the result grid.

 |  | 
Thursday, August 02, 2007 3:04:47 AM (Jerusalem Daylight Time, UTC+03:00)
 Thursday, July 19, 2007
For some reason (and don't ask me why), the Panorama NovaView's documentation doesn't contain any information about some of the most important settings. Here some of them, hoping that this will help many users:

  • In the Panorama Web Access web site, by default a user can  save his views (after he made his modifications) only in his private book. To enable him save his views in the Briefing Book (which means - the public book), do the following: Inside the Panorama server, open the Registry Editor (Start -> Run -> regedit) and go to the path HKEY_LOCAL_MACHINE\Software\Panorama\Nova View 5\Admin. Add a new string value named "PublicBookAdmin". As its value, enter all the users you want to give them the option to save their views in the Briefing Book by this template: <User1Domain>\<User1Name>,<User2Domain>\<User2Name>, etc. For example: panoramaDevServer\PowerUser, MSHOME\Miky.
  • The subscriptions web part will show you only the views that you registered to, but by default no one can register himself to the views. What you need to do is to open the Registry Editor in the panorama server, go to the same path as mentioned above and add a string value named "ShowSubsAndAlerts" with the value 1. After that, every user will be able to right click on any view in the Panorama Web Access web site, click on Register and it will be added to the subscriptions web part for him.
  • For some users, the loading animations which are shown before every applet appears in the Dashboard website will stay forever, meaning that the user will never see the dashboard itself. I think this has something to do with the Java or Microsoft VM of the user. Anyway, a nice workaround is to cancel this animation. To do this, enter the path C:\Program Files\Panorama\E-BI\Dashboard\include (replace the beginning if you installed to panorama software in a different location) and inside the Config.asp file, change the constant "ShowAnimationWhileLoadingApplets" value to false. This is a good workaroung because anyway, the applets should appear in a second or two. Otherwise - buy a faster server.
As I go on working with Panorama I'll write some more tips & tricks. Stay Tuned.

 | 
Friday, July 20, 2007 2:20:52 AM (Jerusalem Daylight Time, UTC+03:00)
 Sunday, July 15, 2007
When designing a dimension in Analysis Services, there's a funny button called Add Business Intelligence. Clicking it opens a beautiful wizard which letting you define some basic properties of the dimension such as ordering and enabling writeback. I'll take the writeback feature as an example: When doing this using the wizard, it's taking you through many screen where all you have to do except for clicking next, next is to check a checkbox in one of the screens. That's it. After that, all this long wizard doing it setting a property called WriteEnabled to True. I think that it's a strange software design of Microsoft. Maybe it's for making the product seem more professional. You know - Add Business Intelligence sounds like a heavy operation. Anyway, I don't know what they had in mind.

     
Sunday, July 15, 2007 7:48:56 AM (Jerusalem Daylight Time, UTC+03:00)
I never thought that I'll do a commercial to Microsoft, but Project Real is a great thing that they did and they should get the credit for it. This project is a full end-to-end BI solution, including ETLs (using SSIS), Analysis Services cubes and mining modules, Reporting Services reports, end-user Panorama views and more.

We work with Panorama as our main GUI tool to show our users the cube's data as tables, charts, dashboards, etc, so this project is really helping us to learn how to implement our project from the first ETL step all the way to the last Panorama step.

Recommended.

Sunday, July 15, 2007 7:05:54 AM (Jerusalem Daylight Time, UTC+03:00)
 Saturday, June 02, 2007
knowing the rules doesn't mean you know how to play.
A great post in the Panorama blog with a riddle in MDX. No knowledge in MDX is required for this riddle, because they teach you what you need to know to solve this riddle.
Have fun. Believe me - you will.

Update: Look at Mosha Pasumansky's blog for another review at this MDX riddle. He claims that the answer in Panorama's Blog is not complete. While I'm new to MDX, I understand that their answer is not 100% complete, but I think it's enought in order to make their point.

 | 
Sunday, June 03, 2007 5:43:35 AM (Jerusalem Daylight Time, UTC+03:00)
 Thursday, May 31, 2007

In the data source view, when you edit the SQL by yourself (right click on the table in the data source view, Replace table -> With new named query) be careful when using Oracle DB tables. When writing in the SQL the Oracle's table name with small casing, the SQL parser will add commas to the table name, making the SQL not work because the Oracle does not recognize this table name (with the commas).

Solution: Enter the table name with big casing, which will make the parser leave the table name as is. Also, remember: when creating or editing a named query always check and syntax, but also run the query and check that you get the desirable result before hitting the OK and saving the new named query.

by the way, I wonder: what were we doing if the Oracle was case sensitive?

Friday, June 01, 2007 6:36:37 AM (Jerusalem Daylight Time, UTC+03:00)

Few months ago we were given an assignment to copy/move all our DTSs that were running with SQL Server 2000 to the new SQL Server 2005 Integration Services (SSIS). My friend Michael did it and wrote some important notes that he discovered when building ETLs with Integration Services. I decided to list them here because they are important and useful, especially for those who haven't got the time to develop with SSIS so far.

  • One of the greatest improvements in SSIS is that between the source and the destination of the ETL process you can do many things, such as making new fields, sorting, converting data types, union all between different sources, implement your logic on a field, and much more. This is much easier than ever because all you need to do is to add a block to the data flow task and define it for your purposes.
  • SSIS ships with a tool for migrating SQL Server 2000 DTSs. Do not use this tool. Sometimes the result of the conversion is not good enough and in all cases you can't edit the new migrated data task.
  • When making a connection to a non-Microsoft DB, such as Oracle, use OLE DB client instead of the out-of-fashion ODBC.
  • When the destination field is shorter (string type) than the source, add a data conversion block and cut the string. Otherwise, there will be an annoying warning even if the truncation is wanted.
  • In many times (when working with non-Microsoft providers) the automatic recognition of the length & types of the source fields is not correct. Enter the source block and edit these properties by yourself.
  • When moving a Unicode field (data type DT_WSTR) to a non-Unicode destination field (DT_STR), a data conversion block is required.
  • SQL Server 2000 Stored Procedures will work in SSIS, but the Linked Servers definitions are problematic. Consider another options rather than using linked servers.
  • When the source/destination is a CSV file, use Flat File Connection. But if it is an Excel file (.xls), use a Microsoft Jet OLE DB connection and define the source as OLE DB Source (yes, it will work with Excel files).

Again, thanks for Michael for making and sharing these notes.

Friday, June 01, 2007 6:17:10 AM (Jerusalem Daylight Time, UTC+03:00)

For some weeks we were fighting with (or against?) the Panorama software in order to make it work right and show a nice pilot of BI dashboard screen to our managers. After three weeks we managed to show a good opening position by building a nice dashboard screen including a map, graphs, gauges and crosstabs.

I found that the installation of the SQL Server and the Panorama server was not good enough in my company, so I decided to try it for myself. The installation of the SQL Server is quite easy (Next, Next, Next ...), but installing the Panorama server is a complicated process. Paying attention to so many small details, knowing and remembering what to do inside the Windows server, IIS, Windows services, Windows registry and more is not so easy. Finally, after two days I managed to do this. Now I know that some things in the installation in my work place were not so good and I can point them out.

You can see the exciting (for me, at least) result in this picture, as it is a nice dashboard taken from my screen. Many posts about Panorama and SQL Server 2005 will come ahead. Now I can relax - Panorama is on the way...

Friday, June 01, 2007 5:09:10 AM (Jerusalem Daylight Time, UTC+03:00)
 Thursday, May 10, 2007

I decided that I should blog about the world of BI.

My company has just bought Panorama's NovaView, which is a BI tool that focus on the UI level. this tool shows beautiful dashboards, including metrics, charts, cubes and more. I think that it worth looking - it's just beautiful.

So, as a beginning: What is BI? the google definition ("define:BI") says: "Technologies that help companies make better business decisions". I think it summarize it up quite well, but I can add one more thing: Technologies that make/help managers and decision-makers to see, or to understand much better what they have in their hands, what have they done so far and what can they do with their resources in the future.

What I'm learning right now is the Panorama tools, advanced MDX and some other stuff.

Wish me good luck.

Thursday, May 10, 2007 7:37:16 AM (Jerusalem Daylight Time, UTC+03:00)