Thursday, March 04, 2010

Our users had a weird requirement: They wanted to search all the columns in one action. We thought how to do it. In this post I'll describe the solution. Note that this solution is not good in terms of performance, but this is the best we've got.

  1. In every report (/answer/request) you develop, add a column and call it SearchColumn. Define its formula as the concatenation of all the columns in the report. You also need to cast the numeric columns to char in order to concatenate them. For example: FirstColumn || cast(NumericColumn as char) || LastColumn. To make things easier, I written a little C# program that takes the request XML and outputs this column's formula. It's a very simple program so I won't put it here (unless someone will ask for it). You can write this simple program in every language you know.
  2. Hide the column (Column Properties -> Column Format -> Hide).
  3. Create a filter based on this column. In the operator drop-down-list, choose "contains any". Then, click on Add -> Variable -> Presentation. Enter Term in the "Variable Expr" box (Term will be our variable name) and enter space in the (default) box.
  4. Now, we'll let the user to enter the search term. Create a new Dashboard Prompt based on any column you'll like. In the operator, choose "is equal to / is in". Control - "Edit Box". Show Default To - "Variable Expression" and enter Term. Set Variable - "Presentation Variable" and enter Term here too. Label - "Search", for example.
  5. All you have to do is to add the prompt in every page you want to let the user search.

Note that the search term will stay until the user will clear the prompt field. As I said, this solution is problematic for number of reason (performance, "why the other page filters too?" and maintenance).

 |  | 
Thursday, March 04, 2010 10:29:14 PM (Jerusalem Standard Time, UTC+02:00)
 Wednesday, March 03, 2010

This is quite simple, so I'll let the JavaScript talk by itself (attached). I assume that you know how to add your JS to OBIEE. You need to call the function hideTabs from every dashboard page and edit the hiddenTabs array.

This was written on version 10.1.3.4 of OBIEE. Note that if the DOM will be changed in a future version it won't work.

hideDashboardPages.txt (2.03 KB)
 |  | 
Thursday, March 04, 2010 3:51:32 AM (Jerusalem Standard Time, UTC+02:00)

Non Empty Behavior is a very good optimization method you can use with your calculated measures. It allows you to define a list of measures that their emptiness defines the emptiness of your calculated measure. To understand this, look at the picture (taken from the MSDN):

This is the calculation expressions pane from the Calculations tab in the cube designer. From MSDN:

"The measures you specify in the Non-empty behavior list are used to resolve NON EMPTY queries in MDX. When you specify one or more measures in the Non-empty behavior list, Analysis Services treats the calculated member as empty if all the specified measures are empty. If the Non-empty behavior property is blank, Analysis Services must evaluate the calculated member itself to determine whether the member is empty."

This is not new. NEB is quite old feature in SSAS. The new thing (at least for me) is that NEB can also be used in Panorama formulas! For example, M1+M2 /*NONEMPTY_BEHAVIOR={M1,M2}*/ will apply M1 & M2 as NEB. You can see more Panorama optimization methods here.

 |  | 
Thursday, March 04, 2010 2:01:11 AM (Jerusalem Standard Time, UTC+02:00)
One of the basic requirements of a modern BI system or Data Warehouse is to be non-volatile, meaning that the DWH will save the history. In this post I'll explain how to implement it in the DWH while assuming that the history can be changed and (the more interesting part) how to implement it in Analysis Services.

Before starting with history saving in the DWH, I need to say that this has nothing to do with Slowly Changing Dimensions. You can work with Type 1, 2 or 3. All you need to take care of is that the fact table will be consistent with the dimension tables. Assuming that the history can change (this is the scenario in my organization), we need to extract all the fact data every time. If you work with staging level/environment/layer you can use it to calculate which records has been changed or inserted and only upsert them to the DWH itself (it's more secure). In our case, we'll work without such staging level so we need to load all the fact data in every load to the DWH.

Loading the fact in the ETL is very simple - we'll save the datetime of the execution start, round it and put it in every row in the fact table. We can call this field CreateTime. You probably ask yourself why we need to round it. The answer is that we don't want the time dimension to have every possible value of datetime. It's huge and the users don't need that kind of accuracy. That's why I round it in my projects to half-hour, meaning that 14:27 becomes 14:00 and 14:50 becomes 14:30. Let's look at an example of such fact table:

RowNum

Dimension1

Dimension2

Measure1

Measure2

CreateTime

1

X

A

4

20

03/03/2010 15:00

2

X

B

56

19

03/03/2010 15:00

3

Y

B

2

3

03/03/2010 15:00

4

Y

B

1

10

03/03/2010 15:00

5

Y

C

7

5

03/03/2010 15:00

6

X

A

4

20

03/03/2010 15:30

7

X

B

56

40

03/03/2010 15:30

8

Y

B

2

3

03/03/2010 15:30

9

Y

B

10

10

03/03/2010 15:30

10

Y

C

7

5

03/03/2010 15:30

Here we can see two executions of the ETL: The first between 15:00 and 15:00, the second between 15:31 and 15:59. Note the differences in rows 2 vs. 7 and 4 vs. 9. This means that the source table(s) has changed between the executions.

In the DWH, we create foreign keys from Dimension1 and Dimension2 to their dimension tables. We'll also create FK between CreateTime and the time dimension. I won't get into the time dimension design here, you can read about it a past post. I assume that the time dimension has the hierarchy Year -> Month -> Day -> Hour -> Half-Hour. It seems that now we have a good fact table for use in Reporting Services (or any other reporting tool). Create a parameter called Extraction Time/ETL Time and assign it values from distinct on CreateTime. The user can pick a date and the report will filter on this date. Perfect.

The problematic part starts in Analysis Services. Only one extraction time can be the current member. There is no sense in summing or aggregating more than one extraction time. There are two ways to deal with it:

The wrong way - I'm writing this down so you'll learn from this design mistake. We created a calculated member which takes the last extraction time:

Create Member CurrentCube.[Extraction Time]. [Calendar].[All].[Last Data Time]
AS
Tail ([Extraction Time].[Calendar].[HALF HOUR].Members,1).Item(0),
Visible = 1;

In every Panorama view, we sliced the Extraction time on the [Last Data Time] member, which will show the user the data of the last ETL run. Our basic assumption was that the user wants to see the current data and when he'll want to see the historical data he will pick another member from the Extraction time dimension. This is the point we missed and it soon  enough bumped into our face.

We forgot that every system, even BI system which is built for analysts should be intuitive and idiot-proof. The users picked members from the day level in the Extraction time dimension. When we asked them why they did it they said that "we wanted to see the data from 03/03/2010 and suddenly the numbers got bigger!". That happened, of course, because the member 03/03/2010 is the sum of 03/03/2010 15:00 and 03/03/2010 15:30. The solution is that no matter which member the user picks in the Extraction time dimension, he should see only one leaf (half-hour level) member at a time. This takes us to

The right way - We set the [Last Data Time] member to be visible = 0 (I kept it for debug purposes) and I written MDX script which I call NonLastEmptyLeaf. This script makes sure that no matter which member in the Extraction time hierarchy will be picked, the user will be sliced only on the last non empty leaf. For example, if the user picks 03/03/2010 he will see the data of 03/03/2010 15:30. This is what will happen also when he'll slice on 03/2010, 2010 and even the [All] member. This is the MDX script:

Freeze([Extraction time].[Calendar].[All]);
Scope(
  Descendants (
   [Extraction time].[Calendar]. [All],
   0,
   SELF_AND_AFTER
  )
);
  this = Tail (
   Filter(
    Descendants(
     [Extraction time]. [Calendar].CurrentMember,
     [Extraction time].[Calendar].[HALF HOUR],
     LEAVES
    ),
    [Measures].CurrentMember > 0
   )
  ).Item(0);
End Scope;

Now, there is no meaning of [Last Data Time] member because slicing on the [All] member will show the current data, meaning that no slicing at all will show the current data because [All] is the default member.

To conclude, history browsing can be dangerous if we let the users the option to slice on more than one data time. The open question that I have is how can I prevent the user from picking more than one member in this hierarchy in Panorama views? Should I even try to do that or should I count on his minimal common sense?

Wednesday, March 03, 2010 9:34:48 PM (Jerusalem Standard Time, UTC+02:00)
 Sunday, February 14, 2010

I knew for a long time that measures can have text and not only numbers, but last week I had a scenario where it was the perfect solution for my problem.

Let's say that I work in number of jobs. In every job I got some tasks that I need to do. The fact table looks like this:

Job

Task

Tasks To Do

Tasks Completed

Developer

Develop BI System

1

1

Developer

Help Friends

1

0

Manager

Manage

1

1

Therapist

 

0

0

Summarizing the "Tasks To Do" column will get the total tasks that I need to do. Summarizing the "Tasks Completed" column will get the tasks that I did. Note that in the Therapist job I didn't get any tasks.
In the OLAP cube, I created a calculated measure which is the percent of the completed tasks (for every job or for any other dimension). The problem is: What is the answer in case that there are no any tasks to do? Some will say 0% and some will say 100%. That's why a text measure such as "There are no tasks" can be a perfect solution for this kind of problem. This is the calculation of the calculated measure:


Create Member CurrentCube.[Measures].[Completed Percentage] as
IIF(IsEmpty([Measures].[Tasks To Do]),
  null,
  IIF([Measures].[Tasks To Do] = 0),
    'There are no tasks',
    [Measures].[Completed Tasks] / [Measures].[Tasks To Do]
  )
)
This scenario can happen in many cubes. Tell me if you have any other solution for this kind of problem.

Update: Read the important comments below.

Sunday, February 14, 2010 9:49:00 PM (Jerusalem Standard Time, UTC+02:00)
 Tuesday, January 19, 2010
This is a basic post which I'll assume you know in the posts to come.
Basically, There are two ways of adding JS into the OBIEE dashboard:

If the script is for one page only - Enter the dashboard editor and edit your page. Add a new Text object from the bar on the left. It doesn't matter where you add it in the page. Click on its Properties button and write your script in the text box. Don't forget to start it with <script> and end it with </script>. Check the "Contains HTML Markup" box. There is no need to hide this object because it doesn't contain anything visual. Rename the text object and give it a descriptive name.

If the script is common to more than one page or may be in the future - Create a new request which contains only one column. It can be any column from any table, but I recommend you to use a column which doesn't have many rows (you can use a small dimension if you have one). Click on the Results tab and enter the Narrative view. Write your script in the Narrative text box. Like before, don't forget to start it with <script> and end it with </script>. Check the "Contains HTML Markup" box and enter zero in the "Rows to display" text box. You can save your request now. Every dashboard page that contains this request will run the JS you just written.

How to develop correctly with JS in OBIEE
Don't enter bunch of code in the request. It'll only get you messy once the length of the code will grow. Your code inside the dashboard should only call functions. For example, it can look like this:
<script>
if (myFunction)
{
  myFunction();
}
if (mySecondFunction)
{
  mySecondFunction();
}
</script>

You will write and maintain these functions in external JS file. Take the common.js file (search it under the OBIEE dir in the server) which is called in every page in OBIEE dashboards (don't forget to backup first!!) and add your code in the bottom of the file. Now you have a convenient place where you can edit your JS code. There is one last thing you need to know before proceeding - the browser caches the JS files so you'll need to clear the browser's cache every time you change your code. In Internet Explorer, you can do it in Tools -> Internet Options -> Delete Files -> Delete all offline content. You'll do it a lot while developing, so having a button that does that will be great. You can use Microsoft's IE developer toolbar or another developer toolbar.

Now we can apply new features to OBIEE.

 |  | 
Tuesday, January 19, 2010 11:46:07 PM (Jerusalem Standard Time, UTC+02:00)

This tip can save you a lot of time because when you don't follow it or you're not aware of it, finding the bug will take you a lot of time.

When you build your workflow or worklet in Informatica, you usually define dependencies between the tasks, meaning that task A will start only after task B finishes. You do it using the "Link Tasks" button in the toolbar. The issue is when you define for example that task C will run only after both task A and task B will finish. In that case, linking the tasks is not enough. You need to edit task C and choose "AND" in the "Treat the input links as" box. The default is "OR", meaning that the task will run when either task A or task B finishes. This can cause you a lot of trouble if task C is using data from task A and B (otherwise, why there's dependency between them?).

Tuesday, January 19, 2010 11:40:13 PM (Jerusalem Standard Time, UTC+02:00)
 Wednesday, January 06, 2010
While developing reports in OBIEE, I found myself making the same action over and over again: Styling a column, clicking on "Copy Cell Format", going through all other columns and pasting the format. A better way to do this kind of actions in one shot is by changing the Request XML. Let's say you applied style to the first column and you want to copy it to the other columns. This is how you'll do this:
  1. In the report editor, click on the Advanced tab.
  2. Copy the Request XML.
  3. You'll need to edit the xml in a text editor which supports macros. My favorite is Notepad++. Paste the xml in the text editor.
  4. Now you'll have to learn by yourself how the XML is built. Don't worry, it's very simple. Note that each column is a xml element called saw:column and the style is stored in its child element called saw:displayFormat.
  5. Start recording the macro and copy the desired elements to the next column. After that, stop the recording. Pay attention to where the keyboard cursor is before and after the recording.
  6. Place the keyboard cursor in the desired place and run the macro several times as much as the columns you need to update. The shortcut to do it in Notepad++ is Ctrl+Shift+P.
  7. Paste the xml back in the report: Go back to the page where you copied the xml from, paste the new xml and click on "Set XML". That's it.
Learning the schema of the Request XML, you can do many other customizations to your report in one click. You'll see another example in my future post about searching all columns in one click. Thinking about this approach, you may find that it can help you to make your work shorter and cleaner.
 |  | 
Wednesday, January 06, 2010 6:33:30 PM (Jerusalem Standard Time, UTC+02:00)
 Monday, January 04, 2010

One of the (many) missing features in OBIEE is record/row count. In this post I'll show how to implement record count and show it below the table, or report in the OBIEE dashboard. I know that you can find this in other blogs and forums, but I intend to show more sophisticated methods in my next posts that will be heavily based on this, so I recommend you to read it first.

  1. Create the record count column in your report: click on any column in the sidebar to add a new column in your report. Edit its formula using the fx button. In the formula text area, enter Max(RCount(XXX || YYY)), where XXX and YYY are two fields which define the key, or lowest granuality of the report. In other words: The combination of XXX and YYY will not show more than once in the report. You can add more columns if you need (column adding is made using the Column button in the "Edit Column Formula" window). If the type of the column is not char, wrap the column with cast(XXX as char).
  2. Check yourself: Preview the report. The last column should contain the record count. The value is the same in every row. If it's not working, go back and fix the formula.
  3. Hide the formula: Click the "Column Properties" button in the column (the button with the hand), go to the "Column Format" tab and check the Hide checkbox. Now, check and remember the location of the column in the report. In other words, remember if it's the fifth column or the tenth column, for example.
  4. Create the narrative: Click the Results tab and in the drop-down list choose Narrative. In the prefix, enter: "Row Count:" or something like that, and in the narrative itself, enter: <div id="RecordCount">@XX</div>, where XX is the location of the new column we created. In rows to display, enter 1. Return to the compound layout and look how the narrative below the table looks like. You can return to the narrative and change its style if you don't like it. Just leave the RecordCount div as it is. We'll use it a lot in the upcoming posts.

Problems with this implementation:

  1. The count is made in the BI Server after all records are coming from the database. If you'll check query time with and without the record count, you'll see a big difference.
  2. The record count is getting scrambled if the user sorts the table.

I'll post here any updates, if there will be any, about these problems.

 |  | 
Monday, January 04, 2010 7:07:41 PM (Jerusalem Standard Time, UTC+02:00)

For documentation purposes: When you get "Unexpected error occured: Error in Application" error in SSAS while trying to view the cube's calculations tab, you'll need to close BIDS and do some file copy tasks.

Monday, January 04, 2010 6:22:08 PM (Jerusalem Standard Time, UTC+02:00)
 Sunday, January 03, 2010

It's been a while since I last posted here. It's because it was the end of the year with annoying deadline and a new reporting tool we needed to learn - OBIEE. We learned a lot about it and even extended it, so the next posts will be about that. I'll start with a collection of tips & tricks we collected over the development process. Some tips will be helpful only to newbies, where the others will be also relevant to experienced developers. I divided it to two sections: The administration program and the webby part. Have fun.

The Administration program

  • There is no way to auto arrange to ERDs of the physical and the Business model (aka logical) layers. The best approach to make things more simple is to finish the tables import in the physical layer and arrange it by yourself: choose to show all tables, zoom to 50% and move the tables so it will create a nice and clear star schema. after that, just print it out and sent it to your co-developers.
  • Naming standards are, as always, very helpful, especially when working as a team. A foreign-key column (non measure) name should end the same for all columns in the project. Measure field name should end with _sum, _count, etc.
  • In order to develop in the administration program, the developer should be defined as administrator...
  • The "check out objects" dialog box will get annoying after few hours of development. To get rid of it, go to Tools -> Options -> General, and select "check out objects automatically". Pay attention to what you do from now on, because even viewing a table's properties will check it out.
  • You can always convert a table to a view in the physical layer. Right click on it, properties, and choose "Select" in the table type. Then, write the SQL query.
  • When working online against the repository and making an action against the DB (such as row count or view data), the connection to the DB is made using the BI server's data source. Only when working in offline mode the DB connection is made using the local ODBC.
  • When dragging a physical table into the logical layer it won't save the tables relationships. Only when dragging a number of tables into the logical layer the relationships will also be imported.
  • Circular relationships between the tables are not permitted in the logical layer but permitted in the physical layer.
  • Always use complex join in the logical layer. The BI Server supposes to understand in which physical relationship to use. Change it only when it doesn't.
  • When defining one-to-many join in the ERD you drag the line from the single to the many table.
  • In a dimension with a hierarchy (not flat one), the lowest level should contain the table's key column. Every column which won't be associated to a level will be automatically associated to the lowest level.
  • In 90% of the cases, all the measures will have sum as aggregation rule. You can define it in a single shot by selecting all the measure columns, right click on them and choose "Set aggregation".
  • The best practice is to build a dimension over any dimension-table, even if it's a flat hierarchy. Also, you should connect each fact table to its levels in the dimensions. You enter into the fact's source's properties and in the content tab you select the logical level for every dimension which is connected to the fact table.
  • You can quickly build a dimension by right-clicking the dimension-table and choose "Create dimension". It won't always be good enough, but it's not so bad.

The Webby part (Answers & Dashboard)

  • The error "Incorrectly defined logical table source (for fact table XXX) does not contain mapping for YYY" means that the BI Server cannot reach from table XXX to table YYY. It can happen because of two reasons: The first one is that there are no relationships defined in the logical or the physical layers (or maybe both). Note that the connections can be indirect. The second one is that the relationships doesn't makes sense. For example, going from a fact table to a dimension table through many-to-many table cannot happen. You'll see this error many times as a beginner and it can be quite frustrating at first. You need to understand that the BI Server can only understand simple and classic star schemas. More complex ERDs need to be split by using alias tables.
  • While developing a complex report (answer, request, you name it) with many columns, save it from time to time. Don't forget that you're working with a webby application. Closing the window won't save your work and won't alert you if you didn't.
  • After styling a column in the report, you can copy the style to other columns. In the styling window there's a "copy format" button. You can click on paste in the other columns' style windows.
  • In order to make a customized report, you can use the Narrative view. To insert a column's value, use @ and the column's location. For example, @2 is the placeholder for the second column. You can also apply javascript in the narrative to make dynamic columns and effects.
  • When you add a column in a fact-based report, bring it from the dimension table and not from the fact table. This will help you when you'll make navigation from this report to the dimension-based report.
  • The last button () in the navigation toolbar won't go to the last page as expected but will bring all the rows. In a case of a big table, this operation will take a long time and will make the browser freeze.
  • After a change in the RPD you need to update the answers site: Click on "Reload Server Metadata" below to side bar. Then, Settings -> Administrator -> Manage Sessions -> Close all cursors.

This is definitely not the last post about OBIEE. The next posts will be much more interesting and helpful. I promise!

 |  | 
Sunday, January 03, 2010 10:54:45 PM (Jerusalem Standard Time, UTC+02:00)
 Monday, November 09, 2009

In both SSIS and Informatica, the union transformation doesn't delete duplicate rows. In SSIS it's clear because the transformation is called "Union all", which reminds us the union all statement in SQL which doesn't remove duplicate rows. The confusing part is in Informatica, where the transformation is called Union although it doesn't remove duplicates. One of my teammates got confused this week so I thought it's worth blogging about.

One more tip about ETLs with union transformations: When you'll investigate, maintain or fix ETL it will be much help if you'll know where each row came from. That's why I recommend adding a column named "src" or something like that and store there the source of the row before the union happened.

Monday, November 09, 2009 11:41:31 PM (Jerusalem Standard Time, UTC+02:00)