Jun 11 2014

It’s called a synonym in both Oracle and SQL Server – a shortcut to another table.

The only certain thing in information systems is constant change. You should always prepare for any change that may occur and develop your ETL with future changes in mind. One way of doing that is to set your source to a synonym rather than a real view or table. This allows you to repoint the synonym to another table, synonym, or view whenever you need to.

You may think that a view can do the same thing, but the problem is (and I can say this after many years of ETL development) that after some time the developers will always want to change the view, and eventually it will become invalid for some reason (an error in a function, a table that has been dropped, a DB link that became invalid, etc.). The big pro of a synonym is that you can’t do any of that with it. Meaning, the con of synonyms is a pro for us.
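A minimal Oracle sketch of the idea (the schema and object names are illustrative; in SQL Server you would DROP and re-CREATE the synonym instead of using OR REPLACE):

```sql
-- The ETL reads from the synonym, never from the table directly:
CREATE SYNONYM etl_sales_src FOR dwh.sales_fact;

-- When the source changes, only the synonym is repointed;
-- the ETL job itself stays untouched:
CREATE OR REPLACE SYNONYM etl_sales_src FOR dwh.sales_fact_v2;
```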

 Posted by at 8:46 pm
Jul 08 2013

This post is for those who moved from Informatica to DataStage, as I did.

When you want to extract your data from a single source to multiple flows or targets, you should forget what you did with Informatica. There, you could easily connect the source to multiple flows and that’s it. In DataStage, every source stage’s output has a different tab in the source stage’s properties. Defining the same output (table name, WHERE clause, etc.) twice will make the SQL run twice in the source DB (!). As I said, the way to do this is different in DS: You should create a single output on the source stage and then create a Transformer stage with multiple outputs.

When I first encountered this method, it seemed pretty ugly compared to Informatica. After I moved on with it and got some experience with DS, I found it has a great advantage: you can use the Transformer stage to define a different constraint for every output. In Informatica, you had to add a separate Filter for every link. In DS, this “partition logic” is centralized in one place.

 Posted by at 9:55 pm
Jun 13 2012

DataStage contains a very impressive repository search feature. Say you want to find the jobs that use a certain table – you can do it in a few clicks. The problem is that the repository search only looks at table names typed into the “Table Name” field (in the source or target stages). Usually, we use “Custom SQL” and type in our own SQL queries, and DataStage doesn’t scan those when you look for tables in the repository search. So, what can you do?

The trick I found works in many technologies, not just DataStage. You can export the whole project’s jobs to an XML file (right-click on the root of the project -> Export). This XML will probably be a big one (90M in my case), so Notepad will get stuck trying to find an expression in the file. Use Notepad++ or TextPad instead, and you’ll find your table in seconds, even if it’s in Custom SQL.
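If you’d rather script the search than open the export in an editor, a small sketch along these lines works. Note that the `<Job Identifier="...">` structure below is an assumption about the export format – check it against the XML your DataStage version actually produces:

```python
import re

def jobs_referencing_table(xml_text, table_name):
    """Return the names of jobs whose exported XML mentions a table.

    The <Job Identifier="..."> pattern is a hypothetical stand-in for
    the real export structure; adjust it to match your export.
    """
    hits = []
    # Split the export into per-job chunks.
    for match in re.finditer(
            r'<Job\b[^>]*Identifier="([^"]+)"[^>]*>(.*?)</Job>',
            xml_text, re.DOTALL):
        name, body = match.groups()
        # Custom SQL ends up as plain text inside the job definition,
        # so a case-insensitive substring search catches it too.
        if re.search(re.escape(table_name), body, re.IGNORECASE):
            hits.append(name)
    return hits
```

The same grep-the-export idea works for finding stage names, parameters, or any other string the repository search misses.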


 Posted by at 8:50 pm
Jun 11 2012

The best thing we can make of our mistakes is to learn from them, but it’s much more fun to learn from others’ mistakes, so you’d better listen. These two mistakes were made by different organizations, but they can happen to any of us.

First Story

The organization installed Oracle’s BI suite on a cluster consisting of two servers. Oracle Enterprise Manager (EM) is their basic monitoring tool, so they decided to install it “on the way”. The only problem was that they installed it on one of the cluster’s machines. Now you can guess what happened. Unfortunately, the machine with the EM went down in the middle of the day, and the IT staff, of course, wanted to see the error logs. The only problem was that the EM was down too. All they could do was bring the machine up and look at the logs between failures. The Lesson: You must separate the monitoring system from the systems it monitors. They can’t reside on the same physical machine.

Second story

The other organization created its own monitoring system. When an ETL process starts, it notifies the monitor, and it does the same when it ends. The monitoring system wakes up once in a while and checks for processes that started and haven’t finished for a long time (how long depends on the process, of course). The IT people were used to opening the monitor every morning and checking for processes that didn’t finish. The only problem was with a process that didn’t start at all, so the monitor didn’t know there was a problem at all (!!). It took the users a couple of weeks to find out the data was not up-to-date, and it wasn’t nice. The Lesson: You must separate the monitoring system from the systems it monitors. The process shouldn’t notify the monitor; the monitor needs to check the system independently.
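The second lesson can be sketched in a few lines of Python. The process names and the dictionaries are illustrative assumptions, not a real scheduler API; the point is that the monitor starts from a list of what *should* have run and flags anything missing or stale, so a job that never started is caught too:

```python
from datetime import datetime, timedelta

def find_problem_processes(expected, last_runs, now):
    """Independent monitor check: compare what should have run against
    what actually ran, instead of waiting for processes to report in.

    expected  - process name -> maximum allowed age of the last
                successful run (a timedelta)
    last_runs - process name -> datetime of last successful finish,
                as found by the monitor's own query of the target system
    """
    problems = []
    for name, max_age in expected.items():
        finished = last_runs.get(name)
        # A process the monitor never heard from is flagged too -
        # exactly the case that slipped through in the second story.
        if finished is None or now - finished > max_age:
            problems.append(name)
    return sorted(problems)
```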

I’ll be happy to read more stories of this kind. Feel free to comment.

 Posted by at 9:38 pm
Jun 11 2012

As you can see, the last post was written almost half a year ago. I asked myself why and came to the conclusion that the technical hassle involved in writing made me postpone it more and more. I decided to try the famous 5-minute installation of WordPress, and it actually worked, especially with GoDaddy’s hosting services, which perform the installation for you.

These instructions helped me a lot, and I moved all the posts, comments, and images from dasBlog to WordPress in minutes. The only problem was preserving dasBlog’s permalinks. I tried WordPress’ redirection plugins, but for some reason they didn’t work. The solution was a simple web page, replacing the old blog, that returns 301 (Moved Permanently) responses. This way, each permalink sends the user to the new WordPress blog.
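The same idea as a minimal WSGI sketch in Python (my actual replacement page was not Python; the new blog address here is an illustrative placeholder):

```python
NEW_BASE = "https://example.com"  # hypothetical address of the new WordPress blog

def redirect_app(environ, start_response):
    """Answer every request for the old dasBlog URL with a
    301 Moved Permanently pointing at the same path on the new blog."""
    location = NEW_BASE + environ.get("PATH_INFO", "/")
    start_response("301 Moved Permanently", [("Location", location)])
    return [b""]
```

Search engines treat a 301 as a permanent move, so the old permalinks transfer their standing to the new URLs.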

One more tip: before moving the posts from one blog framework to another, rewrite the image (and any other resource) URLs.
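A one-line sketch of that rewrite in Python (both hostnames are illustrative placeholders for your own old and new resource roots):

```python
import re

OLD_ROOT = "http://old.example.com/content"              # hypothetical old resource root
NEW_ROOT = "https://new.example.com/wp-content/uploads"  # hypothetical new root

def rewrite_resource_urls(post_html):
    """Repoint every old-host resource URL (images, attachments)
    at the new location before importing the post."""
    return re.sub(re.escape(OLD_ROOT), NEW_ROOT, post_html)
```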

I hope you’ll enjoy the new look and, most importantly, the new content.


 Posted by at 9:11 pm
Sep 14 2011

You must be wondering why this blog wasn’t updated for a long time. The reason is that I started a new job. :-)  Along with other stuff, I’m playing with QlikView and I’d like to share some interesting ideas.

In QlikView, list boxes that filter the entire sheet (or document) are a very common thing. I thought that if we have a list box with a small number of values, it might be nicer to replace each value with an image. For example, let’s say we have three divisions in the company. Instead of putting up a list box with their names, we can put up three images with their logos and make a click on an image filter on that value (very simple with the Actions tab). The problem is that the user won’t know which division the sheet is filtered on. That’s why we need to make the other images more transparent. Using a macro, that’s very easy, and the result is awesome:

sub SetTransparentImage
  ' "ImageIDHere" is the object ID of the text box that holds the image
  set image = ActiveDocument.GetSheetObject("ImageIDHere")
  set imageProps = image.GetProperties
  ' BkgAlpha is the background transparency in percent (0 = opaque)
  imageProps.Layout.BkgAlpha = 50
  image.SetProperties imageProps
end sub

Now, put that in the macro script inside your document. In order to add images to your sheet, add new Text boxes and set their background to the image you want. In the Actions tab, set the desired value on click and also run this subroutine.


 Posted by at 2:23 pm
Jun 15 2011

For the Google search: Informatica service starts and stops

Yeah, that’s exactly what happened to me. Everything looked just fine until the last step of the installation, where the installer pings the new integration service, but there is no answer.
Looking at the services manager, I saw that the InformaticaService was stopped. When I restarted it, it stopped again after 5-10 seconds. I thought it was because of the service’s account settings, but it wasn’t. After some time searching and retrying to make it work, this is what I can say if you’re stuck as I was:

  • While you’re in the installation wizard, on the node configuration step, press the “Advanced Settings” button. There, you’ll see the minimum and maximum port numbers. Go back to the node configuration and change the port number so it falls within that range (why it wasn’t in the first place, I don’t know…).

  • The Windows PATH environment variable must contain the informatica server binary folder: <INFORMATICA_INSTALL_DIR>\Server\bin

  • If you’re still stuck, look for the service.log file in the Informatica directory. Look at the error and google it.
Hope I saved somebody a few hours.

 Posted by at 8:57 pm
Jun 02 2011

Yes, and it’s great!

Finally, the semantic web is here. This means that from now on, the search engines won’t have to guess what the content of our web pages is. The answer will be there, behind the scenes.
Technically, we (the web developers) will add new properties to our HTML tags that will describe the content of the text. For example, you can add a property to your DIV or SPAN that will tell the search engine that this is an article, creative idea or a critique.

What does this mean for us, the web developers and independent bloggers? That depends. If you want your blog (or site) to appear in more relevant search results, you should add these properties. Yes, you’ll have to edit your HTML manually or build a template. Of course, you don’t have to do this.
For those who want a deeper understanding of the subject, go to the common standard’s site – Schema.org
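A minimal sketch of what that markup looks like – the item type and property names come from Schema.org, while the values are illustrative:

```html
<div itemscope itemtype="http://schema.org/BlogPosting">
  <h1 itemprop="headline">Is HTML5 the new SQL?</h1>
  <div itemprop="articleBody">
    The post's text goes here, unchanged; only the attributes are new.
  </div>
</div>
```

The page looks exactly the same to human readers; only the search engines see the extra meaning.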

 Posted by at 8:08 pm
May 23 2011

After six years of working with SSAS, I decided to start re-reading the best book on the subject – Microsoft SQL Server 2005 Analysis Services. Why read about 2005 when there are more recent versions? Because this is the most comprehensive book on the subject and the only one written by SSAS’s development team. The book gives you a better understanding of the technology and shows you the motivation for every architectural decision of its creators. While reading, I keep coming across subjects that I want to post about, and I believe that along with reading the book I’ll post more.

The book cover

At the university, I was taught that half crazy is more dangerous than crazy. The reason is simple: you know you should always be careful around the crazy guy, but with the half-crazy person you don’t know when to be careful and when to learn from him. Information systems are the same, and especially DWH: there is no half-truth or imprecise truth. If you know there’s a data problem – bubble it up, and don’t show your user half a truth (because it’s also half a lie).

Every time we bind a DB column to a dimension attribute, a DataItem object is created. The DataItem defines the connection between the conceptual and the physical model. For example, it defines which table and column the data is taken from. Among other properties, there is one important property called NullProcessing, which defines how the server handles null data. The default is ZeroOrBlank, meaning that null data will be shown to the end-user as zero (if the DataType is not string) or an empty string (if it is). The problem arises when the attribute is a parent in a hierarchy. This is what happened:
We defined a “Sales Amount” measure under the hierarchy Category -> Sub-Category -> Product. By mistake, one of the members in the category level got a null value, and it was shown as zero in the cube. The user was used to seeing the Shoes sub-category under the Clothes category. One day, he saw that the shoes sales amount had dropped dramatically – he saw products under the Shoes sub-category, which is under the Clothes category, but the sub-total had decreased. At the bottom of the table, other products were “hidden” under the Shoes sub-category but with the zero category. The reason is that some of the shoes products were given null categories, so in the report the shoes products were split between the Clothes category and the zero category. The user got half a truth and interpreted it as wrong data, and that should never happen in a BI system.

What can we do? In my opinion, in most cases we need to set NullProcessing as Error, which will throw an error and prevent us from showing wrong data. This is how we (the IT people) will know about the problem before the end-users.
There are more options for NullProcessing:

  • Preserve – keeps the null value with no replacement. As I showed, this is not recommended. Furthermore, Preserve uses more system resources.

  • Unknown Member – replaces the null with the unknown member you defined in the dimension. I recommend this option only if the users know that the unknown member exists and what it means.

  • Automatic – the server selects the most appropriate behavior according to the context. I don’t really understand what that means, and the writers don’t explain it; generally, it is recommended to define deterministic behavior for your system.

Does this mean that now we’ll go over all our data bindings and change their NullProcessing property? Of course not! We need to think about where this behavior has a bad effect and where it doesn’t matter. For example, for the key attribute of a dimension, which is also the leaf level of a hierarchy, the ZeroOrBlank option is quite good.
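For reference, this is roughly where the setting lives in the project’s ASSL – a sketch of a DataItem binding with NullProcessing set to Error (the table and column names are illustrative):

```xml
<DataItem>
  <Source xsi:type="ColumnBinding">
    <TableID>DimProduct</TableID>
    <ColumnID>Category</ColumnID>
  </Source>
  <DataType>WChar</DataType>
  <!-- Throw an error on NULL instead of silently showing zero/blank -->
  <NullProcessing>Error</NullProcessing>
</DataItem>
```

You can also change the same setting from the Properties pane of the attribute’s binding in BIDS instead of editing the XML directly.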

 Posted by at 10:56 am