<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:pingback="http://madskills.com/public/xml/rss/module/pingback/" version="2.0">
  <channel>
    <title>Miky Schreiber's Blog - BI|SQL Server 2005|Integration Services</title>
    <link>http://www.miky-schreiber.com/Blog/</link>
    <description>Business Intelligence, Analysis Services, MDX, DataWarehousing and more...</description>
    <language>en-us</language>
    <copyright>Miky Schreiber</copyright>
    <lastBuildDate>Tue, 22 Jun 2010 12:40:51 GMT</lastBuildDate>
    <generator>newtelligence dasBlog 1.9.6264.0</generator>
    <managingEditor>miky@zahav.net.il</managingEditor>
    <webMaster>miky@zahav.net.il</webMaster>
    <item>
      <trackback:ping>http://www.miky-schreiber.com/Blog/Trackback.aspx?guid=43cbf1b9-4b77-4d7c-b8e9-5aa2dbf584d3</trackback:ping>
      <pingback:server>http://www.miky-schreiber.com/Blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.miky-schreiber.com/Blog/PermaLink,guid,43cbf1b9-4b77-4d7c-b8e9-5aa2dbf584d3.aspx</pingback:target>
      <dc:creator>mikypuff</dc:creator>
      <wfw:comment>http://www.miky-schreiber.com/Blog/CommentView,guid,43cbf1b9-4b77-4d7c-b8e9-5aa2dbf584d3.aspx</wfw:comment>
      <wfw:commentRss>http://www.miky-schreiber.com/Blog/SyndicationService.asmx/GetEntryCommentsRss?guid=43cbf1b9-4b77-4d7c-b8e9-5aa2dbf584d3</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>
          <em>This is the fourth post about <a href="http://blogs.microsoft.co.il/blogs/itaybraun/">Itay
Braun</a></em>
          <em>'s seminar in the SQL &amp; BI conference. In this post, I'll write
some notes I collected from the third part of the seminar which discussed SSIS.</em>
        </p>
        <ul>
          <li>
Log everything that happens inside the package. Use the SSIS log providers and the
event handlers. The effect on performance is not significant. 
</li>
          <li>
Analyize the log data. You can analize total execution time, SSAS partition processing
time and much more.</li>
          <li>
Log detailed error information about rejected rows. You can even log the actual error
row.</li>
          <li>
Monitor also the execution of the ETL jobs.</li>
          <li>
Data Profiling is a new feature in SSIS 2008. It's used for analysis of the data's
quality in the DWH. Go and learn.</li>
          <li>
Configure everything you can in the package configuration, such as: tasks, containers,
variables, connection managers and data flow components. You can use XML config
file, config from parent package, registry configuration or even configuration from
a table.</li>
          <li>
The Cache Transformation saves data in the cache. It's very useful in lookups.</li>
          <li>
Data types: Make them as "small" as possible (better performance). Be aware of
precision issues when using money, float and decimal types.</li>
          <li>
Don't sort in SSIS unless it's absolutely necessary.</li>
          <li>
Sometimes using T-SQL will be faster then running it in SSIS.</li>
        </ul>
        <p>
Some of the tips are also relevant in Informatica. For example, I always let the DB
make the sort whenever possible. Sometimes it's also works with joins. Running packages
or procedures can be much more faster than running it in the ETL, but if you look
at it from maintenance point of view you'll do it in the ETL. It depends on much factors,
such as the time you can have to run the ETL, the knowledge in your team and more.
</p>
        <img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=43cbf1b9-4b77-4d7c-b8e9-5aa2dbf584d3" />
      </body>
      <title>Notes from the SQL &amp; BI conference - SSIS</title>
      <guid isPermaLink="false">http://www.miky-schreiber.com/Blog/PermaLink,guid,43cbf1b9-4b77-4d7c-b8e9-5aa2dbf584d3.aspx</guid>
      <link>http://www.miky-schreiber.com/Blog/PermaLink,guid,43cbf1b9-4b77-4d7c-b8e9-5aa2dbf584d3.aspx</link>
      <pubDate>Tue, 22 Jun 2010 12:40:51 GMT</pubDate>
      <description>&lt;p&gt;
&lt;em&gt;This is the&amp;nbsp;fourth post about &lt;a href="http://blogs.microsoft.co.il/blogs/itaybraun/"&gt;Itay
Braun&lt;/a&gt;&lt;/em&gt;&lt;em&gt;'s seminar in the SQL &amp;amp; BI conference. In this post, I'll write
some notes I collected from the third part of the seminar which discussed SSIS.&lt;/em&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
Log everything that happens inside the package. Use the SSIS log providers and the
event handlers. The effect on performance is not significant. 
&lt;/li&gt;
&lt;li&gt;
Analyize the log data. You can analize total execution time, SSAS partition processing
time and much more.&lt;/li&gt;
&lt;li&gt;
Log detailed error information about rejected rows. You can even log the actual error
row.&lt;/li&gt;
&lt;li&gt;
Monitor also the execution of the ETL jobs.&lt;/li&gt;
&lt;li&gt;
Data Profiling is a new feature in SSIS 2008. It's used for analysis of the data's
quality in the DWH. Go and learn.&lt;/li&gt;
&lt;li&gt;
Configure everything you can in the package configuration, such as: tasks, containers,
variables, connection managers and&amp;nbsp;data flow components. You can use XML config
file, config from parent package, registry configuration or even configuration from
a table.&lt;/li&gt;
&lt;li&gt;
The Cache Transformation saves data in the cache. It's very useful in lookups.&lt;/li&gt;
&lt;li&gt;
Data types: Make them as "small" as possible (better performance).&amp;nbsp;Be aware&amp;nbsp;of
precision issues when using money, float and decimal types.&lt;/li&gt;
&lt;li&gt;
Don't sort in SSIS unless it's absolutely necessary.&lt;/li&gt;
&lt;li&gt;
Sometimes using T-SQL will be faster then running it in SSIS.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
Some of the tips are also relevant in Informatica. For example, I always let the DB
make the sort whenever possible. Sometimes it's also works with joins. Running packages
or procedures can be much more faster than running it in the ETL, but if you look
at it from maintenance point of view you'll do it in the ETL. It depends on much factors,
such as the time you can have to run the ETL, the knowledge in your team and more.
&lt;/p&gt;
&lt;img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=43cbf1b9-4b77-4d7c-b8e9-5aa2dbf584d3" /&gt;</description>
      <comments>http://www.miky-schreiber.com/Blog/CommentView,guid,43cbf1b9-4b77-4d7c-b8e9-5aa2dbf584d3.aspx</comments>
      <category>BI;BI/SQL Server 2005;BI/SQL Server 2005/Integration Services;BI/SQL Server 2008;BI/SQL Server 2008/Integration Services</category>
    </item>
    <item>
      <trackback:ping>http://www.miky-schreiber.com/Blog/Trackback.aspx?guid=801cefae-7de4-4e1c-9ad4-115c70c2813e</trackback:ping>
      <pingback:server>http://www.miky-schreiber.com/Blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.miky-schreiber.com/Blog/PermaLink,guid,801cefae-7de4-4e1c-9ad4-115c70c2813e.aspx</pingback:target>
      <dc:creator>mikypuff</dc:creator>
      <wfw:comment>http://www.miky-schreiber.com/Blog/CommentView,guid,801cefae-7de4-4e1c-9ad4-115c70c2813e.aspx</wfw:comment>
      <wfw:commentRss>http://www.miky-schreiber.com/Blog/SyndicationService.asmx/GetEntryCommentsRss?guid=801cefae-7de4-4e1c-9ad4-115c70c2813e</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>
          <img style="MARGIN: 0px 1em 1em" height="50%" src="http://www.miky-schreiber.com/blog/content/binary/horseshoe.jpg" width="25%" align="left" />In
both SSIS and Informatica, the union transformation doesn't delete duplicate rows.
In SSIS it's clear because the transformation is called "Union all", which reminds
us the union all statement in SQL which doesn't remove duplicate rows. The confusing
part is in Informatica, where the transformation is called Union although it doesn't
remove duplicates. One of my teammates got confused this week so I thought it's worth
blogging about. 
<br /><br />
One more tip about ETLs with union transformations: When you'll investigate, maintain
or fix ETL it will be much help if you'll know where each row came from. That's why
I recommend adding a column named "src" or something like that and store there the
source of the row before the union happened.
</p>
        <img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=801cefae-7de4-4e1c-9ad4-115c70c2813e" />
      </body>
      <title>Union Transformations and Duplicate Rows</title>
      <guid isPermaLink="false">http://www.miky-schreiber.com/Blog/PermaLink,guid,801cefae-7de4-4e1c-9ad4-115c70c2813e.aspx</guid>
      <link>http://www.miky-schreiber.com/Blog/PermaLink,guid,801cefae-7de4-4e1c-9ad4-115c70c2813e.aspx</link>
      <pubDate>Mon, 09 Nov 2009 14:41:31 GMT</pubDate>
      <description>&lt;p&gt;
&lt;img style="MARGIN: 0px 1em 1em" height="50%" src="http://www.miky-schreiber.com/blog/content/binary/horseshoe.jpg" width="25%" align=left&gt;In
both SSIS and Informatica, the union transformation doesn't delete duplicate rows.
In SSIS it's clear because the transformation is called "Union all", which reminds
us the union all statement in SQL which doesn't remove duplicate rows. The confusing
part is in Informatica, where the transformation is called Union although it doesn't
remove duplicates. One of my teammates got confused this week so I thought it's worth
blogging about. 
&lt;br&gt;
&lt;br&gt;
One more tip about ETLs with union transformations: When you'll investigate, maintain
or fix ETL it will be much help if you'll know where each row came from. That's why
I recommend adding a column named "src" or something like that and store there the
source of the row before the union happened.
&lt;/p&gt;
&lt;img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=801cefae-7de4-4e1c-9ad4-115c70c2813e" /&gt;</description>
      <comments>http://www.miky-schreiber.com/Blog/CommentView,guid,801cefae-7de4-4e1c-9ad4-115c70c2813e.aspx</comments>
      <category>BI;BI/Informatica;BI/SQL Server 2005;BI/SQL Server 2005/Integration Services;BI/SQL Server 2008;BI/SQL Server 2008/Integration Services</category>
    </item>
    <item>
      <trackback:ping>http://www.miky-schreiber.com/Blog/Trackback.aspx?guid=03a8f9ca-790a-4e8a-91e3-f25ea1a32c94</trackback:ping>
      <pingback:server>http://www.miky-schreiber.com/Blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.miky-schreiber.com/Blog/PermaLink,guid,03a8f9ca-790a-4e8a-91e3-f25ea1a32c94.aspx</pingback:target>
      <dc:creator>mikypuff</dc:creator>
      <wfw:comment>http://www.miky-schreiber.com/Blog/CommentView,guid,03a8f9ca-790a-4e8a-91e3-f25ea1a32c94.aspx</wfw:comment>
      <wfw:commentRss>http://www.miky-schreiber.com/Blog/SyndicationService.asmx/GetEntryCommentsRss?guid=03a8f9ca-790a-4e8a-91e3-f25ea1a32c94</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <img style="MARGIN: 0px 1em 1em" src="http://www.miky-schreiber.com/blog/content/binary/policeman.jpg" align="left" />
        <p>
Lucky for me, I have the opportunity to work with more than one ETL tool in my daily
work. In the far past, I worked much with DTS and PL/SQL packages (which is an ETL,
but not exactly a tool). Nowadays I work both with SSIS and Informatica so I
can compare them in several aspects. I don't think that one is better than the other.
It's just that they have different approaches to the ETL mechanism. Anyway, I always
expect that the data transformations will behave the same in every tool and I
found one transformation that behaves differently between the tools - the Router.
</p>
        <p>
The Informatica's Router, if you're not familiar with, receives a data row and
routes it a specific data flow. Its parallel transformation in SSIS is the Conditional
Split. As I said, I always thought that they behave the same. In both transformations
you provide boolean expressions that determine where the row will be routed. The difference
is that in SSIS, "<em>each input row can be sent to only one output, that being the
output for the first condition that evaluates to true</em>" (from SSIS's documentation).
In programmer's words, it behaves like switch-case statement with break in each condition.
In Informatica, <em>"If a row meets more than one group filter condition, the Integration
Service passes this row multiple times</em>" (from Informatica's documentation). Back
to the programmer's language, it's like switch-case statement with no break inside
the conditions.
</p>
        <p>
If you know other transformation that behaves different between ETL tools, I'll be
happy to know.
</p>
        <img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=03a8f9ca-790a-4e8a-91e3-f25ea1a32c94" />
      </body>
      <title>Different Router behaviour in SSIS and Informatica</title>
      <guid isPermaLink="false">http://www.miky-schreiber.com/Blog/PermaLink,guid,03a8f9ca-790a-4e8a-91e3-f25ea1a32c94.aspx</guid>
      <link>http://www.miky-schreiber.com/Blog/PermaLink,guid,03a8f9ca-790a-4e8a-91e3-f25ea1a32c94.aspx</link>
      <pubDate>Tue, 27 Oct 2009 14:10:34 GMT</pubDate>
      <description>&lt;img style="MARGIN: 0px 1em 1em" src="http://www.miky-schreiber.com/blog/content/binary/policeman.jpg" align=left&gt; 
&lt;p&gt;
Lucky for me, I have the opportunity to work with more than one ETL tool in my daily
work. In the far past, I worked much with DTS and PL/SQL packages (which is an ETL,
but not exactly a tool). Nowadays I work both with&amp;nbsp;SSIS and Informatica so I
can compare them in several aspects. I don't think that one is better than the other.
It's just that they have different approaches to the ETL mechanism. Anyway, I always
expect that the data transformations will behave the same in every tool&amp;nbsp;and I
found one transformation that behaves differently between the tools - the Router.
&lt;/p&gt;
&lt;p&gt;
The&amp;nbsp;Informatica's Router, if you're not familiar with, receives a data row and
routes it a specific data flow. Its parallel transformation in SSIS is the Conditional
Split. As I said, I always thought that they behave the same. In both transformations
you provide boolean expressions that determine where the row will be routed. The difference
is that in SSIS, "&lt;em&gt;each input row can be sent to only one output, that being the
output for the first condition that evaluates to true&lt;/em&gt;" (from SSIS's documentation).
In programmer's words, it behaves like switch-case statement with break in each condition.
In Informatica, &lt;em&gt;"If a row meets more than one group filter condition, the Integration
Service passes this row multiple times&lt;/em&gt;" (from Informatica's documentation). Back
to the programmer's language, it's like switch-case statement with no break inside
the conditions.
&lt;/p&gt;
&lt;p&gt;
If you know other transformation that behaves different between ETL tools, I'll be
happy to know.
&lt;/p&gt;
&lt;img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=03a8f9ca-790a-4e8a-91e3-f25ea1a32c94" /&gt;</description>
      <comments>http://www.miky-schreiber.com/Blog/CommentView,guid,03a8f9ca-790a-4e8a-91e3-f25ea1a32c94.aspx</comments>
      <category>BI;BI/Informatica;BI/SQL Server 2005;BI/SQL Server 2005/Integration Services;BI/SQL Server 2008;BI/SQL Server 2008/Integration Services</category>
    </item>
    <item>
      <trackback:ping>http://www.miky-schreiber.com/Blog/Trackback.aspx?guid=f590b1f7-4d99-4539-bf47-349c121fd431</trackback:ping>
      <pingback:server>http://www.miky-schreiber.com/Blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.miky-schreiber.com/Blog/PermaLink,guid,f590b1f7-4d99-4539-bf47-349c121fd431.aspx</pingback:target>
      <dc:creator>mikypuff</dc:creator>
      <wfw:comment>http://www.miky-schreiber.com/Blog/CommentView,guid,f590b1f7-4d99-4539-bf47-349c121fd431.aspx</wfw:comment>
      <wfw:commentRss>http://www.miky-schreiber.com/Blog/SyndicationService.asmx/GetEntryCommentsRss?guid=f590b1f7-4d99-4539-bf47-349c121fd431</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">My friends were stuck with a totally weird
bug this week. After a day of frustration they called me for the rescue. It took me
some time to figure it out and I think that every SSIS developer (and maybe every
developer) can learn a thing or two from others' mistakes.<br /><br /><u>The mission:</u> The data flow takes one table with duplicate rows and copies it
to another table and makes sure that every row will appear only once. In the way,
the data flow also adds some irrelevant fields. Among them is the Create_User and
Create_Date fields which tells by who and when the package last ran.<br /><u>How my friends did it:</u> Again, it's a very simple flow. They only added Derived
Column transformation to add the new fields and then they added an Aggregate transformation
to make every row appear only once. 
<br /><img src="http://www.miky-schreiber.com/Blog/content/binary/NonDeterministicAggregationDataFlow.GIF" border="0" /><br />
Note that this is not the real package. It's a sample I did on my machine to show
it here.<br /><br /><u>The Bug:</u> When I first seen this it seemed to me very simple flow and I asked
myself how can it be that this is happening:<br /><img src="http://www.miky-schreiber.com/Blog/content/binary/NonDeterministicAggregation.GIF" border="0" /><img src="http://www.miky-schreiber.com/Blog/content/binary/NonDeterministicAggregation2.GIF" border="0" /><br />
As you can see, it seems that the Aggregate transformation is not deterministic. Sometimes
it outputs 99 rows, sometimes 198 and in some other times I get other results as well.<br /><u>Investigating:</u> I wanted to see what's the difference between the table that
I got in the first time (99 rows) and the table I got in the second time (198 rows)
so I changed the destination table and compared the two tables. I ran "select * from
A where Column1+Column2+... not in (select Column1+Column2+... from B)"-style query
but it was no use - it showed me that there were no rows that appeared only in one
of the tables. In this step I really started to think (as my friends did) that maybe
the Aggregate transformation has something wrong inside... Instead of blaming Microsoft,
I decided to think. I needed to see what can make the flow non-deterministic. Then,
it hit me.<br /><br /><img src="http://www.miky-schreiber.com/Blog/content/binary/NonDeterministicAggregationDerivedColumn.GIF" border="0" /><img src="http://www.miky-schreiber.com/Blog/content/binary/NonDeterministicAggregationAggregate.GIF" border="0" /><br />
The only non-deterministic component in the flow is the Derived Column because it
has the getdate() function (it may be simple to see here, but in the original package
the derived column transformation had many fields). The results of this function may
differ in the milliseconds, especially for large tables. Then I looked in the Aggregate
transformation and seen that the Create_Date column also was in the Group by operation,
meaning that if two rows has different millisecond they will be placed twice in the
destination table, although they are the same in every column. That's it, the bug
was found. But still, one question remained: Why the query did not show me this? The
answer is also simple but tricky to find: In the comparison query I concatenated all
the columns in the tables in order to compare the results. When I did this, I casted
the Create_Date to nvarchar which truncated the milliseconds. 
<p></p><u>Conclusions:</u><ul><li>
Pay attention to non-deterministic elements in what you do, whether it's code or ETL
process. 
</li><li>
When you do dummy stuff like checking all the checkboxes in a list - think what are
the outcomes. 
</li><li>
Call Miky when you're desperate.</li></ul><img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=f590b1f7-4d99-4539-bf47-349c121fd431" /></body>
      <title>Non Deterministic Aggregation in SSIS ?</title>
      <guid isPermaLink="false">http://www.miky-schreiber.com/Blog/PermaLink,guid,f590b1f7-4d99-4539-bf47-349c121fd431.aspx</guid>
      <link>http://www.miky-schreiber.com/Blog/PermaLink,guid,f590b1f7-4d99-4539-bf47-349c121fd431.aspx</link>
      <pubDate>Sun, 21 Sep 2008 22:10:48 GMT</pubDate>
      <description>My friends were stuck with a totally weird bug this week. After a day of frustration they called me for the rescue. It took me some time to figure it out and I think that every SSIS developer (and maybe every developer)&amp;nbsp;can learn a thing or two from others' mistakes.&lt;br&gt;
&lt;br&gt;
&lt;u&gt;The mission:&lt;/u&gt; The data flow takes one table with duplicate rows and copies it
to another table and makes sure that every row will appear only once. In the way,
the data flow also adds some irrelevant fields. Among them is the Create_User and
Create_Date fields which tells by who and when the package last ran.&lt;br&gt;
&lt;u&gt;How my friends did it:&lt;/u&gt; Again, it's a very simple flow. They only added Derived
Column transformation to add the new fields and then they added an Aggregate transformation
to make every row appear only once. 
&lt;br&gt;
&lt;img src="http://www.miky-schreiber.com/Blog/content/binary/NonDeterministicAggregationDataFlow.GIF" border=0&gt;
&lt;br&gt;
Note that this is not the real package. It's a sample I did on my machine to show
it here.&lt;br&gt;
&lt;br&gt;
&lt;u&gt;The Bug:&lt;/u&gt; When I first seen this it seemed to me very simple flow and I asked
myself how can it be that this is happening:&lt;br&gt;
&lt;img src="http://www.miky-schreiber.com/Blog/content/binary/NonDeterministicAggregation.GIF" border=0&gt;&lt;img src="http://www.miky-schreiber.com/Blog/content/binary/NonDeterministicAggregation2.GIF" border=0&gt;
&lt;br&gt;
As you can see, it seems that the Aggregate transformation is not deterministic. Sometimes
it outputs 99 rows, sometimes 198 and in some other times I get other results as well.&lt;br&gt;
&lt;u&gt;Investigating:&lt;/u&gt; I wanted to see what's the difference between the table that
I got in the first time (99 rows) and the table I got in the second time (198 rows)
so I changed the destination table and compared the two tables. I ran "select * from
A where Column1+Column2+... not in (select Column1+Column2+... from B)"-style query
but it was no use - it showed me that there were no rows that appeared only in one
of the tables. In this step I really started to think (as my friends did) that maybe
the Aggregate transformation has something wrong inside... Instead of blaming Microsoft,
I decided to think. I needed to see what can make the flow non-deterministic. Then,
it hit me.&lt;br&gt;
&lt;br&gt;
&lt;img src="http://www.miky-schreiber.com/Blog/content/binary/NonDeterministicAggregationDerivedColumn.GIF" border=0&gt;&lt;img src="http://www.miky-schreiber.com/Blog/content/binary/NonDeterministicAggregationAggregate.GIF" border=0&gt;
&lt;br&gt;
The only non-deterministic component in the flow is the Derived Column because it
has the getdate() function (it may be simple to see here, but in the original package
the derived column transformation had many fields). The results of this function may
differ in the milliseconds, especially for large tables. Then I looked in the Aggregate
transformation and seen that the Create_Date column also was in the Group by operation,
meaning that if two rows has different millisecond they will be placed twice in the
destination table, although they are the same in every column. That's it, the bug
was found. But still, one question remained: Why the query did not show me this? The
answer is also simple but tricky to find: In the comparison query I concatenated all
the columns in the tables in order to compare the results. When I did this, I casted
the Create_Date to nvarchar which truncated the milliseconds. 
&lt;p&gt;
&lt;/p&gt;
&lt;u&gt;Conclusions:&lt;/u&gt; 
&lt;ul&gt;
&lt;li&gt;
Pay attention to non-deterministic elements in what you do, whether it's code or ETL
process. 
&lt;li&gt;
When you do dummy stuff like checking all the checkboxes in a list - think what are
the outcomes. 
&lt;li&gt;
Call Miky when you're desperate.&lt;/li&gt;
&lt;/ul&gt;
&lt;img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=f590b1f7-4d99-4539-bf47-349c121fd431" /&gt;</description>
      <comments>http://www.miky-schreiber.com/Blog/CommentView,guid,f590b1f7-4d99-4539-bf47-349c121fd431.aspx</comments>
      <category>BI;BI/SQL Server 2005;BI/SQL Server 2005/Integration Services;BI/SQL Server 2008;BI/SQL Server 2008/Integration Services</category>
    </item>
    <item>
      <trackback:ping>http://www.miky-schreiber.com/Blog/Trackback.aspx?guid=359beb7d-d8dc-42f0-96b0-5782626bee19</trackback:ping>
      <pingback:server>http://www.miky-schreiber.com/Blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.miky-schreiber.com/Blog/PermaLink,guid,359beb7d-d8dc-42f0-96b0-5782626bee19.aspx</pingback:target>
      <dc:creator>mikypuff</dc:creator>
      <wfw:comment>http://www.miky-schreiber.com/Blog/CommentView,guid,359beb7d-d8dc-42f0-96b0-5782626bee19.aspx</wfw:comment>
      <wfw:commentRss>http://www.miky-schreiber.com/Blog/SyndicationService.asmx/GetEntryCommentsRss?guid=359beb7d-d8dc-42f0-96b0-5782626bee19</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>
One more thing about <a href="http://www.miky-schreiber.com/Blog/PermaLink,guid,c4c8328e-a5b8-4d6e-99ad-83f9f316b2d6.aspx">getting
a file from the web/SharePoint and using it as a source in SSIS</a>: If you need to
authenticate just change the xml.open command to:
</p>
        <p>
xml.open "GET", URL, false, "user", "password"
</p>
        <p>
where user and password are the user &amp; password that has permissions to the desired
file. Note that it is VERY recommended to have an application user, so the password
won't be changed in the future. If you don't have such user and you must change your
password in the future, do not forget to change it in the script. My tip: add a reminder
in your calendar to change the password in the script.
</p>
        <p>
In this point I don't know if you can authenticate using SSL or stronger protocols
using VB script.
</p>
        <img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=359beb7d-d8dc-42f0-96b0-5782626bee19" />
      </body>
      <title>Using Web or SharePoint File as a Source in SSIS - Authentication</title>
      <guid isPermaLink="false">http://www.miky-schreiber.com/Blog/PermaLink,guid,359beb7d-d8dc-42f0-96b0-5782626bee19.aspx</guid>
      <link>http://www.miky-schreiber.com/Blog/PermaLink,guid,359beb7d-d8dc-42f0-96b0-5782626bee19.aspx</link>
      <pubDate>Tue, 22 Apr 2008 12:20:05 GMT</pubDate>
      <description>&lt;p&gt;
One more thing about &lt;a href="http://www.miky-schreiber.com/Blog/PermaLink,guid,c4c8328e-a5b8-4d6e-99ad-83f9f316b2d6.aspx"&gt;getting
a file from the web/SharePoint and using it as a source in SSIS&lt;/a&gt;: If you need to
authenticate just change the xml.open command to:
&lt;/p&gt;
&lt;p&gt;
xml.open "GET", URL, false, "user", "password"
&lt;/p&gt;
&lt;p&gt;
where user and password are the user &amp;amp; password that has permissions to the desired
file. Note that it is VERY recommended to have an application user, so the password
won't be changed in the future. If you don't have such user and you must change your
password in the future, do not forget to change it in the script. My tip: add a reminder
in your calendar to change the password in the script.
&lt;/p&gt;
&lt;p&gt;
In this point I don't know if you can authenticate using SSL or stronger protocols
using VB script.
&lt;/p&gt;
&lt;img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=359beb7d-d8dc-42f0-96b0-5782626bee19" /&gt;</description>
      <comments>http://www.miky-schreiber.com/Blog/CommentView,guid,359beb7d-d8dc-42f0-96b0-5782626bee19.aspx</comments>
      <category>BI;BI/SQL Server 2005;BI/SQL Server 2005/Integration Services</category>
    </item>
    <item>
      <trackback:ping>http://www.miky-schreiber.com/Blog/Trackback.aspx?guid=c4c8328e-a5b8-4d6e-99ad-83f9f316b2d6</trackback:ping>
      <pingback:server>http://www.miky-schreiber.com/Blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.miky-schreiber.com/Blog/PermaLink,guid,c4c8328e-a5b8-4d6e-99ad-83f9f316b2d6.aspx</pingback:target>
      <dc:creator>mikypuff</dc:creator>
      <wfw:comment>http://www.miky-schreiber.com/Blog/CommentView,guid,c4c8328e-a5b8-4d6e-99ad-83f9f316b2d6.aspx</wfw:comment>
      <wfw:commentRss>http://www.miky-schreiber.com/Blog/SyndicationService.asmx/GetEntryCommentsRss?guid=c4c8328e-a5b8-4d6e-99ad-83f9f316b2d6</wfw:commentRss>
      <slash:comments>1</slash:comments>
      <body xmlns="http://www.w3.org/1999/xhtml">It figures that using simple excel file
as a source in SSIS is not so trivial, especially if your source is in the web or
in your SharePoint portal. At first you'll think it's easy - just declare the excel
source as a url (the url of the excel file, for example) and it will succeed. The
problem is that Microsoft let you think it's working. Click on the excel source and
you'll see in its properties that the source path is the local temporary internet
files, meaning that the source is a local copy which is not up-to-date, so it's worth
nothing.<br />
Here's what I tried to do and the final (and successful solution):<br /><br />
1. Use the File System task. It won't work because you can't declare an URI there.<br />
2. In the MSDN forum (I can't find the link right now) they say to write a script,
so I also tried this. Using the Script task, I written a code in VB.NET which using
the System.IO libary of the .NET framework and copies the excel file (using its URI)
to the desired location in the local computer. Running it, I got an error saying that
the script can't use URIs...<br />
After trying this I understood that every code or action running in the SSIS context
won't work with URIs. I'm not sure I know why Microsoft developers built it that way
(or maybe it's just another bug). Anyway, the next step is the solution.<br />
3. Build an executable file that performs the desired copy task. You can't use regular
batch (.bat) file because DOS/CmdExec does not know how to work with URIs. So, there
are two ways to perform this:<br />
a. Download <a href="http://users.ugent.be/~bpuype/wget/">this</a> and use it as a
copier from the web.<br />
b. Use the following code and save it as a Visual Basic Script file (*.vbs):<br /><br /><div dir="ltr"><div style="TEXT-ALIGN: left"><b></b>'GetRemoteBinaryFile.vbs<br />
TheFile = "myExcelFile.xls"<br />
DestFolder = "C:\SSIS_Sources"<br />
URL = "http://mySite/myFolder/myExcelFile.xls"<br />
Set xml = CreateObject("Microsoft.XMLHTTP")<br />
xml.Open "GET", URL, False<br />
xml.Send<br />
set oStream = CreateObject("Adodb.Stream")<br />
Const adTypeBinary = 1<br />
Const adSaveCreateOverWrite = 2<br />
Const adSaveCreateNotExist = 1 
<br />
oStream.type = adTypeBinary<br />
oStream.open<br />
oStream.write xml.responseBody<br />
' Overwrite an existing file<br />
oStream.savetofile DestFolder &amp; TheFile, adSaveCreateOverWrite<br />
oStream.close<br />
set oStream = nothing<br />
Set xml = Nothing<br /></div></div><br />
After you have your file (vbs or exe) you can use the Execute Process Task in order
to make the copy. In the task, declare that you want it to run your exe or vbs. 
After that, just use a normal Data Flow Task, where the source excel file is in the
local computer (the file that was copied in the previous task) and the destination
is your desired DB.<br /><br />
Note that:<br />
1. Before executing you must have the excel file already placed in your local computer,
meaning that you must make the first copy before the first time you run the package.
This is because SSIS performs integrity check before running the package and it checks
that the file exists.<br />
2. Even if the copy process is long (because it's coming from the web), don't worry.
SSIS work synchroniously, meaning that the Data Flow task will not start until the
Execute Process task which copies the file will end.<br /><br />
Enjoy.<br /><br /><u>Update:</u> I added a <a href="http://www.miky-schreiber.com/Blog/PermaLink,guid,359beb7d-d8dc-42f0-96b0-5782626bee19.aspx">post
about authentication</a>.<br /><p></p><img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=c4c8328e-a5b8-4d6e-99ad-83f9f316b2d6" /></body>
      <title>Using Web or SharePoint File as a Source in SSIS</title>
      <guid isPermaLink="false">http://www.miky-schreiber.com/Blog/PermaLink,guid,c4c8328e-a5b8-4d6e-99ad-83f9f316b2d6.aspx</guid>
      <link>http://www.miky-schreiber.com/Blog/PermaLink,guid,c4c8328e-a5b8-4d6e-99ad-83f9f316b2d6.aspx</link>
      <pubDate>Sat, 15 Mar 2008 21:47:26 GMT</pubDate>
      <description>It figures that using simple excel file as a source in SSIS is not so trivial, especially if your source is in the web or in your SharePoint portal. At first you'll think it's easy - just declare the excel source as a url (the url of the excel file, for example) and it will succeed. The problem is that Microsoft let you think it's working. Click on the excel source and you'll see in its properties that the source path is the local temporary internet files, meaning that the source is a local copy which is not up-to-date, so it's worth nothing.&lt;br&gt;
Here's what I tried to do and the final (and successful solution):&lt;br&gt;
&lt;br&gt;
1. Use the File System task. It won't work because you can't declare an URI there.&lt;br&gt;
2. In the MSDN forum (I can't find the link right now) they say to write a script,
so I also tried this. Using the Script task, I written a code in VB.NET which using
the System.IO libary of the .NET framework and copies the excel file (using its URI)
to the desired location in the local computer. Running it, I got an error saying that
the script can't use URIs...&lt;br&gt;
After trying this I understood that every code or action running in the SSIS context
won't work with URIs. I'm not sure I know why Microsoft developers built it that way
(or maybe it's just another bug). Anyway, the next step is the solution.&lt;br&gt;
3. Build an executable file that performs the desired copy task. You can't use regular
batch (.bat) file because DOS/CmdExec does not know how to work with URIs. So, there
are two ways to perform this:&lt;br&gt;
a. Download &lt;a href="http://users.ugent.be/~bpuype/wget/"&gt;this&lt;/a&gt; and use it as a
copier from the web.&lt;br&gt;
b. Use the following code and save it as a Visual Basic Script file (*.vbs):&lt;br&gt;
&lt;br&gt;
&lt;div dir=ltr&gt;
&lt;div style="TEXT-ALIGN: left"&gt;&lt;b&gt;&lt;/b&gt;'GetRemoteBinaryFile.vbs&lt;br&gt;
TheFile = "myExcelFile.xls"&lt;br&gt;
DestFolder = "C:\SSIS_Sources"&lt;br&gt;
URL = "http://mySite/myFolder/myExcelFile.xls"&lt;br&gt;
Set xml = CreateObject("Microsoft.XMLHTTP")&lt;br&gt;
xml.Open "GET", URL, False&lt;br&gt;
xml.Send&lt;br&gt;
set oStream = CreateObject("Adodb.Stream")&lt;br&gt;
Const adTypeBinary = 1&lt;br&gt;
Const adSaveCreateOverWrite = 2&lt;br&gt;
Const adSaveCreateNotExist = 1 
&lt;br&gt;
oStream.type = adTypeBinary&lt;br&gt;
oStream.open&lt;br&gt;
oStream.write xml.responseBody&lt;br&gt;
' Overwrite an existing file&lt;br&gt;
oStream.savetofile DestFolder &amp;amp; TheFile, adSaveCreateOverWrite&lt;br&gt;
oStream.close&lt;br&gt;
set oStream = nothing&lt;br&gt;
Set xml = Nothing&lt;br&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;br&gt;
After you have your file (vbs or exe) you can use the Execute Process Task in order
to make the copy. In the task, declare that you want it to run your exe or vbs.&amp;nbsp;
After that, just use a normal Data Flow Task, where the source excel file is in the
local computer (the file that was copied in the previous task) and the destination
is your desired DB.&lt;br&gt;
&lt;br&gt;
Note that:&lt;br&gt;
1. Before executing you must have the excel file already placed in your local computer,
meaning that you must make the first copy before the first time you run the package.
This is because SSIS performs integrity check before running the package and it checks
that the file exists.&lt;br&gt;
2. Even if the copy process is long (because it's coming from the web), don't worry.
SSIS work synchroniously, meaning that the Data Flow task will not start until the
Execute Process task which copies the file will end.&lt;br&gt;
&lt;br&gt;
Enjoy.&lt;br&gt;
&lt;br&gt;
&lt;u&gt;Update:&lt;/u&gt; I added a &lt;a href="http://www.miky-schreiber.com/Blog/PermaLink,guid,359beb7d-d8dc-42f0-96b0-5782626bee19.aspx"&gt;post
about authentication&lt;/a&gt;.&lt;br&gt;
&lt;p&gt;
&lt;/p&gt;
&lt;img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=c4c8328e-a5b8-4d6e-99ad-83f9f316b2d6" /&gt;</description>
      <comments>http://www.miky-schreiber.com/Blog/CommentView,guid,c4c8328e-a5b8-4d6e-99ad-83f9f316b2d6.aspx</comments>
      <category>BI;BI/SQL Server 2005;BI/SQL Server 2005/Integration Services;Tips &amp; Tricks</category>
    </item>
    <item>
      <trackback:ping>http://www.miky-schreiber.com/Blog/Trackback.aspx?guid=06b1f3d3-fb6f-435a-9c63-769100013c02</trackback:ping>
      <pingback:server>http://www.miky-schreiber.com/Blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.miky-schreiber.com/Blog/PermaLink,guid,06b1f3d3-fb6f-435a-9c63-769100013c02.aspx</pingback:target>
      <dc:creator>mikypuff</dc:creator>
      <wfw:comment>http://www.miky-schreiber.com/Blog/CommentView,guid,06b1f3d3-fb6f-435a-9c63-769100013c02.aspx</wfw:comment>
      <wfw:commentRss>http://www.miky-schreiber.com/Blog/SyndicationService.asmx/GetEntryCommentsRss?guid=06b1f3d3-fb6f-435a-9c63-769100013c02</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">This is a good one: When you build a flat
file connection to a csv file, you can preview the data. There, there's an option
to skip some rows (Data rows to skip). If you'll leave it with a number greater than
zero - the process itself will skip these rows!! I still wonder if this bug is By
Design or not. If you wish, you can track this bug in <a href="https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=325384">Microsoft
Connect</a>.<br /><p></p><img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=06b1f3d3-fb6f-435a-9c63-769100013c02" /></body>
      <title>SSIS Flat File Preview (Funny) Bug</title>
      <guid isPermaLink="false">http://www.miky-schreiber.com/Blog/PermaLink,guid,06b1f3d3-fb6f-435a-9c63-769100013c02.aspx</guid>
      <link>http://www.miky-schreiber.com/Blog/PermaLink,guid,06b1f3d3-fb6f-435a-9c63-769100013c02.aspx</link>
      <pubDate>Fri, 01 Feb 2008 05:00:16 GMT</pubDate>
      <description>This is a good one: When you build a flat file connection to a csv file, you can preview the data. There, there's an option to skip some rows (Data rows to skip). If you'll leave it with a number greater than zero - the process itself will skip these rows!! I still wonder if this bug is By Design or not. If you wish, you can track this bug in &lt;a href="https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=325384"&gt;Microsoft
Connect&lt;/a&gt;.&lt;br&gt;
&lt;p&gt;
&lt;/p&gt;
&lt;img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=06b1f3d3-fb6f-435a-9c63-769100013c02" /&gt;</description>
      <comments>http://www.miky-schreiber.com/Blog/CommentView,guid,06b1f3d3-fb6f-435a-9c63-769100013c02.aspx</comments>
      <category>BI;BI/SQL Server 2005;BI/SQL Server 2005/Integration Services</category>
    </item>
    <item>
      <trackback:ping>http://www.miky-schreiber.com/Blog/Trackback.aspx?guid=73f9d050-05bd-4dc9-b719-ae9d256557b3</trackback:ping>
      <pingback:server>http://www.miky-schreiber.com/Blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.miky-schreiber.com/Blog/PermaLink,guid,73f9d050-05bd-4dc9-b719-ae9d256557b3.aspx</pingback:target>
      <dc:creator>mikypuff</dc:creator>
      <wfw:comment>http://www.miky-schreiber.com/Blog/CommentView,guid,73f9d050-05bd-4dc9-b719-ae9d256557b3.aspx</wfw:comment>
      <wfw:commentRss>http://www.miky-schreiber.com/Blog/SyndicationService.asmx/GetEntryCommentsRss?guid=73f9d050-05bd-4dc9-b719-ae9d256557b3</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">I really think that the time dimension
is the most complex dimension in 90% of the DWHs. The complexity is in two places:
In the DWH design and also in Analysis Services (or any other BI tool).<br />
First of all - why we didn't take the already-made Server Time Dimension which exists
in SSAS 2005? For two reasons: The first is that the <a href="http://www.miky-schreiber.com/blog/default.aspx#aa68059bf-fe59-48d5-a9e8-dd366e6a861c">Project
Real</a> guys do not recommend using it (you can find their SSAS article <a href="http://www.microsoft.com/technet/prodtechnol/sql/2005/realastd.mspx">here</a>).
The second is that we wanted to have some features that are not available in the server
time dimension, such as Hebrew date. In a matter of fact, even if we didn't have such
feature we still would build the time dimension ourselves because it's giving you
much more control over the dimension. For example, you can always add some new attributes
which Microsoft developers didn't think about.<br />
I started myself to build the time dimension in excel. I figured out that this mission
is little more complex that I thought it would. Most of the functions I wrote were
simple, but there were some complicated ones. So here are some tips for you if you
want to build your time dimension using Excel:<br /><ul><li>
If you want to week number for every date, do not write the function yourself... Excel
has function called weeknum. If you don't have it just add the function toolbox which
has it (I can't recall its name right now. check in excel help).</li><li>
If you want to have records for every level in your hierarchy (not only for days),
put every level in its own excel file (not excel tab). It will help you later when
you will transfer it to your DB.</li><li>
Check yourself. Pick randomly some dates and check that all of its record has correct
data.</li></ul>
After building the excel files I needed to transfer it to my Oracle server. I used
SSIS because I didn't want to wait for my DBA to copy these files into the Informatica
server (it can't use my the local files, it has to be in its server. SSIS can use
local files). This also was a little tricky. First of all, close excel when running
the SSIS packages, otherwise it will fail. Second, when moving the non-leaf levels,
go into the columns section in the destination box and erase the irrelevant columns.
It will reduce the chance for errors. Finally, click on the source box and click on
"Show advanced editor". Enter the source's output columns options and define properly
the columns' data types. This also will reduce the chance for errors.<br /><br />
I had a little bit of an argument with my DBA about how should the time dimension
be. I think that the time dimension does not have to be processes at all. My time
dimension is from 1960 until 2020, so no daily ETL is required. She says that all
the logic has to be in Informatica so I need to develop a mapping for this dimension.
I think that we both are right and that's because that in ideal world she is right.
In every developers team, all the BL has to be in one place. But we don't have much
time (the deadline is very close) so I won't spend the time building more mapping
in Informatica when I have the time dimension already made in excel.<br /><br />
Maybe someday I will have the time to do this. Maybe not.<br /><p></p><img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=73f9d050-05bd-4dc9-b719-ae9d256557b3" /></body>
      <title>Building my Time dimension with excel</title>
      <guid isPermaLink="false">http://www.miky-schreiber.com/Blog/PermaLink,guid,73f9d050-05bd-4dc9-b719-ae9d256557b3.aspx</guid>
      <link>http://www.miky-schreiber.com/Blog/PermaLink,guid,73f9d050-05bd-4dc9-b719-ae9d256557b3.aspx</link>
      <pubDate>Sun, 14 Oct 2007 23:18:38 GMT</pubDate>
      <description>I really think that the time dimension is the most complex dimension in 90% of the DWHs. The complexity is in two places: In the DWH design and also in Analysis Services (or any other BI tool).&lt;br&gt;
First of all - why we didn't take the already-made Server Time Dimension which exists
in SSAS 2005? For two reasons: The first is that the &lt;a href="http://www.miky-schreiber.com/blog/default.aspx#aa68059bf-fe59-48d5-a9e8-dd366e6a861c"&gt;Project
Real&lt;/a&gt; guys do not recommend using it (you can find their SSAS article &lt;a href="http://www.microsoft.com/technet/prodtechnol/sql/2005/realastd.mspx"&gt;here&lt;/a&gt;).
The second is that we wanted to have some features that are not available in the server
time dimension, such as Hebrew date. In a matter of fact, even if we didn't have such
feature we still would build the time dimension ourselves because it's giving you
much more control over the dimension. For example, you can always add some new attributes
which Microsoft developers didn't think about.&lt;br&gt;
I started myself to build the time dimension in excel. I figured out that this mission
is little more complex that I thought it would. Most of the functions I wrote were
simple, but there were some complicated ones. So here are some tips for you if you
want to build your time dimension using Excel:&lt;br&gt;
&lt;ul&gt;
&lt;li&gt;
If you want to week number for every date, do not write the function yourself... Excel
has function called weeknum. If you don't have it just add the function toolbox which
has it (I can't recall its name right now. check in excel help).&lt;/li&gt;
&lt;li&gt;
If you want to have records for every level in your hierarchy (not only for days),
put every level in its own excel file (not excel tab). It will help you later when
you will transfer it to your DB.&lt;/li&gt;
&lt;li&gt;
Check yourself. Pick randomly some dates and check that all of its record has correct
data.&lt;/li&gt;
&lt;/ul&gt;
After building the excel files I needed to transfer it to my Oracle server. I used
SSIS because I didn't want to wait for my DBA to copy these files into the Informatica
server (it can't use my the local files, it has to be in its server. SSIS can use
local files). This also was a little tricky. First of all, close excel when running
the SSIS packages, otherwise it will fail. Second, when moving the non-leaf levels,
go into the columns section in the destination box and erase the irrelevant columns.
It will reduce the chance for errors. Finally, click on the source box and click on
"Show advanced editor". Enter the source's output columns options and define properly
the columns' data types. This also will reduce the chance for errors.&lt;br&gt;
&lt;br&gt;
I had a little bit of an argument with my DBA about how should the time dimension
be. I think that the time dimension does not have to be processes at all. My time
dimension is from 1960 until 2020, so no daily ETL is required. She says that all
the logic has to be in Informatica so I need to develop a mapping for this dimension.
I think that we both are right and that's because that in ideal world she is right.
In every developers team, all the BL has to be in one place. But we don't have much
time (the deadline is very close) so I won't spend the time building more mapping
in Informatica when I have the time dimension already made in excel.&lt;br&gt;
&lt;br&gt;
Maybe someday I will have the time to do this. Maybe not.&lt;br&gt;
&lt;p&gt;
&lt;/p&gt;
&lt;img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=73f9d050-05bd-4dc9-b719-ae9d256557b3" /&gt;</description>
      <comments>http://www.miky-schreiber.com/Blog/CommentView,guid,73f9d050-05bd-4dc9-b719-ae9d256557b3.aspx</comments>
      <category>BI;BI/DataWarehousing;BI/Informatica;BI/SQL Server 2005/Integration Services</category>
    </item>
    <item>
      <trackback:ping>http://www.miky-schreiber.com/Blog/Trackback.aspx?guid=01276594-f240-4611-940c-6626b856977c</trackback:ping>
      <pingback:server>http://www.miky-schreiber.com/Blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.miky-schreiber.com/Blog/PermaLink,guid,01276594-f240-4611-940c-6626b856977c.aspx</pingback:target>
      <dc:creator>mikypuff</dc:creator>
      <wfw:comment>http://www.miky-schreiber.com/Blog/CommentView,guid,01276594-f240-4611-940c-6626b856977c.aspx</wfw:comment>
      <wfw:commentRss>http://www.miky-schreiber.com/Blog/SyndicationService.asmx/GetEntryCommentsRss?guid=01276594-f240-4611-940c-6626b856977c</wfw:commentRss>
      <slash:comments>5</slash:comments>
      <body xmlns="http://www.w3.org/1999/xhtml">I guess that this will not be my last post
on this subject, but I want to start sharing some thoughts and tips from my experience
when designing and building DWH. In this post I will focus on the fact &amp; dimensions
tables relationship in terms of data completeness (if you wonder what it is, read
on).<br /><br />
Before you start to design the DWH, sit and talk with the people who built the systems
which you take your data from, including the DBA. For every table, ask them what is
the primary key (it's NOT always defined properly in the DB), then ask them again
and then ask them if they are sure. It happened to me that I discovered that the systems
guys were wrong about their DB's primary keys.<br />
The same thing is about Foreign keys and here you should be even more careful. Even
if they claim so, check <b>yourself</b> that every foreign key in the fact table is
placed correctly in the dimension table, <b>especially when the fact table has far
history records</b>. Sometimes system developers or even worse - system DBAs delete
records from the dimension tables that are not relevant. This will cause that these
keys will still be in the fact's history records but will not be found in the dimension
table, causing uncomplete relationship between the fact and the dimension table.<br /><br />
So far is about the part when you talk and "investigate" the system developers (the
DWH design). What to do when you actually developing the DWH? First, develop the tables
of your dimensions tables. <b>Do not forget</b> to add the primary keys in the dimensions
tables and the primary and foreign keys in the fact table. Then develop the ETL processes
and go for the dimensions first.  If you know that the dimension has completeness
problems with the fact table that you will develop later (you talked with the system
developers, remember?), add UNDEFINED (UD key) record for the dimension table. Later,
when developing the fact table's ETL process, make Join with the dimension table and
check that the records' foreign key exists there. If not - change the key to UD. In
SSIS and Informatica (and I guess that also in other products I don't know, such as
DataStage) you can use Lookup instead of Joiner if the dimension table is less that
1G records. That will optimize the ETL process. After you developed all your ETLs,
run the dimension processes. After they finish (assuming everything went OK) run the
fact table's ETL process. If it succeeded you can go and have a drink. If not - check
what went wrong. If you want to know which keys didn't showed up in the dimension
table and causes the <span style="font-size: 12pt; line-height: 115%; font-family: &quot;Times New Roman&quot;,&quot;serif&quot;;">incompleteness </span>problem,
you can <b>disable</b> (not delete) the foreign key from the fact table and run the
process again. Then, with a simple SQL query, check which foreign keys don't exist
in the dimension table. Go back to your ETL design and check what you did wrong. As
I pointed before, in this step you might be very angry at the system developers...<br /><br />
That is all for now. As I said, I assume that more ideas will come on in the future.<br /><p></p><img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=01276594-f240-4611-940c-6626b856977c" /></body>
      <title>Tips for use when designing and building your DataWarehouse</title>
      <guid isPermaLink="false">http://www.miky-schreiber.com/Blog/PermaLink,guid,01276594-f240-4611-940c-6626b856977c.aspx</guid>
      <link>http://www.miky-schreiber.com/Blog/PermaLink,guid,01276594-f240-4611-940c-6626b856977c.aspx</link>
      <pubDate>Sat, 13 Oct 2007 22:11:49 GMT</pubDate>
      <description>I guess that this will not be my last post on this subject, but I want to start sharing some thoughts and tips from my experience when designing and building DWH. In this post I will focus on the fact &amp;amp; dimensions tables relationship in terms of data completeness (if you wonder what it is, read on).&lt;br&gt;
&lt;br&gt;
Before you start to design the DWH, sit and talk with the people who built the systems
which you take your data from, including the DBA. For every table, ask them what is
the primary key (it's NOT always defined properly in the DB), then ask them again
and then ask them if they are sure. It happened to me that I discovered that the systems
guys were wrong about their DB's primary keys.&lt;br&gt;
The same thing is about Foreign keys and here you should be even more careful. Even
if they claim so, check &lt;b&gt;yourself&lt;/b&gt; that every foreign key in the fact table is
placed correctly in the dimension table, &lt;b&gt;especially when the fact table has far
history records&lt;/b&gt;. Sometimes system developers or even worse - system DBAs delete
records from the dimension tables that are not relevant. This will cause that these
keys will still be in the fact's history records but will not be found in the dimension
table, causing uncomplete relationship between the fact and the dimension table.&lt;br&gt;
&lt;br&gt;
So far is about the part when you talk and "investigate" the system developers (the
DWH design). What to do when you actually developing the DWH? First, develop the tables
of your dimensions tables. &lt;b&gt;Do not forget&lt;/b&gt; to add the primary keys in the dimensions
tables and the primary and foreign keys in the fact table. Then develop the ETL processes
and go for the dimensions first.&amp;nbsp; If you know that the dimension has completeness
problems with the fact table that you will develop later (you talked with the system
developers, remember?), add UNDEFINED (UD key) record for the dimension table. Later,
when developing the fact table's ETL process, make Join with the dimension table and
check that the records' foreign key exists there. If not - change the key to UD. In
SSIS and Informatica (and I guess that also in other products I don't know, such as
DataStage) you can use Lookup instead of Joiner if the dimension table is less that
1G records. That will optimize the ETL process. After you developed all your ETLs,
run the dimension processes. After they finish (assuming everything went OK) run the
fact table's ETL process. If it succeeded you can go and have a drink. If not - check
what went wrong. If you want to know which keys didn't showed up in the dimension
table and causes the &lt;span style="font-size: 12pt; line-height: 115%; font-family: &amp;quot;Times New Roman&amp;quot;,&amp;quot;serif&amp;quot;;"&gt;incompleteness &lt;/span&gt;problem,
you can &lt;b&gt;disable&lt;/b&gt; (not delete) the foreign key from the fact table and run the
process again. Then, with a simple SQL query, check which foreign keys don't exist
in the dimension table. Go back to your ETL design and check what you did wrong. As
I pointed before, in this step you might be very angry at the system developers...&lt;br&gt;
&lt;br&gt;
That is all for now. As I said, I assume that more ideas will come on in the future.&lt;br&gt;
&lt;p&gt;
&lt;/p&gt;
&lt;img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=01276594-f240-4611-940c-6626b856977c" /&gt;</description>
      <comments>http://www.miky-schreiber.com/Blog/CommentView,guid,01276594-f240-4611-940c-6626b856977c.aspx</comments>
      <category>BI;BI/DataWarehousing;BI/Informatica;BI/SQL Server 2005/Integration Services</category>
    </item>
    <item>
      <trackback:ping>http://www.miky-schreiber.com/Blog/Trackback.aspx?guid=c9a3aa65-47b5-4115-b68a-df6f19c6b4a0</trackback:ping>
      <pingback:server>http://www.miky-schreiber.com/Blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.miky-schreiber.com/Blog/PermaLink,guid,c9a3aa65-47b5-4115-b68a-df6f19c6b4a0.aspx</pingback:target>
      <dc:creator>mikypuff</dc:creator>
      <wfw:comment>http://www.miky-schreiber.com/Blog/CommentView,guid,c9a3aa65-47b5-4115-b68a-df6f19c6b4a0.aspx</wfw:comment>
      <wfw:commentRss>http://www.miky-schreiber.com/Blog/SyndicationService.asmx/GetEntryCommentsRss?guid=c9a3aa65-47b5-4115-b68a-df6f19c6b4a0</wfw:commentRss>
      <slash:comments>3</slash:comments>
      <body xmlns="http://www.w3.org/1999/xhtml">My friend, Ilya, had a problem in SSIS.
He had a .csv file with too many commas. The meaning is that strings that started
and ended with inverted commas (") and had commas inside it were recognized by SSIS
as new column. For example, the row:<br />
"My name, is Miky", 200, 10 was recognised by SSIS as four columns instead of three.
Ilya wrote down a code for SSIS (in VB) that run before the package begin its work.
Here it is, hope it will help who ever seen this.<br /><br />
Imports System<br />
Imports System.Data<br />
Imports System.Math<br />
Imports Microsoft.SqlServer.Dts.Runtime<br />
Imports System.IO<br />
Imports System.Text<br />
Imports Microsoft.VisualBasic.FileIO<br />
Public Class ScriptMain<br />
Public Sub Main()<br />
  Dim csvFileFullPath As String<br />
  Dim tabFileFullPath As String<br />
  csvFileFullPath = Dts.Connections("Your CSV Connnection").ConnectionString<br />
  tabFileFullPath = Dts.Connections("Your Table Connection").ConnectionString<br />
  Using tabStreamWriter As New StreamWriter(tabFileFullPath, False, System.Text.Encoding.GetEncoding(1255))<br />
  Using csvFileReader As New StreamReader(csvFileFullPath, System.Text.Encoding.GetEncoding(1255),True)<br />
  Dim currentRow As String<br /><br />
  currentRow = csvFileReader.ReadLine()<br />
  tabStreamWriter.WriteLine(currentRow)<br />
  While Not csvFileReader.EndOfStream<br />
    Dim outputRow As New Text.StringBuilder()<br />
    Dim tmp, tmp1 as String<br />
    Dim offset as Int32 = 1<br />
    Dim beginS, endS As Int32<br /><br />
    beginS = 1<br />
    currentRow = csvFileReader.ReadLine()<br />
    beginS = InStr(offset, currentRow, """")<br />
    While Not beginS = 0 Or offset &gt; Len(currentRow)<br />
      endS = InStr(beginS+1, currentRow, """")<br />
      tmp = Mid(currentRow, beginS, endS - beginS)<br />
      tmp1 = Replace(tmp, ",", " ")<br />
      currentRow = Replace(currentRow, tmp, tmp1)<br />
      offset = endS + 1<br />
      beginS = InStr(offset, currentRow, """"")<br />
    End While<br />
    outputRow.Append(currentRow)<br />
    tabStreamWriter.WriteLine(outputRow.ToString())<br />
  End While<br />
End Using<br />
End Using<br />
Dts.TaskResult = Dts.Result.Success<br />
End Sub<br />
End Class<br /><br />
The solution here is to search for any comma (,) that is between two inverted commas
(") and replace it by space.<br />
Although it is a good solution, I would take another solution: Replace any comma by
special string, such as &amp;Miky&amp;, convert the csv file into table, and after
that go over that column(s) and replace any &amp;Miky&amp; by comma.<br /><br /><p></p><img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=c9a3aa65-47b5-4115-b68a-df6f19c6b4a0" /></body>
      <title>Fixing comma problem in CSV file in SSIS</title>
      <guid isPermaLink="false">http://www.miky-schreiber.com/Blog/PermaLink,guid,c9a3aa65-47b5-4115-b68a-df6f19c6b4a0.aspx</guid>
      <link>http://www.miky-schreiber.com/Blog/PermaLink,guid,c9a3aa65-47b5-4115-b68a-df6f19c6b4a0.aspx</link>
      <pubDate>Mon, 24 Sep 2007 09:31:56 GMT</pubDate>
      <description>My friend, Ilya, had a problem in SSIS. He had a .csv file with too
many commas. The meaning is that strings that started and ended with
inverted commas (") and had commas inside it were recognized by SSIS as
new column. For example, the row:&lt;br&gt;
"My name, is Miky", 200, 10 was recognised by SSIS as four columns instead of three.
Ilya wrote down a code for SSIS (in VB) that run before the package begin its work.
Here it is, hope it will help who ever seen this.&lt;br&gt;
&lt;br&gt;
Imports System&lt;br&gt;
Imports System.Data&lt;br&gt;
Imports System.Math&lt;br&gt;
Imports Microsoft.SqlServer.Dts.Runtime&lt;br&gt;
Imports System.IO&lt;br&gt;
Imports System.Text&lt;br&gt;
Imports Microsoft.VisualBasic.FileIO&lt;br&gt;
Public Class ScriptMain&lt;br&gt;
Public Sub Main()&lt;br&gt;
&amp;nbsp; Dim csvFileFullPath As String&lt;br&gt;
&amp;nbsp; Dim tabFileFullPath As String&lt;br&gt;
&amp;nbsp; csvFileFullPath = Dts.Connections("Your CSV Connnection").ConnectionString&lt;br&gt;
&amp;nbsp; tabFileFullPath = Dts.Connections("Your Table Connection").ConnectionString&lt;br&gt;
&amp;nbsp; Using tabStreamWriter As New StreamWriter(tabFileFullPath, False, System.Text.Encoding.GetEncoding(1255))&lt;br&gt;
&amp;nbsp; Using csvFileReader As New StreamReader(csvFileFullPath, System.Text.Encoding.GetEncoding(1255),True)&lt;br&gt;
&amp;nbsp; Dim currentRow As String&lt;br&gt;
&lt;br&gt;
&amp;nbsp; currentRow = csvFileReader.ReadLine()&lt;br&gt;
&amp;nbsp; tabStreamWriter.WriteLine(currentRow)&lt;br&gt;
&amp;nbsp; While Not csvFileReader.EndOfStream&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; Dim outputRow As New Text.StringBuilder()&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; Dim tmp, tmp1 as String&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; Dim offset as Int32 = 1&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; Dim beginS, endS As Int32&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; beginS = 1&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; currentRow = csvFileReader.ReadLine()&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; beginS = InStr(offset, currentRow, """")&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; While Not beginS = 0 Or offset &amp;gt; Len(currentRow)&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; endS = InStr(beginS+1, currentRow, """")&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; tmp = Mid(currentRow, beginS, endS - beginS)&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; tmp1 = Replace(tmp, ",", " ")&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; currentRow = Replace(currentRow, tmp, tmp1)&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; offset = endS + 1&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; beginS = InStr(offset, currentRow, """"")&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; End While&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; outputRow.Append(currentRow)&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; tabStreamWriter.WriteLine(outputRow.ToString())&lt;br&gt;
&amp;nbsp; End While&lt;br&gt;
End Using&lt;br&gt;
End Using&lt;br&gt;
Dts.TaskResult = Dts.Result.Success&lt;br&gt;
End Sub&lt;br&gt;
End Class&lt;br&gt;
&lt;br&gt;
The solution here is to search for any comma (,) that is between two inverted commas
(") and replace it by space.&lt;br&gt;
Although it is a good solution, I would take another solution: Replace any comma by
special string, such as &amp;amp;Miky&amp;amp;, convert the csv file into table, and after
that go over that column(s) and replace any &amp;amp;Miky&amp;amp; by comma.&lt;br&gt;
&lt;br&gt;
&lt;p&gt;
&lt;/p&gt;
&lt;img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=c9a3aa65-47b5-4115-b68a-df6f19c6b4a0" /&gt;</description>
      <comments>http://www.miky-schreiber.com/Blog/CommentView,guid,c9a3aa65-47b5-4115-b68a-df6f19c6b4a0.aspx</comments>
      <category>BI;BI/SQL Server 2005;BI/SQL Server 2005/Integration Services</category>
    </item>
    <item>
      <trackback:ping>http://www.miky-schreiber.com/Blog/Trackback.aspx?guid=a7fd1d19-64ca-493c-ba9b-9137583ff499</trackback:ping>
      <pingback:server>http://www.miky-schreiber.com/Blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.miky-schreiber.com/Blog/PermaLink,guid,a7fd1d19-64ca-493c-ba9b-9137583ff499.aspx</pingback:target>
      <dc:creator>mikypuff</dc:creator>
      <wfw:comment>http://www.miky-schreiber.com/Blog/CommentView,guid,a7fd1d19-64ca-493c-ba9b-9137583ff499.aspx</wfw:comment>
      <wfw:commentRss>http://www.miky-schreiber.com/Blog/SyndicationService.asmx/GetEntryCommentsRss?guid=a7fd1d19-64ca-493c-ba9b-9137583ff499</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">I was asked how to get to SSIS log to see
how much time took for the package to run.<br />
Well, that depends.<br /><u>On Development:</u><br />
When developing new package, after running the process (click on the green arrow or
press F5) there's a new tab called Progress. Clicking it will show you everything
about the package's execution, including the time it started and the time it finished.<br /><br /><u>On Production:</u><br />
When developing the package, open the SSIS menu (Yes, there is a menu called as the
product's name. Microsoft...) and click on Logging... There, you can define logs for
your package. You can log in many ways: Writing to SQL sever, output file, XML file
and more. I recommend logging into SQL server and logging only the big and "hard"
parts in your data flow. In the Details tab, pick up only the exceptional events,
such as onError, onTaskFailed and onWarning. If you wish to know how much time took
for you package to run, also pick up onProgress.<br /><br />
Follow <a href="ms-help://MS.VSCC.v80/MS.VSIPCC.v80/MS.SQLSVR.v9.en/extran9/html/6f60cf93-35dc-431c-908d-2049c4ab66ba.htm">this
link</a> to read about every event in SSIS.<br /><p></p><img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=a7fd1d19-64ca-493c-ba9b-9137583ff499" /></body>
      <title>SSIS log and execution time</title>
      <guid isPermaLink="false">http://www.miky-schreiber.com/Blog/PermaLink,guid,a7fd1d19-64ca-493c-ba9b-9137583ff499.aspx</guid>
      <link>http://www.miky-schreiber.com/Blog/PermaLink,guid,a7fd1d19-64ca-493c-ba9b-9137583ff499.aspx</link>
      <pubDate>Sun, 09 Sep 2007 20:09:23 GMT</pubDate>
      <description>I was asked how to get to SSIS log to see how much time took for the package to run.&lt;br&gt;
Well, that depends.&lt;br&gt;
&lt;u&gt;On Development:&lt;/u&gt;
&lt;br&gt;
When developing new package, after running the process (click on the green arrow or
press F5) there's a new tab called Progress. Clicking it will show you everything
about the package's execution, including the time it started and the time it finished.&lt;br&gt;
&lt;br&gt;
&lt;u&gt;On Production:&lt;/u&gt;
&lt;br&gt;
When developing the package, open the SSIS menu (Yes, there is a menu called as the
product's name. Microsoft...) and click on Logging... There, you can define logs for
your package. You can log in many ways: Writing to SQL sever, output file, XML file
and more. I recommend logging into SQL server and logging only the big and "hard"
parts in your data flow. In the Details tab, pick up only the exceptional events,
such as onError, onTaskFailed and onWarning. If you wish to know how much time took
for you package to run, also pick up onProgress.&lt;br&gt;
&lt;br&gt;
Follow &lt;a href="ms-help://MS.VSCC.v80/MS.VSIPCC.v80/MS.SQLSVR.v9.en/extran9/html/6f60cf93-35dc-431c-908d-2049c4ab66ba.htm"&gt;this
link&lt;/a&gt; to read about every event in SSIS.&lt;br&gt;
&lt;p&gt;
&lt;/p&gt;
&lt;img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=a7fd1d19-64ca-493c-ba9b-9137583ff499" /&gt;</description>
      <comments>http://www.miky-schreiber.com/Blog/CommentView,guid,a7fd1d19-64ca-493c-ba9b-9137583ff499.aspx</comments>
      <category>BI;BI/SQL Server 2005;BI/SQL Server 2005/Integration Services</category>
    </item>
    <item>
      <trackback:ping>http://www.miky-schreiber.com/Blog/Trackback.aspx?guid=a68059bf-fe59-48d5-a9e8-dd366e6a861c</trackback:ping>
      <pingback:server>http://www.miky-schreiber.com/Blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.miky-schreiber.com/Blog/PermaLink,guid,a68059bf-fe59-48d5-a9e8-dd366e6a861c.aspx</pingback:target>
      <dc:creator>mikypuff</dc:creator>
      <wfw:comment>http://www.miky-schreiber.com/Blog/CommentView,guid,a68059bf-fe59-48d5-a9e8-dd366e6a861c.aspx</wfw:comment>
      <wfw:commentRss>http://www.miky-schreiber.com/Blog/SyndicationService.asmx/GetEntryCommentsRss?guid=a68059bf-fe59-48d5-a9e8-dd366e6a861c</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">I never thought that I'll do a commercial
to Microsoft, but <a href="http://www.microsoft.com/sql/solutions/bi/projectreal.mspx">Project
Real</a> is a great thing that they did and they should get the credit for it. This
project is a full end-to-end BI solution, including ETLs (using SSIS), Analysis Services
cubes and mining modules, Reporting Services reports, end-user Panorama views and
more.<br /><br />
We work with Panorama as our main GUI tool to show our users the cube's data as tables,
charts, dashboards, etc, so this project is really helping us to learn how to implement
our project from the first ETL step all the way to the last Panorama step.<br /><br />
Recommended.<br /><p></p><img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=a68059bf-fe59-48d5-a9e8-dd366e6a861c" /></body>
      <title>Project Real</title>
      <guid isPermaLink="false">http://www.miky-schreiber.com/Blog/PermaLink,guid,a68059bf-fe59-48d5-a9e8-dd366e6a861c.aspx</guid>
      <link>http://www.miky-schreiber.com/Blog/PermaLink,guid,a68059bf-fe59-48d5-a9e8-dd366e6a861c.aspx</link>
      <pubDate>Sat, 14 Jul 2007 21:05:54 GMT</pubDate>
      <description>I never thought that I'll do a commercial to Microsoft, but &lt;a href="http://www.microsoft.com/sql/solutions/bi/projectreal.mspx"&gt;Project
Real&lt;/a&gt; is a great thing that they did and they should get the credit for it. This
project is a full end-to-end BI solution, including ETLs (using SSIS), Analysis Services
cubes and mining modules, Reporting Services reports, end-user Panorama views and
more.&lt;br&gt;
&lt;br&gt;
We work with Panorama as our main GUI tool to show our users the cube's data as tables,
charts, dashboards, etc, so this project is really helping us to learn how to implement
our project from the first ETL step all the way to the last Panorama step.&lt;br&gt;
&lt;br&gt;
Recommended.&lt;br&gt;
&lt;p&gt;
&lt;/p&gt;
&lt;img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=a68059bf-fe59-48d5-a9e8-dd366e6a861c" /&gt;</description>
      <comments>http://www.miky-schreiber.com/Blog/CommentView,guid,a68059bf-fe59-48d5-a9e8-dd366e6a861c.aspx</comments>
      <category>BI;BI/Panorama;BI/SQL Server 2005;BI/SQL Server 2005/Analysis Services;BI/SQL Server 2005/Integration Services</category>
    </item>
    <item>
      <trackback:ping>http://www.miky-schreiber.com/Blog/Trackback.aspx?guid=396609a8-c65f-4579-a5be-623c0c4953cf</trackback:ping>
      <pingback:server>http://www.miky-schreiber.com/Blog/pingback.aspx</pingback:server>
      <pingback:target>http://www.miky-schreiber.com/Blog/PermaLink,guid,396609a8-c65f-4579-a5be-623c0c4953cf.aspx</pingback:target>
      <dc:creator>mikypuff</dc:creator>
      <wfw:comment>http://www.miky-schreiber.com/Blog/CommentView,guid,396609a8-c65f-4579-a5be-623c0c4953cf.aspx</wfw:comment>
      <wfw:commentRss>http://www.miky-schreiber.com/Blog/SyndicationService.asmx/GetEntryCommentsRss?guid=396609a8-c65f-4579-a5be-623c0c4953cf</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>
Few months ago we were given an assignment to copy/move all our DTSs that were
running with SQL Server 2000 to the new SQL Server 2005 Integration Services (SSIS).
My friend Michael did it and wrote some important notes that he discovered when building
ETLs with Integration Services. I decided to list them here because they are important
and useful, especially for those who haven't got the time to develop with SSIS so
far.
</p>
        <ul>
          <li>
One of the greatest improvements in SSIS is that between the source and the destination
of the ETL process you can do many things, such as making new fields, sorting, converting
data types, union all between different sources, implement your logic on a field,
and much more. This is much easier than ever because all you need to do is to
add a block to the data flow task and define it for your purposes. 
</li>
          <li>
SSIS ships with a tool for migrating SQL Server 2000 DTSs. Do not use this tool. Sometimes
the result of the conversion is not good enough and in all cases you can't edit the
new migrated data task. 
</li>
          <li>
When making a connection to a non-Microsoft DB, such as Oracle, use OLE DB client
instead of the out-of-fashion ODBC. 
</li>
          <li>
When the destination field is shorter (string type) than the source, add a data conversion
block and cut the string. Otherwise, there will be an annoying warning even if the
truncation is wanted. 
</li>
          <li>
In many times (when working with non-Microsoft providers) the automatic recognition
of the length &amp; types of the source fields is not correct. Enter the source block
and edit these properties by yourself. 
</li>
          <li>
When moving a Unicode field (data type DT_WSTR) to a non-Unicode destination
field (DT_STR), a data conversion block is required. 
</li>
          <li>
SQL Server 2000 Stored Procedures will work in SSIS, but the Linked Servers definitions
are problematic. Consider another options rather than using linked servers. 
</li>
          <li>
When the source/destination is a CSV file, use Flat File Connection. But if it is
an Excel file (.xls), use a Microsoft Jet OLE DB connection and define the source
as OLE DB Source (yes, it will work with Excel files).</li>
        </ul>
        <p>
Again, thanks for Michael for making and sharing these notes.
</p>
        <img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=396609a8-c65f-4579-a5be-623c0c4953cf" />
      </body>
      <title>First meeting with Integration Services</title>
      <guid isPermaLink="false">http://www.miky-schreiber.com/Blog/PermaLink,guid,396609a8-c65f-4579-a5be-623c0c4953cf.aspx</guid>
      <link>http://www.miky-schreiber.com/Blog/PermaLink,guid,396609a8-c65f-4579-a5be-623c0c4953cf.aspx</link>
      <pubDate>Thu, 31 May 2007 20:17:10 GMT</pubDate>
      <description>&lt;p&gt;
Few months ago we were given an assignment to copy/move all&amp;nbsp;our DTSs that were
running with SQL Server 2000 to the new SQL Server 2005 Integration Services (SSIS).
My friend Michael did it and wrote some important notes that he discovered when building
ETLs with Integration Services. I decided to list them here because they are important
and useful, especially for those who haven't got the time to develop with SSIS so
far.
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
One of the greatest improvements in SSIS is that between the source and the destination
of the ETL process you can do many things, such as making new fields, sorting, converting
data types, union all between different sources, implement your logic on a field,
and much more. This is much&amp;nbsp;easier than ever because all you need to do is to
add a block to the data flow task and define it for your purposes. 
&lt;li&gt;
SSIS ships with a tool for migrating SQL Server 2000 DTSs. Do not use this tool. Sometimes
the result of the conversion is not good enough and in all cases you can't edit the
new migrated data task. 
&lt;li&gt;
When making a connection to a non-Microsoft DB, such as Oracle, use OLE DB client
instead of the out-of-fashion ODBC. 
&lt;li&gt;
When the destination field is shorter (string type) than the source, add a data conversion
block and cut the string. Otherwise, there will be an annoying warning even if the
truncation is wanted. 
&lt;li&gt;
In many times (when working with non-Microsoft providers) the automatic recognition
of the length &amp;amp; types of the source fields is not correct. Enter the source block
and edit these properties by yourself. 
&lt;li&gt;
When&amp;nbsp;moving a Unicode field (data type DT_WSTR) to a non-Unicode destination
field (DT_STR), a data conversion block is required. 
&lt;li&gt;
SQL Server 2000 Stored Procedures will work in SSIS, but the Linked Servers definitions
are problematic. Consider another options&amp;nbsp;rather than using linked servers. 
&lt;li&gt;
When the source/destination is a CSV file, use Flat File Connection. But if it is
an Excel file (.xls), use a Microsoft Jet OLE DB connection and define the source
as OLE DB Source&amp;nbsp;(yes, it will work with Excel files).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
Again, thanks for Michael for making and sharing these notes.
&lt;/p&gt;
&lt;img width="0" height="0" src="http://www.miky-schreiber.com/Blog/aggbug.ashx?id=396609a8-c65f-4579-a5be-623c0c4953cf" /&gt;</description>
      <comments>http://www.miky-schreiber.com/Blog/CommentView,guid,396609a8-c65f-4579-a5be-623c0c4953cf.aspx</comments>
      <category>BI;BI/SQL Server 2005;BI/SQL Server 2005/Integration Services</category>
    </item>
  </channel>
</rss>