Archive: January ’08

Re-targeting source tables, the forgotten WhereScape RED functionality 31 Jan ’08

I sometimes need to modify one of my dimension, fact or model tables and re-target the source table in the field definitions.

It can be because the source name has changed or because it needs to be replaced by another table upstream. I must admit, I used to change them manually. It was pretty painful. Result: thousands of copy/pastes to do and the guarantee that I would forget a few fields. This was leading to endless procedure re-generations before it successfully compiled. When the table was holding a huge number of fields, this could rapidly turn into a nightmare.

It turns out that you don’t actually need to pass it to your trainee or your neighbor, or even try to do tricky things in the WhereScape RED metadata tables. A brilliant WhereScape RED option is ready to give you a hand. All you have to do is right click on your table in the left pane of WhereScape RED. Then, choose “re-target source”.

A dialog box will appear. The target table obviously is the table to modify. You need to set the primary source table, i.e. the source you want to modify. The two other dropdown lists will serve to map fields from target and new source tables.

Two modes are basically available. First one is “Auto Match”; you rely on WhereScape RED to find in the specified new source table the corresponded fields. For this mode, column names from new source and target must be the same. Click on the “Auto Match” button, press “OK”, the job is done!

The second option is do it manually. For every column of your table you can go and pick the new source from any table / field available in the metadata. This takes a bit longer but is very useful when the name of the column is to be changed over the two tables. Column renaming should nevertheless rarely occur over fact, dimension or model population.

Let’s conclude by a warning. This option will be very efficient but won’t replace source tables and fields in any transformation definition of the target. This is something that might be changed in a later version but generally, transformations are unlikely to be done anyway when populating fact, dimension or model tables.

Cube partitioning is good for your health 14 Jan ’08

When the data volume is getting big, salvation always comes from partitioning. Cubes are no exceptions. I am always amazed to see how it can both simplify business user’s life and mine. Simply defining partitions by periods of time radically shortens the loading duration as well as improving performance reporting. 

Let’s take an example: we want to build a basic sales cube holding 8 years of history. Data is changing nightly on the current year. We naturally do not want to rebuild the full cube every night and the users won’t want to stand around waiting for full scan queries. Partitioning by year will allow us to only rebuild the current year nightly. Trends and comparisons between periods will be a lot faster. Users, who will generally focus on current sales, will appreciate the performance. Less than 1/8 of the cube needs to be parsed each time they’re accessing current year’s data. Everybody wins.

WhereScape RED has been designed to be a great help to create your multi-partitioned cube. But before letting the magic happens, it’s all about finding a good partition key. Logically, the natural choice comes to a time period but not always. The choice really depends on data volume, data organization and history limits.

Using WhereScape Red, we have 3 options for partitioning. The list box “Partitioning method” is located in the properties of the cube, under the “Cube Processing / Partition”. This tab contains all related information. From there, it is pretty easy to define the way we populate the cube’s data as well as its dimensions, depending on the chosen method of partitioning.

Every cube possesses at least one partition. The first option, for small cubes, is to keep a single partition. This works when the data volume is not huge and the cube doesn’t need tuning in the first place. The population option in that case would logically be the full refresh mode.

The second one allows us to manually create partitions. For that, select “Manually managed multiple partitions” in the “partitioning method” dropdown list. Define the other parameters then exit the properties. By right clicking on the cube in the left pane, we choose “Add partition”. A dialog box appears with all the new partition attributes. We just have to fill the blanks. This option is handy if we want to create for instance a partition on the last few years only and keep another big one for all the previous history.

The third one is really powerful; you can set the partitioning to be automatically done. This option is available if the cube is to be partitioned by one numeric value. It should preferably be on a time period value like day, month, quarter or year. Once the key is defined, WhereScape RED will create all necessary partitions for us. By using the next options in the tab, we are able to optimize the query made by WhereScape RED to check if a new partition needs to be created as well as we are able to limit or not the number of partitions. The last option is very useful; we can ask WhereScape RED to populate only the last x partitions during the normal periodic process (daily in our example). It becomes really easy to replace some partitions and recreate only parts of the cube.

As a conclusion, partitioned cubes combined with a well structured dataflow upstream and an efficient fact table population strategy (such as fact table partitioning…) will ensure performance on the whole chain of data warehousing. All the users will see is their cubes ready when they arrive in the morning instead having to wait for mid-afternoon, and better performance – nice…

Innovation 10 Jan ’08

I was asked by the renowned entrepreneur Ken Morse of the MIT Entrepreneurship Center to be a judge at an entrepreneurship  course he ran this week.  The concept was that during the cocktail function at the end of the first day the participants had to find judges and give them their elevator pitches (55 seconds), we then scored them on whether we would give them a meeting as well as various other criteria.  It is a surprisingly hard thing to do - I know having had to do it myself while I was on the  Entrepreneurship Development Program at MIT Sloan last year.  Run away winner was Martyn Levy from RoamAD pitching his wireless networking platform.  Overall there was an amazing array of talent at the event, and some fantastic businesses in the making and already winning customers. 

Well done HP 8 Jan ’08

WhereScape has for a long time been involved in HP’s Developer and Solution Partner Program (DSPP), was awarded HP Elite ISV status in 2006 (we still don’t know what that means),  and are pleased to have a large number of very happy customers running on HP hardware.  We want to offer our congratulations to HP for the expansion in their business intelligence customer base (see their press release  announcing a 50% increase).

Particularly pleasing for us was to see that a quarter of the projects listed were completed using WhereScape RED.

Well done HP and may that growth continue in 2008.