Archive: August ’08

Live from Morgo 29 Aug ’08

Morgo is a two day conference for hi tech entrepreneurs from New Zealand held in the picturesque Bay of Islands.  The official theme this year was “Partnering for Success” but some of the strongest presentations were on giving back rather than partnering.

Selwyn Pellet, on the board of AIM listed Endace and now with Imarda talked about an entrepreneurship apprenticeship he is looking at setting up.  Born from the idea that you need entrepreneurs to grow economies he is looking at program that apprentices promising gen-ys (which he acknowledges may be a contradiction) with dark side CEOs – he is looking for sponsorship so the youth can fully participate with the CEO – joining them in all meetings, travel etc.  Sounds to me like a great idea and I would be in.

Scott Farquar from Atlassian (one half of the Australian Entrepreneurs of the year a couple of years ago) talked about www.kiva.com as well as Atlassian’s 1% initiative – giving 1% of revenue, equity and time into the Atlassian Foundation.  They also offer non-for-profit licences, and he offered the details from his web site if anyone wants to copy the idea.

Other highlights of the conference were Ben Anderson from CT Partners on the talent war and Andy Lark from Dell using an Olympics medal metaphor to discuss New Zealand’s relative performance to other countries. 

Glenn Martin’s talk about his recently released jet pack was highly engaging.  After spending 27 years working on a dream he was justifiably proud of his invention – and there is no denying flying with a jet pack is pretty cool.  Unfortunately the pack didn’t arrive in time so we were not able to get a demo.  Didier Elzinga, CEO from Rising Sun Pictures, gave a great talk about working with Hollywood from Adelaide, and how for the people he does business with if you outside their zip code you might as well be in Adelaide.

The Honorable David Cunliffe launched the Government’s Digital 2.0 Strategy at the conference.  His comparison of politicians to entrepreneurs was a bit of a stretch – I don’t know of too many politcos who have mortgaged their homes and borrowed money off their friends to go into Parliament.  I was a little surprised the Digital 2.0 targets were not a bit more ambitious. They are throwing some money at making sure the entire country can get broadband which is great, but surely we can plan for a bit faster speed especially in the urban areas?

Overall a great couple of days. As usual it is the informal discussions that are as useful as the actual content, it is great to catch up with a bunch of people all trying to make a difference, and a conference without people trying to sell you stuff is always refreshing.  Thanks to the Morgo team for organizing it and inviting me.

Dimension Types: What are the different dimension types and how they are implemented in WhereScape RED? 22 Aug ’08

RED operates with dimension types that are recognized by modern data warehouse theory. When a developer starts working on dimension tables he gets to chose from four options for the default generation of the dimension table and its update procedure via the following dialog box: 

 

A Normal Dimension is where a dimension record is updated and changed whenever any of the non-business key information changes.   This is commonly called a type 1 dimension and is used when historical values of the change are not kept. 

Overwriting the old values.  A Normal dimension type is the most common dimension type. RED auto-generates a stored procedure to handle update and insert paths. Consider this simple example. The “Product” table contains a product named, “Thingamabob” with Product ID being the primary key. In the year 2007, the price of Thingamabob was $150 and over the time, the product’s price changes from $150 to $350. 

In the year 2007, if the price of the product changes to $250, then the old values of the columns “Year” and “Product Price” have to be updated and replaced with the new values. In this Normal Type, there is no way to find out the old value of the product “Thingamabob” in year 2006 since the table now contains only the new price and year information. 

The logic that WhereScape RED applies to the stored procedure handling the Normal dimension type works as follows:

- The stored procedure gets all unknown records from the dim table. If an unknown dim record does not exist, then the procedure allows an explicit value to be inserted into IDENTITY field

- The procedure loops through all the input records, attempting to update the dimension records based on the business key value. If the update attempt fails, then the procedure performs an insert. 

A Slowly Changing dimension is where new dimension records are created when certain identified columns in the dimension change.   This is commonly called a type 2 dimension and tracks point in time historical changes. 

Creating another added record. In this type, the old values will not be updated but a new row containing the new values will be inserted in the product table. Consequently, at any point in time, the difference between the old values and new values can be retrieved and easily compared. This can be very useful for reporting purposes. 

The problem with the above mentioned data structure is the “Product ID” column cannot store duplicate values for “Thingamabob” since “Product ID” is the business /primary key. Also, the current data structure doesn’t clearly specify the effective date and expiration date of the product when the change to its price happened. So, it would be better to change the current data structure to overcome the above primary key violation. 

 

In the changed Product table’s Data structure, “Product ID” and “Effective Datetime” are now a composite business / primary key. Therefore, there will be no violation of the primary key constraint. Addition of the new columns, “Effective Datetime” and “Expiration Datetime” provides the information about the Thingamabob’s Effective date and Expiration date adding more clarity and noticeably improving the scope of this table. The slowly changing dimension (SCD), type 2 approach, may need added space in the database, since for every changed record; an extra row has to be stored. Since dimensions are usually not that big in the real world, extra space is negligible. 

RED performs the following additional tasks in the stored procedures to handle Slowly Changing Dimensions (SCD), type 2 dimensions.

- Three new special variables are added: insert indicator, version indicator and version number.

- Four extra columns are added to support slowly changing dimensions: current_flag; version; start_date; end_date.

- RED automatically generates an alter table SQL script, based on type. It also adds a unique, business key index to support the slowly changing columns.

- In the update stored procedure loop, if the record is not found, the procedure inserts a new record and sets the insert flag on. Otherwise, the procedure checks to see if we have an existing dimension row and have just had a field change for data that is not part of the slowly changing data that we are interested in and performs an update. If there is an existing dimension row and it verifies that any of slowly-changing fields have changed. The procedure then updates the version indicator and version number, and inserts a new row. 

A Previous Values dimension commonly called a type 3 dimension, allows the storing of the last value of selected fields in secondary columns.

Creating new fields. In this dimension type, the latest update to the changed values can be seen. The example shown below illustrates how to add new columns and keep track of the changes. From the example, we can see the current price and the previous price of the product, “Thingamabob”. 

The problem with the Previous Values approach is over time if the product price has multiple changes; the complete history may not be stored, only the latest change will be stored. For instance, in year 2008, if the Thingamabob’s price changes to $350, then we would not be able to see the complete history of 2007 prices, since the old values would have been updated with 2008 product information. 

- RED guides you through the process of adding secondary columns. 

A Date Ranged dimension supports source systems that provide date ranged information. It rolls over time based on specified columns and date range and the history is kept. Otherwise, it looks just like the Slowly Changing dimension type. The difference in that the date range dimension is valid for a given range or set of dates on specified columns. 

The following example might be appropriate here.  Thingamabobs are on sale for the date range 1/1/08 – 1/31/08 at $150 and on sale for the date range 2/1/08 - 2/29/08 for $200.  So the dimensional records would be for 1/1/08…1/31/08, 2/1/08…2/29/08 and 3/1/08 to present each range would be represented by a date range dimension key and the product on sale for that dimension key could be reported on.   

- RED will guide you through the process of identifying which column(s) will be date ranged columns, i.e. “from” datetime column followed by the “until” datetime column. E.g. such “from” column could be “Effective Datetime” and “until” column - Expiration Datetime  

Note: RED is using slightly different naming convention.

The above table will look like this:

(*)          = (Business Key)

(**)       = (Primary Artificial Key)

(+)          = (Slowly Changing Business Key) 

A Get Dimension Key Stored Procedure 

The ‘Get Key’ stored procedure is auto-generated by RED to return an artificial key when a business key is supplied. The normal syntax is to call the function (usually from update_stage_% stored procedure) passing the business key and be returned a status indicating the result of the lookup.  

For example:

EXEC   @v_return_status = get_dim_product_key 

      @v_stage_product     

     ,’N’ — Auto add flag     

     ,’N’ — Write error trail entry if not found flag     

     ,@p_sequence

     ,@p_job_name

     ,@p_task_name

     ,@p_job_id

     ,@p_task_id     

     ,@v_stage_product_key OUTPUT     

     ,@v_getkey_status OUTPUT

On a successful lookup, the artificial key for the dimension record is also returned. If the lookup fails because the business key is not found in the dimension then an appropriate status is returned and the artificial key is set to 0. 

The main logic as follows:

- Lookup the key: If dss_end_date is NULL, then we are just looking up the current record. If not null then we are trying to match to a specific slowly changing record, so look through all of the records until finding the appropriate one.

- The procedure then loops through all the old keys looking for a match on the date. If key is found then the procedure sets the key and the flag.

- If no match found then the procedure gets the current version.

- If a dimension key is not found, then the procedure will add it if we have the auto_add flag set.

- If requested (auto add log message), the procedure will log a message about the auto add and log a message about the lookup failure.

Live from TDWI 2008 in San Diego 22 Aug ’08

I’m soaking up sunshine on the terrace outside the annual TDWI World Conference in San Diego.   Pelicans are diving into the harbor just a few yards away.  Must remember to add tube of expensive French sunblock to expense report.  This is a great town for a conference.  Sales Veep Mark Budzinski and I walked to the Gaslamp district for dinner last night and it was hopping.  Restaurants and bars full of people and a big private party with a live band that took up a whole street.  Note to Michael: I’ve got the details if we want to do something similar for next year’s conference.  Maybe we can book Flight of the Conchords.

The exhibition hall has been mobbed and Mark, Raphael and I are talking non-stop to people who want to know about RED.  The new booth is really drawing them with the bright graphics and hard-to-ignore message: save TIME and MONEY.  With budgets tight, folks are really interested in what RED can do for them.

There’s also a growing buzz here about using Agile (or RAD) development methodologies for data warehouse development.  Agile methods are old hat, of course,  to Wherescape and our community of RED fanatics.  Six months and a zillion dollars to gather user requirements that will be outdated before the first draft is even written?  No thanks, mister multi-national tasseled-loafer consulting company.   How about we just GET IT DONE using RED?  Develop right alongside the users so they can show us exactly what they want and so we can respond quickly as the business evolves.  

This idea is catching on throughout the DW/BI community and several folks came up to talk about RED as enabling technology for Agile projects.  Stay tuned for more developments around Agile methodologies, including a White Paper and ultra-secret Project X.

It’s not all sunshine and carrot cake here in San Diego, though.   We get a little punchy after a couple of hours of talking about data warehousing.  Here’s a conversation that took place late yesterday afternoon between me and Raphael Klebanov, Wherescape’s main technical brain in Denver.  Raphael is originally from Russia, so you have to apply the correct accent to his lines:

Raphael:  I am sorry, Vince, I have maybe stupid question.

Vince:  There are no stupid questions, Raphael.  How can I help you?

Raphael: It is “TDWI”.  I have figured out that “DWI” stands for “Data Warehouse Institute”.  But what does “T” stand for?

Vince: You were right before.  That is a stupid question.  It stands for “The”.

Raphael: The what?

Vince: The Data Warehouse Institute. TDWI.

Raphael: I have that already.  This is Data Warehouse Institute.  But what does “T” stand for?

Vince: ”The”!

Raphael: The what?  This is what I want to know.  If it stands for something I would like to know what it is.

Vince: It stands for THE.  Just that word.

Raphael: What word?

Vince: The word THE!  THE DATA WAREHOUSE INSTITUTE!  TDWI!  That’s what it stands for!

Raphael: Yes, I know it stands for that.  But the “T” I don’t understand.

Vince: [turning to address the waiter who was clearing our coffee cups] Sir, could I tell you about how Wherescape RED can save you time and money on your next data warehouse project?

That’s it for now, live from TDWI 2008 in San Diego.  A big shout-out to our homeboy Sid from Sid Adelman and Associates;  Larissa Moss from Method Focus; Grace from Tagzfly Solutions in sunny Kingston, Jamaica; the BI crew from P.F. Changs (nice work snapping up all those raffle prizes);  and our booth neighbors Pervasive software, who were out the door and on the trolley to Tijuana as soon as the exhibit hall doors closed.  Don’t bother calling Pervasive tech support tomorrow. Those boys are going to be sportin’  major headaches, and maybe even a few new tattoos. 

Simpler, Faster Business Intelligence Webinar 21 Aug ’08

Join CIBER, WhereScape, and Strategy Companion for a complimentary one-hour webinar to learn how you can launch a business intelligence / data warehouse initiative without an overwhelming investment in time, tools, and technologists.

The webinar will take place on Wednesday 10th September Noon Eastern / 9.00am Pacific Time.  

Registration is free and can be completed here.

BI on a Shoestring 21 Aug ’08

When your software enables rapid building, eases change and simplifies management, one of the really nice things is that the cost for our customers of building and maintaining  data warehouses comes tumbling down.  While this is good for everyone in this economy, it is especially valuable in the mid and departmental markets where budgets are traditionally tight. 

Doing more with less is the theme for our next seminar, “Business Intelligence on a Shoestring” to be held in Portland, Oregon on Tuesday 9th September.  If you are in the area check us out.  One of the things we will be doing is building, live and in real time, a working data warehouse from scratch during the seminar.  For more information or to register see the seminar invite.