Archive: December ’07

SnapShot Fact Tables vs Type 2 dimensions 20 Dec ’07

SnapShot fact tables are marvellous things for tracking changes over time.  I often use them instead of type 2 dimensions which are too hard to business people to understand (and a pain to maintain) - take the example of a customer dimension, where the business users have said that they want to track attribute(s) over time (for example a customer category).  If these attributes are fairly static then a type 2 dimension would be OK, but if it changes relatively frequently (more than once a year) or the business actually want to report on changes / trends over time then a snapshot fact table is often the better solution - in that case I would have a type 1 customer dimension, a customer snapshot fact table and a customer category dimension table.

The grain of the fact table would be each customer per snapshot date (perhaps monthly).  Easy to understand and very easy to track changes over time - think of the SQL to do a trend of customers by category over time using just a type 2 customer dimension - possible, but yuk!

Where this approach is not so attractive is if there are tens of millions of dimensional members (eg customers) or if the changes need to be tracked at a daily level - in this case the fact table might get unpalatably large quite quickly.  For example 5 years of history on 10 million customers leads to a fact table of 5 * 12 * 10mln = 600mln.  Not absurd, but you would want to use partitioning, and makesure that your fact table is pretty tight (just keys and measures).  If the fact table is likely to get too large then slowly changing logic in the fact or dim is going to save space (it will still be a pain to report on though).

It would be pretty easy to build up a snapshot fact table from scratch over time by adding the current snapshot once a month.  To populate the snapshot fact table grain from a historical record (eg from an ODS) using a start and end date I often use a manual join to the date dimension (add the dim_date_key to the stage table but mark it as a manual dimension join in the key properties), then use a join like the following in the procedure build:

FROM load_customer
JOIN dim_date
ON dim_date.calendar_date BETWEEN load_customer.effective_from_date AND load_customer.effective_to_date
AND dim_date.cal_day_in_month = 1

This ensures that I get a record in the fact table for each customer for each month using generated code.  In RED I have to change the source table in the column properties of the dim_date_key to be the date dimension to ensure the meta data is correct downstream for documentation and for building OLAP cubes.

Without buy in what are you building…your next legacy system? 12 Dec ’07

Couldn’t have said it better ourselves…WhereScape customer ASB Bank were quoted in the press this week talking about their data warehouse project at the bank.  It was really interesting to see them talking about how adding more and more data to the warehouse was not what their users needed, the data warehouse team needed to get closer to the business and make more relevant information available.  When this happened they delivered less than they thought they had to, but added more value.  This is a fantastic benefit of data warehouse prototyping - we are pleased WhereScape RED could help.

The full Computerworld article is here.

Getting on the analyst companies radar 10 Dec ’07

“Is WhereScape RED an ETL tool?” is a surprisingly hard question for us to answer.  We have literally hundreds of customers who use us to “form data marts and data warehouses”, part of the webopedia definition of ETL.  Yet we also have many customers who use WhereScape RED in conjunction with a traditional ETL tool. 

We could just say “yes” - people use us to build a data warehouse, so we must be an ETL tool, and we could conveniently ignore the fact that we can coexist and add value to ETL tool customers.  However there is another problem - we don’t operate in the same was as a traditional ETL tool, making comparisons difficult.  Customers buy WhereScape RED to build data warehouses quicker and to make management and changes easier, to manage the entire data warehouse lifecycle, not purely as an ETL tool.

Unfortunately for us this does not fit into one of the neat boxes (or four boxes) that the analyst companies use.  We were therefore pleased to be mentioned in both the Gartner Magic Quadrant for Data Integration Tools, 2007 and also IDCs Worldwide Data Warehouse Platform Tools 2006 report.

 Gartner put us in a category of:

..vendors that are sometimes considered by Gartner clients alongside those appearing in the Magic Quadrant when deployment needs are aligned with their specific capabilities and/or are newer market entrants with relevant capabilities.

IDC  included us in the “Other Selected Vendors” correctly identified as data warehouse lifecycle management software.

The pure play data movement and ETL categories are not for us, the ETL question is answered differently for different customers, but the in a world of discontinuous change the concept of building quickly and enabling change is universal and compelling.

Naturally Fresh 3 Dec ’07

A belated thank you to WhereScape customer Naturally Fresh for their product review of WhereScape RED in the July issue of DM Review. We really appreciate it when our customers take the time to do product reviews or case studies - it makes a huge difference to us, and prospective customers always like to read independent reviews.