Archive: June ’08

Thanks to ASB Bank for a great case study 30 Jun ’08

Our thanks to Rosalie and Yohan at ASB Bank for the great case study on their Human Resources data mart.  It was the sort of project our guys really enjoy working on, with committed and involved users.  When you are doing a prototype based build engagement is the critical factor.  It can be used to decrease cycle times, as Yohan said “We could iterate the warehouse daily, and immediately work around any roadblocks or new requirements that came up.”  Engagement also results in a better solution.  We find that when issues arise the developers and users can jointly attack them, together working through any compromises, changes in design or changes in approach.  The data warehouse development in this case was 30% ahead of schedule, a result that is also not unusual for prototype-and-iterate projects undertaken with committed users.

NZ Holidays Calculator 27 Jun ’08

I recently came across a useful Excel macro that calculates NZ holidays - I have adapted it to generate SQL to update our date dimension.  Feel free to contact us on support if you would like a copy.

Impact Analysis: What can fizzle while applying changes on the database and how to sustain a healthy, evolving data warehouse? 10 Jun ’08

Every action in the information database and warehouse in particular has a consequence. Database developers, business analysts and DBAs  feel agonizing pain when an organization makes changes to their data warehouse applications without having complete insight into the true impact and cost that a modification will have.

So, what is an impact analysis?  It is an identification of the consequences of change on an object to its related objects. Impact analysis, in case of a data warehouse, consists of:

• understanding of the processes performed within a data warehouse as a whole and its components;
• comprehension of interdependencies between processes and objects;
• assessing the effect the change brings to the data warehouse structure;
• developing recovery procedures in case of  a failed modification.

WhereScape RED provides a variety of needed mechanisms to identify a data warehouse’s weak spot when it comes to a change.

In the development process we have put our best effort into creating a perfect addition to our data warehouse. Now it’s time to deploy our changes to a higher environment. But how can we do so without crashing things that we had developed previously? How do we insure that the new modification will not destroy the integrity of the data warehouse instead of improving it? With WhereScape RED it’s easy.

Here are a few recommendations on how to avoid oopses and ouches:

1. Structural Impact Analysis.  i.e.  what objects have been changed.
Click “Tools” >  Click “Search for string…” > Fill out dialog box for string (e.g. either enter the procedure name being modified or the column being changed) > examine “Object search results” screen for all objects that contain this string.

For the structural-impact analysis you can use the Browser screen for “DataWarehouse” connection.
Click “Invoke SQL Admin” > execute something like this (SQL Server):

select distinct(a.name) from sys.procedures a inner join sys.syscomments b on a.object_id=b.id
where charindex(‘stored procedure or column’,b.text)<>0;
– Check load where clause too
select distinct(a.lt_table_name)from ws_load_tab a
where charindex(‘stored procedure or column’,a.lt_where_clause)<>0;

2. Evaluation of the Impact. i.e. what possible impact the data warehouse’s alteration renders on various data warehouse objects. WhereScape uses an array of pre-defined reports. Click the “Reports” tab from the main Browser menu. The list of available reports expands from release to release. There are about two dozen reports that are directly related to Impact Analysis. Some of them are:
•  “Object modification dates (without indexes)” with date created and date modified, same for “Index modification dates”;
• “Modified procedures” i.e. procedures that have been modified since creation;
• “Table objects that have no associated procedure or script” and “Procedures that are not related to a table”;
• “All objects that have been Refreshed or Imported”;
• “Differences between current and selected repository…”;
• “Query data warehouse objects… where custom validations scripts can be run”;
• “Track back on specified column usage…” and many others.

3. Impact during the Promotion from one environment to the other. e.g.  from DEV to TEST,  when managing such deployments using the “Build Application tables…” built-in utility. While the developer plunges through the wizard-like dialog boxes, WhereScape RED continuously warns him/her of the possible implications of this activity. E.g. if the particular object is to be overwritten, warning message pops up to alert the developer on the possible impact on the existing object.

4. Organizational Documentation Update. The considerable problem for a conventional data warehousing project is that database changes within data warehouse are not reflected in documentation: nobody has time to update them. WhereScape RED automatically generates a new documentation set after each iteration of data warehousing development.

5. Versioning of Metadata Objects. Several kinds of version management are utilized in WhereScape RED. For example, choosing the “Automated Version Creation” option from the Tools menu will thwart a developer from losing valuable DDL scripts, procedural code and other metadata and database information.

6. Standardization and Best Practice Usage.    WhereScape RED uses a state-of-the-art methodology in building warehousing prototypes. Following the developer’s guidelines delineated by the WhereScape RED methodology, wizards and the automatic generation of database objects, in combination with an enterprise standard “look and feel” dramatically decreases the chances of glitches in the data warehouse processing.

Loading Data Using RED 5 Jun ’08

The are several different ways to load data from files and source systems into the data warehouse load table using WhereScape RED.  Each load table type has different attributes, options and performance advantages in different situations.

RED automates loading data from files by either parsing files into columns as they’re loaded or by loading files into single column tables.  The following kinds of flat files:

  • fixed width
  • delimited
  • XML

The default is to have RED manage the code for loading file.  This is called a file load.  You may decide to generated a script and manage this yourself.  This is called a script load.

When load fixed width and delimited files, RED uses the load utilities in the target data warehouse database environment.  For SQL Server this is BULK insert.  In an Oracle data warehouse, SQL*Loader is used.  Teradata has several different loaders and RED can utilize fastload, multiload and TPT.  DB2 uses the LOAD statement.

All options available for these loaders are able to be used by RED.

Two options are available for XML.  The default is to allow RED to read and load the XML file itself.  Another option available for SQL Server data warehouses.  This involves using the XML OLEDB provider for SQL Server via a Script load.

RED also automates extracting and loading data from other databases into the data warehouse.  Again, there are several approaches available:

  • ODBC loads
  • Native ODBC loads
  • Database Link loads via a database link (Oracle warehouses only)
  • Database Link loads via a linked server (SQL Server warehouses only)
  • Externally Loaded load tables

For large volume loads from another database into the data warehouse, the fastest option is almost always the Native ODBC load.  The exceptions for this rule are:

  • In an Oracle data warehouse loading data from another oracle database, a database link load may be faster if one of the following is the case:
    • the two databases are connected via a really fast network
    • the two databases on the same physical server

For smaller volumes, you should always use database link loads for SQL Server->SQL Server and Oracle->Oracle load tables.  Similarly, for small volumes, loading data from a different kind of database, it may be better to use ODBC loads rather than Native ODBC loads.

The final option RED provides is Externally loaded load tables.  These provide integration with existing ETL or EAI infrastructure.  RED knows about the load table in its metadata and can build downstream objects from the load table.  RED does not load the load table.  It waits for the other system to load it.  Job integration is achieved using a combination of RED and the other products APIs.

Refer to the WhereScape RED User Guide for full information on all of the load types discussed.