Archive: November ’07

Gold 30 Nov ’07

WhereScape has been awarded Microsoft Gold Certified Partner status this week.  Gold Certification for us is based on WhereScape RED certification.  We have passed the “Verified for Windows Server 2003″ and “Certified for Windows Vista” tests as well as Plaform Tests for Windows Server and for SQL Server 2005.  We thought that gave us enough points, but when they were compiled it turned out we were 10 points short.  With customer references providing 2 points each, we asked 5 of our customers to provide references  - a big thanks to the customers who responded - we had the points we needed within a day (with the first reference coming back within 10 minutes).  We hope to use Gold Certification to make sure we provide the best support we can to our SQL Server customers - and are already looking at Windows Server 2008 and SQL Server 2008.

Sending Emails from a Job in SQL Server 2005 27 Nov ’07

WhereScape RED provides the ability to call a commandline program on successful completion or failure of a job.  The most common use of this command is to send email to system administrators and operations staff (who can do something about it).  In order to set this up we either use a command line email utility (like BLAT) or more commonly in SQL Server 2005 a batch program that sends an email from the database.

SQL Server 2005 introduced the ability to send emails via an SMTP server (SP_SEND_DBMAIL) rather than the MAPI client that SQL Server 2000 used (XP_SENDMAIL).  This means that it is easier and more robust to set up. 

To use database mail in this fashion:

1. Get a database administrator to  set up a mail profile on the server in SQL Management Studio

2. Put the attached SendMail.Bat program in a share that the scheduler can access (remembering that the scheduler runs as a windows account on windows).

3. Create the stored procedure ws_job_email from attached file in RED.

4. Edit your WhereScape RED job and enter the following into the OKay Command (edit according to bat location, email address and email subject line):

d:\temp\sendmail.bat localhost WSLWarehouse dbarrett@wherescape.com Process_Data_Mart_Success $JOB_SEQ$ 1

5. Similarly for the Failure command enter:

d:\temp\sendmail.bat localhost WSLWarehouse dbarrett@wherescape.com Process_Data_Mart_Failure $JOB_SEQ$ 0

WhereScape Green? 24 Nov ’07

One of WhereScape’s customers, ARTA (Auckland Regional Transport Authority), recently got a write up in ComputerWorld about their tender for an integrated ticketing system.  Intelligent public transport systems contribute to the concept of sustainable mobility - the energy consumption of buses (which are relied on by 80% of public transport users) is one third that of a car per passenger/km .  We look forward to following ARTA’s progress.

XML Data in the data warehouse 22 Nov ’07

Frequently we have to import data that is given to us in the form of XML documents.  WhereScape RED supports importing XML data directly into a load table provided that the XML file is well structured (it can be opened in IE) and is accompanied by an XML schema (XSD) file that describes the structure of the data - this schema file allows WhereScape RED to automatically create the load table structure. 

If the XML does not have a corresponding XSD schema file, or it is of another standard, or a file contains XML fragments then WhereScape RED can leverage SQL Server’s and Oracle’s XML data type support.  Once XML data is loaded into an XML data type then the built-in extensions to SQL can be used to query the XML data. 

Although XML data can be kept in an XML data type, queried, and  even indexed it is usually much easier to shred the XML into relational tables as in most people’s opinion XML is hard to read,verbose and requires XML functions to query.  XML documents are built up of elements and attributes.  Elements can be nested (like multi-value fields).  In SQL  Server you can use column transforms using the value function to access element attributes eg

 XMLData.value(’(row/c2)[1]‘,’VARCHAR(255)’)

This example accesses the attribute called c2 of the row element and converts it to a varchar datatype.

 To access nested elements the nodes function can be used in the FROM clause, for example:

FROM load_t24
CROSS APPLY load_t24.XMLData.nodes(’row/c90′) as T(C)

This example splits the nested elements of the c90 element into a collection of rows.

Combinations of the value, query, and nodes functions can be used in column transformations and FROM clauses in WhereScape RED to shred any XML structures into database tables without resorting to custom code.

What, when, why and WhereScape 22 Nov ’07

We got a nice write up in the NZ Herald on Friday.  Journalists have a hard enough time explaining what data warehousing is to tech readers, but this piece was in the business rather than the tech section.   Wikipedia defines a data warehouse as:

A data warehouse is the main repository of an organization’s historical data, its corporate memory. It contains the raw material for management’s decision support system. The critical factor leading to the use of a data warehouse is that a data analyst can perform complex queries and analysis, such as data mining, on the information without slowing down the operational systems.

The classic definition I learned is Bill Inmon’s:

Subject-oriented 
The data in the database is organized so that all the data elements relating to the same real-world event or object are linked together;
Time-variant 
The changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time;
Non-volatile 
Data in the database is never over-written or deleted - once committed, the data is static, read-only, but retained for future reporting; and
Integrated 
The database contains data from most or all of an organization’s operational applications, and that this data is made consistent.

For non technical audiences I talk about how business decisions are much easier and less risky if you have all the relevant information you need at your fingertips at the time you make the decision and in a format that you can actually use it.  I then explain that the place you store that information is called a data warehouse, and that is what our software builds.

I do use the plumbing analogy used in the article - but I don’t remember ever using the line about locating the dishwasher in the lounge.  But hey - if that is how a user wants their data presented who are we to say no.

Fast Movers, Clever Disruptors 13 Nov ’07

WhereScape Software has been picked by CIO Magazine as a rising star.  We would like to thank them for recognizing us, but we may have to change our name if they insist on listing in alphabetical order.   It is because of the hard work of our customers and partners, who share our vision that the best data warehouses in the world will be built using WhereScape RED, that we have received this recognition as well as being selected as a Red Herring 100 winner.  Thanks.

The complete list of Rising Stars is here, and features companies fron the US, China, New Zealand and Australia:

Rising Stars 25
This section of Strategic 100 is an early flag flyer of companies that are already impressing savvy CIOs, who locally and internationally keep a tab on cutting-edge companies set to make a difference in ICT.
1. Agitar
2. Attensity
3. CargoWise
4. Cognizant
5. FX Networks
6. GFG Group
7. Holocentric
8. Huawei Technologies
9. Infohrm Group
10. iVistra Technology
11. K2
12. M-Com
13. Netcat
14. NetSuite
15. Orbeon
16. Orchestria
17. Pentaho
18. Sonar6
19. TrustDefender
20. Unica
21. Univa UD
22. Vocera Mobile
23. Wherescape
24. Xterprise
25. ZettaCore

Master Data Management and Data Quality 5 Nov ’07

One of our clients is researching a new initiative where key master data is going to be sourced from multiple systems, used in the data warehouse and also used by other systems.  The ETL tool vendors all seem to be releasing data quality and master data management tools.  Given that WhereScape is a data warehouse software company I think they were expecting us to make a land grab and make a “do it all in the data warehouse / do it all in WhereScape RED” type recomendation.

One of the things we provided them was a sidebar from our new prototype and iterate while paper.  It is more a position on data quality, but does also cover my view on master data management.  I have included the extract (slightly edited) here:

Data Quality And Data Noise

Today, with the rise of master data management and other corporate data quality disciplines, the mentality that data quality is a problem to be solved as part of a data warehousing project is in decline, and the integration of data quality facilities into ETL processes is viewed - as we believe it should be - with increasing amounts of skepticism.Nevertheless, during prototype and iterate sessions data warehousing practitioners frequently uncover dirty data - often where data is presumed to be neat and clean - and still struggle with the normal, but dangerous, tendency to want to solve data quality problems as part of the data warehousing project.Experienced WhereScape practitioners report that a few simple rules are usually sufficient to keep data warehousing prototypes from veering into master data management or data governance areas:

                      Data quality problems should be solved at the source: the poorly-designed transactional system that introduces the data noise, or the broken business process that the transactional system in question supports.

                      Data warehouses should always reconcile to their transactional sources exactly, to promote user confidence and ensure that there is no decision gap between user communities using the data warehouse and those using operational reports from the transactional system (which persist, despite efforts to the contrary).

                      Cleansing and normalization of business dimensions is a master data management problem, not a data warehousing problem. Data warehouses that include dimensional values that are “regularized” or “cleansed” may be functioning as de facto master data management repositories, and that is not their role. If master data management discipline is required, projects to implement master data management should be undertaken separately from the data warehousing project(s) at hand.

                      Data warehouses and marts are downstream from master data management repositories, not upstream from them. That is, warehouses and marts make use of canonical dimensions in MDM repositories; they do not supply those canonical dimensions.

                      Users are the best judge of acceptable levels of data noise. Except in those relatively rare situations in which a data warehousing or data marting project covers a subject area or data set with which the end-user constituency has no prior familiarity, end-users are already compensating for dirty data in their analyses, and will continue to do so without much complaint in most cases.

Clarifying project objectives with project sponsors is of course critical. In the view of experienced WhereScape practitioners, data warehousing projects are better used to highlight issues with data quality than they are to solve those issues, and knowing whether project sponsors intend to spend project budget to resolve transactional source system hygiene - as well as knowing whether end-users are prepared to live with ‘noisy data’ while those source system problems are fixed -  is critically important to overall project success.

Interstates and inner city roads 1 Nov ’07

We applaud the ETL vendors - they have finally recognized that databases such as Teradata are actually pretty good at processing data.  Unfortunately the ETL vendors are still missing a critical point.  Source systems are relatively constant - how often does an airline change its reservation system or a bank change its core banking platform? - but user requirements are not.  Rapid and discontinuous business growth absolutely mandates a flexible data warehouse that can be rolled out fast and easily enhanced.  As you move away from the source systems (the sweet spot of extract, transform and load tools) and into the data warehouse, especially the presentation layer, the rules change.  Users want functionality now and changes made this afternoon.  Keeping the data warehouse relevant is all about speed of delivery and the ability to rapidly make change.  Heavy lifting tools are less relevant in this world.  The difference is the same as interstate highways and inner city roads.  Interstates are (relatively) stable and (hopefully) well planned. Repairs and maintenance are carried out after hours by teams of professionals to minimize disruptions.  Conversely inner city roads seem to be in a constant state of upheaval.  They are always being dug up, traffic is held up, and delays constant.  A different set of tools are required to handle this volatility.  As data moves closer to the user a rapid development becomes more important, and the capabilities of data movement tools less relevant.  So, while ETL tool vendors have acknowledged that processing in the database is the right thing to do, they are still no closer to addressing the problems solved by rapid data warehouse development environments.