Archive: Technology

Is there a new Quantum Leap in Data Warehousing and Business Intelligence on the Horizon? 19 Apr ’13

A few years ago I saw a woman in the house opposite doing some strange movements. I thought she must be on drugs or seriously ill, until a colleague of mine explained she is playing Wii. I am not a gamer, but I could not imagine that this kind of movement control is a huge improvement in controlling actions on your computer outside of the gaming scene. But now a new controller is coming on the market that has the potential to become a complete game changer in the human interface device market.  Leap Motion™ [1] is a completely new controller device that allows you to control your computer by doing gestures in the air.  You put a small box on the desk and you can use both hands to control your computer in 3 dimensions without touching anything. According to Leap Motion™ the device is “up to 200 times more sensitive than existing motion-control technology” and recognizes your individual fingers [2]. Leap Motion™ announced the start of sale for their product to be in May.  In contrast to Microsoft, Leap Motion™ welcomes hacks of their device and has given prototypes to the developer community that resulted in impressive basic applications as you can see on youtube.com.

Although designed to change the way of web browsing, designing and gaming, this new device could also revolutionize data warehouse and business intelligence applications. Research increasingly emphasizes the importance of interaction for the understanding and analysis of patterns especially in complex data, e.g. [3, 4, 5]. Natural movements remove the barriers between users and abstract data. This change can not only be helpful to create more natural representations of data which, for instance, might help to interpret Big Data in a completely new manner, but it would also help to create the collection of the data in a completely new and more efficient way. Imagine an analytical analysis tool where you can drill to detail, slice and dice with both hands. Imagine building your data warehouse by dragging and dropping the tables in WhereScape RED with both hands. Imagine analysing unknown source systems by flicking with your fingers through tables and data in WhereScape 3D. Agile development and agility will get a new meaning with this new device.

It seems we can just glimpse the dawn of a new revolution in data warehousing and we at WhereScape are looking forward to bringing this future into the present.

 

 

References:

[1] https://www.leapmotion.com/

[2] https://www.leapmotion.com/product

[3] Bongshin Lee, Petra Isenberg, Nathalie Henry Riche, and Sheelagh Carpendale, “Beyond Mouse and Keyboard: Expanding Design, Considerations for Information Visualization Interactions”, http://research.microsoft.com/en-us/um/redmond/groups/cue/publications/TVCG2012-NaturalInfoVis.pdf

[4] Ji Soo Yi ; Youn ah Kang ; Stasko, J.T. ; Jacko, J.A., “Toward a Deeper Understanding of the Role of Interaction in Information Visualization”, Visualization and Computer Graphics, IEEE Transactions on 13 (6), pp 1224-1231, 2007

[5] Correa, C. ; Silver, D. ; Chen, M.,” Illustrative Deformation for Data Exploration”, Visualization and Computer Graphics, IEEE Transactions on 13 (6), pp 1320-1327, 2007

Novopay: A Classic IT System Failure 3 Apr ’13

If you say the words Novopay in New Zealand everyone knows what you are talking about – a classic failure of an IT system.  Novopay is an end to end payroll service provided to the New Zealand Ministry of Education by an Australian firm called Talent2. It is the most complex payroll in the country, paying 110,000 people employed in 2500 schools and covers 15 collective agreements, but has been beset by problems since going live in August 2012.

Being a government project there is a fascinating amount of detail available if you want to witness a train wreck in slow motion.  The risk, issues and monitoring reports are all available online.

To give some idea of the scale and implications of continued problems the New Zealand Government has created the position of Minister Responsible for Novapay. They have given this (potential hospital pass) to their trusted fix-up person, Hon. Steven Joyce.

The disaster has created a feeding frenzy for the consultants – Deloitte have undertaken a technical review, PwC have been engaged to report on individual pay periods, and a new dedicated Education Sector Payroll Services business unit (headed by former Inland Revenue Deputy Commissioner and KiwiSaver Programme Director Cathy Magiannis) has been established in the Ministry of Education to co-ordinate the Remediation Plan.

I can’t offer any specific insight into payroll systems and have no experience in the area (although I do actually recall visiting Concept Systems the original developers of the Alesco system at the heart of Novopay many many years ago).   However I have had involvement with hundreds of IT projects over the years.

To me this looks like a failure in the procurement process and a failure of the project approach.

This is, by New Zealand standards, a complex system.  The current tender system used by the New Zealand government and other large entities is the broken step in the process.  I can’t speak for this particular process, but I have seen and been involved in enough government Request for Information/Request for Proposal to know they don’t work.  There is no way to know what purchaser can specify the requirements accurately, and as can be seen clearly in this case, the idea of contracting out of risk does not work as a defence when you are on the 6 o’clock news. While I would love to see government procurement changed, it is not an area that I personally care to spend any time on.

Many aspects of the project failure  have relevance in a data warehousing, and even more so in the “big data” world. The Novopay project was beset by complexity and scope problems. This is the norm in the data world as we are never in control of the data we receive.

How do you mitigate this? At WhereScape we recommend late learning strategies – something the agile community also champions.  We recommend only going as far as basic structures until we have confidence in what we are doing.  As an example, if you are building a dimensional data warehouse we recommend building to first level fact tables initially, and then socializing the results.  Hold off on fancy aggregates, complex cubes and beautiful dashboards and reports as long as possible as the work can be wasted if we are delivering the wrong types of data.

This is even more important in a big data world where it can take far longer to get an idea of what is possible when you are interacting with large complex data sets as opposed to (relatively) well behaved source systems.

Projects are less risky when you identify as many risk factors as early as possible.  We designed WhereScape 3D to specifically help users answer the biggest questions around scope and approach – which in our world is to do with data.  Given it is a constraint in every project it needs to be examined and dealt with up front, not left as a surprise in each and every project we do.

Novopay knew that the payroll was complex, there was clearly a lot they did not know, and probably a lot they could not know until late in the project.  Wouldn’t it have been better to know that sooner?

The Data Warehouse Landscape 7 Mar ’13

diagram

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


The Situation

The advances in storage and processor power have been mirrored by an explosion in data. Big Data is changing everything. There are now analytical tools that give the power to use this data and there are more options than ever when deciding the best platform to run a data warehouse on. Companies have a desire to extract value from this information but it is not easy – there is no one product that is the best at everything.

Business Problem

  • Existing approaches to data warehouse design do not fit with the demands of a multi-platform environment.
  • Many enterprises current data analysis capability are inadequate or too rigid to handle 21st century data requirements.
  • A single platform will not be the answer in a customers data warehouse.
  • Managing this new environment will require a diverse group of people with specialised skills.
  • There is the potential for more silos within an organisation creating an increased risk that projects will be slow and unsuccessful.


The WhereScape View
The rules have changed and a new paradigm of a data warehouse “landscape” has emerged. Customers will want to use best of breed for each of their disparate requirements but will grapple with finding a cost effective way to plan, manage and build this environment in a timely manner. They are also going to want a holistic view across all of their data in order to truly understand their customers, products and services.

What will customers need in this landscape to successfully deliver data warehouse projects?

  • Ability to plan the project across a multi-platform environment
  • A single development environment
  • Single metadata repository
  • End-to-end impact analysis
  • Single scheduling environment
  • End-to-end documentation


The WhereScape Products

We have been building our products with exactly these problems in mind. Regardless of the environment we want to ensure customers the fastest time to value. We believe that the traditional problems of data warehouses taking too long, being too difficult to change and difficult to support will only be exacerbated by the increased demands the data explosion will cause.

Transmitting Export Files to Customer’s FTP Site Using Host Script 15 Jan ’13

WhereScape RED has an easy way of creating Export objects that are used to produce ASCII files from a single db table or view for a downstream feed. The solution below represents a very typical post-Export task: secure delivery of generated files to a customer’s FTP site with use of a RED Host script.

RED Host scripts are maintained within the WhereScape RED environment and can be scheduled to run in their host environments (either WINDOWS or UNIX). The sample below contains examples for several common practices such as FTP load using secure credentials, use of RED parameters, wildcard file load, and archiving.

You will need to add five RED parameters in order to execute the Host script as illustrated below:

OUTFilePath               = Directory where outbound files are created by the WhereScape RED extract;
ArchiveFilePath          = Directory where processed files are archived;
FileSpec                      = File spec of the files to FTP and archive;
FtpUserName             = FTP username;
FTPPwd                       = FTP password.
FTPPwd                       = FTP password.

The result will be something like this. Also ensure that the path parameters have an ending backslash.

image 1 table

 

The following environment variables will be set by the WhereScape RED Host script

WSL_SEQUENCE    = A unique sequence number for the scheduler;
WSL_WORKDIR       = The work directory defined in the connection;
WSL_SERVER          = The server defined in the connection;
WSL_DATABASE     = The database defined in the connection;
WSL_USER               = The DSS user defined in the connection;
WSL_PWD                 = The DSS password defined in the connection.

The sample connection might be as follows:

image 2 table

 

Below is a sample (working) script that can be copied and used by WhereScape customers.

REM **********************************************************
REM Script Name : ftprap
REM Description : Transmit export files to customer’s FTP site
REM Generated by : WhereScape RED, manually
REM Generated for: WhereScape Customer
REM Author : Raphael Klebanov
REM **********************************************************
@echo off
setlocal enabledelayedexpansion
setlocal enableextensions

::Setthe directory where multiple outgoing files are created by RED extract
SET SEND_FILE_PATH=$POUTFilePath$
::Set the file specification of the files to FTP and Archive
SET SEND_FILE_NAME=$PFileSpec$
::Set the path of where the files will be archived after being successfully ftp’d
SET ARCHIVE_PATH=$PArchiveFilePath$
::Set the FTP UserName
SET FTPUser=$PFtpUserName$
::Set The FTP Password
SET FTPPwd=$PFtpPwd$
::Set file count variable which tracks how many files we have processed
SET /A FileCount=0
::Set variables of date parts (yy, mm, dd, dow) to be used in the renaming of the file
for /f “tokens=1-4 delims=/ ” %%a in (‘date /t’) do (
set dow=%%a
set mm=%%b
set dd=%%c
set yy=%%d
)
::Remaining variables declarations
SET SEND_TABLE=%send_file_path%%send_file_name%
SET FILECTL=%WSL_WORKDIR%\wsl%WSL_SEQUENCE%.ctl
SET FILELOG=%WSL_WORKDIR%\wsl%WSL_SEQUENCE%.log
SET FILEAUD=%WSL_WORKDIR%\wsl%WSL_SEQUENCE%.aud

REM **********************************************************
REM ************* C H E C K F O R F I L E S **************
REM **********************************************************
Dir %SEND_TABLE% /b | %windir%\system32\find /v “RandomString64″ >nul && (goto FILES_FOUND) || (set _empty=Empty)
echo -1
echo No Files Found For Upload In Location: %SEND_FILE_PATH%%send_file_name%
exit

:FILES_FOUND
REM **********************************************************
REM ************* F T P T H E D A T A ********************
REM **********************************************************
echo load of %SEND_TABLE% > %FILEAUD%
SET /A RESULT_CODE=1
SET RESULT_MSG=Send Completed

:LABEL_LOAD
SET FILE_NAME=NO_MORE_FILES
for %%A in (%SEND_TABLE%) do if “!FILE_NAME!”==”NO_MORE_FILES” SET FILE_NAME=%%A && SET FILENAMEONLY=%%~nA
if %FILE_NAME%==NO_MORE_FILES goto LABEL_EXIT

::rename file before FTP
SET NewFileName=%FILENAMEONLY%_%yy%%mm%%dd%.CSV
REN %FILE_NAME% %NewFileName% >>%FILEAUD% 2>&1

::Create FTP Script File ftpcmd.scr
echo Open ftp.WhereScape.com > c:\ftpcmd.scr
echo %FTPUser%>> c:\ftpcmd.scr
echo %FTPPwd%>> c:\ftpcmd.scr
echo cd import>> c:\ftpcmd.scr
echo cd TabDelimitedCSVFiles >>c:\ftpcmd.scr
echo mput %SEND_FILE_PATH%%NewFileName%>>c:\ftpcmd.scr
echo bye>>c:\ftpcmd.scr

::Execute the FTP script
%windir%\system32\ftp.exe -i -v -s:c:\ftpcmd.scr >> %FILEAUD% 2>&1

IF %errorlevel% EQU 0 GOTO LABEL_OKAY
:LABEL_FAIL
echo -2
echo Send failed with return status %ERRORLEVEL% See error log for details
rem type %FILEAUD% >&2
rem type %FILELOG% >&2

exit

:LABEL_OKAY
echo FTP Successful For File %FILE_NAME% >> %FILEAUD%

REM **********************************************************
REM ************* A R C H I V E ******************************
REM **********************************************************
::Archive and rename the file

:LABEL_RENAME
MOVE %SEND_FILE_PATH%%NewFileName% %Archive_Path%%NewFileName% >>%FILEAUD% 2>&1

IF %errorlevel% EQU 0 GOTO MOVE_OKAY
:MOVE_FAIL
echo -2
echo %FILE_NAME% Was Unable To Be Moved To %Archive_Path%%FILENAMEONLY%_%yy%%mm%%dd%.CSV >>%FILEAUD%
exit

:MOVE_OKAY
echo %FILE_NAME% Successfully Moved To %Archive_Path%%FILENAMEONLY%_%yy%%mm%%dd%.CSV >>%FILEAUD%

::Since successful, add 1 to the files processed counter
Set /A FileCount+=1

REM **********************************************************
REM ************* N E X T F I L E **************************
REM **********************************************************
REM WARNING: Do not loop back unless the file name contains a wildcard
REM WARNING: You must rename the file that has been loaded if
REM looping back otherwise the script will loop for
REM ever loading the same file.
:LABEL_NEXTFILE
goto LABEL_LOAD

REM **********************************************************
REM ************* E X I T ************************************
REM **********************************************************
:LABEL_EXIT
echo %Result_Code%
echo Number Of Files Processed: %FileCount% %FilePath%
exit

Getting the most out of Ragged Hierarchies in RED 5 Oct ’12

Recursive parent-child dimensions are painful to query using SQL.  A common example of a recursive parent-child dimension is a ragged organization hierarchy containing Manager/Employee relationships.  You usually have to flatten them out to make them work in a star schema.  That is, instead of having two columns in the dimension for child id and parent id, you need to create n columns, one for each level of the parent-child relationship.  Even this is less than optional for users.

Luckily, Analysis Services OLAP Cubes make this easy.

Consider the classic situation of an organization structure.  There are four levels in the organization:

  • Mary is the CEO and has two direct reports: Bob the Sales manager and Charlie the Engineering manager
  • Bob has two salespeople: Debbie and Elsa
  • Charlie has two engineers: Fred and Gary
  • Fred has 2 helpers, Harry and Ina

 

To represent this data in a dimension we have two options.  The first is a flattened structure so that every level in the organization is visible at each row.  For example:

This structure is good for querying via SQL.  It’s useful when the number of levels in the hierarchy is small, known, and reasonably static.  When the number of levels is large (i.e. more than 10) or when the number of levels is unknown or can change this structure becomes difficult to populate and query.  There’s also the question of what to do with the blanks in the unpopulated cells at the lower levels.

An alternative approach is to represent the data in a recursive parent-child structure.  Each row has only its parent listed.  For example:

This structure is very flexible however it is very difficult to write SQL queries across multiple levels.  The significant advantage of this structure is that OLAP cubes can flatten the hierarchy on the fly when they are processed.  This means that a hierarchy with changing levels or a large number of levels can be catered for with relative ease.  It’s also relatively straightforward to write an MDX query to browse the hierarchy in a cube.  Most query tools (for example Excel) will do this dynamically without any code being written.

Here’s an example:

A simple warehouse built from Adventure Works contains a sales orders star schema.  The fact table has five dimensions – customer, order date, ship date, product and salesperson.  Both the ship date and order date dimensions are role-playing dimension views on dim_date.  Similarly, the salesperson dimension is a role-playing dimension view of the employee dimension table.  The employee dimension stores the data in the parent-child structure described above – every employee record had its manager business key recorded.

The star schema looks like this:

The objective is to produce a Microsoft Analysis Services OLAP cube of the star schema with the salesperson dimension being an OLAP parent-child dimension.

Microsoft Analysis Services expects an OLAP parent child dimension to have the artificial key for the child AND the artificial key for the parent on the same record.  By default, WhereScape RED only creates the artificial key for the child when it builds a standard dimension.

To create an OLAP parent-child dimension, we need to add the artificial key for the parent (manager) to the SalesPerson view.  Make sure this is NOT an artificial key in RED and has the end user display check box turned on.  Set the source table to be blank and in the transformation of the parent key put the source table (an alias of dim_Employee) and source column.  Using the alias of dim_Employee facilitates the join of the dimension to itself to provide the parent artificial key.  The column list looks like this:

Set a where clause on the SalesPerson dimension view as follows:

LEFT JOIN dim_Employee dim_Employee1
ON dim_Employee.ManagerID = dim_Employee1.EmployeeID

Next create an OLAP cube called olap_SalesOrder.  The wizard in RED also created the OLAP dimensions automatically.  You need to delete any hierarchy that gets created – it’s not necessary for this exercise.  Edit the attribute list for the dimension so that it has just these two keys:

Create an attribute relationship between the two keys so that Analysis Services knows how to relate the keys to each other.  To do this right-click on the OLAP dimension for SalesPerson and select “Add Attribute Relationships”.  The details should be:

Now change the properties of the attributes so that the sales person’s name is displayed, not their key (which would be meaningless to any user).  To do this, display the attributes again and edit the properties of the child (sales person) key.  Ensure the usage was “Key” and change the name column to the descriptive column in the dimension.

Define the relationship as a parent-child relationship.  To do this edit the properties of the parent (manager) key attribute and change the usage to “Parent”.  Leave the name column the same as the key column.  It looks like this:

That’s it.

Now just create the cube and update or process it as you usually would with any other cube.

 

Based on a white paper by Michael Tantrum, WhereScape USA Inc.

 

 

Thoughts on the NZ Big Data Conference 3 Oct ’12

I wasn’t going to attend the New Zealand Big Data Conference. I will admit to a bit of cynicism about events organised by the professional conference companies. With no peer review of presentations the variation in quality can be huge, and they can be dominated by sponsor input. A couple of things fell into place this time – I was offered a free ticket (thanks Phil – you rock), and a half day meeting that was in my diary for the Friday was postponed meaning I had no excuse not to attend. At the very least I thought it would be good to catch up with the local business intelligence community, which I have not spent enough time with recently.

The big sponsors were SAS and EMC (who got speaking slots) along with local consulting company, and friend of WhereScape, Theta. Lukas Svoboda was a good conference chair, and Conferenz did a very professional job of running the event (which is the advantage of the professional conference companies).

The conference opening was a five minute advertisement by SAS – and I mean an actual advertisement. Perhaps more suited to a SAS conference than an industry conference. Peter Kokinaskos from SAS was however a good speaker, and being first up took on the unenviable task of defining big data, complete with the three Vs.

Katrine Evans, Assistant Commissioner, Office of the Privacy Commissioner gave a thought provoking talk on privacy and big data. I particularly liked her comment that there is a person behind every bit of personal data. Obvious when she said it, but as data people we do sometimes need reminding of this.

As you would expect from the pros, Clive Gold from EMC was another very good speaker. Check out a project EMC are sponsoring – The Human Face of Big Data (http://humanfaceofbigdata.com/). From the website:

Download the free mobile app and learn about yourself, how you compare to others, and what your phone can tell you about your life. Compare answers about yourself, your family, trust, sleep, sex, dating, and dreams with millions of others around the world. Find your Data Doppelganger. Map your daily footprint, share what brings you luck, and get a glimpse into the one thing people want to experience during their lifetime. We’ll donate $1 per download to charity: water for the first 50,000 downloads, as a way to say thank you for participating in The Human Face of Big Data. charity: water uses 100% of public donations to directly fund clean water projects.

The afternoon sessions were the high points of the conference. Kiwibank’s Chris La Grange was on two strikes as he took to the stage (the post lunch slot and a talk on data governance policies) but managed to hit it out of the park.
Chris’s comments on how data governance does not have an end date, and that it was a process not a project were the basis of the most picked up tweet of the day. His comments resonated with three data Twitter heavyweights @YvesMulkers (8700 followers – who has an auto retweet set up on #bigdata which is one of the reasons his exposure numbers are so high), @Merv (Gartner’s Merv Adrian with 7000 followers) and @datachick (Karen Lopez with 6300 followers).
As an aside, if you look at the twitter analysis of the conference hashtag (#nzbigdata) you would have to say we were pretty average with our twitter output from a big data conference!

Simon Pohlen, Head of Technology and New Capabilities, Loyalty New Zealand, gave one of the best Big Data case studies I have heard. While Loyalty New Zealand (think Fly Buys) is very strong in New Zealand, with only 4.4 million people in the country the data volumes are not “big”. The techniques they are applying are, however, very much big data.

I enjoyed Phillip Higgins’ discussion and demonstrations of big data. I suspect some of the more marketing oriented people in the room were wondering what hit them, but it was a well put together talk and a nice counterpart to the more theoretical discussions.

Overall I would rate the event as useful, and I suspect Confernz will run it again next year. WhereScape is also running a Big Data event in New Zealand, but ours will be quite different. We are bringing down Marc Demarest, who (along with Mark Madsen) has keynoted the last two TDWI Big Data World Conferences in San Diego. Marc is highly knowledgeable and an engaging speaker, and we are going to try and separate the big data hype from the reality. We will be running the event in Auckland on October 30th (see the WhereScape website for more information), and may also be running one in Nelson. Stay tuned for more on that one.

Data warehousing, big data and the Higgs particle 1 Oct ’12

Michael Whitehead and Dr. Ralph Schuster

This is a blog I have been wanting to write for a while, but it has only (finally) come about because Ralph has joined the WhereScape lunchtime runs. That’s Dr Ralph Schuster (one of the leading lights in the WhereScape development team) with a doctorate in physics – handy when you want to blog about the Higgs boson. Even more handy when he actually writes most of the blog for you.

This blog is really about the value of a data warehouse, but in keeping with current accepted wisdom, it is about Big Data and the value of a data warehouse. It starts with the Large Hadron Collider (LHC) – the holy grail of big data. The LHC generates more than 20 PB of data per year…and that’s after keeping only 1 per cent of the recorded data.

But this is not the only link between data warehouses (and big data) and the LHC. There is a deeper connection on a more conceptual level. As every educated nerd (or human) knows, one of the main purposes of the LHC is to find the Higgs particle, the last missing building block in standard particle physics, which has been hiding from experimental detection for almost 50 years. The Higgs mechanism is the inevitable ingredient of the standard model, the most successful theory of the fundamentals of matter and forces. The incorporation of the Higgs mechanism is necessary from the very beginning of the formulation of the fundamental equations to give the electroweak bosons mass. The particles which mediate the weak nuclear force responsible for beta decay, the form of radioactivity which, for example, is the source of radiocarbon dating allowing us to estimate the age of archaeological organic objects. Though the indirect effects of the Higgs mechanism are ubiquitous in physics, chemistry, biology and even archaeology the direct evidence is extremely hard to spot and the direct evidence of the existence of a Higgs particle is hardly measurable.

This is where it is like a good data warehouse. The benefits of a data warehouse can be found in almost every business function from finance to fraud detection to human resources, and WhereScape has customers benefiting from data warehouse based analysis from a multitude of verticals including banking, telecommunications, insurance, healthcare, not-for-profits and education institutions. But the accurate measurement of the direct benefits is almost as hard as finding the Higgs particle.

We all know the benefits are there, but we cannot look for them in the data warehouse itself. We need to look for the indirect benefits. And this is where it gets tricky.

“It is the singularly unfair peculiarity of war that the credit of success is claimed by all, while a disaster is attributed to one alone”.

When we look for indirect benefits there are many people who can, quite correctly, claim to be responsible for success, and the business intelligence team is just one. We can provide a great data warehouse, but it needs to be coupled with great analysis, a switched on team who make the right decisions, and an entire organization which acts on it. Each of these groups has legitimate claims for credit.

At WhereScape we recommend looking for opportunities for “assists” that sports concept of the last pass or critical play. There is no call to quantify the exact value of the assist. You don’t allocate assist percentage points to individual team members for a try or a touchdown – you either assisted or you didn’t.

That there is value in fact based decision making is indisputable. Like the Higgs particle, it needs to be there or the basic concept of business decision making fails. Unlike the physicists, we don’t need to spend fifty years looking for it – as long as we don’t look for the glory, and are happy with just being part of the team. So when it comes to the value of the data warehouse, look for the value in overall results. Did we help the organization do well (or less badly)?

And always look for opportunities to claim an assist.

References:
http://openlab.web.cern.ch/sites/openlab.web.cern.ch/files/press-coverage/Y/M/TechRepublic-CERN_Where_the_Big_Bang_meets_big_data_0.pdf

http://en.wikipedia.org/wiki/Weak_interaction

http://en.wikipedia.org/wiki/Radiocarbon_dating

The Works of Tacitus, tr. by Alfred John Church and William Jackson Brodribb, [1864-1877], http://www.sacred-texts.com/cla/tac/ag01020.htm

Agile Data Vault Building using WhereScape 3D and WhereScape RED 30 Aug ’12

The concept of data warehousing dates back to the late 1980s, but data warehouse elements in some form have been around even longer. Many early fiascos led practitioners to try and de-risk projects by using a requirements-driven, inflexible, sequential, and interdependent process, adapting techniques from the Software Development Life Cycle (SDLC) methodology. As a consequence, many of the current data warehouse tools and techniques are designed around this approach, and consulting firms thrive on these resource-hungry and lengthy projects.

WhereScape supports a different approach. Our tools and methods adapt Agile techniques to mitigate the risk of data warehouse projects through collaboration with the users.

The solution we offer in this article will allow you and your organization to jumpstart your data warehouse projects, utilizing an Agile methodology and providing results for your decision-makers in hours or days, instead of weeks or months.

Problems associated with traditional data warehouse projects

The data warehouse project painfully grapples with various pitfalls, including:

  • Inaccurate or outdated business requirements;
  • Poor development productivity;
  • High Total Cost of Ownership (TCO) and, especially, cost of resources;
  • Slow development cycles;
  • Unacceptable data quality;
  • Dismal documentation.

Most data warehouses utilize the SDLC serial development approach. The traditional approach dictates that structured processes must be followed to ensure success. But this can result in serious issues with a data warehousing project:

  • Resource-intensity;
  • Rigidity (i.e. difficulty to change);
  • Failure to respond quickly to ever-changing business, and, as a result,

Data warehousing is experiencing a pattern change

We need a faster and more effective approach to address the current issues with data warehouse projects.

Agile data warehousing is a relatively new approach that treats the entire data warehousing lifecycle as one process, breaking the traditional development methods. This technology includes:

  • Source data discovery and analysis;
  • Data warehouse schema design for both Foundation and Presentation layers;
  • Data transformation and integration;
  • Metadata repository;
  • Designing data warehouse objects and procedural code;
  • Full self-documentation;
  • Scheduling the loading routines;
  • Data lineage, impact analysis, data traceability, and auditability.

The principles of Agile development are fully applicable to data warehousing projects. The main principles are:

  • Frequent collaboration with business users;
  • Constant delivery of the working product (i.e. datamart);
  • Acceptance of changes, even in later stages of the project;
  • Self-organizing, self-supporting teams;
  • Complete documentation at delivery;
  • “Cycle of Cycles” approach;
  • Data warehousing at the speed of business changes;
  • Use in an Integrated Development Environment (IDE);
  • Test-driven development.

Agile Data Warehousing Project Stages


Stage I:
Using WhereScape 3D:

  • outline data warehouse requirements and guidelines at a high level during a project initiation meeting, bringing together business experts and developers;
  • pinpoint major business artifacts and their relationships for one business area;
  • draw the objects on a white board;
  • define grain, keys, and some attributes;
  • identify data sources for the chosen business area;
  • use WhereScape 3D to test the assumption: model the relational diagram of the future datamart, discover and profile source data;
  • document the findings using WhereScape 3D’s auto-documenting capability;
  • socialize the model with business community.


Stage II:
Using WhereScape RED:

  • retrofit datamart objects, designed using WhereScape 3D, into WhereScape RED;
  • build a presentation layer of the chosen business area (datamart) from existing sources;
  • populate the datamart with live production data from data sources;
  • provide means of displaying the data to business users such as OLAP cubes, PowerPivot, and the like;
  • socialize the data with the developers and the business community;
  • allow business users to “play” with the data;
  • iterate based on feedback from the business.


Stage III:
Using WhereScape RED:

  • create an enterprise data warehouse layer;
  • build a raw data vault based on the structure and data lineage of the datamart;
  • dimensions ? hubs + hub satellites; facts ? links + link satellites;
  • re-target source tables from feeding the datamart to feeding the data vault; point data vault tables to populate datamart tables;
  • schedule/run the whole process;
  • socialize the results with the business community;
  • adjust accordingly.


Stage IV:
Using WhereScape RED and 3D:

  • repeat the process with another business area;
  • as changes are required they can be added to the datamart in the development environment, then added to the data vault and deployed;
  • extend the data vault with business structures – where data needs to be integrated, joined or has common transformations across multiple business areas, then these business areas can be built into the data vault;
  • keep the business community engaged every step of the way;
  • document the data warehousing areas as they reach completion;
  • automate the process of building data warehouse according to Agile principles.


Deliverables

These deliverables are based on a standard three-day proof of concept, performed onsite using an experienced WhereScape consultant and WhereScape software.

  • ER diagrams of the datamart and data vault;
  • A datamart populated with (a subset of) production data;
  • A data vault designed and built based on the datamart;
  • Processes and workflows to populated the data vault and datamart;
  • Indexing based on a default indexing strategy;
  • Full documentation showing data lineage, technical and business metadata;
  • Workshops to demonstrate the datamart to subject matter experts and end user community;
  • Workshops to show the processing and code to technical folks.

 

WhereScape Information

WhereScape is a leading global provider of data warehouse development productivity software that help organizations build complex solutions that afford them with access to one of their most precious resources – data.

Our family of products is comprised of:
WhereScape 3D: a data warehouse planning and modeling tool. 3D enables you to expose and explore the risk areas of a data project up front, before expectations and budgets have been set.
WhereScape RED: an Agile data warehouse Integrated Development Environment (IDE) that builds, operates, controls, and documents the entire data warehouse.

Contact Info: WhereScape USA
Headquarters: 2100 NW 133rd Place, Suite 76 Portland, OR 97229
Phone: (503) 807-5024
Fax: (503) 466-3978
Website: www.wherescape.com

WhereScape 3D and Big Data 24 Aug ’12

We have been asked about WhereScape 3D and big data.  While we are not yet aware of any customer use cases, it does come up in conversations where people want to explore the boundaries of the software.

The engineers have been doing some investigation on various big data scenarios.  With WhereScape 3D we will only ever be interested in data that will eventually, in some form, be used in decision making and will most often be used in conjunction with a traditional data warehouse.  This gives us some boundaries for our big data investigation and ultimately functionality that we will support.

One of the key concepts behind WhereScape Data Driven Design (3D) is that we want to include data in the design and planning process.  Given this, an obvious place to start our big data journey was with Hive.  And yes we are very much aware that big data does not just mean Hadoop.  It is where we started, not where we intend to finish.

Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems.  Since there is a JDBC driver for Hive we can use standard WhereScape 3D functionality – in theory anyway.

Here is a WhereScape 3D screen shot showing discovery and profiling against Hive:

3D and Hive

 

 

 

 

 

 

 

 

 

 

 

The good news is that we could get it going. The not so good news is that the JDBC driver was temperamental and slow.

The bottom line…this is not yet ready for prime time. We are continuing on the WhereScape 3D big data journey. We are looking for input on how people want to analyze big data (not just Hive) – please feel free to contact us if you have a scenario you would like us to investigate.

New Zealand Minister of Trade visits WhereScape Auckland Office 3 Aug ’12

New Zealand’s Minister of Trade, Hon Tim Groser, knows a thing or two about international trade.

Tim Groser

© Crown Copyright

 

 

 

 

 

 

 

 

 

 

 

 

His profile on the official New Zealand Government website (named beehive.govt.nz – who knew?) states “Tim is regarded as one of the world’s leading experts on international trade. Prior to being elected as a member of Parliament in 2005 he was New Zealand’s Ambassador to the World Trade Organization (WTO), and Chair of Agricultural Negotiations for the WTO.”

The Minister recently visited WhereScape’s New Zealand headquarters where we talked through some of the advantages and challenges of being a New Zealand tech company.

Tech is important to New Zealand.  According to Paul Matthews, Chief Executive of the Institute of IT Professionals New Zealand Inc, the tech sector’s total export value is second only to dairy.  In WhereScape’s case, 75% of our customers are outside New Zealand.

We appreciated the opportunity to meet the Minister, and enjoyed hearing his perspective on global trade.  While we face some of the challenges of the commodity exporters, it is certainly easier being in the digital side of the business.

Our thanks to NZTE for organizing the visit, and we intend to keep up the work we are doing and continue to develop award winning software that delivers value all across the world.

Tim Groser at WhereScape

 

 

WhereScape RED and Big Data 2 Aug ’12

TDWI San Diego has kicked off, and the conference theme is “Big Data”.  TDWI seems to like themes – Boston in September is “Agile” (of course we will be at that one), and Orlando in November is “Emerging Technologies”.

You can catch us at TDWI San Diego.  We have a booth where we are catching up with customers, analysts and press, as well as introducing our software to prospective users.  One of the questions we get asked, along with “Can you really help me build a data warehouse that quickly?”  (Yes), “I use Teradata/SQL Server/Oracle/DB2 can you still help me?” (Yes), “Are you an ETL tool?” (No, but let’s sit down and talk about what you need to achieve rather than what software category you want) is often “So what is WhereScape’s view on Big Data?”

I have enjoyed listening to Colin White talk about Big Data at the Masters of BI events that WhereScape have run, have listened intently to Colin and Harriet Fryman from IBM present and discuss Big Data Analytics at the Pacific Northwest BI Summit and I have even been on a panel talking about Big Data at a CIO Magazine event.  I have had discussions with customers, analysts and other vendors, and have also listened to webinars, and read widely on the topic (one of the best pieces I have read is from Mark Madsen previewing the keynote he and Marc Demarest are giving at TDWI San Diego ).

Defining Big Data

I am not convinced there is a common understanding of what Big Data is. I will leave it to the analysts to come up with the perfect definition.  For the moment I am going with:

Data that exceeds the capability of conventional databases and approaches.

Mark Madsen wrote:

“What we’re facing is an architectural shift. You can think of it like the Web 1.0 to Web 2.0 shift. Web 1.0 was largely read-only, publishing oriented, just like BI and the data warehouse. Web 2.0 is read-write, peer-to-peer at both a human and machine level. That was an architecture shift as much as a technology shift. We reconfigured the components in the architecture and created different ways to build and scale software.”

I really like the idea that Big Data is an architectural shift.

So what is WhereScape’s view on Big Data?

WhereScape Software is a data warehousing software company.  We believe traditionally data warehouses take too long to build, and once built are too hard to change.  Despite these shortcomings, we passionately believe that data warehouses have a role within data architectures.  But data warehouses will be different.  Already many of our customers do not view their data warehouse as a heterogeneous platform – they no longer have a Teradata or SQL Server data warehouse; our customers now have a data warehouse that includes Teradata or SQL Server.

Our view is that big data redefines the role and usage of data within an organization.  Some things do not change – there is still absolutely a place for a well-managed, well governed data warehouse (just built quicker, and more responsive to change, thank you very much).  But with Big Data we now need to think about all the other data that an organization could use for business benefit – but that will never make it into the traditional data warehouse, or will only be included in an aggregated or derived form.  This data could be in an appliance, a non-relational store, external or streamed.  As data people we need to be cognizant of changes in what our user constituency define as useful data.

WhereScape RED in a Big Data World

When we announced WhereScape RED 7, Shawn Rogers, Vice President of Research, Business Intelligence and Data Warehousing at Enterprise Management Associate wrote:

 “The days of the enterprise data warehouse operating as the centralized hub for all reporting and analytics is a thing of the past. Database appliances are gaining traction as IT organizations warm to the notion of using purpose-built platforms for specific analytic tasks, but the speed of the platforms has naturally outpaced the productivity tools needed to support them. 

WhereScape RED 7 is ideally suited for today’s highly diverse, special-purpose IT infrastructures. WhereScape has increased its value proposition by moving from ‘let us build and manage your data warehouse’ to ‘let us manage your extended data ecosystem, which includes your data warehouse’”.

Big data may change how we view a data warehouse and data within an organization, but the concepts of time to value and responsive to change remain, and that is where WhereScape RED continues to be focused. Bring on Big Data, and bring on WhereScape RED.

 

Fixing Files From DOS 26 Mar ’12

Someone asked me the other day: “How do I fix this data file before I load it?”

All they wanted to do was change one value that contained a reserved word their loader couldn’t cope with.

They were on windows and didn’t want to write anything that needed compiling (c, vb, etc).

Finally, it needed to run from DOS in a RED host script.

Here’s what we came up with – a really simple DOS program that generates and runs a windows powershell script:

@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
SETLOCAL ENABLEEXTENSIONS
SET INFILE=c:\in.txt
SET OUTFILE=c:\out.txt
SET OLDSTRING=oldval
SET NEWSTRING=newval
SET FILEPS1=%WSL_WORKDIR%\wsl%WSL_SEQUENCE%.ps1
SET FILEAUD=%WSL_WORKDIR%\wsl%WSL_SEQUENCE%.aud
ECHO Get-Content “%INFILE%” ^| ForEach-Object { $_ -replace “%OLDSTRING%”, “%NEWSTRING%” } ^| Set-Content “%OUTFILE%” > %FILEPS1% 2>%FILEAUD%
powershell -command . %FILEPS1% >> %FILEAUD% 2>&1
ECHO 1
ECHO The string %OLDSTRING% in %INFILE% has been replaced by the string %NEWSTRING% to create %OUTFILE%
TYPE %FILEAUD%
EXIT

If you want to use this, just paste it into a RED host script and change the four variable SET statements to your file names and strings…

New Release of WhereScape’s Data Warehouse Development Environment Enables Cross-Platform Database Appliance Support 14 Feb ’12

WhereScape RED 7 adds support for analytic appliances, bringing rapid and agile data warehouse development and management to the entire data ecosystem

TDWI LAS VEGAS– February 13, 2012 –WhereScape today introduced WhereScape RED 7, a new release of its integrated data warehouse development environment that supports today’s leading database appliances. WhereScape will be demonstrating WhereScape RED 7 support of IBM Netezza and EMC Greenplum in booth #501 at the TDWI World Conference in Las Vegas.

WhereScape RED 7 generates code that will run on the appliance through a fully featured user interface, enabling practitioners to generate the same code they would otherwise have to handcraft. In the case of IBM Netezza, WhereScape 7 will enable practitioners to utilize the IBM Netezza platform for ETL transformations, generate NZPLSQL set based SQL, as well as creating tables, all of which are documented and all automatically included in WhereScape’s integrated and open metadata repository.

WhereScape RED enables practitioners to access source data from multiple environments, create procedural code, scripts, and tables, build cubes, schedule updates, and generate documentation in HTML format, all while maintaining existing BI front-end compatibility. In addition to the analytical appliance integration announced today, WhereScape RED provides native support for Teradata as well as IBM DB2, Oracle, and Microsoft SQL Server.

“The days of the enterprise data warehouse operating as the centralized hub for all reporting and analytics is a thing of the past. Database appliances are gaining traction as IT organizations warm to the notion of using purpose-built platforms for specific analytic tasks, but the speed of the platforms has naturally outpaced the productivity tools needed to support them” said Shawn Rogers, Vice President of Research, Business Intelligence and Data Warehousing, Enterprise Management Associate. “WhereScape RED 7 is ideally suited for today’s highly diverse, special-purpose IT infrastructures. WhereScape has increased its value proposition by moving from ‘let us build and manage your data warehouse’ to ‘let us manage your extended data ecosystem, which includes your data warehouse’,” Rogers said.

“Speaking as both a current WhereScape and Netezza customer, I am very excited by the announcement that WhereScape RED will be now be supporting IBM Netezza appliances,” said Johan Vendrig, GM of Information Services for healthAlliance, a New Zealand-healthcare services organization. “The development speed of WhereScape RED and the power of Netezza make for a great combination.”

“When WhereScape RED was first introduced, the data warehouse consisted of a single database, and we integrated our software with four of the industry leaders—Microsoft SQL Server, IBM DB2, Oracle and Teradata,” added WhereScape CEO Michael Whitehead. “Our customers and prospects now want to develop and deploy quickly across a multi tiered data warehouse environment. We are committed to providing appliance customers a seamless, end-to-end experience for quickly developing their data warehouse infrastructure as well as help them maximize their use of it. WhereScape RED helps accelerate the development of data warehouses and enables organizations to them in less time, with less money, while mitigating risk and resulting in fully documented implementations,” Whitehead said.

 

About WhereScape

WhereScape design, develop, sell and support WhereScape 3D, the industry’s first data warehouse planning tool; and WhereScape RED, the industry’s first and best integrated development environment for building, deploying, managing and renovating data warehouses.

WhereScape’s products are used in every kind of business intelligence project found in today’s commercial environment: from normal-form enterprise data warehouses and data vaults, through user access layers and tiers of dependent data marts, to standalone data marts and reporting systems.

Our more than 500 customers are active and satisfied users of our technologies, and report that, with WhereScape’s products, they are able to build data warehouses with fewer people, few if any diversions, missed deadlines or unmet user expectations, and in record time: in days or weeks, instead of months or years. WhereScape has offices in Portland Oregon, Auckland New Zealand, and Wokingham UK. For more information, please visit www.wherescape.com.

All products or company names herein may be trademarks of their respective owners.

 

Contacts

For WhereScape:
Scott R. Humphrey
Humphrey Strategic Communications
503-644-9709
humphrey@strategic-pr.com
TDWI Booth #501

How to Print Out the Entire RED Documentation 20 Dec ’11

In order to be able to print out the complete RED documentation set, it is required to create a script that merges all the RED Documentation’s HTML files into one HTML file. The description is as follows:

  • Put the script_html_merge.bat script into RED documentation directory.
  • Double click on it to execute manually; it will create doc.html merged file. The execution of the script_html_merge.bat script can also be scheduled via RED Scheduler.
  • Download http://sourceforge.net/projects/pdfcreator/ PDF creator a free PDF printer for Windows. If desired, the other similar product can be used.
  • After the installation, it will create PDF Creator printer.
  • Open doc.html and print it to PDF Creator – it will create PDF file.
  • For PDF creation, you can use any other tool allowing printing into PDF or convert HTML into PDF; I used PDF Creator simply because it is free and it does a good job.
  • The code is two lines really, see attached code below

Note: Obviously, you can print out one doc at a time by Right Click on the doc and choose Print from dropdown menu

 

REM  *****************************************************************************

REM  Script Name    :    script_html_merge

REM  Description     :    Merges all the RED Documentation HTML files into one HTML file

REM  Generated by  :    WhereScape RED, manually

REM  Generated for :    WhereScape Customer

REM  Author              :    Raphael Klebanov

REM  *****************************************************************************

 

echo OFF

copy index.html doc.html

REM FOR %%G IN (dir /B w*.html) DO (copy doc.html+%%G d && copy d doc.html && del d)

FOR %%G IN (dir w*tech.html w*user.html ) DO (copy doc.html+%%G d  && copy d doc.html && del d)

 

REM this line will put the “tech” files first, then “user” files

 

REM  *****************************************************************************

REM  Notes

REM  *****************************************************************************

REM  1. “FOR %%G”  means for (each item)

REM  2. “/B” switch enabling the batch file to quit with a return code. “/B” option can be removed. I do not see any issues.

REM  3. DO (…&&…) means run the command following && only if the command preceding the symbol is successful.

REM  4. Small “d” is a temporary file used as a temporary copy.

REM  5. If the list of the files always contains the same files, then the remaining files can be added before or after w*, e.g. glossary.html

REM  6. To prevent cut off the images, configure the PDF Printer, e.g., use landscape instead of portrait.

REM  7. You can modify list of HTMLs that are processed within DO statement to include/exclude the doc files

 

Good luck!

3D Speeds Up Analysis 23 Nov ’11

The other day I set out to build a small data mart combining support data and development data from two database sources and several other sources.

I needed to quickly find out which tables in the two database sources I needed to use.

My development system has 104 tables and my support system 244 tables.

So I pointed WhereScape 3D at both sources.

Very quickly I had:
- complete ERDs of both sources
- profiling results for all tables and columns
- identified the 12 tables in the develpment system I needed
- identified the 13 tables in the support system I needed

Less than 30 minutes after starting, I was ready to build my data mart using WhereScape RED.

I’m sure it would have taken me at least a day to figure out both source systems without 3D.

Here’s the ERD I ended up with for the subset of the develpment system I needed:

And here’s the one for my subset of the support source system: