The Modern Data Platform – Introduction
“If you don’t know your history”
It seems to be a very long time since we put to bed the arguments over the need for data warehouses. I remember the first project I worked on, a City Council struggling to get a comprehensive view of their residential properties and tenants. Why were they struggling? Simply put their data was in silos across a multitude of different operational systems, each of which had its own unique way of storing (or in some cases hiding) its data.
Our challenge was obvious; how do we find all the relevant data items and combine them so that we can answer the questions that the organisation has?
The IT team had already tried to solve this problem in the usual way you would expect. Yes, that’s right, extracts or dumps from each system manually pulled into Excel sheets to produce reports.
How could they gain a Tenant and Property centric view and remove the huge effort and risks inherent in managing, combining and reporting all of these separate data sets?
The answer was to build them a Kimball style dimensionally modelled data warehouse with a reporting layer on top. Back in early 1999 (for it was then) this meant a Microsoft SQL Server 7 data warehouse and the combination of Crystal Reports and Crystal Info for report creation, management, scheduling and distribution. Ah, those were the days 😉
Except that it wasn’t that simple
We used SQL Server Data Transformation Services (DTS) as our ETL tool. This made loading the data into SQL Server easier, but we still had issues connecting to the source systems. When we could, the concept of a delta extract, i.e. just the changed records, was a non-starter.
Once the datasets were in our database we passed them through a number of staging areas. This cleansed, related, combined, aggregated and differentiated the data we required for our historical non-volatile reporting models. Unfortunately, this resulted in us having to store multiple copies of the base data. Never mind, disk space was relatively cheap, or not as the case may be.
Once the data model was ready we applied a particularly cunning combination of Indexes and Stored Procedures in order to deliver the level of performance required. This performance tweaking was necessary as a result of having to update the warehouse and distribute the core reports within the customer’s already constrained overnight batch processing window.
With the model optimised and populated the reporting schedule kicked in to run, render and deliver via email whichever daily, weekly, or monthly reports we required.
So far so good. Unless of course someone asked for a change to an existing report, or a new report, or doomsday scenario, a new report that required a completely new set of data!
Unfortunately, this isn’t an extract from the missing Business Intelligence chapter of Grimms Fairytales. This is in fact, give or take the odd OLAP cube, semantic layer and so called end user reporting tool the model that is still in use almost 20 years later to deliver information systems within most organisations.
Why is this a problem?
I speak to many customers from organisations of all shapes and sizes. Both public and private. Highly regulated or not. I hear time and time again complaints about a lack of access to accurate, timely, relevant, trusted information. Information that is required to support or drive decision making. The so called path from transaction, to information, to insight to action.
Would it surprise you to know that most of those complaining already have a data warehouse?
A data warehouse that satisfied their original requirements when built, but as sure as night follows day their organisations information demands changed.
This is a fact of life and something that we have to recognise.
Many existing data warehouses and the teams that maintain them struggle to keep up with changes to data sources. Structural changes to existing sources or entirely new data sets from newly implemented systems or as a result of M&A activity. These changes need to be addressed whilst also trying to meet the evolving and often challenging analytical needs of the organisation.
Decaying data warehouses are like set concrete, too rigid. Many organisations are unable to bring together diverse (structured and unstructured) data types. They suffer from time-consuming reporting, experience data quality issues, and most importantly can’t answer forward-looking questions. The business will look for more agile ways to obtain the information it needs.
All this does is enforce the opinion that the “old data warehouse” no longer meets the needs of the users.
The 5 challenges with a traditional data warehouse
We have built many data warehouses and have first-hand experience of helping customers evolve their data warehouse platforms.
Time and again we hear about issues and challenges related to existing data warehouses, these are best summarised as follows:
- COMPLEX ARCHITECTURE
- SLOW PERFORMANCE
- OLD TECHNOLOGY
- LACK OF GOVERNANCE
The reality is that business needs have changed and technology has moved on. We feel that it is time to adopt the concept of a Modern Data Platform.
The Modern Data Platform
The rest of this blog series will explore in detail how the Modern Data Platform has evolved to address each of these challenges.
We will also explore best practice development and implementation techniques that have evolved to support their successful deployment.
You can also expect to hear real world, cross industry, practical examples of the Modern Data Platform in action.