The Modern Data Platform – Challenge 2 – Complex Architecture
In the first part of this blog series I discussed the traditional data warehouse and some of the issues that can occur in the period following the initial project. I concluded by outlining the 5 common challenges that I hear from customers.
- Complex Architecture
- Slow Performance
- Old Technology
- Lack of Governance
This blog will focus on the second of these issues, complex architecture.
In the previous posts in this series we have discussed the origins of the traditional data warehouse and outlined the basic architecture required. The following is a simple implementation architecture for a multi-source data warehouse designed to support traditional Business Intelligence workloads. An architecture that I am sure most of you reading this post will recognize.
- Automated source system flat file extracts, loaded and processed via multiple complex SQL statements – typically hard to manage
- ETL tools that ship with the Data Warehouse RDBMS such as SQL Server Integration Services – often optimised for their own RDBMS and / or limited in their functionality
- Specialist 3rd Party ETL tools such as SAP BusinessObjects Data Services – true database independent integration platforms often with additional management, scheduling, data quality and stewardship functionality
Store and Optimize – the corporate memory where all the relevant data is structured, stored and optimised for performance reporting. Generally speaking this has traditionally been a relational database management system such as Oracle or SQL Server, but may occasionally be a higher performance analytics database such as Sybase IQ or an appliance from Neteeza or Teradata.
Front End Analytics – traditionally the bulk of information requirements were for structured paper based reporting that was predominantly historical in nature. This is often accompanied by a need to provide end user ad-hoc reporting and some level of KPI based performance dashboard. Typical solutions here; SAP BusinessObjects, Cognos, Microsoft Excel, Qlikview or any one or more of the many others available.
Good for Then, But What About Now?
On one level this type of architecture was fine when we were dealing with onsite, relational, structured data predominantly stored inside accessible SQL based databases. For some organisations this is still the comfortable reality. I am increasingly hearing from data warehousing and BI teams that are struggling to implement manageable architectures to address the complex and evolving needs of their users. For instance, what if you found yourself confronted with the following requirements:
- The Marketing team need to build a customer churn / prioritisation predictive model. This will be built on cloud based mobile app behaviour data and CRM data from the data warehouse.
- The Maintenance and Support team that wants to leverage new sensor technologies to get a real-time view of how customers are using their products. Perhaps this will be used to improve service levels or event offer more efficient preventative maintenance.
- The Finance team want to be able to perform more advanced Planning, Budgeting and Forecasting operations using the data warehouse. Ideally without having to run endless extracts.
- Senior Management would like to make use of their iPads. This creates a need to deliver real-time KPI information that can be drilled down into.
So what do these types of requirements do to our traditional “simple” architecture? They make it significantly more complex.
Source Data – understanding new data sources and whether they can be made available for downstream consumption.
Extract, Transform and Load – if we are no longer simply dealing with batch loaded relational data then we need additional connectivity solutions to handle new consumption models. If we are streaming data as well as batch loading then clearly we will need a different type of monitoring and scheduling / management layer.
Store and Optimise – so called “Big Data” requirements change the dynamic of our storage layer. Simplistically put we need to have a repository capable of addressing the 3 V’s (Variety, Velocity and Volume). In other words, unstructured, fast moving and high volume. Typically, this is where we see teams looking towards Hadoop and other similar solutions, but can they be used for a traditional SQL based data warehouse? Can Hadoop deliver the performance we need, if not can we get it to work with our existing data warehouse?
Front End Analytics – it is a fact that many business users have become tired of waiting for their IT teams to deliver requested functionality so have purchased their own departmental solutions. These are often beyond the scope of the original reporting tools that were implemented. Often these solutions duplicate existing data in order to address “advanced analytics” workloads such as predictive and statistical modelling. This creates more silos of data that need to be managed and updated to ensure they remain accurate and consistent.
It should be pretty clear by now that this organic sprawl of a data architecture is less than optimal. This bolt on approach to addressing changing requirements results in a number of key issues:
- Integration – how do you get all these new technologies to talk to each other
- Overlap / redundancy – many tools that have similar capabilities
- Risk – who truly understands all of the components
All this adds up to a complex architecture, and that results in increased costs and lost agility.
So What’s the Answer?
Is there an alternative to organically stretching your existing data warehouse architecture, bolting on new capabilities as you go? Can we move to a new architecture that will deliver the kind of experience that our users desire?
Which of these collections of properties look like they would be the easiest to manage, maintain, extend and ultimately live in?
What if you could implement a data architecture that was fully integrated including management and monitoring? With the ability to not only monitor the end to end process but also the quality and lineage of the data itself.
A data architecture that could seamlessly ingest, store and optimally respond to both traditional transactional BI and newer Big Data workloads based on batch and real-time data.
What if this architecture was open standards compliant, able to support best in breed BI tooling in order to address advanced use cases such as predictive analytics.
Finally, what if this could be delivered either on premise or in the cloud.
Would you be interested in understanding what this data architecture looked like?
The answer is, yes (of course).
The Modern Data Platform
The Modern Data Platform delivers a foundation for agile modelling, delivery and iteration that rapidly shortens the time to deliver changes, as a result lowering the cost of ownership of your Business Intelligence and Analytics solutions.
Remember to follow the rest of the posts in this blog series where we will explore in detail the four remaining common challenges of traditional data warehousing.