Data Governance
Data Governance is a topic that can be mistaken for legal jargon, but it simply refers to the lifecycle of data. This spans its creation, capture, storage, processing, use and, where appropriate, its destruction. From an Analytics perspective, we are particularly interested in this lifecycle as it determines the quality and, ultimately, the trustworthiness of what is placed before us in the form of reports, dashboards or whatever guise that information may take.
This is a topic I’ve presented on several occasions and it has piqued the interest of many. My recent session at the NTT DATA Business Solutions Customer Conference was no exception, and I was approached by many of you with follow-on questions. This piece provides a quick summary of the session I delivered and addresses some of the key themes that featured in the subsequent questions. Data Governance can be an amorphous term to many, as suggested above, but Data Quality, which is a component of it, is experienced and often endured by most. Data Quality, and typically the dwindling state of it, is a phenomenon known to many and a far more familiar term than Data Governance. So, with this important distinction in place, I would like to take some time to talk through four key challenges and how they may be addressed through the implementation of SAP Information Steward.
1) Improving the trustworthiness of data
The perennial problem of IT being responsible for everything electronic, be it the systems or the toaster in the kitchen, hasn’t helped IT or the Business Users, and it certainly hasn’t helped address the holistic challenges presented by data quality. Addressing this requires effective engagement from the business; in our experience, it’s the business that truly understands information and how it is applied to deliver business value. This practical view, coupled with IT’s advanced understanding of systems and data structures, is pivotal in determining the success of this pursuit. SAP Information Steward is a comprehensive solution designed to help in many of these areas, especially in empowering business users and bringing to the fore their understanding of information and its application. This, along with the ability to work collaboratively with their IT counterparts, will help deliver the vision we have outlined above. Returning to the topic at hand, improving the trustworthiness of data, SAP Information Steward allows business users to access source data from disparate systems and then profile this data or subject it to a series of domain-specific rules. All of this is intended to provide an in-depth understanding of data quality metrics and the inconsistencies that may lie within the source data. Armed with this understanding, business users can then work in an informed and engaging manner with IT to curb the issues that have been identified. There are techniques, such as exporting rules to SAP Data Services, which can make this a robust and automated process.
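To make the idea of profiling data and applying domain-specific rules a little more concrete, here is a minimal Python sketch. It is not SAP Information Steward's rule language or API; the records, field names and rules are all invented for illustration, and a real implementation would work against your own source systems.

```python
# Hypothetical customer records drawn from two source systems.
RECORDS = [
    {"id": 1, "country": "GB", "postcode": "SW1A 1AA", "email": "a@example.com"},
    {"id": 2, "country": "gb", "postcode": "", "email": "b@example"},
    {"id": 3, "country": "DE", "postcode": "10115", "email": None},
]


def profile(rows, field):
    """Basic profiling: how often a field is filled and how many distinct values it has."""
    values = [row.get(field) for row in rows]
    filled = [v for v in values if v not in (None, "")]
    return {"fill_rate": len(filled) / len(rows), "distinct": len(set(filled))}


# Illustrative domain rules: each takes a record and returns True when it passes.
def country_is_iso2(row):
    value = row.get("country") or ""
    return len(value) == 2 and value.isalpha() and value.isupper()


def postcode_present(row):
    return bool(row.get("postcode"))


def email_has_domain(row):
    email = row.get("email") or ""
    local, _, domain = email.partition("@")
    return bool(local) and "." in domain


RULES = {
    "country_is_iso2": country_is_iso2,
    "postcode_present": postcode_present,
    "email_has_domain": email_has_domain,
}


def score(rows, rules):
    """Pass rate and failing record ids for each rule."""
    results = {}
    for name, rule in rules.items():
        failed = [row["id"] for row in rows if not rule(row)]
        results[name] = {
            "pass_rate": 1 - len(failed) / len(rows),
            "failed_ids": failed,
        }
    return results


if __name__ == "__main__":
    print(profile(RECORDS, "postcode"))
    for name, result in score(RECORDS, RULES).items():
        print(name, result)
```

The list of failing record ids per rule is the kind of evidence a business user can take to IT when agreeing what needs fixing at source.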
2) Understand the composition of business reports
Have you ever been in a situation where you are questioning the validity of numbers appearing in your reports? Does Gross Sales include or exclude certain types of transactions? Do discounts include discretionary and promotional discounts? How are multipacks represented by the Quantity metric? Do any of these sound familiar? Often we are obliged by IT to accept these numbers as they stand and, for those doubting souls, there may be a spreadsheet which provides lineage information of sorts. SAP Information Steward provides end-to-end lineage which allows a business user to understand the entire journey of a given metric from source to destination (report). Typically represented within this are the source field, the ETL steps, Universe representations and any report logic. "Self-service extending beyond reporting to metadata?" I hear you say. That’s absolutely the case here; business users are empowered to understand the composition of metrics and, where appropriate, raise any differences with IT and then resolve them. This functionality is equally helpful for our IT colleagues as it allows them to correctly interpret the impact of any technical changes that are being contemplated. For instance, before dropping or amending any source database artefacts, this functionality allows IT to fully appreciate how that object contributes to other processes downstream.
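The two questions described above, "where does this report metric come from?" and "what is affected if I change this source object?", are both walks over the same graph in opposite directions. The Python sketch below illustrates that idea only; it does not reflect how SAP Information Steward stores or exposes its metadata, and every object name in it is hypothetical.

```python
from collections import deque

# Hypothetical lineage edges: each object feeds the objects listed against it.
FEEDS = {
    "erp.sales_item.gross_amount": ["etl.load_sales"],
    "erp.sales_item.discount": ["etl.load_sales"],
    "etl.load_sales": ["dw.fact_sales.net_sales"],
    "dw.fact_sales.net_sales": ["universe.Net Sales"],
    "universe.Net Sales": ["report.Monthly Sales"],
}


def reachable(start, edges):
    """Breadth-first walk returning every object reachable from start."""
    seen, queue = [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.append(nxt)
                queue.append(nxt)
    return seen


def invert(edges):
    """Reverse every edge so the graph can be walked from report back to source."""
    inverted = {}
    for source, targets in edges.items():
        for target in targets:
            inverted.setdefault(target, []).append(source)
    return inverted


def impact(obj):
    """Everything that depends on obj (the IT question before a change)."""
    return reachable(obj, FEEDS)


def lineage(obj):
    """Everything obj is derived from (the business question about a metric)."""
    return reachable(obj, invert(FEEDS))


if __name__ == "__main__":
    print(impact("erp.sales_item.discount"))
    print(lineage("report.Monthly Sales"))
```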
3) Catalogue and define business terminology
I’ve witnessed many family-feud-like eruptions over what objects should be called and, if you really want to ratchet it up, introduce definitions into the debate. These are sensitive and contentious matters as there is often inertia, competitiveness and politics at play when agreeing on a ratified set of terms. Brushing this aside for a brief moment, SAP Information Steward allows us to document and store the agreements emanating from such a process to ensure that they are secure and accessible by the business. This business glossary can provide the exact terminology adopted for a given attribute or measure and then document, to an appropriate level of detail, its definition, source, transformations, etc. Please note that you can also bind these definitions to physical objects to ensure that the benefits of Metadata Management and Data Insight (as summarised in the earlier sections) are exploited by this module. Additionally, this allows us to track and audit changes as definitions and terminology evolve over time.
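As a rough illustration of what a glossary entry needs to hold (the agreed term, its definition, the physical objects it is bound to and an audit trail of changes), here is a small Python sketch. The class, its fields and the example term are assumptions made for this article and are not the data model used by SAP Information Steward.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class GlossaryTerm:
    """A hypothetical business glossary entry with a simple change history."""
    name: str
    definition: str
    bound_objects: list = field(default_factory=list)  # physical columns, measures, etc.
    history: list = field(default_factory=list)        # (date, changed_by, old definition)

    def redefine(self, new_definition, changed_by, changed_on):
        """Record the outgoing definition before replacing it."""
        self.history.append((changed_on, changed_by, self.definition))
        self.definition = new_definition


if __name__ == "__main__":
    term = GlossaryTerm(
        name="Gross Sales",
        definition="Invoiced value before discounts",
        bound_objects=["dw.fact_sales.gross_sales"],  # invented object name
    )
    term.redefine("Invoiced value before discounts and returns", "finance", date(2020, 1, 1))
    print(term.definition)
    print(term.history)
```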
4) Categorise Unstructured Data
So much of our data in the current age is unstructured and rarely do we use this for information purposes. Often this is due to technical limitations in both processing and storage terms. SAP Information Steward includes a Cleansing Package Builder module which has an array of out-of-the-box suggestions that help classify and categorise unstructured data, and you can then augment this with any domain-specific terminology. This functionality extends beyond simple text matching to standardising variations and also making the classifications context specific. As an example of standardising variations, paper weights expressed as 20 lb, 20-lb or 20 # can all be classified as 20 lb, your preferred standardisation. I’m sure you will have many other examples within your use of unstructured data.
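The paper-weight example can be sketched with a single regular expression in Python. This is only an illustration of standardising variations; it says nothing about how Cleansing Package Builder works internally, and the pattern and function name are my own. Note the small nod to context: a "#" only counts as pounds when it follows a number, so a reference such as "Item #20" is left alone.

```python
import re

# A number, optional space or hyphen, then one of the unit spellings we accept.
WEIGHT = re.compile(r"\b(\d+)\s*-?\s*(?:lbs?\b|pounds?\b|#)", re.IGNORECASE)


def standardise_weight(text):
    """Rewrite every recognised weight variant as '<number> lb'."""
    return WEIGHT.sub(lambda match: f"{match.group(1)} lb", text)


if __name__ == "__main__":
    for raw in ["20 lb bond paper", "20-lb bond paper", "20 # bond paper", "Item #20"]:
        print(repr(raw), "->", repr(standardise_weight(raw)))
```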
In summary, SAP Information Steward provides a comprehensive set of capabilities to empower the business user to combat the challenges presented by Data Governance and specifically Data Quality. Importantly, however, this cannot be achieved in isolation. I started by stressing the need for IT and the Business to work collaboratively to tame the beast; SAP Information Steward fills the void that previously existed by providing the Business with a set of capabilities which allow them to engage more effectively with IT.
Finally, Data Governance is a topical subject with varied commentary and so-called solutions but I firmly believe that technology alone cannot provide a solution. In my experience, it’s important for the organisation to develop a certain mindset and focus so that such an initiative can be successful.