Assessing Data Quality
Life is like a box of chocolates
As the Head of a Data Management Practice and Global Digital Practice I often find myself having conversations with customers about data and in particular the quality of the data. From my core experiences of managing databases, through business intelligence (BI) and advanced analytics to enterprise data architecture, the subject always crops up. You may be considering a migration to a new landscape as part of your digital transformation agenda, acquiring a new business, removing data for regulatory reasons or implementing master data governance, to name just a few initiatives. Whatever the initiative, the perspectives people have on data quality never cease to amaze me.
“My mum always said that life is like a box of chocolates, you never know what you are going to get”.
Forrest Gump had a point: until we lift the lid and assess the situation, we will never know what state the data is in, how it has been customised, whether its quality has eroded over time, whether it is fit for purpose, or whether we can make effective decisions from it. Full of joyous enthusiasm and childish wonder, people tend to embark on projects and rummage around in the box in search of their favourite hidden treasures, such as truffles. So why do so many people have unreasonable expectations in the area of data quality management?
Simple: Data Quality projects are perceived to be difficult and are far too often left in the hands of IT departments.
Lifting the lid on our applications to assess and fix data is like opening the box of chocolates. We start the data quality project anticipating business benefits such as “Identifying Redundant Inventories” or “Optimising Debt Management”, but we all know there will inevitably be things nobody wants, such as the Coffee Crème or coconut-based chocolates, and despair creeps in. Whatever your pet hate is, there is a data equivalent in a Data Quality Assessment (DQA).
Conducting a Data Quality Assessment
A DQA may be required to support a number of projects, such as Data Migration, Master Data Management/Governance, Data Warehousing, archiving and advanced analytics, but ultimately the same dimensions may be used to assess the risks and effort associated with them. The following provides a view on a selection of typical dimensions that can be used.
• Accuracy: The degree to which data represents the “real life” entities it models.
In many cases accuracy is measured by how well the values of a record agree with a source of “real life” information. The UK Postal Address File (PAF) can be used as a source of valid postcode-to-address mappings for UK addressable data. However, sometimes the only viable way of checking accuracy is through manual processes, such as physically checking the serial number on an asset such as an engine or rail signal.
• Currency: The degree to which information is current with the world it models.
Closely aligned to accuracy is currency, which measures how “fresh” the data is. It can be extremely important for Data Privacy and even for simple processes such as debt management. For example, if Vendor address data is not checked regularly then invoices, reminders and court processing may ultimately be sent to the wrong location, impacting cash flow.
• Uniqueness/Duplication: Simply states that an entity only exists once.
This is probably the most common issue we find as we migrate or convert very old legacy ERPs. The lack of data governance and associated controls has left many organisations with duplicate master data, often leading to business partner (customer and vendor) consolidation problems, and with duplicate Material Master records, leading to inventory holding issues that tie up working capital.
• Completeness: The expectation that one or more attributes will have assigned values.
For example, it may be expected that all material records will have a unit of measure. However, completeness may also reflect the expectation that a number of dependent rows will be present in a dataset, e.g. for Material Master Data it may be expected that there will always be associated Procurement and Finance attributes.
• Precision: The level of detail of a data attribute.
Whether financial figures are held with or without rounding, and to how many significant digits, may impact forecasts, profit statements, etc.
• Privacy: The need for access controls to monitor usage of Personally Identifiable Information (PII).
Data Privacy in recent years has taken on a whole new dimension following the European Union General Data Protection Regulation (GDPR) and this dimension needs to be carefully assessed and appropriate controls put in place.
• Referential & Entity Integrity: The relationships between data enforced by constraints.
Many applications enforce referential integrity through physical constraints or application coding, e.g. a Sales Order line will always have a Sales Order Header. Entity integrity also has to be maintained across many tables, as within SAP ECC, to ensure that the data remains consistent, e.g. that all Material Master tables such as MARA, MAST and MARC are consistent and there are no orphaned records.
The above list is not exhaustive and other dimensions for analysis exist, but it remains important to keep an eye on business benefits and the return on investment (ROI) for our data quality projects.
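As a rough illustration, several of these dimensions (completeness, uniqueness, currency and referential integrity) can be expressed as simple checks over extracted records. The sketch below is plain Python and is not tied to any SAP product. The field names (material_id, description, unit_of_measure, last_checked, plant), the sample rows and the one-year freshness threshold are all assumptions made for the example.

```python
from datetime import date

# Hypothetical material master extract; field names and values are illustrative only.
materials = [
    {"material_id": "M-001", "description": "Hex bolt M8",
     "unit_of_measure": "EA", "last_checked": date(2024, 3, 1)},
    {"material_id": "M-002", "description": "hex bolt m8 ",
     "unit_of_measure": None, "last_checked": date(2019, 6, 15)},
    {"material_id": "M-003", "description": "Rail signal lamp",
     "unit_of_measure": "EA", "last_checked": date(2023, 11, 20)},
]

# Hypothetical dependent records that should each point at an existing material.
plant_data = [
    {"material_id": "M-001", "plant": "1000"},
    {"material_id": "M-999", "plant": "1000"},
]


def completeness(records, field):
    """Completeness: share of records with a non-empty value for the field."""
    if not records:
        return 1.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def duplicate_groups(records, field, id_field="material_id"):
    """Uniqueness: ids whose normalised field value occurs more than once."""
    groups = {}
    for r in records:
        # Normalise case and whitespace so near-identical values collide.
        key = " ".join(str(r[field]).lower().split())
        groups.setdefault(key, []).append(r[id_field])
    return {k: ids for k, ids in groups.items() if len(ids) > 1}


def stale(records, field, as_of, max_age_days, id_field="material_id"):
    """Currency: ids of records not checked within max_age_days of as_of."""
    return [r[id_field] for r in records
            if (as_of - r[field]).days > max_age_days]


def orphans(child_records, parent_records, key):
    """Referential integrity: child records with no matching parent."""
    parent_keys = {p[key] for p in parent_records}
    return [c for c in child_records if c[key] not in parent_keys]


uom_completeness = completeness(materials, "unit_of_measure")
duplicates = duplicate_groups(materials, "description")
stale_ids = stale(materials, "last_checked", date(2024, 6, 1), 365)
orphaned = orphans(plant_data, materials, "material_id")
```

Matching on a normalised description is a deliberately naive way to find duplicates. Real duplicate detection usually needs fuzzy matching and a business review of each candidate group. Accuracy is left out of the sketch because it depends on an external reference source or a manual check.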
Focus, Focus, Focus
How often have we spent time surfing the Internet for one thing only to find that we have found something else along the way? Several hours later we have forgotten what we originally started surfing for.
Running data quality assessments can be much the same and lead us down rabbit holes we never intended to go down. Projects need good project management (managers who understand data related projects) to keep the analyst on track and focused on tasks.
At NTT DATA Business Solutions UK we have developed processes associated with master data management and data migrations to S/4, enabling us to align the Business Process transformation with data analysis so that we only focus on the data requiring our attention. For example, when a customer has data in their SAP ECC going back to R/3 version 3 or 4 and they have never archived anything, should I really analyse all their customers or materials? Of course the answer is no, and we manage the scoping of the assessment in line with the core project being undertaken.
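One way to picture this kind of scoping is as a filter applied before any analysis starts, so that only master records with activity after a cutoff date are carried into the assessment. The sketch below is a simplification and does not describe the processes mentioned above. The record layout, the cutoff date and the function name in_scope are assumptions made for the example.

```python
from datetime import date

# Hypothetical extracts; identifiers and dates are illustrative only.
customers = [
    {"customer_id": "C-100"},
    {"customer_id": "C-200"},
    {"customer_id": "C-300"},
]
sales_orders = [
    {"customer_id": "C-100", "order_date": date(2023, 5, 2)},
    {"customer_id": "C-200", "order_date": date(2009, 1, 15)},
    {"customer_id": "C-100", "order_date": date(2011, 8, 9)},
]


def in_scope(master, transactions, key, date_field, cutoff):
    """Keep master records with at least one transaction on or after cutoff."""
    active = {t[key] for t in transactions if t[date_field] >= cutoff}
    return [m for m in master if m[key] in active]


# Only customers with orders since the assumed cutoff are assessed.
scoped = in_scope(customers, sales_orders, "customer_id",
                  "order_date", date(2020, 1, 1))
```

Here C-200 falls out of scope because its only order predates the cutoff, and C-300 falls out because it has no orders at all. In a real project, records with no transactions may still need a separate decision, for example whether to archive them or keep them for regulatory reasons.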
Bringing Technology to Bear
A wide range of technologies exist in the SAP portfolio which can be deployed depending upon the scale of programmes undertaken. Typically at NTT DATA Business Solutions we use the following products:
• Traditional SAP transactions to extract data and undertake analysis in Excel;
• SAP Archiving & ILM;
• SAP Information Steward;
• SAP Data Services;
• SAP Master Data Governance; and
• SAP Data Management Suite
Equally, NTT DATA Business Solutions has its own products and accelerators that can support solution architectures, e.g. it.MDS (Master Data Simplified).
So I have now categorised the relevant problems found in my data, e.g. duplicate, incomplete and inaccurate material master data. I have also understood the associated risks on parts of that data, e.g. the customer is carrying too much inventory. So what can I do about this?
Over the coming months we will be running a number of webinars covering some of these core SAP technologies and how they can be used in support of Data Quality Assessments.
Data analysis and assessment is like a box of chocolates, so develop the skills with NTT DATA Business Solutions to ensure you focus on the items that are a priority to your business.