Addressing Business Intelligence Data Quality

A friend of mine posted a link to a post on Michael W. Dobson’s TeleMapics Blog, entitled Google Maps announces a 400 year advantage over Apple Maps, about Apple’s problems with the release of its own mapping solution in iOS 6. These problems are well documented elsewhere, but Michael’s post highlights problems that are not confined to mapping data; they are common across many, if not all, business intelligence solutions. I have been lucky enough to work on projects that include telematics data and spatial analysis. One project in particular is riddled with data quality problems, but the machine-generated telematics data is the least of them – it is the data quality in the main operational system, and people’s understanding of it, that causes the issues.

Michael identifies five issues with Apple’s current offering from a data quality perspective, but his descriptions map just as easily onto any business intelligence solution.

  • Completeness – some data is absent, and some data that is included seems to have erroneous attributes and relationships. These can be described as omissions in the data, but there will also be errors of commission, where the same data is represented more than once (usually due to duplication by multiple data sources).
  • Logical Consistency – the degree of adherence to logical rules of data structure, attribution and relationships. There are a number of sins included here, but the ones that appear to be most vexing to Apple are compliance with the rules of the conceptual schema and the correctness of the topological characteristics of a data set. An example of this could be having a store’s name, street number and street name correct, but mapping it in the wrong place (town).
  • Positional Accuracy – is the data in the right place? In business intelligence this is most commonly expressed by the way that data is associated with artificial hierarchies added to the data warehouse. The error may appear small in terms of how much data is in the wrong place, and the overall total still adds up. This may not seem like a large problem to developers, but people who use the lower levels of the hierarchy for their reporting will immediately and irreversibly regard the data as inaccurate.
  • Temporal Accuracy – particularly with respect to temporal validity – is the data still meaningful at this point in time? Combining data from different temporal contexts without understanding the consequences can distort the reported results.
  • Thematic Accuracy – is it sensible to put certain sets of data together? The ability to join data in large data warehouses does not mean that different pieces of data should be joined. How do the users understand what is valid or invalid when it comes to using the data?
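
As a rough illustration of how some of these dimensions might be checked before data reaches a report, here is a minimal Python sketch. The store records, the field names (town, region, valid_to) and the TOWN_TO_REGION lookup are all invented for the example; they are assumptions of mine and are not taken from Michael’s post or from any real system.

```python
from datetime import date

# Hypothetical store records; every name and value here is invented for illustration.
STORES = [
    {"id": 1, "name": "Harbour Cafe", "town": "Leeds", "region": "North", "valid_to": None},
    {"id": 2, "name": "Harbour Cafe", "town": "Leeds", "region": "North", "valid_to": None},
    {"id": 3, "name": "Book Nook", "town": None, "region": "South", "valid_to": None},
    {"id": 4, "name": "Tile World", "town": "York", "region": "South", "valid_to": None},
    {"id": 5, "name": "Old Mill", "town": "Bath", "region": "South", "valid_to": date(2011, 6, 30)},
]

# Assumed reference hierarchy: the region each town should roll up to.
TOWN_TO_REGION = {"Leeds": "North", "York": "North", "Bath": "South"}


def omissions(rows, required=("name", "town", "region")):
    """Completeness: ids of rows where a required field is missing or empty."""
    return [r["id"] for r in rows if any(r.get(f) in (None, "") for f in required)]


def commissions(rows, key=("name", "town")):
    """Completeness: ids of rows that repeat an earlier row's business key."""
    seen, dupes = set(), []
    for r in rows:
        k = tuple(r.get(f) for f in key)
        if k in seen:
            dupes.append(r["id"])
        seen.add(k)
    return dupes


def misplaced(rows, hierarchy):
    """Positional accuracy: ids of rows whose region disagrees with the hierarchy."""
    return [
        r["id"]
        for r in rows
        if r["town"] in hierarchy and hierarchy[r["town"]] != r["region"]
    ]


def expired(rows, as_of):
    """Temporal accuracy: ids of rows whose validity ended before the reporting date."""
    return [r["id"] for r in rows if r["valid_to"] is not None and r["valid_to"] < as_of]


if __name__ == "__main__":
    print("omissions:", omissions(STORES))
    print("commissions:", commissions(STORES))
    print("misplaced:", misplaced(STORES, TOWN_TO_REGION))
    print("expired:", expired(STORES, date(2012, 10, 1)))
```

Note that the misplaced store (id 4) does not change the overall total, which is exactly why this kind of error tends to be found by the people reporting at the lower levels of the hierarchy rather than by the developers. Thematic accuracy is deliberately left out of the sketch, because whether two sets of data belong together is a judgement about meaning that a simple rule like these cannot make.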

Recommendations

1. Step back and re-engineer your approach to data quality as a core business intelligence competency.
2. Understand that you have a problem and that you are unlikely to have the experience or know-how to fix it.
3. Employ experts – consultants in the short-term but permanent staff tasked with maintaining the quality in the long-term.
4. Ensure that you have the management experience to control the outcome of those experts’ development efforts. You need to hire someone who knows data quality, management and how to build winning teams.
5. Become active in crowdsourcing. Find a way to harness local business knowledge and invite your users to supply local information, or at least lead you to the local knowledge that is relevant.
6. Don’t hire a high-powered consulting group to solve this problem. This would be the biggest mistake you could make, but it is one that big business seems to make over and over. As an alternative, I suggest that you look to people who actually know something about these applications.

When consulting we often urge people to put data quality at the heart of the process and to instigate Continuous Data Quality Processes. Our white paper How Data Works can provide insight into the types and causes of data quality issues. Other information can be found using the Data Quality tag on this site.

I hope that Michael W. Dobson will forgive my plagiarism of his concepts but he has neatly addressed some of the most common and pressing issues in business intelligence data quality.
