Dr. John Talburt
UA Little Rock
Topic: How can I improve my data quality in my organization?
Question: I am often asked “Everyone always praises the idea of better data quality, but no one really wants to invest in it. We still have recurring data quality issues, and it seems like our data quality efforts are mostly fighting fires, just fixing immediate problems, not an organized effort to improve data quality long term. How can we move from data quality fixes to an ongoing data quality program?”
Answer: It is true that I never hear anyone speak out against data quality (DQ), and to some degree, every organization carries out some level of DQ activity. The biggest problem I see is that most organizations practice “supply-side” DQ instead of “demand-side” DQ. What do I mean by this? Supply-side DQ is when the focus is on cleansing and standardizing data at sourcing. Supply-side DQ is based on a theory that if the data are transformed to comply with all standards, and if the clean data are then fed into bug-free software, then the output will automatically be of high quality. If there are problems with the output, then it must be because some data were not, or could not be, properly standardized, or because there was an undiscovered bug in the software.
The problem with supply-side DQ is that it treats the output as simply a “by-product” of the information system rather than the “product” of the system. The key concept in a successful data quality program is managing information as a product. While this approach has been advocated in academia as far back as the MIT Total Data Quality Management (TDQM) methodology of the 1990s, it is only now being taken seriously as an industry practice.
So what is demand-side data quality? It simply means starting the assessment of data quality from the standpoint of the data consumer. You begin by asking “What are the information products I am building?” Then you ask the users of each information product “What is it about this product that produces value for you? What do you want and expect from this information product?” The answers to these questions are the basis for implementing “quality control” standards. Quality control (QC) is simply inspecting the final information product to see if it meets user standards. The lack of QC activity is a good indicator that an organization is practicing supply-side DQ, i.e., why inspect the product if it was assembled with clean input and clean software? No reputable manufacturing company would ship a product without putting it through a QC process, but it happens all the time with information products.
Another advantage of demand-side DQ is that it introduces the concept of product life cycle management. If you are managing information as a product, then are you even producing the right product? Are you producing an old product needing retirement (station wagons instead of SUVs), or perhaps a new product needing more testing and improvement based on user feedback?
Of course, demand-side DQ still has a supply-side component, but in the demand-side model, the sourcing standards and software requirements are driven by the product requirements, i.e., starting with the end in mind. The cleansing and standardization of the data are quality assurance (QA) activities. QA is measuring the parts before they are assembled to help assure the final product passes QC. Many people confuse these terms or use them interchangeably, but in quality management they are well established. QA happens only during the building process, before the final product is assembled, and QC is the inspection of the final product. The key to a good DQ management program is synchronizing QA and QC. In other words, are the data quality metrics you are using in the QA process, such as completeness, timeliness, and consistency, really making the end product more useful and valuable to the user, i.e., helping meet QC requirements? If not, the QA process needs to be adjusted, which speaks to the final component of a demand-side DQ program: analysis and improvement.
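To make the QA/QC distinction concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than part of any standard or tool: the records, the field names, the "contact list" product, and the consumer requirement that every row have an email. QA measures completeness on the source records; QC inspects the finished product against what the consumer asked for; the last step asks which QA metrics actually correspond to a QC requirement.

```python
def qa_completeness(records, field):
    """QA metric: fraction of source records with a non-empty value for `field`."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def qc_failures(product_rows, required_fields):
    """QC inspection: finished-product rows that miss a consumer requirement."""
    return [
        row for row in product_rows
        if any(row.get(f) in (None, "") for f in required_fields)
    ]


# Hypothetical source records feeding a hypothetical "contact list" product.
source = [
    {"id": 1, "email": "a@example.com", "fax": None},
    {"id": 2, "email": None, "fax": None},
    {"id": 3, "email": "c@example.com", "fax": "555-0100"},
    {"id": 4, "email": "d@example.com", "fax": None},
]

# Assumed consumer requirement: every row of the product needs an email.
consumer_required = ["email"]

# QA: measure the parts before assembly.
qa_scores = {f: qa_completeness(source, f) for f in ("email", "fax")}

# Assemble the product, then QC: inspect the final result.
product = [{"id": r["id"], "email": r["email"]} for r in source]
failed = qc_failures(product, consumer_required)

# Synchronizing QA and QC: keep only QA metrics tied to a consumer requirement.
relevant_qa = {f: s for f, s in qa_scores.items() if f in consumer_required}

print("QA scores:", qa_scores)
print("QC failures:", failed)
print("QA metrics that matter to the consumer:", relevant_qa)
```

In this toy case, fax completeness is poor but no consumer requirement depends on it, so effort spent cleansing fax numbers would not help the product pass QC, while email completeness maps directly to the one row that fails inspection.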
All of the defects discovered in the QA and QC measurements need to be collected, classified, and saved in a reporting or ticketing system. In addition to assigning individual tickets for problem resolution, there needs to be a periodic analysis of all the tickets with an eye toward detecting systematic errors. If a systematic error is found, then its root cause should be determined, and if justified by a cost-benefit analysis, a project should be launched to prevent the error and possibly correct past occurrences of it. Here is where DQ programs and DQ projects intersect. While a series of DQ projects is not a DQ program, a DQ program can launch a series of DQ projects.
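The periodic ticket review described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ticket fields, the recurrence threshold of three, and the cost figures are all hypothetical, and a real cost-benefit analysis would weigh far more than one multiplication.

```python
from collections import Counter

# Hypothetical defect tickets collected from QA and QC checks.
tickets = [
    {"id": 101, "stage": "QA", "category": "missing_postal_code", "source": "web_form"},
    {"id": 102, "stage": "QC", "category": "duplicate_customer", "source": "crm_import"},
    {"id": 103, "stage": "QA", "category": "missing_postal_code", "source": "web_form"},
    {"id": 104, "stage": "QC", "category": "stale_address", "source": "batch_load"},
    {"id": 105, "stage": "QA", "category": "missing_postal_code", "source": "web_form"},
]


def systematic_candidates(tickets, min_count=3):
    """Group tickets by (category, source); return groups seen at least min_count times."""
    counts = Counter((t["category"], t["source"]) for t in tickets)
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


def worth_a_project(occurrences, cost_per_defect, project_cost):
    """Crude cost-benefit test: avoided defect cost must exceed the project cost."""
    return occurrences * cost_per_defect > project_cost


candidates = systematic_candidates(tickets)
for (category, source), n in candidates:
    # Assumed figures: each defect costs 50 to fix by hand; a prevention project costs 100.
    launch = worth_a_project(n, cost_per_defect=50.0, project_cost=100.0)
    print(f"{category} from {source}: {n} tickets, launch project: {launch}")
```

Individual tickets still get resolved one by one; the grouping step is what turns a pile of fixes into evidence of a recurring defect with a single root cause worth investigating.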
What I have described as demand-side DQ is nothing new. It goes all the way back to the quality pioneers Deming and Shewhart and the Plan-Do-Check/Study-Act cycle. The same cycle is the basis for a new standard, ISO 8000 Part 61, which establishes a Reference Model for Data Quality Management. It specifies 51 data quality activities and 82 data quality outcomes that need to be addressed in a demand-side DQ management program. I recommend the ISO 8000-61 standard as a great tool for performing a gap analysis on your DQ program.