Useful Tools & Techniques for Managing Data Quality: Getting Started

David K. Becker, Chief Architect and Scientist @ DB-SAC
Tom Redman, “The Data Doc” @ Data Quality Solutions


Most organizations are aware that data quality is an issue. But absent a measurement of data quality and an estimate of the associated costs, they are unable to develop a plan to address the issue. To deal with this situation, we summarize three inter-related techniques (Redman 2019): the Friday Afternoon Measurement (FAM), the Rule of Ten (RoT), and the Cost of Poor Data Quality (CoPDQ).

FAM results, which provide an estimate of quality as the data is used in day-to-day work, and the RoT, which estimates the cost of dealing with flawed data, are inputs into CoPDQ, which expresses the fraction of the organization’s data operations spend wasted dealing with poor data quality.

 

Friday Afternoon Measurement (FAM) and Measured DQ

The first part of the process, FAM, was introduced to help people make a quick-and-dirty DQ measurement. There are other ways to measure DQ; one virtue of FAM, however, is that people can make the measurement in a single afternoon. The FAM process is depicted in Figure 1.

Figure 1 – FAM Process

  1. The FAM process assembles a small sample: typically, the most important data attributes (e.g., fields) in the data records associated with the last 100 units of work completed, representing the organization’s most recent business activity and its most important data.
  2. The sample is then evaluated to determine the Measured Data Quality: the records are inspected for obvious errors, and each record is rated either Defective (containing any obvious error) or Defect Free (perfect). The number of Defect Free records is counted and divided by the Sample Size (the number of records in the sample) to give the Measured DQ.
  3. Measured DQ is thus a number ranging from 0 to 100 that represents the percent of data records created correctly the first time. Importantly, the Measured DQ score can also be interpreted as the fraction of the time the work was done properly the first time. (A minimal sketch of this calculation follows the list.)
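
To make the arithmetic concrete, here is a minimal sketch in Python. The sample records, the counts (67 Defect Free out of 100), and the is_defect_free check are purely illustrative assumptions; in a real FAM, a person inspects each record by eye.

    # Minimal sketch of the Measured DQ calculation (illustrative only).
    def measured_dq(records, is_defect_free):
        """Percent of records created correctly the first time (0 to 100)."""
        defect_free = sum(1 for r in records if is_defect_free(r))
        return 100.0 * defect_free / len(records)

    # Hypothetical FAM sample: 100 units of work, 67 judged Defect Free.
    sample = [{"defect_free": True}] * 67 + [{"defect_free": False}] * 33
    print(measured_dq(sample, lambda r: r["defect_free"]))  # 67.0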

 

The Rule of Ten (RoT) and Cost

The calculation of costs in CoPDQ uses Redman’s “Rule of Ten” (Redman 2016), which states:

“It costs ten times as much to complete a unit of work when the data are flawed in any way as it does when they are perfect.”

“Said differently, if the straight-through path costs a dollar, then the fix-errors path costs ten.”

The cost process using RoT, when added on to the FAM process, is depicted in Figure 2.

Figure 2 – Rule of Ten Process

  1. The number of Defect Free records is multiplied by the Straight-thru Cost (the cost to complete a unit of work when the data in the record is perfect, typically normalized to 1) to obtain the Cost of Defect Free processing.
  2. The number of Defective records is multiplied by the RoT cost (the cost to complete a unit of work when the data in the record is flawed, typically Redman’s Rule of Ten multiple of 10, or some other estimated multiple of the Straight-thru Cost) to obtain the Cost of Defective processing.
  3. The Cost of Defect Free and the Cost of Defective are added together to obtain the actual Total Cost of processing the data. (A worked sketch follows this list.)
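
Continuing the hypothetical sample above (67 Defect Free, 33 Defective), here is a minimal sketch of the Rule of Ten cost calculation, with the Straight-thru Cost normalized to 1 and the RoT multiple set to 10.

    # Minimal sketch of the Total Cost calculation under the Rule of Ten.
    STRAIGHT_THRU_COST = 1.0  # cost per unit of work when the data is perfect
    ROT_MULTIPLE = 10.0       # Rule of Ten: a flawed record costs 10x

    def total_cost(defect_free, defective,
                   straight_thru=STRAIGHT_THRU_COST, rot=ROT_MULTIPLE):
        cost_defect_free = defect_free * straight_thru  # Cost of Defect Free
        cost_defective = defective * rot                # Cost of Defective
        return cost_defect_free + cost_defective        # Total Cost

    print(total_cost(67, 33))  # 67 + 330 = 397.0

With a dollar per straight-through unit, the 33 Defective records account for $330 of the $397 total.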

 

Non Value-Added Cost and the Cost of Poor Data Quality (CoPDQ)

Given the prior steps, we finally arrive at the process for calculating CoPDQ, as depicted in Figure 3:

Figure 3 – Full CoPDQ Process

  1. Multiply the Sample Size by the Straight-thru Cost to obtain the Value-Added Cost (how much the organization would spend if all of the incoming data were perfect).
  2. Subtract this Value-Added Cost from the Total Cost calculated above to obtain the Non-Value-Added Cost (how much of the money expended on data operations is wasted because of poor quality in the incoming data).
  3. Divide the Non-Value-Added Cost by the Total Cost to arrive at the Cost of Poor Data Quality, or CoPDQ: an estimate of the percentage of the transaction-processing data operations budget that is wasted due to poor data quality. (A worked sketch follows this list.)
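
Here is a minimal sketch of the full CoPDQ calculation, again using the hypothetical numbers from the sketches above (Sample Size of 100, Total Cost of 397).

    # Minimal sketch of the CoPDQ calculation (illustrative numbers).
    def copdq(sample_size, total, straight_thru=1.0):
        value_added = sample_size * straight_thru  # spend if all data were perfect
        non_value_added = total - value_added      # waste due to poor data quality
        return non_value_added / total             # fraction of total spend wasted

    print(f"{copdq(100, 397.0):.0%}")  # ~75% of spend is non-value-added

In this made-up example, a Measured DQ of 67 implies that roughly three quarters of the data operations spend is non-value-added, which illustrates how quickly modest error rates translate into large costs.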

 

A Call to Action

FAM and CoPDQ can work together to stimulate an organization to action. Indeed, most organizations find their measured DQ far lower and their CoPDQ far higher than they expected. Every initiative needs a starting point. Quickly identifying how much of an organization’s processing is being wasted can be a very powerful motivator. If the true cost of poor data quality is understood, it is difficult to ignore.

 

References

Redman, T. (2016). “Getting in Front on Data: Who Does What”. Technics Publications, September 2016.

Redman, T. (2019). “In-Depth Seminar: The Data Provocateur’s Bootcamp”. Enterprise Data World Conference, March 17–22, 2019, Boston, MA.
