USA National Data Quality Pandemic – Part 2 of 2

How did I get here? I suggest you read Part 1 of 2 to understand the first attempt and the basic calculations of the commercial damage to the US economy caused by poor quality information moving through technology systems. After six more months of production metrics from ClearDQ clients, and global peer review, that first set of numbers can be refined further. So read on…

Information Time Capital – Establishing a GDP Floor Price

This is my next best attempt to apply more rigour and refinement to this estimation challenge. I encourage others to attempt this too, in an effort not only to “measure” the scale of the pandemic, but also to come up with solutions and ideas to “solve” the problem, as the IT and Data Industries have faltered (despite noble efforts). This calculation sets a floor price, or base price, on just one aspect of the Information Industry, namely the negative effect on worker productivity due to poor quality information (causing rework, delays or stoppages to primary job function), what I call Information Time Capital (an Intangible Asset Class).

Poor Information Quality can be defined as, but not limited to:

  • Data/Information being inaccessible, due to inferior data access tools or data access capabilities
  • Data/Information arriving late, missing critical deadlines
  • Data/Information values being incorrect, requiring repairs or re-runs, or other manual intervention
  • Required Data/Information having to be re-requested, causing delays
  • Data/Information having to be re-reconciled or reworked prior to being input into the next work process

The previous article rated the damage to the 2016 US economy at between $1.262T (low range) and $3.108T (high range) per annum.

This article further refines the recoverable damage to the US economy to 5.53% of GDP, or $1.027T, using the ClearDQ pre-production estimation method to scope the scale of just the Information Time Capital portion of this problem.

Whilst this is lower than the $3T figures referenced by others, and my own high range figure of $3.108T from the Part 1 effort, this is the portion ($1.027T) that would be measurable and potentially recoverable if the ClearDQ method and software were actually executed across the US economy.

In summary, this challenge started with Part 1, when IBM quoted about $3.1T in annual economic damage to the US economy, with no reference, backing, source or formulas.

This number was then re-quoted a year ago by globally respected Data Quality expert Tom Redman in a Harvard Business Review article called “Bad Data Costs the U.S. $3 Trillion Per Year”. Tom and I talked about and debated these numbers, and voila, we agreed more granular numbers needed to be found. That is how Part 1 came about.

Since writing Part 1, and digging a little deeper, I did stumble upon a 2011 article by Hollis Tibbetts called “$3 Trillion Problem: Three Best Practices for Today’s Dirty Data Pandemic”, in which he made a valiant attempt (the first, I believe) at the calculation and came up with $3T back then.

Banking and Energy, Country by Country…

This Part 2 effort raises the bar a notch, because I am now introducing the concept of a Gross Domestic Product (GDP) valuation of data quality damage to an economy. I intend to chisel into specific nations, and industry verticals within countries, to dog regulators to insist on accurate commercial measurements in an effort to strip away national and industry-wide inefficiencies.

For example, take the Australian Energy industry, which is in crisis with national electricity shortfalls across the eastern seaboard due to collapsed generating capacity, resulting in skyrocketing energy prices. Just one non-generating participant in the grid that we have spoken to could be saving $37m per annum. This is an industry desperate for any efficiencies and savings. The market operator AEMO could consider measuring the entire industry, and penalise the inefficient players.

Another example could be the Australian Banking system at large. Just one of the major publicly listed banks could be saving $700m per annum (or $2.1B over 3 years). Multiplying that sort of saving across the banking industry could produce a dramatically different operating landscape for Banking. In this case the regulator APRA could consider enforcing more specific metrics via its existing “CPG 235 – Managing Data Risk” Guide, which already demands some Data Quality Metric initiatives (Data Quality Metrics points 63, 64, 65) that are presently anchored in risk, not commercial value.

Calculating the 5.53% GDP Recoverable Damage

The approach is relatively simple, so that any Data Scientist, Accountant or Lawyer can also follow along, which is important, since this is a “business problem” not a “technology problem” and requires business people to lead the recovery of the data damage.

First the caveats.

ClearDQ is a fast, reliable and production-proven method of pricing the economic damage (Information Time Capital) caused in an organisation by bad data/information, and it maps that damage to root causes and several ingestible Data Maturity Models. It uses the OUTSIDE-IN approach, which was first documented by Jack Olson in 2003 in his book “Data Quality: The Accuracy Dimension” and developed in the Business Process Management (BPM) field, but was never applied industrially in algorithmic form in the Data Quality industry until 2012, when it was pioneered by ClearDQ, which is data-enabled and business process aware.

Before executing ClearDQ, we always come up with a quick estimate to see if ClearDQ is worth running. The calculation you see here is only the pre-production estimation being applied, which is a free service, and we are happy to share it. We also offer online estimators for the time-poor who just want a $ figure, instantly.

Future releases of the ClearDQ method and software will also be free when the whole method and software stack is Open Sourced and Copy Left, as I believe enabling others to build faster/better data valuation products is better for society. I am sure others can do this better than me, but unfortunately as of 2017 no-one seems to have bothered to attempt this with any production-backed results, nor published their results, except for the attempt made by Hollis Tibbetts in 2011. If that is incorrect, please let me know.

Now on with the 2016 US Data Quality Pandemic Calculation (Second Attempt – 6th Sept 2017).

US Economy 2016 Calculation Inputs

  • (A) 15% – ClearDQ production metrics since 2012 show data quality damage runs between 13% and 24% of workforce productivity. We are going to use a conservative number of 15% of workforce. Source: ClearDQ
  • (B) $18.56T – Annual GDP of the US economy in 2016. Source: International Monetary Fund
  • (C) 158.6m – Labour force. Source: US Dept of Labor
  • (D) 4.7% – Unemployed share of the labour force, included in (C) and to be removed from the calc. Source: US Dept of Labor
  • (E) 151,145,800 workers – Employed Workforce Pool = (C) – unemployed pool, where the unemployed pool is (C) x (D). Source: US Dept of Labor
  • (F) 79.1% – Labour Force by Sector (we will be using only 79.1%). Source: US Dept of Labor. This is made up of:
    0.7% – farming, forestry, and fishing (we will exclude this group)
    20.3% – manufacturing, extraction, transportation, and crafts (we will exclude this group)
    37.3% – managerial, professional, and technical (we will include this group)
    24.2% – sales and office (we will include this group)
    17.6% – other services (we will include this group)
  • (G) 119,556,328 – Impacted Workers = (E) x (F)
  • (H) 17,933,449 – Lost productivity in the Impacted Worker pool, in worker equivalents = (A) x (G)
  • (I) $57,300 – GDP per capita. Source: International Monetary Fund
  • (J) $1.027T – Recoverable productivity loss per annum due to poor quality data (Information Time Capital lost) = (H) x (I)
  • (K) 5.53% of GDP – Recoverable loss due to poor quality data (Information Time Capital lost) expressed as a % of GDP = (J) / (B)

US Economy 2016 Calculation Formula & Sequence

WORKFORCE CALCS – 119,556,328 Impacted Workers

  • STEP 1 – (C) x (D) = 7,454,200 Unemployed Pool
  • STEP 2 – (C) – 7,454,200 Unemployed Pool = Employed Workforce Pool =  151,145,800  = (E)
  • STEP 3 – (E) x (F) = (G) = Impacted Workers = 119,556,328  workers

Recoverable DQ Damage Overhead – $1.027T

  • STEP 1 – (A) x (G) = (H) = Lost Worker Productivity in Impacted Workers = 17,933,449 Worker Equivalents
  • STEP 2 – (H) x (I) = (J) = recoverable productivity loss per annum due to poor quality data = $1.027T

Recoverable DQ Damage Overhead – 5.53% of GDP

  • STEP 1 – (J) / (B) = (K) = $1.027T / $18.56T = 5.53% of US GDP in 2016
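
The formula and sequence above can be sketched in a few lines of Python for anyone who wants to re-run it with their own inputs. This is only a minimal sketch of the pre-production estimate as described in this article, not the ClearDQ software: the names `Inputs` and `estimate` are hypothetical, the default values are simply inputs (A) to (I) quoted above, and no rounding is applied between steps (an assumption, since the article does not say where it rounds).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Inputs:
    """Inputs (A) to (I) as quoted in this article (2016 US figures)."""
    damage_rate: float = 0.15            # (A) share of productivity lost to poor data
    gdp_usd: float = 18.56e12            # (B) annual GDP
    labour_force: float = 158_600_000    # (C) labour force
    unemployment_rate: float = 0.047     # (D) unemployed share of (C)
    impacted_share: float = 0.791        # (F) 37.3% + 24.2% + 17.6% of sectors
    gdp_per_capita_usd: float = 57_300   # (I) GDP per capita


def estimate(inputs: Inputs) -> dict:
    """Follow the article's step sequence and return every intermediate value."""
    # WORKFORCE CALCS
    unemployed = inputs.labour_force * inputs.unemployment_rate    # STEP 1
    employed = inputs.labour_force - unemployed                    # STEP 2 -> (E)
    impacted_workers = employed * inputs.impacted_share            # STEP 3 -> (G)

    # Recoverable DQ Damage Overhead
    lost_worker_equivalents = impacted_workers * inputs.damage_rate   # (H)
    loss_usd = lost_worker_equivalents * inputs.gdp_per_capita_usd    # (J)
    share_of_gdp = loss_usd / inputs.gdp_usd                          # (K)

    return {
        "unemployed": unemployed,
        "employed": employed,
        "impacted_workers": impacted_workers,
        "lost_worker_equivalents": lost_worker_equivalents,
        "loss_usd": loss_usd,
        "share_of_gdp": share_of_gdp,
    }


if __name__ == "__main__":
    base = estimate(Inputs())
    print(f"Impacted workers (G): {base['impacted_workers']:,.0f}")
    print(f"Recoverable loss (J): ${base['loss_usd'] / 1e12:.3f}T")
    print(f"Share of GDP (K):     {base['share_of_gdp']:.2%}")

    # Sensitivity across the 13% to 24% range quoted for input (A).
    for rate in (0.13, 0.15, 0.24):
        result = estimate(replace(Inputs(), damage_rate=rate))
        print(f"(A) = {rate:.0%}: ${result['loss_usd'] / 1e12:.2f}T")
```

With the inputs above, the sketch lands on roughly $1.03T and about 5.5% of GDP. The last decimal place depends on where rounding is applied: dividing the rounded-down $1.027T by $18.56T gives the 5.53% quoted here. Swapping in another country's or industry's figures only requires passing a different `Inputs` instance.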


If you are interested in calculating the “actual & total” data quality damage circulating through your organisation today, along with identifying the root causes and how to recover the economic damage, you may need our automated tool and our algorithms to do that quickly, accurately and at scale.

Our framework, method (algorithms), software tool, and client references will satisfy your accounting, operational, risk, human resource and data experts.

Author: Martin Spratt, 6th September 2017. Martin Spratt is a data value guru, author and CDO advisor, held hostage in Melbourne by 4 women and a cat, and survives on cappuccinos. This article first appeared on

Data Quality for Financial Performance