Tools, Technologies and Training for Healthcare Laboratories

Quality Goals at the Crossroads: Growing, Going, or Gone?

There are days I worry we are debating how many angels can dance on the head of a performance specification. Or that we're all blind men feeling different parts of a great elephant of error and uncertainty. But the latest "round" in the debate is depressingly familiar.


Sten Westgard, MS
February 2016

The 2015 Milan conference re-opened the debate on analytical performance specifications (what we in the normal world call quality requirements, goals, etc.). Editorials and opinion letters have peppered Clinical Chemistry and Laboratory Medicine, suggesting that the "match continues" in the debate of "TE vs MU." That the most recent set of arguments resembles the last set is a sign we may be recycling the same points.

But to be truthful, there is no reason for debate. The matter was settled in the official metrological world some time ago. If you run an ISO 15189 accredited laboratory, you have no choice and no room to debate: you MUST calculate measurement uncertainty. MU is mandatory. ISO 15189, the de facto global standard for laboratory quality, does not discuss total error at all; it mandates only MU.

The VIM, which is to metrology what the Bible is to Catholics, is also quite clear in its intolerance of bias. The introduction to the third edition of the VIM notes:

“The deviation from the true value is composed of random and systematic errors. The two kinds of errors, assumed to be always distinguishable, have to be treated differently. No rule can be derived on how they combine to form the total error of any given measurement result, usually taken as the estimate. Usually only an upper limit of the absolute value of the total error is estimated, sometimes loosely named 'uncertainty'.”

Thus, if you are an ISO 15189 laboratory, you must follow the VIM, and you cannot use Total Error. One sin of Total Error is that it combines two kinds of errors, random and systematic. The greater sin of Total Error is that, by accounting for bias in its formula, it allows bias to exist. This is contrary to Measurement Uncertainty, where, if you are an ISO 15189 laboratory, you must eliminate bias: anytime you find a bias, you are expected to recalibrate, apply a correction factor, or dismiss it as insignificant.
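The contrast between the two models can be made concrete. As a minimal sketch, assuming the standard textbook formulas (Total Error as TE = |bias| + z·SD, and a GUM-style expanded uncertainty U = k·u, which carries no bias term because bias is presumed corrected), the difference in how each treats bias looks like this; the sodium figures are hypothetical, chosen for illustration only:

```python
def total_error(bias, sd, z=1.65):
    """Total analytical error: TE = |bias| + z*SD.

    z = 1.65 gives roughly a one-sided 95% estimate; the bias term
    is carried explicitly, not assumed away.
    """
    return abs(bias) + z * sd

def expanded_uncertainty(sd_components, k=2):
    """Expanded measurement uncertainty: U = k * combined standard uncertainty.

    Standard uncertainties combine as a root-sum-of-squares; bias does
    not appear, because the MU model assumes it has been corrected or
    shown to be insignificant.
    """
    u_combined = sum(u ** 2 for u in sd_components) ** 0.5
    return k * u_combined

# Hypothetical sodium method: bias = 1.0 mmol/L, SD = 1.2 mmol/L
print(total_error(1.0, 1.2))        # TE includes the bias term
print(expanded_uncertainty([1.2]))  # MU (k=2) ignores the bias term
```

With these illustrative numbers, the same method yields a TE near 3.0 mmol/L but an expanded uncertainty of 2.4 mmol/L; the gap is exactly the bias that MU assumes out of existence.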

While this is very clear in principle, in the practical world this is not what we see. We do not see eliminated bias. We do not always see bias as insignificant and therefore ignorable. We do not see that ISO 15189 labs only calculate measurement uncertainty. We do not see that ISO 15189 laboratories shun the use of Total Error. Indeed, if you look at most of the EQA programs around the world, you see that they express acceptance limits that include both bias and imprecision. Inherently, these EQA performance goals are expressed as combinations of random and systematic error. And when I visit labs around the world, the thousands of laboratorians I speak with almost invariably complain about having to calculate measurement uncertainty. They don't understand it - and their clinicians don't either - and they usually just hide those calculations until the ISO inspector demands to see them.

Indeed, it is MU's intolerance of bias that makes Total Error more useful than MU. The real world is full of bias, biases that cannot be constantly corrected, calibrated out of existence, or ignored. But let's be clear: the fact that Total Error accounts for bias should not be mistaken for complicity or a tacit condoning of the existence of bias. It is simply an attempt to deal with the reality of the laboratory.

Another major obstacle to the practical use of measurement uncertainty is that, while total error has generated performance specifications (allowable total errors), for the most part measurement uncertainty has not. There are papers about the concept of Target Measurement Uncertainty, but there is no widespread resource that provides actual performance specifications. In fact, when Target Measurement Uncertainty is mentioned, the reader is usually directed straight to the Total Allowable Error goals, typically the Ricos goals (desirable specifications based on within-subject biologic variation). Why advocate MU but use TE goals? It raises the question: if the Measurement Uncertainty approach is the only true way to express test results, why can't this approach develop its own set of performance specification goals? What's holding it back? The metrologists have had decades to develop target measurement uncertainty specifications. No one is preventing them from doing so. Indeed, it would seem natural to publicize these specifications far and wide. And yet...

Speaking of Ricos, careful visitors to this website will note the absence of a 2016 update to the "Ricos Goals" database. For more than a decade, Westgard Web has been privileged and delighted to post updates to the biologic variation database, developed and maintained by Dr. Carmen Ricos and her group at the SEQC. But these updates will happen no more. This is probably the most tangible outcome of Milan: there will be no 2016 update to the database of biological variation performance specifications. This task has been taken over by the EFLM task group, which will scrub the database of old and unreliable papers, develop new standards for inclusion, and restate the database in some future but unspecified year. The EFLM website will most likely become the new home of the Ricos database.

In the meantime, it may be useful to think about where our goals are right now. What do we do while we wait for better goals? We are using the goals we've got. And that means, by and large, using "state of the art" performance specifications. When you look at EQA programs around the world, you see a lot of round numbers.

In the revised Milan Hierarchy, "state of the art" is the third type of performance specification, behind outcome-based and biologically-derived performance specifications. There is a growing realization coming out of Milan, however, that while "state of the art" specifications were the lowest form of goal in the Stockholm Hierarchy, in the new view they are not quite so bad. Indeed, for most analytes they are the only game in town and the only realistic set of expectations.

 Here, for example, are the goals for sodium:

- CLIA (US): +/- 4 mmol/L
- SEKK Dmax (Czech): 5%
- Ricos Desirable TEa: 0.73%
- RCPA (Australasia): +/- 3.0 mmol/L below 150 mmol/L; 2% above 150 mmol/L
- ProBioQual (France): 2.5%
- Rilibak (Germany): 5%
- Spanish 2016 minimum consensus: 5%
- Belgian EQA: 2%

Sodium is notorious for being hard to control, due to its tight biologic variation. No method on the market can hit the "Ricos" goal for allowable total error. The most common specification is actually 5%, not less than 1%. This reflects a more pragmatic approach. Even the CLIA goal, which is stated in units but works out to between 2 and 3% across the reference range, is tighter than most methods can muster. And we should note, the CLIA goal is one of the tighter goals in this bunch. It's Rilibak, the Spanish consensus, and SEKK that set the loosest specifications. The 2 and 2.5% limits from RCPA, ProBioQual, and the Belgian EQA programs are much more demanding.
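Converting CLIA's fixed-unit sodium goal into a percentage is simple arithmetic. A quick sketch, assuming a typical sodium reference interval of roughly 135-145 mmol/L (the interval is an illustrative assumption, not taken from CLIA):

```python
# CLIA sodium goal: +/- 4 mmol/L, expressed as a percentage of the
# measured concentration at a few points across the reference interval.
CLIA_GOAL_MMOL = 4.0

for conc in (135, 140, 145):
    pct = CLIA_GOAL_MMOL / conc * 100
    print(f"at {conc} mmol/L: {pct:.1f}%")
```

At every point in that interval the fixed 4 mmol/L limit works out to just under 3%, which is why a unit-based goal can be tighter than the percentage goals of most other EQA programs.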

If we look at an analyte that isn't as tightly controlled by the body as sodium, we see that even the "state of the art" responds to biological variation.

- CLIA (US): +/- 6 mg/dL or 10%, whichever is greater
- SEKK Dmax (Czech): 9%
- Ricos Desirable TEa: 6.96%
- RCPA (Australasia): +/- 0.4 mmol/L below 5 mmol/L; 8% above 5 mmol/L
- ProBioQual (France): 5.5%
- Rilibak (Germany): 15%
- Spanish 2016 minimum consensus: 11%

There's more of a range of goals here, but the mid-point is still close to the CLIA goal of 10%. Notice that the ProBioQual goal is actually tighter than the Ricos goal. Demanding more out of a method than is biologically possible is a bit of a paradox, like asking a camel to fit through the eye of a needle.

When we look at HbA1c, a heavily standardized and traceable measurand with certified equivalent methods, we can see that there are major differences between the recommendations:

- NGSP (US, 2015): 6%
- SEKK Dmax (Czech): 18%
- Ricos Desirable TEa: 3.0%
- RCPA (Australasia): +/- 0.5% below 10%; 5% above 10%
- Rilibak (Germany): 18%
- Spanish 2016 minimum consensus: 11%

The outcome-based goal, the goal used by NGSP with its accuracy-based, most traceable and metrologically correct EQA program, is still twice as large as the Ricos goal. Meanwhile, the German and SEKK goals, at 18%, are six times as large. This is one of the largest "spreads" in quality goals. But the progressive narrowing of quality goals by CAP and NGSP has had a major impact on the industry. It's a case where setting performance specifications aggressively has helped to move the market.

Finally, let's look at cholesterol, one of the perennial favorite assays on this website:

- CLIA (US): 10%
- SEKK Dmax (Czech): 8.5%
- Ricos Desirable TEa: 9.01%
- RCPA (Australasia): +/- 0.3 mmol/L below 100 mmol; 6% above 100 mmol
- ProBioQual (France): 6%
- Rilibak (Germany): 13%
- Spanish 2016 minimum consensus: 11%
- Belgian EQA: 9%

Here's yet another case where the CLIA goal is very close to the median recommendation of around 9%. SEKK, CLIA, the Spanish Consensus, and the Belgian EQA are all very close. Rilibak is only slightly higher, while RCPA and ProBioQual are again in an odd place: their specifications are smaller than the biologically desirable specification.

Where do we go with these goals?

While the debate on specifications has been great at generating articles, conference talks, and lots of publications, what's going on in real laboratories all over the world? In the US, measurement uncertainty is wholly ignored. Some laboratorians here have never even heard of it (I know, this elicits either shudders of envy or disgust). The IQCP regulations are in fact driving the US in the opposite direction, away from quantifiable expressions of measurement error and toward more qualitative "risk-based" judgments. We hope that the rest of the globe will not follow the US lead on IQCP. At the same time, ISO labs are calculating measurement uncertainty while also using Total Allowable Error in their routine operations. How is it that labs can do both? Is it cognitive dissonance? Do they not realize they're supposed to be taking sides? Declaring loyalties? Exterminating one metric in favor of the other?

No. What's happening is that most labs are too busy to care about the debate. They need to get their results out the door. They need to pass their next inspection (whether it's CAP or CMS or ISO). They need to endure the next set of budget cuts, consolidations, and compromises. If this means they calculate measurement uncertainty but don't actually report it to clinicians, that's what they do. If this means they use Total Allowable Error as a way to assess their methods, monitor their EQA, design their QC, and determine Sigma-metrics, they'll do that, too. They may do both at the same time. It's folly to think we can force a laboratory to embrace one metric by extinguishing the other. Outlawing Total Error won't make the concept disappear.
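One of those routine uses of Total Allowable Error, the Sigma-metric, shows why labs keep TEa around: it folds the quality goal, the observed bias, and the observed imprecision into a single figure of merit. A brief sketch using the standard Six Sigma QC formula, Sigma = (TEa - |bias|) / CV, with all terms in percent; the cholesterol method figures are hypothetical:

```python
def sigma_metric(tea_pct, bias_pct, cv_pct):
    """Sigma-metric of a method against an allowable total error goal.

    All inputs are percentages at the medical decision level.
    Higher Sigma means more room between observed performance and
    the quality goal; ~6 is world class, below 3 is poor.
    """
    return (tea_pct - abs(bias_pct)) / cv_pct

# Hypothetical cholesterol method judged against the 10% CLIA goal:
# 2% bias and 2% CV leave (10 - 2) / 2 = 4 Sigma of margin.
print(sigma_metric(10.0, 2.0, 2.0))  # -> 4.0
```

Note that the calculation is impossible without both a TEa goal and an explicit bias term, which is precisely what the pure MU framework declines to provide.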

There is an old saying: "When elephants battle, the ants perish." One of the least productive outcomes of this TE-MU debate is that we argue about models without moving the field forward. Having another round of debate is a disservice to the laboratory community. Do we really need more papers, more conferences, more task forces? Why not let the market decide? Let labs vote with their feet. Why not acknowledge co-existence in the meantime? Recognize the validity of measurement uncertainty in the manufacturing phase and in the laboratory's selection of traceable examination procedures. Recognize the utility of Total Analytic Error in accounting for the imprecision and bias observed in the laboratory. If methods improve, and we manage to find better and better ways to eliminate bias, Total Error will fade away and Measurement Uncertainty will gain its supremacy. If bias continues to exist and thwarts our attempts to correct for or eliminate it, then laboratories must have a way to handle the bias they observe.

If we do nothing but continue the debate, we simply perpetuate an argument about the metrological meaninglessness of Total Error and the practical uselessness of Measurement Uncertainty, and we generate another sheaf of meaningless, useless publications for the scientific literature. In time, laboratories will make their own choices, sift and winnow the field of metrics. We will see what they value in their actions. And our words, words, words, will fade to nothing. Our TE vs MU will turn to MUTE.