
Analytical Bias Exceeds Desirable Quality Goals in 4 of 5 Common Immunoassays

Bias exists! In a recent Clinical Chemistry study, the authors found unacceptably large biases in 4 out of 5 common immunoassays: cobalamin, folate, ferritin, TSH, and Free T4. Does a Sigma-metric analysis agree with this judgment?

Sigma-metrics of Five Common Immunoassays: Cobalamin, Folate, Ferritin, Thyroid-Stimulating Hormone, and Free T4

October 2016
Sten Westgard, MS

[Note: This QC application is an extension of the lesson From Method Validation to Six Sigma: Translating Method Performance Claims into Sigma Metrics. This article assumes that you have read that lesson first, and that you are also familiar with the concepts of QC Design, Method Validation, and Six Sigma. If you aren't, follow the link provided.]

In much of the current debate on measurement uncertainty (MU) vs. total error (TE), bias is a contentious issue. For the metrologically dogmatic, bias should not exist: as soon as it is detected, it should be eliminated, or, if it can't be, ignored. Real-world labs often find neither alternative practical. If bias didn't exist, measurement uncertainty would be a great approach. But if we're going to embrace uncertainty and jettison all other approaches, we had better assess our assays to determine whether they have significant biases. If it turns out that a lot of bias actually exists, the problem isn't about choosing MU or TE; it's a problem that needs to be addressed further up the chain, by the assay manufacturer.

This study takes a look at common immunoassays, which are often left out of the MU vs TE debate because they are already more complicated in design:

Analytical Bias Exceeding Desirable Quality Goal in 4 out of 5 Common Immunoassays: Results of a Native Single Serum Sample External Quality Assessment Program for Cobalamin, Folate, Ferritin, Thyroid-Stimulating Hormone, and Free T4 Analyses. Gunn B.B. Kristensen, Pål Rustad, Jens P. Berg, Kristin M. Aakre. Clinical Chemistry (2016) 62:9; 1255-1263.

This study sent 20 fresh native single serum samples, a native serum pool, a reference serum pool, and 2 EQA materials to 38 laboratories for measurement of these five common immunoassays. The Abbott ARCHITECT, Beckman Coulter UniCel, Roche Cobas, Roche Modular, and Siemens ADVIA Centaur platforms were included.

The Imprecision and Bias Data

"To reveal typical method repeatability and robustness of calibration/measurement procedures, we calculated the pooled within-method analytical coefficient of variation (CVA) for the different components." These CV values were presented graphically in Figure 2 of the paper, and we have read the values from that graph for this analysis.

Bias, as always, is trickier to determine: "A reference method was available only for the free T4 assay.... Reference-method values were established for 5 selected native serum samples and for the serum X material at the Laboratory for Analytical Chemistry, Faculty of Pharmaceutical Sciences, Ghent University (Belgium).... For the other components, the target value was defined at the total mean (Mt) value determined from 3 single mean values for the methods groups Roche, Abbott and Siemens." The bias from the target value (either Mt or the concentration measured by the reference method) was determined for Serum X and listed in Table 1 of the paper.
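The two definitions above (a pooled within-method CV, and bias against a target Mt) can be sketched in a few lines. The paper's exact pooling formula is not quoted here, so the pooled CV below assumes the common approach of weighting each laboratory's squared CV by its degrees of freedom. Mt follows the quoted definition (the unweighted mean of the Roche, Abbott, and Siemens group means), and bias is expressed as a signed percentage of the target. The function names and all input numbers are hypothetical, for illustration only.

```python
import math


def pooled_cv(cvs_pct, n_replicates):
    """Pool CVs (%) as the sqrt of the degrees-of-freedom-weighted mean of CV^2.

    Assumption: lab i's CV was estimated from n_i replicates (n_i - 1 df).
    """
    if not cvs_pct or len(cvs_pct) != len(n_replicates):
        raise ValueError("need one replicate count per CV")
    dfs = [n - 1 for n in n_replicates]
    weighted = sum(df * cv ** 2 for df, cv in zip(dfs, cvs_pct))
    return math.sqrt(weighted / sum(dfs))


def total_mean(group_means):
    """Mt: unweighted mean of the method-group means (e.g. Roche, Abbott, Siemens)."""
    return sum(group_means) / len(group_means)


def percent_bias(method_mean, target):
    """Signed deviation of a method mean from the target (Mt or reference value), in %."""
    return 100.0 * (method_mean - target) / target


# Hypothetical numbers, not study data.
print(round(pooled_cv([3.0, 4.0, 5.0], [21, 21, 21]), 2))  # 4.08
mt = total_mean([10.0, 11.0, 12.0])  # 11.0
print(round(percent_bias(12.0, mt), 2))  # 9.09
```

Pooling variances rather than averaging CVs directly keeps a single very imprecise laboratory from being understated.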

The CV and bias were compared against the desirable goals listed in the 2014 Ricos biological variation database.

Any value below that exceeds those recommendations is marked with an asterisk (*).

Assay | Serum X level | Ricos desirable allowable bias % | Abbott ARCHITECT | Beckman Coulter UniCel | Roche Cobas | Roche Modular | Siemens ADVIA Centaur
Cobalamin, pmol/L | 329 | 17.7% | 1.82% | 32.83%* | 4.56% | 4.56% | 6.38%
Folate, nmol/L | 14.0 | 19.2% | 10.0% | 10.71% | 15.0% | 4.29% | 0.71%
Ferritin, µg/L | 62.4 | 5.2% | 2.4% | 22.12%* | 15.06%* | 17.79%* | 13.94%*
TSH, mU/L | 1.69 | 7.8% | 8.88%* | 0.59% | 9.47%* | 11.83%* | 1.78%
Free T4, pmol/L | 14.3 | 3.3% | 8.39%* | 21.68%* | 5.59%* | 4.9%* | 2.8%
Free T4, pmol/L (reference method) | 19.7 | 3.3% | 32.99%* | 42.64%* | 22.84%* | 23.35%* | 24.87%*

As you can see, bias is a big issue for most of these assays and most of these manufacturers. In 12 out of 25 cases (not counting the Free T4 reference-method comparison), the methods exceed the desirable bias goal.

We can agree with the orthodox metrologists that this bias is an indication that manufacturers must do better to build traceable, standardized methods for these assays. However, the paper points out that many of the manufacturers merely conform to an "in-house industry standard", which sounds a lot to me like the manufacturer is purposely NOT standardizing to an international reference method or material. As it happens, the manufacturer here with the most internationally traceable methods is also the manufacturer with the fewest intolerable biases. But regardless of manufacturer, everyone can agree that the methods need to be built with less bias and more traceability.

Where the front-line laboratory and the orthodox metrologist may part ways is in what to do while those methods are being developed. Realistically, a new standardized method may take months, if not years, to be developed and launched. It may require a new platform, in which case it may be half a decade or longer before some of these manufacturers bring their methods in line.

In the meantime, what should labs do? "Eliminate" the bias by recalibrating? Ignore it? Take a long-term estimate (months and months) of the bias, "convert" it into a random error, and combine it with the other imprecisions to generate a measurement uncertainty? Re-baseline all patients to a new reference range and reset all the therapeutic cutoffs to account for the biases? Run more controls?
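To see why the "convert bias into random error" option matters, compare two ways of summarizing the same bias and CV. The sketch below uses one common total-error convention, TE = |bias| + 1.65·CV, and one common way of folding bias into uncertainty, U = k·√(CV² + bias²) with coverage factor k = 2. These formulas, the function names, and the default z and k values are illustrative assumptions, not taken from the paper, and other conventions exist. The inputs are the Abbott ARCHITECT TSH figures from this article's bias and CV tables.

```python
import math


def total_error(bias_pct, cv_pct, z=1.65):
    """Total-error estimate: |bias| plus z times the CV (z = 1.65 is one common choice)."""
    return abs(bias_pct) + z * cv_pct


def expanded_uncertainty(bias_pct, cv_pct, k=2.0):
    """Treat bias as an extra random component: k * sqrt(CV^2 + bias^2)."""
    return k * math.sqrt(cv_pct ** 2 + bias_pct ** 2)


# Abbott ARCHITECT TSH at the Serum X level: bias 8.88 %, CV 3.75 %.
bias, cv = 8.88, 3.75
print(f"TE = {total_error(bias, cv):.1f}%")
print(f"U  = {expanded_uncertainty(bias, cv):.1f}%")
```

Neither formula removes the bias; the second one simply folds it into a larger uncertainty.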

We'll try to make some sense of this in a moment. But first, let's look at the imprecision. Please note what we mentioned before: the CV values are estimated from inspection of the bar graph (Figure 2) in the paper. The exact imprecision figures are not known; they were not provided in the paper or the supplementary materials.

(* marks a CV that exceeds the Ricos desirable goal.)

Assay | Serum X level | Ricos desirable allowable CV % | Abbott ARCHITECT | Beckman Coulter UniCel | Roche Cobas | Roche Modular | Siemens ADVIA Centaur
Cobalamin, pmol/L | 329 | 7.5% | 10.5%* | 32.83%* | 3.5% | 4.1% | 10.0%*
Folate, nmol/L | 14.0 | 12% | 9.75% | 3.5% | 6.5% | 6.0% | 12.50%*
Ferritin, µg/L | 62.4 | 7.1% | 4.0% | 5.75% | 4.75% | 6.25% | 9.0%*
TSH, mU/L | 1.69 | 9.7% | 3.75% | 4.25% | 3.0% | 5.0% | 5.5%
Free T4, pmol/L | 14.3 | 3.6% | 8.39%* | 21.68%* | 3.5% | 4.9%* | 2.8%

In 8 out of 25 cases, the CV exceeds the desirable goal.
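The exceedance counts for bias and CV can be checked with a direct tally of the values in the two tables. This is a minimal sketch: the values are transcribed from the tables (the CVs themselves were read from a bar graph), and the names `BIAS`, `CV`, and `exceedances` are hypothetical. As in the counts in the text, the Free T4 reference-method row is left out.

```python
# Values transcribed from the bias and CV tables above (Serum X level).
PLATFORMS = ["Abbott ARCHITECT", "Beckman Coulter UniCel", "Roche Cobas",
             "Roche Modular", "Siemens ADVIA Centaur"]

# analyte: (desirable goal %, [one value per platform, same order as PLATFORMS])
BIAS = {
    "Cobalamin": (17.7, [1.82, 32.83, 4.56, 4.56, 6.38]),
    "Folate": (19.2, [10.0, 10.71, 15.0, 4.29, 0.71]),
    "Ferritin": (5.2, [2.4, 22.12, 15.06, 17.79, 13.94]),
    "TSH": (7.8, [8.88, 0.59, 9.47, 11.83, 1.78]),
    "Free T4": (3.3, [8.39, 21.68, 5.59, 4.9, 2.8]),  # bias vs Mt, not the reference method
}
CV = {
    "Cobalamin": (7.5, [10.5, 32.83, 3.5, 4.1, 10.0]),
    "Folate": (12.0, [9.75, 3.5, 6.5, 6.0, 12.50]),
    "Ferritin": (7.1, [4.0, 5.75, 4.75, 6.25, 9.0]),
    "TSH": (9.7, [3.75, 4.25, 3.0, 5.0, 5.5]),
    "Free T4": (3.6, [8.39, 21.68, 3.5, 4.9, 2.8]),
}


def exceedances(table):
    """Return (analyte, platform, value, goal) for every value above its goal."""
    return [
        (analyte, platform, value, goal)
        for analyte, (goal, values) in table.items()
        for platform, value in zip(PLATFORMS, values)
        if abs(value) > goal
    ]


if __name__ == "__main__":
    for label, table in (("bias", BIAS), ("CV", CV)):
        hits = exceedances(table)
        print(f"{label}: {len(hits)} of 25 values exceed the desirable goal")
```

Run as-is on the values exactly as printed, it reports 12 bias exceedances and 8 CV exceedances.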

Calculate Sigma-metrics

The Sigma-metric approach takes both imprecision and bias into account in a single equation. We're going to calculate Sigma-metrics using the Ricos Desirable Goals for TEa.

Remember, the equation for the Sigma-metric is (TEa% - |bias%|) / CV%.

Example calculation: for the Abbott ARCHITECT cobalamin (vitamin B12) assay, with a 30.0% allowable total error, 10.5% imprecision, and 1.82% bias:

(30.0 - 1.82) / 10.5 = 28.18 / 10.5 = 2.68 Sigma

So while the method has excellent (very small) bias, its imprecision is poor, and the net effect is that this method falls below three Sigma.
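The worked example translates directly into a small function. This is a sketch; the name `sigma_metric` is hypothetical, and taking the absolute value of bias is an assumption (the biases in this article are all listed as positive percentages, so it does not change any number here). A bias larger than TEa yields a negative Sigma, which simply reads as "below three Sigma".

```python
def sigma_metric(tea_pct, bias_pct, cv_pct):
    """Sigma-metric = (TEa% - |bias%|) / CV%, all percentages at the same level."""
    if cv_pct <= 0:
        raise ValueError("CV must be positive")
    return (tea_pct - abs(bias_pct)) / cv_pct


# Worked example from the text: Abbott ARCHITECT cobalamin (vitamin B12).
print(f"{sigma_metric(30.0, 1.82, 10.5):.2f}")  # 2.68
```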

So here's the table with all the Sigma-metrics, again according to the Ricos goal:

Assay | Serum X level | Ricos desirable allowable TEa % | Abbott ARCHITECT | Beckman Coulter UniCel | Roche Cobas | Roche Modular | Siemens ADVIA Centaur
Cobalamin, pmol/L | 329 | 30.0% | <3 | <3 | >6 | >6 | <3
Folate, nmol/L | 14.0 | 39.0% | 3.0 | >6 | 3.7 | 5.8 | 3.1
Ferritin, µg/L | 62.4 | 16.9% | 3.6 | <3 | <3 | <3 | <3
TSH, mU/L | 1.69 | 23.7% | 4.0 | 5.4 | 4.7 | <3 | 4.0
Free T4, pmol/L | 14.3 | 8.0% | <3 | <3 | <3 | <3 | <3
Free T4, pmol/L (reference method) | 19.7 | 8.0% | <3 | <3 | <3 | <3 | <3

In 13 out of 25 cases, the methods perform below three Sigma. In other words, the Sigma-metrics judge these methods more harshly than either the CV or the bias does in isolation. And regardless of whether we use the Free T4 reference method or the Mt as the target, no manufacturer has an acceptable Free T4 method. There's a real need to improve the performance of these methods: standardization is important, but better precision is also needed.
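The Sigma table can be regenerated from the bias and CV tables. One detail is worth noting: the table appears to round each Sigma to one decimal place before comparing it with 3 and 6. For example, Abbott folate is (39.0 - 10.0) / 9.75 ≈ 2.97, which the table shows as 3.0 rather than <3, so the sketch below rounds first. That rounding rule is inferred from the table, not stated in the article, and the function names are hypothetical. The Free T4 reference-method row is omitted so the count covers the same 25 cases as the text.

```python
PLATFORMS = ["Abbott ARCHITECT", "Beckman Coulter UniCel", "Roche Cobas",
             "Roche Modular", "Siemens ADVIA Centaur"]

# analyte: (TEa %, bias % per platform, CV % per platform), from the tables above
DATA = {
    "Cobalamin": (30.0, [1.82, 32.83, 4.56, 4.56, 6.38], [10.5, 32.83, 3.5, 4.1, 10.0]),
    "Folate": (39.0, [10.0, 10.71, 15.0, 4.29, 0.71], [9.75, 3.5, 6.5, 6.0, 12.50]),
    "Ferritin": (16.9, [2.4, 22.12, 15.06, 17.79, 13.94], [4.0, 5.75, 4.75, 6.25, 9.0]),
    "TSH": (23.7, [8.88, 0.59, 9.47, 11.83, 1.78], [3.75, 4.25, 3.0, 5.0, 5.5]),
    "Free T4": (8.0, [8.39, 21.68, 5.59, 4.9, 2.8], [8.39, 21.68, 3.5, 4.9, 2.8]),
}


def sigma_metric(tea_pct, bias_pct, cv_pct):
    """Sigma-metric = (TEa% - |bias%|) / CV%."""
    return (tea_pct - abs(bias_pct)) / cv_pct


def sigma_label(sigma):
    """Round to one decimal, then report '<3', '>6', or the rounded value."""
    rounded = round(sigma, 1)
    if rounded < 3:
        return "<3"
    if rounded > 6:
        return ">6"
    return f"{rounded:.1f}"


def sigma_table(data):
    """Map each analyte to a list of Sigma labels, one per platform."""
    return {
        analyte: [sigma_label(sigma_metric(tea, b, cv)) for b, cv in zip(biases, cvs)]
        for analyte, (tea, biases, cvs) in data.items()
    }


if __name__ == "__main__":
    table = sigma_table(DATA)
    for analyte, labels in table.items():
        print(f"{analyte:<10}", " ".join(f"{x:>4}" for x in labels))
    below_three = sum(lab == "<3" for labels in table.values() for lab in labels)
    print(f"{below_three} of 25 methods below three Sigma")
```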

Summary of Performance by Sigma-metrics Normalized Method Decision Chart

We can make visual assessments of this performance using a Normalized Sigma-metric Method Decision Chart:

[Figure: 2016 immunoassay comparison, Normalized Method Decision Chart for Cobalamin (vitamin B12)]

Roche appears to have the best VB12 methods.

[Figure: 2016 immunoassay comparison, Normalized Method Decision Chart for Ferritin]

Abbott has the best Ferritin method.

[Figure: 2016 immunoassay comparison, Normalized Method Decision Chart for Folate]

While Beckman has a lot of issues with the other analytes, in this case, it has the best Folate method.

[Figure: 2016 immunoassay comparison, Normalized Method Decision Chart for Free T4]

This chart shows the performance when the 8% Ricos goal is applied and the bias is calculated not against the reference method, but against the Mt.

[Figure: 2016 immunoassay comparison, Normalized Method Decision Chart for TSH]

TSH is an assay where manufacturers seem to perform the best. No one is right in the bull's-eye, but there is a lot of good and excellent performance.
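The normalized charts put analytes with different TEa goals on one common set of axes. This sketch assumes the usual normalization, in which observed imprecision is plotted as CV%/TEa% × 100 (x-axis) and observed inaccuracy as |bias%|/TEa% × 100 (y-axis). On those axes each Sigma boundary is the straight line y + σ·x = 100, so the Sigma read from a point's position equals the Sigma calculated directly. The function names are hypothetical; the TSH inputs come from the tables above.

```python
def normalized_point(tea_pct, bias_pct, cv_pct):
    """Return (x, y) = (CV/TEa * 100, |bias|/TEa * 100) for a normalized chart."""
    return 100.0 * cv_pct / tea_pct, 100.0 * abs(bias_pct) / tea_pct


def sigma_from_point(x, y):
    """Sigma implied by a normalized point: the line y + sigma * x = 100 passes through it."""
    return (100.0 - y) / x


# TSH (TEa 23.7 %): (bias %, CV %) per platform, from the tables above.
TSH = {
    "Abbott ARCHITECT": (8.88, 3.75),
    "Beckman Coulter UniCel": (0.59, 4.25),
    "Roche Cobas": (9.47, 3.0),
    "Roche Modular": (11.83, 5.0),
    "Siemens ADVIA Centaur": (1.78, 5.5),
}

for platform, (bias, cv) in TSH.items():
    x, y = normalized_point(23.7, bias, cv)
    print(f"{platform:<24} x={x:5.1f} y={y:5.1f} sigma={sigma_from_point(x, y):.1f}")
```

Normalizing by TEa is what allows the charts above to share the same Sigma lines across analytes.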

Conclusion

The authors conclude "[F]or commonly used immunoassays, large method differences still persist. Reference intervals used by local laboratories are not always adjusted in accordance with existing method differences, and commercial EQA samples showed noncommutability on several occasions and, therefore, were useless as a tool for harmonization. The elimination of method differences would be a major contribution to future patient care, and the increased focus lately on commutability, as well as the IFCC global campaign on reference values are important steps in the right direction. However, this fundamental task for patient care may only be solved by laboratory professionals, healthcare authorities, manufacturers, and EQA providers jointly demanding commutability at all steps of the traceability chain. The clinical chemistry communities should keep up and increase their focus on this issue until clear improvement is demonstrated."

Not only do we need focus, we need pressure. We need to push manufacturers to improve, and penalize them when they are not living up to the needs of our patients.