Tools, Technologies and Training for Healthcare Laboratories

2019 Comparison of TT4 methods

A recent study compared the automated TT4 immunoassays on the major diagnostic instruments against an ID-LC-MS/MS reference method. Compared to these "true values", can any method hit the most evidence-based performance specifications, those derived from biological variation? For that matter, can any method hit the minimum performance specifications?

Sigma-metric Analysis of Six Automated TT4 Immunoassay methods compared to Isotope-Diluted Liquid Chromatography-Tandem Mass Spectrometry

Sten Westgard, MS
May 2019

[6/14/18 NOTE: The original posting listed an incorrect constant bias for the Siemens method. The constant had been entered as 12.8, when in fact it was -12.8. All data tables and graphics have been updated with the correction. We regret the error.]

[Note: This QC application is an extension of the lesson From Method Validation to Six Sigma: Translating Method Performance Claims into Sigma Metrics. This article assumes that you have read that lesson first, and that you are also familiar with the concepts of QC Design, Method Validation, and Six Sigma metrics. If you aren't, review those topics before continuing.]

In February of 2019, a new study was published evaluating six automated TT4 methods:

Comparison of Six Automated Immunoassays With Isotope-Diluted Liquid Chromatography-Tandem Mass Spectrometry for Total Thyroxine Measurement. Song Y, Zhou W, Cheng X, Meng Q, Li H, Hou L, Lu J, Xie S, Cheng Q, Zhang C, Qiu L. Ann Lab Med 2019;39:381-387.

The Imprecision, Bias and Sigma-metric Data

"Certified reference materials for TT4 (CRM21201 and cRM20202) were provided by Professor Lothar Siekmann of the German Society of Clinical Chemistry and Laboratory Medicine (DGKL).... The CRMs were measured in three runs, and with triplicate measurements in each run."

"[T]hree serum pools from Bio-Rad (Hercules, CA, USA; lot 40300, levels 40301, 40302, and 40303) that were used as quality control materials were prepared for assessing immunoassay imprecision prior to comparisons. Following the CLSI EP15-A... on five consecutive days, one freshly thawed aliquot of each pool was measured four times by all immunoassays."

The TEa goals used are the Spanish minimum goal (24%) and the Ricos 2014 biological variation goal (7%) for TT4.

Instrument and level   Spanish min. TEa (%)   Ricos 2014 TEa (%)   Level    Bias (%)   CV (%)
TT4 Abbott Q1          24                     7                    56.40    19.6       2.1
TT4 Abbott Q2          24                     7                    144.00   11.6       2.6
TT4 Abbott Q3          24                     7                    196.00   10.2       2.9
TT4 Autobio Q1         24                     7                    67.30    11.3       8.2
TT4 Autobio Q2         24                     7                    133.00   5.7        4.7
TT4 Autobio Q3         24                     7                    156.00   4.9        3.2
TT4 Beckman Q1         24                     7                    48.60    7.9        5.9
TT4 Beckman Q2         24                     7                    129.00   6.1        5.1
TT4 Beckman Q3         24                     7                    184.00   5.8        4.5
TT4 Mindray Q1         24                     7                    58.40    3.2        4.2
TT4 Mindray Q2         24                     7                    138.00   7.8        3.3
TT4 Mindray Q3         24                     7                    171.00   8.4        3.1
TT4 Roche Q1           24                     7                    60.50    3.1        2.0
TT4 Roche Q2           24                     7                    137.00   0.2        1.9
TT4 Roche Q3           24                     7                    165.00   0.2        1.7
TT4 Siemens Q1         24                     7                    61.30    2.1        5.3
TT4 Siemens Q2         24                     7                    158.00   14.9       2.6
TT4 Siemens Q3         24                     7                    187.00   16.2       3.6

Yes, that is a whole lot of numbers!

But what do all these numbers mean? Without context, it's hard to know.

So let's calculate the Sigma-metrics.

Sigma-metric calculations for the six TT4 methods

Remember that the Sigma-metric equation is (TEa - bias) / CV, with TEa, bias and CV all expressed in percent:

For a 24% quality requirement, the calculation for TT4 on Abbott Q1 is (24 - 19.6) / 2.1 = 2.1.
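The same arithmetic can be written as a small Python function. This is a minimal sketch, assuming bias and CV are both in percent at the same concentration level, and using the absolute value of bias (every bias in this study is positive, so that choice changes no result here). The function name `sigma_metric` is hypothetical.

```python
def sigma_metric(tea_pct: float, bias_pct: float, cv_pct: float) -> float:
    """Sigma-metric = (TEa - |bias|) / CV, with all inputs in percent."""
    if cv_pct <= 0:
        raise ValueError("CV must be positive")
    return (tea_pct - abs(bias_pct)) / cv_pct


# Abbott Q1: bias 19.6%, CV 2.1%
print(round(sigma_metric(24, 19.6, 2.1), 1))  # Spanish minimum goal
print(round(sigma_metric(7, 19.6, 2.1), 1))   # Ricos 2014 goal: bias exceeds TEa
```

Against the 7% Ricos goal the result is below zero, which is what the table reports simply as "negative".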

The metrics appear in the two right-hand columns of the table below.

Instrument and level   Spanish min. TEa (%)   Ricos 2014 TEa (%)   Level    Bias (%)   CV (%)   Ricos 2014 Sigma   Spanish min. Sigma
TT4 Abbott Q1          24                     7                    56.40    19.6       2.1      negative           2.1
TT4 Abbott Q2          24                     7                    144.00   11.6       2.6      negative           4.8
TT4 Abbott Q3          24                     7                    196.00   10.2       2.9      negative           4.8
TT4 Autobio Q1         24                     7                    67.30    11.3       8.2      negative           1.6
TT4 Autobio Q2         24                     7                    133.00   5.7        4.7      0.3                3.9
TT4 Autobio Q3         24                     7                    156.00   4.9        3.2      0.67               5.98
TT4 Beckman Q1         24                     7                    48.60    7.9        5.9      negative           2.7
TT4 Beckman Q2         24                     7                    129.00   6.1        5.1      0.2                3.5
TT4 Beckman Q3         24                     7                    184.00   5.8        4.5      0.3                4.1
TT4 Mindray Q1         24                     7                    58.40    3.2        4.2      0.90               4.95
TT4 Mindray Q2         24                     7                    138.00   7.8        3.3      negative           4.9
TT4 Mindray Q3         24                     7                    171.00   8.4        3.1      negative           5.03
TT4 Roche Q1           24                     7                    60.50    3.1        2.0      1.9                >6
TT4 Roche Q2           24                     7                    137.00   0.2        1.9      3.6                >6
TT4 Roche Q3           24                     7                    165.00   0.2        1.7      4.0                >6
TT4 Siemens Q1         24                     7                    61.30    2.1        5.3      0.9                4.1
TT4 Siemens Q2         24                     7                    158.00   14.9       2.6      negative           3.5
TT4 Siemens Q3         24                     7                    187.00   16.2       3.6      negative           2.18

Yes, there are a lot of low Sigma-metrics. Wherever the table shows "negative", the bias exceeds the allowable total error, so the Sigma-metric falls below zero: a clinically significant bias is present. The numbers are particularly low for the Ricos 2014 goal, where most methods simply do not achieve acceptable Sigma-metrics.
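To reproduce both Sigma-metric columns from the bias and CV values, a short loop is enough. This sketch assumes the rounded bias and CV values shown in the table, so a few results differ slightly from the published ones (for example, Autobio Q1 computes to 1.5 rather than 1.6 against the Spanish minimum goal), presumably because the original calculation used unrounded data. The helper names `describe` and `sigma_table` are hypothetical; the "negative" and ">6" display convention mirrors the table.

```python
# Bias (%) and CV (%) per instrument and control level, transcribed from the table above
DATA = [
    ("Abbott Q1", 19.6, 2.1), ("Abbott Q2", 11.6, 2.6), ("Abbott Q3", 10.2, 2.9),
    ("Autobio Q1", 11.3, 8.2), ("Autobio Q2", 5.7, 4.7), ("Autobio Q3", 4.9, 3.2),
    ("Beckman Q1", 7.9, 5.9), ("Beckman Q2", 6.1, 5.1), ("Beckman Q3", 5.8, 4.5),
    ("Mindray Q1", 3.2, 4.2), ("Mindray Q2", 7.8, 3.3), ("Mindray Q3", 8.4, 3.1),
    ("Roche Q1", 3.1, 2.0), ("Roche Q2", 0.2, 1.9), ("Roche Q3", 0.2, 1.7),
    ("Siemens Q1", 2.1, 5.3), ("Siemens Q2", 14.9, 2.6), ("Siemens Q3", 16.2, 3.6),
]
GOALS = {"Ricos 2014": 7.0, "Spanish minimum": 24.0}  # TEa in percent


def describe(tea, bias, cv):
    """Format a Sigma-metric the way the table does."""
    sigma = (tea - abs(bias)) / cv
    if sigma < 0:
        return "negative"  # the bias alone exceeds TEa
    if sigma > 6:
        return ">6"
    return f"{sigma:.1f}"


def sigma_table(data, goals):
    """Return (name, sigma for each goal...) rows, in the order of the goals dict."""
    return [(name, *(describe(tea, bias, cv) for tea in goals.values()))
            for name, bias, cv in data]


for row in sigma_table(DATA, GOALS):
    print("{:<11} {:>9} {:>9}".format(*row))
```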

Summary of Performance by Sigma-metrics Method Decision Chart

We can make visual assessments of this performance using a Normalized Sigma-metric Method Decision Chart:

[Figure: Normalized Sigma-metric Method Decision Chart, 2019 TT4 methods, Ricos 2014 goal]

The picture is stark. Most methods miss the target completely, and some are even "off the chart" in their poor performance. (Whenever you see a dot tucked at the edge of the graph, it means we ran out of room on the chart.)
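Normalization is what lets methods judged against different goals share one chart: each operating point is expressed as a percentage of TEa, so the Sigma lines stay fixed. The sketch below assumes the usual normalized layout (CV/TEa on the x-axis for imprecision, |bias|/TEa on the y-axis for inaccuracy) and assumed axis limits of 50% and 100%; clipping to those limits is a hypothetical stand-in for how an off-chart dot ends up "tucked at the edge". The function names are illustrative only.

```python
def normalized_point(tea, bias, cv, x_max=50.0, y_max=100.0):
    """Normalized operating point for a Method Decision Chart.

    x = CV as a percentage of TEa (allowable imprecision axis)
    y = |bias| as a percentage of TEa (allowable inaccuracy axis)
    x_max and y_max are assumed axis limits; points beyond them are
    clipped to the edge and flagged as off the chart.
    """
    x = 100.0 * cv / tea
    y = 100.0 * abs(bias) / tea
    off_chart = x > x_max or y > y_max
    return min(x, x_max), min(y, y_max), off_chart


def sigma_from_point(x, y):
    """On the normalized chart, Sigma = (100 - y) / x (valid only for unclipped points)."""
    return (100.0 - y) / x


# Abbott Q1 (bias 19.6%, CV 2.1%) under each goal
for goal, tea in (("Ricos 2014", 7.0), ("Spanish minimum", 24.0)):
    x, y, off = normalized_point(tea, 19.6, 2.1)
    status = "off the chart" if off else f"Sigma {sigma_from_point(x, y):.1f}"
    print(f"{goal}: x={x:.1f}, y={y:.1f}, {status}")
```

Because y and x are both divided by the same TEa, (100 - y) / x works out to exactly (TEa - |bias|) / CV, the Sigma-metric computed earlier.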

[Figure: Normalized Sigma-metric Method Decision Chart, 2019 TT4 methods, Spanish minimum goal]

With the Spanish minimum goal, we see some world class methods as well as some unacceptable ones. So there is a differentiation of performance, a range from great to horrible. We usually take that as a sign that the performance specification is more practical: it's not a rubber stamp making every method look rosy, nor is it an impossible target that every method misses.
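The "differentiation of performance" argument can be checked by counting how the 18 instrument/level results spread across Sigma categories under each goal. This sketch assumes category cutoffs on the common Sigma scale (6 world class, 5 excellent, 4 good, 3 marginal, 2 poor, below 2 unacceptable); those cutoffs and the helper names are assumptions, not values taken from the study.

```python
from collections import Counter

# (bias %, CV %) for all 18 instrument/level results, from the table above
DATA = [
    (19.6, 2.1), (11.6, 2.6), (10.2, 2.9),   # Abbott Q1-Q3
    (11.3, 8.2), (5.7, 4.7), (4.9, 3.2),     # Autobio Q1-Q3
    (7.9, 5.9), (6.1, 5.1), (5.8, 4.5),      # Beckman Q1-Q3
    (3.2, 4.2), (7.8, 3.3), (8.4, 3.1),      # Mindray Q1-Q3
    (3.1, 2.0), (0.2, 1.9), (0.2, 1.7),      # Roche Q1-Q3
    (2.1, 5.3), (14.9, 2.6), (16.2, 3.6),    # Siemens Q1-Q3
]

# Assumed category cutoffs on the usual Sigma scale
CATEGORIES = [(6, "world class"), (5, "excellent"), (4, "good"),
              (3, "marginal"), (2, "poor")]


def category(sigma):
    """Map a Sigma-metric to a label; anything below 2 (including negative) is unacceptable."""
    for cutoff, label in CATEGORIES:
        if sigma >= cutoff:
            return label
    return "unacceptable"


def tally(tea):
    """Count how many of the 18 results fall into each category for a TEa (%)."""
    return Counter(category((tea - abs(bias)) / cv) for bias, cv in DATA)


for goal, tea in (("Ricos 2014", 7.0), ("Spanish minimum", 24.0)):
    print(goal, dict(tally(tea)))
```

Under the Ricos goal, 16 of the 18 results land in the unacceptable band, with only two Roche levels above it; under the Spanish minimum goal, the results spread across every band from world class to unacceptable.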

Summary of QC Design by Normalized OPSpecs chart

The benefit of the Sigma-metric approach is that labs can do more than assess their quality; they can act on it. Using OPSpecs charts, they can optimize the QC procedure for each test. In this case, they can use the data to try to mitigate the risk of poor performance.
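OPSpecs charts rest on the critical systematic error, ΔSEcrit = [(TEa - |bias|) / CV] - 1.65, which is simply the Sigma-metric minus 1.65 (the one-sided z-value that allows 5% of results to exceed TEa), combined with the power curves of candidate control rules. The sketch below is a simplified stand-in, not an OPSpecs chart: it computes ΔSEcrit and the probability that a single 1:3s rule detects that shift, assuming normally distributed, independent control results and no other rules. The helper names are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def critical_shift(tea, bias, cv):
    """Critical systematic error in SD units: Sigma-metric minus 1.65."""
    return (tea - abs(bias)) / cv - 1.65


def p_detect_13s(shift, n):
    """Chance that a 1:3s rule with n control results flags a shift of `shift` SDs.

    Assumes normally distributed, independent control results and no other rules.
    """
    p_one = norm_cdf(shift - 3.0) + norm_cdf(-3.0 - shift)
    return 1.0 - (1.0 - p_one) ** n


# Roche Q2 (bias 0.2%, CV 1.9%) under each goal
for goal, tea in (("Ricos 2014", 7.0), ("Spanish minimum", 24.0)):
    dse = critical_shift(tea, 0.2, 1.9)
    print(f"{goal}: critical shift {dse:.2f} SD, "
          f"P(detect) with 1:3s, N=2: {p_detect_13s(dse, 2):.2f}")
```

Even for Roche Q2, one of the best-behaved results, a 1:3s rule with two controls would catch the critical shift only about a quarter of the time under the Ricos goal, far below the 90% detection usually sought in QC design; under the Spanish minimum goal, detection is essentially certain.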

[Figure: Normalized OPSpecs chart, 2019 TT4 methods, Ricos 2014 goal]

For most of these points, we would need the maximum "Westgard Rules" with 8 control measurements per run. To be brutally honest, though, no practical statistical quality control procedure can help here. There aren't enough "Westgard Rules" to keep most of these methods in control: we can't afford to run enough controls, respond to enough rules, or run QC as frequently as it would need to be run. A lab could spend more on controls than the method itself is worth.

[Figure: Normalized OPSpecs chart, 2019 TT4 methods, Spanish minimum goal]

With the Spanish minimum goal, there's at least some hope. One method (Roche, with Sigma-metrics above 6 at all three levels) sits right in the bull's-eye, and its QC procedure can be redesigned so that no "Westgard Rules" are needed: just a 1:3s rule with 2 controls. Other methods will need the full "Westgard Rules" with double the number of controls (4) or even four times the number (8).
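The selection logic can be approximated by a lookup on the Sigma-metric. This is a simplified sketch modeled on a "Westgard Sigma Rules"-style summary, not a reading of the OPSpecs charts themselves, which account for bias and imprecision separately and may point to different choices at some levels. The cutoffs, the rule strings, the helper name `suggested_qc`, and the decision to flag results below 2 Sigma as beyond practical SQC are all assumptions.

```python
def suggested_qc(sigma):
    """Return (control rules, controls per run) for a Sigma-metric.

    Hypothetical helper. Cutoffs follow a Westgard Sigma Rules-style summary;
    flagging results below 2 Sigma as beyond practical SQC is an assumption.
    """
    if sigma >= 6:
        return "1:3s", 2
    if sigma >= 5:
        return "1:3s/2:2s/R:4s", 2
    if sigma >= 4:
        return "1:3s/2:2s/R:4s/4:1s", 4
    if sigma >= 2:
        return "1:3s/2:2s/R:4s/4:1s/8:x", 8
    return "no practical SQC; improve the method", None


# Spanish minimum goal (TEa 24%): Roche Q1 versus Autobio Q1
for name, bias, cv in (("Roche Q1", 3.1, 2.0), ("Autobio Q1", 11.3, 8.2)):
    rules, n = suggested_qc((24.0 - abs(bias)) / cv)
    print(name, rules, n)
```

Run against the Spanish minimum Sigma-metrics, this lookup sends only the three Roche levels to a single 1:3s rule with 2 controls, and Autobio Q1 (Sigma about 1.5) is the one level it flags outright.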

 

Conclusion

"In conclusion, though all the immunoassays tested in this study correlated strongly with ID-LC-MS/MS, most did not meet the minimum clinical requirements derived from biological variation. Thus, efforts to reduce imprecision and standardize TT4 detection remain necessary. Laboratories and manufacturers must be aware of the assay limitations and improve the performance of these assays."

Since no method consistently achieves the biological variation goal, we would advise choosing a more realistic goal, at least for the immediate future. Manufacturers certainly must look to future engineering and standardization improvements so that future methods have a chance of hitting the tight targets. But for today, the Spanish minimum goal of 24% allowable total error differentiates performance among the methods on the market: some methods are better and some are worse according to this goal. That, in turn, makes it possible to design QC procedures that labs can feasibly operate.