Multitest Chemistry Analyzer - 16 Years Later

The unspoken assumption in laboratories and diagnostics is that we're getting better all the time. Our new systems are better than our old ones. But does the data show this? Just looking at a Westgard et al. paper from 1990 and a collection of recent instrument performance data provides a challenge to this thinking.

June 2006

The original papers

Back in 1990, Dr. Westgard and colleagues produced an analysis of the Hitachi Multichemistry AutoAnalyzer:

  • Koch DD, Oryall JJ, Quam EF, Feldbruegge DH, Dowd DE, Barry PL, Westgard JO. Selection of medically useful quality control procedures for individual tests done in a multitest analytical system. Clin Chem 1990;36:230-233.
  • Westgard JO, Oryall JJ, Koch DD. Predicting effects of QC practices on the cost-effective operation of a multitest analytical system. Clin Chem 1990;36:1760-64.
  • Westgard JO, Barry PL. Cost-Effective Quality Control: Managing the Quality and Productivity of Analytical Processes. Washington DC, AACC Press, 1986, p 2.

Given the current insistence that analytical quality is no longer a problem, we thought it would be useful to make a comparison of recent data with this past data.

Some more recent data: a fruitbowl approach

It's very difficult to get an apples-to-apples comparison of performance data. We can't recreate the conditions of 16 years ago. The instruments are several generations more advanced: higher volume, more sophistication, with different capabilities and requirements. The staff that maintains and operates these instruments has changed, too. There are more educational requirements for lab workers these days, even as other factors such as cost pressures and long hours work against the acquisition of the best candidates.

So we've got a fruitbowl approach: there are sets of data here from different laboratories in different countries on different instruments. The amount of data for each study is different; sometimes the data is from hundreds of points, other times only from an initial method validation study. A perfect comparison isn't really possible. Nevertheless, the comparison of then and now should be useful.

We've selected here a number of method validation studies available to us. Several of them have been highlighted on this website before:

  • 2002: A Roche Integra from the Laboratory at St. Joseph Hospital in Houston, Texas, as highlighted previously on this website. The data is listed here as Texas.
  • 2005 AACC poster: Dade RxL instruments from "A Multisite Validation that selected 'Westgard Rules' is efficient and cost effective." David Plaut, Plano, TX; Beth O'Neill, Monongahela Hospital, Morgantown, WV; Shane Holonitch, Quest Laboratories, Denver, Colorado; Sherry Tighe, University Hospital, Cleveland, Ohio; Annette Yamaguchi, Margaret Mary Medical Center, Batesville, Indiana; Linda Griffiths and Wendy Palmeter, Memorial Hospital, Craig, Colorado. The data is listed here as Ohio and Indiana.
  • 2005 AACC poster: "Assessment of Total Clinical Laboratory Performance: Normalized OPSpecs Charts, Sigma-Values and Patient Test Results", Diler Aslan, Selahattin Sert, Hulya Aybek, Gamze Can Yilmazturk, Pamukkale University, School of Medicine, Biochemistry Department, Denizli, Turkey. The data is listed here as Turkey.
  • 2005 AACC poster: "Six Sigma in Clinical Laboratory: comparison of two automated chemistry systems", F.A. Berlitz, M.L. Haussen, Weinmanns Laboratory, Porto Alegre, Brazil. The data is listed here as Brazil.

The CVs

Note first that we are only considering the imprecision figures (smeas or CV, expressed as a percentage). Bias is not measured or compared here. Where a study provided more than one imprecision figure (for example, imprecision at two levels), we report the lower figure below. Thus, this is the most optimistic comparison possible. To make the results easier to understand, numbers highlighted in red indicate imprecision that is worse than the original paper, and numbers highlighted in blue indicate imprecision that is better than the original paper.

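| Test | TEa (%) | smeas (%) | Texas | Ohio | Indiana | Brazil | Turkey |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Sodium | 3.08% / 4.0% | 0.52 | 1.2 | 0.9 | 0.5 | 1.51 | 0.7 |
| Potassium | 10.0 | 1.17 | 1.5 | 1.4 | 0.7 | 1.51 | 1.3 |
| Chloride | 4.0 / 10.0 | 1.04 | 0.6 | 1.0 | 1.0 | 1.58 | |
| Total CO2 | 10.0 | 2.50 | | 1.3 | 5.7 | | |
| Glucose | 8.0 / 10.0 | 1.20 | 0.9 | 1.13 | 1.76 | 1.97 | 2.2 |
| BUN | 10.0 | 1.33 | 1.3 | 2.33 | 3.4 | | |
| Creatinine | 30.0 | 3.00 | 1.2 | 1.53 | 1.76 | 2.29 | 2.1 |
| Calcium | 5.0 / 10.0 | 1.68 | 1.0 | 1.9 | 1.71 | 2.30 | 0.9 |
| Phosphorus | 10.0 | 1.28 | | 1.4 | 1.4 | | 0.9 |
| Uric acid | 10.0 | 1.10 | 1.1 | | 1.0 | | |
| Cholesterol | 10.0 | 1.35 | 1.3 | 1.41 | 1.41 | 1.97 | 1.4 |
| Total protein | 12.0 / 10.0 | 1.84 | 1.4 | 1.4 | 1.4 | 1.64 | 1.5 |
| Albumin | 10.0 | 2.13 | 1.5 | 1.03 | 1.25 | 1.53 | 1.0 |
| Total bilirubin | 20.0 | 2.20 | 4.9 | 1.47 | 0.91 | | 10.4 |
| GGT | 10.0 | 1.17 | | 3.6 | | | |
| ALP | 10.0 | 1.17 | 1.2 | | 2.0 | | |
| AST | 20.0 | 3.00 | 3.9 | 1.5 | 1.4 | 1.0 | |
| LD | 20.0 / 30.0 | 3.00 | 2.0 | 1.7 | 2.0 | 1.7 | |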

The quick summary is that there are 30 cases where imprecision is worse, 38 cases where imprecision is better, and 1 case where imprecision is the same. A few analytes jump out, like Total protein and Albumin, where all the new studies show improvements over the original paper.
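The tally itself is simple to reproduce. The following is a minimal sketch, not the authors' actual worksheet: it copies only three rows from the table above (Sodium, Potassium and Glucose, which have a value for every study), and the names `original_cv`, `newer_cvs` and `tally` are illustrative assumptions.

```python
# Compare each newer CV with the original 1990 CV for the same test.
# Only three rows from the table are used here, so the totals are a
# partial illustration, not the full 30 / 38 / 1 summary.
original_cv = {"Sodium": 0.52, "Potassium": 1.17, "Glucose": 1.20}
newer_cvs = {
    "Sodium": [1.2, 0.9, 0.5, 1.51, 0.7],
    "Potassium": [1.5, 1.4, 0.7, 1.51, 1.3],
    "Glucose": [0.9, 1.13, 1.76, 1.97, 2.2],
}


def tally(original, newer):
    """Count newer CVs that are worse (higher), better (lower) or the same."""
    counts = {"worse": 0, "better": 0, "same": 0}
    for test, cvs in newer.items():
        for cv in cvs:
            if cv > original[test]:
                counts["worse"] += 1
            elif cv < original[test]:
                counts["better"] += 1
            else:
                counts["same"] += 1
    return counts


counts = tally(original_cv, newer_cvs)
print(counts)
```

A higher CV counts as worse, so for these three tests the newer studies are worse in 11 cases and better in 4.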

Resulting Sigma metrics

Using the Six Sigma calculations, we can convert these imprecision figures into Sigma metrics. Again, note that we are not taking any bias into account.
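As a rough sketch of the calculation, the commonly used Sigma metric formula is (TEa - |bias|) / CV, with all terms in percent. Because bias is ignored in this article, it reduces to TEa / CV. The function name `sigma_metric` and the 1.0% bias in the last example are assumptions made purely for illustration.

```python
def sigma_metric(tea_pct, cv_pct, bias_pct=0.0):
    """Sigma metric from allowable total error, imprecision and bias (all in %)."""
    if cv_pct <= 0:
        raise ValueError("CV must be positive")
    return (tea_pct - abs(bias_pct)) / cv_pct


# Original 1990 figures: sodium (TEa 4.0%, CV 0.52%), potassium (TEa 10.0%, CV 1.17%).
sodium_original = sigma_metric(4.0, 0.52)
potassium_original = sigma_metric(10.0, 1.17)

# Hypothetical 1.0% bias, only to show the direction of the effect.
potassium_with_bias = sigma_metric(10.0, 1.17, bias_pct=1.0)

print(round(sodium_original, 2))      # 7.69
print(round(potassium_original, 2))   # 8.55
print(round(potassium_with_bias, 2))  # 7.69
```

Any nonzero bias shrinks the numerator, which is why metrics computed from imprecision alone are the optimistic case.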

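| Test | Original | Texas | Ohio | Indiana | Brazil | Turkey |
| --- | --- | --- | --- | --- | --- | --- |
| Sodium | 7.69 | 3.33 | 4.44 | 8.00 | 2.65 | 5.71 |
| Potassium | 8.55 | 6.67 | 7.14 | 14.29 | 6.62 | 7.69 |
| Chloride | 9.62 | 16.67 | 10.00 | 10.00 | 6.33 | |
| Total CO2 | 4.00 | | 7.69 | 1.75 | | |
| Glucose | 8.33 | 11.11 | 8.85 | 5.68 | 5.08 | 4.55 |
| BUN | 7.52 | 7.69 | 4.29 | 2.94 | | |
| Creatinine | 5.00 | 12.50 | 9.80 | 8.52 | 6.55 | 7.14 |
| Calcium | 5.95 | 10.00 | 5.26 | 5.85 | 4.35 | 11.11 |
| Phosphorus | 7.81 | | 7.14 | 7.14 | | 11.11 |
| Uric acid | 9.09 | 9.09 | | 10.00 | | |
| Cholesterol | 7.41 | 7.69 | 7.09 | 7.09 | 5.08 | 7.14 |
| Total protein | 5.43 | 7.14 | 7.14 | 7.14 | 6.10 | 6.67 |
| Albumin | 4.69 | 6.67 | 9.71 | 8.00 | 6.54 | 10.00 |
| Total bilirubin | 9.09 | 4.08 | 13.61 | 21.98 | | 1.92 |
| GGT | 8.55 | | 2.78 | | | |
| ALP | 8.55 | 8.33 | | 5.00 | | |
| AST | 6.67 | 5.13 | 13.33 | 14.29 | 20.00 | |
| LD | 10.00 | 15.00 | 17.65 | 15.00 | 17.65 | |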

Since these metrics do not take bias into account, they are more optimistic (and higher) than other metrics reported on this website or in other literature. Nevertheless, it's worth noting that even back in 1990, the original paper reported that 11 of the 18 analytes performed at Six Sigma or higher. Only 7 analytes were below Six Sigma, and only 2 of those were below Five Sigma.

Since the metrics are based solely on imprecision, their distribution is the same as in the earlier table: there are 30 cases where the Sigma metric is worse, 38 cases where it is better, and 1 case where it is the same. What's more interesting to note is that there are 50 cases where performance is higher than Six Sigma, 8 cases where the metric is between 5 and 6 Sigma, 5 cases where it is between 4 and 5, 1 case where it is between 3 and 4, and finally 5 cases where it is below 3 Sigma.
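The banding can be sketched as follows, using only the newer-study Sigma metrics for Sodium and Glucose from the table above. The function name `sigma_band` and the choice to include the lower bound in each band (so that 5.00 falls in "5 to 6") are assumptions for this illustration.

```python
from collections import Counter


def sigma_band(sigma):
    """Return a band label; each band includes its lower bound (an assumption)."""
    if sigma >= 6:
        return "6 or higher"
    if sigma >= 5:
        return "5 to 6"
    if sigma >= 4:
        return "4 to 5"
    if sigma >= 3:
        return "3 to 4"
    return "below 3"


# Newer-study Sigma metrics for Sodium and Glucose, copied from the table.
newer_sigmas = [3.33, 4.44, 8.00, 2.65, 5.71, 11.11, 8.85, 5.68, 5.08, 4.55]

band_counts = Counter(sigma_band(s) for s in newer_sigmas)
for band, count in band_counts.items():
    print(band, count)
```

For these ten values, three land at 6 or higher, three between 5 and 6, two between 4 and 5, one between 3 and 4, and one below 3.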

What do these differences mean?

Beyond the numbers, there is a practical significance to the metrics. The QC procedures necessary for the analytes depend on the Sigma metrics. Where Sigma is above 6, QC is very easy: use the minimum number of controls (2, as CLIA specifies) and wide limits such as 3s or 3.5s.
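The rule of thumb for the easy case can be written down directly. This is a minimal sketch only: the function names are hypothetical, treating a Sigma of exactly 6 as "easy" is an assumption, and the branch for lower Sigma values deliberately does not pick a specific rule, since that choice requires a proper QC design step that this sketch does not attempt.

```python
def simple_qc_is_enough(sigma, threshold=6.0):
    """True when the 'very easy' QC case applies (threshold inclusive by assumption)."""
    return sigma >= threshold


def describe_qc(sigma):
    """Give a coarse, illustrative description of the QC effort a Sigma metric implies."""
    if simple_qc_is_enough(sigma):
        return "2 controls per run with wide limits (3s or 3.5s)"
    # Below the threshold, the rule, limits and number of controls must be
    # chosen case by case; no specific procedure is asserted here.
    return "tighter limits, more controls, or multirule QC (needs detailed QC design)"


print(describe_qc(8.55))  # potassium, original paper
print(describe_qc(2.65))  # one of the lowest Sigma metrics for sodium in the table
```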

| Test | TEa (%) | Original Paper | Texas | Ohio | Indiana | Brazil | Turkey |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Sodium | 4 | 1:3.5s N=2 | 1:3s/2:2s/R:4s/4:1s N=4 | 1:2.5s N=4 | 1:3.5s N=2 | Multirule N=4 | 1:3s N=2 |
| Potassium | 10 | 1:3.5s N=2 | 1:3.5s N=2 | 1:3.5s N=2 | 1:3.5s N=2 | 1:3.5s N=2 | 1:3.5s N=2 |
| Chloride | 10 | 1:3.5s N=2 | 1:3.5s N=2 | 1:3.5s N=2 | 1:3.5s N=2 | 1:3.5s N=2 | |
| Total CO2 | 10 | Multirule N=4 | | 1:3.5s N=2 | Multirule N=4 | | |
| Glucose | 10 | 1:3.5s N=2 | 1:3.5s N=2 | 1:3.5s N=2 | 1:3s N=2 | 1:2.5s N=4 | 1:2.5s N=4 |
| BUN | 10 | 1:3.5s N=2 | 1:3.5s N=2 | 1:2.5s N=4 | Multirule N=4 | | |
| Creatinine | 15 | 1:2.5s N=2 | 1:3.5s N=2 | 1:3.5s N=2 | 1:3.5s N=2 | 1:3.5s N=2 | 1:3.5s N=2 |
| Calcium | 10 | 1:3.5s N=2 | 1:3.5s N=2 | 1:3s N=2 | 1:3s N=2 | 1:2.5s N=4 | 1:3.5s N=2 |
| Phosphorus | 10 | 1:3.5s N=2 | | 1:3.5s N=2 | 1:3.5s N=2 | | 1:3.5s N=2 |
| Uric acid | 10 | 1:3.5s N=2 | 1:3.5s N=2 | | 1:3.5s N=2 | | |
| Cholesterol | 10 | 1:3.5s N=2 | 1:3.5s N=2 | 1:3.5s N=2 | 1:3.5s N=2 | 1:2.5s N=2 | 1:3.5s N=2 |
| Total protein | 10 | 1:3s N=2 | 1:3.5s N=2 | 1:3.5s N=2 | 1:3.5s N=2 | 1:3.5s N=2 | 1:3.5s N=2 |
| Albumin | 10 | 1:2.5s N=2 | 1:3.5s N=2 | 1:3.5s N=2 | 1:3.5s N=2 | 1:3.5s N=2 | 1:3.5s N=2 |
| Total bilirubin | 20 | 1:3.5s N=2 | 1:2.5s N=4 | 1:3.5s N=2 | 1:3.5s N=2 | | 1:3s/2:2s/R:4s/4:1s/8:x N=4 |
| GGT | 10 | 1:3.5s N=2 | | 1:3s/2:2s/R:4s/4:1s/8:x N=4 | | | |
| ALP | 10 | 1:3.5s N=2 | 1:3.5s N=2 | | 1:2.5s N=2 | | |
| AST | 20 | 1:3.5s N=2 | 1:2.5s N=2 | 1:3.5s N=2 | 1:3.5s N=2 | 1:3.5s N=2 | |
| LD | 30 | 1:3.5s N=2 | 1:3.5s N=2 | 1:3.5s N=2 | 1:3.5s N=2 | 1:3.5s N=2 | |

Here we see that the QC procedures necessary are mostly the same. There are 40 cases where the QC procedure didn't change and 12 cases where less QC is needed than in the original paper (notably concentrated in two tests, Albumin and Creatinine). Most troubling, though, are the 17 cases where more QC is needed than in the original paper.

Conclusions

The data here does not support the conclusion that instrument performance has gotten so good that QC is no longer needed or can be reduced. Indeed, the cases where more QC is needed than in the original paper outnumber the cases where less QC is needed.

Remember, these are the optimistic figures. If we took bias into account, the metrics would be lower, and the QC needs would be greater.

References

  1. Koch DD, Oryall JJ, Quam EF, Feldbruegge DH, Dowd DE, Barry PL, Westgard JO. Selection of medically useful quality control procedures for individual tests done in a multitest analytical system. Clin Chem 1990;36:230-233.
  2. Westgard JO, Oryall JJ, Koch DD. Predicting effects of QC practices on the cost-effective operation of a multitest analytical system. Clin Chem 1990;36:1760-64.
  3. Westgard JO, Barry PL. Cost-Effective Quality Control: Managing the Quality and Productivity of Analytical Processes. Washington DC, AACC Press, 1986, p 2.