Sigma-metrics of two blood-gas devices and a core lab analyzer

A 2015 study compared the performance of two blood gas analyzers with a core laboratory instrument. Can we assume that our modern blood gas results will match the core lab's results?

Sigma-metrics of two Blood Gas Analyzers compared to a core laboratory analyzer

March 2015
Sten Westgard, MS

[Note: This QC application is an extension of the lesson From Method Validation to Six Sigma: Translating Method Performance Claims into Sigma Metrics. This article assumes that you have read that lesson first, and that you are also familiar with the concepts of QC Design, Method Validation, and Six Sigma. If you aren't, follow the link provided.]

This analysis looks at two (really three) analyzers and their relationship: a Nova Stat Profile Critical Care Xpress, a Siemens RapidLab 1265, and the Olympus AU 2700 autoanalyzer:

Comparison of blood gas, electrolyte and metabolite results measured with two different blood gas analyzers and a core laboratory analyzer. Metin Uyanik, Erdim Sertoglu, Huseyin Kayadibi, Serkan Tapan, Muhittin A. Serdar, Cumhur Bilgi, Ismail Kurt. Scand J Clin Lab Invest 2015 Apr;75(2):97-105.

One interesting part of this study is that not only the performance of pCO2 and pH was evaluated, but also the performance of common electrolytes (Sodium, Potassium, Chloride, and Calcium), as well as two metabolites (glucose and lactate). Further, they performed the studies not only on the two blood gas analyzers, but also on the core laboratory analyzer itself. So we have almost three sets of data to look at from one facility.

The Imprecision and Bias Data

The study concentrated on the comparison data, but the authors were thoughtful enough to include a precision component. "Prior to sample analysis, a precision study was performed, analyzing each internal QC 10 times consecutively." This sounds like a within-run precision study (in ISO terminology, repeatability). Over a short time period like this, the imprecision data is likely to be overly optimistic, so we'll need to take these numbers with a degree of skepticism. But let's not let that stop us just yet. Also, we only have one level of imprecision measured; we have to assume this is the most critical level for interpretation.

Performance of Siemens RapidLab 1265
Assay       Level           CV%      Bias%
pH          7.40            0.06%
pCO2        5.62 kPa        3.46%
Sodium      137.4 mmol/L    1.65%
Potassium   4.09 mmol/L     2.64%
Chloride    104.9 mmol/L    2.18%
Calcium     1.15 mmol/L     3.01%
Glucose     6.60 mmol/L     5.92%
Lactate     1.87 mmol/L     4.48%
Nova CCX Analyzer
pH          7.40            0.07%
pCO2        5.52 kPa        3.25%
Sodium      138.5 mmol/L    1.64%
Potassium   4.23 mmol/L     2.75%
Chloride    105.2 mmol/L    2.35%
Calcium     1.17 mmol/L     2.96%
Glucose     9.77 mmol/L     6.45%
Lactate     2.03 mmol/L     4.56%
AU 2700 (no Blood Gas measurements)
Sodium      138 mmol/L      3.54%
Potassium   4.13 mmol/L     2.75%
Chloride    106.0 mmol/L    1.97%
Calcium     1.30 mmol/L     2.28%
Glucose     6.94 mmol/L     4.8%
Lactate     2.05 mmol/L     3.6%

Next, we need the bias data. The study states that 40 arterial patient samples were tested on all three analyzers. The comparison of the two blood gas devices was made using the AU 2700 as the comparative method, so we won't have any bias estimates for the AU 2700 itself. Note that the blood gas instruments work on whole blood, whereas the core analyzer works on plasma, so there are differences in sample type. But in the routine operation of the hospital, those test results would still be compared, so the comparison is relevant.

The study used Passing-Bablok regression to calculate the correlation coefficient, slope and y-intercept. The regression equation can be used to determine the difference between the Nova CCX/Siemens RapidLab 1265 methods and the AU 2700 methods. We will not duplicate all of that regression data, but we will show just the bias values.

Newlevel = (slope * Oldlevel ) + Y-intercept

As an example, let's take Sodium, where the study determined that the Nova CCX had a slope of 1.275 and a y-intercept of -37.7038. If we use the regression equation at the level of 138.5 mmol/L, this is what we see for bias:

Newlevel1 = (1.275 * 138.5 ) - 37.7038

Newlevel1 = (176.5875 ) - 37.7038

Newlevel1 = 138.8837

The bias is the absolute value of the difference between the new and old levels: |138.8837 - 138.5| = 0.3837

Dividing by the old level, 0.3837 / 138.5, gives a 0.277% bias at the level of 138.5.
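
The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the calculation above, not code from the study; the function name `regression_bias_percent` is our own invention.

```python
def regression_bias_percent(slope, intercept, level):
    """Percent bias at `level` implied by the regression line y = slope * x + intercept."""
    new_level = slope * level + intercept   # what the test method is expected to report
    difference = abs(new_level - level)     # absolute difference from the comparative method
    return 100.0 * difference / level

# Nova CCX sodium, using the slope and y-intercept from the worked example
bias = regression_bias_percent(slope=1.275, intercept=-37.7038, level=138.5)
print(f"{bias:.3f}%")  # prints 0.277%
```

Applying the same function at each decision level, with each method's own slope and y-intercept, yields one bias estimate per assay.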

Now we'll just fill in the biases...

Performance of Siemens RapidLab 1265
Assay       Level           CV%      Bias%
pH          7.40            0.06%    n/a
pCO2        5.62 kPa        3.46%    n/a
Sodium      137.4 mmol/L    1.65%    1.06%
Potassium   4.09 mmol/L     2.64%    1.25%
Chloride    104.9 mmol/L    2.18%    1.35%
Calcium     1.15 mmol/L     3.01%    11.08%
Glucose     6.60 mmol/L     5.92%    102.47%
Lactate     1.87 mmol/L     4.48%    10.58%
Nova CCX Analyzer
pH          7.40            0.07%    n/a
pCO2        5.52 kPa        3.25%    n/a
Sodium      138.5 mmol/L    1.64%    0.28%
Potassium   4.23 mmol/L     2.75%    3.20%
Chloride    105.2 mmol/L    2.35%    1.03%
Calcium     1.17 mmol/L     2.96%    9.49%
Glucose     9.77 mmol/L     6.45%    183.48%
Lactate     2.03 mmol/L     4.56%    0.53%
AU 2700 (no Blood Gas measurements)
Sodium      138 mmol/L      3.54%    n/a
Potassium   4.13 mmol/L     2.75%    n/a
Chloride    106.0 mmol/L    1.97%    n/a
Calcium     1.30 mmol/L     2.28%    n/a
Glucose     6.94 mmol/L     4.8%     n/a
Lactate     2.05 mmol/L     3.6%     n/a

Now you may be wondering about these bias numbers, because some of them are quite large. For example, the glucose bias is 102% for the RapidLab and 183% for the Nova CCX. What's happening here? In this case, the "calibration of the glucose procedure for plasma specimens was accomplished with the use of the Chemistry Calibration... material, which was traceable to the...NIST Standard Reference Material (SRM) 965a." So while we can't state for certain that the Olympus method is "right" and the blood gas analyzers are "wrong", we might place more weight on the trueness of the Olympus method. Ultimately, however, we don't have to assign right and wrong or allocate blame to any of the methods. All we need to realize is that patients within this hospital are likely to have their tests run on both methods, so the difference is relevant. There are biases between these methods, and we would need to correct them or caution clinicians about their existence.

Determine Quality Requirements at the decision levels

Before we calculate the Sigma-metrics, we can judge the acceptability of the methods with separate specifications for imprecision and bias. If we consult the Ricos database for desirable specifications for imprecision and bias, we can compare those numbers to the errors we've observed in the instruments and methods:

Performance of Siemens RapidLab 1265
Assay       Level           Desirable CV%   CV%      Desirable Bias%   Bias%
pH          7.40            1.8%            0.06%    1.0%              n/a
pCO2        5.62 kPa        2.4%            3.46%    1.8%              n/a
Sodium      137.4 mmol/L    0.3%            1.65%    0.23%             1.06%
Potassium   4.09 mmol/L     2.3%            2.64%    1.81%             1.25%
Chloride    104.9 mmol/L    0.6%            2.18%    0.5%              1.35%
Calcium     1.15 mmol/L     1.05%           3.01%    0.82%             11.08%
Glucose     6.60 mmol/L     2.8%            5.92%    2.34%             102.47%
Lactate     1.87 mmol/L     13.6%           4.48%    8.0%              10.58%
Nova CCX Analyzer
pH          7.40            1.8%            0.07%    1.0%              n/a
pCO2        5.52 kPa        2.4%            3.25%    1.8%              n/a
Sodium      138.5 mmol/L    0.3%            1.64%    0.23%             0.28%
Potassium   4.23 mmol/L     2.3%            2.75%    1.81%             3.20%
Chloride    105.2 mmol/L    0.6%            2.35%    0.5%              1.03%
Calcium     1.17 mmol/L     1.05%           2.96%    0.82%             9.49%
Glucose     9.77 mmol/L     2.8%            6.45%    2.34%             183.48%
Lactate     2.03 mmol/L     13.6%           4.56%    8.0%              0.53%
AU 2700 (no Blood Gas measurements)
Sodium      138 mmol/L      0.3%            3.54%    0.23%             n/a
Potassium   4.13 mmol/L     2.3%            2.75%    1.81%             n/a
Chloride    106.0 mmol/L    0.6%            1.97%    0.5%              n/a
Calcium     1.30 mmol/L     1.05%           2.28%    0.82%             n/a
Glucose     6.94 mmol/L     2.8%            4.8%     2.34%             n/a
Lactate     2.05 mmol/L     13.6%           3.6%     8.0%              n/a

As the comparison makes clear, almost none of these methods are meeting the desirable individual performance specifications from the Ricos database. However, we have to make one extra consideration: the goals themselves may be too tight. It may very well be that the specifications from the Ricos database are not practically achievable by any of today's instrumentation. To judge the quality of these methods and instruments, we'll need to use a different set of more practical quality goals.

Instead of Ricos goals, we're going to use, for the most part, the CLIA proficiency testing criteria, which set specifications in the form of a total allowable error. Where CLIA doesn't regulate an analyte (in this study, Lactate), we'll use the desirable specification for total error from the "Ricos goals."

Analyte     CLIA TEa
pH          +/- 0.04
pCO2        +/- 5 mm Hg or 8% (greater)
Sodium      +/- 4 mmol/L
Potassium   +/- 0.5 mmol/L
Chloride    +/- 5.0%
Calcium     +/- 1.0 mg/dL
Glucose     +/- 6.0 mg/dL or 10% (greater)
Lactate     30.4% (actually a Ricos TEa)

For many of these analytes, CLIA sets a unit-based goal, which means there is a variable allowable total error across the range of the assay. We convert those unit goals into percentage goals at the level where the CV is measured and the bias is estimated.
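
As a rough sketch of that conversion, the snippet below turns a unit-based goal into a percentage at a given level and applies the "whichever is greater" rule where a goal has both forms. The helper name `tea_percent` is hypothetical, and the glucose factor (1 mmol/L = 18.016 mg/dL) is our assumption for bringing the mg/dL goal into the mmol/L units of the study.

```python
GLUCOSE_MG_DL_PER_MMOL_L = 18.016  # assumed conversion factor, from glucose's molar mass

def tea_percent(level, unit_limit=None, percent_limit=None):
    """Allowable total error in percent at `level`.

    unit_limit    -- allowable error in the same units as `level`
    percent_limit -- allowable error already expressed in percent
    If both are supplied, the greater of the two applies.
    """
    candidates = []
    if unit_limit is not None:
        candidates.append(100.0 * unit_limit / level)
    if percent_limit is not None:
        candidates.append(percent_limit)
    if not candidates:
        raise ValueError("supply unit_limit, percent_limit, or both")
    return max(candidates)

# Levels are the Siemens RapidLab 1265 QC levels from the tables above
sodium_tea = tea_percent(level=137.4, unit_limit=4.0)      # +/- 4 mmol/L
potassium_tea = tea_percent(level=4.09, unit_limit=0.5)    # +/- 0.5 mmol/L
glucose_tea = tea_percent(
    level=6.60,
    unit_limit=6.0 / GLUCOSE_MG_DL_PER_MMOL_L,             # 6 mg/dL expressed in mmol/L
    percent_limit=10.0,
)

print(f"Sodium {sodium_tea:.2f}%, Potassium {potassium_tea:.2f}%, Glucose {glucose_tea:.1f}%")
```

At these levels the sodium goal works out to about 2.91% and the potassium goal to about 12.2%, while the glucose percentage goal (10%) is larger than the unit-based one.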

With Sigma-metrics we can start to make some sense of the scale of the performance problem.

Calculate Sigma metrics

Now all the pieces are in place.

Remember the equation for Sigma metric is (TEa - bias%) / CV.

Example calculation: for Sodium for the Siemens RapidLab 1265, with a 2.91% quality requirement, at the level of 137.4 mmol, given 1.65% imprecision and 1.06% bias:

(2.91 - 1.06) / 1.65 = 1.85 / 1.65 = 1.1 Sigma
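
For readers who want to repeat the calculation across a whole table, here is a minimal sketch in Python. The function name `sigma_metric` is ours; returning `None` when the bias exceeds the allowable total error mirrors the "n/a" convention used in this article, and treating a missing bias as zero is an assumption that matches how the AU 2700 is handled here.

```python
def sigma_metric(tea_pct, cv_pct, bias_pct=0.0):
    """Sigma-metric = (TEa - |bias|) / CV, with all inputs in percent.

    Returns None (reported as "n/a") when the bias alone exceeds TEa.
    """
    if cv_pct <= 0:
        raise ValueError("CV must be greater than zero")
    if abs(bias_pct) > tea_pct:
        return None
    return (tea_pct - abs(bias_pct)) / cv_pct

# Sodium on the Siemens RapidLab 1265 (the worked example above)
sodium = sigma_metric(tea_pct=2.91, cv_pct=1.65, bias_pct=1.06)

# Glucose on the Nova CCX: the bias dwarfs the 10% goal, so no metric is reported
glucose = sigma_metric(tea_pct=10.0, cv_pct=6.45, bias_pct=183.48)

# Chloride on the AU 2700: no bias estimate available, so bias is taken as zero
chloride = sigma_metric(tea_pct=5.0, cv_pct=1.97)

print(round(sodium, 1), glucose, round(chloride, 1))  # prints 1.1 None 2.5
```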

So here's the full table with all the metrics, where possible:

Performance of Siemens RapidLab 1265
Assay       Level           CV%      Bias%      Sigma-metric
pH          7.40            0.06%    n/a        9.0
pCO2        5.62 kPa        3.46%    n/a        2.3
Sodium      137.4 mmol/L    1.65%    1.06%      1.1
Potassium   4.09 mmol/L     2.64%    1.25%      4.2
Chloride    104.9 mmol/L    2.18%    1.35%      1.7
Calcium     1.15 mmol/L     3.01%    11.08%     3.5
Glucose     6.60 mmol/L     5.92%    102.47%    n/a
Lactate     1.87 mmol/L     4.48%    10.58%     4.4
Nova CCX Analyzer
pH          7.40            0.07%    n/a        6.1
pCO2        5.52 kPa        3.25%    n/a        n/a
Sodium      138.5 mmol/L    1.64%    0.28%      1.6
Potassium   4.23 mmol/L     2.75%    3.20%      3.1
Chloride    105.2 mmol/L    2.35%    1.03%      1.7
Calcium     1.17 mmol/L     2.96%    9.49%      4.0
Glucose     9.77 mmol/L     6.45%    183.48%    n/a
Lactate     2.03 mmol/L     4.56%    0.53%      6.6
AU 2700 (no Blood Gas measurements)
Sodium      138 mmol/L      3.54%    n/a        0.8
Potassium   4.13 mmol/L     2.75%    n/a        4.4
Chloride    106.0 mmol/L    1.97%    n/a        2.5
Calcium     1.30 mmol/L     2.28%    n/a        8.4
Glucose     6.94 mmol/L     4.8%     n/a        2.1
Lactate     2.05 mmol/L     3.6%     n/a        8.4

Now remember, these are the CLIA goals, which are supposed to be more lenient than the Ricos goals. Nevertheless, the Sigma-metric verdict under the CLIA goals is still pretty harsh. Also remember that we've calculated the AU 2700 Sigma-metrics without any bias, since it was, in effect, the local reference method for the blood gas instruments. If we took into account some form of bias for the AU 2700 (from peer group, from PT/EQA, from comparison of observed values vs. target or assayed values), we would expect its Sigma-metrics to decline.

Finally, let's tackle the subject of "n/a". We put "n/a" whenever the bias exceeded the allowable total error. The Nova CCX, for example, has a 183% bias for glucose, which is far larger than a 10% allowable total error. In this case, the assays are just not aiming at the same target. They are getting significantly different answers. Again, we can't say for certain which instrument gives the true answer; all we can be certain of is that there will be significant and possibly confusing differences between the glucose test results of the two methods.

Recall that in industries outside healthcare, on the short-term scale, 3.0 Sigma is the minimum performance for routine use and 6.0 Sigma is considered world class quality. We're looking at the long-term scale for this Sigma-metric calculation, which is 1.5s higher (the short-term scale builds in a 1.5s shift, to allow for "normal process variation"). So possibly, we could go as low as 1.5 for the bare minimum acceptability. Still, what this is telling us is that both of these analyzers have problem assays, particularly if we compare the performance to the local core lab analyzer. 

For the Nova CCX, the Siemens RapidLab 1265, and even the AU 2700, 50% of the analytes detailed here are below 3 Sigma.
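
That tally can be checked with a short script. The Sigma-metrics below are copied from the table above, and counting an "n/a" entry as below 3 Sigma is our assumption about how the 50% figure was reached; the names `sigma_table` and `fraction_below` are illustrative only.

```python
# Sigma-metrics per instrument, in table order; None stands for an "n/a" entry
sigma_table = {
    "Siemens RapidLab 1265": [9.0, 2.3, 1.1, 4.2, 1.7, 3.5, None, 4.4],
    "Nova CCX": [6.1, None, 1.6, 3.1, 1.7, 4.0, None, 6.6],
    "AU 2700": [0.8, 4.4, 2.5, 8.4, 2.1, 8.4],
}

def fraction_below(sigmas, threshold=3.0):
    """Share of assays under `threshold`; "n/a" (None) entries are counted as below."""
    below = sum(1 for s in sigmas if s is None or s < threshold)
    return below / len(sigmas)

for instrument, sigmas in sigma_table.items():
    print(f"{instrument}: {fraction_below(sigmas):.0%} below 3 Sigma")
```

Under that assumption, each of the three instruments comes out at half of its assays below 3 Sigma.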

Summary of Performance by Sigma-metrics Method Decision Chart and OPSpecs chart

We can make visual assessments of each instrument's performance using a Normalized Sigma-metric Method Decision Chart:

Siemens RapidLab 1265 Method Decision Chart

 

Nova CCX Method Decision Chart

Here's the core laboratory analyzer (remember, no bias estimate, so all dots are on the x-axis):

AU 2700 Method Decision Chart (no bias estimate)

With all of these instruments, we can see that a few methods hit the bull's eye, while many methods seem to be missing the target. A lot of those dots are actually "off the map" - so far off the chart that they are floating above or to the right of your monitor or phone.

Now what about QC? How do we monitor and control these methods? For that, we need a Normalized OPSpecs chart:

Siemens RapidLab 1265 Normalized OPSpecs chart

Now for the Nova CCX:

Nova CCX Normalized OPSpecs chart

Finally, for the AU 2700:

AU 2700 Normalized OPSpecs chart

Many of the methods are simply not controllable. That is, even with the full "Westgard Rules" we probably won't be catching errors when they first occur; it will take a while before we pick them up. We may need to consider running more controls (double or triple) for the troublesome methods, raising the expense of running these instruments. We may even need to increase the frequency of running controls because of the poor performance. However, pH, Potassium, Calcium, and Lactate are methods that can be controlled, some of them with only single rules.

Conclusion

The authors' abstract concludes: "BGAs and core autolaboratory analyzer demonstrated variable performances and not all tests met minimum performance goals. It is important that clinicians and laboratories are aware of the limitations of their assays."

It's refreshing to see authors who are willing to honestly scrutinize their own data on performance. They didn't automatically jump to the conclusion that because correlation coefficients were high, the methods were acceptable. Their conclusion matches our own analysis, where we find a lot of performance problems and a lot of issues that should be resolved before accepting either BGA and before reporting test results out to clinicians.

While this study was conducted at a laboratory in Turkey, it's worth noting again that this is further evidence that POC testing carries higher, not lower, risk. If this were a US laboratory considering developing an IQCP, it would be very unwise to reduce QC frequency to only once a week or once a month. Indeed, performance like this demands more effort in QC, not less.