
When the POC and the Core Lab don't agree

When a point-of-care device is compared to a core laboratory analyzer, we assume the core laboratory analyzer is always "right". But what if we can't tell whether either the point-of-care method or the core lab method is correct? When methodology is not the same, how do you handle the differences and bias between devices?

Sigma-metrics of a POC device in comparison with a (significantly different) Core Laboratory Analyzer

August 2015
Sten Westgard, MS

[Note: This QC application is an extension of the lesson From Method Validation to Six Sigma: Translating Method Performance Claims into Sigma Metrics. This article assumes that you have read that lesson first, and that you are also familiar with the concepts of QC Design, Method Validation, and Six Sigma. If you aren't, follow the link provided.]

This analysis looks at a study that is still in press; it just became available online on August 6th, so we're jumping the publication gun more than a little bit. We're going to keep the names of the devices blank, though, to concentrate on a significant issue: a lack of comparability between a point-of-care device and a core laboratory analyzer:

Analytical Performance of the XXXX Point of Care Analyzer in Whole Blood, Serum and Plasma. Forthcoming in Clinical Biochemistry, 2015.

One of the most interesting parts of this study is the comparison between the POC device and the core laboratory analyzer, which employ entirely different methodologies. The core laboratory analyzer is in the "dry chemistry" family, while the POC device utilizes microfluidics and a small single-use reagent disc. While each device may be traceable to reference methods, for the purposes of patient care within the health system that uses both devices, any differences or biases that exist between the methods may introduce unnecessary errors. Each device may work fine in isolation, but in combination they may produce incomparable results.

The Imprecision and Bias Data

While we could look at all the sample types, we're going to focus on the serum samples, partly because we know more about the quality requirements for measuring analytes in serum. The study evaluated imprecision by "running two levels of control material supplied by the manufacturer over 16 nonconsecutive days". Unfortunately, while imprecision was measured at two levels, those levels were not specified in the study. Because of that, we're going to average the CVs of the two levels and use that as the overall measure of imprecision for each method.

Performance of POC

Assay             CV% Level 1   CV% Level 2   Avg CV%
Sodium            0.8%          0.9%          0.85%
Potassium         2.7%          1.2%          1.95%
CO2               2.9%          5.1%          4.0%
Chloride          1.8%          1.0%          1.4%
Calcium           1.6%          2.0%          1.8%
Glucose           1.1%          0.6%          0.85%
BUN               3.6%          1.6%          2.6%
Creatinine        10.1%         3.2%          6.65%
ALP               8.9%          5.8%          7.35%
ALT               4.6%          1.9%          3.25%
AST               2.4%          1.4%          1.9%
Total Bilirubin   6.9%          1.8%          4.35%
Albumin           1.8%          1.1%          1.45%
Total Protein     1.0%          0.6%          0.85%
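
To make the averaging step concrete, here's a minimal sketch in Python (the code is our own illustration; the CV pairs are taken from the table above):

```python
# Average the two control-level CVs to get a single overall CV per assay.
cv_pairs = {
    "Sodium": (0.8, 0.9),
    "Creatinine": (10.1, 3.2),
    "ALP": (8.9, 5.8),
}

for assay, (cv1, cv2) in cv_pairs.items():
    avg_cv = (cv1 + cv2) / 2
    print(f"{assay}: average CV = {avg_cv:.2f}%")
# Sodium: 0.85%, Creatinine: 6.65%, ALP: 7.35%
```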

Next, we need the bias data. The study states that "Sixty-five previously frozen serum samples drawn during routine clinical practice were thawed and immediately tested on both instruments.... All specimens were analyzed according to the manufacturer's instructions."

The study used Passing-Bablok regression to calculate the correlation coefficient, slope and y-intercept. The regression equation can be used to determine the difference between the POC device and the core laboratory analyzer methods. We will not duplicate all of that regression data, but we will show just the bias values.

One of the problems that rears its head again is that we don't have a specific level for our CV; we just have an average CV, which we can assume represents performance in the middle of the working/reportable range. So we'll take the range data from the study, determine a midpoint, and use the regression equation at that midpoint to characterize an "average" bias that we can match up with the average CV:

Newlevel = (slope * Oldlevel) + Y-intercept

As an example, let's take sodium, where the study determined that the POC had a slope of 0.53 and a y-intercept of 62.5. The midpoint of the range in the study is 149 mmol/L. If we use the regression equation at the level of 149 mmol/L, this is what we see for bias:

Newlevel1 = (0.53 * 149) + 62.5

Newlevel1 = (78.97 ) + 62.5

Newlevel1 = 141.47

The difference between the new and old levels is 141.47 - 149 = -7.53 mmol/L. We express this bias as an absolute value: 7.53 mmol/L.

This is equivalent to a 5.05% bias at the level of 149 mmol/L (7.53 / 149 = 5.05%). That might not sound like a whole lot, but wait until we talk about the allowable total error for sodium...
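
Here's a minimal sketch of that bias calculation in Python (the function name is our own; the slope, intercept, and midpoint are the sodium values from the study):

```python
def regression_bias_pct(slope, intercept, level):
    """Estimate % bias at a decision level from the Passing-Bablok
    regression of the POC method against the core lab method."""
    new_level = slope * level + intercept  # value predicted by the regression
    bias = abs(new_level - level)          # absolute bias in measurement units
    return 100 * bias / level              # bias as a percentage of the level

# Sodium: slope 0.53, y-intercept 62.5, midpoint 149 mmol/L
print(round(regression_bias_pct(0.53, 62.5, 149), 2))  # 5.05
```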

Now we'll just fill in the biases...

Performance of POC

Assay             Avg CV%   Avg bias%
Sodium            0.85%     5.05%
Potassium         1.95%     2.35%
CO2               4.0%      5.73%
Chloride          1.4%      2.2%
Calcium           1.8%      3.25%
Glucose           0.85%     4.7%
BUN               2.6%      0.21%
Creatinine        6.65%     7.74%
ALP               7.35%     16.2%
ALT               3.25%     19.85%
AST               1.9%      26.23%
Total Bilirubin   4.35%     7.94%
Albumin           1.45%     5.03%
Total Protein     0.85%     2.16%

Now you may be wondering about these bias numbers, because some of them are quite large. They seem to dwarf some of the imprecision figures. This may be our first sign that the methodology difference is going to be more significant than the precision performance of either method.

Determine Quality Requirements at the decision levels

Before we calculate the Sigma-metrics, we can judge the acceptability of the methods with separate specifications for imprecision and bias. If we consult the 2014 "Ricos goals" database for desirable specifications for imprecision and bias, we can compare those numbers to the errors we've observed in the instruments and methods:

Performance of POC

Assay             Total Allowable Error [TEa] (source of goal)         Desirable CV%   Avg CV%   Desirable bias%   Avg bias%
Sodium            +/- 4 mmol/L (CLIA)                                  0.3%            0.85%     0.23%             5.05%
Potassium         +/- 0.5 mmol/L (CLIA)                                2.3%            1.95%     1.81%             2.35%
CO2               +/- 25% (CAP survey)                                 2.0%            4.0%      1.56%             5.73%
Chloride          +/- 5.0% (CLIA)                                      0.6%            1.4%      0.5%              2.2%
Calcium           +/- 1.0 mg/dL (CLIA)                                 1.05%           1.8%      0.82%             3.25%
Glucose           +/- 6.0 mg/dL or 10%, whichever is greater (CLIA)    2.8%            0.85%     2.34%             4.7%
BUN               +/- 2 mg/dL or 9%, whichever is greater (CLIA)       6.05%           2.6%      5.57%             0.21%
Creatinine        +/- 0.3 mg/dL or 15%, whichever is greater (CLIA)    2.98%           6.65%     3.96%             7.74%
ALP               +/- 30.0% (CLIA)                                     3.23%           7.35%     6.72%             16.2%
ALT               +/- 20.0% (CLIA)                                     9.7%            3.25%     11.48%            19.85%
AST               +/- 20.0% (CLIA)                                     6.15%           1.9%      6.54%             26.23%
Total Bilirubin   +/- 0.4 mg/dL or 20%, whichever is greater (CLIA)    10.9%           4.35%     8.95%             7.94%
Albumin           +/- 10.0% (CLIA)                                     1.6%            1.45%     1.43%             5.03%
Total Protein     +/- 10.0% (CLIA)                                     1.38%           0.85%     1.36%             2.16%

As the table makes clear (on the original page, failing values are highlighted in color), almost none of these methods meet the desirable bias specifications from the Ricos database, though just over half of the analytes (8 of 14) do meet the Ricos goals for imprecision. That makes things a bit confusing: what does it mean when a method is acceptable by imprecision standards but not by bias standards? Do we reject the method? Accept it? This is where Total Analytic Error comes in handy. The Total Analytic Error approach combines imprecision and inaccuracy into a single estimate, which can be compared against the Allowable Total Error so that an overall judgment can be made. The column listing the CLIA goals provides us with those allowable total errors.
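
If it helps to see that combination in code, here's a minimal sketch (the function is our own illustration, not a figure from the study; one common convention is to add roughly two CVs to the bias):

```python
def total_error_pct(bias_pct, cv_pct, z=2.0):
    """One common Total Analytic Error estimate: bias plus z times the CV.
    z = 2 is a frequent textbook choice (roughly 95% coverage)."""
    return bias_pct + z * cv_pct

# Sodium: bias 5.05%, CV 0.85% -> TE of about 6.75%,
# versus an allowable total error of only 2.68% at 149 mmol/L
print(total_error_pct(5.05, 0.85))  # 6.75
```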

But we're going to hold these methods to a higher standard than just the typical or traditional Total Error. We're aiming for Six Sigma quality, not 2 or 3 Sigma quality.

Calculate Sigma metrics

Remember, the equation for the Sigma-metric is (TEa - bias) / CV, where every term is expressed as a percentage.

Example calculation: for sodium on the POC method, the 4 mmol/L quality goal at a level of 149 mmol/L turns out to be a 2.68% quality requirement (4 / 149 = 2.68%). Now remember, we are given an average CV of 0.85% and an average bias of 5.05%:

(2.68 - 5.05) / 0.85 = -2.37 / 0.85 = ?? Negative Sigma ??

For this particular analyte, we've got some bad news. The bias is almost twice as large as the entire error budget. We don't really report "negative Sigma" - the equation is just telling us that we've got two methods aiming at completely different targets. The difference between the POC and core lab methods is probably clinically significant, and clinicians should be advised to be careful about comparing results between the two. (One silver lining: this shows that the Sigma-metric agrees with the individual Ricos goals - when a method fails both the imprecision and bias goals, the Sigma-metric is probably going to be low as well.)
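
For those who want to reproduce the numbers, here's a minimal sketch of the Sigma-metric calculation (the helper is our own; the inputs come from the tables above):

```python
def sigma_metric(tea_pct, bias_pct, cv_pct):
    """Sigma-metric = (TEa - bias) / CV, with every term in percent."""
    return (tea_pct - bias_pct) / cv_pct

# Sodium: TEa 2.68%, bias 5.05%, CV 0.85% -> negative Sigma
print(round(sigma_metric(2.68, 5.05, 0.85), 2))  # -2.79
# Potassium: TEa 11.8%, bias 2.35%, CV 1.95%
print(round(sigma_metric(11.8, 2.35, 1.95), 1))  # 4.8
```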

Here's the full table with all the metrics:

Performance of POC

Assay             TEa%   Avg CV%   Avg bias%   Sigma-metric
Sodium            2.68   0.85%     5.05%       negative
Potassium         11.8   1.95%     2.35%       4.8
CO2               25.0   4.0%      5.73%       4.8
Chloride          5.0    1.4%      2.2%        2.0
Calcium           12.5   1.8%      3.25%       5.1
Glucose           10.0   0.85%     4.7%        >6
BUN               9.0    2.6%      0.21%       3.4
Creatinine        15.0   6.65%     7.74%       1.1
ALP               30.0   7.35%     16.2%       1.9
ALT               20.0   3.25%     19.85%      0
AST               20.0   1.9%      26.23%      negative
Total Bilirubin   20.0   4.35%     7.94%       2.8
Albumin           10.0   1.45%     5.03%       3.4
Total Protein     10.0   0.85%     2.16%       >6

Now remember, these are Sigma-metrics based on the CLIA goals, which are supposed to be more lenient than the Ricos goals. Nevertheless, the Sigma-metric CLIA verdict is still pretty harsh. While there are 2 methods that do achieve better than Six Sigma, there are also many methods below 3 Sigma. 

Recall that in industries outside healthcare, on the short-term scale, 3.0 Sigma is the minimum performance for routine use and 6.0 Sigma is considered world class quality. We're looking at the long-term scale for this Sigma-metric calculation, which is 1.5s higher (the short-term scale builds in a 1.5s shift, to allow for "normal process variation"). So possibly, we could go as low as 1.5 for the bare minimum acceptability. Still, what this is telling us is that both of these analyzers have problem assays, particularly if we compare the performance to the local core lab analyzer. 

 

Summary of Performance by Sigma-metrics Method Decision Chart and OPSpecs chart

We can make visual assessments of each instrument's performance using a Normalized Sigma-metric Method Decision Chart:

[Figure: Normalized Sigma-metric Method Decision Chart for the POC methods]

With all of these methods plotted, we can see one or two hit the bull's-eye, but many more are missing the target. A lot of those dots are actually "off the map" - so far off the chart that they are floating above and/or to the right of whatever screen you're looking at right now.
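
If you're curious how those operating points land on a normalized chart, here's a minimal sketch (our own code, assuming the usual normalization where precision and bias are each rescaled as a percentage of TEa so all assays can share one chart):

```python
def normalized_point(tea_pct, cv_pct, bias_pct):
    """Rescale an assay's operating point to a common, normalized TEa:
    x = CV as a % of TEa, y = bias as a % of TEa."""
    return 100 * cv_pct / tea_pct, 100 * bias_pct / tea_pct

# Sodium plots far above the chart area ("off the map"):
print(normalized_point(2.68, 0.85, 5.05))  # (~31.7, ~188.4)
# Glucose lands inside the Six Sigma zone:
print(normalized_point(10.0, 0.85, 4.7))   # (8.5, 47.0)
```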

Now what about QC? How do we monitor and control these methods? For that, we need a Normalized OPSpecs chart:

[Figure: Normalized OPSpecs chart for the POC methods]

Many of the methods are simply not controllable. That is, even with the full "Westgard Rules" we probably won't catch errors when they first occur - it will take a while before we pick them up. We may need to consider running more controls (doubling or tripling the number) for the troublesome methods, raising the expense of running this instrument. We may even need to increase the frequency of running controls because of the poor performance. Glucose, calcium, CO2, potassium, and total protein, however, are methods that can be controlled, some of them with only single rules.
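
As a rough illustration of how the Sigma-metric drives the choice of control rules and number of controls (a sketch loosely following general Westgard Sigma Rules guidance, not a reading of the specific OPSpecs chart above):

```python
def suggested_qc(sigma):
    """Rule-of-thumb mapping from Sigma-metric to a starting QC strategy.
    Illustrative only; an OPSpecs chart gives the method-specific answer."""
    if sigma >= 6:
        return "single 1:3s rule, N=2"
    if sigma >= 5:
        return "1:3s / 2:2s / R:4s rules, N=2"
    if sigma >= 4:
        return "1:3s / 2:2s / R:4s / 4:1s rules, N=4"
    return "full 'Westgard Rules' with N=4 to 8, plus more frequent QC"

for assay, sigma in [("Glucose", 6.2), ("Calcium", 5.1), ("BUN", 3.4)]:
    print(f"{assay}: {suggested_qc(sigma)}")
```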

Conclusion

The authors' conclusion acknowledges the performance challenges of the POC device, noting the poor correlation observed in another study for "sodium, potassium, total CO2, glucose, calcium, creatinine, AST, albumin, and total protein." Another study they reference also detailed "significant biases... for ALT, AST, alkaline phosphatase, amylase, and total bilirubin, which is consistent with our findings as well."

If this POC device only had significant discrepancies with this one particular core laboratory analyzer, that would be one thing. But the studies cited by the authors appear to provide additional evidence that the POC device suffers from significant biases against multiple core laboratory instruments. If that's true, then we can't split the blame between the core laboratory analyzer and the POC device; it's more likely that the POC device is the cause of the bias. For those who like to assert that bias problems are small, that they can be ignored or calibrated out of existence, this is yet another demonstration that bias is real and can be significantly large. Even if we can't determine which method is "right," the patients who travel through a healthcare system with both of these devices will be impacted by the discrepancies in test results.

An interesting detail to note is that the POC device is waived: no QC is required beyond the manufacturer's instructions. So even if the OPSpecs chart indicates that QC should be increased, the FDA and CMS won't require it of any US laboratory. This is yet more evidence that FDA clearance of a device is not a guarantee of quality. It's also further evidence that, with POC devices, labs and healthcare systems need to be vigilant. The risk of these devices is usually higher, not lower, and QC should probably be run as frequently as in the core laboratory, not less frequently. If this were a US laboratory considering an IQCP for this device, it would be very unwise to reduce QC frequency to only once a week or once a month. Indeed, this performance more appropriately demands more effort in QC, not less.