Vitros 5.1

A poster from the 2005 AACC convention provides us with a great set of data on CV and bias for a new chemistry instrument. From that data, we can evaluate the Sigma performance and the QC required by the instrument. We can even compare the new instrument with Sigma metrics from other competing instruments.

From Method Validation to Six Sigma Metrics: Evaluating a New Instrument

August 2005

[Note: This QC application is an extension of the lesson From Method Validation to Six Sigma: Translating Method Performance Claims into Sigma Metrics. This article assumes that you have read that lesson first, and that you are also familiar with the concepts of QC Design, Method Validation, and Six Sigma. If you aren't, follow the link provided.]

At the 2005 AACC/ASCLS/IFCC conference, a poster was presented on the "Performance evaluation of VITROS 5.1 FS Chemistry System" (E. Melo-Gomes, G. Belo, I. Araujo, P. Marques, T. Eloy, S. Sousa, R. Matos. Egas Moniz Hospital, Lisbon, Portugal). This poster provides a perfect opportunity to convert performance characteristics into Sigma metrics.

Recap: What do you need to go from method validation to Six Sigma?

From the Method Validation study, preferably performed in your laboratory. Otherwise, use the data provided by the manufacturer:

From other sources:

What data is available?

In this case, the paper provides a good starting point:

| Analyte | Level | CV | P-B Regression Equation | n | r |
|---|---|---|---|---|---|
| Glucose | 82.2 mg/dL | 1.98% | y = 1.000x - 0.500 | 122 | 0.987 |
| | 292.1 mg/dL | 2.58% | | | |
| Creatinine | 1.05 mg/dL | 3.13% | y = 1.000x - 0.100 | 143 | 0.978 |
| | 5.87 mg/dL | 2.84% | | | |
| K+ | 2.91 mmol/L | 1.47% | y = 1.000x - 0.200 | 118 | 0.973 |
| | 7.95 mmol/L | 2.47% | | | |
| Ca2+ | 9.24 mg/dL | 1.93% | y = 1.111x - 0.589 | 85 | 0.933 |
| | 12.43 mg/dL | 1.58% | | | |
| LD | 439 U/L | 2.31% | y = 0.984x - 19.937 | 83 | 0.932 |
| | 1450 U/L | 1.80% | | | |

We include the correlation coefficients, but remember that those numbers alone are not sufficient grounds to judge that the method is acceptable. All the r tells you is that the regression technique (in this case, Passing-Bablok) is good enough to use for this comparison. When the r is below 0.97 or 0.95, methods like Deming or Passing-Bablok regression are recommended. More details about the use and interpretation of statistics are available on this website.

The paper notes that the imprecision estimates come from a period of at least ten days, running two levels of performance verifier fluids for each assay. These estimates are better than within-run estimates that might be provided or calculated. Of the two estimates of imprecision for each analyte, we will select just one to use.

What calculations do I have to perform, and in what order?

  1. Use the regression equation to estimate bias at a medical decision level (Xc) where performance is important.
  2. Find the quality requirement for that important decision level.
  3. Calculate Sigma metrics.

To identify critical levels of interest, we consult the Medical Decision Levels provided by Dr. Statland's book. (This reference provides a number of different decision levels. Our choices may not be the ones that you would prefer, so feel free to recalculate with your choices.) Then we select the CV estimate from the level closest to that critical level. In most cases, the critical level isn't exactly where the CV estimate was measured.
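The selection step described above can be sketched in a few lines of Python. This is only an illustration: the helper name `closest_level` and the dictionary layout are our own assumptions, while the levels, CVs, and critical levels are the ones quoted in this article.

```python
def closest_level(levels, critical_level):
    """Return the (level, cv) pair whose level is nearest the critical level."""
    return min(levels, key=lambda pair: abs(pair[0] - critical_level))

# (control level, CV in %) pairs from the poster's imprecision data
imprecision = {
    "Glucose": [(82.2, 1.98), (292.1, 2.58)],
    "Creatinine": [(1.05, 3.13), (5.87, 2.84)],
    "K+": [(2.91, 1.47), (7.95, 2.47)],
    "Ca2+": [(9.24, 1.93), (12.43, 1.58)],
    "LD": [(439, 2.31), (1450, 1.80)],
}

# Critical medical decision levels chosen in this article
critical_levels = {"Glucose": 120, "Creatinine": 1.6, "K+": 3.0, "Ca2+": 11.0, "LD": 500}

for analyte, levels in imprecision.items():
    level, cv = closest_level(levels, critical_levels[analyte])
    print(f"{analyte}: use CV {cv}% (measured at {level})")
```

If you prefer different decision levels, change the entries in `critical_levels` and rerun the selection.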

| Analyte | Level of CV estimate | CV | Critical level | P-B Regression Equation |
|---|---|---|---|---|
| Glucose | 82.2 mg/dL | 1.98% | 120 mg/dL | y = 1.000x - 0.500 |
| Creatinine | 1.05 mg/dL | 3.13% | 1.6 mg/dL | y = 1.000x - 0.100 |
| K+ | 2.91 mmol/L | 1.47% | 3.0 mmol/L | y = 1.000x - 0.200 |
| Ca2+ | 12.43 mg/dL | 1.58% | 11.0 mg/dL | y = 1.111x - 0.589 |
| LD | 439 U/L | 2.31% | 500 U/L | y = 0.984x - 19.937 |

Estimate Bias at the critical level of performance.

How do you do this? By using the Regression Equation:

Yc = a + b * Xc, where Yc and Xc represent the test and comparison values, respectively, at a critical medical decision level, b is the slope, and a is the y-intercept. The slope and y-intercept are given by the comparison of methods experiment.

You use the critical medical decision level as your Xc value. Then solve the Regression Equation for Yc. This will estimate what the value of the test method will be at that level.

Next, take the absolute difference between Yc and Xc, divide it by Xc, and multiply by 100. This provides an estimate of bias as a percentage.

At the end of these calculations, you have estimates of bias and CV at the same level.

Here’s what our example data looks like after we’ve performed these calculations:

For glucose at the critical level of 120 mg/dL:

y = 1.000(120) - 0.500

y = 119.5

[(120 - 119.5) / 120] * 100 = 0.42% bias

| Analyte | Level of CV estimate | CV | Critical level | P-B Regression Equation | Bias |
|---|---|---|---|---|---|
| Glucose | 82.2 mg/dL | 1.98% | 120 mg/dL | y = 1.000x - 0.500 | 0.42% |
| Creatinine | 1.05 mg/dL | 3.13% | 1.6 mg/dL | y = 1.000x - 0.100 | 6.25% |
| K+ | 2.91 mmol/L | 1.47% | 3.0 mmol/L | y = 1.000x - 0.200 | 6.66% |
| Ca2+ | 12.43 mg/dL | 1.58% | 11.0 mg/dL | y = 1.111x - 0.589 | 5.65% |
| LD | 439 U/L | 2.31% | 500 U/L | y = 0.984x - 19.937 | 5.59% |
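For readers who want to repeat the bias calculation with their own decision levels, here is a minimal Python sketch. The function name `percent_bias` is our own choice and is not part of any tool mentioned in this article. The slopes, intercepts, and critical levels are the ones quoted above.

```python
def percent_bias(slope, intercept, xc):
    """Estimate bias (%) at decision level xc from a regression line y = intercept + slope * x."""
    yc = intercept + slope * xc          # expected test-method value at xc
    return abs(yc - xc) / xc * 100.0     # absolute difference as a percentage of xc

# analyte: (slope, intercept, critical level Xc)
regressions = {
    "Glucose": (1.000, -0.500, 120),
    "Creatinine": (1.000, -0.100, 1.6),
    "K+": (1.000, -0.200, 3.0),
    "Ca2+": (1.111, -0.589, 11.0),
    "LD": (0.984, -19.937, 500),
}

for analyte, (slope, intercept, xc) in regressions.items():
    print(f"{analyte}: bias = {percent_bias(slope, intercept, xc):.2f}%")
```

Values computed this way may differ slightly from the tabulated figures. With the inputs as listed, calcium evaluates to about 5.75% rather than 5.65%, so it is worth rechecking any row that matters to you.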

Note that even after those calculations, it’s still difficult to judge the quality of this method. We know CV and bias, but how does that relate to the necessary quality for the method? Here's where quality requirements enter the picture.

What’s a quality requirement and where do I find it?

Finding or defining quality requirements is a critical step in the QC Design Process. We refer you to those articles on the website for more explanation. Since we are working with a chemistry instrument, we are in luck. CLIA has defined the quality requirements for all the analytes covered in this paper. Let’s add that to our table:

| Analyte | Level of CV estimate | Critical level | CLIA TEa | CV | P-B Regression Equation | Bias |
|---|---|---|---|---|---|---|
| Glucose | 82.2 mg/dL | 120 mg/dL | 10% | 1.98% | y = 1.000x - 0.500 | 0.42% |
| Creatinine | 1.05 mg/dL | 1.6 mg/dL | 18.75% (0.3 mg/dL) | 3.13% | y = 1.000x - 0.100 | 6.25% |
| K+ | 2.91 mmol/L | 3.0 mmol/L | 16.66% (0.5 mmol/L) | 1.47% | y = 1.000x - 0.200 | 6.66% |
| Ca2+ | 12.43 mg/dL | 11.0 mg/dL | 9.09% (1.0 mg/dL) | 1.58% | y = 1.111x - 0.589 | 5.65% |
| LD | 439 U/L | 500 U/L | 20% | 2.31% | y = 0.984x - 19.937 | 5.59% |

In some cases, where noted, the CLIA quality requirement is given in concentration units, which must then be converted into percentages for the critical medical decision level. For example, the Calcium critical level is 11.0 mg/dL, and the CLIA requirement is for the result to be within 1.0 mg/dL of that level. A simple calculation gives us the total allowable error percentage: (1.0/11.0)*100 = 9.09%
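The same conversion applies to every analyte whose requirement is stated in concentration units. A small Python sketch, with the hypothetical helper name `tea_percent`, makes the arithmetic explicit:

```python
def tea_percent(tea_in_units, critical_level):
    """Convert an allowable error given in concentration units to a percentage of the critical level."""
    return tea_in_units / critical_level * 100.0

# analyte: (CLIA allowable error in units, critical level in the same units)
unit_requirements = {
    "Creatinine": (0.3, 1.6),   # mg/dL
    "K+": (0.5, 3.0),           # mmol/L
    "Ca2+": (1.0, 11.0),        # mg/dL
}

for analyte, (tea, xc) in unit_requirements.items():
    print(f"{analyte}: TEa = {tea_percent(tea, xc):.2f}%")
```

Because the percentage depends on the critical level, choosing a different decision level changes the percentage requirement even though the requirement in units stays the same.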

Now that we’ve added the quality requirement, we’re ready to get Sigma metrics! We’ll really be able to see how these tests stand up.

Calculating Sigma Metrics from Bias, CV and Quality Requirement.

Again, the website has already covered the relationship between Sigma Metrics and bias, CV, and quality requirements. There is even a free online calculator on Westgard Web to perform the calculations.

Let’s see the Sigma Metric:

The basic equation is this:

(Quality Requirement - Bias ) / CV = Sigma metric

[either all terms in units or all terms in percentages as the case is here]

In our glucose example:

(10 - 0.42) / 1.98 = 4.84

| Analyte | Level of CV estimate | Critical level | CLIA TEa (%) | CV | Bias | Sigma metric | Sigma metric without bias |
|---|---|---|---|---|---|---|---|
| Glucose | 82.2 mg/dL | 120 mg/dL | 10 | 1.98% | 0.42% | 4.84 | 5.05 |
| Creatinine | 1.05 mg/dL | 1.6 mg/dL | 18.75 (0.3 mg/dL) | 3.13% | 6.25% | 3.99 | 5.99 |
| K+ | 2.91 mmol/L | 3.0 mmol/L | 16.66 (0.5 mmol/L) | 1.47% | 6.66% | 6.80 | 11.33 |
| Ca2+ | 12.43 mg/dL | 11.0 mg/dL | 9.09 (1.0 mg/dL) | 1.58% | 5.65% | 2.18 | 5.75 |
| LD | 439 U/L | 500 U/L | 20 | 2.31% | 5.59% | 6.24 | 8.67 |
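If you want to recalculate these metrics with your own estimates, the equation translates directly into code. The sketch below is illustrative only. The function name `sigma_metric` is our own, and the inputs are the percentages tabulated in this article.

```python
def sigma_metric(tea_pct, bias_pct, cv_pct):
    """Sigma metric = (quality requirement - bias) / CV, with all terms in percent."""
    return (tea_pct - bias_pct) / cv_pct

# analyte: (CLIA TEa %, bias %, CV %) as tabulated in this article
performance = {
    "Glucose": (10.0, 0.42, 1.98),
    "Creatinine": (18.75, 6.25, 3.13),
    "K+": (16.66, 6.66, 1.47),
    "Ca2+": (9.09, 5.65, 1.58),
    "LD": (20.0, 5.59, 2.31),
}

for analyte, (tea, bias, cv) in performance.items():
    with_bias = sigma_metric(tea, bias, cv)
    without_bias = sigma_metric(tea, 0.0, cv)   # what the metric would be if bias were eliminated
    print(f"{analyte}: Sigma = {with_bias:.2f}, without bias = {without_bias:.2f}")
```

Small differences in the last decimal place can appear depending on how the inputs were rounded.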

Here we see that the goal of Six Sigma has been achieved (and exceeded) by two analytes, potassium and LD, and that the metrics for most of the other analytes are pretty good. The calcium metric is the lowest, but calcium is usually a troublesome analyte for most methods. Note also that if the biases were reduced to zero, then the Sigma metrics would look much better.

Conclusion?

For two of these methods in the new instrument, defect-free operation is possible. We advise you to celebrate that fact. However, these metrics still require some context. For instance, what does this performance mean in terms of the QC required? And how does the performance stack up against other chemistry instruments?

Postscript 1: How would you QC this instrument?

For a moment, let’s assume that you have this instrument. Even for the methods where Six Sigma performance is possible, there is no promise that you're never going to see an out-of-control result. Errors will still occur, not only due to random factors, including human error (using the wrong control), but also during control lot switches, calibration times, etc. There are still problems that will occur when the instrument becomes "unstable." But under stable performance, there should be very few defects.

But even during stable performance, you may get out-of-control flags due to "noise" or false rejection. Your choice of control rule at this point will determine how many false rejections (false alarms) you can expect to encounter. When a method has a Sigma metric of 6 or higher, the out-of-control flags you see will often be due to false rejection, not real errors. For methods with Sigma metrics of 6 or higher, the goal is to use control rules that keep false rejections to a minimum.
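To get a feel for how the width of the control limits drives false rejection, here is a rough Python sketch. It makes assumptions that go beyond this article: control results are taken to be Gaussian and independent, only a single rule with limits at plus or minus k standard deviations is modeled, and the function names are our own. It is not a substitute for the power curves produced by QC design software.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def false_rejection(k, n):
    """Probability that at least one of n in-control results falls outside +/- k SD limits."""
    p_single = 2.0 * (1.0 - normal_cdf(k))   # two-sided tail probability for one control
    return 1.0 - (1.0 - p_single) ** n

for k in (2.0, 2.5, 3.0, 3.5):
    print(f"limits at {k}s, N=2: Pfr = {false_rejection(k, 2):.4f}")
```

Under these assumptions, widening the limits from 2s to 3.5s with two controls takes false rejection from roughly 9% of runs to well under 0.1%, which is why wide limits suit methods with high Sigma metrics.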

For the methods with Sigmas below 6, you need to do a QC Design phase, and correctly select the appropriate control rule (or rules) to match the performance of the method.

There are free QC Design tools available on the website - Normalized OPSpecs charts - and there is also the QC Design software EZ Rules® 3.

What follows below are OPSpecs charts generated by the EZ Rules® 3 program:

Potassium

For this method, the Automatic QC Selection function of EZ Rules® 3 has selected a 1:3.5s rule (a single rule with control limits at ±3.5s) with 2 control measurements per run.

LD

Again, performance is so good that you need very minimal statistical QC: two controls and 3.5s limits.

Glucose

For this method, the Automatic QC Selection function of EZ Rules® 3 has selected a 1:2.5s rule (a single rule with control limits at ±2.5s) with 2 control measurements per run.

Creatinine

For this method, the Automatic QC Selection function of EZ Rules® 3 has selected a multirule consisting of the 1:3s/2:2s/R:4s/4:1s rules with 4 control measurements per run. This is "more" QC than the earlier analytes.

At this point, it's helpful to look at a Sigma metrics / Critical-Error graph:

Here you can see the actual error detection (Ped) and false rejection (Pfr) characteristics of the chosen multirule, as well as some other control rules. The goal is to achieve 90% error detection with minimal false rejection. It may be that you can accept a little less error detection to exchange the multirule for a simpler 1:2.5s rule, maybe even with only 2 controls instead of 4. That would lower your error detection to 70%, which is well below the ideal, but if the rest of the analytes only need to run 2 controls, this might be a practical compromise.
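The trade-off described above can be approximated numerically. The sketch below rests on assumptions that are not spelled out in this article: it uses the commonly quoted relationship that the critical systematic error equals the Sigma metric minus 1.65, it treats control results as Gaussian and independent, and it models only a single rule with limits at plus or minus k standard deviations (not the multirule). The function names are our own.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def critical_systematic_error(sigma):
    """Size of shift (in SD units) that must be detected, assuming delta-SE-crit = Sigma - 1.65."""
    return sigma - 1.65

def error_detection(k, n, shift):
    """Probability that at least one of n controls exceeds +/- k SD limits after a shift."""
    p_within = normal_cdf(k - shift) - normal_cdf(-k - shift)
    return 1.0 - p_within ** n

# Creatinine: Sigma metric of 3.99 from the table in this article
shift = critical_systematic_error(3.99)
ped = error_detection(2.5, 2, shift)
print(f"critical shift = {shift:.2f} SD, Ped with 2.5s limits and N=2 = {ped:.2f}")
```

Under these assumptions, the creatinine estimate lands close to the roughly 70% error detection quoted above for the simpler rule with 2 controls. Setting `shift` to 0 turns the same function into a false rejection estimate.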

Calcium

For this method, the Automatic QC Selection function of EZ Rules® has indicated that a "Max QC" should be used, which was manually selected as a multirule procedure with 4 control measurements per run, and using data collected over 2 runs. Even that procedure won't guarantee 90% error detection, unfortunately. Basically, for this method, you need to run as many controls and use as many rules as practically possible.

Again, looking at a Sigma metrics / Critical-Error graph is helpful:

Even the "Max QC" procedure is only reaching about 11% error detection. As you can see, less stringent and less complex control rules are down in the single digits. Clearly, statistical quality control isn't the solution for this method. Additional non-statistical steps need to be taken. Reducing CV and bias is a high priority. Ultimately, after you've taken all of those steps, you may also need to consider running duplicate samples.

Postscript 2: How does this instrument stack up?

It's all well and good to know how this instrument performs, but one question lingers: does this instrument perform better than its competitors? Also, how does the performance reported in the paper compare to the specifications (Instructions for Use) put out by the manufacturer of the Vitros?

Using data from the New York Proficiency Testing database, we can get a rough idea of the performance of other instruments, including the current generation of Vitros instruments:

Potassium

This graph shows that most instruments do a very good job with potassium. Basically, all of these instruments only need to use a 1:3.5s rule with N of 2 for this method.

LD

Here the Vitros is performing better than all competitors, and really can't get much better.

Glucose

With glucose, the Vitros 5.1 performance is in the middle of the pack. There are some instruments that perform better, some that perform worse. Given that the rule recommendation is 1:2.5s with N of 2, though, it's not worth complaining about.

Creatinine

This analyte shows that performance is again in the middle of the pack. Most of the methods seem to have a bias issue, and if the Vitros 5.1 performance detailed in the paper were closer to specification, life would be easier.

Calcium

For calcium, the bias is significantly higher than for other instruments. This leads one to suspect that the bias reported in the paper might be higher than other labs should expect. Reducing bias really is a big priority here. On the plus side, one can see that most other instruments have higher CVs than the Vitros 5.1.