From Method Validation to Six Sigma:

Translating Method Performance Claims into Sigma Metrics

Sten Westgard

Note: This essay is a synthesis of ideas and applications of Method Validation, Six Sigma, and QC Design. It assumes you are familiar with many of these concepts, but if you aren’t, there are links to essays and lessons with in-depth coverage. The main goal of this essay is to show how to put Method Validation together with Six Sigma Metrics in order to make a definitive decision on the suitability of a method for your own application.

Recently, we had the opportunity to examine and review a method validation study of a POC chemistry instrument. (This instrument, for reasons that will soon become clear, shall remain nameless.) At first glance, the method validation study indicated that this instrument had methods of excellent quality. However, method validation studies are notoriously hard to interpret. Typically a large amount of data is used, multiple experiments are performed, and dozens of statistics are generated, with numerous graphs, to summarize the findings. At the end of the day, however, there is often no clear conclusion about the acceptability of the new methods. All the data and statistics don’t come together to support a definitive yes or no, good or bad decision. While the authors of this particular study praised the instrument and the manufacturers who had commissioned the study, it was really not clear why they believed the instrument was so great.

In the absence of a clear verdict, many labs latch onto the correlation coefficient as the be-all and end-all of a method validation study. If r is near its ideal value of 1.00 (or above 0.90 or 0.95), conventional lab wisdom has it that the method is good. And in our example study, the authors typically spoke about correlation as if it were the most significant fact in the details of each method. As we shall see, conclusions of this nature may be far from the truth.

The good news is that there are ways to eliminate confusion about a method validation study. One of them is to understand the statistics better. Another is to use a Method Decision Chart. Still another technique is to translate the method validation results into Six Sigma metrics. Once converted into Sigma metrics, the numbers are stark, the conclusions are clear, and your decision is simple.

What is Six Sigma?

A huge volume of work has accumulated about the topic of Six Sigma. There are several detailed articles about it in our public archives. To reduce (and oversimplify) Six Sigma, there now are "Sigma metrics" that provide a universal benchmark for process performance. The performance of all processes can be characterized on the "Sigma scale." Values typically range from 2 to 6, where the goal for "world class quality" is 6. If the Sigma metric is less than 3, you’ve essentially got a process that is so unreliable it shouldn’t be used for routine production. A process with a low sigma metric will cost you a lot of time and effort to maintain. To give you a benchmark for understanding process performance on the sigma scale, airline baggage handling is about a 4 sigma process. We hope that healthcare, including laboratory tests, has a better Sigma than that!

Let’s find out!

What do you need to go from method validation to Six Sigma?

Translating method performance data from a method validation study into Sigma metrics isn’t hard. It involves minimal calculations. You can do this on a napkin if you have to. However, you do need to have access to all the pertinent data. Some of this data comes from the manufacturer. You’ll have to provide some other information, like quality requirements and decision levels – but the good news there is that you can find the needed data right here on Westgard Web at no cost.

Here’s the specific list of what you need:

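From the Method Validation study provided by the manufacturer: the CV at each control level (from the replication experiment), and the slope and y-intercept (from the comparison of methods experiment).

From other sources: the quality requirement for the test and the medical decision levels.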

Now, here is the Method Validation study data from our anonymous instrument for a glucose method:

| Test Name | Control/Level (mg/dL) | CV (%) | Slope | Y-Int | R with Comments |
|---|---|---|---|---|---|
| Glucose | I: 217.9 | 0.79 | 1.0377 | 5.37 | correlation of the instruments is extraordinary at 100% |
| | II: 81.5 | 0.93 | | | |

At first glance, this looks like a good method. It’s hard to understand the real meaning of the numbers, but the words used by the report about the correlation are clear: extraordinary. Let’s take this manufacturer-supplied data and see if that is true.

What do you do with the method validation data?

First, you need to get estimates of bias at the levels where you have estimates of precision. Put another way, you have to synchronize your precision results with your accuracy results from the comparison of methods study.

How do you do this? By using the regression equation Yc = a + b(Xc), where Yc and Xc represent the test and comparison values, respectively, at a concentration level of interest, b is the slope, and a is the y-intercept. The slope and y-intercept come from the comparison of methods experiment.

For your Xc value, pick a whole number that is close to the mean values observed in the manufacturer’s replication experiment. For the examples here, the calculation can be made at glucose concentrations of 80 and 220 mg/dL. The calculated Yc provides the best estimate of what the test result will be for a true value of Xc by the comparison method.

Next, take the value of Yc-Xc, and divide it by Xc. This gives you a % bias measurement at that level.

At the end of these calculations, you have estimates of bias and CV at the same level.

Here’s what our example data looks like after we’ve performed these calculations:

Yc = 5.37 + 1.0377(Xc) = 5.37 + 1.0377*80 = 88.4

Yc – Xc = 88.4 – 80 = 8.4

(8.4/80)*100 = 10.5% (bias expressed as %)

Yc = 5.37 + 1.0377(220) = 233.7

Yc – Xc = 233.7 – 220 = 13.7

(13.7/220)*100 = 6.2%
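The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the calculation above: the function name `percent_bias` is a hypothetical helper, and the slope and y-intercept are the values from the study table.

```python
def percent_bias(slope: float, intercept: float, xc: float) -> float:
    """Estimate % bias at concentration xc from the regression line Yc = a + b(Xc)."""
    yc = intercept + slope * xc      # predicted test-method result at Xc
    return (yc - xc) / xc * 100.0    # difference expressed as a percentage of Xc


# Regression statistics reported in the study for glucose
SLOPE = 1.0377
INTERCEPT = 5.37

for xc in (80, 220):
    bias = percent_bias(SLOPE, INTERCEPT, xc)
    print(f"Xc = {xc} mg/dL: bias = {bias:.1f}%")
```

Run as written, this prints 10.5% at 80 mg/dL and 6.2% at 220 mg/dL, matching the hand calculations.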

| Test Name | Control/Level (mg/dL) | CV (%) | Bias (%) | Slope | Y-Int | R with Comments |
|---|---|---|---|---|---|---|
| Glucose | I: 217.9 | 0.79 | 6.2 | 1.0377 | 5.37 | correlation of the instruments is extraordinary at 100% |
| | II: 81.5 | 0.93 | 10.5 | | | |

Note that even after those calculations, it’s still difficult to judge the quality of this method. Certainly, we can look at the bias and wonder if it’s too high, but we haven’t defined how good the glucose test results need to be. That’s where the "tolerance limits" from Six Sigma come into play. In laboratory terms, we need to know the "quality requirement."

What’s a quality requirement and where do I find it?

Again, this is a topic covered in a wealth of detail by other essays and lessons on this website. Quality requirement should be a self-explanatory term. However, in many US laboratories, the quality required of a test is rarely defined, and the concept is almost unknown. No one denies that a logical quality process should begin by defining how "good" the process has to be. That is exactly what laboratories need to do. In some cases, the government has already taken this step. The CLIA rules for proficiency testing define acceptable limits for about 80 tests. These limits are in the form of an allowable total error that includes both the precision and accuracy of the test result. In other words, the effects of both the method CV and bias are included in the allowable total error.

Finding or defining quality requirements is a critical step in the QC Design or QC Planning Process. We refer you to those articles on the website for more explanation. Since we are working with a chemistry instrument, we are in luck. CLIA has defined the quality requirements for all the tests on our new instrument. For glucose, it is 10%. Let’s add that to our table:

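| Test Name | Control/Level (mg/dL) | QR (%) | CV (%) | Bias (%) | Slope | Y-Int | R with Comments |
|---|---|---|---|---|---|---|---|
| Glucose | I: 217.9 | 10 | 0.79 | 6.4 | 1.0377 | 5.37 | correlation of the instruments is extraordinary at 100% |
| | II: 81.5 | 10 | 0.93 | 10.5 | | | |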

One important thing to note is that the CLIA quality requirements are most often given as percentages, which means the size of the allowable error in concentration units gets larger as the concentration gets higher. Other times, the CLIA requirements specify an absolute value in concentration units, and sometimes both are given, with the implication that you use whichever is greater or smaller, as specified, at the concentration of interest. For glucose, the CLIA requirement is stated as Target Value plus or minus 10% or 6 mg/dL, whichever is greater. That means the 10% figure applies to any concentration above 60 mg/dL and the 6 mg/dL figure applies to anything below 60 mg/dL.
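To make the "whichever is greater" rule concrete, here is a minimal sketch that converts the glucose requirement into a percentage at any concentration. The function name and its default parameters are illustrative assumptions; only the 10% and 6 mg/dL figures come from the requirement quoted above.

```python
def allowable_total_error_pct(conc_mg_dl: float,
                              pct_limit: float = 10.0,
                              abs_limit_mg_dl: float = 6.0) -> float:
    """Return the allowable total error as a percentage of the concentration.

    Applies a 'percent or absolute, whichever is greater' rule,
    using the glucose figures (10% or 6 mg/dL) as defaults.
    """
    limit_from_pct = conc_mg_dl * pct_limit / 100.0   # percent limit in mg/dL
    limit_mg_dl = max(limit_from_pct, abs_limit_mg_dl)
    return limit_mg_dl / conc_mg_dl * 100.0


for conc in (45, 60, 120, 220):
    print(f"{conc} mg/dL: allowable total error = "
          f"{allowable_total_error_pct(conc):.1f}%")
```

Below 60 mg/dL the fixed 6 mg/dL limit dominates, so the requirement expressed as a percentage is looser than 10% (about 13.3% at 45 mg/dL). At 60 mg/dL and above it stays at 10%.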

Now we’re ready to get Six Sigma metrics and will really be able to see how a test performs!

Calculating Sigma Metrics from Bias, CV and Quality Requirement.

The gruesome details of how and why Six Sigma Metrics are related to bias, CV, and quality requirements are (yet again) covered by other essays on Westgard Web. Here we merely note that you can calculate Sigma metrics from performance data by this simple equation:

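Sigma = (TEa - bias)/CV

Here TEa is the allowable total error (the quality requirement), and all three terms are expressed in percent at the same concentration.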

That’s pretty simple, isn’t it!

Enough discussion: let’s see the Sigma Metrics:

At the 220 mg/dL level, Sigma = (10-6.4)/0.79 = 4.56

At the 80 mg/dL level, Sigma = (10-10.5)/0.93 = negative
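As a quick check of these two calculations, here is a minimal Python sketch of the equation. The function name `sigma_metric` is hypothetical, and taking the absolute value of the bias is an assumption made here so that a negative bias would count against the method in the same way as a positive one. The inputs are the quality requirement, bias, and CV values listed in the tables.

```python
def sigma_metric(tea_pct: float, bias_pct: float, cv_pct: float) -> float:
    """Sigma = (TEa - bias) / CV, with all terms in percent.

    Assumption: the magnitude of the bias is used, so the sign of the
    bias does not matter.
    """
    if cv_pct <= 0:
        raise ValueError("CV must be positive")
    return (tea_pct - abs(bias_pct)) / cv_pct


# (level in mg/dL, TEa %, bias %, CV %) as tabulated for glucose
levels = [
    (220, 10.0, 6.4, 0.79),
    (80, 10.0, 10.5, 0.93),
]

for level, tea, bias, cv in levels:
    print(f"{level} mg/dL: Sigma = {sigma_metric(tea, bias, cv):.2f}")
```

This gives 4.56 at the high level and -0.54 at the low level. The bias at the low level is larger than the entire error budget, which is why the metric goes negative. If the 6.2% bias from the worked regression calculation at 220 mg/dL is used instead of the tabulated 6.4%, the same formula gives about 4.8.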

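| Test Name | Control/Level (mg/dL) | QR (%) | CV (%) | Bias (%) | Sigma Metric | Slope | Y-Int | R with Comments |
|---|---|---|---|---|---|---|---|---|
| Glucose | I: 217.9 | 10 | 0.79 | 6.4 | 4.56 | 1.0377 | 5.37 | correlation of the instruments is extraordinary at 100% |
| | II: 81.5 | 10 | 0.93 | 10.5 | negative | | | |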

At this point, we expect that you may have some concern about these numbers. One looks okay (4.56) and one looks terrible (negative). Can this really be the performance of an actual method? Remember, this is method validation performance data supplied by the manufacturer of the instrument itself. The manufacturer gave us these numbers, but the manufacturer probably doesn’t understand what these numbers mean in terms of a Sigma metric.

What does it mean when a test has 2 widely different Sigma metrics?

While it is probably disconcerting to find that a single test process has two different Sigma metrics, it’s not surprising that a test performs differently at different levels. It would be far more unusual if a test performed the same at all the levels of concentration.

For the high level control, a Sigma metric of 4.56 is actually acceptable. But for the low level control, a Sigma metric of less than zero is clearly bad. Taken together, what do the two values mean? What’s the overall Sigma metric of the test?

Remember that these Sigma metrics are calculated at the levels where controls are being run. Are those the best levels to judge the performance of the test? Or are there better, more appropriate levels to use? Ultimately, the Sigma metrics at the levels where the controls are run matter less. We should be more interested in the Sigma performance at the levels where medical decisions are being made, and where patients are most affected by the test results.

Dr. Bernard Statland has provided a critical reference for this area. He has graciously allowed us to post an extensive list of recommendations for "medical decision levels" on the website. Using those medical decision levels, we can recalculate the Sigma metrics at medically important concentrations. For example, for glucose, medical decision levels are given as 45, 120, and 180 mg/dL. Because the new ADA/HSS guidelines for diagnosis of diabetes emphasize a decision level of 126 mg/dL, we’ll use the 120 value in our calculations here.

The process for working with the critical medical decision levels is similar to our earlier calculations. We use the regression equation again to estimate Yc and then obtain the bias as Yc-Xc. However, for CV, we will need to rely on the precision studies. The practice here is to use the CV estimate which is closest to the critical level. So for glucose, where the known CV values are found at levels of 217.9 and 81.5, and the critical medical decision level is 120, we would use the CV value from the study at 81.5, since that is the closest. (Alternatively, we could use the average of the two CVs since they’re quite close.)

After that, the process is identical to what was done earlier. We find quality requirements for that critical level, then we recalculate the Six Sigma metric.

To summarize the steps here:

  1. Find a critical medical decision level.
  2. Use the regression equation to estimate bias at that level.
  3. Pick the closest precision study to estimate CV at that level.
  4. Find the quality requirement for that level.
  5. Calculate Six Sigma metrics.
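The five steps above can be strung together in a short Python sketch. Everything here is illustrative: the function name, the dictionary of precision results, and the choice to round the bias to one decimal place before computing Sigma (done only to mirror the hand calculation) are assumptions, not part of the study.

```python
def sigma_at_decision_level(xc, slope, intercept, cv_by_level, tea_pct):
    """Estimate bias, CV, and Sigma at a medical decision level xc.

    cv_by_level maps each precision-study level (mg/dL) to its CV (%).
    """
    # Step 2: bias from the regression equation Yc = a + b(Xc)
    yc = intercept + slope * xc
    bias_pct = round((yc - xc) / xc * 100.0, 1)   # rounded as in the hand calculation

    # Step 3: use the CV from the precision level closest to xc
    closest_level = min(cv_by_level, key=lambda level: abs(level - xc))
    cv_pct = cv_by_level[closest_level]

    # Step 5: Sigma = (TEa - bias) / CV
    sigma = (tea_pct - abs(bias_pct)) / cv_pct
    return bias_pct, cv_pct, sigma


# Steps 1 and 4: decision level of 120 mg/dL, quality requirement of 10%
bias, cv, sigma = sigma_at_decision_level(
    xc=120,
    slope=1.0377,
    intercept=5.37,
    cv_by_level={217.9: 0.79, 81.5: 0.93},
    tea_pct=10.0,
)
print(f"bias = {bias:.1f}%, CV = {cv:.2f}%, Sigma = {sigma:.2f}")
```

For glucose at 120 mg/dL this gives a bias of 8.2%, a CV of 0.93% (from the 81.5 mg/dL precision level), and a Sigma of 1.94. Without the intermediate rounding of the bias, the Sigma comes out slightly lower, at about 1.89.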

Having completed this process for all the tests, here are the final results:

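| Test Name | Control/Level (mg/dL) | QR (%) | CV (%) | Bias (%) | Sigma Metric | Slope | Y-Int | R with Comments |
|---|---|---|---|---|---|---|---|---|
| Glucose | I: 217.9 | 10 | 0.79 | 6.4 | 4.56 | 1.0377 | 5.37 | correlation of the instruments is extraordinary at 100% |
| | II: 81.5 | 10 | 0.93 | 10.5 | negative | | | |
| | Crit: 120 | 10 | 0.93 | 8.2 | 1.94 | | | |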

Based on the final calculations, the Sigma metric is below 3. As you may recall, in industry, any process below 3 sigma is considered too unreliable for routine use. Therefore, your final judgment on the glucose method of this instrument should not be positive. For this test, method precision is actually quite good (why? because the precision study was only over 2 days -- but that's another story...), but method bias is the killer here. When considered together, bias and CV will make this test unreliable for its diagnostic application.

Conclusion: Method performance data can and should lead to conclusive judgments

The translation of method performance data into Sigma metrics will bring manufacturer’s claims into "stark reality." Mountains of data and statistical calculations will no longer obscure the true performance and acceptability of a method. Whatever the manufacturer claims for performance, you can boil the data and numbers down to a definitive and meaningful Sigma metric.

Customers in healthcare are going to enjoy a new era of empowerment. You can now easily determine whether or not you should purchase an instrument. You are no longer in danger of being sold a "lemon." Indeed, you can now begin demanding a better instrument from the manufacturers with simple Sigma metrics. Tell your local sales rep you want an instrument with methods that perform at 5 sigma or higher. This will get you results, maybe not today, but in a year, you will begin to see instruments with truly high sigma performance.

Postscript: How would you QC glucose on this instrument?

For a moment, let’s assume that you already have this instrument and you’re stuck with it – there’s no money in the budget to get a new one for quite some time. If this instrument is the only method to provide test results, you’ll still have to use it, no matter how bad the performance is.

If the Sigma metrics were above 3 sigma, we would recommend using a QC Design or QC Planning tool like the Normalized OPSpecs charts available on the website, or the software programs QC Validator® 2.0, or EZ Rules®. But in this case, performance is so poor that a blanket recommendation will suffice.

For methods below 3 sigma, you want to use the full "Westgard Rules" with as many controls as you can afford: for example, 13s/22s/R4s/41s/8x with 4 or more control measurements. This is a lot of QC and is impractical in most POC settings. Does it follow that this instrument is really not suitable for a POC site?