METHOD VALIDATION:  
THE INNER, HIDDEN, DEEPER, SECRET MEANING

James O. Westgard

When I was a freshman in college, I had an English professor who taught me something I've never forgotten. He always asked "what's the inner, hidden, deeper, secret meaning" in what you're writing. What are you really trying to accomplish? You'd better figure it out if you expect someone else to understand it. Sure, you can write something, but you've really got to be clear on what you want to accomplish, otherwise the real purpose and meaning will remain a secret.

The real surprise came on my first job as a clinical chemist when I began evaluating the performance of a new multichannel chemistry analyzer. I studied all the existing scientific literature that provided guidelines for performing method validation (MV) studies, but it wasn't at all clear how to tell whether or not a new method was acceptable. No one was telling the secret! And knowing that secret is essential if you want to evaluate a method properly. Sure, you can collect some data, calculate some statistics, and provide some paper in a folder to show a lab inspector, but is that really why you're doing this?

While I won't claim my English professor made me a better writer (nor can you blame him), he did make me a better scientist by helping me search for the deeper meaning and real purpose in what I do. What's the real purpose of method validation? What's the problem we're trying to solve? Does the present practice provide a correct solution? Is there a better way to do this? How do you know what's the right way to validate the performance of a method?

The Secret Revealed

Here's the inner, hidden, deeper, secret meaning of method validation - and you don't have to read any further to get the message - ERROR ASSESSMENT. You want to estimate how much error might be present in a test result produced by a method in your laboratory. With this information, you then want to be sure that amount of error won't affect the interpretation of the test result and compromise patient care. If your observed errors are so large they can cause an incorrect interpretation, the method isn't acceptable. To be acceptable, the observed errors need to be small relative to changes that will cause a change in the interpretation of a test result.

A focus on analytical errors is the key to the whole method validation process. What kinds of analytical errors might occur with a laboratory method? What experiments can provide data about those errors? What's the best way to perform those experiments to assess the errors? How much data needs to be collected to obtain good estimates of errors? What statistics best estimate the size of those errors from the experimental data? What size errors are allowable without affecting the interpretation of a test and compromising patient care?

Method validation is about error assessment - that's the secret!

A Quick Proof

The correlation coefficient is a statistic that is almost always calculated and reported to describe the results from a comparison of methods study. A value of 1.000 indicates perfect correlation between the results of two methods. Other statistics (such as slope, intercept, and standard deviation of the residuals) can also be calculated from the same data to estimate the size of errors occurring between the methods. Which are more useful?

Consider the following situation. Here's a new glucose method where the results from a comparison of methods study give a correlation coefficient of 0.999, which is very close to the ideal value of 1.000. Sounds pretty good, doesn't it? How close are the results between the two methods? Is the new method acceptable?
Let me give you some additional information: Here's the plot of the test results by the new method vs those from the comparative method. Note first that the correlation coefficient shows that the results are close to the best line of fit between the methods, not that the test values are the same as the comparative values. 
 
Results of a comparison study, where the 
new or "test" method values are plotted on y-axis
and comparison values on x-axis.
 
The plot shows that almost all the new method values are systematically higher by 15 mg/dL. Does this information that there is a systematic error of 15 mg/dL help with your decision on the acceptability of the new method? It doesn't look so good anymore, does it? Being in error by 15 mg/dL may limit the usefulness of the test results produced by the new method.
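
The same point can be checked with a few lines of arithmetic. The sketch below uses made-up glucose values (the numbers, the noise pattern, and the function names are assumptions for illustration, not data from the study described here). It builds test results that run 15 mg/dL high and then calculates both the correlation coefficient and the average difference between methods.

```python
import math

def pearson_r(x, y):
    """Correlation coefficient between paired results."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_difference(x, y):
    """Average of (test - comparative) over all paired results."""
    return sum(b - a for a, b in zip(x, y)) / len(x)

# Hypothetical comparative-method glucose values, mg/dL (illustration only).
comparative = [50, 75, 100, 125, 150, 200, 250, 300, 350, 400]
# Hypothetical small scatter, mg/dL, chosen to sum to zero.
noise = [1, -2, 0, 2, -1, 1, -2, 2, 0, -1]
# Test-method values: every result shifted 15 mg/dL high, plus the scatter.
test = [c + 15 + e for c, e in zip(comparative, noise)]

r = pearson_r(comparative, test)
bias = mean_difference(comparative, test)

print(f"correlation coefficient r = {r:.4f}")
print(f"average difference (bias) = {bias:.1f} mg/dL")
```

With these made-up numbers the correlation coefficient stays above 0.999 even though every result is about 15 mg/dL too high. The average difference is the statistic that reveals the error.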

As laboratory people, we intuitively understand errors and have a sense of how they might affect the interpretation of test results and the related care of patients. We don't have the same sense about statistics! That's why statistics should be used to provide estimates of the errors that are meaningful to us - that's a second important secret and we'll deal with it in detail later [1]. From this simple example, you can recognize the difficulty in interpreting a correlation coefficient because it doesn't provide a useful estimate of analytical errors. Information about the size of analytical errors is more useful for judging the performance of a method [2]. The fact that the correlation coefficient is commonly calculated doesn't make it useful. It just shows that people don't know the secret of method validation!

Analytical Errors

Let's focus on analytical errors and be sure we have a common understanding of the different kinds of errors that need to be estimated. Here's a list of terms that you need to understand: random error or imprecision, systematic error or inaccuracy, constant error, proportional error, and total error.
Random error, RE, or imprecision is described as an error that can be either positive or negative, whose direction and exact magnitude cannot be predicted, as shown in the accompanying figure by the distribution of results when replicate measurements are made on a single specimen. Imprecision is usually quantitated by calculating the standard deviation (SD) from the results of a set of replicate measurements. The SD often increases as the concentration increases, therefore it is often useful to calculate the coefficient of variation (CV) to express the SD as a percentage of the mean concentration from the replication study. The maximum size of a random error is commonly expressed as a 2 SD or 3 SD estimate to help understand the potential size of the error that might occur. 
 

Random Error (RE) or Imprecision,
as shown by the distribution of test values
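
As a minimal sketch of the calculations just described, the following estimates SD and CV from a set of replicate results and expresses the potential random error as 2 SD and 3 SD. The replicate values and the function name `imprecision` are hypothetical, chosen only for illustration.

```python
import statistics

def imprecision(replicates):
    """Return (mean, SD, CV%) for replicate measurements on one specimen."""
    mean = statistics.mean(replicates)
    sd = statistics.stdev(replicates)   # sample SD, n - 1 in the denominator
    cv = 100.0 * sd / mean              # SD expressed as a percentage of the mean
    return mean, sd, cv

# Hypothetical replicate results on a single specimen, mg/dL (illustration only).
replicates = [98.0, 101.0, 100.0, 102.0, 99.0, 100.0, 97.0, 103.0, 100.0, 100.0]

mean, sd, cv = imprecision(replicates)
print(f"mean = {mean:.1f} mg/dL, SD = {sd:.2f} mg/dL, CV = {cv:.2f}%")
print(f"random error, 2 SD estimate = {2 * sd:.2f} mg/dL")
print(f"random error, 3 SD estimate = {3 * sd:.2f} mg/dL")
```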

 
 
Systematic error, SE, or inaccuracy is an error that is always in one direction, as shown in the accompanying figure where a systematic shift displaces the mean of the distribution from its original value. In contrast to random errors that may be either negative or positive and whose direction cannot be predicted, systematic errors are in one direction and cause all the test results to be either high or low. 
 

Systematic Error (SE) or Inaccuracy,
as shown by shift or bias between mean value and correct value

 
How high or how low can be described by the bias, which is calculated as the average difference, or the difference between averages, between the value by the "test" method and a "comparative" method in a comparison of methods experiment. Alternatively, the expected systematic difference may be predicted from the equation of the line that best fits the graphical display of test method values on the y-axis vs comparative method values on the x-axis. SE may stay the same over a range of concentrations, in which case it can also be called constant error, or it may change as concentration changes, in which case it can be called proportional error.
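
Here is a rough sketch of the second approach, predicting systematic error from the line of best fit. It assumes ordinary least-squares regression as the fitting technique, which is only one way to obtain the line, and it uses made-up results constructed to have both a constant and a proportional component. The function names and the concentrations at which the error is evaluated are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares line y = intercept + slope * x (an assumed fitting method)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predicted_se(slope, intercept, concentration):
    """Expected systematic difference (test - comparative) at a given concentration."""
    return (intercept + slope * concentration) - concentration

# Hypothetical comparison data, mg/dL (illustration only): the test method
# reads 2 mg/dL high at every level plus 5% of the concentration.
comparative = [50, 100, 150, 200, 250, 300]
test = [2 + 1.05 * c for c in comparative]

slope, intercept = fit_line(comparative, test)
print(f"slope = {slope:.3f}, intercept = {intercept:.2f} mg/dL")

# An intercept away from 0 suggests constant error; a slope away from 1
# suggests proportional error, so the predicted SE grows with concentration.
for level in (100, 300):
    print(f"predicted SE at {level} mg/dL = {predicted_se(slope, intercept, level):.1f} mg/dL")
```

In this made-up case the predicted systematic error is about 7 mg/dL at 100 mg/dL and about 17 mg/dL at 300 mg/dL, so a single average bias would not describe the error well across the range.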
 
 
Total Error, TE, is the net or combined effect of random and systematic errors, as shown in the accompanying figure. It represents a "worst-case" situation, or just how far wrong a test result might be due to both random and systematic errors. Because laboratories typically only make a single measurement for each test, that measurement can be in error by the expected SE, or bias, plus 2 or 3 SD, depending on how you quantitate the effect of RE. 
 

Total Error (TE),
includes both systematic error (SE) and random error (RE).
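
The worst-case combination described above can be written as a one-line calculation. The sketch below adds the size of the bias to 2 or 3 SD. The 15 mg/dL bias echoes the glucose example earlier, while the SD of 2 mg/dL and the function name `total_error` are assumptions made only for illustration.

```python
def total_error(bias, sd, multiplier=2):
    """Worst-case total error: size of the bias plus a multiple (2 or 3) of the SD."""
    return abs(bias) + multiplier * sd

# Bias from the glucose example; the SD is a hypothetical value.
bias = 15.0   # mg/dL
sd = 2.0      # mg/dL

print(f"TE with 2 SD = {total_error(bias, sd):.1f} mg/dL")
print(f"TE with 3 SD = {total_error(bias, sd, multiplier=3):.1f} mg/dL")
```

The resulting estimate is the number to judge against the size of error that would change the interpretation of the test result.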

 
While we in the laboratory like to think about imprecision and inaccuracy as separate errors, the physician and the patient experience the total effect of the two, or the total error. Total error provides a customer or consumer oriented measure of test performance, which makes it the most important parameter for judging the acceptability of analytical errors.

Importance of Method Validation Practices

Laboratory regulations in the USA require that method performance for any new method be "verified" prior to reporting patient test results. Precision and accuracy are specifically identified, along with analytical sensitivity, analytical specificity, reportable range, reference values, and any other applicable characteristic. It was intended that methods cleared by FDA as meeting CLIA requirements for general quality control would require only verification of precision, accuracy, reportable range, and reference range. However, no methods have yet been given QC clearance by FDA, and it isn't yet known when this clearance process will become operational. The responsibility for method verification or validation, therefore, still resides with each laboratory. While manufacturers will often collect method validation data during the installation of new analytical systems, the laboratory is still accountable to see that adequate data has been collected and that this data shows that the new methods provide acceptable performance in the laboratory.

References

  1. Westgard JO, Hunt MR. Use and interpretation of common statistical tests in method comparison studies. Clin Chem 1973;19:49-57.
  2. Westgard JO, Carey RN, Wold S. Criteria for judging precision and accuracy in method development and evaluation. Clin Chem 1974;20:825-33.


Copyright © 2000. All rights reserved.
Westgard QC, 7614 Gray Fox Trail, Madison WI 53717
Call 608-833-47183 or e-mail us at westgard@westgard.com
