Final CLIA Rule. Part VI: Method Validation -
Statistical Sense, Sensitivity, and Significance

April 24, 2003
James O. Westgard, Ph.D.

The use of statistics in method validation studies is still a major problem in laboratories today. Given the Final CLIA Rule's requirement for method validation studies on all non-waived methods introduced after April 24, 2003, analysts will need guidance and training on the use and application of statistics in those studies.

State of the art

One barometer of current method validation practices is found in the current issue of MLO (April 2003), where the problem of using statistics in a comparison of methods study is discussed (see Answering your questions column and question about comparing two analyzers) [1]. What's the best way to analyze the data - correlation coefficient, regression statistics (slope, y-intercept, standard error about the line), t-test statistics (t-value, bias, SD of the differences), or the F-test? While it is not difficult to make statistical calculations today because of the availability of statistics packages on personal computers, people still have difficulties in making sense of the calculated statistics.

Thirty years ago we investigated the issue of proper use and interpretation of statistics in the method comparison experiment [2]. The findings are as relevant now as then because the statistical skills of laboratory managers and analysts have not improved.

Here's the secret

Statistics are tools to estimate the size of the analytical errors that occur between the test method and the comparative method. The acceptability of a method's performance depends on the size of the errors observed. Method performance is acceptable when the observed errors are small, less than the amount of error that is allowable (without causing misinterpretation of a test result and misdiagnosis or mistreatment of the patient). An objective decision on method performance depends on defining the quality required, i.e., the amount of error that is allowable, then comparing the estimates of observed errors to that requirement for quality.

A graphical tool for judging acceptability is the "method decision chart" [3], which is prepared for the defined quality requirement and shows the allowable inaccuracy (bias) on the y-axis versus the allowable imprecision (CV) on the x-axis. To evaluate performance, you plot an "operating point" whose y-coordinate is the estimate of bias from the comparison of methods experiment and whose x-coordinate is the estimate of imprecision from the replication experiment. The outcome from the statistical analysis of the comparison data should be an estimate of the method's bias or systematic error. The right statistics are those that provide a correct estimate of the method's systematic error.
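The operating-point idea can be sketched in a few lines of Python. This is only an illustration, not the published chart: it assumes the quality requirement is stated as an allowable total error (TEa, in percent) and that a criterion line has the form bias + k*CV = TEa, with k chosen by the user. The function names and the example numbers are hypothetical.

```python
def operating_point_margin(tea_pct, bias_pct, cv_pct):
    """Number of CVs that fit between the observed bias and the allowable error.

    Assumption (not from the article): the quality requirement is an allowable
    total error TEa, with bias and CV expressed in the same percent units.
    """
    if cv_pct <= 0:
        raise ValueError("CV must be positive")
    return (tea_pct - abs(bias_pct)) / cv_pct


def within_criterion(tea_pct, bias_pct, cv_pct, k=2.0):
    """True if the operating point lies on or below the assumed line bias + k*CV = TEa."""
    return abs(bias_pct) + k * cv_pct <= tea_pct


# Hypothetical example: TEa = 10%, observed bias = 2%, observed CV = 2%.
tea, bias, cv = 10.0, 2.0, 2.0
point = (cv, bias)  # x = imprecision (CV), y = inaccuracy (bias)
margin = operating_point_margin(tea, bias, cv)
print(f"operating point (CV, bias) = {point}, margin = {margin:.1f} CVs")
for k in (2, 3, 4, 5):
    print(f"bias + {k}*CV <= TEa: {within_criterion(tea, bias, cv, k)}")
```

The bias comes from the comparison of methods experiment and the CV from the replication experiment, as described above; only the TEa value and the choice of k are inputs that the laboratory must define for itself.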

Sensitivity of statistics to types of errors

To determine the sensitivity of different statistics to different types of errors, we can construct data sets having known amounts of random error (RE), constant systematic error (CE), and proportional systematic error (PE). Those data sets are then submitted to the various statistical calculations to see which statistics respond to which errors. The results are summarized by the table shown here.

Random error between methods is reflected by changes in sy/x, SDdiff, and r. Constant error shows up in the y-intercept and the bias. Proportional error can be estimated from the slope's deviation from the ideal value of 1.00, but it also changes the bias and SDdiff in t-test statistics. That's a problem and the reason for the question mark in the table: proportional error confounds the interpretation of t-test statistics! There is also a problem with the correlation coefficient because it responds only to random error, not to systematic errors. The whole point of the method comparison experiment is to estimate systematic error, but the correlation coefficient tells us nothing about systematic errors.
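The sensitivity experiment described above can be imitated with a short simulation. The sketch below is not the original study: the concentration range, the error sizes, and the function names are assumptions chosen for illustration. It builds 41-point data sets containing only RE, only CE, or only PE, then computes the regression, t-test, and correlation statistics for each.

```python
import math
import random


def comparison_stats(x, y):
    """Regression, paired-difference, and correlation statistics for paired results."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    resid_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(resid_ss / (n - 2))          # SD about the regression line
    diffs = [yi - xi for xi, yi in zip(x, y)]
    bias = sum(diffs) / n                          # mean difference (test - comparative)
    sd_diff = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    r = sxy / math.sqrt(sxx * syy)
    return {"slope": slope, "intercept": intercept, "s_yx": s_yx,
            "bias": bias, "sd_diff": sd_diff, "r": r}


def simulate(re_sd=0.0, ce=0.0, pe=0.0, seed=1):
    """Return (x, y) with known random, constant, and proportional error."""
    rng = random.Random(seed)
    x = [50.0 + 5.0 * i for i in range(41)]        # assumed range: 50 to 250
    y = [ce + (1.0 + pe) * xi + rng.gauss(0.0, re_sd) for xi in x]
    return x, y


cases = {
    "RE only": simulate(re_sd=5.0),   # random error, SD of 5 units
    "CE only": simulate(ce=10.0),     # constant error of 10 units
    "PE only": simulate(pe=0.10),     # proportional error of 10%
}
results = {name: comparison_stats(x, y) for name, (x, y) in cases.items()}
for name, s in results.items():
    print(f"{name}: slope={s['slope']:.3f} intercept={s['intercept']:.2f} "
          f"s_yx={s['s_yx']:.2f} bias={s['bias']:.2f} "
          f"SDdiff={s['sd_diff']:.2f} r={s['r']:.4f}")
```

In the PE-only case the bias and SDdiff both move even though no constant or random error was added, which is the confounding effect noted above.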

Making Sense of Statistics

Correlation coefficient. The correlation coefficient provides information only about random error, even though the objective in a method comparison study is to estimate systematic error. Therefore, the correlation coefficient is of little value for estimating analytical errors in a method-comparison experiment. However, because r is sensitive to the range of data collected, r is useful as a measure of the reliability of the regression statistics. Isn't it wonderful that a limitation can be turned into a useful feature once the behavior is properly understood?

t-test statistics. The estimates of errors may be confounded by the presence of proportional error. There are two cases where the estimates of systematic error will be reliable: (1) if proportional error is absent, then the systematic error is constant throughout the concentration range; (2) if the mean of the patient results is close to the medical decision level of interest, then the overall estimate of constant and proportional error is reliable at the mean of the data, but that estimate of systematic error should not be extrapolated to other decision level concentrations.
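A small numerical sketch may help show why the t-test bias is trustworthy only at the mean of the data. It assumes a noise-free relationship y = a + b*x with hypothetical values for a and b, so that the systematic error at any concentration is a + (b - 1)*x.

```python
# Hypothetical noise-free comparison: test method y = a + b*x.
a, b = -2.0, 1.05   # constant error of -2 units, proportional error of +5%
x = [40.0, 80.0, 100.0, 120.0, 160.0]   # comparative-method results, mean = 100
y = [a + b * xi for xi in x]

mean_x = sum(x) / len(x)
bias = sum(yi - xi for xi, yi in zip(x, y)) / len(x)   # bias as a t-test reports it


def systematic_error(level):
    """Systematic error at a given concentration for the assumed a and b."""
    return a + (b - 1.0) * level


print(f"t-test bias                 = {bias:.2f}")
print(f"SE at the mean ({mean_x:.0f})        = {systematic_error(mean_x):.2f}")
for level in (40.0, 200.0):
    print(f"SE at decision level {level:>5.0f}  = {systematic_error(level):.2f}")
```

The bias agrees with the systematic error at the mean of the data, but the error at a lower or higher decision level is quite different, so the bias should not be extrapolated when proportional error is present.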

Regression statistics. It is ideal to have three statistical parameters that can each estimate a different type of error. Proportional error can be estimated from the slope, constant error by the y-intercept, and random error (between methods) from the standard deviation about the regression line. Systematic error can be estimated at any concentration using the regression equation, i.e., Yc = a + bXc, where Xc is the critical medical decision concentration and Yc is the best estimate of that concentration by the test method. The difference between Yc and Xc is the systematic error at that critical concentration, i.e., Yc-Xc = SE.
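As a minimal sketch of the calculation Yc = a + bXc and SE = Yc - Xc, the following Python fits an ordinary least-squares line to a small made-up data set and evaluates the systematic error at a chosen decision level. The data, the decision level, and the function names are hypothetical, and ordinary least squares is assumed as the regression technique.

```python
def least_squares(x, y):
    """Ordinary least-squares slope (b) and y-intercept (a) for y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def systematic_error_at(xc, intercept, slope):
    """Return (Yc, SE) where Yc = a + b*Xc and SE = Yc - Xc."""
    yc = intercept + slope * xc
    return yc, yc - xc


# Hypothetical paired results: comparative method (x) and test method (y).
x = [50.0, 100.0, 150.0, 200.0, 250.0]
y = [53.0, 101.0, 154.0, 202.0, 255.0]

a, b = least_squares(x, y)
xc = 200.0                                  # assumed medical decision level
yc, se = systematic_error_at(xc, a, b)
print(f"a = {a:.2f}, b = {b:.3f}")
print(f"Xc = {xc:.0f}: Yc = {yc:.2f}, SE = {se:.2f}")
```

The same two fitted values can be reused for every decision level of interest, which is the practical advantage of regression statistics over a single bias estimate.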

The estimates of errors from regression statistics will not be reliable unless the data satisfy certain conditions and assumptions.

Statistical vs clinical significance. We have not included the t-value in the discussion so far because it does not provide an estimate of errors! This statistic is a "test of significance" that is mainly useful for deciding whether sufficient data have been collected to demonstrate that a difference exists. If the calculated t-value is greater than the critical t-value (which is 2.02 for the example data sets having 41 points), the observed bias is said to be statistically significant, which in practical terms means "real." If the calculated t-value is less than the critical t-value, then the data are not sufficient to demonstrate that a "statistically significant bias" exists between the test and reference sets of values.
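For readers who want to see the arithmetic, the paired t-value is the bias divided by its standard error, t = bias / (SDdiff / sqrt(N)). The sketch below applies that formula to a made-up set of 41 paired results and compares the outcome with the critical value of 2.02 quoted above. The data and the function name are hypothetical.

```python
import math


def paired_t(x, y):
    """Return (bias, SDdiff, t) for paired results, with t = bias / (SDdiff / sqrt(N))."""
    n = len(x)
    diffs = [yi - xi for xi, yi in zip(x, y)]
    bias = sum(diffs) / n
    sd_diff = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    t_value = bias / (sd_diff / math.sqrt(n))
    return bias, sd_diff, t_value


# Hypothetical data: 41 samples, test method reads about 2 units high,
# with differences that alternate by +/- 3 units around that offset.
x = [50.0 + 5.0 * i for i in range(41)]
y = [xi + 2.0 + (3.0 if i % 2 == 0 else -3.0) for i, xi in enumerate(x)]

T_CRITICAL = 2.02   # critical t-value quoted in the article for 41 points
bias, sd_diff, t_value = paired_t(x, y)
significant = abs(t_value) > T_CRITICAL
print(f"bias = {bias:.2f}, SDdiff = {sd_diff:.2f}, t = {t_value:.2f}")
print(f"statistically significant: {significant}")
```

Note that the t-value grows with the square root of N, so collecting more samples can make even a small bias statistically significant.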

From my perspective, this information on statistical significance is secondary in importance. The judgment on method acceptability must be made on clinical significance, not statistical significance. An error can be statistically significant, i.e., real, yet so small that it isn't clinically important. On the other hand, an error can be large and clinically important, yet the data may not be sufficient to demonstrate that it is statistically significant.
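The two situations can be contrasted with summary statistics alone. In this sketch the allowable bias of 5 units, the bias and SDdiff values, and the function name are all hypothetical; the critical t-value of 2.02 and N = 41 come from the example data sets discussed earlier.

```python
import math

T_CRITICAL = 2.02   # critical t-value quoted in the article for 41 points
N = 41


def judge(bias, sd_diff, allowable_bias, n=N, t_critical=T_CRITICAL):
    """Report statistical significance and clinical importance separately."""
    t_value = bias / (sd_diff / math.sqrt(n))
    return {
        "t": t_value,
        "statistically_significant": abs(t_value) > t_critical,
        "clinically_important": abs(bias) > allowable_bias,
    }


# Case 1: small bias measured very precisely (real, but unimportant).
small_but_real = judge(bias=0.5, sd_diff=1.0, allowable_bias=5.0)

# Case 2: large bias buried in scattered differences (important, but unproven).
large_but_unproven = judge(bias=8.0, sd_diff=30.0, allowable_bias=5.0)

for label, outcome in (("small but real", small_but_real),
                       ("large but unproven", large_but_unproven)):
    print(f"{label}: t = {outcome['t']:.2f}, "
          f"significant = {outcome['statistically_significant']}, "
          f"clinically important = {outcome['clinically_important']}")
```

The two answers disagree in both cases, which is why the t-value cannot stand in for a judgment against the allowable error.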

The acceptability of method performance depends on whether or not the errors will affect the clinical usefulness of the test results. Clinical significance depends on defining allowable limits of errors, then comparing the observed errors to those limits. If the observed errors are smaller than the allowable errors, method performance is acceptable. If the observed errors are larger than allowable, method performance is not acceptable.

Statistical tests can provide estimates of errors upon which judgments can be made, but they are not a substitute for the judgments that need to be made. Clinical significance is determined by comparing the statistical estimates of errors to the defined allowable error. The tool for doing this is the method decision chart mentioned earlier.

References

  1. Answering your questions column. Medical Lab Observer 2003;(April):39.
  2. Westgard JO, Hunt MR. Use and interpretation of statistical tests in method-comparison studies. Clin Chem 1973;19:49-57.
  3. Westgard JO. A method evaluation decision chart (MEDx Chart) for judging method performance. Clin Lab Science 1995;8:177-83.
  4. Stockl D, Dewitte K, Thienpont M. Validity of linear regression in method comparison studies: Is it limited by the statistical model or the quality of the analytical input data? Clin Chem 1998;44:2340-6.
  5. Cornbleet PJ, Gochman N. Incorrect least-squares regression coefficients in method-comparison analysis. Clin Chem 1979;25:432-8.
  6. Passing H, Bablok W. A new biometrical procedure for testing the equality of measurements from two different analytical methods. J Clin Chem Clin Biochem 1983;21:709-20.


Copyright © 2003. All rights reserved.
