QC Lesson of the Month
METHOD VALIDATION - THE EXPERIMENTAL PLAN
James O. Westgard, Ph.D.

I start this discussion with the assumption that the method to be tested will be carefully selected, as discussed earlier in MV - Selecting a Method to Validate. Therefore, the application requirements will be satisfied and the methodology characteristics will be considered.

We can then focus on the performance characteristics, which include precision, accuracy, interference, working range, and sometimes detection limit. These characteristics may already have been estimated by the manufacturer, to support claims for the method, or by another user in a published evaluation study. These claims or published results still need to be verified to show that the method works properly and is acceptable in an individual laboratory. That's the purpose of the method validation study.

Approach for formulating a plan

To carry out a good method validation study, you need to do the following:

An experimental plan can be formulated by:

Types of errors to be assessed

All measurements have some error! Even simple measurement devices, such as a bathroom scale, have errors. Whenever you weigh yourself on a bathroom scale, you observe error. That's why you immediately step off the scale, get back on, and make another measurement. You usually observe that these measurements, though performed closely in time and under essentially identical conditions, are still not exactly the same - that's the random error or imprecision discussed earlier in MV - The inner, secret, deeper, hidden meaning. You may also have noticed that virtually all scales are inaccurate - they read too high, don't they! That's an example of the systematic error, or bias, described earlier.

In response to the weights being too high, we usually try to adjust the zero point of the scale and make the results lower. This assumes that the systematic error is constant in nature, i.e., all people who weigh themselves on that scale would be high by the same amount. If instead each person's weight is in error by a fixed proportion of that weight, i.e., proportional error, the scale needs to be corrected by a calibration type of adjustment rather than a zero-point adjustment.
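
To make the difference between the two kinds of systematic error concrete, here is a minimal Python sketch using made-up scale readings in pounds. The helper names `zero_adjust` and `calibration_adjust` and all numbers are hypothetical. It shows that subtracting a fixed offset removes a constant error but cannot remove a proportional one.

```python
# Hypothetical scale readings (pounds) illustrating constant vs. proportional
# systematic error. All values are made up for illustration.

def zero_adjust(reading: float, offset: float) -> float:
    """Correct a constant systematic error by subtracting a fixed offset."""
    return reading - offset

def calibration_adjust(reading: float, factor: float) -> float:
    """Correct a proportional systematic error by rescaling the reading."""
    return reading / factor

true_weights = [50.0, 100.0, 200.0]

# Constant error: every reading is 3 lb high, whatever the true weight.
constant_readings = [w + 3.0 for w in true_weights]
# Proportional error: every reading is 4% high, so the error grows with weight.
proportional_readings = [w * 1.04 for w in true_weights]

print([zero_adjust(r, 3.0) for r in constant_readings])
print([round(calibration_adjust(r, 1.04), 6) for r in proportional_readings])

# A zero-point adjustment sized to the error seen at 100 lb (4 lb) over-corrects
# light people and under-corrects heavy ones when the error is proportional.
print([round(zero_adjust(r, 4.0), 6) for r in proportional_readings])
```

The last line prints readings that are too low at 50 lb and too high at 200 lb, which is why a proportional error calls for a calibration adjustment.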

If such simple devices as scales are subject to errors, it is readily understandable that measurements from the complex devices and systems used for laboratory tests are subject to the same kinds of error: random error, constant systematic error, and proportional systematic error.

All these errors can be recognized when a group of measurements is compared to the correct or true values. For example, the accompanying figure shows how different types of errors are revealed when the results from a test method are plotted on the y-axis versus those from a comparative method on the x-axis. The dashed line in the middle of the figure represents ideal method performance, where the test method and the comparative method give exactly the same results. The bottom line in the figure shows the effect of a proportional systematic error, where the magnitude of the error increases as the test result gets higher. The top line shows the effect of a constant systematic error, where the whole line is shifted up and all results are high by the same amount. These results are also subject to the random error of the method, so the actual data points scatter about the line, as illustrated in the figure. The range of this scatter above and below the line gives some idea of the amount of random error present.
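
A plot like this is usually summarized numerically by fitting a straight line to the paired results. The sketch below assumes an ordinary least-squares fit of test-method results (y) on comparative-method results (x); this lesson does not prescribe a regression technique, and the function name `compare_methods` and the data are hypothetical. A slope away from 1.0 points to proportional error, an intercept away from 0.0 points to constant error, and the scatter about the line (s_y/x) gives a rough gauge of random error.

```python
import math
from statistics import fmean

def compare_methods(x, y):
    """Ordinary least-squares fit of test results (y) on comparative results (x).

    Returns (slope, intercept, s_yx):
      slope     - departure from 1.0 suggests proportional systematic error
      intercept - departure from 0.0 suggests constant systematic error
      s_yx      - standard deviation of points about the line (random error)
    """
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least 3 paired results")
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s_yx = math.sqrt(sum(r * r for r in residuals) / (len(x) - 2))
    return slope, intercept, s_yx

# Hypothetical data: test method has a small constant and proportional error
# (y = 2 + 0.95x) plus some alternating scatter.
x = [50, 100, 150, 200, 250, 300]
noise = [1, -1, 1, -1, 1, -1]
y = [2 + 0.95 * xi + e for xi, e in zip(x, noise)]

slope, intercept, s_yx = compare_methods(x, y)
print(f"slope={slope:.3f} intercept={intercept:.2f} s_yx={s_yx:.2f}")
```

Ordinary least squares treats the comparative method as error-free, so s_y/x also absorbs the comparative method's own random error; when the comparative method is noisy, a technique such as Deming regression may be more appropriate.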

Experiments for estimating analytical errors

While a comparison of methods experiment can reveal all these different types of errors, it is not necessarily the best way to go about studying a new method. For example, random error might be estimated more quickly by just analyzing a single specimen or a stable control material.
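
Estimating random error from repeated analyses of one specimen reduces to a few summary statistics. Here is a minimal sketch, assuming the replicate results are already in hand; the function name `replication_stats` and the glucose values (mg/dL) are hypothetical.

```python
from statistics import fmean, stdev

def replication_stats(results):
    """Mean, standard deviation, and coefficient of variation (%) of
    replicate results on a single specimen or control material."""
    mean = fmean(results)
    sd = stdev(results)  # sample SD, n - 1 denominator
    cv = 100.0 * sd / mean
    return mean, sd, cv

# Hypothetical glucose results (mg/dL): 20 replicates of one material in one run.
glucose = [98, 101, 100, 99, 102, 100, 97, 101, 100, 99,
           100, 103, 98, 100, 101, 99, 100, 102, 98, 100]

mean, sd, cv = replication_stats(glucose)
print(f"mean={mean:.1f} SD={sd:.2f} CV={cv:.2f}%")
```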

There are specific experiments for estimating different types of analytical errors, as shown in the table. The first column lists the type of error. The second column is labeled "preliminary" because these experiments are generally easier to perform and take less time and effort than the "final" experiments. The final experiments are more demanding and should be performed only after the preliminary results have shown acceptable performance. On the other hand, a poor showing on a preliminary experiment can be grounds for stopping the study and rejecting the method, because a specific error condition has been identified.

Here's a brief description of these different experiments:

Organizing the experiments into a plan

A general plan for validating the performance of a new method is outlined below. It includes four phases: initial familiarization with the method, quick and dirty preliminary evaluation experiments, more extensive final studies of precision and accuracy, and implementation of the method for routine service.

Note that this plan can and should be adjusted to consider any unique characteristics of a method or any special requirements of a laboratory and the patients it serves. For example, the studies for interference and recovery might be more extensive if the hospital is a cancer center whose patients are likely to be treated with many different drugs. In a transplant center, comparison of methods studies may focus on transplant patients who are treated with anti-rejection medications. The amount of data collected can also be adjusted on the basis of what's already available in the literature or what's required by regulatory or accreditation guidelines. In the USA, for example, the plan should take into account the complexity of the method, as classified under the CLIA-88 regulations. Highly complex methods require more careful study and might follow the NCCLS protocols for the replication experiment [1] and the comparison of methods experiment [2]. Moderately complex methods may be tested with simpler experiments that require less time, less effort, and less data.

Walking tour of the plan

The first step with any new method is to get it working. This is the "familiarization" period, where the objective is to learn how to perform the method properly and establish an operating protocol that provides consistent test results. With our bathroom scale, for example, you have to get the scale out of the box, transport it to the proper location, find space to put it, and try it out. With a new analytical method, you have to set up the instrument, prepare the reagents, calibrate the method, and obtain results from test samples. One of the critical factors is to check the standards and be sure the method is properly calibrated; otherwise, calibration errors will show up throughout the experimental studies.

Once the method is operating, the next step is to determine the working range. With our bathroom scale, we generally want the range to cover everyone from a small child to a moderately large adult. The common scale is not likely to be sensitive enough to weigh a small baby, so if a very low detection limit is important, this characteristic needs to be dealt with up front when selecting the scale. The common scale is also unlikely to handle a 300-plus-pound football player, so if an extremely wide working range is important, that characteristic should also have been considered during the selection process. Likewise, the working range of a new analytical method will vary from test to test and must be defined as part of the specifications for the method, then checked by analyzing a series of solutions, usually in duplicate or triplicate, whose concentrations cover the range of interest. If detection limit is a critical characteristic, it may be assessed at this time or in the next phase of preliminary experiments.
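
Checking a series of solutions across the working range can be sketched as a simple comparison of replicate means against assigned values. This is a minimal illustration, not a formal linearity statistic: the function name `check_working_range`, the 5% acceptance limit, and the data are all hypothetical assumptions.

```python
from statistics import fmean

def check_working_range(levels, max_pct_dev=5.0):
    """Check a series of solutions analyzed in duplicate or triplicate.

    levels: (assigned_value, [replicate results]) pairs, ordered low to high.
    max_pct_dev: hypothetical acceptance limit, in percent of assigned value.

    Returns (rows, upper): rows of (assigned, mean, pct_dev, ok), and the
    highest assigned value reached before the first unacceptable level
    (None if the lowest level already fails).
    """
    rows = []
    upper = None
    still_ok = True
    for assigned, reps in levels:
        if assigned <= 0:
            raise ValueError("assigned values must be positive")
        mean = fmean(reps)
        pct_dev = 100.0 * (mean - assigned) / assigned
        ok = abs(pct_dev) <= max_pct_dev
        rows.append((assigned, mean, pct_dev, ok))
        if ok and still_ok:
            upper = assigned
        else:
            still_ok = False
    return rows, upper

# Hypothetical series analyzed in duplicate; the top level falls off.
levels = [
    (50, [49, 51]),
    (100, [99, 101]),
    (200, [198, 202]),
    (400, [396, 400]),
    (600, [545, 551]),
]

rows, upper = check_working_range(levels)
for assigned, mean, pct_dev, ok in rows:
    print(f"{assigned:>5} {mean:>7.1f} {pct_dev:+6.1f}% {'ok' if ok else 'FAIL'}")
print("acceptable up to", upper)
```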

Next, the preliminary experiments are performed to determine within-run imprecision, recovery, and interference. A minimum amount of data is collected in the minimum amount of time to allow a quick judgment of the acceptability of performance under the simplest conditions. The replication experiment might include 20 replicate analyses of each of two or three materials whose concentrations closely match the medical decision levels of interest for the test. Interference experiments should test common problems such as hemolysis, lipemia, and high bilirubin. Recovery experiments assess whether there are any competitive reactions due to the matrix or other materials in the native specimens. If the errors revealed by the preliminary experiments are small, proceed to the final replication and comparison of methods experiments.
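
The interference and recovery experiments are both paired designs whose results reduce to simple arithmetic. The sketch below assumes a common layout: for interference, each patient sample is split, with one aliquot receiving the suspected interferent and the other an equal volume of diluent; for recovery, one aliquot receives a small volume of standard solution and the other an equal volume of diluent. The function names `interference_bias` and `percent_recovery` and all numbers are hypothetical.

```python
from statistics import fmean

def interference_bias(with_interferent, with_diluent):
    """Mean paired difference: (aliquot + interferent) - (aliquot + diluent)."""
    if len(with_interferent) != len(with_diluent):
        raise ValueError("results must be paired")
    return fmean([a - b for a, b in zip(with_interferent, with_diluent)])

def percent_recovery(spiked, baseline, std_conc, v_std, v_sample):
    """Percent of the added analyte that the method recovers.

    spiked   - result on sample + standard solution
    baseline - result on sample + equal volume of diluent
    std_conc - concentration of the standard solution
    v_std, v_sample - volumes of standard and patient sample (same units)
    """
    added = std_conc * v_std / (v_std + v_sample)  # concentration actually added
    return 100.0 * (spiked - baseline) / added

# Hypothetical interference check on three patient samples.
bias = interference_bias([104, 110, 95], [100, 106, 93])
print(f"interference bias = {bias:.2f}")

# Hypothetical recovery: 0.1 mL of a 1000 mg/dL standard added to 0.9 mL sample.
rec = percent_recovery(spiked=195, baseline=97, std_conc=1000, v_std=0.1, v_sample=0.9)
print(f"recovery = {rec:.1f}%")
```

The interference bias is in the units of the test and, like the shortfall from 100% recovery, would be judged against the amount of error that is allowable for the test.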

The final replication experiment should cover at least 20 working days. The comparison of methods experiment will usually be performed on fresh patient specimens, but stored specimens should also be tested if those storage conditions represent the typical processing and handling of the specimens routinely analyzed. The comparative method should ideally be one whose performance has been well studied, and a minimum of 40 well-chosen patient samples should be tested over a minimum of 5 working days. These samples should be distributed one-third in the low to low-normal range, one-third in the normal range, and one-third in the high abnormal range. Once these data have been collected, method acceptability should be judged on the basis of the sizes of the random, systematic, and total analytical errors. If these errors are small compared to the amount of error that would invalidate the use and interpretation of a test result, the method is acceptable. If they are too large, it will be necessary to reject the method or to identify and eliminate the causes of the errors.
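
Judging acceptability at a medical decision level can be sketched as follows, assuming the comparison results have been summarized by a regression line y = a + bx and the SD comes from the final replication experiment. The systematic error at a decision level Xc is taken as Yc - Xc, and total error is estimated as |SE| + z*SD with z = 2 as an assumed multiplier (other multipliers are also in use). The function names, the decision level, and the allowable total error are hypothetical; the allowable error must come from the laboratory's own quality requirement for the test.

```python
def systematic_error_at(xc, slope, intercept):
    """Systematic error at decision level xc from the regression line
    y = intercept + slope * x (test method on comparative method)."""
    yc = intercept + slope * xc
    return yc - xc

def judge_acceptability(xc, slope, intercept, sd, allowable_total_error, z=2.0):
    """Compare estimated total error (|SE| + z*SD) with an allowable total error.

    z = 2.0 is an assumed multiplier, not a fixed rule.
    Returns (se, te, acceptable).
    """
    se = systematic_error_at(xc, slope, intercept)
    te = abs(se) + z * sd
    return se, te, te <= allowable_total_error

# Hypothetical values: decision level 200, slope 0.98, intercept 1.5,
# SD 2.5 from the replication experiment, allowable total error 20.
se, te, ok = judge_acceptability(200, 0.98, 1.5, 2.5, 20)
print(f"SE={se:+.2f} TE={te:.2f} acceptable={ok}")
```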

When method performance is judged acceptable, it may still be necessary to estimate or at least verify the reference interval(s). If the performance of the comparative method is not well documented, it may also be necessary to perform clinical studies to correlate test results with clinical conditions. Finally, all these studies need to be documented for future reference.
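
Verifying a reference interval is often done with a small transference study. The sketch below assumes a commonly used rule of thumb from CLSI guidance: analyze about 20 specimens from healthy reference individuals and accept the interval if no more than 2 results fall outside it. Both thresholds are exposed as parameters and should be checked against the guideline your laboratory follows; the function name, interval limits, and data are hypothetical.

```python
def verify_reference_interval(results, lower, upper, n_required=20, max_outside=2):
    """Rule-of-thumb check of a transferred reference interval.

    n_required and max_outside are assumed defaults, not taken from this lesson.
    Returns (number of results outside the interval, accepted?).
    """
    if len(results) < n_required:
        raise ValueError(f"need at least {n_required} reference results")
    outside = sum(1 for r in results if r < lower or r > upper)
    return outside, outside <= max_outside

# Hypothetical results from 20 healthy individuals, proposed interval 70-99.
reference_results = [72, 85, 90, 78, 95, 88, 81, 76, 93, 84,
                     79, 87, 91, 74, 83, 89, 101, 80, 86, 92]

outside, accepted = verify_reference_interval(reference_results, 70, 99)
print(f"{outside} outside; interval accepted: {accepted}")
```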

Implementation starts by writing the method protocol or laboratory procedure, which will be used in training other analysts to perform the new method. An essential part is a description of the Quality Control procedures that will be used to monitor routine performance. Once analysts are trained and the method is in routine service, it will be very important to monitor performance closely during the first month, identify the sources of problems, improve the preventive maintenance procedures, and update analysts about how to better manage the quality of the method.

References:

  1. NCCLS EP5-T2: Precision performance of clinical chemistry devices. 2nd ed, Tentative Guideline, 1992. National Committee for Clinical Laboratory Standards, Wayne, PA.
  2. NCCLS EP9-A: Method comparison and bias estimation using patient samples. Approved Guideline, 1995. National Committee for Clinical Laboratory Standards, Wayne, PA.

   

Copyright © 2000. All rights reserved.
Westgard QC, 7614 Gray Fox Trail, Madison WI 53717
Call 608-833-4718 or e-mail us at westgard@westgard.com

 
