Please Note: The Basic Method Validation manual is now in its FOURTH edition. The questions presented in this article are drawn from the first edition. Many of the answers here still apply, but some things, particularly on the regulatory side, have changed in the past 20 years.
It's important to demonstrate that the method performs well under the operating conditions of your laboratory and that it provides reliable test results for your patients. There are many factors that can affect method performance, such as different lots of calibrators and reagents, changes in supplies and suppliers of instrument components, changes in manufacturing from the production of prototypes to final field instruments, effects of shipment and storage, as well as local climate control conditions, quality of water, stability of electric power, and, of course, the skills of the analysts. In US laboratories, method validation studies are actually required by the CLIA regulations.
In the US, CLIA defines minimum standards of analytical quality in the form of the criteria for acceptability in proficiency testing surveys. These criteria define the allowable total error around a target value (TV). For example, acceptable test results for cholesterol are described as TV plus/minus 10%, which means that test results should be within 10% of the correct value. Most countries have proficiency testing or external quality assessment schemes that define standards for analytical quality in a similar manner. Note that use of an allowable total error does not provide a specification for an individual characteristic, such as imprecision, inaccuracy, interference, recovery, etc., but provides a requirement of the total amount of error when all sources are combined.
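As a minimal sketch of how an allowable total error criterion is applied, the check below tests whether a result falls within TV plus/minus a stated percentage. The function name and the target value of 200 are illustrative assumptions; only the 10% criterion for cholesterol comes from the text.

```python
def within_allowable_error(result, target, tea_percent):
    """Return True if result lies within target +/- tea_percent of target."""
    limit = abs(target) * tea_percent / 100.0
    return abs(result - target) <= limit


# Cholesterol criterion from the text: TV plus/minus 10%.
target_value = 200.0  # hypothetical target value
for result in (185.0, 219.0, 225.0):
    print(result, within_allowable_error(result, target_value, 10.0))
```

Note that the check applies to the total difference from the target, so it does not say how much of that difference comes from imprecision and how much from bias.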
One possibility is that the manufacturer's technical personnel will perform validation studies when installing a new system in your laboratory. This seems to be a growing trend in the US, probably due both to tight laboratory staffing and to the strategy of purchasing whole systems and holding the manufacturer accountable for all problems. If the manufacturer performs the studies, it's important that you review the experimental design, monitor the data collection, and perform your own statistical analysis and interpretation of the data.
In many other cases, however, the studies will need to be organized and carried out by the laboratory itself. It is advisable to have one analyst organize the studies, monitor the data collection, review the data, perform the statistical analysis of the data, and be responsible for the interpretation and conclusions. Other analysts can participate as operators and perform the tests needed in the different validation experiments.
The method should be operated in the way intended under routine service conditions. If routine service operation will make use of commercial calibrators, then those calibrators must be part of the testing process that is validated. It is generally advisable to analyze both commercial calibrators and primary standards together, when possible, to see if they agree. Any disagreement should be resolved prior to performing the recovery, interference, and comparison of methods experiments.
These almost always include the reportable range, precision (or imprecision), accuracy (or inaccuracy, bias), and the reference interval. Sometimes the studies include detection limit (or sensitivity), interference, and recovery. In US laboratories, the CLIA regulations define which characteristics need to be validated for methods with different classifications of complexity. Fewer studies are required with less complex methods. More extensive testing is necessary for methods developed by the laboratory or modified by the laboratory.
Reportable range is validated by a linearity experiment, imprecision (or random error) is determined from a replication study, and inaccuracy (or systematic error) is assessed from a comparison of methods experiment, as well as from experiments for interference (constant systematic error) and recovery (proportional systematic error). Sensitivity is determined by a detection limit experiment. Reference intervals can be verified by testing samples from healthy people.
It's actually the reportable range that must be validated. The objective in determining reportable range is to define the highest value that can be reported without diluting the sample. This is usually done by performing a linearity type of experiment, but there is no strict requirement that the method response has to be linear. However, the readout from instrument systems often is linear in the units that are reported.
No, for most tests it is sufficient to validate the reportable range using a linearity type of experiment. A more exact estimate of analytical performance around zero is needed only when there is special significance attached to low values for the test. Drug tests are an obvious example. Tumor markers are another example.
Good planning would be to analyze the number of materials that will be used in routine quality control for that test. In US laboratories, CLIA places certain requirements on the number of materials to be used for different tests - e.g., a minimum of 2 levels or materials. Laboratory practices commonly include 3 materials for certain tests, such as blood gases and hematology. When possible, select control materials that can be continued for QC once the test is implemented in your laboratory.
Ideally, the comparison method should be a method that is free of systematic errors, i.e., a method whose accuracy or bias is minimal. In practice, most studies involve the routine service method that is to be replaced by the new method. In such studies, the objective is really to assess whether there will be any systematic changes in test values between the "old" method and the "new" method. If such systematic changes are uncovered, then it is important to document which method has the problem. Interference and recovery experiments are often helpful for pinpointing the problem and the method at fault.
Probably because this experiment uses real patient samples and reveals the kind of errors that will be encountered when the tests are used for patient care, which is particularly important when a laboratory changes methods. It also reveals different kinds of errors - proportional systematic, constant systematic, random error between methods - therefore providing a lot of quantitative information about method performance. Some of the other experiments seem to test conditions that may not be observed very often - e.g., interference, recovery, and detection limit.
Perfect correlation, i.e., a correlation coefficient of 1.000, means that the values by the test method increase in direct proportion to the increase in the values by the comparison method. However, a value of 1.000 doesn't mean that the test method values are identical to those of the comparison method. Systematic differences can be present, e.g., the test method could be running 100 units higher than the comparison method, or the test method could be providing results that are only half of the values by the comparison method, yet the correlation coefficient could still give a value near 1.000. Because the comparison of methods experiment is performed to validate the accuracy of a method, the statistical analysis must provide estimates of systematic errors, not just the correlation of results.
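The two situations described above can be illustrated with a short sketch. The data values and the helper name `pearson_r` are assumptions made for the illustration: one test method runs 100 units high, the other reports half the comparison value, and both still correlate perfectly with the comparison method.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


comparison = [50.0, 100.0, 150.0, 200.0, 250.0]   # hypothetical results
runs_high = [x + 100.0 for x in comparison]       # constant error of 100 units
reads_half = [0.5 * x for x in comparison]        # proportional error of 50%

print(round(pearson_r(comparison, runs_high), 3))
print(round(pearson_r(comparison, reads_half), 3))
```

Both coefficients round to 1.000 even though neither test method agrees with the comparison method, which is why r alone cannot validate accuracy.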
The best use of the correlation coefficient is to help decide whether ordinary linear regression will provide reliable estimates of slope and intercept. If r=0.99 or greater, it is generally accepted that ordinary linear regression calculations are adequate for estimating the errors between the methods.
Remember that the purpose of the comparison of methods experiment is to estimate systematic errors, which may be constant or proportional in nature. Regression statistics can provide estimates of these components of systematic error by the y-intercept and slope, as well as estimation of the overall systematic error or bias at any decision level concentration of interest by calculation from the regression equation. The difference plot, on the other hand, emphasizes the random errors between the methods. You actually need to calculate the average difference or bias from paired t-test statistics to get a good estimate of the systematic error, thus the difference plot by itself (without statistical calculations) does not provide sufficient information about the systematic error of the method. Regression statistics are preferred over t-test statistics in order to calculate the systematic error at any decision level, as well as getting estimates of the proportional and constant components of systematic error.
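A minimal sketch of the paired t-test quantities mentioned above: the average difference (bias), the standard deviation of the differences, and the t-value. The function name and the sample values are hypothetical.

```python
import math
import statistics


def paired_bias(test, comparison):
    """Return (bias, sd_of_differences, t_value) for paired results."""
    diffs = [t - c for t, c in zip(test, comparison)]
    n = len(diffs)
    bias = statistics.mean(diffs)          # average of test minus comparison
    sd_diff = statistics.stdev(diffs)      # sample SD of the differences
    t_value = bias / (sd_diff / math.sqrt(n))
    return bias, sd_diff, t_value


test_values = [102.0, 98.0, 105.0, 110.0, 101.0]        # hypothetical data
comparison_values = [100.0, 97.0, 103.0, 107.0, 100.0]  # hypothetical data
bias, sd_diff, t_value = paired_bias(test_values, comparison_values)
print(f"bias={bias:.2f}, SD of differences={sd_diff:.2f}, t={t_value:.2f}")
```

The bias here is a single average over the whole range of data, so it cannot separate constant from proportional error the way regression statistics can.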
There are two cases where t-test statistics will provide reliable estimates of systematic errors.
Plot the data on a comparison plot (test value on the y-axis, comparison value on the x-axis) to assess whether proportional error is present or absent. If absent, then plot the data on a difference plot, i.e., plot the difference of the test minus comparison values on the y-axis versus the comparison values on the x-axis.
When using t-test statistics, present the following:
Plot the test value on the y-axis versus the comparison value on the x-axis, then inspect the data for:
Calculate the correlation coefficient as a measure of the range of data. First, however, inspect a graph to be sure the data are spread fairly uniformly over the range so that the r value is not being influenced by a few high or low points. If r=0.99 or greater, the range of data is wide enough to provide reliable estimates of the slope and y-intercept using ordinary linear regression analysis. If r<0.95, it is generally advised to use an alternate statistical technique, such as t-test statistics, to estimate the overall systematic error; or use an alternate regression technique, such as Deming regression, to calculate the slope and y-intercept.
Calculate the slope, y-intercept, and standard deviation of points about the regression line. Interpret the deviation of the slope from an ideal value of 1.000 as proportional error, the deviation of the y-intercept from an ideal value of 0.00 as an estimate of constant systematic error, and the value of the standard deviation of the points about the regression line as a measure of the random error between the methods.
Calculate the systematic error at medically important decision concentrations (Xc) using the regression equation: SE = Yc - Xc = (a + bXc) - Xc, where a is the y-intercept and b is the slope.
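The regression calculations can be sketched as follows. This is an illustrative implementation of ordinary linear regression, with hypothetical function names and data; it returns the y-intercept, the slope, and the standard deviation of points about the regression line, then applies SE = (a + bXc) - Xc at a decision level.

```python
import math


def ordinary_regression(x, y):
    """Return (intercept a, slope b, SD about the regression line s_yx)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s_yx = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return a, b, s_yx


def systematic_error(a, b, xc):
    """Systematic error at decision concentration xc: (a + b*xc) - xc."""
    return (a + b * xc) - xc


comparison = [50.0, 100.0, 150.0, 200.0, 250.0, 300.0]  # hypothetical data
test = [55.0, 106.0, 161.0, 211.0, 266.0, 316.0]        # hypothetical data
a, b, s_yx = ordinary_regression(comparison, test)
print(f"a={a:.2f}, b={b:.3f}, s_yx={s_yx:.2f}")
print(f"SE at Xc=200: {systematic_error(a, b, 200.0):.2f}")
```

Here the deviation of b from 1.000 estimates proportional error, the deviation of a from 0.00 estimates constant error, and s_yx reflects the random error between the methods.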
Present the following:
This refers to an alternate way of calculating regression statistics when the range of data isn't as wide as desired for ordinary linear regression (i.e., the correlation coefficient doesn't satisfy the criterion of being 0.99 or greater). An assumption in ordinary linear regression is that the x-values are well known and any difference between x and y-values is assignable to error in the y-value. In Deming regression, the errors between methods are assigned to both methods in proportion to the variances of the methods. The calculations are not commonly available in standard statistical programs; however, special computer programs for laboratory method validation will often include Deming regression.
For a detailed discussion of Deming regression and the calculations, see Cornbleet PJ, Gochman N. Incorrect least-squares regression coefficients in method-comparison analysis. Clin Chem 1979;25:432-438.
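For orientation, here is a sketch of the commonly used closed-form Deming estimator. It is not taken from the cited paper; the function name, the data, and the `delta` parameter (the assumed ratio of the error variance of the test method to that of the comparison method) are assumptions for illustration. With `delta=1.0`, both methods are assumed to be equally imprecise.

```python
import math


def deming_regression(x, y, delta=1.0):
    """Return (intercept, slope) from Deming regression.

    delta is the assumed ratio var(y errors) / var(x errors).
    Requires a non-zero covariance between x and y.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    term = syy - delta * sxx
    slope = (term + math.sqrt(term ** 2 + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)
    intercept = mean_y - slope * mean_x
    return intercept, slope


comparison = [138.0, 140.0, 141.0, 143.0, 145.0, 147.0]  # hypothetical data
test = [139.0, 140.0, 143.0, 143.0, 147.0, 148.0]        # hypothetical data
intercept, slope = deming_regression(comparison, test)
print(f"intercept={intercept:.2f}, slope={slope:.3f}")
```

In practice `delta` would be estimated from the replication data for each method rather than assumed.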
You can collect the data very carefully to permit the application of other statistical calculations. Consider strategies to:
Tests that may have a narrow analytical range include analytes such as calcium, chloride, and sodium, where the body itself attempts to maintain a narrow range of concentrations. Other tests, such as creatinine, may have a narrow concentration range in a healthy population and therefore need to be evaluated using a patient population from a hospital. Therapeutic drug levels, of course, will depend on obtaining patient specimens for varying doses and varying times following the doses. As a general strategy, make use of specimens from a hospital population to obtain a wide range of concentrations.
Tests of significance are useful mainly to assess whether there are sufficient data to support a conclusion that a difference or error exists (statistical significance), not whether that difference or error is large enough to invalidate the usefulness of a test (clinical significance). It is best to judge the acceptability of method performance by comparison of the observed errors to the total error that is allowable (such as defined in the CLIA criteria for acceptability of proficiency testing performance).
The method decision chart provides a graphical way of comparing the observed errors with standards of performance, whereas the earlier performance criteria provided a mathematical comparison. Therefore, the method decision chart is easier to use. In addition, the method decision chart permits simultaneous assessment against the different definitions of allowable total error, such as bias + 2s, bias + 3s, and bias + 4s, which have evolved since the original description of "performance criteria."
For the original discussion of "performance criteria", see Westgard JO, Carey RN, Wold S. Criteria for judging the precision and accuracy in method development and evaluation. Clin Chem 1974;20:825-833.
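The chart itself is graphical, but the comparison behind it can be sketched numerically. The function below is a hypothetical helper, not the chart's published construction: it checks the observed bias and standard deviation against an allowable total error using the bias + 2s, bias + 3s, and bias + 4s definitions named above. All three inputs must be in the same units (for example, percent).

```python
def meets_total_error_criteria(bias, s, allowable_total_error,
                               multipliers=(2, 3, 4)):
    """Map each multiplier k to whether |bias| + k*s is within the allowable error."""
    return {k: abs(bias) + k * s <= allowable_total_error for k in multipliers}


# Hypothetical method: 2.0% bias and 2.5% CV against a 10% allowable total error.
print(meets_total_error_criteria(2.0, 2.5, 10.0))
```

For these illustrative numbers the method satisfies the bias + 2s and bias + 3s definitions but not bias + 4s.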
The National Committee for Clinical Laboratory Standards (NCCLS, 90 West Valley Road, Suite 1400, Wayne, PA 19087-1898, phone 610-688-0100) provides a series of documents that provide extensive information about individual experiments: