The Comparison of Methods Experiment
Written by James O. Westgard, Ph.D.
The comparison of methods experiment is critical for assessing the systematic errors that occur with real patient specimens. Guidelines for performing the experiment are provided and there is an introductory discussion of how to graph the data and what statistics should be calculated.
A comparison of methods experiment is performed to estimate inaccuracy or systematic error. Review MV – The Experimental Plan to see how this experiment fits together with the other experiments. You perform this experiment by analyzing patient samples by the new method (test method) and a comparative method, then estimate the systematic errors on the basis of the differences observed between the methods. The systematic differences at critical medical decision concentrations are the errors of interest. However, information about the constant or proportional nature of the systematic error is also useful and often available from appropriate statistical calculations. Both the experimental design and the statistical calculations are critical for obtaining reliable estimates of systematic errors.
The analytical method that is used for comparison must be carefully selected because the interpretation of the experimental results will depend on the assumptions that can be made about the correctness of results from the comparative method. When possible, a “reference method” should be chosen for the comparative method. This term has a specific meaning that implies a high quality method whose results are known to be correct through comparative studies with an accurate “definitive method” and/or through traceability of standard reference materials. Any differences between a test method and a reference method are assigned to the test method, i.e., the errors are attributed to the test method because the correctness of the reference method is well documented.
The term “comparative method” is a more general term and does not imply that the correctness of the method has been documented. Most routine laboratory methods fall into this latter category. Any differences between a test method and a routine method must be carefully interpreted. If the differences are small, then the two methods have the same relative accuracy. If the differences are large and medically unacceptable, then it is necessary to identify which method is inaccurate. Recovery and interference experiments can be employed to provide this additional information.
A minimum of 40 different patient specimens should be tested by the two methods. These specimens should be selected to cover the entire working range of the method and should represent the spectrum of diseases expected in routine application of the method. The actual number of specimens tested is less important than the quality of those specimens. Twenty specimens that are carefully selected on the basis of their observed concentrations will likely provide better information than a hundred specimens that are randomly received by the laboratory. The quality of the experiment and the estimates of systematic errors will depend more on getting a wide range of test results than a large number of test results.
The main advantage of a large number is to identify individual patient samples whose results do not agree because of interferences in an individual sample matrix. This is often of interest when the new method makes use of a different chemical reaction or a different principle of measurement. Large numbers of specimens – 100 to 200 – are recommended to assess whether the new method’s specificity is similar to that of the comparative method.
Common practice is to analyze each specimen singly by the test and comparative methods. However, there are advantages to making duplicate measurements whenever possible. Ideally, these duplicates should be two different samples (or cups) that are analyzed in different runs, or at least in different order (rather than back-to-back replicates on the same cup of sample). The duplicates provide a check on the validity of the measurements by the individual methods and help identify problems arising from sample mix-ups, transposition errors, and other mistakes. One or two such mistakes could have a major impact on the conclusions drawn from the experiment. At the least, such mistakes will cause much consternation in deciding whether or not discrepant results represent the performance of the method or whether they are “outliers” that should be removed from the data set. Duplicate analyses would demonstrate whether or not these observed discrepancies were repeatable.
If duplicates are not performed, then it is critical to inspect the comparison results at the time they are collected, identify those specimens where the differences are large, and repeat those analyses while the specimens are still available.
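The inspection step described above can be sketched in a few lines of Python. This is only an illustration: the specimen IDs, the results, and the 10-unit limit are hypothetical, and the limit for a "large" difference would have to be chosen for the test in question.

```python
def flag_discrepant(results, limit):
    """Return IDs of specimens whose test-minus-comparative difference
    is larger than `limit` in absolute value (candidates for repeat analysis)."""
    flagged = []
    for specimen_id, test_value, comparative_value in results:
        if abs(test_value - comparative_value) > limit:
            flagged.append(specimen_id)
    return flagged


# Hypothetical single-measurement results: (ID, test method, comparative method)
results = [
    ("S01", 98.0, 100.0),
    ("S02", 152.0, 150.0),
    ("S03", 230.0, 200.0),  # large difference: repeat while the specimen is available
    ("S04", 61.0, 60.0),
]

to_repeat = flag_discrepant(results, limit=10.0)
print(to_repeat)
```

Running a check like this on the day the results are collected leaves time to repeat the flagged specimens before they are discarded.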
Several different analytical runs on different days should be included to minimize any systematic errors that might occur in a single run. A minimum of 5 days is recommended, but it may be preferable to extend the experiment for a longer period of time. Since the long-term replication study will likely extend for 20 days, the comparison study could cover a similar period of time and would require only 2 to 5 patient specimens per day.
Specimens should generally be analyzed within two hours of each other by the test and comparative methods, unless the specimens are known to have shorter stability, e.g., ammonia, lactate. Stability may be improved for some tests by adding preservatives, separating the serum or plasma from the cells, refrigeration, or freezing. Specimen handling needs to be carefully defined and systematized prior to beginning the comparison of methods study. Otherwise, the differences observed may be due to variables in the handling of specimens, rather than the systematic analytical errors that are the purpose of the experiment.
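The daily workload follows from simple arithmetic. The sketch below assumes the specimens are spread evenly over the study days; the totals of 40 and 100 specimens are illustrative choices, and the function name is hypothetical.

```python
import math


def specimens_per_day(total_specimens, days):
    """Specimens needed per day if the total is spread evenly, rounded up."""
    return math.ceil(total_specimens / days)


per_day_40_over_20 = specimens_per_day(40, 20)    # minimum of 40 specimens over 20 days
per_day_100_over_20 = specimens_per_day(100, 20)  # a larger study of 100 specimens
per_day_40_over_5 = specimens_per_day(40, 5)      # the same minimum squeezed into 5 days

print(per_day_40_over_20, per_day_100_over_20, per_day_40_over_5)
```

Under these assumptions, a 20-day study needs 2 specimens per day for 40 specimens and 5 per day for 100, whereas the 5-day minimum needs 8 per day.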
Here’s where the going gets tough! There’s a lot of debate and discussion about the right way to analyze data from a comparison of methods experiment. This has been going on for as long as I’ve been a clinical chemist and seems to be a chronic problem that flares up with each new generation of laboratory scientists. We studied the use and interpretation of statistics in method comparison studies almost twenty-five years ago [3,4] and the lessons we learned still apply today. Our intention here is to provide some brief guidelines and then discuss the statistics in more detail later on in this series.
The most fundamental data analysis technique is to graph the comparison results and visually inspect the data. Ideally, this should be done at the time the data is collected in order to identify discrepant results that will complicate the data analysis. Any patient specimens with discrepant results between the test and comparative methods should be reanalyzed to confirm that the differences are real and not mistakes in recording the values or mix-ups of specimens.
If the two methods are expected to show one-to-one agreement, this initial graph should be a “difference plot” that displays the difference between the test and comparative results (test minus comparative) on the y-axis versus the comparative result on the x-axis, such as shown in the accompanying figure. These differences should scatter around the line of zero differences, half being above and half being below. Any large differences will stand out and draw attention to those specimens whose results need to be confirmed by repeat measurements.
Look for any outlying points that do not fall within the general pattern of the other data points. For example, there is one suspicious point in the plot shown here. Note also that the points tend to scatter above the line at low concentrations and below the line at high concentrations, suggesting there may be some constant and/or proportional systematic errors present.
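As a rough sketch, the points for a difference plot and a few summary numbers can be computed as follows. The five paired results are hypothetical and were chosen to show the pattern just described, with positive differences at low concentrations and negative differences at high concentrations.

```python
from statistics import mean


def difference_points(test, comparative):
    """Return (x, y) pairs for a difference plot:
    x = comparative result, y = test result minus comparative result."""
    return [(comp, t - comp) for t, comp in zip(test, comparative)]


# Hypothetical paired results, one pair per patient specimen
comparative = [50.0, 100.0, 150.0, 200.0, 250.0]
test = [53.0, 102.0, 150.0, 197.0, 244.0]

points = difference_points(test, comparative)
differences = [diff for _, diff in points]

above = sum(1 for diff in differences if diff > 0)  # points above the zero line
below = sum(1 for diff in differences if diff < 0)  # points below the zero line
average_difference = mean(differences)

print(points)
print(above, below, average_difference)
```

The `points` list can be passed to any plotting tool. The counts above and below the zero line, and the way the sign of the difference changes with concentration, give a quick numerical check on the visual impression.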
However, a comparison plot, which displays the test result on the y-axis versus the comparative result on the x-axis, is generally advantageous for showing the analytical range of data, the linearity of response over the range, and the general relationship between methods as shown by the angle of the line and its intercept with the y-axis.
While difference and comparison graphs provide visual impressions of the analytic errors between the test and comparative methods, numerical estimates of these errors can be obtained from statistical calculations. Remember the inner, hidden, deeper, secret meaning of method validation is error analysis. You need to know what kinds of errors are present and how large they might be. The statistical calculations will put more exact numbers on your visual impressions of errors.
Given that the purpose of the comparison of methods experiment is to assess inaccuracy, the statistics that are calculated should provide information about the systematic error at medically important decision concentrations. In addition, it would be useful to know the constant or proportional nature of that error (review MV – The Experimental Plan for definitions of constant and proportional errors). This latter information is helpful in determining the cause or source of the systematic error and assessing the possibility of improving method performance.
For comparison results that cover a wide analytical range, e.g., glucose or cholesterol, linear regression statistics are preferable. These statistics allow estimation of the systematic error at more than one medical decision concentration to judge method acceptability and also provide information about the proportional or constant nature of the systematic error to assess possible sources of errors. Statistical programs typically provide linear regression or least squares analysis calculation for the slope (b) and y-intercept (a) of the line of best fit and the standard deviation of the points about that line (sy/x). The systematic error (SE) at a given medical decision concentration (Xc) is then determined by calculating the corresponding Y-value (Yc) from the regression line, then taking the difference between Yc and Xc, as follows:
Yc = a + bXc
SE = Yc - Xc
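The calculation can be sketched in Python as shown below. It uses ordinary least squares with the comparative method on the x-axis and the test method on the y-axis, and takes the systematic error as the difference between Yc and Xc, as described above. The paired results and the decision concentrations of 100 and 200 are hypothetical, and the function names are illustrative only.

```python
import math


def least_squares(x, y):
    """Ordinary least squares fit of y = a + b*x.
    Returns the y-intercept a, the slope b, and the standard deviation
    of the points about the line (sy/x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(residual_ss / (n - 2))
    return a, b, s_yx


def systematic_error(a, b, xc):
    """Systematic error at decision concentration xc: SE = Yc - Xc."""
    yc = a + b * xc
    return yc - xc


# Hypothetical paired results: x = comparative method, y = test method
comparative = [50.0, 100.0, 150.0, 200.0, 250.0]
test = [53.0, 102.0, 150.0, 197.0, 244.0]

a, b, s_yx = least_squares(comparative, test)
se_at_100 = systematic_error(a, b, 100.0)
se_at_200 = systematic_error(a, b, 200.0)

print(round(a, 2), round(b, 3), round(s_yx, 2))
print(round(se_at_100, 2), round(se_at_200, 2))
```

For these illustrative numbers the fit gives a slope of about 0.954 and a y-intercept of about 6.1, so the estimated systematic error is about +1.5 at a decision concentration of 100 and about -3.1 at 200. The change of sign shows why the error should be estimated at each decision concentration of interest rather than summarized by a single number.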