CLSI EP15-A3: verification of precision and estimation of bias
We are pleased to have a guest essay explaining the latest in Method Verification, specifically the newest version of the CLSI guideline EP15 on Method Verification. For labs seeking a quick check to ensure their methods are meeting manufacturer specifications, EP15 may be the right choice.
What's New in CLSI EP15-A3: User Verification of Precision and Estimation of Bias; Approved Guideline - Third Edition
R. Neill Carey, Ph.D.
CLSI EP15 was released as an A3 document in September 2014. This is its fourth iteration, and although it retains much of its original approach, there were some significant changes in the A3 version.
The most significant change is the creation of a relatively simple experiment that gives reliable estimates of a measurement procedure's imprecision and its bias. The essentials to accomplish this were present in EP15 through all of its previous versions, but they are refined and combined in EP15-A3 to make a single experiment. Here's a brief description of the protocol.
Specification of Acceptable Performance
Before doing anything else, the user should specify total allowable error, and derive from it the allowable standard deviation (or %CV), and the allowable bias. The user should ascertain that the imprecision of the candidate measurement procedure meets the criterion for allowable imprecision before beginning the evaluation. If the measurement procedure's imprecision reported in publications, such as the manufacturer's stated imprecision, does not meet the criterion, the precision verification procedure described in EP15-A3 is not appropriate. Being a limited experiment, it is less rigorous than the experiment described in CLSI EP5-A3, which would be more appropriate.
Verification of Precision
EP15 first describes a precision verification experiment. If the user is evaluating a procedure for which there are manufacturer's precision claims, or published precision results, that were developed using CLSI EP5, the user can verify the published precision in an experiment lasting as few as five days.
Patient samples, reference materials, proficiency testing samples, or control materials may be used as the test samples, provided there is sufficient sample material for testing each sample five times per run for five to seven runs. Precision should be tested with two or more sample materials at different medical decision point concentrations. The experiment produces at least 25 replicates collected over at least 5 days for each sample material.
The repeatability (previously termed "within-run") and the within-laboratory (previously termed "total") standard deviations are calculated by an analysis of variance technique (ANOVA) that properly accounts for the within-run and between-run contributions to the overall imprecision of the measurement procedure. The user needs access to software to do the ANOVA calculations, but they are available in Excel, Minitab, Analyze-it, and other software packages that do statistical calculations. The repeatability and within-laboratory standard deviations are then compared to the claimed or published standard deviations. If the calculated standard deviations are less than the published values, the user has verified the claim.
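For readers who want to see what the ANOVA step does, here is a minimal sketch of a one-way ANOVA for a balanced design (the same number of replicates in every run). It is not the guideline's worksheet: the function name `precision_anova` and the example results are hypothetical, and setting a negative between-run variance estimate to zero is a common convention assumed here.

```python
from statistics import mean

def precision_anova(runs):
    """One-way ANOVA for a balanced precision experiment.

    `runs` is a list of runs; each run is a list of replicate results
    for ONE sample material. Returns the grand mean, the repeatability SD
    (s_r), the between-run SD, and the within-laboratory SD (s_wl).
    """
    k = len(runs)            # number of runs
    n = len(runs[0])         # replicates per run
    if k < 2 or n < 2 or any(len(r) != n for r in runs):
        raise ValueError("need a balanced design with >= 2 runs and >= 2 replicates")

    grand_mean = mean(x for r in runs for x in r)
    run_means = [mean(r) for r in runs]

    # Sums of squares within runs and between runs
    ss_within = sum((x - m) ** 2 for r, m in zip(runs, run_means) for x in r)
    ss_between = n * sum((m - grand_mean) ** 2 for m in run_means)

    ms_within = ss_within / (k * (n - 1))   # mean square, df = k(n-1)
    ms_between = ss_between / (k - 1)       # mean square, df = k-1

    # Between-run variance component; assumed convention: negative -> 0
    var_between = max(0.0, (ms_between - ms_within) / n)

    return {
        "mean": grand_mean,
        "s_r": ms_within ** 0.5,
        "s_between": var_between ** 0.5,
        "s_wl": (ms_within + var_between) ** 0.5,
        "df_r": k * (n - 1),
    }

# Hypothetical results: 5 runs (days) x 5 replicates for one material
example_runs = [
    [100.2, 99.8, 101.1, 100.5, 99.9],
    [101.4, 100.9, 101.8, 100.7, 101.2],
    [99.5, 100.1, 99.2, 99.9, 100.4],
    [100.8, 100.3, 101.0, 100.1, 100.6],
    [99.9, 100.6, 100.2, 99.7, 100.3],
]
result = precision_anova(example_runs)
print({name: round(value, 3) for name, value in result.items()})
```

The within-laboratory standard deviation combines the repeatability variance with the between-run variance component, which is why it can never be smaller than the repeatability standard deviation.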
Sometimes the calculated standard deviations may exceed the published values, and yet the true standard deviations are less than the published values. For example, if the true standard deviations were actually exactly equal to their claimed counterparts, the calculated standard deviations would exceed their published counterparts roughly fifty percent of the time in verification experiments. To allow for this possibility, the user calculates a "verification limit" based on the published standard deviation and the size of the user's experiment. If the calculated standard deviation is less than the verification limit, it is not statistically significantly larger than the published standard deviation, and the user has verified the published precision. If the calculated standard deviation exceeds the verification limit, it is statistically significantly larger than the published standard deviation, and the user has failed to verify the published imprecision. The document includes tables to simplify the calculation of the verification limit.
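To illustrate the idea behind a verification limit, the sketch below uses a standard one-sided chi-square test of a variance: the limit is the claimed standard deviation multiplied by sqrt(chi-square quantile / degrees of freedom). This is an assumption about the form of the calculation, not a reproduction of the guideline's tables. The function names are hypothetical, the chi-square quantile uses the Wilson-Hilferty approximation so that only the Python standard library is needed, and dividing the 5% false-rejection rate across the number of sample materials is likewise an assumption.

```python
from statistics import NormalDist

def chi2_quantile(p, df):
    """Approximate chi-square quantile (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * a ** 0.5) ** 3

def upper_verification_limit(claimed_sd, df, n_samples=1, alpha=0.05):
    """Largest calculated SD still consistent with the claimed SD.

    df        : degrees of freedom of the calculated SD, e.g. runs x
                (replicates - 1) for repeatability; for the
                within-laboratory SD an effective df is needed and is
                assumed to be supplied by the caller.
    n_samples : number of sample materials tested (alpha is split
                across them; an assumption of this sketch).
    """
    p = 1.0 - alpha / n_samples
    factor = (chi2_quantile(p, df) / df) ** 0.5
    return factor * claimed_sd

# Hypothetical claim: repeatability SD of 2.0 units, 5 runs x 5 replicates
limit = upper_verification_limit(claimed_sd=2.0, df=5 * (5 - 1), n_samples=2)
calculated_sd = 2.3  # hypothetical result from the user's experiment
print(f"verification limit = {limit:.2f}")
print("claim verified" if calculated_sd <= limit else "claim not verified")
```

Because the multiplying factor is always greater than 1, a calculated standard deviation somewhat above the claim can still verify it; the factor shrinks toward 1 as the experiment grows.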
Estimation of Bias
Because the precision experiment has so many replicate measurements, collected over several days, results from the precision experiment may be used to make a reliable estimate of the bias of the measurement procedure relative to the assigned (target) values of the sample materials used in the experiment. The only requirement is that the assigned value must be available. The choice of material depends on the purpose of the user in estimating the bias. Two or more appropriate materials should be tested in the precision experiment.
If the user is interested in estimating bias relative to the peer group for proficiency testing, and wants to estimate how well the measurement procedure will perform on proficiency testing, proficiency testing materials with peer group values for the measurement procedure being evaluated are appropriate.
For bias relative to the quality control peer group, quality control materials with peer group values for the measurement procedure are appropriate. "Assayed" quality control materials are not appropriate, unless peer group information is available. Typically, there is no way to estimate the uncertainty of the "assayed" values, which is needed to determine if the calculated bias is statistically significant.
Internationally recognized higher-order reference materials, such as a material from the U.S. National Institute of Standards and Technology, or from the Joint Committee for Traceability in Laboratory Medicine, or from similar organizations, may be appropriate if the user wishes to estimate the bias relative to the assigned concentrations of such materials. Use of these materials is important in establishing the traceability of measurement procedures.
Patient samples or control materials which have been repeatedly assayed with a measurement procedure felt to be substantially equivalent to the measurement procedure being evaluated may be appropriate if the user is interested in estimating bias relative to that measurement procedure. This could be useful, for example, if the intent of the experiment was to estimate the bias of one laboratory in a system relative to another, or to the mean of the laboratories in a system.
If the sample materials are appropriate, and target concentrations are available, the user can estimate the bias of the mean concentration calculated in the precision experiment relative to the target concentration of each of the materials.
To determine whether there is any statistically significant bias between the mean concentration calculated from the experiment and the target concentration, the user calculates a “verification interval” around the target concentration. The width of the verification interval depends on the uncertainty of the target value of the reference material and the standard error of the calculated mean concentration from the experiment. Calculation of the verification interval would be complicated, but the committee simplified it greatly by providing tables for the difficult-to-calculate quantities based on the number of replicate measurements per run, the number of runs, and the uncertainty of the target value.
If the mean concentration from the user's experiment is within the verification interval, there is no statistically significant bias.
If the mean concentration from the user's experiment is beyond the verification interval, statistically significant bias exists. The user must evaluate the estimated bias versus allowable bias. If the estimated bias is less than allowable bias, the bias is acceptable. If the estimated bias exceeds allowable bias, it is not acceptable.
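The decision logic for bias can be sketched as follows. This is a simplified illustration, not the guideline's worksheet: the function names are hypothetical, the standard error of the mean is derived from the ANOVA standard deviations of a balanced design, and the `multiplier` (a t-type coverage factor that depends on the experiment size and the target value's uncertainty) is assumed to be looked up by the user, for example from the guideline's tables.

```python
def standard_error_of_mean(s_r, s_wl, n_runs, n_reps):
    """Standard error of the grand mean for a balanced runs x replicates design.

    Algebraically the same as sqrt(between-run mean square / total N).
    """
    variance = (s_wl ** 2 - (n_reps - 1) / n_reps * s_r ** 2) / n_runs
    return variance ** 0.5

def assess_bias(mean_obs, target, se_mean, se_target, multiplier, allowable_bias):
    """Build the verification interval around the target and judge the bias."""
    # Combine the uncertainty of the observed mean and of the target value
    se_combined = (se_mean ** 2 + se_target ** 2) ** 0.5
    half_width = multiplier * se_combined
    low, high = target - half_width, target + half_width

    bias = mean_obs - target
    significant = not (low <= mean_obs <= high)
    # A statistically significant bias is still acceptable if it does not
    # exceed the allowable bias set before the experiment.
    acceptable = (not significant) or abs(bias) <= allowable_bias
    return {
        "interval": (low, high),
        "bias": bias,
        "significant": significant,
        "acceptable": acceptable,
    }

# Hypothetical values for one sample material
se_x = standard_error_of_mean(s_r=1.2, s_wl=1.8, n_runs=5, n_reps=5)
outcome = assess_bias(
    mean_obs=102.5,      # grand mean from the precision experiment
    target=100.0,        # assigned value of the material
    se_mean=se_x,
    se_target=0.5,       # standard error of the assigned value
    multiplier=2.3,      # hypothetical table value
    allowable_bias=4.0,  # from the performance specification
)
print(outcome)
```

Keeping statistical significance and acceptability as separate flags mirrors the two-step reasoning in the text: the interval answers whether a bias was detected, and the allowable bias answers whether it matters.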
What Happened to the Patient Sample Comparison Experiment?
Previous versions of EP15 included a small comparison experiment, involving 20 patient samples, which was to be used to verify a manufacturer's claimed bias. There were two problems with this approach. First, users rarely have access to the measurement procedure used by the manufacturer (or authors of a publication) as the comparative method for the published bias. Sometimes the manufacturer identifies the comparative measurement procedure only generically. Second, most manufacturers provide only regression statistics as the results of comparison experiments, and do not provide bias claims, so the user has to calculate the bias to be expected from the regression statistics provided (and has little idea of the uncertainty of this estimated bias). The EP15-A3 committee felt that the patient comparison experiment had little value as it was, and that users who needed to perform a patient comparison experiment should consult CLSI EP9-A3 "Measurement procedure comparison and bias estimation using patient samples."
Benefit of the EP15-A3 Guideline
EP15-A3 enables the user to verify the claimed imprecision and estimate the bias of a measurement procedure in a single experiment lasting as little as five days. This is valuable when the user wishes to verify precision and to estimate bias relative to a peer group or target concentration. It may be especially useful when patient samples are difficult to obtain for a traditional comparison of methods experiment.
Acknowledge Committee Members
The EP15-A3 document development committee was a team of experts who worked together well. Principal authors were: Walter Hauck, Paul Durham, Anders Kallner, Marina Kondratovich, Jonathan Guy Middle, Merle Smith, James F. Pierson-Perry, and Aparna Srinivasan, with able assistance from CLSI staff member Ron Quicho.