Tools, Technologies and Training for Healthcare Laboratories

Update on Measurement Uncertainty: CLSI C51A

Uncertainty is an ISO-driven metrological concept. While it has been popular in Europe for years, in the US it has been discussed but never implemented. Now that CLSI has issued its C51A guideline, uncertainty is official in the US, too. The C51A guideline is worth exploring in detail for those who seek metrological orthodoxy in their testing processes.


Update on Measurement Uncertainty: New CLSI C51A Guidance

James O. Westgard, PhD
FEBRUARY 2012

In January 2012, CLSI released a new document on measurement uncertainty: C51A. Expression of Measurement Uncertainty in Laboratory Medicine [1]. This document will be mainly of interest to laboratories that are inspected or accredited under ISO 15189. For US laboratories, there is no requirement under CLIA to estimate measurement uncertainty (MU). Nonetheless, it should be useful to understand what is being recommended and how laboratories can position themselves to have the necessary information to estimate MU. Keep in mind that ISO recommendations are increasingly important for US laboratories. ISO inspired the recent CLSI EP23A guidance on the use of risk analysis to develop QC Plans. CMS has endorsed that approach and will phase out current EQC guidelines in favor of QC Plans based on risk analysis. MU could find its way into US laboratories in a similar manner.

Background

The “bible” on MU is the Guide to the expression of uncertainty in measurement, otherwise known as GUM [2]. The GUM approach emphasizes identifying the many factors that contribute to the variation of measurement results, characterizing the variance of each of those factors, and combining those variances to describe the uncertainty in the final test result. This approach is also described as a “bottom-up” methodology. In contrast, there is also a “top-down” methodology, where the estimates of variation come directly from experimental data, such as method validation experiments or routine Quality Control data.

In the C51A guideline, the greater part of this 55-page document is devoted to the “bottom-up” methodology. That methodology is often more appropriate for manufacturers, who want to identify and evaluate the many individual factors that contribute to the total variation in order to isolate and reduce individual sources if necessary. For those who are stimulated by pages of mathematical equations, this should provide interesting reading. Others who are interested in the simpler “top-down” methodology will find it useful to focus on Section 7 (pages 28-31) and Appendix B (pages 53-55). The “top-down” methodology is more suitable for medical laboratories, where the interest is mainly to characterize the variation expected in the final test results, which can be estimated directly from long-term QC data.

Accounting for random variability

As stated by White [3], the basic parameter of MU is the SD. The top-down approach depends primarily on obtaining a reliable and realistic estimate of the method’s SD or CV. C51A recommends long-term QC data. This refers to QC results on control materials that are analyzed repeatedly over a long period of time. Typically two or three different control materials are used to monitor performance at critical medical decision concentrations. If these QC results are obtained over a period of several months, they can be expected to reflect the contributions of different lots of reagents, calibrations, different lots of calibrators, analyzer pipetting, temperature stability, sensor stability, different operators, different operating conditions, etc., thus providing a realistic estimate of random error that affects the variability of laboratory measurements. We can debate the number of months, but it would be reasonable to consider 3 to 6 months of QC data as recommended in C24A3 for cumulative control limits [4].
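As a rough sketch, the long-term SD and CV can be computed directly from accumulated QC results. The values below are purely illustrative stand-ins, not data from any real control material:

```python
# Hypothetical long-term QC results for one control material (illustrative values
# only; in practice this would be 3 to 6 months of data, per C24A3).
qc_results = [4.1, 4.3, 4.0, 4.2, 4.4, 4.1, 3.9, 4.2, 4.3, 4.0]

n = len(qc_results)
mean = sum(qc_results) / n
# Sample SD (n - 1 denominator), the basic parameter of MU
sd = (sum((x - mean) ** 2 for x in qc_results) / (n - 1)) ** 0.5
cv = 100 * sd / mean  # coefficient of variation, in percent
```

With several months of data, the same simple calculation automatically captures the contributions of reagent lots, calibrations, operators, and operating conditions described above.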

Accounting for the uncertainty in the estimate of bias

In addition to the variability from random error, there is the possibility of systematic error, or bias. Bias may be estimated by analysis of certified reference materials, comparison of patient results between methods (comparison of methods experiment), or from an External Quality Assessment or Proficiency Testing program. Any estimate of bias has its own inherent uncertainty that depends on the experimental conditions. The uncertainty in the estimate of bias should be included in MU, regardless of whether or not bias is corrected. White has described this in practical terms, as follows [3]:

In practice, bias correction and replicate measurements can reduce, but not completely eliminate systematic and random errors, and therefore total error cannot be exactly known. It follows that the true value of a measured quantity cannot be exactly known either. This assumption is fundamental to the MU approach. The MU concept also assumes that if the bias of a procedure is known, then steps are taken to minimize it, e.g., by re-calibration. However, because the bias value cannot be known exactly, an uncertainty will be associated with such a correction. Thus, in the MU concept, a measurement result can comprise two uncertainties (i) that associated with a bias correction (uBias), and (ii) the uncertainty due to random effects (imprecision, uImp). Both of these uncertainties are expressed as SDs which, when combined together, provide the combined standard uncertainty for the procedure (uProc).
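White's combination of uBias and uImp amounts to a root-sum-of-squares of two SDs. A minimal numerical sketch, using hypothetical values in place of a laboratory's own estimates:

```python
import math

# Hypothetical standard uncertainties, both expressed as SDs in the same units
u_imp = 2.0   # uImp: long-term imprecision from QC data
u_bias = 0.8  # uBias: uncertainty associated with the bias correction

# Combined standard uncertainty for the procedure (uProc)
u_proc = math.sqrt(u_imp**2 + u_bias**2)

# Expanded uncertainty with coverage factor k = 2 (~95% interval)
U = 2 * u_proc
```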

Correcting for bias

This is the crux of the problem of applying metrological principles in a medical laboratory! The bias of any measurement procedure must be eliminated when possible, corrected if practical, or ignored if necessary. Clearly, it is preferable to eliminate or correct for bias, but if that is not possible, the ISO and CLSI guidelines ignore bias as a factor contributing to the variation of measurement results. This may be acceptable in a single laboratory that employs a single measurement procedure for a test and establishes its own reference ranges and critical medical decision cutoffs, which allows the laboratory to assume that bias is constant, or remains stable, and thus does not cause any variation of test results. Most laboratories do not operate under these simplistic conditions, thus the bias between routine methods and reference methods will contribute to the values observed for test results and may affect their use and interpretation.

We have long argued that the elimination or correction of bias is not practical in medical laboratories; therefore it is necessary to account for bias in any attempt to characterize the quality of the measurement process [5]. Traditionally, the estimate of “total error” has provided a practical approach for doing this. However, ISO does not recognize the utility of total error because it includes a linear contribution from method bias (i.e., bias is added to the 95%, or 2*SD, estimate of uncertainty). According to strict metrological principles, bias should be eliminated or corrected and therefore should not exist in reported test results. If bias can be completely eliminated or corrected, then only random error exists and the estimate of total error reduces to the estimate of random error. If bias cannot be completely corrected or eliminated, then we recommend that it be included when characterizing the expected variation of the final test results.

Unfortunately, C51A does not resolve this issue. The document does recognize the concept of “total error”, but discourages its use for estimating MU:

Traditionally, a so-called total error for a measured quantity value is the calculated sum of two terms. The first term, the total systematic error, is based on observations or literature and expressed as the mean of the difference between observed values and the reference or target value. The second term is an estimate of the random measurement variation, ie, the SD of the observed differences multiplied by a coverage factor, according to the desired level of confidence. The sum of the two terms is an upper limit of the total error of a measurement, assuming random error follows a Gaussian distribution.

If a quantity for which a total error was calculated is used as input to another measurement, the total error has to be separated into its systematic and random components before they can be combined with those of the other input quantities in a measurement model. This lack of transferability is an important drawback of the error model.

The reasoning is that any estimate of MU needs to be combinable with other uncertainty components, which is done by squaring the SDs, adding the variances, then extracting the square root as the estimate of combined uncertainty. One application that is discussed in C51A is the need to combine uncertainties to estimate MU for calculated quantities, such as creatinine clearance, glomerular filtration rate, anion gap, etc. Other important applications involve adding the effects of pre-analytic variables, such as sampling variation and individual biologic variation. Note that error models can also accommodate additional sources of random variation, as well as additional sources of systematic errors, as recently discussed on this website (see Total Analytic Error and the Brain to Brain Loop).
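For a calculated quantity formed by sums and differences, such as the anion gap (Na - Cl - HCO3), the standard uncertainties of the inputs combine in quadrature, as described above. A short sketch with assumed (illustrative) uncertainties:

```python
import math

# Hypothetical standard uncertainties (SDs) for the three electrolytes, mmol/L
u_na, u_cl, u_hco3 = 1.0, 1.0, 0.8

# For sums and differences, square the SDs, add the variances,
# then take the square root to get the combined standard uncertainty
u_gap = math.sqrt(u_na**2 + u_cl**2 + u_hco3**2)
```

Note that this simple quadrature form applies only to additive combinations; quantities involving ratios or other functions (e.g., glomerular filtration rate) require sensitivity coefficients from the measurement model.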

Correcting for uncorrected bias

C51A doesn’t resolve this issue of what to do about uncorrected bias, except to make reference to a paper by Magnusson and Ellison that examines different ways to treat uncorrected bias in estimates of MU [6]. These authors first examine cases where bias corrections are not possible or not practical, then conclude that there are many situations that require laboratories to incorporate bias in reporting MU.

Routine laboratories are necessarily faced with the problem of treating uncorrected bias. For comparability of measurement results more guidance on bias and bias corrections is needed to help the laboratories in their work and to minimize differences in interpretation arising from different approaches to the treatment of bias. Given an observed bias or other strong reason for suspecting bias, it is misleading to report uncorrected results without reflecting the resulting bias. The options available are then:

1. Report the result and its uncertainty together with the bias (or the correction) and its uncertainty.

2. Report the result with an increased uncertainty interval.

The first option puts the burden of interpreting the meaning of bias and MU on the consumers of the test results. The second option would be more practical in medical laboratories, and the authors evaluated several approaches for increasing the uncertainty interval to include the effect of uncorrected bias.

On the basis of current studies, and taking into account testing laboratory needs for a simple and consistent approach with a symmetric interval, we conclude that for most cases with large degrees of freedom, linear addition of a bias term adjusted for exact coverage as described by Synek is to be preferred.

Therefore, the recommendation is to add the estimate of bias linearly to the expanded combined uncertainty (95% interval) formed from the observed long-term imprecision and the uncertainty in the estimate of bias. This approach is actually consistent with the way bias is handled in the total error model, and this estimate of MU can be expected to be slightly larger than the total error estimate because it includes the uncertainty of the estimate of bias.
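A minimal sketch of this linear-addition approach, using hypothetical values and the simple coverage factor k = 2 rather than Synek's exact-coverage adjustment:

```python
import math

# Hypothetical inputs, all in the same units as the result
bias = 1.5    # observed uncorrected bias
u_imp = 2.0   # long-term imprecision (SD) from QC data
u_bias = 0.5  # uncertainty of the bias estimate (SD)

k = 2  # coverage factor for ~95% interval

# Expanded MU: bias added linearly to the expanded combined uncertainty
# of imprecision and the bias-estimate uncertainty
U = k * math.sqrt(u_imp**2 + u_bias**2) + abs(bias)
```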

C51A “top-down” approach

Long-term QC data should be used in order to include variations from changes in reagent lots, calibration, calibration lots, different operators, routine maintenance, etc. C51A recommends that important factors that contribute to variation be identified and the QC data be subjected to an Analysis of Variance (ANOVA) to estimate the various components of variation. C51A provides an example for creatinine that shows 5 replicates obtained for each of 5 different runs to illustrate the use of ANOVA. This example suggests that one might employ a short protocol, such as recommended in CLSI EP15, for the initial estimates of bias or trueness, precision, and measurement uncertainty.

A second example is provided in Appendix B. Those data illustrate triplicate measurements on one control material over a period of 42 runs with multiple operators and 2 different lots of reagents. ANOVA gives an estimate of 6.1% for the measurement CV, which is multiplied by a coverage factor of 2 to provide an MU estimate of ±12.2%. By comparison, simple calculation of the SD from all 126 measurements gives a CV of 5.8%, or an MU of 11.6%. The difference between the MU estimates of 12.2% and 11.6% is small; for practical purposes, both represent an estimate of 12%. Perhaps the simple calculation of the SD or CV from existing long-term QC data is a reasonable way to get started estimating MU.
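The Appendix B calculation can be sketched as a one-way ANOVA on replicated QC runs, separating within-run and between-run components. The data below are invented for illustration and far smaller than the 42-run example:

```python
import math

# Hypothetical triplicate QC results from a handful of runs
# (illustrative, not the Appendix B data)
runs = [
    [98.0, 101.0, 100.0],
    [103.0, 104.0, 102.0],
    [97.0, 99.0, 98.0],
    [101.0, 100.0, 102.0],
]

r = len(runs)     # number of runs
n = len(runs[0])  # replicates per run
grand = sum(sum(run) for run in runs) / (r * n)

# One-way ANOVA sums of squares
ss_within = sum((x - sum(run) / n) ** 2 for run in runs for x in run)
ss_between = n * sum((sum(run) / n - grand) ** 2 for run in runs)

ms_within = ss_within / (r * (n - 1))
ms_between = ss_between / (r - 1)

# Variance components: within-run, plus between-run (clamped at zero
# in case the between-run mean square falls below the within-run one)
var_within = ms_within
var_between = max((ms_between - ms_within) / n, 0.0)

sd_total = math.sqrt(var_within + var_between)
cv_total = 100 * sd_total / grand  # measurement CV, in percent
mu_expanded = 2 * cv_total         # expanded MU with coverage factor 2
```

As the Appendix B comparison suggests, the simple SD of all the pooled results will usually land close to this ANOVA-based estimate when runs are reasonably balanced.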

The value of these examples is to show there are different ways of estimating MU and the experimental procedures and data calculations need not be overly complicated. For practical applications, it is a matter of working out a clear experimental protocol together with related calculation tools to make estimations of MU doable in busy production laboratories. It remains for those of us in the laboratory to provide those practical protocols and calculation tools.

What to do?

When a new method is being introduced into the laboratory, it is critical to evaluate its reportable range, imprecision, bias, reference intervals, and possibly limit of detection, recovery, and interference. To this list, we should now add MU, which could be estimated from the replication and comparison experiments in the initial evaluation study. These initial estimates will most likely be optimistic (low or small), but once the method is in routine use, the laboratory can collect long-term QC data to come up with better estimates of MU.

Estimates of long-term precision and on-going bias are the keys to providing realistic estimates of measurement uncertainty. There will be different sources for those estimates of random and systematic errors at different times in the life-cycle of a testing process. For estimating bias from CRMs, comparison studies, EQA or PT results, it will be necessary to work out the calculations to estimate both bias and the uncertainty of the estimate of bias. Depending on the design of QC protocols, it may be possible to employ simple calculations of the SD or CV, or it may be useful to employ ANOVA calculations. The good news is that the CLSI C51A guideline allows medical laboratories the flexibility to employ various “top-down” designs for estimating MU.

References

  1. CLSI C51A. Expression of Measurement Uncertainty in Laboratory Medicine. CLSI, Wayne, PA 2012.
  2. Guide to the expression of uncertainty in measurement. ISO, Geneva, 2008.
  3. White GH. Basics of estimating measurement uncertainty. Clin Biochem Rev 2008;29:S53-S60.
  4. CLSI C24A3. Statistical Quality Control for Quantitative Measurement Procedures: Principles and Definitions. CLSI, Wayne, PA 2006.
  5. Westgard JO. Managing quality vs. measuring uncertainty in the medical laboratory. Clin Chem Lab Med 2010;48:31-40.
  6. Magnusson B, Ellison SLR. Treatment of uncorrected measurement bias in uncertainty estimation for chemical measurements. Anal Bioanal Chem 2008;390:201-213.