Tools, Technologies and Training for Healthcare Laboratories

W.o.W, part III: Intended Applications and Customers

October 2007

Looking beyond the terminology of Trueness and Uncertainty, Dr. Westgard examines the intended uses and customers of these terms. If we spend our time fighting about metrological definitions, are we serving the patient?

A War of Words In Laboratory Medicine, Part III:
Intended Applications and Customers

It may be helpful to assess these concepts of trueness, accuracy, precision, total error, and uncertainty in the context of the “intended applications” and the “customers” of those applications, i.e., how they have been or are being “framed” for a particular audience. Different customers have different capabilities and/or needs for applying different concepts and terms. In the laboratory, we have traditionally used an “error framework” to understand the quality of tests and methods. In our changing global workplace, uncertainty is certainly of concern to everyone, but it is still uncertain whether the framework of uncertainty will offer any additional practical utility for assessing, controlling, and managing the quality of laboratory testing processes. Nonetheless, estimation of measurement uncertainty may be important to describe the quality that is being achieved in a laboratory. Here’s why!

Intended Applications

Trueness is a funny word that is not commonly used in the English language. Maybe that’s a good reason to use it in metrology. ISO [1] says that trueness is the “closeness of agreement between the average value obtained from a large series of measurements and a true value.” Since it is unlikely that any estimate of a patient test result in a service laboratory will ever be made from a large series of measurements, trueness exists more in theory than in practice. It might be estimated for a few measurands where certified reference materials are available and the laboratory performs an appropriate experiment that includes a large number of replicate measurements on the certified material. For trueness, the real burden of proof falls on manufacturers, who are supposed to assure that their measurement procedures provide correct test results through traceability via reference materials and/or reference methods. To achieve comparability of test results from method to method and country to country, manufacturers must establish the trueness of their analytical methods and systems.

Accuracy, as used by ISO 15189 [1], is the “closeness of agreement between the result of a measurement and a true value of the measurand.” Unfortunately, the true value for any individual test result produced by a service laboratory is unknowable, so this definition of accuracy exists mainly as a theoretical concept. Like it or not, manufacturers and laboratories are left with the “systematic error concept of accuracy,” which is typically estimated from a comparison of methods experiment as the average difference between results by the method of interest and a reference or comparative method, often another field method. In these applications, bias provides a measure of the “residual” trueness, as shown in Figure 1. The bias remaining after correction characterizes the effectiveness of the manufacturer’s procedures for measurement specificity, calibration, standardization, correction, etc. US manufacturers are required by law to make a claim for the accuracy of their analytic measurement products, and that claim typically is based on results from a comparison of methods experiment. Likewise, US laboratories are required by CLIA to verify a manufacturer’s claim for accuracy, which again would typically be done with data from a comparison of methods experiment.
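
To make that estimate concrete, here is a minimal sketch (with hypothetical paired results, not a full CLSI comparison-of-methods protocol) of bias computed as the average difference between paired results:

    from statistics import mean

    def estimate_bias(test_results, comparative_results):
        # Systematic error (bias) estimated as the average difference
        # between paired results from the two measurement procedures.
        return mean(t - c for t, c in zip(test_results, comparative_results))

    # Hypothetical paired patient results (same specimens on both methods)
    test = [5.2, 6.8, 4.9, 7.5, 5.6]
    comp = [5.0, 6.5, 5.0, 7.2, 5.5]
    print(estimate_bias(test, comp))  # ~0.16, a small positive bias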

Precision is an agreeable concept, i.e., the “closeness of agreement between independent test results under prescribed conditions,” which is pretty much agreeable to metrologists and laboratory scientists. But the “prescribed conditions” are subject to some debate, since the results depend on many variables and the conditions under which those variables are estimated. US manufacturers are required to make a claim for precision and typically provide estimates within a single run and for many runs performed over a period of one month. US laboratories are required to verify the manufacturer’s claims, which for CLIA is typically done by performing 20 measurements within a single run and 20 measurements in 20 different runs on 20 different days. Again, both manufacturers and laboratories are customers who have practical applications in characterizing and verifying this performance characteristic.
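
A rough sketch of the calculations behind such a verification (hypothetical data, with far fewer replicates than a real protocol would require):

    from statistics import mean, stdev

    def cv_percent(values):
        # Coefficient of variation (%) = 100 * SD / mean
        return 100.0 * stdev(values) / mean(values)

    # Hypothetical control results (a real verification would use 20 of each)
    within_run = [4.02, 3.98, 4.05, 4.00, 3.97]   # replicates in one run
    across_days = [4.02, 4.10, 3.95, 4.07, 3.99]  # one result per day
    print(cv_percent(within_run))   # within-run (repeatability) CV
    print(cv_percent(across_days))  # day-to-day CV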

Total error was intended for use by clinical laboratory scientists to more rigorously evaluate the effects of measurement performance on the clinical usefulness of test results [2]. The term “allowable total error” was defined as the goal or specification for the amount of error that could be tolerated in test results without invalidating the clinical interpretation for diagnostic and treatment decisions. Practical definitions of allowable total error are often provided by proficiency testing programs or external quality assessment programs, plus there is a large databank on biologic variation that can be used to calculate an allowable biologic total error for some four hundred quantities. While total error was not primarily intended to be communicated to the physician user or patient consumer, it does provide an estimate of error in a form that could be explained to others, avoiding the more technical terms such as imprecision and inaccuracy that are not well understood by users.
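
For illustration, a worked sketch of the usual total error calculation and its comparison against an allowable total error, TEa (all values hypothetical):

    def total_error(bias_pct, cv_pct, z=2.0):
        # TE = |bias| + z * CV; z = 2 corresponds to roughly 95% coverage
        return abs(bias_pct) + z * cv_pct

    te = total_error(bias_pct=1.0, cv_pct=2.0)  # 5.0 (%)
    tea = 10.0        # allowable total error, e.g., from a PT/EQA criterion
    print(te <= tea)  # True: the error fits within the clinical requirement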

Measurement uncertainty is defined by ISO as a “parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand.” ISO 15189 states that “the laboratory shall determine the uncertainty of results, where relevant and possible.” That’s not a very strong statement about the importance and intended use of measurement uncertainty, but advocates emphasize that uncertainty is useful and necessary for a laboratory to:

  • describe the reliability of a measurement,
  • compare the reliability of different measurement procedures,
  • compare results from different measurement procedures,
  • ensure a measurement procedure is fit for its intended clinical use,
  • understand the significance of a test result in comparison to a reference limit,
  • understand the significance of any difference between two test results, and
  • educate users and physicians to properly interpret test results.

These applications identify the laboratory as the primary customer for purposes of assessing reliability and the physician as a secondary customer for utilizing measurement uncertainty to interpret patient test results in the context of their intended clinical uses. But the strongest driving force for doing this is to be accredited by ISO 15189. All these apparent intended uses can be achieved by available concepts, experimental procedures, and data analysis tools that are already part of the “error framework” for managing quality in the laboratory.

Apparent and Intended Customers

In the world of metrology, the intended customers are high-level scientists who want to know the reliability of a value for a certified standard material. In this environment, the intended customers are knowledgeable about the meaning of trueness and measurement uncertainty and are able to incorporate such information into their intended use.

It’s quite different in the world of healthcare! Physicians, while highly educated in medical sciences, have little understanding of analytical measurements and their performance characteristics. Laboratory tests appear to provide very exact information, particularly when the results are printed with many significant figures on a computer report. Therefore, the laboratory must intervene on behalf of the physician as well as the patient to provide test results that are reliable for the intended clinical use.

Similarly, manufacturers must intervene on behalf of laboratories to provide measurement procedures that provide comparable results. Laboratories depend on manufacturers to provide methods with analytical specificity, proper standardization, and calibrator materials that transfer trueness to the individual laboratory application.

Sometimes the apparent customer is not really the intended customer! We need to recognize what must be done in healthcare testing to manage quality and achieve comparable test results from different laboratories. It is important that these activities and responsibilities be properly focused, in the following ways:

  • Trueness must be primarily addressed by manufacturers. If traceability is to be achieved, it will need to be done by manufacturers. Few healthcare laboratories have the time and resources to do this. In the laboratory, the only practical approach is for those assays for which certified reference materials are available and can be used for the traceability of calibration. The practical measure is “bias,” which indicates that the systematic error concept of accuracy applies here.
  • Accuracy, particularly the residual bias of a measurement procedure after appropriate standardization, correction for bias, and proper calibration in the field, must be claimed by manufacturers and verified by laboratories. The practical estimation of bias makes use of the systematic error concept of accuracy.
  • Precision must be claimed by manufacturers and verified by laboratories. Because there may be different experimental conditions for estimation, the manufacturer must describe those conditions as part of the precision claim. The laboratory must then consider those conditions in establishing its own experiment for verification of the claim.
  • Total error should be addressed by both manufacturers and laboratories to establish suitability for “intended use.” Unfortunately, US regulations do not require that manufacturers make any claim for test quality, though the FDA seems to be encouraging manufacturers to provide an estimation of total error for clearance of waived tests. Nonetheless, laboratories should adopt total error criteria for making their own decisions on the acceptability of methods for their intended applications. As stated in EP21-A [3]:

    “it is recommended that for most cases, if one has knowledge of total analytical error and outliers, then one has sufficient information to judge the acceptability of a diagnostics assay.”

    In this case, the intended audience is the laboratory scientist who is involved with method validation, quality control, quality assessment, and quality management. The focus here is to manage analytical quality to assure that test results are fit for their intended clinical use.
  • Measurement uncertainty apparently has some applications intended for laboratory scientists and some intended for the physician user. Laboratory scientists seldom have the mathematical skills of metrologists; thus the recommended GUM methodology is not practical or useful in the laboratory. Physicians seldom have an understanding of measurement variation; thus it is not practical to provide statements of uncertainty and expect physicians to use that information in interpreting test results. In the real world, physicians and patients will likely be surprised to hear that test results are uncertain, disappointed to know they are in doubt, and concerned to find out they may be in error. These customers will not buy the concept of uncertainty, and it will fail in this marketplace. Therefore, measurement uncertainty must be addressed by intermediary customers, primarily the manufacturer and secondarily the laboratory.

Nonetheless, important reference works, such as the Tietz Textbook of Clinical Chemistry [4], promote the ISO/GUM guidance, as shown in the following statement about the intended use and apparent customer of measurement uncertainty:

“The uncertainty concept is directed towards the end user (clinician) of the result, who is concerned about the total error possible, and who is not particularly interested in the question whether the errors are systematic or random…” [4, page 398]

Other strong advocates of measurement uncertainty reveal that the real intended customers are manufacturers, who should identify error sources and make improvements in their measurement procedures. For example, Kristiansen [5] makes the following statements about the need to estimate measurement uncertainty:

[5, page 1822, conclusions section of abstract] “…Information about uncertainty is necessary in the evaluation of the uncertainty associated with manufacturers’ measurement procedures and in general it may force manufacturers to increase their efforts in improving metrological and analytical quality of their products.”

[5, page 1828, concluding paragraph of paper] “…focusing on traceability and uncertainty has the potential to increase pressure on manufacturers of assays so that they increase their efforts to improve the quality of their products. This drive for improvement will include both the analytical quality, i.e., the specificity of the assay, and the metrologic quality of calibrators…”

Thus, one might conclude that total error is of value to physicians for understanding the overall quality of laboratory tests and to laboratories for determining acceptability for intended use; precision and accuracy are of use to laboratory scientists for managing the quality of laboratory tests; and detailed estimates of components of variation are important to manufacturers for improving the trueness and reliability of tests. If the real intended use of trueness and uncertainty is to get manufacturers to improve their analytical methodology and achieve transferability of results across methods, then we should be truthful about the real intent and the real intended customer. “Truth in labeling” is fundamental for all customers and should also be required by standard-setting bodies and their guidance documents.

Utility of the Error Framework in the Real World of a Medical Laboratory

From my perspective, total error was a precursor of measurement uncertainty (actually corresponding to an expanded combined uncertainty with a coverage factor of 2). The similarities can be seen in Figure 2, where the concept of uncertainty is shown as a “top-down” model that can be compared to the “top-down” model for total error. There are mathematical differences in how systematic errors or biases are added to the observed variance, or imprecision, but there is a commonality of purpose to describe the range of values that are implied or associated with a single test result. The recommended GUM uncertainty model is a more detailed bottom-up estimate that characterizes the uncertainty of each individual component or step in the process, then combines all the variances to determine their total effect. That model is impractical in a healthcare laboratory and should instead be aimed at manufacturers and their applications.
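
To illustrate the mathematical difference (an illustrative comparison under simple assumptions, not the GUM procedure): total error adds the bias linearly to the imprecision term, whereas an uncertainty model that treats bias as another variance component combines the two in quadrature:

    import math

    bias, sd = 1.0, 2.0                     # hypothetical systematic error and SD

    te = abs(bias) + 2 * sd                 # total error: bias added linearly -> 5.0
    u_exp = 2 * math.sqrt(bias**2 + sd**2)  # bias folded into the variance, k = 2 -> ~4.47
    print(te, u_exp)

Both describe a range of values around a single test result, which is the commonality of purpose noted above.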

Existing guidelines for target or allowable errors. In assessing the reliability of measurement procedures, both manufacturers and laboratories have responsibilities to evaluate and verify measurement performance characteristics, identify sources of errors or variance, and make an assessment of the acceptability of those errors or variances for the intended clinical use of the test. Here’s where the error framework has a long history of laboratory applications and many guidelines and sources to assist in the definition of the allowable limits of error. An international consensus conference in 1999 recognized a hierarchy of quality specifications, as follows [6]:

  • clinical interpretation criteria, i.e., gray zone between two limits where different diagnostic or treatment decisions are made,
  • biologic goals for the maximum allowable imprecision and maximum allowable bias derived from intra-individual and group biologic variation, which have also been combined to provide a biologic total error goal,
  • analytical total error criteria for acceptable performance in proficiency testing (PT) and external quality assessment programs (EQA),
  • opinions of expert groups (e.g., NCEP), and
  • “state of the art” guidelines based on method performance, e.g., as observed in PT and EQA surveys.

Existing method validation protocols. There already exist standard practices, experimental procedures, and data analysis tools to assess the performance of measurement procedures and validate their acceptability for intended use (against quality requirements or target allowable errors). CLSI provides a series of documents that have been developed by technical committees, subject to consensus review and approval, and further modified and revised after experience in the field. These protocols cover precision, trueness or bias from comparison of methods, interference, linear range, detection limits, and reference intervals. They describe data analysis approaches that employ relatively simple statistical techniques and usually include worksheets to guide the data collection and calculations, making the protocols practical in laboratories today.

Compatibility with Six Sigma strategies and metrics. Given a statement of the amount of error that is allowable, there are tools and metrics that fit into today’s quality management strategies, particularly Six Sigma Quality Management. Manufacturers have traditionally utilized indices of “process capability” to characterize the performance of processes relative to “tolerance limits” or quality specifications. In the 1990s, those indices were transformed to describe quality on a “sigma-scale” and to provide benchmarks for quality across industries and across processes [7]. Laboratory applications demonstrate the usefulness of Six Sigma concepts and metrics for assessment of quality of pre-analytic and post-analytic processes [8], characterization of performance and assessment of the acceptability of new measurement procedures [9], and selection of appropriate QC procedures [10].
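
As a sketch of how an allowable total error translates into a sigma metric in such laboratory applications (the example values are hypothetical):

    def sigma_metric(tea_pct, bias_pct, cv_pct):
        # Sigma-metric = (TEa - |bias|) / CV: the number of SDs of process
        # variation that fit within the allowable total error
        return (tea_pct - abs(bias_pct)) / cv_pct

    print(sigma_metric(tea_pct=10.0, bias_pct=2.0, cv_pct=1.5))  # ~5.3 sigma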

Consideration of intra-individual biologic variation for interpreting serial patient test results. In the interpretation of serial test results, which is critical for the ongoing treatment of patients, measurement variance is one contributor to the uncertainty, but the patient’s own biologic variation must also be considered. Here’s where measurement uncertainty falls short in addressing the intended use and where a solution already exists in the form of the “reference change value,” RCV, an uncertainty term defined by Fraser [11] to include both analytical variation and within-subject biologic variation, as follows:

RCV = 2^(1/2) * Z * (CV_A^2 + CV_I^2)^(1/2)

where Z corresponds to the “coverage factor” (e.g., 1.96 for 95% or 2.33 for 99%), CV_A represents the imprecision or coefficient of variation of the measurement procedure, and CV_I represents the within-individual biologic variation of the patient.
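
In code, the calculation is straightforward; a minimal sketch with hypothetical CVs:

    import math

    def reference_change_value(cv_a, cv_i, z=1.96):
        # RCV = 2^(1/2) * Z * (CV_A^2 + CV_I^2)^(1/2), all in percent
        return math.sqrt(2.0) * z * math.sqrt(cv_a**2 + cv_i**2)

    # Hypothetical analyte: analytical CV 3%, within-subject biologic CV 6%
    print(reference_change_value(3.0, 6.0))  # ~18.6% change required at 95% confidence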

Practicality for communicating information to physicians. Fraser [11] has also developed a practical system for reporting test results to help physicians recognize the importance of results, as follows:

> higher than reference limit

< lower than reference limit

>> higher than reference limit and likely clinically important

<< lower than reference limit and likely clinically important

* significant change (95% confidence level)

** highly significant change (99% confidence level)

In effect, measurement uncertainty is being reported here without actually providing any numerical values of uncertainty itself. Instead, the estimates of uncertainty are used to determine whether a test result should be flagged as an important deviation from a reference limit or an important change from a previous test result. This application takes into account the physicians’ needs for information in a format that can readily be understood and accepted by the customer.
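
A minimal sketch of how such flagging logic might be coded (the function name, parameters, and thresholds are illustrative assumptions, not Fraser’s implementation):

    def flag_result(value, previous, ref_low, ref_high,
                    action_low, action_high, rcv95, rcv99):
        # ref_low/ref_high: reference limits; action_low/action_high: limits
        # beyond which a deviation is likely clinically important;
        # rcv95/rcv99: reference change values (%) at 95% and 99% confidence
        flags = []
        if value > ref_high:
            flags.append(">>" if value > action_high else ">")
        elif value < ref_low:
            flags.append("<<" if value < action_low else "<")
        if previous:  # check for a significant change from the prior result
            delta_pct = abs(value - previous) / previous * 100.0
            if delta_pct > rcv99:
                flags.append("**")
            elif delta_pct > rcv95:
                flags.append("*")
        return " ".join(flags)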

What’s the point?

Trueness, total error, and measurement uncertainty have different roles to play and different players for those different roles. Trueness and measurement uncertainty should primarily be directed to the manufacturers of medical devices and diagnostic materials, who must be responsible for providing testing processes and materials that are traceable and reliable, as well as providing test results that are comparable from method to method, lab to lab, and country to country.

The error framework, with its total error concept and established approaches for setting quality specifications, is much more useful to laboratory scientists who must evaluate method performance, implement QC, and manage the quality of their testing processes to assure achievement of the intended clinical quality of test results.

In healthcare laboratories, the quality of diagnostic tests needs to be managed in the laboratory! Laboratory scientists must assume this responsibility and guarantee the quality of the test results that are produced, rather than just report the uncertainty and expect the physician user and patient consumer to make proper use of the test results. When uncertainty is reported, then the system for reporting must be carefully developed by laboratory scientists to aid and support their customers, e.g., Fraser’s system for flagging test results based on the reference limits and significant changes in serial test results [11].

Our world is full of uncertainty, to be sure, but it becomes more manageable in the laboratory when the focus is on analytical errors. However, if we do not manage quality quantitatively, then we must provide our users and customers with information about the quality being achieved, or the uncertainty of test results. The best approach is to manage quality quantitatively when intended use can be objectively defined and to provide information on uncertainty when intended use has not been well-defined.

References

  1. ISO/FDIS 15189. Medical laboratories – Particular requirements for quality and competence. Geneva, Switzerland: International Organization for Standardization, 2002.
  2. Westgard JO, Carey RN, Wold S. Criteria for judging precision and accuracy in method development and evaluation. Clin Chem 1974;20:825-833.
  3. CLSI EP21-A. Estimation of total analytical error for clinical laboratory methods. Wayne, PA: Clinical and Laboratory Standards Institute, 2003.
  4. Linnet K, Boyd J. Selection and analytical evaluation of methods – with statistical techniques. Chapter 14 in: Burtis CA, Ashwood ER, Bruns DE, editors. Tietz Textbook of Clinical Chemistry and Molecular Diagnostics, 4th ed. St. Louis, MO: Elsevier Saunders, 2006.
  5. Kristiansen J. The Guide to Expression of Uncertainty in Measurement approach for estimating uncertainty: An appraisal. Clin Chem 2003;49:1822-1829.
  6. Petersen PH, Fraser CG, Kallner A, Kenny D. Strategies to set global analytical quality specifications in laboratory medicine. Scand J Clin Lab Invest 1999;59(7):475-585.
  7. Harry M, Schroeder R. Six Sigma: The Breakthrough Management Strategy Revolutionizing the World’s Top Corporations. New York: Currency, 2000.
  8. Nevalainen D, Berte L, Kraft C, Leigh E, Morgan T. Evaluating laboratory performance on quality indicators with the six sigma scale. Arch Pathol Lab Med 2000;124:516-519.
  9. Westgard JO. Six Sigma Quality Design and Control: Desirable precision and requisite QC for laboratory measurement processes. 2nd ed. Madison, WI: Westgard QC, 2006.
  10. CLSI C24-A3. Statistical quality control for quantitative measurement procedures: Principles and definitions; Approved guideline – Third edition. Wayne, PA: Clinical and Laboratory Standards Institute, 2006.
  11. Fraser CG. Biological Variation: From Principles to Practice. Washington, DC: AACC Press, 2001.

James O. Westgard, PhD, is a professor emeritus of pathology and laboratory medicine at the University of Wisconsin Medical School, Madison. He also is president of Westgard QC, Inc., (Madison, Wis.) which provides tools, technology, and training for laboratory quality management.