Tools, Technologies and Training for Healthcare Laboratories

W.o.W., part II: Concepts and Terminology

October 2007

Trueness. Uncertainty. Accuracy. Precision. Why are there so many definitions for the same terms? Dr. Westgard traces the history of metrological concepts in the clinical laboratory. See when and where ISO, IFCC, and CLSI began introducing new terminology into the lab.

A War of Words In Laboratory Medicine: Part II:
Concepts and Terminology in a Changing, Uncertain World

Influences on the terminology of the laboratory

Part I in this series described the differences of opinion about the usefulness of different concepts and terms, focusing on measurement uncertainty and total error. To better understand the context for this discussion, it may be helpful to trace the evolution of these concepts and the related terminology.

In the 1950s and 60s, analytical chemists and metrologists maintained a complete separation between accuracy and precision, with accuracy being considered more important than precision. Then clinical chemists asserted the importance of both precision and accuracy and their combined effect in the concept of total error. Today metrologists assert that trueness and uncertainty have become important in the global marketplace and that healthcare laboratories need to get on board with these terms and concepts. ISO 15189 [1], an international standard specifically intended for medical laboratories, now brings that assertion into the real world of healthcare laboratories. It incorporates guidance from the world of metrology, both through the VIM vocabulary [2] and the GUM methodology for estimating measurement uncertainty [3]. Leading clinical chemists advocate this approach [4], so laboratories need to understand what is going on and what they might do about it.

Influence of analytical chemistry.

In the 1950s and 60s, analytical chemists characterized the performance of measurement procedures in terms of accuracy and precision, or systematic and random errors, respectively, which are illustrated in Figure 1. I trained as an analytical chemist, completing my PhD degree in 1968, and can still remember struggling to eliminate every last little bit of systematic error in the measurement process that I was developing for my thesis project. Random error was not a big issue because replicate measurements could be made to compensate for imprecision and reduce its effects. Systematic error was the real issue in an analytical research laboratory, and we were held accountable for eliminating the systematic errors, down to the last few hundredths of a percent.

Figure 1. Traditional Concepts of Precision and Accuracy

Influence of metrology.

Similar terminology could be found in metrology and standards laboratories such as the National Bureau of Standards (NBS) [5], now known as the National Institute of Standards and Technology (NIST). This terminology, called “classical” in the current metrology guidelines, is defined in Table 1. The issue of “uncertainty” was also under discussion and NBS recommended that the “uncertainties of final results” be expressed in one of four ways: (Case 1) both systematic error and imprecision negligible, (Case 2) systematic error not negligible, imprecision negligible, (Case 3) neither systematic error nor imprecision negligible, and (Case 4) systematic error negligible, imprecision not negligible [6]. Uncertainty was attributed to the errors in the measurement procedure, but those errors were treated in the classical manner with separation of systematic error and imprecision. At this time, uncertainty related to imprecision and did not yet have a separate or distinct definition.

Table 1. Classical Terminology in analytical chemistry [VIM, 2]
Quality: the totality of features and characteristics of a product or service that bear on its ability to satisfy stated or implied needs (ISO 1994);
Accuracy: closeness of agreement between a quantity value obtained by measurement and the true value of the measurand;
Error of measurement: difference of quantity value obtained by measurement and the true value of the measurand;
Random error of measurement: difference of quantity value obtained by measurement and average that would ensue from an infinite number of replicated measurements of the same measurand carried out under repeatability conditions;
Systematic error of measurement: difference of average that would ensue from an infinite number of replicated measurements of the same measurand carried out under repeatability conditions and true value of the measurand;
Maximum permissible error: one of the two extreme values of the error of indication permitted by specifications or regulations for a given measuring system;
Error of indication: difference of indication of a measuring system and true value of the measurand.

Influence of clinical chemistry.

When I began working as a clinical chemist, I was surprised to see that replicate measurements were NOT commonly made in clinical laboratories, owing to the high workload of a production laboratory. Without replicate measurements, precision had a large impact on the quality of the test result and many clinical analysts were actually more concerned with precision than accuracy. That perspective challenged the systematic error concept of accuracy that was being applied in analytical and metrology laboratories. The reality of errors in a clinical laboratory was (and still is) that both precision and accuracy have an impact on the final test result and both need to be carefully evaluated and managed to assure the quality of laboratory tests.

With the recognition that both accuracy and precision were important, it was necessary to consider their total effect when the performance of new analytical methods was validated for laboratory use. In developing method validation protocols, we described this total effect as “total error” to emphasize a “total error concept of accuracy,” as illustrated in Figure 2. At that time we argued [7]:

“None of this terminology [precision, accuracy, systematic error, random error] is familiar to the physician who uses the test values, therefore he [she] is seldom able to communicate with the analyst in these terms. The physician thinks rather in terms of the total analytical error, which includes both random and systematic components. From his point of view, all types of analytic error are acceptable as long as the total analytical error is less than a specified amount. The total analytical error is medically more useful; after all, it makes little difference to the patient whether a laboratory value is in error because of random or systematic error, and ultimately he is the one who must live with the error.”

Figure 2. Concepts of Random, Systematic, and Total Error
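To make the combined effect concrete, here is a minimal simulation of the concepts in Figure 2. The true value, bias, and SD are hypothetical numbers chosen only for illustration, and the linear combination of the bias plus a multiple of the SD is the worst-case estimate described above.

# Minimal sketch of random, systematic, and total error (hypothetical values).
import numpy as np

rng = np.random.default_rng(0)
true_value = 100.0   # the "true" concentration of the sample
bias = 2.0           # systematic error of the method
sd = 1.5             # random error (imprecision) of the method

results = true_value + bias + rng.normal(0.0, sd, size=1000)

observed_bias = results.mean() - true_value         # estimate of systematic error
observed_sd = results.std(ddof=1)                    # estimate of random error
total_error = abs(observed_bias) + 2 * observed_sd   # worst-case combined effect

print(f"bias = {observed_bias:.2f}, SD = {observed_sd:.2f}, total error = {total_error:.2f}")

With these hypothetical numbers, roughly 95% of individual results would be expected to fall within the total error of the true value.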

The “error terminology,” as defined in Table 2, was advantageous in the laboratory to emphasize the purpose of method evaluation studies and the strategy of performing different experiments to uncover different types of errors. In addition, the analysis of the experimental data was more understandable when the statistics were recognized as tools for estimating the sizes of different types of analytic error [8]. Finally, the decision on the acceptability of a new method became much more logical by comparing the size of the errors observed to the amount of error that was allowable for the particular test [9]. Thus, this error terminology was aimed internally to help laboratory analysts understand how to evaluate the overall quality of a testing process, and when of interest, to be able to describe and communicate the overall quality of a test to the physician.

It took five to ten years for this total error concept to become accepted in the clinical laboratory community. There also have been some recommendations to alter the concept and the method of estimation. Levine and Miller [10] commented that the linear combination of bias + 2 SD should be replaced by squaring the terms, summing them, and then extracting the square root, much like today’s methodology for estimating uncertainty. Lawton, Sylvester, and Young recommended expanding the concept to include random interferences [11]. Krouwer developed a more rigorous methodology for estimation [12] by determining the direct differences by comparison to a reference quality method, then plotting a histogram of those differences (called a “mountain plot”). In 2003, Krouwer chaired a CLSI committee that developed a consensus standard on “Estimation of Total Analytical Error for Clinical Laboratory Methods” [13]. This document is commonly used today by manufacturers when formulating a performance claim for a waived test to obtain approval from the FDA for a new product. And it provides a somewhat official and widely accepted definition:

“…total analytical error is used to describe the following concepts:

1) the interval that contains a specific proportion (usually 90, 95, or 99%) of the distribution of differences in concentration between the test and reference methods, Note a) For example, 97.2% of the differences between the test and reference method fell within the limits of ± 4 mmol/L, hence the 95% total error goal was met.

2) ‘the result of a measurement minus a true value of the measurand,’ which is the VIM (93-3.10) definition of the term ‘error of measurement’. Note b) Both ‘total analytical error’ and ‘error of measurement’ contain random and systematic effects.”

Thus, total error is recognized today in clinical chemistry as a standard and accepted concept. And it is interesting that VIM [Table 1 classical terminology] does recognize a similar concept with its “error of measurement” and also a worst case estimate of error with its term “maximum permissible error” which describes “one of the two extreme values of the error of indication permitted by specifications or regulations for a given measuring system.”
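As a rough illustration of the first, distribution-of-differences reading of the definition quoted above, the sketch below takes paired results from a test and a reference (comparative) method, computes the differences, and finds the interval containing about 95% of them. The data values and the ±0.4 mmol/L goal are hypothetical.

# Rough sketch of a distribution-of-differences total error estimate
# (in the spirit of the quoted concept; all values are hypothetical).
import numpy as np

test_results = np.array([4.1, 3.9, 5.2, 6.0, 4.8, 5.5, 3.7, 4.4, 5.1, 4.9])
ref_results  = np.array([4.0, 4.0, 5.0, 5.8, 4.9, 5.3, 3.9, 4.3, 5.0, 5.0])

differences = test_results - ref_results

# Central interval containing ~95% of the differences (2.5th to 97.5th percentiles).
lower, upper = np.percentile(differences, [2.5, 97.5])
print(f"95% of differences fall within [{lower:.2f}, {upper:.2f}] mmol/L")

# Check against a hypothetical allowable total error of +/- 0.4 mmol/L.
TEa = 0.4
print("Total error goal met" if lower >= -TEa and upper <= TEa else "Goal not met")

In practice many more paired results would be needed for the percentiles to be meaningful; the point here is only the logic of the calculation.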

Table 2. Clinical Chemistry Error Terminology [9]
Accuracy, inaccuracy, precision, imprecision: Same as IFCC definitions.
Random analytical error: An error that can be either positive or negative, the direction and exact magnitude of which cannot be predicted. In contrast, systematic errors are always in one direction.
Systematic analytic error: An error that is always in one direction, in contrast to random errors that may be either positive or negative and whose direction cannot be predicted.
Proportional systematic error: An error that is always in one direction and whose magnitude is a percentage of the concentration of analyte being measured.
Constant systematic error: An error that is always in the same direction and of the same magnitude, even when the concentration of the analyte changes.
Total error: The net or combined effect of the random and systematic errors.
Total error specification, allowable total error, TEa: The total amount of analytical error that can be tolerated without invalidating the medical usefulness of the analytical result. TEa can be used to decide the acceptability of a measurement procedure in method evaluation testing, or to calculate the size of medically important errors to aid in the selection or design of control procedures. When applied to method evaluation testing, we recommend that TEa be used as a 99% limit of error so that only 1 sample in 100 will have a greater amount of error; this allows a defect rate of 1% when the analytical process is under stable operation. When applied as a quality specification, we recommend that TEa be used as a 95% limit of error, implying a maximum defect rate of 5% when the process experiences unstable operation. (A simple numerical sketch of this acceptability decision follows the table.)
Medically important errors: Those errors that, when added to the inherent imprecision and inaccuracy of a measurement procedure, cause the total error specification to be exceeded.
Medical usefulness: The concept that the requirements for the performance of an analytical process depend on how the analytical results are used and interpreted.
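Here is a small sketch of the acceptability decision described in the TEa entry above. The observed bias, SD, allowable total error, and the 2.58/1.96 multipliers for the 99% and 95% limits are illustrative assumptions, not values from any specific method.

# Sketch of judging method acceptability against an allowable total error, TEa.
# All numbers are hypothetical; multipliers approximate the 99% and 95% limits.
bias = 1.0   # observed systematic error, mg/dL
sd   = 1.2   # observed imprecision (SD), mg/dL
TEa  = 6.0   # allowable total error for this test, mg/dL

te_99 = abs(bias) + 2.58 * sd   # ~99% limit, suggested for method evaluation
te_95 = abs(bias) + 1.96 * sd   # ~95% limit, suggested as a quality specification

print(f"99% total error estimate = {te_99:.2f} mg/dL (TEa = {TEa})")
print("Method acceptable" if te_99 <= TEa else "Method not acceptable")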

Influence of IFCC recommendations.

In the 1970s, the major global influence in clinical chemistry came from the International Federation of Clinical Chemistry (IFCC), which published a series of landmark papers on “quality control in clinical chemistry.” One of these, published in 1976, dealt with terminology [14], as summarized in Table 3. The IFCC replaced accuracy with “inaccuracy” and precision with “imprecision.” The intent was to emphasize the “differences” that could be characterized quantitatively by statistical estimates of bias for inaccuracy and standard deviation for imprecision. While this emphasis on “differences” should have reinforced the error terminology, it caused a lot of confusion at that time. Nonetheless, imprecision and inaccuracy became the accepted terminology and bias became a common term for inaccuracy. Total error continued to be commonly used to describe the worst case estimate of error, but it was not part of the IFCC official terminology.

Table 3. Traditional terminology in clinical chemistry [IFCC, 14]
Analytical method: Set of written instructions which describe the procedures, materials, and equipment, which are necessary for the analyst to obtain a result.
Analytical run: This usually refers to a set of consecutive assays performed without interruption. The results are usually calculated from the same set of calibration standard readings. However, this definition may not be universally applicable, and in those cases the word series should be used after defining it.
Accuracy: Agreement between the best estimate of a quantity and its true value. It has no numerical value.
Inaccuracy: Numerical difference between the mean of a set of replicate measurements and the true value. This difference (positive or negative) may be expressed in the units in which the quantity is measured, or as a percentage of the true value.
Precision: The agreement between replicate measurements. It has no numerical value.
Imprecision: Standard deviation or coefficient of variation of the results in a set of replicate measurements. The mean value and number of replicates must be stated, and the design used must be described in such a way that other workers can repeat it. This is particularly important whenever a specific term is used to denote a particular type of imprecision, such as between-laboratory, within-day, or between-day.
Analytical error: Difference between the estimated value of a quantity and its true value. This difference (positive or negative) may be expressed either in units in which the quantity is measured, or as a percentage of the true value.

Influence of NCCLS (CLSI) consensus standards.

Also in the mid-1970s, the National Committee for Clinical Laboratory Standards (NCCLS) was formed to provide consensus agreement on “good laboratory practices.” One group of standards, managed by the Evaluation Protocols Area Committee, focused on experimental and statistical guidelines for characterizing the performance of analytical systems. In the 1990s, a major effort of NCCLS was the establishment of a National Reference System for Clinical Laboratories (NRSCL). The National Bureau of Standards was a major player in this effort, thus increasing the influence of metrology on the concepts and terminology, as shown by the NRSCL guideline for terminology [15]. Table 4 shows the extensive definitions for accuracy, bias, precision, and error. Note that the terminology includes trueness and uncertainty, which are referenced to ISO and VIM. This was perhaps the earliest indication that trueness and uncertainty would make their way into the terminology and language intended for US laboratories.

Table 4. NCCLS/CLSI terminology for documents and standards [1996, 15]
Accuracy, Measurement accuracy, Result accuracy: Closeness of the agreement between the result of a measurement and a true value of the measurand (VIM93-3.5).
Bias
  1. Statistics, the difference between the expected or mean value of an estimator and the value of the parameter it is estimating (RHUD1.7CD);
  2. A systematic, as opposed to a random, distortion of a statistic;
  3. Analytical science, a signed (+,-) quantitative measure of systematic departure from accuracy under specified conditions of analysis;
  4. The systematic deviation of the test results from the accepted reference value (WHO-BS/95.1793);
  5. The difference between the expectation of the test results and an accepted reference value (ISO3534-1/93-3.13);
  6. The systematic deviation of test results from the accepted reference value (WHO-BS/95.1793);
  7. Inter-instrument bias, the difference observed by comparing two specified instruments under specified conditions of analysis, concentration range, method, etc.;
  8. Inter-method bias, the difference observed by comparing two specified methods under specified conditions of analysis;
  9. Inter-laboratory bias, the difference observed by comparing two laboratories that perform the measurement of the same analyte under specified conditions;
  10. Result bias, the difference observed between a result and the true or expected value.
Precision
  1. The closeness of agreement between independent test results obtained under prescribed conditions (ISO Guide 3);
  2. Closeness of agreement between a series of measurements, under specified conditions, of a substance or biological product (WHO-BS/95.1793);
  3. The closeness of agreement between independent test results obtained under stipulated conditions (ISO3534-1-3.14);
  4. Agreement between replicate measurements.

NOTE: Precision is not typically represented as a numerical value but is expressed quantitatively in terms of imprecision – the SD or the CV of the results in a set of replicate measurements.

Error
  1. Deviation from truth or from an accepted, expected true or reference value;
  2. Measurement error, result of a measurement minus a true value of a measurand (VIM93-3.10);
  3. Random error, the nondirectional, patternless differences between successive results obtained with an analytical process;
  4. Result of a measurement minus the mean that would result from an infinite number of measurements of the same measurand carried out under repeatability conditions.
    NOTE a) Random error equals error minus the systematic error (VIM93-3.14);
  5. A directional or patterned difference between the value obtained and that accepted as true or expected.
    NOTE: d) estimated independently of random error by averaging replicates, it is expressed in the units of the method as a bias, and it is calculated as the average difference between the values expected and obtained, or as a relative bias by dividing the bias by the average of the results;
  6. Systematic error, mean that would result from an infinite number of measurements of the same measurand carried out under repeatability conditions, minus a true value of the measurand.
    NOTES: b) Systematic error is equal to error minus random error; c) like true value, systematic error and its causes cannot be completely known (VIM93-3.14);
  7. Proportional error, systematic error that is directly proportional to analyte concentration, intensity, or activity.
Trueness: The closeness of agreement between the average value obtained from a large series of test results and an accepted reference value (ISO 3534-1-3.12)
Uncertainty
  1. The stated range on either side of the best estimate of any given value within which that value may be expected to lie with some expressed degree of confidence.
  2. Measurement uncertainty, Uncertainty of measurement, parameter [and/or characteristic], associated with the result of measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand.
    NOTES: a) The parameter may be, for example, a standard deviation (or given multiple of it), or the half width of an interval having a stated level of confidence; b) uncertainty of a measurement comprises, in general, many components. Some of these components may be evaluated from the statistical distribution of the results of a series of measurements and can be characterized by experimental standard deviations. The other components, which can also be characterized by standard deviations, are evaluated by assumed probability distributions based on experience or other information; c) it is understood that the result of the measurement is the best estimate of the value of the measurand, and that all components of uncertainty, including those arising from systematic effects, such as components associated with corrections and reference standards, contribute to the dispersion (VIM93-3.9).

Influence of ISO global standards.

With the publication in 2003 of ISO 15189, a global quality standard became available that was intended for medical laboratories [1]. Earlier ISO standards had sometimes been utilized for medical laboratories, even though those standards were intended for general applications, e.g., the ISO 9000 series for Quality Management Systems, or ISO 17025 for specific applications in metrology laboratories [General Requirements for the Competence of Testing and Calibration Laboratories]. ISO 17025 emphasized the concepts of trueness and uncertainty following the Guide to the Expression of Uncertainty in Measurement [GUM, 3].

Two principles are important in understanding GUM and its application. First, any systematic differences are to be corrected in order to make test results comparable across laboratories and countries by establishing traceability to reference methods and materials. While that is understandable in principle, it is difficult in practice because there are relatively few reference materials and methods for the tests performed in medical laboratories. Second, assuming that all systematic differences can be eliminated (or corrected), there supposedly will remain only random differences that can be characterized and described in terms of measurement uncertainty, which should then be reported to the consumer of the product, in our case, the physician customer on behalf of the patient consumer.

In this ISO/GUM world, “trueness” depends on a “traceable value,” as shown in Figure 3, where “traceable value” replaces the earlier “true value” which could never be known exactly. Of course, the traceable value can’t be known exactly either and its correctness must be assessed from the traceability chain and described in terms of “measurement uncertainty.” In this world, accuracy now becomes the error of an individual result, which has certain similarities to total error in that it can be affected by both random and systematic errors, but they are now considered different sources of variance, not different types of errors.

Figure 3. ISO concepts of Trueness and Accuracy

With the development of ISO 15189, the concepts and terminology that had been applied to testing and calibration laboratories, i.e., metrology laboratories, were applied to medical laboratories, as summarized in Table 5. We now live in the world of accuracy, trueness, precision, and uncertainty. New CLSI documents demonstrate that this terminology is being adopted for US laboratories, e.g., EP15-A2 is titled “User Verification of Performance for Precision and Trueness” [16]. In addition, CLSI has a project underway (C51) to produce a document on “Expression of Uncertainty of Measurement in Clinical Laboratory Medicine.”

Table 5. ISO terminology for medical laboratories [1]
Quality: Degree to which a set of inherent characteristics fulfills requirements (ISO 2005);
Measurand: Quantity intended to be measured;
Accuracy of measurement: Closeness of the agreement between the result of a measurement and a true value of the measurand.
Trueness of measurement: Closeness of agreement between the average value obtained from a large series of measurements and a true value.
Precision: Closeness of agreement between quantity values obtained by replicate measurements of a quantity, under specified conditions.
Uncertainty of measurement: Parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand.
Target measurement uncertainty: Measurement uncertainty formulated as a goal and decided on the basis of a specific intended use of measurement results.

To understand these more detailed guidelines and discussions about the estimation of uncertainty, it will be necessary to become more familiar with the GUM terminology, particularly those terms given in Table 6 and illustrated in Figure 4. An estimate of uncertainty can be provided in the form of a standard deviation (called the Standard Uncertainty). Estimates from multiple components of a measurement process can be combined by adding the variances of the individual components, then taking the square root of the combined variance (called the Combined Standard Uncertainty). Those component variances can be estimated experimentally (called Type A uncertainty) or theoretically (called Type B uncertainty). Finally, the uncertainty can be expressed as a confidence interval with a stated coverage factor (an Expanded Uncertainty or Expanded Combined Uncertainty with a coverage factor of 2 for a 95% interval).

Figure 4. ISO Uncertainty Concept and Terminology
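The arithmetic behind these terms can be shown in a few lines. The component uncertainties below are hypothetical, and a coverage factor of 2 is used for an approximately 95% interval.

# Sketch of combining standard uncertainties per the GUM approach
# (hypothetical, uncorrelated components; variances add, then take the root).
import math

u_repeatability = 0.8   # Type A: SD estimated from replicate measurements
u_calibrator    = 0.5   # Type B: taken from a calibrator certificate
u_volume        = 0.3   # Type B: assumed from pipetting specifications

u_combined = math.sqrt(u_repeatability**2 + u_calibrator**2 + u_volume**2)
U_expanded = 2 * u_combined   # expanded uncertainty with coverage factor k = 2

print(f"combined standard uncertainty u = {u_combined:.2f}")
print(f"expanded uncertainty U (k = 2)  = {U_expanded:.2f}")

A result might then be reported as the measured value plus or minus U, with the understanding that the interval is expected to cover roughly 95% of the values that could reasonably be attributed to the measurand.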

Table 6. Additional GUM/ISO Uncertainty Terms
Type A uncertainty: An uncertainty component evaluated from a statistical analysis of series of observations (GUM)
Type B uncertainty: An uncertainty component evaluated by means other than the statistical analysis of observations (GUM)
Standard uncertainty: Uncertainty of the result of a measurement expressed as a standard deviation.
Combined standard uncertainty: Standard uncertainty of the result of a measurement when that result is obtained from the values of a number of other quantities, equal to the positive square root of a sum of terms, the terms being the variances or covariances of these other quantities weighted according to how the measurement result varies with changes in these quantities.
Expanded uncertainty: Quantity defining an interval about the result of a measurement that may be expected to encompass a large fraction of the distribution of values that could reasonably be attributed to the measurand;
NOTE 1. The fraction may be viewed as the coverage probability or level of confidence of the interval;
NOTE 2. To associate a specific level of confidence with the interval defined by the expanded uncertainty requires explicit or implicit assumptions regarding the probability distribution characterized by the measurement result and its combined standard uncertainty. The level of confidence that may be attributed to this interval can be known only to the extent to which such assumptions may be justified;
NOTE 3. Expanded uncertainty is termed overall uncertainty in paragraph 5 of Recommendation INC-1 (1980). (GUM)

What’s the point?

Boring, boring, I can almost hear your thoughts as you struggle through all these terms and definitions! The right words are important for communication in the global marketplace. But, our scientific language should also be understandable and useful to the people who work in our laboratories. Changing the words and the language can itself create new obstacles to understanding what needs to be done to guarantee the quality of the test results being produced!

I believe that the concept of total error leads to a more practical estimate of test quality for the laboratory than the more complex estimate of measurement uncertainty from the GUM methodology. And I will argue that ISO inspectors should keep an open mind about how laboratories estimate the uncertainty of their test results. It will be much more practical to provide a top-down estimate of total error than a bottom-up estimate from the GUM methodology. Both have their preferred applications in laboratory medicine and I’ll talk more about that in part III of this series.

References

  1. ISO/FDIS 15189. Medical laboratories – Particular requirements for quality and competence. International Organization for Standardization, Geneva, Switzerland, 2002.
  2. International Vocabulary of Basic and General Terms in Metrology (VIM). 3rd ed. Draft April 2004. Annex A.
  3. GUM. Guide to the expression of uncertainty in measurement. ISO, Geneva, 1995.
  4. Dybkaer R. Setting quality specifications for the future with newer approaches to defining uncertainty in laboratory medicine. Scand J Clin Lab Invest 1999;59:579-584.
  5. Precision measurement and calibration: Statistical concepts and procedures. NBS special publication 300, vol 1. National Bureau of Standards, US Department of Commerce, 1969.
  6. Eisenhart C. Realistic evaluation of precision and accuracy in instrument calibration systems. J Res NBS. 1963;67(C):161-187.
  7. Westgard JO, Carey RN, Wold S. Criteria for judging precision and accuracy in method development and evaluation. Clin Chem 1974;20:825-833.
  8. Westgard JO, Hunt RM. Use and interpretation of common statistical tests in method-comparison studies. Clin Chem 1973;19:43-57.
  9. Westgard JO, Barry PL. Cost-Effective Quality Control: Managing the quality and productivity of analytical processes. AACC Press, 1986.
  10. Levine S, Miller RG. Some comments on the judgment of the acceptability of new clinical methods. Clin Chem 1977;23:774-776.
  11. Lawton WH, Sylvester EA, Young BJ. Statistical comparison of multiple analytic procedures: application to clinical chemistry. Technometrics. 1979;21:397-409.
  12. Krouwer JS. Estimating total analytical error and its sources – techniques to improve method evaluation. Arch Pathol Lab Med 1992;116:726-731.
  13. CLSI EP21-A. Estimation of total analytical error for clinical laboratory methods. Clinical and Laboratory Standards Institute, Wayne, PA, 2003.
  14. IFCC. Buttner J, Borth R, Boutwell JH, Broughton PMG. International Federation of Clinical Chemistry provisional recommendation on quality control in clinical chemistry. I. General principles and terminology. Clin Chem 1976;22:532-40.
  15. NCCLS/CLSI. NRSCL8-P3. Terminology and Definitions for Use in NCCLS Documents. Clinical and Laboratory Standards Institute, Wayne, PA, 1996.
  16. CLSI. EP15-A2. User Verification of Performance for Precision and Trueness. Clinical and Laboratory Standards Institute, Wayne, PA, 2005.

James O. Westgard, PhD, is a professor emeritus of pathology and laboratory medicine at the University of Wisconsin Medical School, Madison. He also is president of Westgard QC, Inc., (Madison, Wis.) which provides tools, technology, and training for laboratory quality management.