W.o.W., part II: Concepts and Terminology
October 2007
Trueness. Uncertainty. Accuracy. Precision. Why are there so many definitions for the same terms? Dr. Westgard traces the history of metric concepts in the clinical laboratory. See when and where ISO, IFCC, and CLSI began introducing new terminology into the lab.
A War of Words In Laboratory Medicine: Part II:
Concepts and Terminology in a Changing, Uncertain World
Influences on the terminology of the laboratory
- Analytical chemistry
- Metrology
- Clinical chemistry
- IFCC
- NCCLS / CLSI consensus standards
- ISO Global Standards
- What's the point?
- References
Part I in this series described the differences of opinion about the usefulness of different concepts and terms, focusing on measurement uncertainty and total error. To better understand the context for this discussion, it may be helpful to trace the evolution of these concepts and the related terminology.
In the 1950s and 60s, analytical chemists and metrologists maintained a complete separation between accuracy and precision, with accuracy being considered more important than precision. Then clinical chemists asserted the importance of both precision and accuracy and their combined effect in the concept of total error. Today metrologists assert that trueness and uncertainty have become important in the global marketplace and that healthcare laboratories need to get onboard with these terms and concepts. ISO 15189 [1], an international standard specifically intended for medical laboratories, now brings that assertion into the real world of healthcare laboratories. It incorporates guidance from the world of metrology, both through the VIM vocabulary [2] and the GUM methodology for estimating measurement uncertainty [3]. Leading clinical chemists advocate this approach [4], thus laboratories need to understand what is going on and what they might do about it.
Influence of analytical chemistry.
In the 1950s and 60s, analytical chemists characterized the performance of measurement procedures in terms of accuracy and precision, or systematic and random errors, respectively, which are illustrated in Figure 1. I trained as an analytical chemist, completing my PhD degree in 1968, and can still remember struggling to eliminate every last little bit of systematic error in the measurement process that I was developing for my thesis project. Random error was not a big issue because replicate measurements could be made to compensate for imprecision and reduce its effects. Systematic error was the real issue in an analytical research laboratory, and we were held accountable for eliminating the systematic errors, down to the last few hundredths of a percent.
Figure 1. Traditional Concepts of Precision and Accuracy
Influence of metrology.
Similar terminology could be found in metrology and standards laboratories such as the National Bureau of Standards (NBS) [5], now known as the National Institute of Standards and Technology (NIST). This terminology, called “classical” in the current metrology guidelines, is defined in Table 1. The issue of “uncertainty” was also under discussion and NBS recommended that the “uncertainties of final results” be expressed in one of four ways: (Case 1) systematic error and precision both negligible, (Case 2) systematic error not negligible, precision negligible, (Case 3) neither systematic error nor precision negligible, and (Case 4) systematic error negligible, precision not negligible. [6] Uncertainty was attributed to the errors in the measurement procedure, but those errors were treated in the classical manner with separation of systematic error and imprecision. At this time, uncertainty related to imprecision and did not yet have a separate or distinct definition.
Table 1. Classical Terminology in analytical chemistry [VIM, 2] | |
Quality | the totality of features and characteristics of a product or service that bear on its ability to satisfy stated or implied needs (ISO 1994); |
Accuracy | closeness of agreement between a quantity value obtained by measurement and the true value of the measurand; |
Error of measurement | difference of quantity value obtained by measurement and the true value of the measurand; |
Random error of measurement | difference of quantity value obtained by measurement and average that would ensue from an infinite number of replicated measurements of the same measurand carried out under repeatability conditions; |
Systematic error of measurement | difference of average that would ensue from an infinite number of replicated measurements of the same measurand carried out under repeatability conditions and true value of the measurand; |
Maximum permissible error | one of the two extreme values of the error of indication permitted by specifications or regulations for a given measuring system; |
Error of indication | difference of indication of a measuring system and true value of the measurand. |
Influence of clinical chemistry.
When I began working as a clinical chemist, I was surprised to see that replicate measurements were NOT commonly made in clinical laboratories, owing to the high workload of a production laboratory. Without replicate measurements, precision had a large impact on the quality of the test result and many clinical analysts were actually more concerned with precision than accuracy. That perspective challenged the systematic error concept of accuracy that was being applied in analytical and metrology laboratories. The reality of errors in a clinical laboratory was (and still is) that both precision and accuracy have an impact on the final test result and both need to be carefully evaluated and managed to assure the quality of laboratory tests.
With the recognition that both accuracy and precision were important, it was necessary to consider their total effect when the performance of new analytical methods was validated for laboratory use. In developing method validation protocols, we described this total effect as “total error” to emphasize a “total error concept of accuracy,” as illustrated in Figure 2. At that time we argued [7]:
“None of this terminology [precision, accuracy, systematic error, random error] is familiar to the physician who uses the test values, therefore he [she] is seldom able to communicate with the analyst in these terms. The physician thinks rather in terms of the total analytical error, which includes both random and systematic components. From his point of view, all types of analytic error are acceptable as long as the total analytical error is less than a specified amount. The total analytical error is medically more useful; after all, it makes little difference to the patient whether a laboratory value is in error because of random or systematic error, and ultimately he is the one who must live with the error.”
Figure 2. Concepts of Random, Systematic, and Total Error
The “error terminology,” as defined in Table 2, was advantageous in the laboratory to emphasize the purpose of method evaluation studies and the strategy of performing different experiments to uncover different types of errors. In addition, the analysis of the experimental data was more understandable when the statistics were recognized as tools for estimating the sizes of different types of analytic error [8]. Finally, the decision on the acceptability of a new method became much more logical by comparing the size of the errors observed to the amount of error that was allowable for the particular test [9]. Thus, this error terminology was aimed internally to help laboratory analysts understand how to evaluate the overall quality of a testing process, and when of interest, to be able to describe and communicate the overall quality of a test to the physician.
It took five to ten years for this total error concept to become accepted in the clinical laboratory community. There have also been some recommendations to alter the concept and the method of estimation. Levine and Miller [10] commented that the linear combination of bias + 2 SD should instead be replaced by squaring the terms, then extracting the square root, much like today’s methodology for estimating uncertainty. Lawton, Sylvester, and Young recommended expanding the concept to include random interferences [11]. Krouwer developed a more rigorous methodology for estimation [12] by determining the direct differences by comparison to a reference-quality method, then plotting a histogram of those differences (called a “mountain plot”). In 2003, Krouwer chaired a CLSI committee that developed a consensus standard on “Estimation of Total Analytical Error for Clinical Laboratory Methods” [13]. This document is commonly used today by manufacturers when formulating a performance claim for a waived test to obtain approval from the FDA for a new product. And it provides a somewhat official and widely accepted definition:
“…total analytical error is used to describe the following concepts:
1) the interval that contains a specific proportion (usually 90, 95, or 99%) of the distribution of differences in concentration between the test and reference methods, Note a) For example, 97.2% of the differences between the test and reference method fell within the limits of ± 4 mmol/L, hence the 95% total error goal was met.
2) ‘the result of a measurement minus a true value of the measurand,’ which is the VIM (93-3.10) definition of the term ‘error of measurement’. Note b) Both ‘total analytical error’ and ‘error of measurement’ contain random and systematic effects.’”
Thus, total error is recognized today in clinical chemistry as a standard and accepted concept. And it is interesting that VIM [Table 1 classical terminology] does recognize a similar concept with its “error of measurement” and also a worst case estimate of error with its term “maximum permissible error” which describes “one of the two extreme values of the error of indication permitted by specifications or regulations for a given measuring system.”
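As a minimal sketch of the two combination rules discussed above, the following Python snippet contrasts the linear bias + 2 SD model with the root-sum-of-squares alternative attributed to Levine and Miller. The bias and SD values here are hypothetical, chosen only for illustration, not taken from any particular assay.

```python
# Sketch of two ways to combine bias (systematic error) and
# SD (imprecision) into a single total-error estimate.
# The numbers are hypothetical, for illustration only.

bias = 2.0  # observed systematic error, e.g., in mg/dL
sd = 1.5    # observed standard deviation, same units

# Linear worst-case combination (the "total error concept of accuracy"):
te_linear = abs(bias) + 2 * sd

# Root-sum-of-squares combination (squaring the terms, then taking the
# square root), similar in spirit to today's uncertainty methodology:
te_rss = (bias ** 2 + (2 * sd) ** 2) ** 0.5

print(f"linear total error: {te_linear:.2f}")  # 5.00
print(f"RSS total error:    {te_rss:.2f}")     # 3.61
```

Note that the root-sum-of-squares estimate can never exceed the linear estimate, which is why the linear combination serves as the worst-case bound.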
Table 2. Clinical Chemistry Error Terminology [9] | |
Accuracy, inaccuracy, precision, imprecision | Same as IFCC definitions. |
Random analytical error | An error that can be either positive or negative, the direction and exact magnitude of which cannot be predicted. In contrast, systematic errors are always in one direction. |
Systematic analytic error | An error that is always in one direction, in contrast to random errors that may be either positive or negative and whose direction cannot be predicted. |
Proportional systematic error | An error that is always in one direction and whose magnitude is a percentage of the concentration of analyte being measured. |
Constant systematic error | An error that is always the same direction and magnitude, even when the concentration of the analyte changes. |
Total error | The net or combined effect of the random and systematic errors. |
Total error specification, allowable total error, TEa | The total amount of analytical error that can be tolerated without invalidating the medical usefulness of the analytical result. TEa can be used to decide the acceptability of a measurement procedure in method evaluation testing, or to calculate the size of medically important errors to aid in the selection or design of control procedures. When applied to method evaluation testing, we recommend that TEa be used as a 99% limit of error so that only 1 sample in 100 will have a greater amount of error; this allows a defect rate of 1% when the analytical process is under stable operation. When applied as a quality specification, we recommend that TEa be used as a 95% limit of error, implying a maximum defect rate of 5% when the process experiences unstable operation. |
Medically important errors | Those errors that, when added to the inherent imprecision and inaccuracy of a measurement procedure, cause the total error specification to be exceeded. |
Medical usefulness | The concept that the requirements for the performance of an analytical process depend on how the analytical results are used and interpreted. |
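The acceptability decision described for TEa in Table 2 can be sketched in a few lines of Python. The function name, the default z-value, and the example numbers below are illustrative assumptions, not a prescribed implementation.

```python
# Hedged sketch of the TEa acceptability criterion from Table 2:
# a method is judged acceptable when the observed errors, combined as
# |bias| + z * SD, stay within the allowable total error (TEa).
# method_acceptable is a hypothetical helper name; numbers are illustrative.

def method_acceptable(bias: float, sd: float, tea: float, z: float = 2.58) -> bool:
    """Return True if |bias| + z * SD <= TEa (z = 2.58 for a ~99% limit)."""
    return abs(bias) + z * sd <= tea

# Example: TEa of 10% with 2% observed bias and 2.5% SD (as a CV):
print(method_acceptable(bias=2.0, sd=2.5, tea=10.0))  # True (2 + 6.45 <= 10)
```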
Influence of IFCC recommendations.
In the 1970s, the major global influence in clinical chemistry came from the International Federation of Clinical Chemistry (IFCC), which published a series of landmark papers on “quality control in clinical chemistry.” One of these, published in 1976, dealt with terminology [14], as summarized in Table 3. The IFCC replaced accuracy with “inaccuracy” and precision with “imprecision.” The intent was to emphasize the “differences” that could be characterized quantitatively by statistical estimates of bias for inaccuracy and standard deviation for imprecision. While this emphasis on “differences” should have reinforced the error terminology, it caused a lot of confusion at that time. Nonetheless, imprecision and inaccuracy became the accepted terminology and bias became a common term for inaccuracy. Total error continued to be commonly used to describe the worst case estimate of error, but it was not part of the IFCC official terminology.
Table 3. Traditional terminology in clinical chemistry [IFCC, 14] | |
Analytical method | Set of written instructions which describe the procedures, materials, and equipment, which are necessary for the analyst to obtain a result. |
Analytical run | This usually refers to a set of consecutive assays performed without interruption. The results are usually calculated from the same set of calibration standard readings. However, this definition may not be universally applicable, and in those cases the word series should be used after defining it. |
Accuracy | Agreement between the best estimate of a quantity and its true value. It has no numerical value. |
Inaccuracy | Numerical difference between the mean of a set of replicate measurements and the true value. This difference (positive or negative) may be expressed in the units in which the quantity is measured, or as a percentage of the true value. |
Precision | The agreement between replicate measurements. It has no numerical value. |
Imprecision | Standard deviation or coefficient of variation of the results in a set of replicate measurements. The mean value and number of replicates must be stated, and the design used must be described in such a way that other workers can repeat it. This is particularly important whenever a specific term is used to denote a particular type of imprecision, such as between-laboratory, within-day, or between-day. |
Analytical error | Difference between the estimated value of a quantity and its true value. This difference (positive or negative) may be expressed either in units in which the quantity is measured, or as a percentage of the true value. |
Influence of NCCLS (CLSI) consensus standards.
Also in the mid-1970s, the National Committee for Clinical Laboratory Standards was formed to provide consensus agreement on “good laboratory practices.” One group of standards, managed by the Evaluation Protocols Area Committee, focused on experimental and statistical guidelines for characterizing the performance of analytical systems. In the 1990s, a major effort of NCCLS was the establishment of a National Reference System for Clinical Laboratories (NRSCL). The National Bureau of Standards was a major player in this effort, thus increasing the influence of metrology on the concepts and terminology, as shown by the NRSCL guideline for terminology [15]. Table 4 shows the extensive definitions for accuracy, bias, precision, and error. Note that the terminology includes trueness and uncertainty, which are referenced to ISO and VIM. This was perhaps the earliest indication that trueness and uncertainty would make their way into the terminology and language intended for US laboratories.
Table 4. NRSCL terminology and definitions [15] | |
Accuracy, Measurement accuracy, Result accuracy | Closeness of the agreement between the result of a measurement and a true value of the measurand (VIM93-3.5). |
Bias | |
Precision | NOTE: Precision is not typically represented as a numerical value but is expressed quantitatively in terms of imprecision – the SD or the CV of the results in a set of replicate measurements. |
Error | |
Trueness | The closeness of agreement between the average value obtained from a large series of test results and an accepted reference value (ISO 3534-1-3.12). |
Uncertainty | |
Influence of ISO global standards.
With the publication in 2003 of ISO 15189, a global quality standard became available that was intended for medical laboratories [1]. Earlier ISO standards had sometimes been utilized for medical laboratories, even though those standards were intended for general applications, e.g., ISO 9000 series for Quality Management Systems, or ISO 17025 for specific applications in metrology laboratories [General Requirements for the Competence of Testing and Calibration Laboratories]. ISO 17025 emphasized the concepts of trueness and uncertainty following the Guide to the Expression of Uncertainty in Measurement [GUM, 3].
Two principles are important in understanding GUM and its application. First, any systematic differences are to be corrected in order to make test results comparable across laboratories and countries by establishing traceability to reference methods and materials. While that is understandable in principle, it is difficult in practice because there are relatively few reference materials and methods for the tests performed in medical laboratories. Second, assuming that all systematic differences can be eliminated (or corrected), there supposedly will remain only random differences that can be characterized and described in terms of measurement uncertainty, which should then be reported to the consumer of the product, in our case, the physician customer on behalf of the patient consumer.
In this ISO/GUM world, “trueness” depends on a “traceable value,” as shown in Figure 3, where “traceable value” replaces the earlier “true value” which could never be known exactly. Of course, the traceable value can’t be known exactly either and its correctness must be assessed from the traceability chain and described in terms of “measurement uncertainty.” In this world, accuracy now becomes the error of an individual result, which has certain similarities to total error in that it can be affected by both random and systematic errors, but they are now considered different sources of variance, not different types of errors.
Figure 3. ISO concepts of Trueness and Accuracy
With the development of ISO 15189, the concepts and terminology that had been applied to testing and calibration laboratories, i.e., metrology laboratories, were applied to medical laboratories, as summarized in Table 5. We now live in the world of accuracy, trueness, precision, and uncertainty. New CLSI documents demonstrate that this terminology is being adopted for US laboratories, e.g., EP15-A2 is titled “User Verification of Performance for Precision and Trueness” [16]. In addition, CLSI has a project underway (C51) to produce a document on “Expression of Uncertainty of Measurement in Clinical Laboratory Medicine.”
Table 5. ISO terminology for medical laboratories [1] | |
Quality | Degree to which a set of inherent characteristics fulfills requirements (ISO 2005); |
Measurand | Quantity intended to be measured; |
Accuracy of measurement | Closeness of the agreement between the result of a measurement and a true value of the measurand;
Trueness of measurement | Closeness of agreement between the average value obtained from a large series of measurements and a true value;
Precision | Closeness of agreement between quantity values obtained by replicate measurements of a quantity, under specified conditions;
Uncertainty of measurement | Parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand;
Target measurement uncertainty | Measurement uncertainty formulated as a goal and decided on the basis of a specific intended use of measurement results |
To understand these more detailed guidelines and discussions about the estimation of uncertainty, it will be necessary to become more familiar with the GUM terminology, particularly those terms given in Table 6 and illustrated in Figure 4. An estimate of uncertainty can be provided in the form of a standard deviation (called the Standard Uncertainty). Estimates from multiple components of a measurement process can be combined by adding the variances of the individual components then taking the square root of the combined variance (called the Combined Standard Uncertainty). Those component variances can be estimated experimentally (called Type A uncertainty) or theoretically (called Type B uncertainty). Finally the uncertainty can be expressed as a confidence interval with a stated coverage factor (an Expanded Uncertainty or Expanded Combined Uncertainty with a coverage factor of 2 for a 95% interval).
Figure 4. ISO Uncertainty Concept and Terminology
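A minimal sketch of the combination rule just described may make the terminology concrete. The three component uncertainties below are hypothetical values, not taken from any particular measurement procedure.

```python
# Sketch of the GUM combination rule: the standard uncertainties of
# independent components are combined by adding their variances and
# taking the square root (Combined Standard Uncertainty); multiplying
# by a coverage factor gives the Expanded Uncertainty.
# The component values are hypothetical, for illustration only.

u_components = [0.8, 0.5, 0.3]  # standard uncertainties (same units)

# Combined standard uncertainty: root of the sum of variances.
u_combined = sum(u ** 2 for u in u_components) ** 0.5

# Expanded uncertainty with coverage factor k = 2 (roughly a 95% interval).
k = 2
U_expanded = k * u_combined

print(f"combined standard uncertainty: {u_combined:.3f}")
print(f"expanded uncertainty (k = 2):  {U_expanded:.3f}")
```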
What’s the point?
Boring, boring, I can almost hear your thoughts as you struggle through all these terms and definitions! The right words are important for communication in the global marketplace. But, our scientific language should also be understandable and useful to the people who work in our laboratories. Changing the words and the language can itself create new obstacles to understanding what needs to be done to guarantee the quality of the test results being produced!
I believe that the concept of total error leads to a more practical estimate of test quality for the laboratory than the more complex estimate of measurement uncertainty from the GUM methodology. And I will argue that ISO inspectors should keep an open mind about how laboratories estimate the uncertainty of their test results. It will be much more practical to provide a top-down estimate of total error than a bottom-up estimate from the GUM methodology. Both have their preferred applications in laboratory medicine and I’ll talk more about that in part III of this series.
References
- ISO/FDIS 15189. Medical laboratories – Particular requirements for quality and competence. International Organization for Standardization, Geneva, Switzerland, 2002.
- International Vocabulary of Basic and General Terms in Metrology (VIM). 3rd ed. Draft April 2004. Annex A.
- GUM. Guide to the expression of uncertainty in measurement. ISO, Geneva, 1995.
- Dybkaer R. Setting quality specifications for the future with newer approaches to defining uncertainty in laboratory medicine. Scand J Clin Lab Invest 1999;59:579-584.
- Precision measurement and calibration: Statistical concepts and procedures. NBS special publication 300, vol 1. National Bureau of Standards, US Department of Commerce, 1969.
- Eisenhart C. Realistic evaluation of precision and accuracy in instrument calibration systems. J Res NBS. 1963;67(C):161-187.
- Westgard JO, Carey RN, Wold S. Criteria for judging precision and accuracy in method development and evaluation. Clin Chem 1974;20:825-833.
- Westgard JO, Hunt RM. Use and interpretation of common statistical tests in method-comparison studies. Clin Chem 1973;19:43-57.
- Westgard JO, Barry PL. Cost-Effective Quality Control: Managing the quality and productivity of analytical processes. AACC Press, 1986.
- Levine S, Miller RG. Some comments on the judgment of the acceptability of new clinical methods. Clin Chem 1977;23:774-776.
- Lawton WH, Sylvester EA, Young BJ. Statistical comparison of multiple analytic procedures: application to clinical chemistry. Technometrics. 1979;21:397-409.
- Krouwer JS. Estimating total analytical error and its sources – techniques to improve method evaluation. Arch Pathol Lab Med 1992;116:726-731.
- CLSI EP21-A. Estimation of total analytical error for clinical laboratory methods. Clinical and Laboratory Standards Institute, Wayne, PA, 2003.
- IFCC. Buttner J, Borth R, Boutwell JH, Broughton PMG. International Federation of Clinical Chemistry provisional recommendation on quality control in clinical chemistry. I. General principles and terminology. Clin Chem 1976;22:532-40.
- NCCLS/CLSI. NRSCL8-P3. Terminology and Definitions for Use in NCCLS Documents. Clinical and Laboratory Standards Institute, Wayne, PA, 1996.
- CLSI. EP15-A2. User Verification of Performance for Precision and Trueness. Clinical and Laboratory Standards Institute, Wayne, PA, 2005.
James O. Westgard, PhD, is a professor emeritus of pathology and laboratory medicine at the University of Wisconsin Medical School, Madison. He also is president of Westgard QC, Inc., (Madison, Wis.) which provides tools, technology, and training for laboratory quality management.