Tools, Technologies and Training for Healthcare Laboratories

Total Analytic Error: from Concept to Application

An extended version of an essay originally prepared for the September 2013 issue of Clinical Laboratory News. It discusses how "total" our "total error" should be. Back when Total Analytic Error was introduced, it was clear that it concentrated on the analytical step. Over the years, "mission creep" has tempted others to keep expanding the types of errors to be considered.


Total Analytic Error: from Concept to Application

James O. Westgard, Sten A. Westgard
September 2013

Abstract

This article reviews the concept of Total Analytic Error, its estimation, and its application for managing the analytical quality of laboratory testing processes. With increasing attention being focused on quality systems and new risk management guidelines for quality control, one issue that is often overlooked is the quality goal or requirement for a laboratory test. How good does a test need to be? Related issues are how to define such quality goals, validate that analytical methods satisfy those goals (method validation), and assure achievement of those goals in routine testing through Statistical Quality Control (SQC). An effective system for managing analytical quality can be developed based on the concept of Total Analytic Error and practical application tools such as Sigma-metrics, the Method Decision Chart, the Sigma SQC Selection Graph, the Chart of Operating Specifications, and Normalized Method Decision and OPSpecs Charts. These tools can also be adapted to assess quality from Proficiency Testing or External Quality Assessment surveys, and finally to provide a practical estimate of Measurement Uncertainty.

Introduction

In Thornton Wilder’s play, Our Town, the character Rebecca Gibbs observes a noteworthy address written on a letter:

“It said: Jane Crofut; The Crofut Farm; Grover’s Corners; Sutton County; New Hampshire; United States of America…. But listen, it’s not finished: the United States of America; Continent of North America; Western Hemisphere; the Earth; the Solar System; the Universe; the Mind of God – that’s what it said on the envelope.”

Perhaps it’s a stretch, but some of our current thinking takes a similar approach to our traditional analytical concepts. Rather than focus on the core processes of laboratory testing, we keep expanding our scope, in an attempt to broaden our perspective, reaching further and further outside the laboratory to try to encompass not just an individual process, but an entire quality system. As laboratorians, we’re not trying to address the Mind of God, but the Mind of the Clinician (which may be a distinction without a statistically significant difference).

This conceptual debate has embroiled the concept of “total error,” as laboratorians reconsider what “total” means. A recent paper by Plebani and Lippi [1] urged that the concept of “total error” be expanded from its analytical focus to include pre-pre-analytic, pre-analytic, post-analytic and post-post-analytic components of the total testing process, ultimately to consider all the error sources in Lundberg’s “brain-to-brain” model for patient testing [2]. Lundberg’s model describes a 9-step patient testing process that includes ordering, collection, identification, transportation, separation or preparation, analysis, reporting, interpretation, and clinical action. Plebani and Lippi argue that quality goals are needed for all the steps in this process, not just the analytic step. That is certainly true, and expansion of the Total Analytic Error concept provides a framework for building more comprehensive quality management models. Such models will need to carefully define their intended use. Current examples are the “quality-planning” models that include other analytic factors, such as the detection capability of SQC procedures, as well as pre-analytic factors, including within-subject biologic variation. It will also be important to provide practical tools that facilitate the use and application of such models.

Concept of Total Analytic Error

In 1974, a paper by Westgard, Carey, and Wold introduced the concept of Total Analytic Error in an effort to provide a more quantitative approach for judging the acceptability of method performance [3]. Prior practice was to consider precision (imprecision) and accuracy (inaccuracy, bias) as separate sources of error and to evaluate their acceptability individually. That practice originated in classical analytical laboratories, where replicate measurements were usually made to reduce the effects of imprecision, leaving bias as the primary consideration in assessing the quality of a test result. In medical laboratories, however, common practice was to make only a single measurement on each specimen, thus the analytical quality of a test result depends on the overall or total effect of a method’s precision and accuracy. This difference in laboratory practice was the reason for introducing the concept of total analytic error.

“To the analyst, precision means random analytic error. Accuracy, on the other hand, is commonly thought to mean systematic analytic error. Analysts sometimes find it useful to divide systematic error into constant and proportional components, hence, they may speak of constant error or proportional error. None of this terminology is familiar to the physician who uses the test values, therefore, he [or she] is seldom able to communicate with the analyst in these terms. The physician thinks rather in terms of the total analytic error, which includes both random and systematic components. From his [or her] point of view, all types of analytic error are acceptable as long as the total analytic error is less than a specified amount. This total analytic error…is more useful; after all, it makes little difference to the patient whether a laboratory value is in error because of random or systematic analytic error, and ultimately, [the patient] is the one who must live with the error.”

The paper recommended that the acceptability of method performance be judged on the sizes of the observed errors relative to a defined allowable error.

Estimation of Total Analytic Error (TE, TAE)

A specific recommendation was made to estimate total analytic error by combining the estimate of bias from a method comparison study and the estimate of precision from a replication study, using a multiple of the SD or CV, e.g., TE = bias + 2 SD for a 95% interval, or TE = bias + 1.65 SD for a one-sided 95% limit of the possible analytic error. The terminology in the original paper refers to random error (RE), systematic error (SE), and total error (TE), as shown in Figure 1.


Figure 1. Total Analytic Error Concept. SE, systematic error; RE, random error; TE or TAE, total analytic error; Bias, inaccuracy; SD, standard deviation.
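
To make the combination concrete, here is a minimal sketch (our illustration, not the protocol from the 1974 paper) that estimates TAE from a replication experiment and a comparison-of-methods experiment; all data values are hypothetical:

```python
# Sketch: combine a replication experiment (imprecision) and a comparison-of-
# methods experiment (bias) into a TAE estimate. All data are hypothetical.
import statistics

# Replication experiment: 20 repeated measurements of one control material (%Hb)
replicates = [7.0, 7.1, 6.9, 7.0, 7.2, 6.8, 7.1, 7.0, 6.9, 7.1,
              7.0, 6.9, 7.2, 7.0, 6.8, 7.1, 7.0, 7.0, 6.9, 7.1]

# Comparison of methods: paired patient results, test method vs comparative method
test_method = [5.4, 6.2, 7.0, 8.3, 9.1]
comparative = [5.3, 6.1, 6.9, 8.1, 9.0]

sd = statistics.stdev(replicates)                      # random error (RE) estimate
bias = statistics.mean([t - c for t, c in zip(test_method, comparative)])  # SE

tae_two_sided = abs(bias) + 2.0 * sd     # TE = bias + 2 SD, ~95% interval
tae_one_sided = abs(bias) + 1.65 * sd    # one-sided 95% limit
print(f"bias={bias:.2f} %Hb  SD={sd:.2f} %Hb  TAE={tae_two_sided:.2f} %Hb")
```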

It took several years for the concept of total analytical error to be accepted in medical laboratories. Some laboratory scientists argued that bias (or SE) was not that big an issue because laboratories’ reference ranges compensated for existing biases. However, biases between methods became a more common problem with the expansion of laboratory testing services and the implementation of different methods for the same measurand. Today, bias remains a significant problem in healthcare and laboratory networks, particularly when electronic patient records intermix the test results without identifying the measurement procedures.

Some analysts argued that there were additional components of error that should be considered, e.g., interferences that affect individual patient samples, sometimes referred to as random biases [4]. To include such effects, Krouwer recommended direct estimation from comparison with a reference method [5] and worked with CLSI to develop a guidance document for that approach [CLSI EP21A, 6]. Direct estimation requires a minimum of 120 patient samples, thus that approach is mainly useful for manufacturers who perform extensive validation studies for new methods. For clinical laboratories, where the CLIA guidance is to test a minimum of 20 control samples to estimate precision and a minimum of 20 patient samples to verify a manufacturer’s claim for bias, it is more practical to make an initial estimate of total analytic error by combining results from the replication experiment and the comparison of methods experiment. Ongoing estimates can be made on the basis of long-term SQC data and periodic estimates of bias from PT or EQA surveys.

Manufacturers generally make claims for precision and bias, not total analytic error, thus individual estimates of precision and bias are necessary to verify manufacturers’ claims. The exception is for “waived” methods, where the FDA recommends that manufacturers establish a criterion for the Allowable Total Error (ATE) for each waived device before beginning the clinical studies, in order to objectively evaluate the new method [7]. FDA now recommends that manufacturers evaluate the Total Analytic Error (TAE), which FDA describes as “the combination of errors from all sources, both systematic and random, …often expressed in terms of an interval that contains a specified proportion (e.g., 95%) of the observed differences between the working method and the comparative method.” FDA recommends at least 120 patient sample comparisons for each of 3 decision level concentrations, requiring a total of 360 patient comparisons. These comparison results are to be plotted on “error grids” that display the results by the Working Method (WM) on the y-axis vs the results of the Comparative Method (CM) on the x-axis. Graphical limits are to be drawn to define zones for Allowable Total Error, which should include 95% of test results, and Limits for Erroneous Results (LER), where there should not be any test results. Further guidance on the preparation and application of error grids can be found in CLSI EP27 [8].

Goals for Allowable Total Error (TEa, ATE)

Given that Total Analytic Error is intended to be an estimate of the quality of a measurement procedure, its practical value depends on comparison to the quality required for the intended use of a test result, or the amount of error that is allowable without invalidating the interpretation of a test result. Terminology and abbreviations complicate this discussion and can lead to confusion. [Because the FDA seems to have settled on TAE and ATE, these terms may become standard in the US and will be used in the rest of this discussion.] Standardization of terminology worldwide will be more difficult because ISO doesn’t consider Total Analytic Error to be a politically correct term, owing to metrological rules that only allow for correction of bias and not inclusion of bias in estimates of measurement uncertainty. Actually, the ISO definition of “accuracy” as the “closeness of the agreement between the result of a measurement and the true value of the measurand” has a meaning similar to TAE, but it does not consider a 95% interval for estimation; instead, ISO recommends a concept of Measurement Uncertainty (MU) for estimating a 95% interval, but excludes bias from that estimate.

Quality requirements in the form of allowable errors have a long history in the clinical chemistry literature, going back to papers by Tonks [9] and Barnett [10] in the 1960s. Tonks defined allowable error as one-fourth of the reference range, or 10% of the measured value, whichever was smaller. Barnett defined medically allowable SDs based on surveys of physicians’ use of test results. This again shows the conflict in error concepts, as Tonks’ criteria represent any and all errors, i.e., Allowable Total Error, and Barnett’s criteria represent precision alone (allowable SD or allowable CV).

After many years of arguing about the right format for stating quality goals, specifications, and requirements, a consensus recommendation for a hierarchy of quality goals emerged from the 1999 Stockholm conference [11]. The recommendations encompass several different forms of quality goals, starting with clinical decision making (medically important changes in test results, e.g., a clinical decision interval or medically significant change interval), biologic variability (allowable SD, allowable bias, allowable TE), expert group recommendations (allowable SD, allowable bias), proficiency testing and external quality assessment criteria (allowable total error), and lastly “state-of-the-art” performance (allowable SD, allowable bias). This system of quality goals is illustrated in Figure 2 [12]. Note that this figure uses the abbreviations s_max and bias_max rather than s_a and bias_a, which are sometimes used to represent observed analytic performance as well as allowable performance; it also includes ATE for Allowable Total Error, as well as TEa from the original figure.


Figure 2. Quality Goals and Requirements. Systems view of sources, types of criteria, and relationship to operating specifications.

Recommendations for ATE can be found in many national (e.g., CLIA) and international proficiency testing and external quality assessment programs. A database of biologic goals has been developed and maintained by Carmen Ricos and colleagues in Spain [13] and can be found on our website (www.westgard.com). This database includes over 350 measurands based on the published studies on biologic variation and provides recommendations for allowable SDs, allowable biases, and allowable biologic total errors, in accordance with Fraser’s guidelines for combining allowable SDs and allowable biases [11,14].
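
For readers who want to compute such biologic goals, the following sketch applies the commonly cited Fraser formulas for desirable performance (allowable CV = 0.5 × CVI; allowable bias = 0.25 × √(CVI² + CVG²); allowable total error = bias goal + 1.65 × CV goal); the between-subject CV used below is an assumed value for illustration, not a figure taken from the Ricos database:

```python
# Sketch of Fraser's desirable biologic goals from within-subject (CVI) and
# between-subject (CVG) biologic variation; cv_between is illustrative only.
import math

def biologic_goals(cv_within: float, cv_between: float):
    cv_a = 0.5 * cv_within                                   # allowable CV
    bias_a = 0.25 * math.sqrt(cv_within**2 + cv_between**2)  # allowable bias
    tea = bias_a + 1.65 * cv_a                               # allowable total error
    return cv_a, bias_a, tea

# HbA1c: within-subject CV of 1.9% (cited later in the text); CVG assumed 5.7%
cv_a, bias_a, tea = biologic_goals(1.9, 5.7)
print(f"CVa={cv_a:.2f}%  Bias_a={bias_a:.2f}%  TEa={tea:.2f}%")
```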

Operating Specifications

Setting a goal and achieving that goal are different activities. The latter requires a practical “operational” strategy that can be implemented in the real world. With the recent announcement of a new “Stingray” styled Corvette, one of us (guess who) has set a goal of owning a new Corvette! What is now needed is a strategy to make that Corvette happen: win the lottery, strike oil, get lucky in the stock market, or start a systematic savings plan.

For example, the CAP criterion for acceptable performance in a proficiency testing survey is 7.0% for HbA1c. To achieve that goal, a laboratory must select a method that has appropriate stable performance in terms of precision and bias, plus apply the right Statistical QC to detect analytic problems that cause instability. We use the term “operating specifications” to describe the precision and bias that are allowable for a measurement procedure and the SQC (control rules, number of control measurements) necessary to monitor performance at the bench level and assure a defined quality goal is achieved.

This approach is consistent with the ISO 15189 requirements [15] that:

5.5.1.1 “the laboratory shall select examination procedures which have been validated for their intended use”  and

5.6.2.1 “the laboratory shall design quality control procedures that verify the attainment of the intended quality of results.”

Intended use and intended quality of results are ISO phrases for quality goals or requirements. Such quality goals are supposed to guide the selection of methods and the design of SQC procedures. An appropriate combination of precision, bias, and SQC becomes the strategy for achieving a defined quality goal.

Such an approach is also appropriate to satisfy the CLIA standard 493.1256 for control procedures, which states:

“(a) For each test system, the laboratory is responsible for having control procedures that monitor the accuracy and precision of the complete analytic process.

(b) The laboratory must establish the number, type, and frequency of testing control materials using, if applicable, the performance specifications verified or established by the laboratory as specified in 493.1253(b)(3).

(c) The control procedure must (1) detect immediate errors that occur due to test system failure, adverse environmental conditions, and operator performance. (2) Monitor over time the accuracy and precision of test performance that may be influenced by changes in test system performance and environmental conditions, and variance in operator performance.”

Under CLIA, control procedures should monitor the precision and accuracy of a method and be able to detect medically important errors, as well as assure laboratories will satisfy the criteria for acceptable performance in proficiency testing surveys. Again, there is a relationship between quality goals, precision, accuracy, and SQC that must be understood to properly manage the analytic quality of a laboratory testing process.

Application Tools

The FDA requires manufacturers to define ATE and to estimate TAE only for “waived” tests, whereas CLIA only requires users to follow the manufacturer’s directions, without any need to verify or validate method performance or perform SQC unless specified in the manufacturer’s directions. For non-waived tests, which form the bulk of testing processes in medical laboratories, CLIA requires verification of a manufacturer’s performance claims for precision and bias, implementation of a minimum SQC procedure with 2 levels of controls per day, and successful performance in periodic proficiency testing surveys. A better system would require definition of quality goals in the form of ATE for all methods, waived and non-waived, plus participation in PT for all methods, including waived methods. In the meantime, laboratories can improve their management of analytical quality by defining their own quality goals and utilizing the following application tools.

Sigma-metrics. While the original recommendation for a total error criterion was ATE ≥ bias + 2SD, later papers recommended a criterion of ATE ≥ bias + 4SD [16] and, with the adoption of Six Sigma concepts [17], criteria of ATE ≥ bias + 5SD and ATE ≥ bias + 6SD. The application of Six Sigma “tolerance limits” corresponds to the laboratory limits for Allowable Total Error and facilitates the calculation of a “Sigma-metric” [(ATE – Bias)/SD or (%ATE – %Bias)/%CV] to characterize test quality, as shown in Figure 3. The higher the Sigma-metric, the better the quality of the testing process. Industrial guidelines recommend a minimum of 3-sigma quality for a routine production process. As sigma increases, SQC becomes easier and more effective, thus methods with 5 to 6 sigma quality are preferred, given CLIA’s minimum requirement of 2 levels of controls per analytic run.


Figure 3. Sigma-metric Calculation. Six Sigma concept of “tolerance limits” described in terms of TEa or ATE and method performance illustrated by Bias and SD.
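
In code, the calculation is a one-liner; this small sketch (our own illustration) reproduces the HbA1c example worked through below:

```python
# Sketch of the Sigma-metric: Sigma = (%ATE - %Bias) / %CV.
def sigma_metric(ate: float, bias: float, cv: float) -> float:
    return (ate - abs(bias)) / cv

# HbA1c with the CAP criterion of 7.0%: bias 1.0%, CV 1.5% -> 4.0 sigma
print(sigma_metric(7.0, 1.0, 1.5))
```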

Method Decision Chart. This is a graphical tool for evaluating the quality of a laboratory test on the sigma-scale [18]. Once ATE has been defined, here’s how to construct the chart. Scale the y-axis from 0 to ATE and the x-axis from 0 to 0.5 ATE. Label the y-axis “allowable bias” and the x-axis “allowable precision”. The units for ATE, bias, and precision must be the same, either concentration or percentage. Then draw lines that represent the various ATE criteria by locating the y-intercept at ATE and the x-intercept at ATE/m, where m is the multiple of the SD or CV in the total error criterion. Figure 4 shows an example of a Method Decision Chart for HbA1c based on the CAP PT criterion of 7.0%.


Figure 4. Method Decision Chart. Prepared for HbA1c where CAP PT criterion is 7.0%. Allowable inaccuracy (%Bias) is plotted on the y-axis vs allowable imprecision (%CV) on the x-axis. Diagonal lines represent, from left to right, 6-sigma, 5-sigma, 4-sigma, 3-sigma, and 2-sigma quality. Operating point (A) shows a method having a bias of 1.0% and a CV of 1.5%, which demonstrates 4-sigma quality.
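
A chart like Figure 4 can be drawn with a few lines of plotting code; the following matplotlib sketch follows the construction steps described above, with the operating point being the HbA1c example from the text:

```python
# Sketch: Method Decision Chart. Each sigma line runs from a y-intercept at ATE
# to an x-intercept at ATE/m, where m is the multiple of the SD/CV.
import matplotlib.pyplot as plt

ATE = 7.0                                     # CAP PT criterion for HbA1c (%)
fig, ax = plt.subplots()
for m in (2, 3, 4, 5, 6):                     # 2-sigma ... 6-sigma lines
    ax.plot([0, ATE / m], [ATE, 0], label=f"{m}-sigma")
ax.plot(1.5, 1.0, "ko")                       # operating point A: CV 1.5%, bias 1.0%
ax.set_xlim(0, ATE / 2)
ax.set_ylim(0, ATE)
ax.set_xlabel("Allowable imprecision (%CV)")
ax.set_ylabel("Allowable inaccuracy (%Bias)")
ax.legend()
plt.show()
```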

To assess the quality of a method, an “operating point” is plotted to represent the observed bias as the y-coordinate and the observed SD or CV as the x-coordinate. For example, a HbA1c method with a bias of 1.0% and CV of 1.5% is shown as point A in Figure 4 and falls on the line corresponding to 4 sigma. You can calculate the Sigma-metric to confirm this is correct [sigma = (7.0-1.0)/1.5 = 4.0]. Point B describes a method with a bias of 0.0% and a CV of 1.4%, and demonstrates 5-sigma quality [(7.0-0.0)/1.4 = 5.0]. Point C shows 6-sigma quality, where bias is zero and the CV is 1.17% [(7.0-0.0)/1.17 = 6.0]. Point D represents a method with a bias of 1.0% and CV of 2.0% and falls on the line for 3 sigma [(7.0-1.0)/2.0 = 3.0]. Point E is a method with a bias of 2.0% and CV of 2.5% and illustrates 2-sigma quality [(7.0-2.0)/2.5 = 2.0].

Sigma SQC Selection Graph. The detection capability of an SQC procedure can be described by a “power curve” that shows the probability of rejection in relation to the size of the error that occurs. Power curves for several different SQC procedures are shown in Figure 5. Probability of rejection is shown on the y-axis versus the size of systematic error (in multiples of the method’s SD) on the x-axis. The curves, top to bottom, correspond to different control rules and different numbers of control measurements, as shown in the key at the right, top to bottom. As expected, the probability of error detection (Ped) increases as the error gets larger, as the number of control measurements increases, and with the addition of control rules to form multirule SQC procedures. There is also some increase in the probability of false rejection (Pfr), as shown by the y-intercepts of the power curves.


Figure 5. Sigma SQC Selection Graph. Probability for rejection is shown on y-axis vs the size of systematic error on the lower x-axis (given in multiples of the SD or CV) and the sigma-metric of the method on the upper x-axis. The curves represent different SQC procedures, top to bottom, as shown in the key at the right, top to bottom. Vertical line represents a method having 4-sigma quality and illustrates selection of SQC procedures that have a total of 4 control measurements per run.
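
Power curves for single-rule procedures can be computed directly; the sketch below assumes the standard model in which each control observation falls outside the ±k SD limits with a probability given by the normal distribution, and a run of N independent controls is rejected if any observation exceeds the limits. (Power curves for multirule procedures are typically obtained by simulation and are not shown here.)

```python
# Sketch: power curve for a 1:ks single-rule SQC procedure with N controls,
# as a function of systematic error d_se (in multiples of the method SD).
from statistics import NormalDist

def p_reject(k: float, n: int, d_se: float) -> float:
    nd = NormalDist()
    p_obs = 1 - (nd.cdf(k - d_se) - nd.cdf(-k - d_se))  # one control exceeds +/-k
    return 1 - (1 - p_obs) ** n                         # any of n controls exceeds

for d_se in (0.0, 1.0, 2.0, 3.0, 4.0):   # d_se = 0 gives the false-rejection rate
    print(d_se,
          round(p_reject(3.0, 2, d_se), 3),   # 1:3s with N=2
          round(p_reject(2.5, 4, d_se), 3))   # 1:2.5s with N=4
```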

An appropriate SQC procedure is one that provides a high Ped for medically important errors and a low Pfr. The size of the medically important systematic error, called the critical systematic error, ΔSEcrit, can be calculated from the quality goal for the test and the bias and precision of the method, as follows: ΔSEcrit = [(ATE – Bias)/SD] – 1.65, where the factor 1.65 limits the maximum risk of erroneous test results to 5%. Note that the term (ATE – Bias)/SD is the sigma-metric for the testing process, which means that Sigma = ΔSEcrit + 1.65. That relationship allows the graph to be rescaled in terms of sigma, as shown by the horizontal scale at the top in Figure 5. Directions for using this Sigma-metrics tool for selecting SQC procedures can be found in CLSI C24A3 [20].

To select an appropriate SQC procedure, you draw a vertical line corresponding to the sigma-metric of the testing process. An example is shown in Figure 5 for a sigma of 4.0. To identify appropriate control rules and the number of control measurements needed, inspect the graph to assess Ped at the points where the vertical line intersects the power curves. Good practice is to achieve a Ped of 0.90, or a 90% chance, of detecting medically important errors, while maintaining false rejection as low as possible. For the example here, you could select either a 1:2.5s single-rule with N=4 or a 1:3s/2:2s/R:4s/4:1s multirule procedure with N=4.
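
Putting the selection logic into numbers (again a sketch, under the single-rule power model from the previous example):

```python
# Sketch: compute the critical systematic error for the 4-sigma HbA1c example
# and check the detection probability of a candidate 1:2.5s rule with N=4.
from statistics import NormalDist

def p_reject(k: float, n: int, d_se: float) -> float:
    nd = NormalDist()
    p_obs = 1 - (nd.cdf(k - d_se) - nd.cdf(-k - d_se))
    return 1 - (1 - p_obs) ** n

ate, bias, cv = 7.0, 1.0, 1.5              # 4-sigma HbA1c example
d_se_crit = (ate - abs(bias)) / cv - 1.65  # dSEcrit = Sigma - 1.65 = 2.35
ped = p_reject(2.5, 4, d_se_crit)          # probability of detecting dSEcrit
pfr = p_reject(2.5, 4, 0.0)                # false-rejection probability
print(f"dSEcrit={d_se_crit:.2f}  Ped={ped:.2f}  Pfr={pfr:.2f}")
```

Running this gives Ped of approximately 0.90 for the 1:2.5s, N=4 design on a 4-sigma process, consistent with the 90% error detection guideline described above.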

Chart of Operating Specifications. This tool relates the precision and bias observed for a method to the SQC that is needed, employing the same format as the Method Decision Chart. Mathematical equations in the form of “error budgets” are used to describe the relationship between the various error components and the defined quality goal. The starting point is the total error budget that is composed of bias plus a multiple of the SD or CV. Addition of a factor that characterizes the sensitivity of the SQC procedure provides an Analytical Quality-Planning model [21]. Further expansion to include pre-analytic variables and account for within-subject biologic variation provides a Clinical Quality-Planning model [22] that relates medically important changes in test results to precision, accuracy, and SQC. The results of these models can be displayed as shown in Figure 6, which is called an OPSpecs Chart.


Figure 6. Chart of Operating Specifications. Prepared for HbA1c where the CAP PT criterion is 7.0%. Allowable inaccuracy (%Bias) is shown on the y-axis vs allowable imprecision (%CV) on the x-axis. The lines below the 3-sigma line represent different SQC procedures, as identified in the key at the right. Point A shows a method having a 1.0% bias and 1.5% CV and illustrates the selection of SQC procedures that have a total of 4 control measurements per run.

An OPSpecs chart is prepared for a defined quality goal and displays the allowable bias on the y-axis vs the allowable SD or CV on the x-axis. An operating point is plotted to represent the observed method bias as the y-coordinate and observed method imprecision as the x-coordinate. The lines on the chart show the allowable regions for the different SQC procedures. Any line above the operating point identifies an SQC procedure that will provide at least a 90% chance of detecting medically important systematic errors. The control rules and numbers of control measurements are identified in the key at the right, where the lines on the chart, top to bottom, match those in the key, top to bottom.

For example, a HbA1c method that has a bias of 1.0% and a CV of 1.5%, as shown by point A, can be effectively controlled by a 1:2.5s single-rule procedure with N=4 or a 1:3s/2:2s/R:4s/4:1s multi-rule procedure with N=4. A method with a bias of 0.0% and a CV of 1.5% (shown by point B) requires tighter control limits or more control measurements, e.g., 1:2.5s with N=2 or 1:3s with N=4. A method with a bias of 0.0% and a CV of 1.2%, as shown by point C, can be effectively controlled by a 1:3s rule with N=2. Finally, point D shows a method with a bias of 1.0% and CV of 2.0% that cannot be controlled even with a multi-rule procedure with a total of 6 control measurements. More SQC is needed, but that is not practical in many laboratories. Therefore, for this 3-sigma method, none of these SQC procedures will be able to verify the attainment of the intended quality of test results, as required by ISO 15189. This illustrates the limitations of a low-sigma testing process and the difficulties of controlling such a process in a medical laboratory.

The CAP PT criterion of 7.0% happens to correspond to the clinical treatment criterion for monitoring therapy, where a change of 0.5 %Hb at a level of 7.0 %Hb amounts to 7.1% (0.5/7.0). However, the criterion for diagnosis of diabetes describes a clinical decision interval from 5.6 to 6.5 %Hb, i.e., where a value of 5.6 %Hb or less is considered normal and a value of 6.5 %Hb is the diagnostic cutoff. That corresponds to a “gray zone” of 13.8% (0.9 %Hb/6.5 %Hb). Within-subject biologic variability is documented as 1.9% in the Ricos database, thus some of the gray zone will be consumed by the patient’s own variability. Implementation of this clinical quality-planning model is more complicated and requires computer support [23] to perform the calculations, account for the within-subject variation, and then prepare the OPSpecs chart shown in Figure 7.


Figure 7. Chart of Operating Specifications. Prepared for HbA1c where diagnostic criterion at 6.5 %Hb corresponds to 13.8%. Allowable inaccuracy (%Bias) is shown on the y-axis vs allowable imprecision (%CV) on the x-axis. The different lines represent different SQC procedures, as identified in the key at the right. The operating point shows a method having a 2.0% bias and 2.0% CV and illustrates the effectiveness of single or multi-rule procedures with a total of 2 controls per run.

Interestingly, methods with up to 2.0% bias and 2.0% CV can be effectively monitored by SQC procedures with N=2 and still provide the performance needed to satisfy the diagnostic quality requirement! That indicates that the performance needed for the initial diagnosis is less demanding than that needed for monitoring patient treatment. That also suggests that point-of-care HbA1c methods may be more appropriate for diagnostic applications than for monitoring therapy. This is a controversial issue and there may be other factors than analytic performance that must be considered, but this assessment also suggests that current practice recommendations do not necessarily reflect the capabilities of analytic methods relative to their intended clinical use.

Normalized Method Decision and OPSpecs Charts. One difficulty is the need to prepare Method Decision Charts and OPSpecs Charts for the defined error goal for each test. An alternative approach is to prepare “normalized” charts that are scaled from 0 to 100% on the y-axis and 0 to 50% on the x-axis. The coordinates of the operating point are then calculated as a percent of the defined error goal, e.g., for a method with ATE of 7%, bias of 1.0%, and CV of 1.5%, the y-coordinate would be 14% and the x-coordinate would be 21%. The advantage of normalized charts is that different tests with different quality requirements can be presented on the same chart. For example, a point-of-care glucose (ATE =15%), a laboratory glucose (ATE=10%), and a HbA1c (ATE =7%) could all be presented on the same Method Decision or OPSpecs chart. Or, all tests on a multi-test analyzer can be presented on the same chart.
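
Normalization is simple arithmetic, as this sketch shows; the bias and CV values used for the two glucose methods are illustrative, not from the text:

```python
# Sketch: normalize operating points so tests with different ATE goals can
# share one chart; each coordinate is expressed as a percent of its own ATE.
def normalized_point(ate: float, bias: float, cv: float):
    return 100 * cv / ate, 100 * abs(bias) / ate   # (x = %CV, y = %Bias)

# HbA1c example from the text: ATE 7%, bias 1.0%, CV 1.5% -> (~21%, ~14%)
print("HbA1c", normalized_point(7.0, 1.0, 1.5))

# Illustrative values for the two glucose methods mentioned above
for name, ate, bias, cv in [("POC glucose", 15.0, 2.0, 4.0),
                            ("lab glucose", 10.0, 1.0, 2.0)]:
    print(name, normalized_point(ate, bias, cv))
```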

Proficiency Assessment Chart. Sigma-metrics have been applied to characterize the quality of laboratory testing in proficiency testing or EQA surveys [24]. Such information is valuable for assessing quality on a state, regional, or national scale, particularly for tests that have national guidelines for interpretation, such as HbA1c. The Method Decision Chart can be applied to provide a Proficiency Assessment Chart that displays the operating points of method subgroups relative to sigma quality criteria. In addition, an overall weighted Sigma-metric can be calculated for the whole survey group.

Figure 8 shows the sigma-performance for 27 method subgroups in the 2nd CAP survey in 2012, in which 3045 laboratories participated. The survey data was obtained from the National Glycohemoglobin Standardization Program website (www.NGSP.org). The y-axis is scaled from 0 to 7.0% to represent the CAP 7.0% criterion for acceptable performance. The x-axis is scaled to include the complete range of CVs observed for the 27 method subgroups. For survey sample GH2-04, a true value of 5.40%Hb was determined from analyses by reference methods. The survey report summarizes the means and CVs of each method subgroup. From this information, the bias for each method subgroup is calculated as the difference between the subgroup mean and the true value. Operating points are plotted to show the method bias (absolute percent) as the y-coordinate and the method precision (%CV) as the x-coordinate. The diagonal lines represent 2-sigma and 3-sigma performance. Only a third of the method subgroups show performance better than 2-sigma, but no method subgroups are better than 3-sigma. The overall weighted average for the entire survey group is 1.73 sigma.


Figure 8. Method Decision Chart. Prepared for HbA1c Survey Sample GH2-04 (5.40%Hb) where CAP PT criterion for acceptable performance is 7.0%. Observed inaccuracy (%Bias) is plotted on the y-axis vs observed imprecision (%CV) on the x-axis. Diagonal lines represent, from left to right, 3-sigma, and 2-sigma quality. Operating points represent method subgroups in 2012 survey.
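
The arithmetic behind these operating points is straightforward; the sketch below uses hypothetical subgroup summaries, not the actual GH2-04 survey data:

```python
# Sketch: proficiency assessment. Subgroup bias is the difference between the
# subgroup mean and the true value; sigma is computed per subgroup; the group
# sigma is weighted by the number of laboratories in each subgroup.
ATE, TRUE_VALUE = 7.0, 5.40            # CAP criterion (%), reference value (%Hb)

subgroups = [                          # (n labs, subgroup mean %Hb, subgroup %CV)
    (250, 5.47, 3.1),
    (120, 5.30, 2.4),
    (400, 5.52, 3.8),
]

total_n = sum(n for n, _, _ in subgroups)
weighted = 0.0
for n, mean, cv in subgroups:
    bias_pct = abs(mean - TRUE_VALUE) / TRUE_VALUE * 100   # bias in percent
    sigma = (ATE - bias_pct) / cv
    weighted += n * sigma
    print(f"n={n}: bias={bias_pct:.1f}%, sigma={sigma:.2f}")
print(f"weighted group sigma = {weighted / total_n:.2f}")
```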

Lest you think this performance is unusual or atypical, Figure 9 shows the proficiency assessment for sample GH2-05, whose true value is 8.30 %Hb. Only one method subgroup achieves better than 3-sigma performance, another 8 subgroups fall between 2- and 3-sigma performance, and the overall weighted average is 2.01 sigma.


Figure 9. Method Decision Chart. Prepared for HbA1c Survey Sample GH2-05 (8.30%Hb) where CAP PT criterion for acceptable performance is 7.0%. Observed inaccuracy (%Bias) is plotted on the y-axis vs observed imprecision (%CV) on the x-axis. Diagonal lines represent, from left to right, 3-sigma, and 2-sigma quality. Operating points represent method subgroups in 2012 survey.

Proficiency assessment, in this form, should be very helpful to laboratories for selecting methods that will provide good performance, particularly when applied across laboratories in a healthcare system. Significant biases still exist between method subgroups, in spite of the NGSP certification that all these methods provide equivalent performance. For these two survey samples, the average absolute biases are 1.8% and 1.9% (0.13 to 0.15 %Hb), with maximum biases up to 5% (or 0.3 %Hb). According to Bruns and Boyd, a bias of 0.1 %Hb could cause misclassification of 1.1 million people in the US population, whereas a bias of 0.3 %Hb could double the number of people in the US classified as diabetic [25]. Bias is still a serious problem when there are national diagnostic classification guidelines in use, even though all methods in use in the US are certified as equivalent.

Measurement Uncertainty. Finally, there is the contentious issue regarding TAE vs measurement uncertainty (MU) [26]. ISO 15189 requires that “the laboratory shall determine measurement uncertainty for each measurement procedure in the examination phase used to report measured quantity values on patients’ samples. The laboratory shall define the performance requirements for the measurement uncertainty of each measurement procedure and regularly review estimates of measurement uncertainty” [15]. ISO 15189 recommends that MU be estimated from intermediate or long-term SQC data that encompasses changes in reagent lots, calibrators, operators, and system maintenance.

Likewise, TAE can provide a practical estimate of measurement uncertainty (MU) by utilizing long-term QC data and periodic estimates of bias from PT or EQA surveys. ISO purists will not be happy with such an estimate as they prefer to correct for bias, rather than account for an observed bias. At such time that bias is truly corrected by standardization and certification programs, such as NGSP for HbA1c, then an estimate of TAE would converge with the experimental estimate of MU (using long-term SQC data). We can only warn you that TAE vs MU is a complicated issue where metrological principles conflict with practical applications in medical laboratories [26]. Nonetheless, practical applications of MU often reference goals for Allowable Total Errors and recommend that laboratories monitor bias along with MU [27].
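
As a sketch of this practical approach (with illustrative numbers, not a worked survey example):

```python
# Sketch: a practical "top-down" MU estimate from long-term SQC data, with the
# PT/EQA bias reported alongside rather than folded in; values are illustrative.
long_term_cv = 2.1     # %CV from long-term SQC data across lots and calibrators
pt_bias = 1.0          # % bias from recent PT/EQA surveys

expanded_mu = 2 * long_term_cv                    # U = k*u, k=2 for ~95% interval
tae_estimate = abs(pt_bias) + 2 * long_term_cv    # TAE counterpart including bias
print(f"U = +/-{expanded_mu:.1f}%  TAE = {tae_estimate:.1f}%")
```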

Summary

In science, there is often a zero-sum mindset: if we accept theory A, we must reject theory B. Much of the new thinking in quality management takes a more abstract approach, focused on systems rather than details. In this age where laboratories must make better connections to the clinical context, we need these additional tools and perspectives. But when it comes to different approaches to Total Analytic Error, a Manichean choice is not necessary. We don’t need to abandon Total Analytic Error in order to consider additional models. If laboratories want to define a new “Total” for Total Error, for example Total Testing Process Error or Total Brain-to-Brain Error, that is perfectly fine and may indeed provide additional insight into the quality management of patient testing processes. The difficulty in declaring the brain-to-brain patient testing process as the necessary perspective for defining quality goals is the current lack of information about requirements outside the analytic step. Even when that information becomes available, there will still be a need to understand individual steps in the patient testing process, manage the core analytical performance requirements, and satisfy ISO standards and regulatory criteria. The concept of Total Analytic Error has served us well for nearly forty years and continues to provide practical guidance for managing laboratory testing processes.

There is an old saying invoked when one’s focus is too narrow: you can’t see the forest for the trees. But we should also remember that even with a broad view of the forest, we still need to know which trees are rotten and which are solid. With testing, we need to know which tests are analytically sound and which are analytically full of noise. Otherwise, when a test fails in the laboratory, the laboratory may not be capable of hearing (and detecting) the failure.

Recommendations

  • Total Analytical Error (TE, TAE) is defined by FDA as “the combination of errors from all sources, both systematic and random, …often expressed in terms of an interval that contains a specified proportion (e.g., 95%) of the observed differences between the working method and the comparative method.” That definition is consistent with the original recommendation that was made in 1974 by Westgard, Carey, and Wold.
  • Terminology and abbreviations are confusing and it will be difficult to standardize globally because ISO does not recognize the concept of Total Analytic Error due to conflict with the metrological concept of Measurement Uncertainty.
  • Manufacturers should estimate TAE from at least 120 patient sample comparisons with a reference method, following FDA guidance and the CLSI EP21A approach.
  • Laboratories should estimate TAE from a minimum of 20 measurements in a day-to-day replication experiment and 20 to 40 patient samples in a comparison of methods experiment. Ongoing estimates of TAE may be based on long-term quality control data and periodic estimates of bias from PT or EQA surveys.
  • Goals for the Allowable Total Error (TEa, ATE) may be based on PT and EQA criteria for acceptable performance or on a combination of precision and bias goals based on biologic variability. CLIA PT criteria for acceptable performance provide a starting point for US laboratories. The Ricos database provides a more comprehensive source for goals based on biologic variation.
  • Strategies for achieving quality goals should consider the precision and bias required for a method and the SQC needed to verify the attainment of the intended quality of test results.
  • Judgments on the acceptability of the precision and bias of a method can be aided by a Method Decision Chart which provides a simple graphical tool for comparison with various ATE criteria and also in terms of quality on the Six Sigma scale.
  • Selection of appropriate SQC procedures may be accomplished using a Sigma-Metric QC Selection Tool as described in CLSI C24A3 or by using a Chart of Operating Specifications.
  • “Normalized” Method Decision and OPSpecs Charts provide graphical tools for displaying the performance of many different tests on a single chart and are effective for summarizing the performance capabilities and assessing the SQC needed for multi-test analyzers.
  • PT or EQA surveys provide performance data for assessing the quality of method subgroups on the sigma-scale using a Proficiency Assessment Chart, which is a variation of the Method Decision Chart.
  • Practical estimates of Measurement Uncertainty can be based on TAE estimates from long-term SQC data and periodic PT or EQA surveys.

References

  1. Plebani M, Lippi G. Closing the brain-to-brain loop in laboratory testing. Clin Chem Lab Med 2011;49:1131-3.
  2. Lundberg GD. Acting on significant laboratory results. J Am Med Assoc 1981;245:1762-3.
  3. Westgard JO, Carey RN, Wold S. Criteria for judging precision and accuracy in method development and evaluation. Clin Chem 1974;20:825-33.
  4. Lawton WH, Sylvester EA, Young-Ferraro BJ. Statistical comparison of multiple analytic procedures: application to clinical chemistry. Technometrics 1979;21:489-498.
  5. Krouwer JS. Estimating total analytical error and its sources. Arch Pathol Lab Med 1992;116:726-731.
  6. CLSI EP21-A. Estimation of Total Analytical Error for Clinical Laboratory Methods. Clinical and Laboratory Standards Institute, Wayne, PA, 2003.
  7. Guidance for Industry and FDA Staff: Recommendations for Clinical Laboratory Improvement Amendments of 1988 (CLIA) Waiver Applications for Manufacturers of In Vitro Diagnostic Devices. Food and Drug Administration, Center for Devices and Radiological Health, Office of In Vitro Diagnostic Device Evaluation and Safety. January 30, 2008.
  8. CLSI EP27P. How to Construct and Interpret an Error Grid for Diagnostic Assays. Clinical and Laboratory Standards Institute, Wayne, PA, 2009.
  9. Tonks DB. A study of the accuracy and precision of clinical laboratory determinations in 170 Canadian laboratories. Clin Chem 1963;9:217-233.
  10. Barnett RN. Medical significance of laboratory results. Am J Clin Path 1968;50:671-676.
  11. Hyltoft Petersen P, Fraser CG, Kallner A, Kenny D. Strategies to set global analytical quality specifications in laboratory medicine. Scand J Clin Lab Invest 1999;59(7).
  12. Westgard JO. The need for a system of quality standards for modern quality management. Scand J Clin Lab Invest 1999;59(7):483-486.
  13. Ricos C, Alvarez F, Cava JV, et al. Current databases on biological variation: pros, cons and progress. Scand J Clin Lab Invest 1999;59(7):491-500.
  14. Fraser CG. Biological Variation: From Principles to Practice. Washington DC, AACC Press, 2001.
  15. ISO 15189. Medical laboratories – Requirements for quality and competence. 3rd ed. International Organization for Standardization, Geneva, Switzerland, 2012.
  16. Westgard JO, Burnett RW. Precision requirements for cost-effective operation of analytical processes. Clin Chem 1990;36:1629-32.
  17. Westgard JO. Six Sigma Quality Design and Control: Desirable precision and requisite QC for laboratory measurement processes. Madison, WI: Westgard QC, 2001.
  18. Westgard JO. Basic Method Validation, 3rd ed. Madison WI:Westgard QC, 2008.
  19. Westgard JO. Assuring analytical quality through process planning and quality control. Arch Pathol Lab Med 1992;116:765-769.
  20. CLSI C24A3. Statistical Quality Control for Quantitative Measurement Procedures: Principles and Definitions. Clinical and Laboratory Standards Institute, Wayne, PA, 2006.
  21. Westgard JO. Charts of operational process specifications (“OPSpecs Charts”) for assessing the precision, accuracy and quality control needed to satisfy proficiency testing criteria. Clin Chem 1992;38:1226-1233.
  22. Westgard JO, Hyltoft Peterson P, Wiebe DA. Laboratory process specifications for assuring quality in the U. S. National Cholesterol Education Program. Clin Chem 1991;37:656-661.
  23. Westgard JO, Stein B, Westgard SA, Kennedy R. QC Validator 2.0: a computer program for automatic selection of statistical QC procedures for application in healthcare laboratories. Comput. Methods Programs Biomed 1997;53:175-186.
  24. Westgard JO, Westgard SA. The quality of laboratory testing today: an assessment of sigma-metrics for analytic quality using performance data from proficiency testing surveys and the CLIA criteria for acceptable performance. Am J Clin Pathol 2006;125:343-354.
  25. Bruns DE, Boyd JC. Few point-of-care hemoglobin A1c methods meet clinical needs. Clin Chem 2010;56:4-6.
  26. Westgard JO. Managing quality vs. measuring uncertainty in the medical laboratory. Clin Chem Lab Med 2010;48:31-40.
  27. White GH. Basics of estimating measurement uncertainty. Clin Biochem Rev 2008;29:S53-S60.