It's time to talk about bias - is it a shift or an uncertainty?

A key part of the ongoing debate about measurement uncertainty and allowable total error hinges on how we talk about bias. Is bias just another uncertainty like imprecision? Or is it truly a shift, a linear change that cannot be included in uncertainty calculations? Recent metrological approaches have attempted to fudge the answer to this question. But it's time to tackle it head on.

Are you late to work, or are you merely uncertainly working?
A Commonsense discussion about bias as a shift and not an Uncertainty

Hassan Bayat, CLS
September 2022

Introduction

Giorgia Bianchi et al have recently studied the alignment of the second generation of a γ-glutamyltransferase (GGT2) method using serum pools at three levels traceable to the IFCC reference method [1]. The pools were the same ones previously used in the evaluation of the first generation of the method (GGT1) [2]. While the first generation had a small bias at all three levels (average bias: -1.83%), the second generation has a substantial bias at all three levels (average bias: 7.43%).

Since this substantial positive bias with GGT2 cannot be ignored, and given that the authors “are not supporting the individual use of bias correction factors in daily practice” [3], they incorporated bias in the MU calculation to reflect its effect on the patient results. Using long-term imprecision (u_Rw=1.4%), uncertainty of calibrator (u_cal=1.06%), uncertainty of bias (u_bias=1.45%), and bias=8.85%, an MU (u_result) of 9.14% is calculated at the high level (pool H; Target=211.3 U/L), which fails to fulfil even the minimum analytical performance specification for MU (desirable MU APS=4.45%; minimum: 6.68%). Therefore, the authors conclude “that problems exist in the correct implementation of metrological traceability of GGT2.”

While this conclusion seems pretty clear, there is a deeper problem with the paper's approach to calculating MU: it treats bias as an uncertainty component. Bias itself is not an uncertainty component and must not be incorporated in the MU calculation. This is clearly explained in the ISO/TS 20914:2019 document for calculating MU [4]. According to the ISO document, the only three uncertainty components are u_Rw, u_cal, and u_bias. So, the authors' inclusion of bias in the calculation of MU is an obvious nonconformity with the ISO standard.

Including bias in the MU calculation was also done in the evaluation of GGT1 [2]. While the ISO standard was cited in the evaluation of GGT1, this more recent study does not cite this document. The authors have already published several papers on expanding ISO/TS 20914:2019 approach, so it’s strange to see they don’t cite this document. In fact, no other source for their MU calculation is cited.

So it seems there is a dilemma in treating the bias found in GGT2 assays: If the ISO 20914 approach is followed, then a small uncertainty of 2.28% would be calculated, which would pass both the minimum and desirable MU goals of 6.68% and 4.45% (and come very close to the optimum goal of 2.25%). On the other hand, deviating from ISO 20914 (and including bias in the calculation of MU) would provide evidence to support the argument about the negative effect of the large bias of GGT2 on patient results.
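The two figures behind this dilemma can be reproduced with a short calculation. The following is a minimal Python sketch, assuming the components are combined by root-sum-of-squares as the quoted results imply; the function name combined_uncertainty is illustrative and does not come from the cited papers.

```python
import math

def combined_uncertainty(*components):
    """Root-sum-of-squares combination of standard uncertainties (all in %)."""
    return math.sqrt(sum(c ** 2 for c in components))

# Values quoted in the text for pool H, all in percent.
u_rw, u_cal, u_bias, bias = 1.4, 1.06, 1.45, 8.85

# Only the three uncertainty components are combined.
u_iso = combined_uncertainty(u_rw, u_cal, u_bias)

# The bias value itself is added as a fourth "component".
u_with_bias = combined_uncertainty(u_rw, u_cal, u_bias, bias)

print(f"u_result without bias: {u_iso:.2f}%")        # 2.28%
print(f"u_result with bias:    {u_with_bias:.2f}%")  # 9.14%
```

The only difference between the two results is whether the 8.85% bias is squared and summed alongside the genuine uncertainty components.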

Simply put: Bias differs from Uncertainty of Bias

A justification to treat bias as an uncertainty component is that a “bias estimate is an uncertain value.” Granted, bias is an uncertain entity because it’s the distance between two 'uncertain' points: (a) the reference value, and (b) the mean of replicate assays of the reference material by the laboratory. The combined uncertainty of these two uncertain ends is the uncertainty of bias:

u_bias = √(u_ref² + u_mean²)

where u_mean (or SD_mean) is the SD of the replicates of the reference material divided by the square root of the number of replicates.
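As a rough illustration of this formula, here is a small Python sketch. The function name and the input numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def uncertainty_of_bias(u_ref, sd_replicates, n_replicates):
    """u_bias = sqrt(u_ref^2 + u_mean^2), with u_mean = SD / sqrt(n)."""
    u_mean = sd_replicates / math.sqrt(n_replicates)
    return math.sqrt(u_ref ** 2 + u_mean ** 2)

# Hypothetical inputs: u_ref = 1.0, SD of 9 replicates = 3.0, so u_mean = 1.0.
print(round(uncertainty_of_bias(u_ref=1.0, sd_replicates=3.0, n_replicates=9), 3))  # 1.414
```

Note that the bias value itself never enters this function: only the uncertainty of the reference value and the uncertainty of the laboratory mean do.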

While bias has some uncertainty, bias itself is not an uncertainty component and should not be incorporated in the MU equation (as it is not included in the ISO/TS 20914:2019 equation). Let's look at a hypothetical example of a bias with zero uncertainty. Assume we take a WHO reference material with a ‘conventional’ assigned value (so theoretically u_ref is zero), and take a mean from 'infinite' repeats of that reference material (so, u_mean=0). Therefore, uncertainty of bias in this example would be zero, and we would have a bias with no uncertainty. This example, while extreme, demonstrates that the uncertainty in the estimation of bias cannot be a justification to include the actual bias in the calculation of MU. Now the question is, "What do we do with this ‘certain’ bias?" Should we ignore this bias just because it's not an uncertain estimate? Of course the answer is NO.

Let us stress this point: the above example shows that bias differs from uncertainty of bias; each one must be used in the right place and in the right way.

As an analogy, the assigned value of a reference material is also an ‘uncertain’ value. But this fact does not lead to the use of the assigned value itself in the calculation of MU. Instead, the uncertainty of the assigned value is used in the MU calculation. For example, if a calibrator's certificate reads: “Assigned Value: 100; 95%CI: 99.8 to 100.2”; then '0.1' is the uncertainty component, not the assigned value of 100. So, while u_ref is included in the uncertainty of bias, the reference value itself is treated linearly to calculate bias. In the same way, the uncertainty of bias must be used in calculating MU, but the bias value itself (which represents the linear displacement of the mean) must be treated linearly.
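The '0.1' in the calibrator example can be traced with a one-line calculation. The Python sketch below assumes that the 95% interval corresponds to a coverage factor of k=2 (the convention used elsewhere in this essay); the function name is hypothetical.

```python
def standard_uncertainty_from_ci(lower, upper, k=2.0):
    """Half-width of the interval divided by the coverage factor k."""
    half_width = (upper - lower) / 2
    return half_width / k

u_ref = standard_uncertainty_from_ci(99.8, 100.2)
print(f"u_ref = {u_ref:.2f}")  # u_ref = 0.10
```

The assigned value of 100 plays no part in the result; only the width of the interval does.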

A Commonsense Example of Bias not behaving like an Uncertainty

Assume Jack leaves home at a certain time every morning and arrives at the office about 8 AM. This ‘about’ represents the uncertainty of the time taken to arrive at the office, which depends on variables encountered along the way, such as traffic. If someone asks Jack when he’ll arrive at the office on a certain day, in order to meet him right when he arrives, Jack will answer ‘about’ 8 AM, meaning he will arrive in the interval from a few minutes before 8 to a few minutes after 8. If Jack is to give a more statistical answer, he can do an experiment to determine the imprecision of the time taken to arrive at the office (let’s say the imprecision is 5 minutes), and say, “With a probability of 95%, I’ll be at the office at 8±10 (between 7:50 and 8:10).”

Now, assume one day Jack leaves home one hour later (let's assume he has a forgiving supervisor and flexible hours). On this given day, if Jack is asked when he will arrive at the office, he’ll answer, “‘about’ 9 AM”; or, statistically speaking, between 8:50 and 9:10. Easy, right? This is how we live in the real world. The delay (bias) is added linearly to the usual arrival time of 8 to find the center of the 10-minute uncertainty; i.e., the whole ‘8±10’ is shifted 1 hour to the new place of ‘9±10’.

Let's take the example further still. Today Jack is going to be 3 hours late, therefore:

Arrival time = 3 + (8±10) = 11±10

Or, if he has to leave home 2 hours earlier (maybe to make time up from his previous tardiness), then:

Arrival time = (-2) + [8±10] = 6±10

No matter how large the delay/haste is, the amount of delay/haste must be treated linearly. However, there is something else related to the delay/haste that must be treated as an uncertainty component: the uncertainty of the delay/haste. In the above examples, it was assumed that the amount of delay/haste is an exact value; e.g. Jack leaves home exactly 1 hour later than usual. In normal life, such exactness is not the case, so in reality Jack would say e.g. “I’ll leave home ‘about’ 1 hour later”. Then there would be another uncertainty component: the uncertainty of the delay/haste. Let’s assume the uncertainty of the leaving time is 4 minutes; then the total uncertainty of arriving at the office would be:

u_arrival = √(4² + 5²) = 6.4 min

U_arrival = 2 × 6.4 = ~13 min

Given this; for the example with a delay of 1 hour:

Arrival time = 1 + (8±13) = 9±13 = [8:47 AM, 9:13 AM]
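The arrival-time arithmetic above can be sketched in Python. This is only an illustration under the assumptions stated in the text (coverage factor of 2, travel imprecision of 5 minutes, leaving-time uncertainty of 4 minutes); the function name, the arbitrary calendar date, and the rounding of the half-width to whole minutes are choices made for this sketch.

```python
import math
from datetime import datetime, timedelta

def arrival_interval(usual, delay_min, u_travel_min, u_leave_min=0.0, k=2):
    """Shift the centre by the delay; widen only by the uncertainties."""
    centre = usual + timedelta(minutes=delay_min)           # linear shift
    u = math.sqrt(u_travel_min ** 2 + u_leave_min ** 2)     # combined uncertainty
    half_width = timedelta(minutes=round(k * u))            # expanded, whole minutes
    return centre - half_width, centre + half_width

usual = datetime(2022, 9, 1, 8, 0)  # arbitrary date, usual arrival at 8:00 AM

# One hour late, with an uncertain leaving time.
lo, hi = arrival_interval(usual, delay_min=60, u_travel_min=5, u_leave_min=4)
print(lo.strftime("%H:%M"), hi.strftime("%H:%M"))  # 08:47 09:13
```

The delay moves the centre of the interval, while only the two uncertainties determine its width.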

In the above examples, delay/haste is analogous to bias, and ‘arrival time’ is analogous to the ‘true/reference value’. As we see in the above examples, without delay/haste (i.e. without bias), the uncertainty of arrival is set around the usual arrival time of 8 AM. But when there is a bias in leaving home, the bias is added linearly to the usual arrival time to find the shifted arrival time, and then the uncertainty is set around that shifted arrival time.

These simple examples show that while the correct way to treat u_bias is incorporating it in MU, the correct way to treat bias is treating it linearly.

Distorting Bias itself into an Uncertainty Interval

Let's look at the issue in light of the purpose of calculating MU, which is to characterize “the dispersion of the values that could reasonably be attributed to the measurand” [5]. Simply stated, MU gives an interval in which the true value would fall with a certain probability.

Recalling the above examples: without delay/haste (without bias), Jack’s arrival time is, with 95% probability, expected to fall in the interval around the usual time of 8 AM; e.g. from 7:50 to 8:10 AM. The interval of 7:50-8:10 characterizes the dispersion of the times that “can reasonably be attributed to” Jack's arrival time. In the case of bias, e.g. a delay of 1 hour, Jack’s usual arrival time would be shifted from 8 AM to 9 AM, and he is expected to arrive in an interval around 9 AM, i.e. from 8:47 to 9:13 (u_delay included). If Jack is going to be 1 hour late, his friend must be around the office between 8:47 and 9:13 to meet Jack right when he arrives.

But let’s see what happens if we don’t use bias to shift the usual arrival time, and instead include the 1 hour bias (60 minutes) in the MU calculation to get an ‘extended’ uncertainty around the ‘usual’ arrival time that could contain Jack’s true arrival; this way:

u_arrival = √(4² + 5² + 60²) = ~60.3 min (about 1 hour)

U_arrival = 2 × 60.3 = ~121 min (about 2 hours)

According to this kind of uncertainty calculation, on the day Jack leaves home with a 1 hour delay, his arrival time will fall within an interval of 8:00 ± 121 min, or about 5:59 AM to 10:01 AM. Imagine Jack telling his friend that although he usually arrives at the office between 7:50 and 8:10, that day, because he is leaving 1 hour later, he will probably arrive between 5:59 and 10:01. Surely his friend will say: "Are you kidding? You’re going to leave later, and yet you might arrive earlier than usual!" As we see, the ‘bias-included’ uncertainty gives an excessively wide interval, a large part of which simply cannot be reasonably attributed to Jack’s arrival time! Bias-included uncertainty is giving impossible results.

Back to the Laboratory

What we saw in the example of extending Jack’s arrival by including bias is exactly what happens when the laboratory includes the bias itself in the calculation of the uncertainty of patient results.

Assume we work with a bias-free method with a standard uncertainty of 2% arising from u_Rw and u_cal. According to the definition of uncertainty [5], we expect with a probability of 95% that the true value of a ‘measured’ result of 100 will be somewhere in the interval 100±4% (96-104). In the case of an uncorrected bias, we intuitively know that the whole measured value and the MU around it are shifted to a higher or lower position depending on the direction of the bias. For example, with a positive bias of 10% in the method, we expect the true value for a measured value of 100 to be 10% smaller, falling in the interval around 90. Assuming a u_bias of 1% added to the previous uncertainty of 2%, u_result would grow to 2.2% (U_result = 4.4%), and we expect the true value to be in the interval of 90±4.4% (86-94). But, if we add the bias itself to the other uncertainty components, the ‘bias-included’ standard uncertainty will now be 10.2%. Setting this extended uncertainty around the measured value of 100, we would expect the true value to be in the interval 100±20.4% (80-120). If we say this to a layman, s/he will surely say: "Are you kidding? You say that your analyzer is producing results that are about 10% higher than the correct values. So, if you report a measured result of 100, you expect me to assume that the true value for the measured value of 100 would be somewhere between 80 and 120. While I know that the measured result of 100 is 10% biased upward, how can I assume that the true value may be even higher than 100?" As we see (again!), the bias-included uncertainty gives a falsely widened interval, so the true value can't reasonably be expected to fall within a certain portion of the interval (i.e., the portion above 100). In other words, the bias-included uncertainty interval would encompass some values that we know cannot "be attributed to the measurand."
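To make the layman's objection concrete, the Python sketch below computes both intervals for the measured value of 100 and reports how much of each interval lies above 100. It assumes a coverage factor of 2 and root-sum-of-squares combination; the helper names are hypothetical. Because it carries unrounded intermediate values, the bounds can differ from the rounded figures in the text in the first decimal.

```python
import math

def expanded_interval(centre, u_components_pct, k=2):
    """Return (low, high) for centre ± k * combined relative uncertainty."""
    u = math.sqrt(sum(c ** 2 for c in u_components_pct))
    half = centre * k * u / 100
    return centre - half, centre + half

def fraction_above(interval, value):
    """Share of the interval's width that lies above `value` (0 to 1)."""
    low, high = interval
    return max(0.0, high - max(low, value)) / (high - low)

measured = 100.0

# Bias treated as a shift: centre moved to 90, widened only by u_bias = 1%.
shifted = expanded_interval(90.0, [2.0, 1.0])

# Bias treated as an uncertainty: centre left at 100, 10% bias added as a component.
included = expanded_interval(measured, [2.0, 1.0, 10.0])

print(f"shifted:  {shifted[0]:.0f} to {shifted[1]:.0f}")    # 86 to 94
print(f"included: {included[0]:.0f} to {included[1]:.0f}")  # 80 to 120
print(fraction_above(shifted, measured))                    # 0.0
print(round(fraction_above(included, measured), 2))         # 0.5
```

Half of the bias-included interval sits above the measured value, even though the method is known to read high.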

Let’s keep in mind that if the results obtained from equations and calculations aren’t reasonable, those results shouldn’t be accepted; no matter how complicated the equations and calculations are, no matter how prominent and celebrated the proponents are, absurd results are still absurd. With a positive bias, the uncertainty interval must encompass values smaller than the measured values; and vice versa.

Back to the GGT2 Evaluation

Let's return to the GGT2 method for which Bianchi et al have found a high bias of 8.85% [1]. According to the ISO/TS 20914:2019; if bias is corrected, then uncertainty of bias correction (u_bias) is added to other MU components, but there is no clear recommendation in the standard on what to do with an uncorrected bias. Bianchi et al, trying to reflect the effect of the high bias in the GGT2 on the quality of patient results, have abandoned adherence to the ISO/TS 20914:2019 standard and have incorporated bias into MU calculation, leading to an extended uncertainty of 9.14%. The problem with the Bianchi et al approach is that, in contrast to the definition of MU, a main portion of the values included in this bias-included MU interval cannot "reasonably be attributed to the measurand.”

For example with a measured GGT2 value of 200 we know that, because of bias=8.85%, the corrected value would be 8.85% smaller (i.e. 182.3) and u_result (calculated from u_Rw, u_cal, and u_bias) would be 2.28% (U_result=4.56%). This way, the interval attributable to the measurand would be 182.3 ±4.56% or 174 to 190.6.

Now let's see what happens with an uncorrected bias and the bias-included MU of 9.14% (U=18.28%). With this U_result, the uncertainty interval would be 200±18.28%, or 163.4 to 236.6. Now the layman's question is, "While we know that the measured result of 200 is 8.85% higher than the correct result, how is it wise to attribute to the measurand some results higher than 200 (as high as 236.6)?" And, if both approaches to treating bias, i.e. either correcting for bias (and including u_bias in MU) or keeping bias (and including bias itself in MU), are correct, then a simple layman's expectation is to obtain the same MU interval from both methods; but the interval of 174-190.6 is more than four times tighter than the interval of 163.4-236.6.
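The two GGT2 intervals and the ratio of their widths can be checked with a few lines of Python. This sketch uses the component values quoted from the evaluation, assumes a coverage factor of 2, and approximates the corrected value as the measured value reduced by the bias percentage, as the text does. Variable and function names are illustrative.

```python
import math

u_rw, u_cal, u_bias, bias = 1.4, 1.06, 1.45, 8.85  # all in percent
measured = 200.0

def interval(centre, u_pct, k=2):
    """Return (low, high) for centre ± k * u_pct percent of centre."""
    half = centre * k * u_pct / 100
    return centre - half, centre + half

# Bias as a shift: correct the value, then apply the three-component MU.
u_iso = math.sqrt(u_rw ** 2 + u_cal ** 2 + u_bias ** 2)
corrected = measured * (1 - bias / 100)
shifted = interval(corrected, u_iso)

# Bias as an uncertainty: keep the measured value, add bias as a component.
u_incl = math.sqrt(u_iso ** 2 + bias ** 2)
included = interval(measured, u_incl)

ratio = (included[1] - included[0]) / (shifted[1] - shifted[0])

print(f"shifted:  {shifted[0]:.1f} to {shifted[1]:.1f}")    # 174.0 to 190.6
print(f"included: {included[0]:.1f} to {included[1]:.1f}")  # 163.4 to 236.6
print(f"width ratio: {ratio:.1f}")                          # 4.4
```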

What’s the ‘root-cause’?

Over recent years, there has been a growing tendency to consider MU “as the only worthwhile modern metrology concept and discount those who still find merit in the ‘good old’ error concept as old-fashioned and not able to keep pace with modern developments” [6]. This has led to long-term criticism of the Total Error (TE) model for treating bias linearly, and consequently to removing TE and related measures (such as Sigma-metric) from the toolbox. In addition, some optimistic assumptions are made by the MU approach, such as that bias can be deleted or completely corrected. So, when it comes to the real world, where bias is neither corrected nor ignorable and, in addition, treating bias linearly is taken as an unpardonable sin, there remains no choice other than devising methods such as including bias in MU.

As an example situation where bias is treated linearly, let’s consider the correction of bias throughout the traceability hierarchy so that over the chain of unbroken comparisons, the bias observed in any certain step is ‘linearly’ corrected for and the uncertainty of the correction is added to the total uncertainty accumulated up to previous steps. So, bias correction is nothing but to treat bias linearly. Even the correction for bias observed in the lab, that is advocated by MU fans, is nothing but treating bias linearly to correct either the calibration status of the analyzer or the patient results.

A mainstay of criticisms about TE is adding bias linearly to an uncertainty component (imprecision). This criticism ignores the fact that the linear treatment of bias in the TE model is to find the bias-shifted center of the distribution; and in this sense, treating bias linearly is completely correct. To have an example of adding bias linearly to an uncertainty component, let’s look at the equation for the significance of bias. A bias is considered significant if:

Bias ≥ U_bias

The above equation can be rearranged as:

Bias – U_bias ≥ 0

Simple! Bias is linearly combined with an uncertainty component.
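The significance check can be written as a small Python function. This sketch assumes U_bias = k × u_bias with k=2, and compares the absolute value of the bias so that negative biases are handled as well; both choices are assumptions of this sketch rather than something stated in the text. The second example uses a hypothetical bias value.

```python
def bias_is_significant(bias, u_bias, k=2):
    """Bias is significant if |bias| >= U_bias, i.e. |bias| - k * u_bias >= 0."""
    return abs(bias) - k * u_bias >= 0

# GGT2 high pool from the text: bias = 8.85%, u_bias = 1.45% (U_bias = 2.9%).
print(bias_is_significant(8.85, 1.45))  # True

# Hypothetical small bias with the same u_bias.
print(bias_is_significant(1.0, 1.45))   # False
```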

In short, bias is nothing but a linear shift, and therefore it must be treated linearly.

‘Old-but-Correct’ Approach to Method Validation

Let’s take Sigma-metric (SM) from the TE toolbox and apply it to the results of GGT2 evaluation [1]. According to the evaluation results; bias at the high level (H pool; Target=211.3) is 8.85% and long-term imprecision (as CV) is 1.4%. Given allowable total error (TE_a) of 15% for GGT [7]:

SM = (%TE_a - |%Bias|)/CV
SM = (15-8.85)/1.4
SM = 4.4

This is a moderate Sigma method; an acceptable method that needs a somewhat tough QC strategy (e.g., 1:2s/2:2s/R:4s multirule with N=2 and a run size of 100). By comparison, the previous generation GGT1 with a bias of -1.83% [2] would have a Sigma of 9.4, which is a world class method, providing patients with far more correct results and being very easy to control (e.g. by a 1:3s single rule with N=1 and a wide run size of 1000). In addition, if the bias of GGT2 is corrected because “the manufacturer [accepts] to take an immediate investigation and eventually fix the problem with a corrective action” [1], the SM would grow from 4.4 to 10.7, a very impressive improvement assuring far more correct patient results and needing only a very lenient QC procedure.
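The three Sigma values quoted above can be reproduced with the Sigma-metric equation. In this Python sketch, the GGT1 and bias-corrected cases assume the same CV of 1.4% as GGT2, which is what the quoted results of 9.4 and 10.7 imply; the function name is illustrative.

```python
def sigma_metric(tea_pct, bias_pct, cv_pct):
    """SM = (%TEa - |%Bias|) / %CV."""
    return (tea_pct - abs(bias_pct)) / cv_pct

tea, cv = 15.0, 1.4  # allowable total error and long-term CV, in percent

print(round(sigma_metric(tea, 8.85, cv), 1))   # 4.4  (GGT2, high pool)
print(round(sigma_metric(tea, -1.83, cv), 1))  # 9.4  (GGT1, assuming the same CV)
print(round(sigma_metric(tea, 0.0, cv), 1))    # 10.7 (GGT2 with bias fully corrected)
```

Here the bias is subtracted linearly from the error budget before dividing by the imprecision, which is exactly the linear treatment argued for in this essay.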

Final Words

Note that the main argument is not about whether or not to correct for bias. Of course bias is important, and it should be reflected somehow in method evaluations. In fact, the idea behind introducing TE and SM is to provide laboratories with performance measures that are affected by both bias and imprecision. And it has been a recommendation from the TE side for years that manufacturers have the responsibility to correct bias rather than force re-calibration in the customer laboratories; laboratories, in turn, must combine bias with imprecision to make decisions about the acceptability of the method as well as to design appropriate QC procedures.

The bias observed in the GGT2 method [1] is remarkable, and the authors’ intention to highlight it is commendable, but this must be done using the correct measure; e.g. by using the Sigma-metric. It's strange to see that, while there are long-standing arguments against the TE model for including bias in the calculations (rather than correcting for it), when it comes to the MU model, bias is included in the calculation even at the expense of deviating from the ISO/TS 20914:2019 standard.

In summary, the findings of Bianchi et al study [1] provide proof that “Laboratory professionals should independently verify the correct implementation of metrological traceability of commercial measuring systems and determine if their performance is fit for purpose” [2], and that bias can be estimated reliably so that it can be acted upon; cancelling claims made by those who insist on deleting bias from the Sigma-metric equation because "it is not easy to monitor and calculate real-time bias" [8].

As a final word, let us recall the sage advice of Dietmar Stockl, "the uncertainty concept is a perfect complement to the error concept" [6], and realize that the field of laboratory medicine can be empowered by the correct use of both models in the right places.

References

1. Bianchi G, Colombo G, Pasqualetti S, Panteghini M. Alignment of the new generation of Abbott Alinity γ-glutamyltransferase assay to the IFCC reference measurement system should be improved. Clin Chem Lab Med 2022;60(10):e228–e231.
2. Aloisio E, Frusciante E, Pasqualetti S, Infusino I, Krintus M, Sypniewska G, et al. Traceability validation of six enzyme measurements on the Abbott Alinity C analytical system. Clin Chem Lab Med 2020;58:1250–6.
3. Braga F, Panteghini M. The utility of measurement uncertainty in medical laboratories. Clin Chem Lab Med 2020;58:1407–13.
4. ISO/TS 20914:2019. Medical laboratories – Practical guidance for the estimation of measurement uncertainty, 1st ed. Geneva, Switzerland: ISO, 2019.
5. JCGM 100:2008. Evaluation of Measurement Data - Guide to the Expression of Uncertainty in Measurement. http://www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.
6. Stockl D. Metrology and analysis in laboratory medicine: a criticism from the workbench. Scand J Clin Lab Invest 1996;56:193-197.
7. https://www.federalregister.gov/documents/2022/07/11/2022-14513/clinical-laboratory-improvement-amendments-of-1988-clia-proficiency-testing-regulations-related-to
8. Coskun A. Wrong Sigma metric causes chaos. J Med Lab, 2022. https://doi.org/10.1515/labmed-2022-0003.
