This question comes from Robbie Keith of Summit Laboratory: We are in the process of evaluating our QC program. Our techs monitor Levey-Jennings charts for shifts and trends weekly. We would like to know what you consider to define a shift or trend (e.g., how many points are required increasing or decreasing to define a trend?). Consider control rules such as 41s, 10mean, etc., as good indicators of shifts and trends. The number of observations needed increases as the limit approaches the mean of the control material in order to keep false rejections down. The minimum number of consecutive observations above or below the mean should probably be set at 6. There are some recommendations, particularly in Germany, to use 7 above or below the mean, or 7 trending consecutively in one direction.

W.o.W. Part IV: The Quality of Glycated Hemoglobin (GHb)

December 2007

After traveling to five continents in 2007, Dr. Westgard gained a new perspective on US healthcare. As our healthcare system fails, the world is watching.

A War of Words in Laboratory Medicine, Part IV:
The Quality of Glycated Hemoglobin (GHb)

A Sentinel Test: HbA1c
Current Performance
What is test quality today?
What quality needs to be assured?
What analytical performance is needed?
Lessons to be learned
References


Part III of this series concluded that “if we do not manage quality quantitatively, then we must provide our users and customers with information about the quality being achieved, or the uncertainty of test results. The best approach is to manage quality quantitatively when intended use can be objectively defined and to provide information on uncertainty when intended use has not been well-defined.”

I have argued that the “Total Error framework” is most useful when laboratories want to manage quality in an objective and quantitative manner. But most laboratories today are not doing that! As evidence, let me share some results from a survey of some 300 laboratories in early 2007 (conducted in collaboration with Bio-Rad):

From your experience, what do you think about the performance of your QC?
- 63% - Provides reliable monitoring of test quality
- 24% - Probably is overly sensitive
- 10% - Often gives false alarms
- 4% - Probably not sensitive enough

How does your lab select its QC rules?
- 41% - Just use 2 SD for all or most tests
- 30% - Professional judgment/experience
- 20% - QC planning tools
- 8% - Peer practice

When asked to provide the quality requirements for their tests, only 8.6% of laboratories provided quantitative values!

These survey results suggest that (a) most laboratories today do not define the quality needed for the clinical use of the test results being produced, (b) most laboratories select their QC procedures arbitrarily, without any consideration of the quality needed for patient care, and yet (c) most laboratories believe that their QC procedures provide reliable monitoring of that undefined and unknown test quality.

Given this absence of quality management, laboratories need to provide their customers with information about the uncertainty of their measurements. Otherwise, physicians may improperly interpret the significance of a patient test value vs a reference limit, cutoff, or serial changes in that patient.

A Sentinel Test!

The state of analytical quality management today can be illustrated using the glycated hemoglobin (GHb) test, which is one of the most important tests performed by laboratories today because of its prognostic value for diabetic patients. There are at least 30 different analytical methods available, according to a 2007 draft of the NACB evidence-based guidelines and recommendations for laboratory analysis in the diagnosis and management of diabetes mellitus [1]. This draft is an update of the earlier guidelines published in 2002 [2]. All these different GHb methods are certified by the National Glycohemoglobin Standardization Program (NGSP) as traceable to clinical studies and outcomes from the Diabetes Control and Complications Trial (DCCT). Thus, the GHb test provides an example of apparently “traceable” analytical methods that have been “corrected” by a standardization program so that all should give the same test results.

Current Performance

The performance of GHb methods can be estimated from current proficiency data from the College of American Pathologists (CAP). The CAP surveys are recognized as providing the best estimates of method performance because they make use of fresh pooled samples. For example, the table below provides a summary of results for a recent CAP specimen having a true value of 7.60 %Hb assigned by the NGSP methodology. Column 1 identifies nineteen different method subgroups and Column 2 shows the number of laboratories in each subgroup. Note that two small subgroups with fewer than 10 members were not included, which leaves a total of 2806 laboratories in the survey.

Subgroup	Labs	Mean (%Hb)	Bias (%Hb)	SD (%Hb)	CV (%)	TE (%Hb)	MU (%Hb)	RCV (%Hb)	LMQ (sigma)	NMQ (sigma)
1	47	7.4	-0.2	0.3	4.1	0.79	0.72	1.19	3.76	3.10
2	33	7.7	0.1	0.4	5.2	0.88	0.82	1.41	2.85	2.6
3	227	7.4	-0.2	0.2	2.7	0.59	0.57	1.01	5.71	4.7
4	385	7.3	-0.3	0.34	4.6	0.96	0.89	1.25	3.39	2.5
5	169	8.1	0.5	0.2	2.5	0.9	1.07	1.08	5.63	3.16
6	11	7.3	-0.3	0.27	3.7	0.83	0.8	1.12	4.22	3.11
7	17	7.5	-0.1	0.28	3.7	0.64	0.59	1.15	4.11	3.75
8	253	7.9	0.3	0.21	2.6	0.7	0.73	1.06	5.55	4.09
9	95	7.8	0.2	0.18	2.3	0.55	0.54	1.02	6.35	5.24
10	593	7.6	0	0.21	2.8	0.42	0.44	1.05	5.36	5.36
11	23	7.4	-0.2	0.57	7.7	1.32	1.19	1.79	2	1.65
12	30	8.2	0.6	0.45	5.5	1.48	1.48	1.56	2.53	1.2
13	36	7.6	0	0.17	2.3	0.34	0.37	0.99	6.52	6.52
14	239	7.7	0.1	0.28	3.6	0.64	0.59	1.16	4.11	3.75
15	72	7.5	-0.1	0.17	2.3	0.44	0.41	0.98	6.61	6.03
16	63	7.5	-0.1	0.26	3.5	0.61	0.57	1.12	4.34	3.96
17	186	8	0.4	0.23	2.9	0.85	0.92	1.11	4.91	3.19
18	261	7.9	0.3	0.16	2	0.61	0.68	1	7.22	5.32
19	66	7.7	0.1	0.31	4	0.7	0.65	1.22	3.7	3.38
Total/WtAvg	2806	7.67	0.07	0.24	3.14	0.68	0.68	1.11	5.08	4.17

Trueness. Column 3 shows the mean of each subgroup. The weighted average value for all laboratories is 7.67 %Hb, which is very close to the assigned value of 7.60 %Hb; the overall trueness (the bias of the all-laboratory average) is therefore 0.07 %Hb.

Method accuracy. Inspection of the different method subgroups reveals some “residual” biases, as shown in column 4, even though each method has been certified by NGSP. Some method subgroups may be inaccurate by as much as -0.3 %Hb to +0.6 %Hb.

Method precision. The variability expected for the different method subgroups is shown in Column 5 by the subgroup SDs and in Column 6 by the subgroup CVs. The SDs range from 0.16 to 0.57 %Hb, with a weighted average of 0.24 %Hb. The CVs range from 2.0% to 7.7%, with a weighted average of 3.14%.
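As a rough check on the bottom row of the table, the weighted averages can be approximated by weighting each subgroup by its number of laboratories. That weighting scheme is an assumption of this sketch (the text only says "weighted average"), and the inputs are the rounded values printed in the table.

```python
# Lab counts, SDs (%Hb) and CVs (%) transcribed from the table, subgroups 1-19.
labs = [47, 33, 227, 385, 169, 11, 17, 253, 95, 593,
        23, 30, 36, 239, 72, 63, 186, 261, 66]
sds = [0.30, 0.40, 0.20, 0.34, 0.20, 0.27, 0.28, 0.21, 0.18, 0.21,
       0.57, 0.45, 0.17, 0.28, 0.17, 0.26, 0.23, 0.16, 0.31]
cvs = [4.1, 5.2, 2.7, 4.6, 2.5, 3.7, 3.7, 2.6, 2.3, 2.8,
       7.7, 5.5, 2.3, 3.6, 2.3, 3.5, 2.9, 2.0, 4.0]


def weighted_average(values, weights):
    """Average of `values`, each weighted by the matching entry in `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


total_labs = sum(labs)                    # 2806 laboratories
weighted_sd = weighted_average(sds, labs)  # about 0.24 %Hb
weighted_cv = weighted_average(cvs, labs)  # about 3.14 %

print(total_labs, round(weighted_sd, 2), round(weighted_cv, 2))
```

With lab-count weights, the rounded results agree with the 0.24 %Hb and 3.14% shown in the Total/WtAvg row.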

Expected Total Error (TE). Column 7 shows how large an error might occur with the different method subgroups. This “total error” is obtained by adding the size (absolute value) of the method subgroup bias to two times the method subgroup SD:

TE = |subgroup bias| + 2 × (subgroup precision)

The expected total errors may be as small as 0.34 %Hb or as large as 1.48 %Hb, with a weighted average of 0.68 %Hb.
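The total error calculation can be sketched in a few lines. The function name is hypothetical, and taking the absolute value of the bias is an assumption that appears consistent with the table (subgroups with negative bias still show a TE larger than 2 SD). Because the inputs here are the rounded table values, results can differ from Column 7 in the second decimal.

```python
def total_error(bias, sd, multiplier=2.0):
    """Expected total error in %Hb: |bias| + multiplier * SD."""
    return abs(bias) + multiplier * sd


# Subgroup 13: no bias, SD 0.17 %Hb (table shows TE = 0.34)
te_best = total_error(0.0, 0.17)

# Subgroup 12: bias +0.6 %Hb, SD 0.45 %Hb (table shows TE = 1.48)
te_worst = total_error(0.6, 0.45)

# Subgroup 3: bias -0.2 %Hb, SD 0.2 %Hb (table shows TE = 0.59)
te_negative_bias = total_error(-0.2, 0.2)

print(round(te_best, 2), round(te_worst, 2), round(te_negative_bias, 2))
```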

Measurement Uncertainty (MU). Column 8 shows a top-down estimate that combines, as the square root of the sum of squares, the observed overall trueness, the bias of the method subgroup, and the SD of the method subgroup, and then applies a coverage factor of 1.96, as shown below:

MU = 1.96 × [(trueness)² + (subgroup bias)² + (subgroup precision)²]^(1/2)

MU ranges from 0.37 to 1.48 %Hb for the different method subgroups, with a weighted average value of 0.68 %Hb.
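A minimal sketch of the MU formula follows, using the overall trueness of 0.07 %Hb from this survey as the default. The function and parameter names are illustrative only, and the inputs are the rounded table values, so small differences from Column 8 are possible.

```python
import math


def measurement_uncertainty(bias, sd, trueness=0.07, coverage=1.96):
    """Top-down MU in %Hb: coverage * sqrt(trueness^2 + bias^2 + SD^2)."""
    return coverage * math.sqrt(trueness**2 + bias**2 + sd**2)


# Subgroup 12: bias +0.6 %Hb, SD 0.45 %Hb (table shows MU = 1.48)
mu_12 = measurement_uncertainty(0.6, 0.45)

# Subgroup 11: bias -0.2 %Hb, SD 0.57 %Hb (table shows MU = 1.19)
mu_11 = measurement_uncertainty(-0.2, 0.57)

print(round(mu_12, 2), round(mu_11, 2))
```

Because the bias enters as a square, its sign does not matter here, unlike a simple linear sum.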

Reference Change Value (RCV). Column 9 shows the change in a patient's serial test results that would be medically important. RCV is calculated from Fraser’s formula and his estimate of within-subject biologic variability of 4.1%, as follows:

RCV = 2^(1/2) × 1.96 × [(CVsubgroup)² + (CVintra-individual)²]^(1/2)

The RCVs for different method subgroups range from 0.98 %Hb to 1.79 %Hb, with a weighted average of 1.11 %Hb. If a value of 2.0% is taken as the estimate of within-subject variation (as stated in the NACB guidelines), then the RCV would range from 0.62 %Hb to 1.63 %Hb, with a weighted average of 0.79 %Hb. The size of a medically important change in serial test results is therefore, on average, between 0.79 %Hb and 1.11 %Hb, depending on which value (2.0% or 4.1%) is used for the within-subject biologic variability.
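The RCV calculation can be sketched as follows. Fraser's formula gives a percentage; converting it to %Hb by multiplying by the subgroup mean is an assumption of this sketch (the text does not state the concentration used), although it does reproduce the table entries checked here. Function names are hypothetical.

```python
import math


def rcv_percent(cv_analytical, cv_within_subject, z=1.96):
    """Reference change value in percent: sqrt(2) * z * sqrt(CVa^2 + CVi^2)."""
    return math.sqrt(2) * z * math.sqrt(cv_analytical**2 + cv_within_subject**2)


def rcv_absolute(cv_analytical, cv_within_subject, level):
    """RCV in the units of `level` (here %Hb), assuming the percent RCV
    is applied at that concentration."""
    return rcv_percent(cv_analytical, cv_within_subject) * level / 100.0


# Subgroup 10: CV 2.8%, mean 7.6 %Hb, within-subject CV 4.1% (table: 1.05)
rcv_10 = rcv_absolute(2.8, 4.1, 7.6)

# Subgroup 18: CV 2.0%, mean 7.9 %Hb, within-subject CV 4.1% (table: 1.00)
rcv_18 = rcv_absolute(2.0, 4.1, 7.9)

# Same subgroup 10, but with the 2.0% within-subject CV quoted by NACB
rcv_10_nacb = rcv_absolute(2.8, 2.0, 7.6)

print(round(rcv_10, 2), round(rcv_18, 2), round(rcv_10_nacb, 2))
```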

What is test quality today?

The overall agreement of all the methods in this CAP survey with the NGSP assigned value is very close, 7.67 vs 7.60, or a trueness of 0.07 %Hb, which must be considered excellent performance. However, it is clear that some method subgroups still have serious problems with accuracy, e.g., biases up to 0.6 %Hb.

The average precision observed for the methods is 3.14%, remarkably close to the NACB desirable specification of 3.0%, which is excellent performance considering that this figure represents both within-lab and between-lab variation. But some method subgroups show CVs as high as 5.5% and 7.7%.

These performance figures indicate that a patient with a true value near the critical decision concentration of 7.0 %Hb may expect to see test results, on average, with errors as large as 0.68 %Hb or with a measurement uncertainty of 0.68 %Hb. Thus, a patient having a true concentration of 7.0 %Hb may be observed to have values ranging from 6.3 to 7.7 %Hb. However, there may be larger errors or larger uncertainty if the patient is tested by a particular method, such as subgroups 11 and 12, where the total error or measurement uncertainty may be twice as large, up to 1.48 %Hb.

The differences in the quality of different method subgroups can be seen most directly from the Sigma-metrics shown in columns 10 and 11. [See the essay on Touchstone Test Methodology for a detailed description of the methodology for calculating Sigma-metrics from proficiency testing results.] The weighted average of the LMQ estimate (Local Method Quality) is 5.08 sigma, which is very good. However, this estimate is optimistic because it doesn’t consider method bias. The NMQ estimate (National Method Quality), which does consider subgroup bias, is 4.17 sigma, which is respectable and typical of average business and production processes. Note that the Sigma-metrics for individual method subgroups vary widely, from 2.00 to 7.22 for LMQ and from 1.20 to 6.52 for NMQ. Some methods clearly perform better than others. Precision is still a problem for some methods and accuracy for others. In spite of national standardization and correction, the residual biases are significant, most likely due to differences in specificity between methods.
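For readers who want to see the arithmetic, here is a minimal sketch of the two Sigma-metrics using the common form Sigma = (TEa - |bias|) / SD, with the bias term dropped for LMQ. It is not a reproduction of the Touchstone methodology. In particular, expressing the 15% allowable total error at the specimen's assigned value of 7.60 %Hb (1.14 %Hb) is an assumption of this sketch, chosen because it reproduces the table entries for subgroups 2 and 12; function names are hypothetical.

```python
ASSIGNED_VALUE = 7.60            # %Hb, NGSP assigned value for the specimen
TEA = 0.15 * ASSIGNED_VALUE      # allowable total error in %Hb (assumed basis)


def local_method_quality(sd, tea=TEA):
    """LMQ: sigma-metric that ignores method bias."""
    return tea / sd


def national_method_quality(bias, sd, tea=TEA):
    """NMQ: sigma-metric that subtracts the size of the subgroup bias."""
    return (tea - abs(bias)) / sd


# Subgroup 2: bias +0.1 %Hb, SD 0.40 %Hb (table: LMQ 2.85, NMQ 2.60)
lmq_2 = local_method_quality(0.40)
nmq_2 = national_method_quality(0.1, 0.40)

# Subgroup 12: bias +0.6 %Hb, SD 0.45 %Hb (table: LMQ 2.53, NMQ 1.20)
lmq_12 = local_method_quality(0.45)
nmq_12 = national_method_quality(0.6, 0.45)

print(round(lmq_2, 2), round(nmq_2, 2), round(lmq_12, 2), round(nmq_12, 2))
```

The gap between LMQ and NMQ for subgroup 12 shows how a residual bias of 0.6 %Hb consumes more than half of the error budget.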

Keep in mind that the results discussed here are based on the CAP PT survey, which consistently shows the best performance among the available PT programs and represents only about half of the laboratories performing GHb testing. These estimates of test quality are therefore optimistic and better than the quality available when all laboratories are considered.

What quality needs to be assured?

CLIA has not defined a criterion for acceptable performance for GHb, but CAP has set its grading criterion as an allowable total error of 15%, which at a decision level of 7.0 %Hb corresponds to 1.05 %Hb. This is the quality requirement that has been used in the estimation of the Sigma-metrics above, but it may actually be too large. Given that GHb is being recommended as a long-term estimate of the average glucose concentration, it could be argued that the allowable total error should be 10%, the same quality required by CLIA for the direct measurement of glucose concentration.

There is no specific guidance for the quality required for the clinical use, but the intended clinical quality is sometimes revealed by the test interpretation guidelines. The updated recommendations state the following [1, page 40]:

“Treatment goals should be based on the ADA recommendations which include maintaining GHb concentrations <7% and in individual patients as close to the non-diabetic range as safely possible.”

This treatment guideline has changed from the earlier 2002 recommendation [2], which defined a change of 1.0 %Hb as important for reevaluating treatment, as follows:

“Treatment goals should be based on ADA recommendations, which include maintaining GHb concentrations 8%.”

Evidently, the treatment guidelines are being tightened and smaller changes in GHb are supposed to lead to changes in treatment. Yet, the updated guidelines still acknowledge that “small changes in GHb (e.g., +/- 0.5% GHb) over time may reflect assay variability rather than a true change in glycemic status.” Thus, it might be inferred that changes greater than 0.5 %Hb, yet less than 1.0 %Hb, may now lead to changes in treatment.

Thus, the available guidance suggests that GHb tests should be correct within 0.5 to 1.0 %Hb at a critical concentration of 7.0 %Hb, which corresponds to allowable total errors of 7% to 14%.
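The conversions between an allowable error in %Hb and an allowable error in percent at the 7.0 %Hb decision level are simple, but easy to confuse because both are "percent" figures. A small sketch, with hypothetical helper names:

```python
DECISION_LEVEL = 7.0  # %Hb, critical decision concentration


def to_percent(error_in_hb, level=DECISION_LEVEL):
    """Express an error given in %Hb as a percentage of the decision level."""
    return 100.0 * error_in_hb / level


def to_hb_units(error_percent, level=DECISION_LEVEL):
    """Express a percentage error as %Hb at the decision level."""
    return error_percent / 100.0 * level


# Clinical guidance: results should be correct within 0.5 to 1.0 %Hb
low_pct = to_percent(0.5)     # about 7%
high_pct = to_percent(1.0)    # about 14%

# CAP criterion of 15% and the proposed 10% criterion, in %Hb
cap_hb = to_hb_units(15.0)    # 1.05 %Hb
tight_hb = to_hb_units(10.0)  # 0.70 %Hb

print(round(low_pct, 1), round(high_pct, 1), round(cap_hb, 2), round(tight_hb, 2))
```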

What analytical performance is needed?

The NACB guidelines provide the following recommendations for analytical performance and quality control [1, page 38]:

“Laboratories should use GHb assay methods with an interassay CV<5% (ideally <3%). At least two control materials with different mean values should be analyzed as an independent measure of assay performance…”

The recommendations for precision are justified as:

“reasonable; intra-individual CVs are very small (<2%) and many current assay methods can achieve CVs <3%...”

Interestingly, there is no evidence (references) to justify the figure “<2%” for intra-individual variability. In fact, references in the literature provide figures of 4.1% [3] to 5.6% [4] for the within-subject biologic variation.

Contrary to the NACB evidence-based guidelines, another discussion of evidence-based laboratory medicine places more importance on intra-individual biologic variation and its impact on changes in patient test results [5]:

“The biological variation of HbA1c is ~5%. For an assay with a between-assay imprecision of 3% at a level of 8% the minimum significant change would be ~1.1%.”

These authors applied Fraser’s Reference Change Value and came up with the same figure that we have estimated from the CAP proficiency testing data. They also advise that the method CV should be set on the basis of the intra-individual biologic variation, as follows:

“A long-standing rule of thumb is that for a test to be clinically useful the analytic CV should be no more than 50% of the biological CV.”

That would set the precision specification as 2.0% to 2.8% (half of the 4.1% and 5.6% literature figures for within-subject variation), somewhat tighter than recommended in the NACB guidelines. If the same rule of thumb were applied to the 2% figure for intra-individual variation quoted by NACB, then the NACB method CV should be set as 1%. Clearly there are differences in the perspectives on test quality and analytical performance even when both claim to be “evidence-based.” And neither provides any objective assessment of the QC needed to verify the attainment of the intended quality of test results.
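The rule of thumb quoted above is easy to apply to each of the biologic variation figures mentioned in this essay. The sketch below uses a hypothetical function name and simply takes half of the within-subject CV.

```python
def max_analytical_cv(cv_within_subject, fraction=0.5):
    """Precision specification: analytical CV no more than `fraction`
    (50% by the rule of thumb) of the within-subject biologic CV."""
    return fraction * cv_within_subject


# Within-subject CVs (%) cited in the text: 4.1 [3], 5.6 [4], and the
# <2% figure quoted in the NACB guidelines.
specs = {cvi: max_analytical_cv(cvi) for cvi in (4.1, 5.6, 2.0)}

for cvi, spec in specs.items():
    print(f"within-subject CV {cvi}% -> analytical CV <= {spec:.2f}%")
```

Depending on which biologic variation figure is accepted, the same rule yields precision goals from 1% to 2.8%.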

When QC is taken into account in the design of the testing process, we have shown that a CV of 2% and a bias of 0% are necessary for 2 controls (with 2.5s control limits) to be able to verify the attainment of an intended clinical quality of 14%. [See The Quality of Lab Testing, Part VI, Glycohemoglobin.]

Lessons to be learned!

- Clinical treatment guidelines may not be well-defined, even in evidence-based guidelines for an important laboratory test such as GHb. The new updated guideline provides less clear guidance than the earlier 2002 guideline concerning what changes in GHb values should lead to changes in treatment.
- A change in serial test results for a patient must exceed 1.1 %Hb to be medically significant. Physicians will be tempted to over-manage patients if they believe that changes as small as 0.5 %Hb are medically significant.
- Any single GHb test result can be expected to be in error up to 0.7 %Hb (TE) or expected to be uncertain within 0.7 %Hb (MU). If a treatment goal is defined as maintaining a patient at 7.0 %Hb or lower, values as high as 7.7 %Hb may actually represent only the analytical error or uncertainty expected for the average analytical method today.
- Analytical performance goals have not been set objectively, but rather represent expert opinion, which is commonly expressed as 3% or 5% without any supporting evidence. Likewise, QC requirements lack any specific guidance and tend to default to regulatory minimums, typically running two levels of control per run or per day.
- National standardization and correction, according to the NGSP established procedures, works well “on average,” but still leaves many methods with substantial inaccuracies or “residual” biases, as large as 0.6 %Hb.
- For today’s worst-case methods, test results may be in error as much as 1.3 to 1.5 %Hb.
- For patients being monitored with test results from different methods, it is possible (worst case estimate for subgroups 6 and 12) that differences as large as 2.5 %Hb could be observed due only to analytical errors or measurement uncertainty.

Clearly, improvements are still needed in analytical methods and analytical quality management to keep up with clinical practices and the expectations for test quality today. The National Glycohemoglobin Standardization Program has certainly led to improvements in the quality of GHb testing today, but there is still need for additional improvements if patients are to be more closely monitored and managed, in accordance with the new updated guidelines.

Physicians should be informed that a GHb test may be in error or uncertain up to 0.7 %Hb, on average, and that a difference in patient serial test results must exceed 1.1 %Hb, on average, to be medically significant. Better yet, each laboratory should determine the maximum expected error or measurement uncertainty, as well as the Reference Change Value for its own method, and make that information available to its customers. Until such time when laboratories manage their analytical measurement procedures quantitatively to verify the attainment of the intended clinical quality, laboratories will need to inform their customers of the quality available from their testing processes.

References

1. The National Academy of Clinical Biochemistry. Laboratory Medicine Practice Guidelines: Guidelines and Recommendations for Laboratory Analysis in the Diagnosis and Management of Diabetes Mellitus: Update. Draft Guidelines, Version 1107. Accessed at AACC website November 27, 2007.
2. Sacks DB, Bruns DE, Goldstein DE, Maclaren NK, McDonald JM, Parrott M. Guidelines and Recommendations for Laboratory Analysis in the Diagnosis and Management of Diabetes Mellitus. Clin Chem 2002;48:436-472.
3. Lytken Larsen M, Fraser CG, Hyltoft Petersen P. A comparison of analytical goals for haemoglobin A1c assays derived using different strategies. Ann Clin Biochem 1991;28:272-278.
4. Ricos C, Alvarez V, Cava F, et al. Current databases on biological variation: pros, cons and progress. Scand J Clin Lab Invest 1999;59:491-500. See the latest in the Quality Requirements section.
5. Jones R, Cuckle H. Evidence-Based Laboratory Medicine: Impact of Analytical Performance on Outcomes. Ch 8 in Evidence-Based Laboratory Medicine: Principles, Practice, and Outcomes, Second Edition. Price CP, Christenson RH, eds. AACC Press 2007.

James O. Westgard, PhD, is a professor emeritus of pathology and laboratory medicine at the University of Wisconsin Medical School, Madison. He also is president of Westgard QC, Inc., (Madison, Wis.) which provides tools, technology, and training for laboratory quality management.
