Tools, Technologies and Training for Healthcare Laboratories

Quality of HbA1c, 2011

In 2011, Clinical Chemistry published an update on the status of hemoglobin A1c measurement and goals for improvement. NGSP and CAP have been actively tightening standards in an effort to improve method performance. So, in light of these efforts, has there been improvement? If so, how much? And how much further do methods need to improve, given the clinical use and demands on the test?

Update on HbA1c quality goals and performance requirements

James O. Westgard, PhD
February 2011

The February 2011 issue of Clinical Chemistry is devoted to diabetes and includes a review of the “Status of hemoglobin A1c measurement and goals for improvement” by authors from the NGSP steering committee [1].  This review provides a very good discussion of the efforts to improve HbA1c testing since 1996 when the National Glycohemoglobin Standardization Program (NGSP) was established.  It describes the NGSP network, its relationship to the IFCC network, and the traceability of the US NGSP reference method to the IFCC higher order method.  It reviews the clinical use of HbA1c for diagnosis of diabetes and adds new guidelines on the clinical use for monitoring treatment.  Our interest here is to provide an update on the quality goals and performance requirements for certification, CAP proficiency testing, and clinical use of the HbA1c test.

NGSP certification criteria

The process for NGSP certification of manufacturers and laboratories is summarized in this review, along with the data analysis, which is described as a “Bland-Altman assessment of agreement” with an estimate of a 95% confidence interval of the differences between the certifying test method and the reference method.  This estimate is the Total Error calculated from the average bias observed between test and reference results and the standard deviation of the differences of the paired results (TE = bias ± 1.96*SDdiff), which must fall within ±0.75 %Hb for the test method to be certified. 
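The Total Error calculation described above can be sketched in a few lines. This is a minimal illustration, not the NGSP protocol itself: the function names and the paired results are hypothetical, and the pass rule assumes that both ends of bias ± 1.96*SDdiff must fall inside ±0.75 %Hb. Sample counts and other protocol details are not modeled.

```python
import statistics

def ngsp_total_error(test, reference):
    """Return (bias, sd_diff, lower, upper) for paired %Hb results.

    bias is the mean of (test - reference), sd_diff is the sample SD of
    those differences, and the limits are bias -/+ 1.96 * sd_diff.
    """
    diffs = [t - r for t, r in zip(test, reference)]
    bias = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    return bias, sd_diff, bias - 1.96 * sd_diff, bias + 1.96 * sd_diff

def passes_ngsp(test, reference, limit=0.75):
    """Assumed pass rule: both limits lie strictly within +/- limit %Hb."""
    _, _, lower, upper = ngsp_total_error(test, reference)
    return -limit < lower and upper < limit

# Hypothetical paired results in %Hb (illustrative only).
reference = [5.0, 6.1, 6.5, 7.0, 8.0, 9.1]
test = [5.1, 6.0, 6.6, 7.2, 8.1, 9.0]
bias, sd_diff, lower, upper = ngsp_total_error(test, reference)
print(f"bias {bias:.2f}, SDdiff {sd_diff:.3f}, TE {lower:.2f} to {upper:.2f} %Hb")
print("certified:", passes_ngsp(test, reference))
```

With these made-up data the bias is 0.05 %Hb and the limits fall near -0.19 and +0.29 %Hb, so the method would pass; adding a constant +0.7 %Hb to every test result pushes the upper limit past 0.75 %Hb.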

In our 2010 Real World Episodes about the performance of HbA1c Point-of-Care methods, we identified the NGSP certification requirement as ±0.85 %Hb in 2007 and being tightened to ±0.75 %Hb in 2010, which remains the current requirement.  However, here’s an interesting note from the review:  “Clearly the NGSP limits will need to be tightened to match the 2011 CAP limit of 7%.”

CAP PT criteria

The CAP proficiency testing criterion had been 15% in 2007 and was tightened to 12% in 2008, 10% in 2009, and 8% in 2010.  At the time of our 2010 Real World discussions, it was projected that the CAP criterion would be further tightened to 6% in 2011.  One of the important updates is that the CAP criterion has instead been set at 7%, rather than 6%.  The need to further tighten the NGSP limits can be understood by considering the size of the errors that are allowable at the critical decision levels of 6.5 %Hb and 7.0 %Hb, which represent the diagnostic cutoff and the treatment goal, respectively.  The CAP allowable error of 7.0% at a concentration of 7.0 %Hb would be 0.49 %Hb, whereas the NGSP criterion is 0.75 %Hb, which is considerably larger.  In effect, the CAP criterion for proficiency testing is more demanding than the NGSP certification requirement.  Consequently, some laboratories using NGSP-certified methods will likely fail proficiency testing, even though their methods are certified.
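The comparison above requires converting between a percentage limit (CAP) and an absolute limit in %Hb (NGSP) at a given decision level. A minimal sketch of that arithmetic follows; the helper names are hypothetical.

```python
def pct_to_abs(pct, level):
    """Allowable error in %Hb for a percentage limit at a given HbA1c level."""
    return pct / 100.0 * level

def abs_to_pct(abs_limit, level):
    """Percentage equivalent of an absolute %Hb limit at a given HbA1c level."""
    return abs_limit / level * 100.0

# Decision levels: diagnostic cutoff (6.5 %Hb) and treatment goal (7.0 %Hb).
for level in (6.5, 7.0):
    cap_abs = pct_to_abs(7.0, level)    # CAP 2011 limit of 7%
    ngsp_pct = abs_to_pct(0.75, level)  # NGSP limit of 0.75 %Hb
    print(f"{level} %Hb: CAP 7% = ±{cap_abs:.3f} %Hb, NGSP ±0.75 %Hb = ±{ngsp_pct:.1f}%")
```

At 6.5 %Hb the CAP limit is about ±0.455 %Hb and the NGSP limit is equivalent to about ±11.5%; at 7.0 %Hb the figures are ±0.49 %Hb and about ±10.7%.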

Intended clinical use criteria

The diagnostic cutoff is 6.5 %Hb, as established in 2010.  The treatment goal is 7.0 %Hb, i.e., it is desirable to manage patients to maintain their HbA1c at 7.0 %Hb or lower.  Earlier ADA guidelines recommended that treatment be re-evaluated for any patient whose HbA1c was 8.0 %Hb or higher.  That recommendation has now been changed to 7.5 %Hb:

“…many physicians have suggested that 0.5% HbA1c is a ‘clinically significant change.’  Importantly, treatment guidelines and algorithms from the ADA/EASD and National Institute for Clinical Excellence in the UK recommend evaluating new treatment regimens in terms of whether HbA1c is lowered by 0.5 percentage points or more.  Therefore it is important to be sure that a change of this magnitude is statistically significant and not due to analytical variation."

Observe that a change of 0.5 %Hb at a level of 7.0 %Hb is 7.1%, which is equivalent to the new CAP criterion of 7.0%.  We are finally starting to see some consistency between the PT criterion and the quality required for clinical use.  However, the NGSP certification criteria are not consistent, as shown in the error grid below.  (See the earlier 2010 HbA1c Part VII for a discussion of error grids and their use for comparing different quality goals and requirements.)


In this error grid, the dotted line shows the ideal y=x equality.  The solid lines show the clinical criteria for diagnosis and monitoring treatment.  The large-dashed lines show the NGSP certification criteria and the short-dashed lines show the CAP PT criteria.  Ideally, the certification criteria should be more demanding than the PT and clinical criteria, whereas the comparison here shows that the NGSP criteria are the least demanding at the critical levels for diagnosis and treatment.  “Clearly the NGSP limits will need to be tightened to match the 2011 CAP limit of 7%.”
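A minimal numerical sketch of what the error grid compares: for a true value x, each criterion defines an acceptance band from x minus a half-width to x plus a half-width. Treating the clinical criterion as a fixed ±0.5 %Hb band is an assumption made here for illustration, as are the dictionary labels and function names.

```python
# Half-widths (in %Hb) of each acceptance band at a true HbA1c value x.
CRITERIA = {
    "NGSP certification": lambda x: 0.75,        # fixed ±0.75 %Hb
    "CAP PT 2011": lambda x: 0.07 * x,            # ±7% of the true value
    "Clinical (0.5 %Hb change)": lambda x: 0.5,   # assumed fixed ±0.5 %Hb
}

def band(x, name):
    """Lower and upper acceptable results for true value x under one criterion."""
    half_width = CRITERIA[name](x)
    return x - half_width, x + half_width

def least_demanding(x):
    """Criterion with the widest band (largest allowable error) at x."""
    return max(CRITERIA, key=lambda name: CRITERIA[name](x))

for x in (6.5, 7.0):
    for name in CRITERIA:
        lower, upper = band(x, name)
        print(f"x = {x}: {name:26s} {lower:.2f} to {upper:.2f} %Hb")
    print("  least demanding:", least_demanding(x))
```

At both 6.5 and 7.0 %Hb the NGSP band is the widest; under this model the CAP band only becomes wider than ±0.75 %Hb above roughly 10.7 %Hb, well outside the range used for diagnosis and treatment decisions.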

Performance goals for precision and bias.

Also, for the first time, we are starting to see a rational approach developing for recommendations about the precision and bias required for analytical testing methods. There are two scenarios to be considered, the 1st involving consecutive measures to evaluate a change in a patient’s test values, the 2nd assessing the significance of a patient value vs a target value of 7.0 %Hb.  For the 1st scenario, here is the analysis, as discussed in the review paper:

"Taking a statistically significant difference of 0.5 %HbA1c and an HbA1c concentration of 7% as the goal for HbA1c, one can use the reference change value (RCV), also called critical difference to calculate an appropriate goal in terms of a method’s CV. For sequential results to be significantly different, the numbers must differ by more than the combined variation inherent in the 2 results:

$$RCV(\%) = 2^{1/2} \times 1.96 \times \left[ CV_A^2 + CV_I^2 \right]^{1/2}$$

Where $CV_A$ is the analytical CV of the method (within-laboratory CV) and $CV_I$ is the within-subject biological variation. 

For HbA1c the $CV_I$ is low, <1% when estimated in individuals without diabetes.  If the analytical CV of the HbA1c method is 2% (feasible for many commercially available HPLC systems), then the RCV (95% probability) is <0.5 %HbA1c…"
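The RCV formula quoted above translates directly into code. This sketch uses $CV_I$ = 1% as an upper bound for the "<1%" in the quotation; the function names are hypothetical.

```python
import math

def rcv_percent(cv_a, cv_i, z=1.96):
    """Reference change value (%) = sqrt(2) * z * sqrt(CV_A^2 + CV_I^2)."""
    return math.sqrt(2) * z * math.sqrt(cv_a ** 2 + cv_i ** 2)

def rcv_hb(cv_a, cv_i, level=7.0, z=1.96):
    """The same RCV expressed in %Hb units at a given HbA1c level."""
    return rcv_percent(cv_a, cv_i, z) / 100.0 * level

print(f"RCV = {rcv_percent(2.0, 1.0):.2f}%, i.e. {rcv_hb(2.0, 1.0):.3f} %Hb at 7.0 %Hb")
```

With $CV_A$ = 2% and $CV_I$ = 1%, the RCV is about 6.2%, or about 0.43 %Hb at 7.0 %Hb, which is below the 0.5 %Hb change of interest, consistent with the review's statement.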

So, for the 1st scenario, the CV that is required within a laboratory and within a single method should be 2.0% or lower in order to measure clinically important changes.  Note that in this clinical application, method bias is not a consideration because it would be canceled out in the difference between two consecutive measurements.

However, in the 2nd scenario, comparison to a target value of 7.0, method bias does need to be considered.  Here’s the analysis from the paper:

"In the situation in which a physician wants to look at the difference between a patient result and a goal of 7% HbA1c, both the bias and variability (%CV), in other words the total error, of the method must be taken into account."

For example, if a method has 0.0 bias, a CV of 3.5% is required to have 95% confidence that the HbA1c result for a patient with a ‘true’ result of 7% will read between 6.5 and 7.5% (±7%).  If there is a bias of 0.2 %HbA1c, the CV requirement would tighten to 2.3%...
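The second scenario can be sketched with the common total-error model |bias| + 1.96*SD ≤ allowable error, here ±0.5 %Hb around a true value of 7.0 %Hb. This is an assumed model, not necessarily the exact calculation used in the review: it gives about 3.6% for zero bias and about 2.2% for a 0.2 %Hb bias, close to but not identical with the quoted 3.5% and 2.3%, so the review may rely on different rounding or assumptions. The function name is hypothetical.

```python
def max_cv_for_target(bias_hb, level=7.0, allowable_hb=0.5, z=1.96):
    """Largest CV (%) such that |bias| + z * SD stays within the allowable error.

    Returns None if the bias alone uses up the whole allowable error.
    """
    remaining = allowable_hb - abs(bias_hb)
    if remaining <= 0:
        return None
    sd_max = remaining / z          # largest SD in %Hb
    return sd_max / level * 100.0   # convert to a CV at the given level

for bias in (0.0, 0.2):
    print(f"bias {bias} %Hb -> max CV {max_cv_for_target(bias):.2f}%")
```

The design choice is to subtract the bias first and spend whatever error budget remains on imprecision, which is why a larger bias forces a smaller allowable CV.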

In this 2nd scenario, the CV of the method can be larger if the bias is zero.  If bias is as large as 0.2 %Hb, then essentially the same CV is needed as in the 1st scenario, approximately 2%.

The good, the bad, and the ugly!

The good. The efforts of the NGSP certification program have certainly driven improvements in HbA1c testing.  The NGSP criterion for agreement within ±0.75 %Hb corresponds to a quality requirement of 10.7% at the treatment goal of 7.0 %Hb.  Most methods now provide test results that are correct within 10%, as documented by CAP PT surveys.

The sigma performance of method subgroups can be determined from PT surveys [4], such as the CAP PT data available on the NGSP website.  For example, the CAP 2010 survey for sample GH2-06 had an NGSP reference value of 6.3 %Hb.  Over 2600 laboratories participated, using methods that represent 30 different subgroups.  Subgroup biases (vs the target value of 6.3 %Hb) and CVs are shown in the accompanying sigma-metric chart, where the observed inaccuracy (%Bias) is plotted on the y-axis vs the observed imprecision (%CV) on the x-axis for each of the method subgroups. 


For a quality requirement of 10.0% (which is approximately equivalent to the NGSP requirement in the critical concentration range for diagnosis and treatment), the diagonal lines represent (from top to bottom) 2-Sigma, 3-Sigma, 4-Sigma, 5-Sigma, and 6-Sigma quality.  Observe that all but a couple of method subgroups perform at the 2-Sigma level or better.  Approximately a third provide 3-Sigma quality or better, a few are better than 4-Sigma, and one is better than 5-Sigma.
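Each point on the sigma-metric chart follows from the usual relationship Sigma = (TEa - |%Bias|) / %CV. The sketch below computes %Bias and %CV from a subgroup's mean and SD and then the sigma metric at a 10% requirement. The subgroup statistics are hypothetical placeholders, not the actual GH2-06 survey results, and the function names are illustrative.

```python
def pct_bias(subgroup_mean, target):
    """Observed inaccuracy (%Bias) of a subgroup mean vs the PT target value."""
    return (subgroup_mean - target) / target * 100.0

def pct_cv(subgroup_sd, subgroup_mean):
    """Observed imprecision (%CV) of a subgroup."""
    return subgroup_sd / subgroup_mean * 100.0

def sigma_metric(tea_pct, bias_pct, cv_pct):
    """Sigma = (TEa - |%Bias|) / %CV, all in percent."""
    return (tea_pct - abs(bias_pct)) / cv_pct

# Hypothetical subgroup statistics (mean, SD in %Hb); NOT actual GH2-06 data.
TARGET = 6.3
subgroups = {"Method A": (6.40, 0.19), "Method B": (6.10, 0.25)}

for name, (mean, sd) in subgroups.items():
    bias, cv = pct_bias(mean, TARGET), pct_cv(sd, mean)
    print(f"{name}: bias {bias:+.1f}%, CV {cv:.1f}%, "
          f"{sigma_metric(10.0, bias, cv):.1f} sigma")
```

For these invented numbers, Method A lands near 2.8 sigma and Method B near 1.7 sigma; the diagonal lines on the chart are simply the (%CV, %Bias) pairs that give whole-number sigma values.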

The bad. NGSP certification requires that methods perform at only the 1.96-sigma level (TE = bias ± 1.96*SDdiff), meaning that even when they are working properly, they still generate a significant level of defective test results.  In industry, 3-sigma performance is considered to be the minimum for deployment of a production process.  In medical laboratories, it can be shown that 4-sigma performance is needed if the laboratory is to provide effective QC for a method and be able to detect medically important errors.  Not only do the NGSP tolerance limits of ±0.75 %Hb need to be tightened, but a higher proportion of test results should be required to fall within the tolerance limits to drive methods towards higher performance on the sigma-scale.  That means that the multiplier in the NGSP calculations should be changed to a minimum of 3 and at some time in the future to a value of 4 to assure quality can be reliably controlled in laboratories.
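The effect of the multiplier can be illustrated under the assumption that the paired differences are normally distributed around zero bias. The sketch shows the two-sided fraction of results falling outside ±z SDs and, for a ±0.75 %Hb limit, the largest SDdiff each multiplier would allow. The function names are hypothetical, and a nonzero bias would shift the distribution and increase the fraction outside the limits.

```python
import math

def fraction_outside(z):
    """Two-sided fraction of a normal distribution beyond ±z SDs."""
    return math.erfc(z / math.sqrt(2))

def max_sd_diff(limit_hb=0.75, bias_hb=0.0, multiplier=1.96):
    """Largest SDdiff (%Hb) with |bias| + multiplier * SDdiff <= limit."""
    return (limit_hb - abs(bias_hb)) / multiplier

for z in (1.96, 3.0, 4.0):
    print(f"multiplier {z}: {fraction_outside(z) * 100:.4f}% outside, "
          f"max SDdiff {max_sd_diff(multiplier=z):.3f} %Hb")
```

Under these assumptions, a multiplier of 1.96 allows about 5% of results outside the limits, 3 allows about 0.27%, and 4 allows well under 0.01%; the allowable SDdiff shrinks from about 0.38 to 0.25 to about 0.19 %Hb.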

The ugly. As the quality of testing has improved, physicians have also changed how they use and interpret test results, as evidenced by the current guidelines that changes of 0.5 %Hb are clinically important and should be treated.  This clinical change corresponds to approximately 7.0% at the treatment goal of 7.0 %Hb.  For a CAP PT criterion of 7.0% for acceptable performance, the 3 diagonal lines represent (from top to bottom) 2-Sigma, 3-Sigma, and 4-Sigma quality.  Only 1 method subgroup performs at better than 3-Sigma quality, 10 method subgroups provide quality between 2-Sigma and 3-Sigma, and the remaining method subgroups do not even achieve 2-Sigma quality.  Keep in mind that all these are NGSP certified methods, but the certification requirement corresponds to about a 10% TE, rather than the current CAP requirement of 7%.  
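Tightening the requirement from 10% to 7% lowers every subgroup's sigma metric without any change in its bias or CV. The sketch below recomputes a single hypothetical subgroup (%Bias = 3.0, %CV = 2.5; illustrative values only) under both requirements; the band labels loosely mirror the whole-sigma diagonal lines and are an assumed simplification.

```python
def sigma_metric(tea_pct, bias_pct, cv_pct):
    """Sigma = (TEa - |%Bias|) / %CV, all in percent."""
    return (tea_pct - abs(bias_pct)) / cv_pct

def sigma_band(sigma):
    """Coarse label matching whole-sigma diagonal lines on a sigma-metric chart."""
    if sigma < 2:
        return "below 2-Sigma"
    return f"{min(int(sigma), 6)}-Sigma or better"

# Hypothetical subgroup performance (illustrative only).
BIAS_PCT, CV_PCT = 3.0, 2.5
for tea in (10.0, 7.0):
    s = sigma_metric(tea, BIAS_PCT, CV_PCT)
    print(f"TEa {tea}%: {s:.1f} sigma ({sigma_band(s)})")
```

The same method drops from about 2.8 sigma to 1.6 sigma, which is the kind of shift that moves most subgroups below the 2-Sigma line on the 7% chart.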


Though the review article claims that it is possible for some methods today to achieve the desired clinical performance to reliably detect a change of 0.5 %Hb, physicians should be cautioned that this requires that the same method in the same laboratory be used to monitor patients over time.

Furthermore, the RCV calculation in the review article assumed a very low CV of 1% for within-subject variability [2], though other references recommend higher values, e.g., 4.1% [3].  It should be noted that the NGSP study on within-subject biological variation involved only healthy males with normal HbA1c values, and all samples were obtained after a minimum 8-hour overnight fast.  Such conditions are likely ideal and may not provide a realistic estimate of within-subject variability in diabetic patients under typical sampling conditions.
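The sensitivity to the assumed within-subject CV can be checked by rearranging the RCV formula to find the largest analytical CV that still detects a 0.5 %Hb change at 7.0 %Hb. This is a minimal sketch; the function name is hypothetical, and it returns None when within-subject variation alone already makes the RCV exceed the target change.

```python
import math

def max_cv_a(change_hb=0.5, level=7.0, cv_i=1.0, z=1.96):
    """Largest analytical CV (%) for which RCV <= the target change.

    Rearranges RCV = sqrt(2) * z * sqrt(CV_A^2 + CV_I^2).
    """
    target_pct = change_hb / level * 100.0        # 0.5 %Hb at 7.0 %Hb ~ 7.14%
    total_cv = target_pct / (math.sqrt(2) * z)    # allowed combined CV
    if total_cv <= cv_i:
        return None                               # no analytical CV is small enough
    return math.sqrt(total_cv ** 2 - cv_i ** 2)

for cv_i in (1.0, 4.1):
    print(f"CV_I = {cv_i}%: max CV_A = {max_cv_a(cv_i=cv_i)}")
```

With $CV_I$ = 1% the analytical CV may be up to about 2.4%, but with $CV_I$ = 4.1% the allowed combined CV (about 2.6%) is already smaller than the within-subject variation, so under this formula no analytical method could reliably detect a 0.5 %Hb change.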

Thus, under near-ideal conditions, a change in patient values of 0.5 %Hb could be detected as clinically significant with some methods (those having within-laboratory, within-method analytical CVs of 2% or less).  However, the use of two different analytical methods would not likely provide acceptable performance because of between-method biases.  In the real world, where patients will be tested in different laboratories within a healthcare organization, or in laboratories in different healthcare organizations, a change of 0.5 %Hb cannot reliably be attributed to a change in the patient.
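Why bias cancels within one method but not between methods can be shown with simple arithmetic. In this sketch each result is the true value plus its method's bias; the biases are hypothetical and random error is ignored so that only the bias effect is visible.

```python
def observed_change(true_before, true_after, bias_before, bias_after):
    """Observed difference (%Hb) when each result carries its method's bias."""
    return (true_after + bias_after) - (true_before + bias_before)

# Patient whose true HbA1c does not change (7.5 -> 7.5 %Hb).
# Hypothetical biases: method A +0.3 %Hb, method B -0.2 %Hb.
same_method = observed_change(7.5, 7.5, 0.3, 0.3)    # A then A
two_methods = observed_change(7.5, 7.5, 0.3, -0.2)   # A then B
print(f"same method: {same_method:+.1f} %Hb, two methods: {two_methods:+.1f} %Hb")
```

With the same method the bias cancels and no change is seen; switching methods produces an apparent decrease of 0.5 %Hb, exactly the size of change that guidelines treat as clinically significant, even though the patient has not changed.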

What’s the point?

The alignment of goals for precision, accuracy, total error, and intended clinical use requires a clear understanding of the relationships between different types of requirements for quality and performance.  NGSP has made great progress in improving this alignment, but must still tighten the certification requirements to match the CAP PT requirement.  Once that is accomplished, methods will be certified to the same quality required for proficiency testing and the intended clinical monitoring of changes of 0.5 %Hb.  Against the 7% requirement, methods require CVs of 2.0% if bias is up to 1.0% to perform at the 3-sigma level.  To perform at the 4-sigma level, CVs of 1.5% are required if bias is up to 1.0%.  CVs will need to approach 1.0% with biases near zero to achieve 5- to 6-sigma performance.
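The CV targets above follow from rearranging the sigma metric: CV = (TEa - |%Bias|) / Sigma, with TEa = 7%. A minimal sketch with a hypothetical function name:

```python
def max_cv(sigma, bias_pct, tea_pct=7.0):
    """Largest CV (%) that still gives the requested sigma: (TEa - |bias|) / sigma."""
    return (tea_pct - abs(bias_pct)) / sigma

for sigma, bias in ((3, 1.0), (4, 1.0), (5, 0.0), (6, 0.0)):
    print(f"{sigma}-sigma, bias {bias}%: CV <= {max_cv(sigma, bias):.2f}%")
```

This reproduces 2.0% for 3 sigma and 1.5% for 4 sigma with a 1.0% bias, and gives about 1.4% and 1.17% for 5 and 6 sigma with zero bias, i.e. CVs approaching 1%.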

Significant improvements have been made, but additional improvements will be necessary to keep up with the increasing clinical demands for HbA1c testing. Both the tolerance limits and the percentage of patients within those limits (which relates to quality on the sigma scale) must be properly specified to provide reliable testing that meets the desired clinical quality.


  1. Little RR, Rohlfing CL, Sacks DB.  Status of hemoglobin A1c measurement and goals for improvement: From chaos to order for improving diabetes care. Clin Chem 2011;57:205-214.
  2. Rohlfing C, Wiedmeyer HM, Little R, Grotz VI, Tennill A, England J, et al. Biological variation of glycohemoglobin.  Clin Chem 2002;48:1116-1118.
  3. Larsen ML, Fraser CG, Petersen PH. A comparison of analytical goals for Haemoglobin A1c assays derived using different strategies.  Ann Clin Biochem 1991;28:272-278.
  4. Westgard JO, Westgard SA. The quality of laboratory testing today: An assessment of sigma metrics for analytic quality using performance data from proficiency testing surveys and the CLIA criteria for acceptable performance. Am J Clin Pathol 2006;125:343-354.