Why Not Evidence-Based Method Specifications?

James O. Westgard
A word from Dr. Westgard

March 2002
An updated version of this essay appears in the Nothing but the Truth about Quality manual.

James O. Westgard, PhD, FACB

In March 2002, the Clinical Chemistry journal published a 36-page report by the National Academy of Clinical Biochemistry (NACB) on the use of laboratory tests in patients with diabetes [1]. That is probably the longest paper ever published by Clinical Chemistry, which attests to the importance being placed on "evidence-based laboratory medicine and test utilization" today. The March 2002 issue of Clinical Laboratory News also leads with the headline "NACB Issues Guidelines for Lab Testing in Diabetes" and announces an upcoming interactive audioconference on "The New Diabetes Testing Guidelines: Effects on the Laboratory and Clinical Management."

Why so much attention to diabetes and lab tests? Healthcare for diabetes patients is big business, estimated at $98 billion ten years ago! Laboratory tests are critical for managing the treatment of diabetic patients, so this provides a good opportunity to demonstrate the value of laboratory tests. In the current anti-laboratory regulatory and reimbursement environment, the objective management and treatment of one of today's major diseases (and one of the highest long-term costs in healthcare) depends on laboratory tests.

Evidence-based clinical practice recommendations

The NACB report provides evidence-based recommendations for the use of glucose (including glucose meters for patient use), the oral glucose tolerance test, urinary glucose, ketones, glycated hemoglobin, genetic markers, and microalbuminuria. Glycated hemoglobin emerges as the test with the strongest evidence for use in the evaluation of glycemic control, assessment of risk, establishment of treatment goals, and long-term management of diabetic patients. The report also provides recommendations for the performance specifications of laboratory methods.

The purpose of evidence-based medicine is to demonstrate and document best medical practice on the basis of published studies and expert consensus. The "level of evidence" is graded on a scale from A to E, with A representing the highest level of evidence and E the lowest. The levels are described as follows [1]:

A. "Clear evidence from well-conducted, generalizable, randomized controlled trials that are adequately powered…"
B. "Supportive evidence from well-conducted cohort studies…"
C. "Supportive evidence from poorly controlled or uncontrolled studies…"
D. [not yet defined]
E. "Expert consensus of clinical experience."

In this context, the evidence used to establish specifications for the precision, accuracy, and QC of laboratory tests appears to be "expert groups", which is the lowest level in the evidence grading system. It would seem appropriate to consider additional scientific approaches to strengthen the recommendations for method performance specifications. I'll illustrate my concerns using the glycated hemoglobin test that is discussed in this report.

Guidelines for Glycated Hemoglobin (GHb)

Here's a summary of recommendations from the report [1]:

Evidence-based QC

These guidelines for test interpretation and method performance are specific and quantitative, but those for QC are somewhat vague, e.g., exactly how many control measurements are necessary (2 or 4?) and what control rules should be used (???). The "evidence" that should be considered here includes the quality required for the test, the pre-analytical variability (biologic variability) of the subject, the analytical variability of the method (imprecision and inaccuracy), and the rejection characteristics for QC procedures. The approach and methodology for doing this are well documented on this website, as well as in the clinical chemistry literature [2-6].

Defined medical quality. A medical or clinical quality requirement can be defined from the test interpretation and treatment guidelines. According to the consensus report from a recent conference on "Strategies to Set Global Analytical Quality Specifications in Laboratory Medicine" [7], the highest and preferred approach for establishing quality goals is the "evaluation of the effect of analytical performance on clinical outcomes in specific clinical settings." The specific clinical setting of interest here is a change of GHb from 7.0% to 8.0% that would trigger a change in treatment. The laboratory should assure that a patient with a true value of 7.0% GHb will never receive a test result of 8.0%. All sources of analytical and pre-analytical variation must be controlled so that a value of 7.0% GHb cannot be in error by 1.0% GHb, i.e., the clinical decision interval from 7% to 8% GHb defines a quality requirement of 14.3% (1.0/7).
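The quality requirement is simple arithmetic: the size of the clinical decision interval relative to the starting value. A minimal Python sketch (the function name decision_interval_pct is hypothetical, used only for illustration):

```python
def decision_interval_pct(current: float, action: float) -> float:
    """Clinical decision interval as a percentage of the current value.

    Hypothetical helper: expresses the change that triggers a treatment
    decision (here 7.0% -> 8.0% GHb) relative to the starting value.
    """
    return 100.0 * (action - current) / current


d_int = decision_interval_pct(7.0, 8.0)
print(f"Quality requirement: {d_int:.1f}%")  # 1.0 / 7.0 = 14.3%
```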

Known biologic variability. Pre-analytical variability exists in the form of within-subject biological variability, which has been estimated to be from 4.1% [8] to 5.8% [9]. This is not mentioned in the NACB report, even though there is a section that addresses pre-analytical patient variables. Within-subject variability is an important factor because it is at least as large as the expected analytical variability. A patient with a true set point of 7.0% GHb may have test values as high as 7.6% GHb (7.0 + 2*0.041*7.0) to 7.8% GHb (7.0 + 2*0.058*7.0) due to biologic variability alone. There is little room for analytical variability if the overall error is to be controlled within 1.0% GHb.
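The same calculation can be written as a small Python sketch. The helper biologic_upper_limit is a hypothetical name; it simply adds 2 within-subject SDs to the set point, as in the text, and ignores analytical error entirely.

```python
def biologic_upper_limit(set_point: float, cv_within: float, z: float = 2.0) -> float:
    """Upper limit of results expected from biologic variation alone.

    set_point: patient's true GHb value (% GHb)
    cv_within: within-subject biologic CV as a fraction (e.g. 0.041)
    z: multiplier for the SD; 2.0 matches the calculation in the text
    """
    return set_point + z * cv_within * set_point


for cv in (0.041, 0.058):
    print(f"CV {cv:.1%}: up to {biologic_upper_limit(7.0, cv):.1f}% GHb")
```

For a set point of 7.0% GHb this prints upper limits of 7.6% and 7.8% GHb, the two values quoted above.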

Specified analytical variability. The NACB guidelines specify a desirable CV of 3.0% and a maximum allowable CV of 5.0%. Bias is assumed to be minimal if the proper methods are used and those methods are properly standardized. This assumption is undoubtedly idealistic for the state of the art of current GHb methods, particularly for a practice guideline that will be applied across many states, many laboratories, and many methods. A true value of 7.0% GHb will undoubtedly be measured with some amount of systematic error, or bias, in many laboratories.

Necessary QC. Appropriate control rules and numbers of control measurements can be selected using charts of operating specifications [5] based on a clinical quality-planning model described in the literature [2] and supported by the QC Validator computer technology [10,11]. Optimal control rules and numbers of control measurements can be selected automatically by the QC Validator® 2.0 and the EZ Rules™ programs. For the GHb application, the input parameters are as follows:

Quality requirement (clinical decision interval): 14.3%
Within-subject biologic variability: 4.1%
Method imprecision (CV): 3.0%
Method bias: 0.0%

Note that these parameters reflect the best or most optimistic conditions, i.e., the smallest known value for the biologic variability, the smallest specification for analytical variation, and no analytical bias. The resulting QC recommendation is a multirule procedure (1_3s/2_2s/R_4s/4_1s) with 4 control measurements, which will provide approximately 80% detection of medically important systematic errors while maintaining only 3% false rejections.
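The link between these optimistic inputs and the size of the systematic error that QC must catch can be sketched in Python. This assumes a simplified form of the clinical quality-planning model [2]: no sampling (specimen) variability, z = 1.65 for a maximum 5% defect rate, and no increase in random error. The function name critical_systematic_error is hypothetical, and QC Validator's own calculation may differ in detail.

```python
import math


def critical_systematic_error(d_int: float, s_within: float, s_meas: float,
                              bias_meas: float = 0.0, z: float = 1.65) -> float:
    """Critical systematic error (in multiples of s_meas) under a simplified
    clinical quality-planning model (assumption):

        d_int = bias_meas + dSE_crit * s_meas + z * sqrt(s_within**2 + s_meas**2)

    All inputs are percentages (CVs) at the 7.0% GHb decision level.
    """
    total_random = z * math.sqrt(s_within ** 2 + s_meas ** 2)
    return (d_int - bias_meas - total_random) / s_meas


# Optimistic GHb inputs: 14.3% requirement, 4.1% biologic CV,
# 3.0% analytical CV, 0.0% bias.
dse = critical_systematic_error(14.3, 4.1, 3.0)
print(f"Critical systematic error: {dse:.2f} s")
```

With these inputs the critical systematic error is about 1.97 s, so the QC procedure has to detect a shift of only about twice the method's standard deviation.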

Even a multirule procedure with 4 control measurements per run is not quite ideal, as 90% error detection is generally preferred. Therefore, it would be better to use a multirule procedure with a total of 6 control measurements per run: either 2 control materials with 3 replicates per material per run, or 3 control materials with 2 replicates per material per run. It is unlikely that any laboratory will do this, since neither the NACB nor the CLIA QC guidelines recommend this much QC. Given the vagueness of the NACB recommendation, many laboratories will choose to comply with the CLIA minimum of 2 control materials per run, probably analyzing one control material at the beginning of the run and the other at the end of the run.
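Why 6 control measurements detect more than 4 can be illustrated with a small Monte Carlo sketch of the 1_3s/2_2s/R_4s/4_1s multirule procedure. It assumes the rules are applied only within a single run, with all control results treated as one sequence in SD units, and it uses a shift of 2 s as a stand-in for the medically important error (an assumption based on a simplified clinical model with the GHb parameters). The helper names are hypothetical, and the estimates will not exactly reproduce QC Validator's power curves, which rest on their own simulation conventions.

```python
import random


def multirule_rejects(values):
    """Apply 1_3s/2_2s/R_4s/4_1s to one run of control results (z-scores)."""
    # 1_3s: any single control beyond +/-3s
    if any(abs(v) > 3.0 for v in values):
        return True
    # 2_2s: two consecutive controls beyond the same 2s limit
    for a, b in zip(values, values[1:]):
        if (a > 2.0 and b > 2.0) or (a < -2.0 and b < -2.0):
            return True
    # R_4s: one control above +2s and another below -2s
    if max(values) > 2.0 and min(values) < -2.0:
        return True
    # 4_1s: four consecutive controls beyond the same 1s limit
    for i in range(len(values) - 3):
        window = values[i:i + 4]
        if all(v > 1.0 for v in window) or all(v < -1.0 for v in window):
            return True
    return False


def rejection_rate(n_controls, shift, trials=50_000, seed=1):
    """Fraction of simulated runs rejected when every control is shifted by `shift` s."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(trials):
        run = [rng.gauss(shift, 1.0) for _ in range(n_controls)]
        if multirule_rejects(run):
            rejected += 1
    return rejected / trials


for n in (4, 6):
    pfr = rejection_rate(n, 0.0)  # false rejection: no error present
    ped = rejection_rate(n, 2.0)  # error detection: 2 s systematic shift
    print(f"N={n}: Pfr={pfr:.3f}, Ped(2 s shift)={ped:.3f}")
```

Because every rule asks whether some pattern exists in the run, adding control measurements can only add chances to reject, which is why N=6 raises error detection along with a modest rise in false rejections.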

Evidence-based method performance specifications

The NACB recommendations for method performance (3.0% CV, 0.0% bias) correspond to a 3.6 sigma process [12]. According to the Sigma table, that indicates 17,864 defects per million. A 5.0% CV would provide only a 2.4 sigma process, or 184,060 defects per million. According to industrial guidelines, the minimum acceptable performance of a production process is 3.0 sigma, whereas ideal performance is 5.0 to 6.0 sigma. A 3.6 sigma process is okay for production if QC is maximized; in this case, a multirule procedure with a total of 4 to 6 control measurements per run should be used.
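The sigma figures can be reproduced under an assumption about how they were derived: sigma taken as the critical systematic error plus 1.65, with the critical error computed from a simplified clinical quality-planning model (14.3% requirement, 4.1% within-subject biologic CV, 0.0% bias, z = 1.65, no sampling variability), and defects per million read with the conventional 1.5 sigma long-term shift built into standard Sigma tables. This is a sketch under those assumptions, not the published calculation; sigma_metric and defects_per_million are hypothetical names.

```python
import math


def sigma_metric(d_int, s_within, s_meas, bias_meas=0.0, z=1.65):
    """Sigma = critical systematic error + 1.65, where the critical error comes
    from a simplified clinical quality-planning model (assumption)."""
    dse_crit = (d_int - bias_meas - z * math.sqrt(s_within ** 2 + s_meas ** 2)) / s_meas
    return dse_crit + 1.65


def defects_per_million(sigma, long_term_shift=1.5):
    """Upper-tail defect rate after the conventional 1.5 sigma shift used in Sigma tables."""
    tail = 0.5 * math.erfc((sigma - long_term_shift) / math.sqrt(2.0))
    return 1e6 * tail


for cv in (3.0, 5.0):
    sigma = round(sigma_metric(14.3, 4.1, cv), 1)  # Sigma tables are read in 0.1 steps
    print(f"CV {cv:.1f}%: {sigma:.1f} sigma, "
          f"{defects_per_million(sigma):,.0f} defects per million")
```

Under these assumptions the loop prints 3.6 sigma (17,864 defects per million) for a 3.0% CV and 2.4 sigma (184,060 defects per million) for a 5.0% CV.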

Obviously, there are problems with the way method specifications are being set today, as exemplified with the GHb test.

Method performance specifications can be properly established by use of a chart of operating specifications that shows the relationship between method precision, accuracy, and quality control, as shown here.

[Figure: OPSpecs chart for a 14.3% clinical quality requirement with 90% AQA(SE)]

This chart represents a clinical quality requirement of 14.3%, as stated in the title at the top of the chart. The title also indicates 90% AQA(SE), which means this chart represents 90% Analytical Quality Assurance for Systematic Error, i.e., there will be at least a 90% chance of detecting medically important systematic errors. The y-axis shows the allowable inaccuracy, or bias_meas, in units of %. The x-axis shows the allowable imprecision, or s_meas, in units of %. The different lines show the maximum allowable limits for imprecision and inaccuracy for the different QC procedures shown in the key at the right. The lines, top to bottom, correspond to the QC procedures as listed top to bottom in the key. For example, the bold line 3rd from the bottom corresponds to a 1_3s rule with 4 control measurements per run, which is 3rd from the bottom in the key.

The x-intercepts of these lines define the maximum allowable imprecision when bias is 0.0%. For QC procedures with a total of 2 measurements per run (the bottom two lines), the desirable method CV is 1.9% to 2.2%. For QC procedures with a total of 4 measurements per run, CVs from 2.2% to 2.7% are allowable. For multirule procedures with N=6, the maximum allowable CV is 3.1%.
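The x-intercepts can be approximated numerically for single-rule 1_3s procedures, whose probability of error detection has a closed form (multirule lines need simulation or QC Validator itself). This Python sketch assumes a simplified clinical quality-planning model, d_int = bias + dSE*s + 1.65*sqrt(s_within^2 + s^2), with a 14.3% requirement, a 4.1% within-subject biologic CV, and no sampling variability. It first finds the shift detected with 90% probability, then the largest CV that still meets the requirement at zero bias. The helper names are hypothetical, and the results may differ slightly from the published chart.

```python
import math


def phi(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ped_13s(dse: float, n: int) -> float:
    """Probability that a 1_3s rule with n controls rejects a run shifted by dse (in s)."""
    accept_one = phi(3.0 - dse) - phi(-3.0 - dse)
    return 1.0 - accept_one ** n


def bisect_increasing(f, lo: float, hi: float, iterations: int = 200) -> float:
    """Root of an increasing function f on [lo, hi], assuming f(lo) < 0 < f(hi)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def dse_for_detection(n: int, target: float = 0.90) -> float:
    """Shift (in s) that a 1_3s rule with n controls detects with the target probability."""
    return bisect_increasing(lambda d: ped_13s(d, n) - target, 0.0, 10.0)


def max_cv_at_zero_bias(d_int: float, s_within: float, dse_cont: float,
                        z: float = 1.65) -> float:
    """x-intercept of an OPSpecs line: solve d_int = dse_cont*s + z*sqrt(s_within^2 + s^2)."""
    return bisect_increasing(
        lambda s: dse_cont * s + z * math.sqrt(s_within ** 2 + s ** 2) - d_int,
        0.0, d_int)


for n in (2, 4):
    dse = dse_for_detection(n)
    cv = max_cv_at_zero_bias(14.3, 4.1, dse)
    print(f"1_3s, N={n}: 90% detection at {dse:.2f} s, maximum CV {cv:.2f}%")
```

For the 1_3s rule this gives maximum CVs of roughly 2.0% with N=2 and 2.3% with N=4, in the same neighborhood as the values read from the chart.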

As with lipid tests, strategies for making do with current method performance include making replicate test measurements to reduce method variation and averaging multiple test results to reduce biologic variation. Further analysis with OPSpecs charts shows that method CVs from 3% to 4% are allowable if duplicate measurements are made. A method CV of 5% is tolerable only if two test results are averaged, each test result is based on duplicate measurements, and QC involves a multirule procedure with 2 materials analyzed at both the beginning and end of the run. The OPSpecs chart used in this assessment of performance specifications was prepared with the QC Validator® 2.0 computer program.
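The reason replication and averaging help can be shown with the usual 1/sqrt(n) reduction of a standard deviation. This Python sketch assumes independent replicate measurements and independent specimens; effective_cvs is a hypothetical helper, and it only illustrates the direction and size of the effect, not the full OPSpecs analysis described above.

```python
import math


def effective_cvs(cv_meas: float, cv_within: float,
                  replicates: int = 1, results_averaged: int = 1) -> tuple[float, float]:
    """Effective analytical and biologic CVs of a reported value.

    replicates: measurements per specimen averaged into one test result
    results_averaged: test results (separate specimens) averaged together
    Assumes independence, so each SD shrinks as 1/sqrt(number averaged).
    """
    n_meas = replicates * results_averaged
    return cv_meas / math.sqrt(n_meas), cv_within / math.sqrt(results_averaged)


# 4.0% method CV measured in duplicate
a, b = effective_cvs(4.0, 4.1, replicates=2)
print(f"duplicates: analytical {a:.2f}%, biologic {b:.2f}%")

# 5.0% method CV, duplicates, two test results averaged
a, b = effective_cvs(5.0, 4.1, replicates=2, results_averaged=2)
print(f"duplicates, 2 results averaged: analytical {a:.2f}%, biologic {b:.2f}%")
```

Duplicates bring a 4.0% method CV down to about 2.8%; averaging two duplicate-based results brings a 5.0% CV down to 2.5% and the 4.1% biologic CV down to about 2.9%.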

Why not include scientific methodology in evidence-based medicine?

Attempts to improve the use and interpretation of laboratory tests will fail if the methods of analysis aren't good enough for the intended applications. Method performance specifications and QC must be sufficient to guarantee the analytical quality of the test results. Otherwise, physicians and patients will find that the practice guidelines don't always work. Once that is discovered, there will be further criticism about the lack of value of laboratory tests.

Laboratory scientists must define quality requirements for tests in a quantitative way in order to manage the production of test results in a quantitative manner. The "art of laboratory medicine" must give way to the science of laboratory measurements, just as the art of medicine must give way to the scientific management of diseases. Evidence-based laboratory medicine must address both of these issues! We can no longer just assume that laboratory test results will have satisfactory quality. We must manage the quality of laboratory tests to assure they are medically useful. This requires a rational, factual, and scientific basis for setting method performance specifications, including quality control.

This concern about performance specifications goes back to earlier recommendations by the NCEP for lipid tests, which are similarly inadequate [2,13]. While the NCEP specifications have been defended on statistical grounds [14], the application of Six Sigma concepts and principles makes it clear that more stringent method specifications are needed. Laboratory tests that are to be uniformly applied on a national basis must meet a higher standard of performance: 5 to 6 sigma performance, rather than the 2 to 3 sigma represented by current specifications.

References

  1. Sacks DB, Bruns DE, Goldstein DE, Maclaren NK, McDonald JM, Parrott M. Guidelines and recommendations for laboratory analysis in the diagnosis and management of diabetes mellitus. Clin Chem 2002;48:436-472.
  2. Westgard JO, Hyltoft Petersen P, Wiebe DA. Laboratory process specifications for assuring the quality in the US National Cholesterol Education Program. Clin Chem 1991;37:656-661.
  3. Westgard JO, Wiebe DA. Cholesterol operational process specifications for assuring the quality required by CLIA proficiency testing. Clin Chem 1991;37:1938-1944.
  4. Westgard JO. Analytical quality assurance through process planning and quality control. Arch Pathol Lab Med 1992;116:765-769.
  5. Westgard JO. Charts of operating specifications (OPSpecs Charts) for assessing the precision, accuracy, and quality control needed to satisfy proficiency testing criteria. Clin Chem 1992;38:1226-1233.
  6. Mugan K, Carlson IH, Westgard JO. Planning QC procedures for immunoassays. J Clin Immunoassay 1994;17:216-222.
  7. Kenny D, Fraser CG, Hyltoft Petersen P, Kallner A. Consensus agreement. Scand J Clin Lab Invest 1999;59:585.
  8. Lytken Larsen M, Fraser CG, Hyltoft Petersen P. A comparison of analytical goals for haemoglobin A1c assays derived using different strategies. Ann Clin Biochem 1991;28:272-278.
  9. Ricos C, Alvarez V, Cava F, Garcia-Lario JV, Hernandez A, Jimenez CV, Minchinela J, Perich C, Simon M. Current databases on biological variation: pros, cons and progress. Scand J Clin Lab Invest 1999;59:491-500.
  10. Westgard JO, Stein B, Westgard SA, Kennedy R. QC Validator 2.0: a computer program for automatic selection of statistical QC procedures for applications in healthcare laboratories. Comput Methods Programs Biomed 1997;53:175-186.
  11. Westgard JO, Stein B. Automated selection of statistical quality-control procedures to assure meeting clinical or analytical quality requirements. Clin Chem 1997;43:400-403.
  12. Westgard JO. Six Sigma Quality Design and Control: Desirable Precision and Requisite QC for Laboratory Measurement Processes. Madison, WI: Westgard QC, Inc., 2001.
  13. Westgard JO, Wiebe DA. Adequacy of NCEP recommendations for total cholesterol, triglycerides, HDLC, and LDLC measurements. Clin Chem 1998;44:1064-1066.
  14. Caudill SP, Cooper GR, Smith SJ, Myers GL. Assessment of current National Cholesterol Education Program guidelines for total cholesterol, triglyceride, HDL-cholesterol, and LDL-cholesterol measurements. Clin Chem 1998;44:1650-1658.

James O. Westgard, PhD, is a professor of pathology and laboratory medicine at the University of Wisconsin Medical School, Madison. He also is president of Westgard QC, Inc., (Madison, Wis.) which provides tools, technology, and training for laboratory quality management.

Copyright © 2002. All rights reserved.
Westgard QC, 7614 Gray Fox Trail, Madison WI 53717
Call 608-833-4718 or e-mail us at westgard@westgard.com
