Is it happening with laboratory error budgets?
July 2002
An updated version of this essay appears in the Nothing but the Truth about Quality manual.
James O. Westgard, PhD, FACB
The year 2002 will probably be remembered for the scandalous and unethical conduct of American businesses. After many years of government deregulation, there is mounting evidence that business - left on its own - will do what is necessary to make money, with little regard for right and wrong or for rules and regulations. Enron, Waste Management, Arthur Andersen, Tyco International, Adelphia Communications, Computer Associates, Global Crossing, WorldCom - there is a growing list of companies being investigated for misstating their earnings. Others, such as Xerox and Qwest Communications, have restated their earnings in the hope of avoiding investigation. Insider trading and other unethical practices expand the list further - Merrill Lynch, Genentech, ImClone, even Martha Stewart. HMOs are even being investigated for racketeering.
So much for a kinder and gentler government that depends on accountability from business. Business has not been accountable. In fact, accounting has been the big problem! Businesses have been "cooking the books" by hiding expenses and enhancing earnings. Here's what Business Week had to say in an editorial in its July 8, 2002 issue [1]:
"The June 25 admission by telecom giant WorldCom Inc. that it had cooked its books to the tune of at least $4 billion could not have come at a worse time for the economy and for the stock market. Even before the WorldCom news came out, the S&P 500-stock index had already fallen by 15% this year. The dollar was declining against the yen and the euro, consumer confidence was weakening, and fears were building that the recovery would be weaker than expected."
The public can no longer trust corporate financial statements! The net result is that the market is becoming unstable, small investors are retreating, foreign investors are withdrawing, the American dollar is falling, and foreign governments are questioning whether the American enterprise system is really the model for guiding their own development. The government's "hands off" attitude that began in the 80s has culminated in businesses now being "out-of-control."
There has also been a softening of government enforcement of laboratory regulations over the last decade. The fact that the CLIA-88 regulations have not yet been finalized (as of July 2002), that FDA clearance of the manufacturer's recommended QC has never been implemented, that electronic QC has been approved as a substitute for real QC, that home-testing products have been approved for tests that can't be adequately managed in the laboratory - these are but a few examples of our kinder and gentler government allowing the business sector more flexibility and less accountability.
But business is not alone! We, as laboratory professionals, share the blame. We also have been cooking the books, avoiding the reality of what is necessary to implement reliable testing processes, sometimes misstating or misinterpreting the "error budgets" that are used to manage the quality of laboratory tests. We have been willing to accept the CLIA minimum QC requirements as maximums in order to reduce costs. We have been willing to accept manufacturers' QC recommendations that are clearly inadequate in order to keep things simple (such as electronic QC for near-patient testing). We have been less than reliable in developing recommendations and guidelines for laboratory testing that protect the public from bad results.
Manufacturers and laboratory professionals have welcomed the new testing guidelines from the American Diabetes Association (ADA) and the U.S. Department of Health and Human Services (HHS). These guidelines affirm the importance of laboratory measurements in the diagnosis and management of diabetes - one of the most prevalent and costly chronic diseases in society today.
In the June 2002 issue of Clinical Laboratory News, the headline heralds "A New Attack on Diabetes Epidemic - Lab results the primary tool in determining 'Pre-diabetes'" [2]. Behind this story is our professional confidence that laboratories are capable of supporting more demanding interpretative guidelines. For glucose, for example, the new guideline tightens the decision interval between a normal fasting plasma glucose (FPG) and the diagnostic cutoff for diabetes.
"Normal fasting blood [sic - plasma] glucose is below 110 mg/dL, while individuals with FPG results in the 110-125 mg/dL range are considered to have impaired fasting glucose. An FPG > 126 mg/dL indicates hyperglycemia which, if confirmed, establishes a diagnosis of diabetes."
The impact of this change has not been carefully evaluated by laboratory professionals. For example, have the following issues been addressed?
- What quality is required for a method based on this guideline for the medical interpretation of a glucose test?
- What factors contribute to the variability of a patient test result and the reliability of the diagnostic classification?
- What precision, accuracy, and QC are necessary to correctly classify patients?
- Are current methods and current QC practices able to provide the necessary performance?
The public assumes that all of these questions have been carefully investigated and that the quality of glucose testing will assure proper classification of patients.
In this application, the amount of error that would cause a change in the classification and treatment of a patient is the difference from 110 mg/dL to 126 mg/dL. This "gray zone" or "decision interval" can be expressed as a clinical quality requirement of 14.5% (16/110), i.e., a glucose test result needs to be correct within 14.5% at the medical decision level of 110 mg/dL.
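The arithmetic behind that 14.5% figure can be checked in a few lines. This is only a sketch of the calculation described above; the variable names are illustrative and not part of any guideline.

```python
# Decision levels taken from the guideline quoted above (mg/dL).
impaired_fasting_cutoff = 110.0
diabetes_cutoff = 126.0

# Width of the "gray zone" in concentration units.
interval_mg_dl = diabetes_cutoff - impaired_fasting_cutoff

# Express the interval relative to the 110 mg/dL decision level.
dint_percent = interval_mg_dl / impaired_fasting_cutoff * 100.0

print(f"Decision interval: {interval_mg_dl:.0f} mg/dL = {dint_percent:.1f}%")
```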
In the terminology on this website, we refer to this type of quality requirement as a "clinical decision interval" and use the abbreviation Dint. The advantage of a clinical decision interval requirement is that it truly represents the medical usefulness of a test because it is taken directly from the test interpretation guidelines. This type of quality requirement is more complicated than the more common allowable total error (such as defined by CLIA proficiency testing criteria for acceptability). A clinical decision interval expands the total error to include pre-analytical factors, as well as the usual analytical factors.
One important pre-analytical factor is the within-subject biologic variation of the patient. Important analytical factors include the imprecision (CV) and inaccuracy (bias) of the method and the sensitivity of the QC procedure. These four factors must be considered in an "error budget" to manage analytical quality objectively. Note that there is little discussion of any of these factors in the article in Clinical Laboratory News. Instead, there is an assumption that glucose methods are good enough for this application.
Here's where we have gone "off the books" and are hiding important information! Within-subject biologic variation for glucose is known to be 6.5%, as documented in Fraser's recent book published by AACC Press [3]. A subject whose stable homeostatic glucose set point is 110 mg/dL can vary as much as plus/minus 14 mg/dL due to biologic variation alone. A value of 124 mg/dL could be the upper 95% confidence limit for a non-diabetic patient! It would take only a little additional analytical error to cause this patient to be misclassified. Even though this information is commonly known and well documented in the scientific literature, it's not clear that it has been adequately considered in developing the new ADA/HHS interpretation guidelines. If it's not considered, then we are hiding a major component of the error budget and not including it on our books.
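The plus/minus 14 mg/dL figure follows directly from the 6.5% within-subject variation. The sketch below reproduces it, assuming a Gaussian distribution and a two-sided 95% multiplier of 1.96; that multiplier is my assumption about how the limit was derived, and the variable names are illustrative.

```python
set_point = 110.0        # homeostatic set point, mg/dL (from the text)
cv_within_subject = 6.5  # within-subject biologic variation, % (from the text)
z_95 = 1.96              # assumption: two-sided 95% limit for a Gaussian

# Convert the CV into an SD in concentration units at the set point.
sd_mg_dl = set_point * cv_within_subject / 100.0

# Half-width of the 95% range due to biologic variation alone.
half_width = z_95 * sd_mg_dl
lower_limit = set_point - half_width
upper_limit = set_point + half_width

print(f"SD = {sd_mg_dl:.2f} mg/dL, 95% range = "
      f"{lower_limit:.0f} to {upper_limit:.0f} mg/dL")
```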
The basic specifications needed to operate a testing process are precision, accuracy, and quality control. These specifications can be determined once the requirement for quality is known. Given a clinical quality requirement of 14.5% (16/110) at a decision level of 110 mg/dL and knowing that the within-subject biologic variation is 6.5%, a chart of operating specifications can be prepared, as shown here:
This chart shows the allowable bias on the y-axis and the allowable CV on the x-axis for different QC procedures. The performance of a method is shown by an operating point which depends on the method's CV (x-coordinate) and bias (y-coordinate). This operating point should fall below the operating limits for the QC procedure being used, i.e., the lines shown for the different control rules and numbers of control measurements as identified in the key at the right side of the chart.
One set of specifications that works is a bias of 0.0%, a CV of 1.0%, and 2 control measurements on a Levey-Jennings chart having control limits set at 3s. These conditions are identified by the big green operating point on the chart and the QC line immediately above it. If bias were 1.0%, then the operating point would shift upward and it would be necessary to use a Levey-Jennings chart having 2s control limits, but this would give a high false rejection rate (probability of 0.09 or 9% chance of false rejection). To maintain good QC when there are only 2 control measurements per run, it would be best to specify a maximum bias of 0.7% and a CV of 1.0% or a maximum CV of 1.1% if bias were zero. These specifications would be in compliance with the minimal QC practices in the CLIA regulations.
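The 9% false rejection figure for 2s limits with 2 control measurements can be reproduced with a short calculation. This sketch assumes independent, Gaussian, in-control measurements and a rule that rejects the run when any single control exceeds the limits; the function name is hypothetical.

```python
import math


def false_rejection_probability(limit_in_sd: float, n_controls: int) -> float:
    """Chance that at least one of n in-control results falls outside
    +/- limit_in_sd, assuming independent Gaussian measurements."""
    # Probability that a single in-control result stays inside the limits.
    p_inside = math.erf(limit_in_sd / math.sqrt(2.0))
    return 1.0 - p_inside ** n_controls


for limit, n in [(2.0, 2), (3.0, 2), (3.0, 4)]:
    p = false_rejection_probability(limit, n)
    print(f"{limit:.0f}s limits, N={n}: false rejection probability = {p:.3f}")
```

Under these assumptions, the 2s limits with N=2 give roughly 0.09, while 3s limits keep false rejections below about 1%.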
Excellent laboratories can achieve a CV of 1.0% and a bias of 0.0%, but there's little evidence that such performance is generally available. Proficiency testing surveys and peer-comparison programs typically show CVs of 3% to 5% for the total group of methods surveyed and 2.0% for individual methods that are carefully sub-grouped. For example, the New York State Department of Health survey data for the February 2002 proficiency testing event shows the following results for a specimen whose glucose concentration is in the critical decision interval [4]:
- a group method CV of 3.77% (specimen C15, SD of 4.62, mean of 122.5, 433 labs);
- for the two largest subgroups, the CVs are 1.95% (SD=2.37, mean=121.0, n=97) and 2.17% (SD=2.76, mean=127.1, n=93).
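The CVs above are simply the SD divided by the mean, expressed as a percentage. A minimal sketch using the survey figures quoted in the list follows; the labels and function name are illustrative.

```python
def cv_percent(sd: float, mean: float) -> float:
    """Coefficient of variation as a percentage of the mean."""
    return sd / mean * 100.0


# (label, SD, mean) as quoted from the February 2002 survey data above.
survey_groups = [
    ("all methods, specimen C15", 4.62, 122.5),
    ("largest subgroup", 2.37, 121.0),
    ("second-largest subgroup", 2.76, 127.1),
]

for label, sd, mean in survey_groups:
    print(f"{label}: CV = {cv_percent(sd, mean):.2f}%")
```

The recomputed values agree with the quoted CVs to within about 0.01%; a difference in the last digit may reflect rounding or truncation in the survey report.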
The performance of the ten largest method groups is shown by the red operating points (A to J) on the OPSpecs chart below:
Remember that desirable performance is the area below the solid line, i.e., methods whose operating points (y-coordinate=bias, x-coordinate=CV) are below that line can be controlled to assure the desired quality is achieved. Note that a method CV of 1.5% and bias of 0.0% (green point) would be necessary to have a controllable method when 4 control measurements are analyzed per run. None of the operating points for the ten method groups are even close to the necessary performance. Even if a bias of zero is assumed, only method group B would provide the performance needed to assure the quality required by the ADA/HHS guideline. But this is only possible when more stringent quality control is applied than is provided in most laboratories today (N=4 instead of 2).
I submit that there is sufficient scientific evidence to conclude that current glucose testing in the US cannot deliver the quality needed to apply the new ADA/HHS guidelines. These new practice guidelines do not account for the:
- known within-subject biological variation (6.5%),
- known precision of methods (typically a CV of 2.0%),
- known biases between methods (as large as 8% in the New York survey), and
- known insensitivity of the minimal QC practices that are in compliance with government regulations (minimum of 2 control measurements per run).
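To get a rough feel for how these components add up, the sketch below combines them in a deliberately simplified budget: bias is added linearly, and within-subject and analytical variation are combined in quadrature. This is my own simplification, not the OPSpecs model used for the charts; it leaves out the QC sensitivity term entirely, and the one-sided multiplier of 1.65 is an assumption. The function name is hypothetical.

```python
import math

DINT_PERCENT = 16.0 / 110.0 * 100.0  # clinical decision interval, about 14.5%


def budget_used(bias_pct: float,
                cv_analytical_pct: float,
                cv_within_subject_pct: float = 6.5,
                z: float = 1.65) -> float:
    """Simplified error budget (an assumption, not the OPSpecs model):
    bias plus z times the combined within-subject and analytical variation.
    No allowance is made for the sensitivity of the QC procedure."""
    combined_cv = math.sqrt(cv_within_subject_pct ** 2 + cv_analytical_pct ** 2)
    return abs(bias_pct) + z * combined_cv


# Analytical CV of 2.0% with zero bias, then with the 8% bias seen between methods.
for bias in (0.0, 8.0):
    used = budget_used(bias_pct=bias, cv_analytical_pct=2.0)
    verdict = "within" if used <= DINT_PERCENT else "exceeds"
    print(f"bias {bias:.1f}%: uses {used:.1f}% of a {DINT_PERCENT:.1f}% "
          f"interval ({verdict} budget)")
```

Even in this optimistic version, a zero-bias method with a 2.0% CV leaves only a few percent of the interval for errors that QC must detect, and a between-method bias of 8% overruns the interval outright.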
All this information is available, but somehow is not included in the medical assessment of these new ADA/HHS test guidelines. By keeping this information off the books, the public is being misled. Our good intentions in improving patient care will in fact lead to abuses. Patients who are subjected to this testing will need to be re-tested and will be exposed to additional testing. If it is billable, physicians will order it. If it is billable, the healthcare system will offer the service. If it is billable, patients will be subjected to it.
Will we be seen as ignorant, incompetent, or irresponsible when the public realizes that glucose testing is not reliable? Ignorance is no excuse! The information is there in the scientific literature. Incompetence is unpalatable, but it must be part of the reason. Irresponsible - there's no escape! We are expected to mind the gates of quality and protect our patients from harm. But we are not being accountable - we've been cooking our own books!
Finally, I just want to share with you an email I got recently:
"I have a co-worker who uses a shortcut to setting means and sd ranges. She takes a LJ graph with anywhere from 20 to 100 points, finds the lowest value and the highest value, divides the difference by 2 for the mean, divides again by 2 for 1sd....My question is: is my co-workers method valid? Will it provide reliable limits?"
This is a stark example of cooking the books at the most basic level of QC in our laboratories. Computing an average isn't that difficult (anyone in school with a grade point average knows how to do it), but some laboratory professionals are still taking shortcuts. As a profession, we in the laboratory need to take responsibility for what we do. If we don't, someone else (inspectors, regulators, eventually the lawyers...) is going to do it instead, and wreak havoc when they do.
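For comparison, here is a sketch of the proper calculation next to the co-worker's shortcut. The control values are hypothetical, and I am reading the shortcut charitably as "midpoint of the range for the mean, one quarter of the range for 1 SD"; the email itself is ambiguous on that point.

```python
import statistics

# Hypothetical control results (mg/dL), for illustration only.
control_values = [98.0, 101.0, 100.0, 99.0, 102.0,
                  100.0, 97.0, 103.0, 100.0, 100.0]


def computed_limits(values):
    """Mean and sample SD calculated from every data point."""
    return statistics.mean(values), statistics.stdev(values)


def range_shortcut(values):
    """Assumed reading of the shortcut: midpoint of the range as the
    'mean', and a quarter of the range as '1 SD'."""
    low, high = min(values), max(values)
    half_range = (high - low) / 2.0
    return low + half_range, half_range / 2.0


# A single stray result is enough to move the shortcut's "mean" and "SD".
with_outlier = control_values + [110.0]

for label, data in [("original data", control_values),
                    ("with one outlier", with_outlier)]:
    mean, sd = computed_limits(data)
    short_mean, short_sd = range_shortcut(data)
    print(f"{label}: computed mean={mean:.2f}, SD={sd:.2f}; "
          f"shortcut mean={short_mean:.2f}, SD={short_sd:.2f}")
```

The shortcut depends only on the two most extreme points, so its estimates shift with a single outlier and with the number of points on the chart, whereas the computed mean and SD use all of the data.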
1. This economy can survive the scandals. Editorial. Business Week, July 9, 2002, page 118.
2. Sainato D. A new attack on the diabetes epidemic. Clinical Laboratory News 2002;28(June):1-5.
3. Fraser CG. Biological Variation: From principles to practice. Washington DC: AACC Press, 2001, page 134.
4. New York State Department of Health website: http://www.wadsworth.org/chemheme/chem/gencc/ptframes.htm
