Good Data Wanted - Bad Data Should Not Apply (The STARD Initiative)

A word from Dr. Westgard

January 2003
An updated version of this essay appears in the Nothing but the Truth about Quality manual.

James O. Westgard, PhD, FACB

Some people tire of my preaching about the need for correct test results and the importance of quality control. Still, everyone accepts that good data is a prerequisite to good management. Yet many people seem to think they will know good data when they see it, i.e., that they can tell bad data from good data.

What does "good data" look like?

What is it about bad data that makes it look good? What gives bad data the appearance that we can trust the results and conclusions based on it? Basically, it has the appearance of exactness because it has been printed or published in some form. Just printing a test result on a computer report makes the number look reliable. Somehow, through the act of being published, printed, and/or cited enough times, a kind of magic takes place: whatever its origins, the bad data transmutes into good data. I've cited some examples of such magic acts in earlier essays.

STARD initiative

Other papers in the January 2003 issue of Clinical Chemistry bring new recognition of the problem with published numbers. There are three papers [3-5] on "Standards for Reporting Diagnostic Accuracy," or STARD, as it is called. A total of 22 pages are devoted to STARD, which indicates this must be a pretty important subject.

The STARD initiative is an attempt to provide more uniform guidelines for publication of studies on "diagnostic accuracy" in scientific journals. Diagnostic accuracy has to do with the initial validation of the medical usefulness of new diagnostic technology, including new laboratory tests. This phase of validation involves testing patients who are known to have a certain disease, as well as patients who are known NOT to have that disease, in order to assess the ability of the test to classify patients correctly. Typically, such studies report diagnostic sensitivity and specificity, which can be presented graphically as receiver operating characteristic (ROC) curves. The results of these studies then become the basis of further "data mining" and "meta-analysis" publications, which pool the data from previously published studies, giving larger numbers of test subjects and, generally, better and better numbers for diagnostic performance.
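The measures named above can be made concrete with a short sketch. The Python below computes diagnostic sensitivity (true positives divided by all diseased patients) and specificity (true negatives divided by all non-diseased patients) at one cutoff, then sweeps the cutoff across the observed values to produce the (1 - specificity, sensitivity) points that make up an ROC curve. The test values, the disease labels, and the convention that a result at or above the cutoff is called positive are all hypothetical assumptions for illustration, and the sketch assumes both patient groups are non-empty.

```python
def sens_spec(values, diseased, cutoff):
    """Sensitivity and specificity, assuming a result >= cutoff is called positive."""
    tp = sum(1 for v, d in zip(values, diseased) if d and v >= cutoff)
    fn = sum(1 for v, d in zip(values, diseased) if d and v < cutoff)
    tn = sum(1 for v, d in zip(values, diseased) if not d and v < cutoff)
    fp = sum(1 for v, d in zip(values, diseased) if not d and v >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(values, diseased):
    """(1 - specificity, sensitivity) with each observed value used as a cutoff."""
    points = []
    for cutoff in sorted(set(values)):
        sens, spec = sens_spec(values, diseased, cutoff)
        points.append((1 - spec, sens))
    return points


# Hypothetical results: diseased patients tend to have higher values.
values = [1.2, 2.5, 3.1, 4.0, 5.6, 6.3, 7.8, 8.4]
diseased = [False, False, False, True, False, True, True, True]

print(sens_spec(values, diseased, cutoff=4.0))
print(roc_points(values, diseased))
```

Note that every number in this calculation depends on the measured values being right; if the assay itself drifts or is imprecise, the sensitivity, specificity, and ROC curve inherit that error.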

According to David Bruns, editor of Clinical Chemistry, STARD has been endorsed by several other major journals and has been published in BMJ, Annals of Internal Medicine, Radiology, and AJCP, among others, with more on the way. There is also a recent editorial in JAMA endorsing STARD, available at http://jama.ama-assn.org/issues/v289n1/ffull/jed20079.html

One ominous conclusion you can reach from this initiative: there must be an awful lot of bad data out there if a group of journals finds it necessary to cooperatively issue these standards!

STARD checklist

The first article in the series of three papers is a literature survey of protocols and recommendations on diagnostic research methodology [3]. The results were summarized as follows:

"The search of the published guidelines on diagnostic research yielded 33 previously published checklists, from which we extracted a list of 75 potential items. The consensus meeting shortened the list to 25 items, using evidence on bias whenever available…"

The second article details the 25 items in the checklist [4]. Of interest here are some of the items that have to do with the reliability of the measurement data:

"Item 8. Describe the technical specifications of material and methods involved including how and when measurements were taken, and/or cite references for index tests and reference standards.

"Item 9. Describe the definition of and rationale for the units, cutoffs and/or categories of the results of the index tests and the reference standard.

"Item 10. Describe the number, training and expertise of the persons executing and reading the index tests and the reference standards.

"Item 11. Describe whether or not the readers of the index tests and reference standard were blind (masked) to the results of the other test and describe any other clinical information available to the readers.

"Item 12. Describe methods for calculating or comparing measures of diagnostic accuracy, and the statistical methods used to quantify uncertainty (e.g., 95% confidence intervals).

"Item 13. Describe methods for calculating test reproducibility, if done." (emphasis added here)
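Item 12 asks authors to quantify uncertainty, for example with 95% confidence intervals. As a hedged illustration of what that means for a proportion such as sensitivity, the sketch below uses the Wilson score interval, which is one common choice among several (Wald and Clopper-Pearson intervals are others); the STARD items quoted here do not prescribe a particular method, and the counts are hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion such as sensitivity."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical study: 45 of 50 diseased patients test positive.
low, high = wilson_interval(45, 50)
print(f"sensitivity = {45 / 50:.2f}, 95% CI {low:.3f} to {high:.3f}")
```

An interval like this reflects only sampling uncertainty from the number of patients; it says nothing about whether the individual test results were analytically reliable.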

The only item that deals directly with the analytical performance of the test is Item 13, which addresses "Reproducibility, if done." The recommendation seems to be only that reproducibility should be reported if it was estimated, though the later discussion suggests that such an estimate would be useful to "show the reader the range of likely values around an estimate of diagnostic accuracy."
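For readers unfamiliar with what "calculating test reproducibility" involves at its simplest, here is a minimal sketch: replicate measurements of one sample are summarized by their mean, standard deviation (SD), and coefficient of variation (CV). The replicate values and units are hypothetical, and this single-level summary is an assumption for illustration; a full precision study would typically separate within-run and between-day components.

```python
import statistics


def reproducibility(replicates: list[float]) -> dict:
    """Mean, sample SD, and CV (%) for replicate results on one sample."""
    mean = statistics.mean(replicates)
    sd = statistics.stdev(replicates)  # sample SD with an n - 1 denominator
    cv = 100 * sd / mean
    return {"n": len(replicates), "mean": mean, "sd": sd, "cv_percent": cv}


# Hypothetical replicate results (mg/L) for a single control material.
replicates = [2.1, 2.3, 1.9, 2.2, 2.0, 2.4, 2.1, 2.0]
print(reproducibility(replicates))
```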

No QC required!

If estimates of reproducibility are optional, clearly there is no requirement for any statistical quality control of the measurement process. It follows that there is still no requirement for good data in studies on the diagnostic accuracy of laboratory tests.
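To show what statistical quality control of a measurement process looks like in practice, the sketch below checks a series of control results against two control rules commonly used in laboratory QC: 1_3s (one result beyond the mean ± 3 SD) and 2_2s (two consecutive results beyond the same mean + 2 SD or mean - 2 SD limit). The target mean and SD are assumed to come from prior data on the control material, the results are hypothetical, and this is only a fragment of a complete multirule procedure.

```python
def qc_violations(results, mean, sd):
    """Return (index, rule) pairs for 1_3s and 2_2s rejections in a control series."""
    z = [(r - mean) / sd for r in results]  # distance from target in SD units
    violations = []
    for i, zi in enumerate(z):
        # 1_3s: a single control result outside mean +/- 3 SD
        if abs(zi) > 3:
            violations.append((i, "1_3s"))
        # 2_2s: this result and the previous one beyond the same 2 SD limit
        if i > 0 and ((zi > 2 and z[i - 1] > 2) or (zi < -2 and z[i - 1] < -2)):
            violations.append((i, "2_2s"))
    return violations


# Hypothetical control material with target mean 100 and SD 2.
results = [101.0, 98.5, 104.5, 104.8, 99.0, 107.0]
print(qc_violations(results, mean=100.0, sd=2.0))
```

Here the fourth result completes a 2_2s violation (both it and the third result exceed +2 SD) and the sixth result alone exceeds +3 SD.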

Of course, the assumption is that any research investigator and any research laboratory will provide good data. But there is no proof of that assumption, and the STARD initiative itself seems to be evidence to the contrary. No one really knows the quality of data from research labs, because they seldom do any quality control or peer comparison studies. They seem to think that performance data aren't needed because people can tell whether the numbers are good or bad! If they believe that, I expect they also believe in Santa Claus, the Easter Bunny, and the Tooth Fairy. The alternative hypothesis is that they have little understanding of, and training in, the scientific aspects of making measurements.
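Peer comparison studies are often summarized with a standard deviation index (SDI): how far a laboratory's mean for a control material lies from its peer-group mean, in units of the peer-group SD. A minimal sketch follows; the values are hypothetical, and the review threshold of 2 is an assumed convention, since individual programs set their own limits.

```python
def sdi(lab_mean: float, peer_mean: float, peer_sd: float) -> float:
    """Standard deviation index of one lab's mean relative to its peer group."""
    return (lab_mean - peer_mean) / peer_sd


# Hypothetical peer report for one control level.
value = sdi(lab_mean=5.6, peer_mean=5.0, peer_sd=0.25)
flag = abs(value) > 2.0  # assumed review threshold; programs define their own
print(round(value, 2), flag)
```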

Validation requires statistical control!

The first phase of validating a measurement process is to establish its predictability. This is sometimes called establishing a "state of statistical control." According to Wernimont [6], "a measurement process may be said to be in a state of statistical control if the significant causes of variation have been removed or corrected for, so that a finite set of n measurements from the process can be used to (a) predict limits of variation for the n measurements and (b) assign a level of confidence that future measurements will likely be within these limits." Eisenhart [7] emphasizes the importance of debugging the measurement process until it becomes predictable, and states that until then "it cannot be regarded in any logical sense as measuring anything at all." My point here is that bad data are quite likely with early versions of a measurement technology, particularly in the hands of analysts who don't perform any QC.
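Wernimont's definition describes using n measurements to predict limits for future measurements and to attach a level of confidence to those limits. One standard way to make that concrete, assuming the process is stable and its random errors are roughly normal, is a 95% prediction interval for a single future result: mean ± t × s × sqrt(1 + 1/n). This formula is offered as an illustration, not as something Wernimont's quoted text prescribes; the data are hypothetical, and to stay within the standard library the two-sided 95% t values are hard-coded from a standard table for only a few sample sizes.

```python
import math
import statistics

# Two-sided 95% Student t quantiles for selected degrees of freedom (standard table values).
T_975 = {4: 2.776, 9: 2.262, 19: 2.093, 29: 2.045}


def prediction_limits(measurements):
    """95% limits expected to contain one future result from a process in statistical control."""
    n = len(measurements)
    df = n - 1
    if df not in T_975:
        raise ValueError(f"no t value tabulated for {df} degrees of freedom")
    mean = statistics.mean(measurements)
    s = statistics.stdev(measurements)
    half_width = T_975[df] * s * math.sqrt(1 + 1 / n)
    return mean - half_width, mean + half_width


# Hypothetical: 10 results on a stable control material.
data = [5.02, 4.98, 5.05, 4.97, 5.01, 5.03, 4.99, 5.00, 5.04, 4.96]
low, high = prediction_limits(data)
print(f"95% prediction limits: {low:.3f} to {high:.3f}")
```

If SciPy is available, `scipy.stats.t.ppf(0.975, df)` could replace the lookup table. The limits are only meaningful if the process really is in statistical control; with an unstable process, the same arithmetic produces confident-looking numbers that predict nothing.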

Few people these days have heard of Wernimont and Eisenhart, even though these respected scientists rank alongside Shewhart, Youden, and Deming for their contributions to analytical quality management. They established the fundamental standards for measurement processes in the heyday of the old "National Bureau of Standards." Many of these fundamentals have now been forgotten because it is so easy to make measurements today. If anyone and everyone can make measurements, how can anything go wrong? Well, the editors of several leading journals are beginning to worry that something is terribly wrong with the data in peer-reviewed publications, which supposedly represent the highest standard of quality for all measurement data.

STARD - a step in the right direction

STARD is hopefully just the start of a new emphasis on good data. According to Dr. David Bruns, the editor of Clinical Chemistry, beginning in January 2003 the STARD checklist is part of the recommendations to authors who want to submit papers to the journal [5]. While the lack of QC is still a problem in the STARD guidelines, there will undoubtedly be revisions of these recommendations in the future, and that limitation can be corrected.

Providing QC training for researchers and analysts in research laboratories is a much more difficult problem, because the researchers themselves are largely unaware of the need. However, research groups that have a central support lab often employ clinical laboratory scientists, who have the skills needed to improve the quality of the data. Such laboratories also offer stimulating work and attractive working conditions, so they can compete well for the limited supply of well-trained clinical laboratory scientists.

Clinical Laboratory Scientists - the means to good data

The bottom line is that clinical laboratory scientists are the best source of analysts who are trained to make measurements and understand laboratory test data. If you want good data, these are the people who are needed in your laboratory. If you want to validate your measurement processes, these are the people who understand the validation process. If you need to implement statistical QC, they can do it.

And they need not be limited to the laboratory! Healthcare organizations today would benefit greatly from broad deployment of clinical laboratory scientists in their quality management programs and systems. Anywhere that data are being collected and analyzed, the clinical laboratory scientist can contribute to the quality and improvement of the processes under study.

References

  1. Campbell B, Flatman R, Badrick T, Kanowski D. Problem with high-sensitivity C-Reactive protein. Letter to the editor. Clin Chem 2003;49:201.
  2. Ockene IS, Matthews CE, Rifai N, Ridker PM, Reed G, Stanek E. Response to problems with high-sensitivity C-Reactive protein. Clin Chem 2003;49:201-2.
  3. Bossuyt PM, Reitsma JB, Bruns DE, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: The STARD initiative. Clin Chem 2003;49:1-6.
  4. Bossuyt PM, Reitsma JB, Bruns DE, et al. The STARD statement for reporting studies of diagnostic accuracy: Explanation and elaboration. Clin Chem 2003;49:7-18.
  5. Bruns DE. The STARD initiative and the reporting of studies of diagnostic accuracy. Clin Chem 2003;49:19-20.
  6. Wernimont G. Statistical control of measurement process. In: DeVoe JR, ed. Validation of the Measurement Process. ACS Symposium Series. Washington, DC: American Chemical Society, 1977.
  7. Eisenhart C. Realistic evaluation of the precision and accuracy of instrument calibration systems. J Res Natl Bur Stand C Eng Instrum 1963;67C:161.

James O. Westgard, PhD, is a professor of pathology and laboratory medicine at the University of Wisconsin Medical School, Madison. He also is president of Westgard QC, Inc. (Madison, Wis.), which provides tools, technology, and training for laboratory quality management.


Copyright © 2003. All rights reserved.
Westgard QC, 7614 Gray Fox Trail, Madison WI 53717
Call 608-833-4718 or e-mail us at westgard@westgard.com