Tools, Technologies and Training for Healthcare Laboratories

We need more data, more data, and more data

There is an oft-quoted maxim in the restaurant business that only three things matter: location, location, location. In a similar vein, medical laboratories trying to assess method performance and analytical quality need to focus on three things: data, more data, and still more data.

We need more data, we need more data, and we need more data

Sten Westgard, MS
May 2011

When it comes to getting information about instrument performance in the diagnostic marketplace, laboratories face huge challenges in obtaining reliable objective assessments. The solution, of course, is more information. But what exactly is needed?

1. We need data.
(Because there isn't enough data out there about method performance.)

Laboratories trying to purchase new instruments are in a tough spot. How can the lab find out which instruments and methods are good and which are bad? Unfortunately, the main source of information about an instrument is the manufacturer itself. If you ask a diagnostic manufacturer whether its instrument is good, it's a bit like asking a shark if it's a good idea to go for a swim. The manufacturer has every incentive to put its best face forward and to conceal, as much as is permissible, any faults or shortcomings of its instrument. Sometimes marketing plays a bigger role in the sale of an instrument than data on method performance - because the manufacturer has almost exclusive control over the performance data. Often other data on cost, speed, and volume trump the quality data (if quality is not simply assumed to be good).

This lack of information is more generally known as Information Asymmetry - a scenario where one party in a transaction has much more knowledge, and therefore power, than the other. In the case of used cars, it's the dealership that knows more than the customer and can unload "lemons" (problem cars) on the unsuspecting. With stockbrokers, the asymmetry is revealed when you notice that brokers handle their own money quite differently than they handle yours. On the flip side, buyers typically have more information than sellers at estate and garage sales (the buyer knows that your painting from the attic is actually a priceless masterpiece...). In the case of medical laboratories, the diagnostic manufacturer has more information than the individual laboratory.

There are some sources of information about instrument performance - published studies, proficiency testing reports, anecdotal reviews and comments from peer laboratories - but these are not always easy to find. Proficiency testing results can be confusing, since many of the surveys are only consensus-based and results may be significantly tainted by matrix effects. Published performance studies of methods and instruments are harder to find. Often they appear only as posters and abstracts at scientific meetings, not as full papers in the regular journal issues. These small studies don't get the wide circulation that they deserve.

There is also a phenomenon called "publication bias" which comes into play. This term describes the tendency of scientific journals - as well as the scientists themselves - to focus only on new, interesting, and positive findings. "The bias was first identified by the statistician Theodore Sterling, in 1959, after he noticed that 97% of all published psychological studies with statistically significant data found the effect that they were looking for."[1]

Performance of the newest method is more exciting than a confirmatory study of an older method. Papers that find "negative" results - that is, that do not find anything, or that find something contradicting earlier studies - are not published as often as papers that find "positive" results (a new method, a new correlation, etc.). Since both journals and scientists receive more recognition for positive findings, the incentives of the academic publishing community tilt the scales in favor of papers of that type.

Finally, there is no clearinghouse of performance data - the FDA and CMS do not disseminate method performance data on instruments and methods. While 510(k) applications must contain data on method performance, that data is often not made public, in order to protect trade secrets. Only the manufacturer's claims become public, and those claims often take some statistical skill to decipher. Nor do the professional organizations take on the task of surveying the performance and quality of the instruments in the marketplace. There is no "Consumer Reports" for the US laboratory marketplace.

So, even though the laboratory needs more of the "negative" studies - or the less exciting, confirmatory studies - the market of scientific publication does not supply them in sufficient quantity.

2. We need more data.
(Of the data we manage to find, we still need more of it, so that we can have more confidence in the calculations.)

Anyone with a statistical background will tell you that in order to know the standard deviation (SD) of a method within about 10% of its true value, you need roughly 100 measurements. So why do the method validation studies published in the literature often have only 40, 20, or sometimes as few as 10 measurements in the replication study that estimates the mean, standard deviation, and coefficient of variation?

Because the expense of control materials is significant for medical laboratories, labs prefer to use as few control measurements as possible. Thus, if you calculate the confidence intervals of a typical method validation study, you'll find they are wider than you expect. The small data sets that normally make up a method validation study are simply not large enough to yield confident estimates of some of the errors. That's why these studies mainly verify a manufacturer's claim, rather than truly validate the performance of a method.
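To put some numbers on this, here is a minimal Python sketch - our own illustration, not data from any particular study - that uses the chi-square distribution to show how wide the 95% confidence interval for an estimated SD is at different numbers of replicates:

```python
# Minimal sketch: 95% confidence interval for an estimated SD as a function
# of the number of replicates, using the chi-square distribution.
# Assumes normally distributed measurement error; the observed SD is set to
# 1.0 so the limits read as multipliers of whatever SD you estimated.
from scipy.stats import chi2

observed_sd = 1.0  # any units; the limits scale proportionally

for n in (10, 20, 40, 100):
    df = n - 1
    lower = observed_sd * (df / chi2.ppf(0.975, df)) ** 0.5
    upper = observed_sd * (df / chi2.ppf(0.025, df)) ** 0.5
    print(f"n={n:3d}: 95% CI for the SD = {lower:.2f} to {upper:.2f} x observed SD")
```

With 20 replicates, the true SD could plausibly be anywhere from roughly 25% below to 45% above the estimate; only around 100 replicates does the interval tighten to the range of about 10-15%.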

As we've mentioned before, one of our pet peeves is the within-run precision study, often performed by manufacturers during the launch of a new instrument. These instruments take years to research, design, and develop, but evidently many manufacturers only want to spend one day evaluating their precision. Whenever we see a method validation poster that contains only a within-run precision study, it raises concerns. Certainly, it is understandable to want to run the study that costs the least, takes the shortest time, and produces the most optimistic estimate, but the number thus generated is highly unreliable. Even cars must have their mileage-per-gallon reported as both city and highway figures. Please, manufacturers, if you've invested years in building your latest, greatest instrument, spend a few extra days establishing its precision.
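To illustrate how much a within-run study can understate real-world imprecision, here is a small simulation - hypothetical numbers in the spirit of a multi-day precision protocol such as CLSI EP05, not any manufacturer's actual data - that splits control results collected over several days into within-run and between-day components:

```python
# Minimal sketch (simulated data): why a within-run study understates total
# imprecision. Control results from several days are split into within-run
# and between-day variance components with a one-way ANOVA (balanced design).
import numpy as np

rng = np.random.default_rng(1)
days, reps = 20, 2
within_sd_true, between_sd_true = 1.0, 1.5   # assumed "true" components

# Each day gets its own shift (calibration, reagents, environment),
# plus within-run noise on every replicate.
day_shift = rng.normal(0, between_sd_true, size=(days, 1))
data = 100 + day_shift + rng.normal(0, within_sd_true, size=(days, reps))

day_means = data.mean(axis=1)
ms_within = ((data - day_means[:, None]) ** 2).sum() / (days * (reps - 1))
ms_between = reps * ((day_means - data.mean()) ** 2).sum() / (days - 1)

within_sd = ms_within ** 0.5
between_sd = max((ms_between - ms_within) / reps, 0) ** 0.5
total_sd = (within_sd ** 2 + between_sd ** 2) ** 0.5

print(f"within-run SD: {within_sd:.2f}")   # close to the true 1.0
print(f"total SD:      {total_sd:.2f}")    # roughly 1.8 - what routine use sees
```

Because each day brings its own calibration, reagent, and environmental shifts, the total SD comes out noticeably larger than the within-run SD - and that between-day component is exactly what a one-day study never sees.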

3. Finally, we need still more data.
(So we can be sure that the first study - or, if we're lucky, studies - is not just an anomaly)

One study might find anything. However improbable, a poor method could still produce a burst of miraculously great results. It's just that over the long run, with multiple studies, one wonderful finding on a poor method will be overwhelmed by findings that show poor results. This is the famous "regression to the mean."
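A quick simulation makes the point (simulated data, purely for illustration): give a method a true CV of 5% and evaluate it over and over in 20-replicate studies, and a handful of those studies will make it look much better than it really is.

```python
# Minimal sketch (simulated data): a single small study can make a mediocre
# method look good; only the accumulation of studies converges on the truth.
import numpy as np

rng = np.random.default_rng(7)
true_mean, true_cv = 100.0, 5.0      # assumed "true" performance
n_per_study, n_studies = 20, 1000

def study_cv(n):
    """Estimate the %CV from one small replication study."""
    x = rng.normal(true_mean, true_mean * true_cv / 100, n)
    return 100 * x.std(ddof=1) / x.mean()

cvs = np.array([study_cv(n_per_study) for _ in range(n_studies)])

print(f"true CV:                 {true_cv:.1f}%")
print(f"best single study:       {cvs.min():.1f}%")      # a lucky fluke
print(f"studies reporting < 4%:  {(cvs < 4).mean():.0%}")
print(f"average across studies:  {cvs.mean():.1f}%")     # back near the truth
```

Any one of those flattering studies, viewed in isolation, looks like a great method; it takes independent repetition to pull the estimate back toward reality.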

So one study is not enough. It's great if you can find a single study on method performance, but you really want more than that. You'd like the finding to be confirmed by additional, independent studies. A method validation study may cover a few days or weeks of performance, but the real question is performance over the long term.

Again, journals and conferences don't like to feature repetitive findings. There's nothing novel in a "me-too" study, but the fact remains that scientific progress has to occur this way. One study reaches a conclusion. Another study has to take place to either confirm or refute that finding.

Why won't one study suffice? Because in addition to pure statistical flukes, there is the very real danger - as we mentioned earlier - that the study was flawed, either by intent, design, or unconscious desire. Dr. John Ioannidis, an epidemiologist at Stanford University who has made his career studying the flaws in medical and scientific publications, has found that "the range of errors being committed was astonishing: from what questions researchers posed, to how they set up the studies, to which patients they recruited for the studies, to which measurements they took, to how they analyzed the data, to how they presented their results, to how particular studies came to be published in medical journals.... Researchers headed into their studies wanting certain results - and, lo and behold, they were getting them."[2]

This pervasive problem with scientific studies - that over time the dramatic findings or effects seem to lessen - has been called the "Decline Effect."[1] It's a combination of selective reporting, statistical outliers, and scientific fashion. Scientists who want to find a positive result tend to find it, often helped along by the sheer randomness of the things they study. Journals, in the quest to publish the most interesting new thing, select the most dramatic results, which may only be the biggest outliers, and then tend to publish studies that support those earlier results.

This isn't a problem unique to the laboratory. As with doctors, it's good to get a second opinion. The lesson for labs: try to get multiple studies, by different laboratories, from different periods of time. The more studies you can evaluate, the more reliable the aggregate findings will be.

Doomed to a dearth of data?

Invariably, labs will have less information than they want (and need) when it comes to making decisions on method performance. The moral here is that labs should try to get more, be cautious about the data they receive, and make a serious effort to validate the method they choose.

During method validation, labs should conduct more than just within-run studies to get good estimates of performance on the instrument that actually gets installed in their laboratory. Beyond the initial validation, labs should monitor long-term performance (for example, the cumulative imprecision over several months of routine operation, or the comparison of accuracy against a peer group over many data points).
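As a rough sketch of what that long-term monitoring looks like - hypothetical QC numbers, not a prescribed protocol - pooling several months of control results gives a cumulative CV that captures the drift a single good month hides:

```python
# Minimal sketch (simulated QC data): cumulative imprecision over several
# months vs. the most flattering single month.
import numpy as np

rng = np.random.default_rng(3)
# Six months of daily QC results; the mean drifts month to month, as real
# instruments tend to do after recalibrations and reagent lot changes.
months = [rng.normal(100 + drift, 2.0, size=30)
          for drift in (0.0, 1.5, -1.2, 2.0, 0.8, -1.5)]

monthly_cv = [100 * m.std(ddof=1) / m.mean() for m in months]
all_results = np.concatenate(months)
cumulative_cv = 100 * all_results.std(ddof=1) / all_results.mean()

print("monthly CVs (%):  ", " ".join(f"{cv:.1f}" for cv in monthly_cv))
print(f"cumulative CV (%): {cumulative_cv:.1f}")  # typically larger: drift included
```

The cumulative figure is the one that reflects what patients actually experience over the long haul.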

And finally, try to share that data with other labs, through publications, posters, and professional associations. When more labs share more data, everyone benefits.

References

  1. Jonah Lehrer, "The Truth Wears Off," The New Yorker, December 13, 2010.
  2. David H. Freedman, "Lies, Damned Lies, and Medical Science," The Atlantic, November 2010.

Hat tip to Dr. Jan Krouwer, who first drew attention to the article about Dr. John Ioannidis.