Tools, Technologies and Training for Healthcare Laboratories

The Top 5-and-a-half Best LabSafeQuality CompareGrades

Sometimes it seems that Ratings, Rankings, and Awards are legion in healthcare. The current system of Top 50 lists, HealthGrades, and Best Hospitals feels a bit like the Grammys: there's a category for everyone, and everyone goes home a winner. But the recent proliferation of metrics and awards is not only confusing, it is now impacting reimbursement. The new metrics being implemented by payers and the Affordable Care Act threaten to impose a set of unproven or dubious benchmarks on hospitals. Which raises two questions: is the laboratory safe from these metrics? And why don't we see more ranking of laboratories, methods, and instruments?

April 2015
Sten Westgard, MS

[Image: Too Many Awards]

Discussed in this essay: 5 out of 4 rankings are confusing

The backlash is predictable: as more and more metrics are launched to measure healthcare success, assess quality, and provide the basis for pay-for-performance programs, the outcry against those same metrics rises. And often for good reason.

In a recent article in Health Affairs, the four major hospital rankings were compared: the US News Best Hospitals, the HealthGrades Top 50 and Top 100, the Leapfrog Group letter grades for hospitals (A, B, C, D, and F), and the Consumer Reports 0-100 safety ratings. (There are also the CMS Hospital Compare rankings, but these are not arranged to create a summary of performance that designates winners and losers.) The ratings of these major organizations are often the subject of mainstream news stories, particularly whenever the latest ratings are released. And whenever a hospital "achieves" a good rating, the healthcare group can be relied upon to tout the latest honor in its marketing.

A disturbing fact remains, though. The four major ranking systems don't agree. "No hospital was rated as a high performer by all four national ratings systems. Only 10 percent of the 844 hospitals rated as a high performer by one rating system were rated as a high performer by any of the other rating systems."

Much of the difference in the rankings can be explained by differences in the rating focus, the measures, and the financing of each project:

"Both Leapfrog and Consumer Reports focus on safety for rating hospitals, although each defines safety differently. Leapfrog defines it as 'freedom from harm,' while consumer Reports referes to 'a hospital's commitment to the safety of their patients.' HealthGrades' Top 50 and Top 100 ratings stress quality, highlighting hospitals that consistently perform well on patient outcomes, as measured by mortality and complication rates. U.S. News focuses on identifying the 'best medical centers for the most difficult patients,' with the goal of helping consumers determine which hospitals provide the best care for serious or complicated medical conditions and procedures."

So these ranking and rating systems aren't really aiming at the same targets. Still, one would think that a hospital that excels at keeping patients free from harm would also be a hospital that provides the best care for serious or complicated medical conditions or procedures.

As always, it helps to follow the money: "[T]he four national rating systems also differ in how they finance the ratings. Independently funding the work of rating hospitals (that is, acquiring, analyzing, and presenting the data) is essential because such work is rarely underwritten by grants or other public funding sources. Consequently, Leapfrog, US News, and HealthGrades finance their ratings, in part, by allowing hospitals to use their ratings in advertisements and promotional materials for a fee. In contrast, Consumer Reports does not allow hospitals to use its ratings promotionally and instead releases its ratings only to paid subscribers." The danger of having hospitals pay fees for the rankings is that you end up letting the foxes design the chicken coop: the rankings are influenced to appease and appeal to their sponsors.

Closer to home, Esposito et al. published a recent critique of the new Quality Metrics, pointing out that many of the new Quality Indicators have been implemented while overlooking the complications each measure may cause. Even the Patient Safety Indicators, which are almost universally agreed upon, show low positive predictive value, and it will always be difficult to assess which patient safety incidents were preventable or avoidable.

"On October 1, 2012, CMS began penalizing hospitals for higher readmission rates for heart failure, acute myocardial infarction, and pneumonia as part of the Hospital Readmission Reduction Program (HRRP). Of note, about 25% of all patients discharged after admission for heart failure are readmitted within this 30-day bracket. However, it is unclear whether the factors that influence hospital readmission rates are inherently beyond their control....Based on the pattern of reimbursement cuts, one study has shown that large teaching hospitals and safety net hospitals are most likely to be penalized, suggesting that higher readmission rates might be due to lower socioeconomic status and greater case complexity"

Where are the test ratings? Instrument rankings? Top 50 Laboratories?

Aside from a "Lab of the Year" contest here and there among the trade journals, there isn't as much ranking of laboratories, instruments, or tests. Finding a comparison study of all methods for a particular test is pretty rare, and it is particularly complex to create a level playing field on which to judge performance.
Given the vast amounts of measurements being made by labs, EQA and PT programs, and manufacturers, surely we have the "big data" we need to rate and rank. But for some reason, we don't.

The Top 5-and-a-half Reasons why we can't make a Top 10 list of the best Labs, Instruments, or Methods

  1. We don't agree on how to judge which is best. Even if we made analytical quality the standard by which we judge a lab, instrument, or method, there is no agreement on the technical definition of quality. Is it simply the lowest imprecision? Bias? Total Error? Uncertainty of measurement? For most labs, the two most important factors considered are "cheaper" and "faster" – but is TAT the only useful benchmark? Or a broad test menu? Or a high workflow volume? Is the cheapest method or instrument automatically the best?
  2. Even when we focus on a single dimension of quality, we can't agree on the standards for that dimension. Take total error, which can be calculated for a lab, a method, or all the methods on an instrument. But to determine whether the total error is acceptable or unacceptable, you need some standard of allowable total error. CLIA in the US, Rilibaek in Germany, RCPA in Australasia, and the "Ricos database" all set different goals for allowable total error. There's a very healthy debate about which source of goals is better than the others, and the 2014-2015 Edict of Milan has mandated which types of goals are preferred over others. But in practical terms, there are still frequent discrepancies in benchmarking, leading to the typical problem of comparing apples and oranges (see the sketch after this list).
  3. The organizations that have the most data are the least likely to share it. Manufacturers have a lot of data from their customers. They are keenly aware of the strengths and weaknesses of their own products. But they are highly unlikely to share this knowledge with outsiders. I have always encouraged labs to ask (nay, demand) that their prospective vendors provide detailed performance data. It's surprising how little a vendor will provide to a possible customer. A few short-term validation studies or poster abstracts are not enough. Manufacturers should be able to provide real-world data on the performance of their methods and instruments over the long term. If a manufacturer spends 3-5 years developing a new instrument or method, they should be willing to show you 3-5 months of performance data from a real customer. Hiding behind the excuse of "customer privacy" doesn't suffice. If you're a manufacturer, you need to be able to show your data.
  4. The organizations that have the most potential to set objective standards are also the least likely to set challenging standards. If you're an EQA or PT provider, your bottom line is based on how many labs you can get to subscribe to your surveys. You can't afford to alienate a group of labs or a specific set of labs that use a particular brand of instruments. If you're a professional organization, your bottom line is based on how many sponsorships and how much advertising you can get from the vendors (membership revenue is just not enough to sustain your organization). And if you're a government agency, your bottom line is the funding (which is meager) and the mandates (which are many) you're trying to serve; you're unlikely to be able or willing to confront a large segment of labs or manufacturers and tell them they're performing unacceptably (witness the recent back-tracking of the "off-label" regulations for BGMS by CMS). All the revenue and financial incentives point in one direction. The right thing to do would differentiate between good and bad performers; the easier path keeps everyone ignorant but complacent.
  5. The laboratories most likely to have quality and performance problems are also the least likely to address them. The more compliance-oriented a laboratory is, the more likely it is to allow poor methods, poor instruments, and poor practices to persist in its operations. After all, as long as you run 2 controls per day, CLIA doesn't ask you about the performance of the method. Nor does CLIA judge you on how many times you repeat a control, only on the consistency of your control routine.

    (and-a-half). The disconnect between the end customer (be it clinician or patient) and the test results means it's difficult to bring proper market pressure to bear on the situation. Patients, clinicians, and hospital administrations assume that tests and instruments are a commodity: all quality is the same and the only difference is price. That is patently untrue for many tests, but as long as customers believe it, it's hard to get them to understand the extent of the risk they are facing. Progress will be hard and slow as long as customers are oblivious.
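To make reason #2 concrete, here is a minimal sketch (mine, not drawn from any of the sources above) of how a single method can pass one allowable total error goal and fail another. The bias, CV, and TEa numbers are illustrative assumptions only, not the published CLIA, Rilibaek, RCPA, or Ricos values; the calculation uses the conventional total error model (TE = |bias| + 1.65 CV) and the Sigma-metric.

    # Illustrative sketch: one method, several allowable total error (TEa) goals.
    # All numbers below are assumed for illustration; look up the real CLIA,
    # Rilibaek, RCPA, and biologic-variation goals before drawing conclusions.

    observed_bias_pct = 2.0   # % bias, e.g. estimated from an EQA/PT comparison
    observed_cv_pct = 3.0     # % CV, e.g. estimated from routine QC data

    # Conventional total error model: TE = |bias| + 1.65 * CV
    total_error_pct = abs(observed_bias_pct) + 1.65 * observed_cv_pct

    tea_goals_pct = {                          # hypothetical goals for one analyte
        "Goal source A (CLIA-style)": 10.0,
        "Goal source B (Rilibaek-style)": 9.0,
        "Goal source C (RCPA-style)": 6.0,
        "Goal source D (biologic variation)": 8.5,
    }

    print(f"Observed total error: {total_error_pct:.1f}%")
    for source, tea in tea_goals_pct.items():
        verdict = "acceptable" if total_error_pct <= tea else "UNACCEPTABLE"
        sigma = (tea - abs(observed_bias_pct)) / observed_cv_pct  # Sigma-metric
        print(f"  {source}: TEa = {tea:.1f}%  ->  {verdict} (Sigma = {sigma:.1f})")

With these assumed numbers, the same method is acceptable against three of the goals and unacceptable against the fourth, which is exactly the apples-and-oranges benchmarking problem described above.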

The Rank Conclusion? Rank Confusion

As much as we hope for a grand unity of quality goals, that may not be in our future. If anything came out of the Milan 2014 meeting, it was two clear points: (1) the most evidence-based goals may be the goals that are hardest to achieve, and (2) there will be no single source of quality goals, partly because of point (1). We will have to accept that quality requirement goals are going to come from "mixed media" instead of one pure substance. Some of our goals will attain the higher status of being purely driven by evidence and outcomes, but most of our goals will remain "state of the art."
Because of the way incentives are aligned in the diagnostic industry right now, it is unlikely that the large professional organizations, governments, or vendors are going to start making "Top 50 Methods" lists. Only a few rogue elements may be capable of that.

We are caught between Drucker's "If you can't measure it, you can't manage it" and Deming's 10th point: "Eliminate slogans, exhortations, and targets for the work force asking for zero defects and new levels of productivity. Such exhortations only create adversarial relationships, as the bulk of the causes of low quality and low productivity belong to the system and thus lie beyond the power of the work force."

We've got to make more measurements. We've got to do some ranking, because all the evidence shows large variation in the quality of today's methods and instrumentation, and even today's laboratories. But we need to balance that with a realistic (and benevolent) perspective. In some ways, the laboratory can't ultimately be held responsible for today's instrument performance (most of that performance is built into the box), but if that quality isn't measured, we'll never be able to demand better performance (or the right performance) from the next generation of instruments.