
Preoccupation with Failure

This lesson on High Reliability Organizations (HRO) focuses on another principle: preoccupation with failure. Sometimes, it's good to be worried.

September 2008

In the introduction to High Reliability Organization (HRO) theory, we listed five core principles that HROs exhibit. In this lesson, we will discuss the first principle, preoccupation with failure, in detail.

To most ears, a preoccupation with failure sounds like a bad thing. The very phrase, to be preoccupied, carries with it a negative association, something similar to obsession. If you heard that a laboratory was preoccupied with failure, you might conclude there’s a group of med techs who need prescription antidepressants.

In HRO theory, preoccupation with failure is a good thing. To be preoccupied with failure is not an unhealthy obsession or a paralysis borne of fear. Instead, it’s an antidote to the complacency that is bred by success. Preoccupation with failure means seeking out all signs of error, hoping to find weak signs that can be recognized and prevented from developing into full-blown failures. Preoccupation with failure is meant to lead to pre-emption of failure. An organization that is preoccupied with failure should be able to prevent future failure conditions.

Preoccupation with Failure, more specifically defined by Weick and Sutcliffe, means two things:

“First, it means that [organizations] pay close attention to weak signals of failure that may be symptoms of larger problems within the system. Second, it means that the strategies adopted by HROs often spell out mistakes that people don’t dare make. Organizations that look relentlessly for symptoms of malfunctioning, especially when these symptoms can be tied to strategic mistakes, are better able to create practices that preclude those mistakes.”[1]

In even more practical terms, preoccupation with failure leads to several organizational behaviors:

  1. Strong emphasis on Detecting Failure – as early as possible
  2. Encourage people to Report Failure
  3. Make an effort to Anticipate how systems might fail.
  4. Specify mistakes that should not be made.

Detecting Failure – as early as possible

Everyone can detect big errors and large failures. If you wait until the flames are surrounding you, you don’t need a smoke detector. But being able to detect failures when the signals are weak or small, when the errors are just emerging, is a truly difficult task. That is what an HRO commits itself to: the detection of critical errors as early as possible.

Detecting Failure also means detecting small errors or “near misses.” In a typical organization, a “near miss” is greeted with a sigh of relief; the staff conclude, mistakenly, that the system worked. In an HRO, a “near miss” is analyzed, dissected, and discussed. Those “near misses” are really opportunities to examine the strengths and weaknesses of the current system and determine ways to improve it. The HRO takes a “near miss” and tries to change the system so that the next time the error doesn’t occur at all.

Examples of Detecting Failure Early

In other industries where HROs exist – airlines, nuclear power facilities, railways – part of the stress on early error detection manifests itself as an emphasis on front-line personnel. Maintenance workers and technicians in these industries have the most contact with the process and are most likely to detect errors while they are still small.

Sometimes the signs of early failures come from unusual observations. Weick and Sutcliffe describe one such situation:

“[A]n East Coast power company told us about an unconventional indicator that had proved diagnostic of larger system issues. When the incidence of bee stings goes up for electrical linemen working in the field, it’s a sign that they are reaching into places without looking, and that means they may be getting sloppy when they handle active power lines.”[2]

Similarly, Sentara Healthcare, a group of seven acute care hospitals in Virginia and North Carolina, extended its efforts to detect and prevent errors beyond the normal practices. These hospitals not only engaged in Root Cause Analysis to seek out sources of problems, they also focused on auxiliary methods of error detection. For example, Sentara applied Common Cause Analysis to aggregate learning from near misses and other less serious events (common cause analysis looks for recurring themes that may have caused multiple events). Sentara also used a less detailed tool called Apparent Cause Analysis to learn from events that are less serious and don't require a full RCA. This approach stresses the need to pay attention to potential problems before they happen.[3]

Examples in the laboratory

For years, we have been recommending that laboratories abandon the 1:2s “warning” rule, because the noise (false rejections) generated by the rule often overwhelms the techs; the result of 2s rules is usually less attention paid to such signals, and/or workarounds that artificially extend the 2s range. These modified control rules can then detect only the grossest errors. Better, we said, to stop using warning rules and use only the rejection rules in the multirule procedure. Indeed, the most modern formulations of the “Westgard Rules” have no 2s warning rule at all.
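To make this concrete, here is a minimal sketch of a multirule check that relies on rejection rules alone, with no 1:2s warning rule. It is not an official “Westgard Rules” implementation; the rule subset, function names, and the use of z-scores as input are assumptions made for illustration.

```python
# Minimal sketch: evaluate a run of control results, expressed as z-scores
# (deviations from the target mean in SD units), against rejection rules only.
# No 1:2s warning rule is applied. Rule subset and names are illustrative.

def violates_13s(z):
    # 1:3s - any single control exceeds +/- 3 SD
    return any(abs(x) > 3 for x in z)

def violates_22s(z):
    # 2:2s - two consecutive controls exceed 2 SD on the same side of the mean
    return any((a > 2 and b > 2) or (a < -2 and b < -2) for a, b in zip(z, z[1:]))

def violates_r4s(z):
    # R:4s - the range between consecutive controls exceeds 4 SD
    # (applied to consecutive results here for simplicity)
    return any(abs(a - b) > 4 for a, b in zip(z, z[1:]))

def reject_run(z):
    return violates_13s(z) or violates_22s(z) or violates_r4s(z)

# A run with z-scores of 1.8 and 2.4 is accepted: no rejection rule fires,
# whereas a legacy 1:2s warning rule would have flagged the 2.4.
print(reject_run([1.8, 2.4]))  # False
```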

But there is another kind of “warning” rule that the best laboratories seem to be adopting. Mean rules like 10x and 12x, where a number of consecutive results fall on one side of the mean, are being used by sophisticated laboratories as an early indicator of drift. By monitoring a mean rule, the laboratory has the potential to stop the drift before a true out-of-control situation occurs.
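A sketch of how such a mean rule might be monitored appears below; the 10x window, the warning-only response, and the example values are assumptions for illustration.

```python
# Minimal sketch of a 10x mean rule used as an early warning of drift:
# flag when the last 10 control results all fall on the same side of the
# target mean. Treated here as a trigger for investigation, not rejection.

def mean_rule_10x(results, target_mean, n=10):
    recent = results[-n:]
    if len(recent) < n:
        return False  # not enough history yet
    return (all(r > target_mean for r in recent)
            or all(r < target_mean for r in recent))

# Ten consecutive control values sitting slightly above a target of 100
history = [100.4, 100.7, 100.2, 100.9, 100.5, 100.3, 100.8, 100.6, 100.1, 100.5]
if mean_rule_10x(history, target_mean=100.0):
    print("Possible drift: investigate before a rejection rule is violated.")
```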

Likewise, those laboratories that institute Average of Normals (AoN) / Average of Patients / patient data rules are really trying to detect small signals of error. Since the AoN rules cannot specify what kind of error is occurring – you must run traditional controls and determine which rules are violated to diagnose the problem – these are effectively warning rules aimed at pre-empting any real out-of-control event.
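Below is one way an AoN-style check might look; the truncation limits, batch size, control limit, and the sodium example are illustrative assumptions, since real AoN limits must be derived from the laboratory's own patient population and analytical performance.

```python
# Minimal sketch of an Average of Normals (AoN) check: average the most recent
# batch of "normal-range" patient results and flag when that average wanders
# too far from the expected patient mean. All limits here are illustrative.

def average_of_normals(patient_results, low_trunc, high_trunc,
                       batch_size, expected_mean, warning_limit):
    normals = [r for r in patient_results if low_trunc <= r <= high_trunc]
    if len(normals) < batch_size:
        return None  # not enough data yet
    batch_mean = sum(normals[-batch_size:]) / batch_size
    return abs(batch_mean - expected_mean) > warning_limit

# Example: sodium results (mmol/L); flag if the patient average drifts > 1.5
flag = average_of_normals(
    patient_results=[139, 141, 138, 143, 142, 144, 143, 145, 144, 143],
    low_trunc=130, high_trunc=150, batch_size=10,
    expected_mean=140.0, warning_limit=1.5)
print(flag)  # True: the patient mean has shifted; run controls to diagnose why
```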

Encourage people to Report Failure

It’s one thing to get someone to detect an error. It’s an entirely different thing to get someone to report it. Reporting failure also means that the organization must be willing to make candid assessments of performance. It must be willing and eager to recognize errors and to share the news about them with others. A relentless search for problems is the path to finding solutions and preventing errors in the first place.

Too often, the organizations where we work have a punitive culture, where management “shoots the messenger” and those who bear bad news are treated badly. Even if there isn’t a culture problem, organizations that are highly “boxed,” where personnel retreat into their own silos and are reluctant to communicate across boundaries, are more likely to underreport errors. And whenever errors aren’t reported promptly, those failures accumulate and fester and grow. In places where employees are encouraged to talk only about successes, and are punished for finding and/or revealing problems, those errors are more likely to grow into larger problems down the line.

Healthcare Examples of Reporting Failure:

At Christiana Care, a health system in Wilmington, Delaware, that includes two hospitals, two practices highlight the stress on reporting failures:

  • "Inside of the eCare workspace the ICU staff has created a "catch of the day" wall. This wall is filled with cut-out fish that have stories of near-misses written on them. These fish serve as a visual reminder to staff members of the unit's commitment to recognize and build awareness of patient safety. Instead of hiding these near misses, Christiana seizes the opportunity to openly display its near misses so that everyone can become more aware of potential vulnerabilities and sensitive to other risks patients may encounter."
  • "Members of the Patient Safety Mentor Program attribute its success to recognizing the individuals who are able to find "near misses" and "good catches" from events that never reach a patient. Recognition occurs by identifying the staff member who prevented an event and awarding them a "recognition diamond." The Patient Safety Committee then sends an email to the individual's immediate manager and to each person up the senior chain of command so the senior leaders are aware of the staff members who are preventing errors from occurring." [4]

Laboratory Examples of Reporting Failure:

The Maryland General scandal gives us an excellent negative example: the hospital did everything you’re not supposed to do. It ignored and intimidated staff who reported problems with instrumentation. It warned the staff not to speak to inspectors about problems during inspections. It punished people who spoke up, to the point that some members of the staff felt the need to go outside the hospital, to the press, in order to bring the problems to light.

One of the few positive outcomes of Maryland General was the requirement for whistleblower hotlines in every US laboratory. Both CAP and the Joint Commission now explicitly inform laboratories that anyone can report a problem to them anonymously. Of course, by the time someone feels they have to blow the whistle on a bad practice, the problem has probably grown into a crisis.

Within the laboratory, management needs to balance costs and quality: even while employees are asked to “do more with less,” management must ensure that the laboratory isn’t providing less quality. Management needs to give employees the right – even the professional duty – to report errors without fear of retribution. Concepts like Just Culture and Reporting Culture come into play here.

In one sense, laboratories have little problem reporting “out” flags during QC runs. They have more of a problem respecting the out-of-control signal. Instead of simply repeating the control until it falls back “in,” they need to act on that out-of-control signal and determine why the control was out and what can be done to correct the instrument so it doesn’t happen again.

Make an effort to Anticipate how systems might fail

For an HRO, even a “quiet period” can’t be trusted. Simply because no errors have occurred recently does not mean that the system is safe. Instead, it’s possible that the errors currently happening are not being caught by the monitoring system. Paradoxically, in an HRO, failure is a rare event, so it’s increasingly difficult for the organization to analyze concrete data. Instead, the HRO has to figure out how to detect problems that are ever more rare.

A healthy imagination is necessary for an HRO. You must constantly envision scenarios where the system could fail. You must even contemplate ways in which the monitors would miss the problem. By clever anticipation of this sort, HROs can nip errors in the bud. If you can predict how the system might fail, you can modify the system to prevent the error in the first place (or at least detect it earlier).

Examples of Anticipating Error:

At Sentara Healthcare, the staff was encouraged to discuss areas where potential errors might occur. One possible danger zone was mentioned repeatedly: in the area around medication dispensing machines, interruptions were common and could pose a risk to patient safety. An interruption during drug dispensing could mean incorrect doses and other medication errors.

Sentara Leigh, one of the hospitals within the Sentara Healthcare system, instituted a No Interruption Zone around the dispensing machine, where (obviously) no interruptions were allowed. Signs, colored tiles on the floor, and other organizational behaviors reinforced this protected “sterile cockpit,” where nurses and other technicians could concentrate on a single critical task.

“They proactively addressed the risks, rather than waiting to respond until a patient had experienced serious harm….Sentara Leigh's culture emphasizes mindfulness that encourages staff to reduce risks even before those risks are known to have caused a patient harm.

“They viewed small breakdowns in their processes for drawing medications and transitioning patients as signs of danger rather than as proof that the overall system was safe. Many staff could tell stories of how they had found a medication stocked in the wrong location and had put it back where it belonged. It was common for nurses to be interrupted when leaving their shift and to realize later that they had neglected to mention something about a patient to their counterpart on the incoming shift. In many organizations, these kinds of stories would be viewed as proof that the system was safe, since in each case the mistake was caught before the patient was harmed. But in a system that is preoccupied with failure, these small breakdowns were correctly recognized as small events that ought to be addressed because they increased the likelihood of a major medication error.

“They promptly acted based on the information they had rather than attempting to collect data to establish the exact magnitude of the problem. Certainly organizations need data in order to set priorities and justify major investments. But in this case, the solutions did not require significant resources or justify waiting until a way of quantifying the risk could be developed and implemented. Staff were convinced that interruptions and poor communication were creating risks for their patients and that small changes in how they drew medications or communicated with each other could reduce those risks. This proactive approach to identifying and eliminating small risks is characteristic of cultures that are preoccupied with failure.” [5]

Anticipating Errors in the Laboratory

Laboratories have less control over their instrumentation than they used to. These devices are becoming more and more like black boxes. While engineers at the diagnostic manufacturers might be able to assess the probabilities of component or sub-system failures within an instrument, the laboratory customers at the other end probably can’t. It’s become harder and harder for technicians in the lab to predict how their instruments might fail.

Here is where the new CLSI guidelines on Risk Information, as well as techniques like FMEA (Failure Mode and Effects Analysis), would be useful to the laboratory. If a manufacturer could supply you with detailed Risk Information, particularly information on residual risks and ways that these risks might manifest themselves, that would give laboratories more ability to anticipate failures and mitigate those risks.
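As a rough illustration of how FMEA-style thinking organizes this kind of anticipation, the sketch below ranks failure modes by a conventional risk priority number (severity × occurrence × detectability). The failure modes and scores are invented examples, not data from any manufacturer or guideline.

```python
# Rough FMEA-style sketch: rank hypothetical instrument failure modes by a
# risk priority number (RPN = severity x occurrence x detectability, each
# scored 1-10). Failure modes and scores below are invented for illustration.

failure_modes = [
    # (description, severity, occurrence, detectability)
    ("Reagent degradation causes gradual calibration drift", 7, 5, 6),
    ("Clogged sample probe gives short sampling",            8, 3, 4),
    ("Wrong lot of calibrator loaded",                       9, 2, 3),
]

def rpn(severity, occurrence, detectability):
    return severity * occurrence * detectability

# Highest-RPN failure modes deserve mitigation (or better detection) first.
for description, s, o, d in sorted(failure_modes, key=lambda fm: rpn(*fm[1:]),
                                   reverse=True):
    print(f"RPN {rpn(s, o, d):3d}  {description}")
```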

Good QC Design is another proactive technique that a laboratory could implement. Using data on current performance, or method validation data, you can calculate the medically important critical errors that the system must detect, and customize the QC procedures to ensure that those errors will be detected.
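For example, the familiar critical systematic error calculation can be sketched as follows; the glucose quality requirement, bias, and CV are illustrative numbers, not recommendations.

```python
# Minimal sketch of a QC design calculation: the critical systematic error is
# the size of the shift (in SD units) that QC must be able to detect to keep
# the method within its allowable total error. Example values are illustrative.

def critical_systematic_error(tea_pct, bias_pct, cv_pct):
    # delta-SEcrit = ((TEa - |bias|) / CV) - 1.65, with all inputs in percent
    return (tea_pct - abs(bias_pct)) / cv_pct - 1.65

# Example: a glucose method with 10% allowable total error, 1.0% bias, 2.0% CV
dse_crit = critical_systematic_error(tea_pct=10.0, bias_pct=1.0, cv_pct=2.0)
print(round(dse_crit, 2))  # 2.85 -> choose QC rules/N that detect a ~2.85 SD shift
```

The larger the critical error, the easier it is to detect, and the simpler and less noisy the QC procedure can be.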

Specify mistakes that should not be made.

Detecting, reporting and anticipating errors are crucial tasks for the laboratory. But on a higher plane, at the level of organizational culture, it’s just as important to express what mistakes are totally unacceptable to the institution. Explicit commandments “Thou shalt not…” help define the organization’s values. Those values can help guide the staff even during times when the rulebook isn’t clear. In times of crisis, having an employee who understands the values of the organization is probably better than having an employee who can recite the rules and procedures chapter and verse. Particularly in ambiguous situations, where rules and procedures might not even exist yet, having employees who understand the values of the organization – and what the organization simply will not tolerate – is the key to safety.

Probably the simplest example of this is the famous line, “Failure is not an option!”, associated with the Apollo 13 crisis. But another example of expressing the unacceptable is the Red Rule. A Red Rule is a high-priority rule that must be followed to the letter. Outside of healthcare, industries often express Red Rules as conditions when workers must “stop the line.” In other words, any deviation from a Red Rule brings work to a halt until compliance is achieved. Red Rules, in addition to relating to important and risky processes, must also be simple and easy to remember.[6]

Certainly every industry, including healthcare, already has a huge number of rules. What makes a Red Rule necessary and different? A Red Rule permeates the entire organization, from the line workers to top management. If someone stops the production line because of a Red Rule, the CEO of the company must back them up. Red Rules are meant to provide backbone to a safety culture.

Examples of Red Rules

There are certainly a lot of conditions that shouldn’t happen in medical care. The problem with Red Rules in healthcare might be that we have to limit them to just a few that are simple and that can be easily remembered.

“A red rule in health care might be the following: “No hospitalized patient can undergo a test of any kind, receive a medication or blood product, or undergo a procedure if they are not wearing an identification bracelet.” The implication of designating this a red rule is that the moment a patient is identified as not meeting this condition, all activity must cease in order to verify the patient’s identity and supply an identification band.” [7]

The Joint Commission’s annual list of National Patient Safety Goals could be considered a set of Red Rules. They express in simple terms a set of unacceptable outcomes. The growing list of Never Events from CMS also represents a form of Red Rules. There, the rule is backed up by non-reimbursement, not just management or regulatory support.

Red Rules in the Laboratory

First of all, let’s be clear in stating that “Westgard Rules” are not necessarily Red Rules. A Red Rule for the laboratory would be better expressed in simpler terms, like “No patient results will be released if the QC results are out.” A series of statistical rules is probably too complicated.

Another Red Rule for the laboratory is probably, “No specimen can come into or leave the laboratory without a patient identifier.” That’s a simple, obvious rule. Sometimes Red Rules must express the very basic assumptions of operation, because in some workplaces even the basic things are going wrong.

If we were to develop Red Rules just for analytical performance and quality control, it would be tempting to declare, “No 2s limits” or “No controls should be repeated without troubleshooting the method.” Of course, this implies that the QC procedures have been properly designed and that methods have been selected with adequate performance. It might be a few more years, or a few more generations of instrumentation, before we can add that rule to the list.

Some Red Rules can be generic, rules that apply in every laboratory. Others might be more specific, tailored to a laboratory’s specialized functions. Red Rules in a lab that supports transplants will be different from those in a laboratory that only performs routine screening tests.

Conclusion

High Reliability Organizations cannot be complacent. They must be preoccupied with failure. This principle means that these organizations seek out ever smaller, ever harder-to-detect signs of error and failure. These organizations must also create a workplace culture where workers are not afraid to report errors and failures; this ensures that when an error occurs, the entire system becomes aware of it. HROs must go one step further, however, and use their imagination to envision how their current system might fail in the future, and thus make proactive changes that will prevent errors from occurring in the first place. Finally, HROs must be willing to define what mistakes and errors are unacceptable, so that employees have a set of values that can guide them through difficult or ambiguous times.

References

1. Karl E. Weick and Kathleen M. Sutcliffe, Managing the Unexpected, Second Edition, Wiley, San Francisco, CA, 2007, p.46.

2. Ibid. p 47.

3. Becoming a High Reliability Organization: Operational Advice for Hospital Leaders. AHRQ Publication No. 08-0022, April 2008, Agency for Healthcare Research and Quality, Rockville, MD. http://www.ahrq.gov/qual/hroadvice/ Appendix A. High Reliability Organization Learning Network Operational Advice from the Sentara Network Site Visit http://www.ahrq.gov/qual/hroadvice/hroadviceapa.htm#failure [Accessed September 17, 2008]

4. Becoming a High Reliability Organization: Operational Advice for Hospital Leaders. AHRQ Publication No. 08-0022, April 2008, Agency for Healthcare Research and Quality, Rockville, MD. http://www.ahrq.gov/qual/hroadvice/ Appendix F: Case Studies in High Reliability Applications: EICU and Sepsis Prevention at Christiana Care http://www.ahrq.gov/qual/hroadvice/hroadviceapf.htm#preoccupation [Accessed September 16, 2008]

5. Becoming a High Reliability Organization: Operational Advice for Hospital Leaders. AHRQ Publication No. 08-0022, April 2008, Agency for Healthcare Research and Quality, Rockville, MD. http://www.ahrq.gov/qual/hroadvice/ Appendix E: Case Studies in High Reliability Applications: Medication Dispensing Machine Redesign and Executive Walkarounds at Sentara Leigh. http://www.ahrq.gov/qual/hroadvice/hroadviceape.htm#preoccupation [Accessed September 16, 2008]

6. http://www.psnet.ahrq.gov/glossary.aspx#R [Accessed September 17, 2008]

7. Ibid.