A Six Sigma Risk Analysis Lab Example
At the 2012 Quality in the Spotlight conference in Antwerp, Belgium, Sten Westgard had the pleasure of presenting example applications of data-driven Risk Analysis. In collaboration with EndocLab in Portugal and Dr. Graca Salcedo, Six Sigma Risk Analysis was performed on three different tests.
A Six Sigma Risk Analysis Example: defining QC frequency with FMEA
This study comes to us courtesy of Dr. Graca Salcedo, who initiated and implemented the quality improvement techniques in her own laboratory. Dr. Salcedo is the head of EndocLab in Portugal. This is a small laboratory associated with about 200 patients.
Many years ago, Dr. Salcedo had implemented Sigma-metrics and OPSpecs charts, assessing the performance of her methods and optimizing the QC procedures based on that performance assessment. However, after several years of implementation, she noticed that many of the controls were simply never out-of-control. So she began to question the frequency of the QC she was running. To assist her in her evaluation of QC frequency, we suggested Risk Analysis via FMEA (Failure Mode and Effects Analysis).
Dr. Salcedo chose to try Risk Analysis on three of her laboratory methods: Glucose, Total Cholesterol, and GGT. She decided to apply Risk Analysis on the QC process for these three tests.
The FMEA framework
Recall that in FMEA, there is a choice between a model with two factors and a model with three factors. In the latter case, you make estimates of the occurrence (OCC), severity (SEV), and detection (DET) of each failure mode. In the former case, you ignore detection and focus only on occurrence and severity. EndocLab did not ignore detection; they used the 3-factor risk model.
Within each factor, there is then a choice of scales. In traditional risk analysis, you can rank each factor from 1 to 3 (qualitative), 1 to 5 (semi-quantitative), or 1 to 10 (quantitative). For this example, we're going to do something unusual, though. We're going to record each factor on a scale from 1 to 100. In some ways this will be natural to do because we're dealing with probabilities and percentages that are scaled from 1 to 100. The reason for doing this will be seen at the end.
Rating Occurrence (OCC)
The first factor is occurrence, which is a rating of how often the failure mode is happening. It's essentially an error rate. As we noted earlier, Dr. Salcedo was essentially not seeing any out-of-control results. However, in her external quality assurance testing (EQA), there were errors occurring. So this rate was used as the factor for OCC.
|Test|% QC errors observed (OCC)|
|---|---|
|Glucose|0.86|
|Total Cholesterol|3.39|
|GGT|0.001|
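As a minimal sketch, the OCC rating is simply an observed error rate expressed on the 1-to-100 (percentage) scale. The counts below are hypothetical, chosen only to illustrate the arithmetic; they are not EndocLab's actual EQA tallies.

```python
def occ_rating(n_errors: int, n_results: int) -> float:
    """Error rate as a percentage: the OCC factor on a 1-to-100 scale."""
    return 100.0 * n_errors / n_results

# Hypothetical EQA counts (NOT EndocLab's real data), chosen to land
# near glucose's observed rate:
print(round(occ_rating(2, 232), 2))  # -> 0.86
```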
Rating Severity (SEV)
Originally, Dr. Salcedo's laboratory ranked the severity of the failure modes on a scale of 1 to 10, where a ranking of 3-4 represents a delay in treatment, and a ranking of 5-6 represents inadequate treatment. This is an area where expert judgment is used, not a set of data on patient outcomes.
We further modified this ranking by converting it to a scale of 1 to 100 instead of just 1 to 10.
|Test|% QC errors observed (OCC)|Severity (SEV)|
|---|---|---|
|Glucose|0.86|40|
|Total Cholesterol|3.39|60|
|GGT|0.001|30|
Ranking Detection - through QC Design
Usually detection is ranked by a consensus estimate, but in the case of EndocLab, we can be far more data-driven. EndocLab has continuous measurements of imprecision and bias. Combining those performance measurements with a quality requirement, it is possible to determine the critical systematic error and also determine the appropriate QC procedure for the test. Knowing the QC procedure and the imprecision and bias estimates then allows us to determine the expected error detection that each QC procedure will provide.
Since EndocLab has been routinely monitoring controls, it was simple to get a total imprecision estimate. And since the laboratory also participated in EQA, it was possible to estimate the difference between the EQA mean and the laboratory mean, which could then be expressed as a percentage bias. Finally, the laboratory uses the biologic variation database as its source for quality requirements - desirable total allowable errors are chosen from the database.
|Test|% QC errors observed (OCC)|SEV|CV|Bias|TEa|QC Rules|
|---|---|---|---|---|---|---|
|Glucose|0.86|40|1.54%|0.66%|6.94%|"Westgard Rules", N=4|
|Total Cholesterol|3.39|60|1.77%|0.92%|8.97%|1_2.5s, N=4|
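The critical systematic error mentioned above can be computed directly from the imprecision, bias, and quality requirement using the standard formula ΔSEcrit = (TEa - |bias|)/CV - 1.65 (the 1.65 corresponds to a maximum 5% defect rate). A quick sketch using the values from the table:

```python
def critical_se(tea: float, bias: float, cv: float) -> float:
    """Critical systematic error (Delta SEcrit): the size of the shift,
    in multiples of the SD, that would cause 5% of results to exceed
    the allowable total error (TEa)."""
    return (tea - abs(bias)) / cv - 1.65

# Values from the table above (all in %):
print(round(critical_se(6.94, 0.66, 1.54), 2))  # Glucose -> 2.43
print(round(critical_se(8.97, 0.92, 1.77), 2))  # Total Cholesterol -> 2.9
```

It is this critical-error value that is then taken to the power-function (Critical-Error) graphs to read off the probability of error detection for a candidate QC procedure.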
Through Critical-Error graphs, we can determine the probability of error detection (Ped) of the QC procedures selected for each test. Then, the detection rate (DET) = 1 - Ped.
Detection (DET) is rated so that a high number indicates poor error detection and a low number indicates very good detection. Again, we're measuring it on a scale of 1 to 100. So if the probability of error detection (Ped) is 80%, that gives us a DET factor of only 20.
The reason we do this with DET is that this factor gets multiplied together with SEV and OCC, with the product representing the size of the risk. Thus, a high DET leads to a higher risk number.
For example, the "Westgard Rules" applied with N=4 on the glucose method gave 92% error detection, so DET = 1 - 0.92 = 0.08, which we then convert to the 1-to-100 scale, making the DET number an 8.
|Test|% QC errors observed (OCC)|SEV|QC Rules|DET (1 - Ped)|
|---|---|---|---|---|
|Glucose|0.86|40|"Westgard Rules", N=4|8|
|Total Cholesterol|3.39|60|1_2.5s, N=4|1|
Calculating the Risk
Now that we have all the factors, we can combine them into a product that represents the risk of each failure mode. When you are using three factors, the combined product is called a Risk Priority Number (RPN). Expressed as an equation, RPN = OCC * SEV * DET.
So for glucose, our OCC is only 0.86, SEV is 40, and DET is just 8. The RPN therefore = 0.86 * 40 * 8 = 275.2, which rounds to 275.
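Putting the two steps together (converting Ped to a DET rating, then taking the three-factor product), a small sketch of the glucose calculation:

```python
def det_rating(ped: float) -> float:
    """Detection rating on a 1-to-100 scale: high = poor detection."""
    return (1.0 - ped) * 100.0

def rpn(occ: float, sev: float, det: float) -> float:
    """Risk Priority Number: the product of the three risk factors."""
    return occ * sev * det

det = det_rating(0.92)               # 92% error detection -> DET of 8
print(round(rpn(0.86, 40, det)))     # Glucose RPN -> 275
```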
|Test|% QC errors observed (OCC)|SEV|QC Rules|DET (1 - Ped)|RPN|
|---|---|---|---|---|---|
|Glucose|0.86|40|"Westgard Rules", N=4|8|275|
|Total Cholesterol|3.39|60|1_2.5s, N=4|1|204|
Now here is where we explain why we've evaluated each factor on a scale of 1 to 100. If you multiply three factors of 100 together, you get a million. So the RPNs are being expressed on a scale of 1 to 1 million. That might sound ridiculous on its own, but recall there's another key metric that gets measured on that same scale: Six Sigma. That's right: Sigma-metrics are measured by counting up the defects that occur per million opportunities (DPM or DPMO). Thus, we can take these RPNs and do something more than just compare them relative to one another. We can go to a Six Sigma table and look up the short-term Sigma value for each number:
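Instead of a lookup table, the same conversion can be done directly: treat the RPN as a defects-per-million rate, invert the standard normal distribution, and add the conventional 1.5-sigma short-term shift. A sketch (the results land close to the table lookups, which are typically reported at coarser granularity):

```python
from statistics import NormalDist

def dpm_to_sigma(dpm: float) -> float:
    """Short-term Sigma for a given defects-per-million rate,
    using the conventional 1.5-sigma shift."""
    return NormalDist().inv_cdf(1.0 - dpm / 1_000_000) + 1.5

print(round(dpm_to_sigma(275), 2))  # Glucose: approximately 5
print(round(dpm_to_sigma(204), 2))  # Total Cholesterol: approximately 5
```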
|Test|% QC errors observed (OCC)|SEV|QC Rules|DET (1 - Ped)|RPN|Sigma (short-term scale)|
|---|---|---|---|---|---|---|
|Glucose|0.86|40|"Westgard Rules", N=4|8|275|4.9|
|Total Cholesterol|3.39|60|1_2.5s, N=4|1|204|5|
At the end of our Risk Analysis, we now have the failure modes ranked on a scale that's familiar to us. We know we want to achieve a goal of Six Sigma, where there are almost no defects per million opportunities. We know that 3 Sigma is the "floor" on performance in many industries. So the results we get from EndocLab's methods are pretty impressive: they have one world class method and two methods that are good to excellent.
But this is still not the end point of the Risk Analysis. At this point, Dr. Salcedo evaluated the performance metrics and made a decision on QC frequency. There is no current methodology for relating the frequency of QC analysis to test performance. Intuitively, we know that there must be a relationship between the frequency of QC and the risk of the test interpretation, as well as a relationship to the volume of testing. But there isn't an equation that converts performance or Sigma-metrics into a specific frequency of monitoring. Thus, at this step, Dr. Salcedo used her judgment:
|Test|% QC errors observed (OCC)|SEV|QC Rules|DET (1 - Ped)|RPN (DPM)|QC Frequency|
|---|---|---|---|---|---|---|
|Glucose|0.86|40|"Westgard Rules", N=4|8|275|Weekly N=4: N=2 Monday and Wednesday|
|Total Cholesterol|3.39|60|1_2.5s, N=4|1|204|Weekly N=4: N=2 Monday and Wednesday|
|GGT|0.001|30|1_3.5s, N=1|10|0.3|Weekly, 1 control per run, alternating levels|
During this part of the presentation, the conference audience grew a little uncomfortable. While the earlier steps were relatively transparent (you could see they were data-driven, or at least that there was a rationale behind the rankings), the final decision on QC frequency was more controversial. Some in the audience felt that there was a minimum frequency of QC that should be performed, perhaps at least daily. Others were more comfortable with Dr. Salcedo's decision. Ultimately, we see again that tools can help provide the basis for a decision, but they are not a substitute for one. It often still comes down to professional judgment, which may vary from lab director to lab director.
Commentary by Dr. Salcedo
Maybe this approach is not for all labs because it is a laborious one, almost an academic one. Most, I believe, will choose daily controls because it is simpler for auditing. I am interested in the performance and in optimizing management. And this rationale can be applied to all tests in the lab. The next step must come from the manufacturers, I believe, in presenting their product performance to customers, similar to how they sell TSH methods with an expected 3rd-generation performance. And also, vendors must change the way we can design and implement IQC, because on most auto-analyzers it is laborious not to run two levels of control daily.
30 years ago, we used 2 level controls each hour or every 50 or 100 patients, or calibration/control/patient. Since then, manufacturers have invested in optimizing their instruments and methods. QC must be a rational decision, but it is still sometimes a sentimental one. And it is understandable. Most of us in the laboratory are honest people, and very much afraid of harming our patients. We simply do not want any amount of risk. But I believe that time has come to change. We do not have money (including time) to spend in the wrong way. It is not rational to spend the same amount of money and time with each and every test. We all know that they are different. We must put aside feelings and dedicate our time and effort to serve our patients better.
I believe this article has a central question: how can I define my analytical run? Well, in these examples, we defined weekly runs. The baseline work was to try to understand the maximum period between "events". In these 3 examples, the run could be extended to at least 32 days; but as we are a small lab, and close on Saturday (so introducing an "event"), I defined the run as a week for these 3 tests, and applied the OPSpecs charts to define the number of control measurements. But if an "event" occurs during that week (new lot of reagent, technical intervention, out-of-control monitoring, unexpected environmental changes - electrical, temperature, water...), the run stops there and IQC is performed, of course. The question that arises is always the same: "But if something goes out of control, how do you guarantee that patient results are correct?" At Endoclab we have never delivered results when IQC was out of control. The work is simply blocked until the problem is solved. But we rarely receive an out-of-control EQA (for sigma > 4). In those cases, we apply the new total error to all the patients reported under those conditions, and I review all those reclassified (normal/not normal). But making use of a biological variation database, with stringent total allowable errors, for sigmas like these, assures us that we simply get zero patients reclassified. And like everyone, we also have methods that hardly reach sigma 2, and these oblige us to do much more than simply running 2 levels of control.
Also, we are using total biological error, as you mentioned, based on EQA group or method or equipment results. For at least some tests that have huge public health impact, like glucose or cholesterol, EQA should only report the true value. I fully agree with Professor George Klee that we should aim for zero errors. A 2 or 3% error misdiagnoses millions of people around the world! So we must change this rapidly.
So, some proposal should be made regarding what a laboratory has to present to the auditor (we are accredited to ISO 15189). You could think about an Excel spreadsheet to be filled in by the lab, which would give the auditor the basic support for analysis. And of course, the lab must have all the raw data available for analysis on request.
Even if a few labs will choose this approach, I believe that this presentation is positive in showing that each test is a test, and should be treated individually. This is the only way to serve the patient, to make a correct follow up, to put aside the "chance" of getting it right. Also I hope that the article will provoke and encourage discussion of a proposal for FMEA framework, regarding the main lab tests.
The publication of this study on the website should not be taken as an endorsement of a particular QC frequency. Westgard QC does not endorse the reduction of QC frequency to less than once a day, particularly when regulatory and accreditation bodies mandate otherwise. The goal here is to show that we can provide more data-driven, less arbitrary, implementations of FMEA and Risk Analysis. We don't need to pick a number from 1 to 5 based on our "gut feelings." We don't need to ignore factors like Detection, when the data and tools are readily available. Laboratories have a wealth of data that can be utilized for Risk Analysis. We hope that the future EP23 implementations in US laboratories will encourage this type of reality-based Risk Analysis.
Nevertheless, this application reveals that data will not be the sole deciding factor in any FMEA or Risk Analysis. At certain points, Risk Analysis must rely on professional judgment, and in some cases, our judgments will vary. This will make it difficult to have standards for Risk Analysis, unless the decision-making process is made more explicit by regulators and accreditation agencies.
Our thanks again to Dr. Salcedo for providing her laboratory data on these methods and implementing the Risk Analysis approach.