Tools, Technologies and Training for Healthcare Laboratories

The Need for a Mean Rule: Designing QC for Phosphorus

Here's a question about "Westgard Rules" that we get a LOT: "Does the violation of the 10x rule mean anything? Can we ignore it?" At the heart of this question is another, deeper question: what QC rules are actually necessary to assure the quality of a method? With the help of user-submitted data on a phosphorus method, we'll take a closer look at this question.

Is there a need for a mean rule? Designing QC for a phosphorus method

Sten Westgard, MS
February 2012

Recently, one of our visitors submitted a query about something happening in their own laboratory:

"Biochemistry control records of Analyte Phosphorus for period May 2011 to October 2011 show 15 consecutive values on one side of mean in level 2. Of these 11 values are within 1 SD but 4 values are between 1 and 2 SD. However Level 1 conforms to Westgard Rule 10x. Please advise how to interpret it and what action should be taken in such cases. Will it be considered as violation or a warning?

This is one of those cases where the question didn't have an easy answer - at least without some further investigation and discussion. We rarely can reply with a "Yes" or "No" to most questions on the use and interpretation of QC. Usually, we need to know more about the method, the method performance (imprecision and bias), and the quality requirement before we can give a solid answer.

In this case, the laboratory was eager to share more details with us, including the Levey-Jennings charts:

Here's the picture for Level 1:

2012-Phosphorus-LJ1

Here's the picture for Level 2:

2012-Phosphorus-LJ2

While the pictures make the problem easier to see, we still face the same problem: if a 10x rule is violated, does it matter?
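Before digging further, it helps to be precise about what a 10x violation is: ten or more consecutive control values falling on the same side of the mean. A minimal sketch in Python (the control values and mean below are invented for illustration, not the laboratory's actual data):

```python
def violates_10x(values, mean, n=10):
    """Return True if `n` or more consecutive values fall on the
    same side of the mean (a value exactly at the mean resets the run)."""
    run = 0          # length of the current same-side run
    last_side = 0    # -1 = below mean, +1 = above, 0 = no run yet
    for v in values:
        side = (v > mean) - (v < mean)   # +1, -1, or 0
        if side != 0 and side == last_side:
            run += 1
        else:
            run = 1 if side != 0 else 0
        last_side = side
        if run >= n:
            return True
    return False

# Hypothetical Level 2 series: 15 consecutive points above a mean of 7.25
qc_values = [7.30, 7.28, 7.32, 7.27, 7.29, 7.31, 7.26, 7.33,
             7.30, 7.28, 7.27, 7.29, 7.34, 7.26, 7.30]
print(violates_10x(qc_values, mean=7.25))   # True: 15 points on one side
```

Whether that `True` should trigger run rejection is exactly the design question this article works through.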

Imprecision Data

Let's back up a bit and look at the imprecision data. Again, the laboratory was very helpful and provided several months of imprecision data.

Date      Level 1 Mean   CV%
2011/05   3.22           4.05%
2011/06   3.22           4.72%
2011/07   3.2            4.39%
2011/08   3.17           4.12%
2011/09   3.18           3.74%
2011/10   3.1            3.74%

Date      Level 2 Mean   CV%
2011/05   7.28           2.2%
2011/06   7.2            2.54%
2011/07   7.25           2.35%
2011/08   7.24           2.22%
2011/09   7.3            2.22%
2011/10   7.18           2.19%

Again, we have a lot more numbers, but in the absence of setting expectations for performance, we don't know what these numbers mean.
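For reference, the monthly CV% figures above are simply the standard deviation of the control results expressed as a percentage of the mean. A quick sketch of that calculation (the individual control results shown are hypothetical; only the resulting mean matches the table):

```python
import statistics

def cv_percent(results):
    """Return (mean, CV%) where CV% = (sample SD / mean) x 100."""
    mean = statistics.mean(results)
    sd = statistics.stdev(results)   # sample SD, n-1 denominator
    return mean, 100.0 * sd / mean

# Hypothetical month of Level 1 control results
may_level1 = [3.1, 3.3, 3.2, 3.3, 3.1, 3.2, 3.4, 3.1, 3.2, 3.3]
mean, cv = cv_percent(may_level1)
print(f"mean = {mean:.2f}, CV = {cv:.2f}%")
```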

What's the quality required by phosphorus?

To get an idea of whether the precision data is acceptable, we need to place those numbers in the context of the quality required by the test. Strangely enough, we don't have as many sources for quality requirements for phosphorus performance as we have for other tests.

As it happens, this laboratory wasn't in Australasia (where the RCPA is the most influential guidance), nor were they interested in the biologic variation goal. Instead, they chose to follow the guidance of the Rilibak. So the allowable error for this method is now 16%.

Now that we have a quality requirement, we can start comparing imprecision to the goal. We do this by calculating the Sigma-metric.

What's the Sigma-metric of this phosphorus method?

To be extremely brief (click here for much more detail), the Sigma-metric equation is this: Sigma-metric = [ (Quality Requirement) - (Bias) ] / (Imprecision)

Or, as a more formal equation: Sigma-metric = (TEa - bias) / CV

For the moment, we don't have any information on bias. So we will ignore it - don't worry, we'll come back to it later. For level 1, in the month of May in 2011, we have a 4.05% CV. Thus, our Sigma-metric is

Sigma-metric = 16 / 4.05 = 3.95
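That single calculation can be sketched in code (the TEa and CV come straight from the discussion above; bias is assumed to be zero for now):

```python
def sigma_metric(tea_pct, cv_pct, bias_pct=0.0):
    """Sigma-metric = (TEa - bias) / CV, with everything in percent units."""
    return (tea_pct - bias_pct) / cv_pct

TEA = 16.0   # Rilibak allowable total error for phosphorus, in percent
print(round(sigma_metric(TEA, 4.05), 2))   # 3.95, i.e. roughly 4 Sigma
```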

If we carry out the calculations for all of the data, here's what we get:

Date      Level 1 Mean   CV%     Sigma-metric
2011/05   3.22           4.05%   4.0
2011/06   3.22           4.72%   3.4
2011/07   3.2            4.39%   3.6
2011/08   3.17           4.12%   3.9
2011/09   3.18           3.74%   4.3
2011/10   3.1            3.74%   4.3

Date      Level 2 Mean   CV%     Sigma-metric
2011/05   7.28           2.2%    7.3
2011/06   7.2            2.54%   6.3
2011/07   7.25           2.35%   6.8
2011/08   7.24           2.22%   7.2
2011/09   7.3            2.22%   7.2
2011/10   7.18           2.19%   7.3

Again, to make an extremely reductive summary (click here for more details), 3 Sigma is generally considered the minimum acceptable performance, 4 is adequate, 5 is good, and 6 is great. So on the low end, performance is above the minimum and hovers around the adequate level. At the high end, performance is much better.

What QC rules would you use for this method?

If we consider just the data we have (ignoring bias), we could use OPSpecs charts to select the appropriate QC procedures for this method. You could download Normalized OPSpecs charts, which are freely available but require you to make "normalization" calculations in order to plot the results, or you could use specialized QC design software to generate custom charts. For this example, we're going to use the specialized software and generate the exact charts we need.

For level 1, here's the situation:

2012-Phos-OPS16-L1-90

Unfortunately, we see that most of the operating points fall to the right of the available candidate QC procedures. That is, the only rule set that can provide adequate quality assurance for about 4 of the 6 points is an extended set of "Westgard Rules." This set of rules requires only 2 controls, but you must look back at 3 previous runs in addition to the current run. Note that this is on a 90% Analytical Quality Assurance (AQA) OPSpecs chart. That means that if an error has occurred, you have a 90% chance of detecting it in the first run (your average run length (ARL) to detect the error will be 1.1; most of the time you will catch errors as they happen).

If the laboratory was willing to accept 50% AQA (that is, an ARL of 2, which means that on average, you will catch the error on the second run after it has occurred), you could look at a different OPSpecs chart:

2012-Phos-OPS16-L1-50

Some other possibilities become available here. If you are willing to use 4 controls, you could get down to simple rules or small sets of "Westgard Rules" that avoid any need for mean rules. Of course, this is a trade-off, because with 50% AQA, that means by the time you catch the error, one run has probably already slipped past you. This kind of decision should be made only when you've got a very stable, reliable method. You wouldn't want to go to 50% AQA on a method that is having frequent problems.

Now, if we switch to looking at performance on the higher level, it's a different story. Here's the OPSpecs chart (90% AQA) for level 2:

2012-Phos-OPS16-L2

In contrast to the other level, we have some great news: very simple rules can provide adequate error detection for this level. With just two controls, and 3s or 3.5s control limits, we would still detect medically important errors at this level.
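A single-rule check like this is trivial to implement: reject only when a control value falls outside the mean plus or minus 3 SD. A sketch (the mean and SD are approximated from the Level 2 table, where a 2.2% CV at a mean of about 7.25 gives an SD near 0.16):

```python
def exceeds_limit(value, mean, sd, n_sd=3.0):
    """1:3s-style check: flag a control value outside mean +/- n_sd * SD."""
    return abs(value - mean) > n_sd * sd

mean, sd = 7.25, 0.16   # approximations from the Level 2 data above
print(exceeds_limit(7.60, mean, sd))   # False: within the +/- 0.48 limit
print(exceeds_limit(7.80, mean, sd))   # True: outside 3 SD, reject the run
```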

Now we have an interesting challenge: how do we reconcile the two levels of performance? If level 1 needs "Westgard Rules", but level 2 doesn't, how do we design QC procedures for this test? The practical reality is that we probably can't run two different QC designs at the two different levels of this method. Most QC programs provided with instrumentation or LIS or middleware do not allow that level of customization. So we have to choose one design for this method.

It would help us to know more about the laboratory's use of this particular test. Are clinical decisions and test interpretations more important at the lower level or the higher level? If the test results are only meaningful at the higher end, then we can worry less about the performance at the lower end. If the reverse is true, that phosphorus deficiency is clinically more important for this health system, then we need to use performance at that level to dictate how we QC the method.

Let's assume the following scenario: performance at the low end is more important, so we need to use some set of "Westgard Rules." We could still select a set of rules that avoids a mean rule, if we're careful. Furthermore, we could selectively enforce a mean rule if we needed to. That is, if we needed to use a 10x rule, we could make sure that we only implement that strictly on the low level - if we saw a 10x violation on the upper level, we wouldn't need to pay close attention to it. If you chose to make this kind of selective enforcement, you would really need to document all the reasons why violations on the low end matter while violations on the upper end aren't as important.

There's one other question to ask, though: if we do see a 10x violation, on either end, what kind of rule is being violated? Is it only a "warning rule" or is it a rejection rule? As we said earlier, through QC design by OPSpecs charts, we could select a set of "Westgard Rules" that didn't include a mean rule. QC Design is the selection of rejection rules (rules that, if violated, indicate that something is truly wrong with the method and the laboratory needs to reject the run and trouble-shoot). That doesn't mean you won't see 10x violations. It just means that when 10x violations occur, they don't indicate that you have to reject the run based on that information. If you do see a 10x violation, it will be up to the laboratory to decide what to do with that information (treat it as an advance indicator of something degrading, etc.).

If we proceed with the following assumptions:

  • performance at the low end is most important
  • some form of "Westgard Rules" is needed to QC this method
  • we don't have to be as worried about "Westgard Rule" violations on the high end
  • we can select a set of rules that doesn't include mean rules

then it is possible to conclude that a 10x violation on the high level is not something worth worrying about.

What about bias?

Remember, though, we were assuming that bias was zero. Unfortunately, it rarely is zero in real life. There is always some difference between the lab method and a higher-level comparative method. In this case, the laboratory has some information from the controls. The controls come with an assigned mean - so the actual laboratory mean can be compared to this value and a bias can be calculated. Once you have calculated bias, you can use it in the Sigma-metric calculations and incorporate it into the OPSpecs chart for QC Design.
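As a sketch of that arithmetic: the assigned mean for this control wasn't given, so the value below (3.012) is back-calculated purely to reproduce the roughly 6.9% May bias shown in the table that follows; treat it as hypothetical.

```python
def bias_percent(lab_mean, assigned_mean):
    """Percent bias of the lab mean relative to the control's assigned mean."""
    return 100.0 * abs(lab_mean - assigned_mean) / assigned_mean

def sigma_metric(tea_pct, cv_pct, bias_pct):
    return (tea_pct - bias_pct) / cv_pct

# Hypothetical assigned mean (3.012) vs. the observed May Level 1 mean (3.22)
bias = bias_percent(3.22, 3.012)
sigma = sigma_metric(16.0, 4.05, bias)
print(f"bias = {bias:.1f}%, Sigma = {sigma:.1f}")
```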

Date      Level 1 Mean   CV%     Bias    Sigma-metric
2011/05   3.22           4.05%   6.9%    2.2
2011/06   3.22           4.72%   6.9%    1.9
2011/07   3.2            4.39%   7.5%    1.9
2011/08   3.17           4.12%   8.4%    1.8
2011/09   3.18           3.74%   8.1%    2.1
2011/10   3.1            3.74%   10.4%   1.5

Date      Level 2 Mean   CV%     Bias    Sigma-metric
2011/05   7.28           2.2%    3.6%    5.6
2011/06   7.2            2.54%   4.6%    4.5
2011/07   7.25           2.35%   4.0%    5.1
2011/08   7.24           2.22%   4.1%    5.4
2011/09   7.3            2.22%   3.3%    5.7
2011/10   7.18           2.19%   4.9%    5.1

Once you bring bias into the picture, the Sigma-metric numbers go down. There is still a difference between level 1 and level 2, and level 2 performance remains pretty good. But once the bias gets factored into the QC Design, the choices are more stark. For the lower level, the performance issue becomes a bigger problem:

2012-Phos-OPS16-L1-bias

Even with 50% AQA, there aren't really any control procedures that provide adequate error detection. Even if we included mean rules, that might not give us enough error detection to catch an important error within a few runs of occurrence. Problems might persist and grow larger before we can detect them. In order to provide the best error detection, we would need to implement our maximum QC (as many controls as we can run economically, using as many of the "Westgard Rules" as we can apply).

2012-Phos-OPS16-L2bias

On the upper end, there are still a few solutions that don't involve extensive "Westgard Rules", mean rules, or too many controls. If the upper level performance is all that really matters, there are still ways to design QC that will avoid using mean rules.

Is there also another problem here?

Having discussed the data on performance, it's probably worthwhile to step back and look again at the Levey-Jennings charts:

2012-Phosphorus-LJ1

Do you see how no points land on the actual mean, but many are just above or below it?

This is probably a data-rounding issue. That is, by straight calculation, the mean is 3.18. However, the instrument or some software program is only reporting out 1 decimal place, not 2. So the test results come out at either 3.1 or 3.2, never 3.13, 3.15, 3.18, etc. So you have the paradox that the system never reports a value equal to your calculated mean. Here is a case where the capability of the software and instrument have a big impact on the QC possibilities. If you can adjust the system to report more decimal places, you can improve this situation. If you are stuck with just one decimal place, then you need to adjust your QC to fit the limitations of the instrument/software.

This is a recipe that will generate a lot of mean rule violations. Since the system will round up or down, it will throw values onto either side of the mean. This problem may be such an issue that it trumps the QC design challenge. We may need to "round" our mean so that we're using the same significant figures as the instrument/software reports. Then we can tackle the QC design.
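A quick numeric demonstration of the rounding effect (the raw results are invented for illustration):

```python
# The calculated mean is 3.18, but the instrument reports only 1 decimal
# place, so every result lands at 3.1 or 3.2 - never on the mean itself.
mean = 3.18
raw_results = [3.12, 3.18, 3.21, 3.16, 3.24, 3.17]   # hypothetical values
reported = [round(v, 1) for v in raw_results]
print(reported)
print(any(r == mean for r in reported))   # False: the mean is unreportable
above = sum(r > mean for r in reported)
below = sum(r < mean for r in reported)
print(above, below)   # every reported point is forced to one side or the other
```

Note that even a result calculated exactly at the mean (3.18) gets reported as 3.2, above the mean, which is how rounding manufactures same-side runs and spurious 10x flags.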

Conclusion

As you can see, this is a case of "Where you're going depends on where you're starting from." You need the answers to some key questions before you can make the right choice. Is performance at the low or high end more important? What quality requirement is most important? (consider this - if a tighter quality requirement was being applied, probably even the high level would need maximum "Westgard Rules") Does bias matter or can we ignore it? Is the data rounding issue to blame for mean rule violations - and can we fix that?

So often it happens that in order to answer a simple question, we need to ask a lot more questions first. If we can get those answers, we can determine the most appropriate and practical answer for the conditions of that specific laboratory.