Best Practices for "Westgard Rules"
So we've catalogued some of the worst abuses of "Westgard Rules." What about the best uses? What's the best way to use "Westgard Rules" - and When, Why, How and Who, too? Here is a list of 12 practices to make your use of "Westgard Rules" better.
- 1. Define the quality that is needed for each test.
- 2. Know the performance of your method (CV, bias).
- 3. Calculate the Sigma-metric of your testing process.
- 4. Relate the QC procedure for the test to the Sigma-performance of the method.
- 5. Use single-rule QC procedures and a minimum number of control measurements (N) for methods with high performance.
- 6. Use single-rule QC procedures and a moderate number of control measurements (N) for methods with moderate to high performance.
- 7. Use multirule QC procedures for methods with moderate to low performance.
- 8. Use multistage QC designs for methods with undesirable performance.
- 9. Build and interpret multirules in a logical order and adapt the rules to fit with different Ns.
- 10. Define explicitly the application and interpretation of rules within and across materials and runs.
- 11. Only use multirules for which error detection characteristics are known.
- 12. Interpret multirules to help indicate the occurrence of random error or systematic error.
In a previous discussion, we described some of the "abuses, misuses, and in-excuses" involving the improper implementation and interpretation of "Westgard Rules" by instruments, LIS devices, and data workstation QC software. Now that we've cleared the air about the "worst practices", it's time to talk about "best practices" for doing QC right.
It's important to understand the problems (worst practices) in order to implement proper solutions (best practices). If your QC software is doing things wrong, no amount of effort on your part can correct for those problems. QC needs to be done right from the start.
1. Define the quality that is needed for each test.
Quality management begins with the knowledge of the quality that needs to be achieved. Sounds simple, doesn't it? But when I ask the laboratory professionals "What quality is needed for a test?" the answer is seldom a numeric or quantitative definition of the quality requirement. That number could be in the form of a total allowable error (TEa), such as the CLIA proficiency testing criteria for acceptable performance. Or that number could be in the form of a clinical decision interval (Dint), which is a gray zone of interpretation for patient treatment. This number comes from the physician and uses his/her diagnosis cutoffs as a way to figure out the level of quality needed in a method. A third possibility for that number is the biologic total error, as documented by a European group that has derived figures for the allowable bias and allowable imprecision from studies of individual biological variation. In any case, the sources of some of these numbers are here on the website or somewhere in your laboratory or hospital. Quality begins with defining the quality needed for each test.
If you don't know the quality that is needed, then it doesn't make any difference how you do QC. It's all arbitrary! It's like taking a trip without knowing the destination. Or playing soccer without marking a goal. Or trying to call someone without knowing their phone number - you may get to talk to someone, but they may not care to talk to you.
- The need for quality standards
- CLIA proficiency testing criteria
- Clinical decision intervals & quality requirements
- European biologic goals
- Desirable Precision, Bias and Total Error derived from Biologic Variation (Ricos database)
2. Know the performance of your method (CV, bias).
It's hard to argue with this, too, particularly since CLIA requires that a laboratory validate the performance of its methods. You estimate method precision (CV) and accuracy (bias) by method validation experiments when you introduce any new method. For existing methods, the results observed on control materials being analyzed in your laboratory right now can be used to estimate the method's CV and results from proficiency testing or peer comparison studies can be used to estimate bias.
Why is this important? You need to know how well your method is performing. CV and bias are the characteristics that tell you how your method is performing.
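As a rough illustration of estimating CV from routine control results and bias from a peer comparison, here is a minimal Python sketch. The function names and the control values are made up for this example, and the peer-group mean of 196.0 is an assumption, not data from any survey.

```python
from statistics import mean, stdev

def percent_cv(control_results):
    """Imprecision as a coefficient of variation: 100 * SD / mean."""
    return 100.0 * stdev(control_results) / mean(control_results)

def percent_bias(lab_mean, comparison_mean):
    """Systematic error relative to a peer-group or reference mean, in percent."""
    return 100.0 * (lab_mean - comparison_mean) / comparison_mean

# Made-up control results (same units as the test), for illustration only.
results = [196.0, 204.0, 200.0, 198.0, 202.0]

cv = percent_cv(results)
# Hypothetical peer-group mean for the same control material.
bias = percent_bias(mean(results), 196.0)

print(f"CV = {cv:.2f}%, bias = {bias:.2f}%")  # CV = 1.58%, bias = 2.04%
```

In practice you would want many more control results than five before trusting the CV estimate.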
- The comparison of methods experiment - to estimate inaccuracy
- The replication experiment - to estimate imprecision
3. Calculate the Sigma-metric of your testing process.
It's useful to have a metric that tells you up front whether or not your method performance is good enough to achieve the quality that is required. Why do you need to know this? If method performance is bad, no amount of QC can overcome the inherent lack of quality. If method performance is extremely good, only a little QC is needed to assure the necessary quality will be achieved.
Here's the calculation:
Sigma = (TEa - bias)/CV
- Where TEa is the CLIA allowable total error (expressed in %),
- Bias is the systematic error (also expressed in %) compared to a reference method or compared to peer methods in a proficiency testing survey or peer comparison survey, and
- CV is the imprecision of your method (in %) as calculated from control measurements in your laboratory.
Here's an example. The CLIA criterion for acceptable performance for cholesterol is 10%. If a laboratory method shows a bias of 2.0% on proficiency testing surveys and a CV of 2.0% on internal QC results, the Sigma-metric is 4 [(10-2)/2]. What does that 4 mean? Read on...
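The calculation is simple enough to script. This is a minimal Python sketch of the formula as given above; the function name `sigma_metric` is ours, and treating bias as a magnitude (dropping its sign) is an assumption of this sketch.

```python
def sigma_metric(tea_pct, bias_pct, cv_pct):
    """Sigma = (TEa - bias) / CV, with all three terms expressed in percent."""
    if cv_pct <= 0:
        raise ValueError("CV must be positive")
    # Assumption: bias is treated as a magnitude, so its sign is dropped.
    return (tea_pct - abs(bias_pct)) / cv_pct

# Cholesterol example from the text: TEa 10%, bias 2.0%, CV 2.0%.
print(sigma_metric(10.0, 2.0, 2.0))  # 4.0
```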
4. Relate the QC procedure for the test to the Sigma-performance of the method.
The sigma metric will give you a good idea of the amount of QC that is needed. If you have low bias and a small CV, the metric will be high (e.g., TEa of 10%, bias of 1.0%, and CV of 1.5% gives a sigma of 6.0). Instinctively you know that good method performance should require less QC. If you have a high bias and a large CV (e.g., with the same TEa of 10%, bias of 3.0% and CV of 3.0% gives a sigma of 2.33), poor method performance will require more QC.
One direct consequence of this practice is that it moves you away from blanket application of any rules for all the tests in the laboratory. You should no longer use just one control rule or one set of control rules on all the tests in your laboratory. You adjust the rules and number of control measurements to fit the performance of the method. The imperative for manufacturers is to provide QC software with the flexibility that allows users to optimize QC design on a test-by-test basis.
For our regular readers, these first four points shouldn't come as a surprise. Since Westgard Web came online in 1996, the articles, lessons, applications, guest essays - pretty much everything we post - have been trying to drive home the point that we need to define the quality we need and measure the performance of our methods. The OPSpecs chart and the Normalized OPSpecs charts (available online for free) are graphic tools to illustrate what control rules are best for your tests. The Validator® and EZ Rules® 3 software programs are automated tools to help you pick the control rules ("Westgard" or otherwise) needed by your tests. Indeed, these first four points are really universal guidelines for establishing the "best practices" for QC. Whether or not you're using the "Westgard Rules" in your laboratory, you need to do these things.
5. Use single-rule QC procedures and a minimum number of control measurements (N) for methods with high performance.
Amazingly enough, if method performance is good in relation to the quality needed for the test, you may not need to use multirule QC at all. When sigma is 6.0 or greater, any QC will do; use a simple single-rule procedure with 3.5s or 3.0s control limits and the minimum number of control measurements (typically Ns of 2 or 3). When sigma is 5.5 to 6.0, use 3.0s control limits and Ns of 2 or 3.
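6. Use single-rule QC procedures and a moderate number of control measurements (N) for methods with moderate to high performance.
For methods having sigmas between 4.5 and 5.5, you need to be more careful in your selection of QC procedures. At the high side (5.0 to 5.5), you can generally use 2.5s control limits with an N of 2 or 3. At the low side (4.5 to 5.0), you should use an N of 4.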
7. Use multirule QC procedures for methods with moderate to low performance.
This is the corollary to best practices 5 and 6. When single-rule procedures can't provide the high error detection that is needed, then you switch to multirule procedures with Ns from 4 to 6. For method sigmas around 4.0, multirule QC is the way to go.
8. Use multistage QC designs for methods with undesirable performance.
For method performance in the 3.0 to 3.5 sigma range, you need to do a maximum amount of QC to achieve the necessary error detection. That amount of QC will be expensive, so to minimize the costs, you can adopt two different QC designs - one a STARTUP design for high error detection and the other a MONITOR design for low false rejections. You use the STARTUP design during (you guessed it) startup or any time when the instrument has gone through a significant change. For example, after trouble-shooting and fixing a problem, use your STARTUP design to make sure everything is ok again. The idea is to switch back and forth between these designs as appropriate. The STARTUP design should be a multirule QC procedure with the maximum N that is affordable (N=6 to 8). The MONITOR design can be a single-rule procedure with a minimum N of 2 or 3.
Multidesign QC is the latest advance in "Westgard Rules". Most QC software programs don't have this capability because manufacturers (a) don't realize that some laboratory tests perform so badly they need extra QC or (b) don't have the technical expertise in QC to know what QC features to offer their customers. Customers are also to blame because (a) they're happy to do the minimum QC to be in compliance with government regulations, (b) they're often too busy to worry about doing QC correctly, and (c) they're not asking manufacturers for better QC technology. This last reason is why marketing and sales departments in the major diagnostics manufacturers routinely downgrade the priority of QC features in new products. They're listening to the "wants" of their customers, but not addressing the true needs of those customers.
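The sigma ranges in practices 5 through 8 can be collected into a single lookup. This Python sketch is only a summary of the guidance in the text, not a validated QC planning tool. Three things are assumptions of the sketch: a sigma that sits exactly on a boundary is assigned to the higher band, "around 4.0" is read as 3.5 up to 4.5, and 2.5s limits are carried over to the 4.5 to 5.0 band where the text gives only the N.

```python
def suggest_qc_design(sigma):
    """Return a rough QC design suggestion for a given Sigma-metric."""
    if sigma >= 6.0:
        return "single rule, 3.5s or 3.0s limits, N = 2 or 3"
    if sigma >= 5.5:
        return "single rule, 3.0s limits, N = 2 or 3"
    if sigma >= 5.0:
        return "single rule, 2.5s limits, N = 2 or 3"
    if sigma >= 4.5:
        # The text gives only N = 4 here; 2.5s limits are an assumption.
        return "single rule, 2.5s limits, N = 4"
    if sigma >= 3.5:
        # Assumption: "around 4.0" is read as 3.5 up to 4.5.
        return "multirule, N = 4 to 6"
    if sigma >= 3.0:
        return ("multistage: STARTUP multirule with N = 6 to 8, "
                "MONITOR single rule with N = 2 or 3")
    # The text gives no design for methods below 3.0 sigma.
    return "no design given in the text for sigma below 3.0"

for s in (6.2, 5.2, 4.0, 3.2):
    print(s, "->", suggest_qc_design(s))
```

Picking the actual rules and Ns for a test still calls for the power curves or OPSpecs charts; the lookup only shows how the choice follows from sigma.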
9. Build and interpret multirules in a logical order and adapt the rules to fit with different Ns.
Contrary to public opinion, "Westgard Rules" doesn't mean a single combination of rules, such as the well-known multirule procedure. That's just the first example we published of a multirule QC procedure. Other combinations are possible. There's no single "Westgard Rule" - which is one of the reasons why we've always preferred the term "multirule QC" over "Westgard Rules."
For certain types of tests, notably hematology, coag, and blood gas, controls tend to be run in threes, i.e., one low control, one middle control, and one high control. For situations like this, it isn't practical to use the "classic Westgard Rules"; those rules were built for controls in multiples of 2. So when you're running 2, 4, or 8 controls, use the "classic" rules. When you're running 3 or 6 controls, use a set that works for multiples of three:
10. Define explicitly the application and interpretation of rules within and across materials and runs.
Do you know what it means to apply a control rule within-material, across-material, within-run, and across-run? All of these applications of the multirule give you another chance to detect errors.
If you're running two controls per run, each control at a different level, measuring each control once (N=2), and using the classic rules, several questions come up. If you use the 22s rule, how do you apply it? Are you applying it across materials, so that if the first control is out 2s and the second control is also out 2s, you interpret that as a violation? Are you applying it within-material across-runs, so that if the high control was out 2s in the previous run and is out 2s again in this run, you count that as a violation of the rule?
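To make the two readings of the 22s rule concrete, here is a minimal Python sketch. It assumes each control result has already been converted to a z-score (the number of SDs from that control's mean), that a run is a dict with one result each for a "low" and a "high" control, and that "out 2s" means beyond the same 2s limit on the same side of the mean. The function names and data layout are ours, chosen for illustration.

```python
def both_beyond_same_2s_limit(z1, z2, limit=2.0):
    """True if both z-scores exceed +2s, or both fall below -2s."""
    return (z1 > limit and z2 > limit) or (z1 < -limit and z2 < -limit)

def violates_2_2s_across_materials(run):
    """Within-run, across-materials: both controls in one run beyond the same limit."""
    return both_beyond_same_2s_limit(run["low"], run["high"])

def violates_2_2s_within_material(previous_run, current_run):
    """Across-runs, within-material: list the controls that are beyond the
    same limit in two consecutive runs."""
    return [material for material in ("low", "high")
            if both_beyond_same_2s_limit(previous_run[material], current_run[material])]

previous_run = {"low": 0.4, "high": 2.3}
current_run = {"low": 2.1, "high": 2.6}

print(violates_2_2s_across_materials(current_run))               # True
print(violates_2_2s_within_material(previous_run, current_run))  # ['high']
```

The same pair of runs trips both readings here, but for different reasons. That is exactly why the software has to say which reading it applies.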
It gets even more complicated with the larger rules. If you're using two controls and measuring once, how do you interpret the 10x rule? Do you look back 10 runs on each control? Do you look back 5 runs on both controls? What if you're running 3 controls, measuring once (N=3), and working with the 12x rule? Do you look back on the last 6 results of two controls, the last 4 results of all three controls, or the last 12 runs of each control by itself?
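The two look-back choices for the 10x rule with two controls can be sketched like this in Python. As before, the sketch assumes results are stored as z-scores, one dict per run with a "low" and a "high" control, oldest run first. A result exactly on the mean is treated as breaking the sequence, which is also an assumption. The function names are illustrative only.

```python
def all_on_one_side(z_scores):
    """True if every result is above the mean, or every result is below it."""
    return len(z_scores) > 0 and (
        all(z > 0 for z in z_scores) or all(z < 0 for z in z_scores)
    )

def violates_10x_within_material(history, material):
    """Look back 10 runs on a single control material."""
    last = [run[material] for run in history[-10:]]
    return len(last) == 10 and all_on_one_side(last)

def violates_10x_across_materials(history):
    """Look back 5 runs and pool both controls (5 runs x 2 controls = 10 results)."""
    last = [z for run in history[-5:] for z in run.values()]
    return len(last) == 10 and all_on_one_side(last)

# Ten made-up runs: the high control is always above its mean,
# the low control only for the last five runs.
history = [{"low": -0.3, "high": 0.5}] * 5 + [{"low": 0.4, "high": 0.8}] * 5

print(violates_10x_within_material(history, "high"))  # True
print(violates_10x_within_material(history, "low"))   # False
print(violates_10x_across_materials(history))         # True
```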
Most troublesome of all is the R4s rule. This is a range rule that is meant only to be applied within-run, so it can pick up random errors. If you apply it across-runs, the rule will also detect systematic errors and confuse the identification of random errors.
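One way to keep R4s strictly within-run is to write the check so that it only ever sees the results of a single run. This minimal Python sketch assumes z-score inputs and reads R4s as "one control beyond +2s and another beyond -2s in the same run". That reading, and the function name, are assumptions of the sketch.

```python
def violates_r_4s_within_run(run_z_scores):
    """R4s on one run only: one result beyond +2s and another beyond -2s."""
    return (len(run_z_scores) >= 2
            and max(run_z_scores) > 2.0
            and min(run_z_scores) < -2.0)

print(violates_r_4s_within_run([2.3, -2.1]))  # True: wide spread within the run
print(violates_r_4s_within_run([2.3, 2.1]))   # False: both high, looks like a shift
print(violates_r_4s_within_run([2.3, -1.0]))  # False
```

Because the function takes only one run's results, it cannot be stretched across runs by accident.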
There are valid reasons to interpret the control rules one way or another. We're not even suggesting there is one completely "right" way to do the interpretation. If you want the 41s rule to detect errors only within-material (but across-run), that's fine. Just make sure you spell that out in your design, in your implementation, and when you explain the features to the customer. If you don't specify what type of interpretation you're going to do, the customer may assume you're doing something more or less than you're doing.
In a way, this is where "manual" interpretation of the "Westgard Rules" is easier than computer implementation. Visually, you can look at the charts and instantly take in the within-run, across-run, within-material, across-material details, and you can choose to disregard any or all of them if you want. A computer program must be explicitly told how to interpret the rules. It won't look back on previous runs, look across materials within-run, or look within-material across-runs, unless you write the code to do so.
11. Only use multirules for which error detection characteristics are known.
The "Westgard Rules" aren't completely mix and match. You can't use all possible combinations of control rules. You can immediately see that using a 22s/2of32s makes no sense at all. What about a 22s all by itself? Is that even useful?
Here is a table of all the multirule combinations whose rejection characteristics are known:
Here is a list of some of the higher N multirules, which are for those seeking extreme quality control of extremely problem-prone methods. See the QC Application on Immunoassay QC with Higher N Multirules for more details on these rules. The power curves are also available online.
When we say "known" we mean that the error detection and false rejection characteristics of the combination of rules are known. All this means is that probability studies have been performed. So for those rules listed in the table, we know how well they detect errors. For rules that aren't listed on the table, we have no idea. If you're using a rule not on the table, you're flying blind, crossing your fingers and just hoping that the rule actually detects the errors you need to detect.
12. Interpret multirules to help indicate the occurrence of random error or systematic error.
Why are the "Westgard Rules" the way they are? When we came up with them, did we pick all these rules out of a hat and stick them together? No, there was method to our madness. We chose each particular rule in the "Westgard Rules" because it was sensitive in a particular way to a particular kind of error. The error detection of the combination of those rules was in a way greater than the sum of its parts.
Quick review: there are two types of errors, random and systematic. Correspondingly, there are control rules that detect random errors better than systematic errors, and control rules that pick up systematic errors better than random errors. So the multirule combines the use of those two types of rules to help detect those two types of errors.
Here's a table listing the type of error and the control rule that best detects it.
|Type of Error|Control rule that detects it|
|Random error|12.5s, 13s, 13.5s, R4s, R0.05, R0.01|
|Systematic error|22s, 41s, 2of32s, 31s, 6x, 8x, 9x, 10x, 12x, x0.05, x0.01, cusum|
Given this knowledge, when you get a particular control rule violation, you can begin to figure out what's wrong with the test method by looking at the rule that was violated. Was it a 13s violation or an R4s violation? That's likely a random error. Were there 6, 8, 9, 10 or even 12 control results on one side of the mean? That's most likely a systematic error.
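If your software records which rule was violated, the table above can be turned into a simple lookup that gives a first hint for troubleshooting. This Python sketch uses the rule names exactly as they are written in the table. The set names and the function name are ours, and the result is a hint, not a diagnosis.

```python
# Rule names copied from the table of error types above.
RANDOM_ERROR_RULES = {"12.5s", "13s", "13.5s", "R4s", "R0.05", "R0.01"}
SYSTEMATIC_ERROR_RULES = {
    "22s", "41s", "2of32s", "31s", "6x", "8x", "9x", "10x", "12x",
    "x0.05", "x0.01", "cusum",
}

def likely_error_type(violated_rule):
    """Map a violated control rule to the error type it detects best."""
    if violated_rule in RANDOM_ERROR_RULES:
        return "random"
    if violated_rule in SYSTEMATIC_ERROR_RULES:
        return "systematic"
    return "unknown"

print(likely_error_type("R4s"))  # random
print(likely_error_type("10x"))  # systematic
```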
This is quite a list. I would say that most laboratories can't claim to have implemented all of these practices. Some laboratories may not be able to say they've implemented any of the points! But these are the best practices. If I knew a laboratory was doing all of these things with their testing, I would be extremely confident of the quality of their testing.
As I said earlier, many of these points are about QC best practices in general, not just "Westgard Rules" specific behavior. "Westgard Rules" are part of a QC context. Doing the best "Westgard Rules" means that you are doing the best all-around QC, too.
If you're reading this and find yourself at a loss about what to do or where to start, fear not. Taking the first step is hard, but you quickly build up momentum. One quality improvement leads to another. The efficiencies and savings multiply as the quality improves.
If your QC happens to be in a poor state, there is still no reason to fear. That means there's a lot of room for improvement, and any step will make it better. Probably the best thing to do is not to try and tackle all the best practices, but instead try to eliminate all the worst practices first.
Finally, start small. Don't get overwhelmed by trying to change everything in the lab all at once. Try a pilot project first. Use the results to get management commitment from the powers above you. For those of you with highly automated chemistries, work on those first