Re-emerging Issues in QC Today
To repeat or not to repeat, that seems to be the eternal QC question for the laboratory. We can all recognize the theoretical problem with repeating the control, but in practice, most laboratories exhibit this kind of behavior. Dr. Westgard discusses why some QC problems persist, and what approaches to deal with them are best.
James O. Westgard, Sten A. Westgard
In a recent review of the history and evolution of QC practices in medical laboratories, we pointed out that “QC for the Future” might involve regression to 1st generation QC or advancement to 6th generation QC. In this context, 1st generation QC represents Levey-Jennings charts or single-rule QC with 2 SD control limits, whereas 6th generation QC represents properly designed SQC procedures (that take into account the quality required for the intended clinical use of a test, as well as the precision and bias observed for the method) together with specific control mechanisms that target individual failure modes in the testing process (based on risk analysis).
At the AACC meeting in Atlanta in July, we had several visitors who were asking questions about emerging QC practices. They were stimulated by a poster, “Should I repeat my 1:2s QC rejection?”, and a workshop, “Beyond Westgard Rules: Quality Control for Multiplex Mass Spectrometry”. In historical context, these are actually re-emerging issues that were considered in the original development of multirule QC. Both issues have to do with the false rejection characteristics of QC procedures, which were the basis for the first recommendation of multirule QC. Furthermore, even the well-known publication referred to as “Westgard Rules” recognized that QC practices would need to be adapted to changes in the number of controls being measured. Adaptations for multiplex analyses were therefore expected, and even the direction of those adaptations was anticipated.
Thus, we see history repeating itself in the QC practices that are emerging in medical laboratories today! Hopefully, we can also learn from history and make these new practices more understandable and effective.
Everyone knows that there is a 1 out of 20 or 5% chance that a control measurement will exceed control limits set as the mean ± 2SD. That applies when there is 1 control measurement per run. If there are 2 levels of control each run, as required by CLIA, then the chance of observing at least 1 measurement outside 2SD control limits is about 9%. And it gets worse as the number of control measurements increases: 14% for N=3 and 18% for N=4. That’s why it is poor practice to use 2SD limits, or a 12s control rule, for rejection of a run. It’s also poor practice to simply repeat the controls until they fall within 2SD limits, though that is commonly done in many laboratories today.
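The arithmetic behind these percentages can be sketched in a few lines of Python. This is only an illustration under two simplifying assumptions: the in-control measurements are independent, and each has a 95% chance of falling within 2SD limits. The function name `p_false_reject` is ours, not part of any QC package. Under those assumptions the results come out close to, but not identical with, the rounded figures quoted above.

```python
def p_false_reject(n_controls: int, p_within: float = 0.95) -> float:
    """Chance that at least one of n independent, in-control
    measurements falls outside the control limits.

    p_within is the assumed chance that a single in-control
    measurement stays inside the limits (0.95 for 2SD limits here).
    """
    return 1.0 - p_within ** n_controls


# False rejection chance for 1 to 4 control measurements per run.
for n in (1, 2, 3, 4):
    print(f"N={n}: {100 * p_false_reject(n):.1f}%")
```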
The AACC poster recommended a different repeat protocol, allowing only 1 repeat of the controls, then applying a 22s rule for rejection. This is the same 22s rule that is part of the common 13s/22s/R4s multirule QC procedure. The difference is that in the initial run, a single control exceeding 2SD limits is interpreted as a “warning” and leads to analysis of additional controls to determine whether there should be a rejection. In effect, the false rejection problem from the 12s rule is reduced by applying a 22s rejection rule. Error detection is improved by adding 1 or 2 additional controls in response to the 2SD warning (12sW), thus increasing N to 3 or 4, as compared to the N=2 in traditional application of single-rule or multirule procedures.
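The decision logic of such a warning-then-repeat protocol might be sketched as follows. This is our reading of the description above, not the poster authors' implementation. It assumes control results are expressed as z-scores (deviation from the control mean in SD units), and it simplifies the 22s rule to "any two measurements among the initial and repeat controls beyond the same 2SD limit". The function and helper names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def count_beyond(z_scores: Sequence[float], limit: float) -> Tuple[int, int]:
    """Count measurements above +limit and below -limit."""
    high = sum(1 for z in z_scores if z > limit)
    low = sum(1 for z in z_scores if z < -limit)
    return high, low


def warning_repeat_decision(
    initial: Sequence[float],
    repeat: Optional[Sequence[float]] = None,
) -> str:
    """Return 'accept', 'repeat', or 'reject' for one run.

    initial: z-scores of the controls in the original run.
    repeat:  z-scores of the additional controls, if already analyzed.
    """
    high, low = count_beyond(initial, 2.0)
    if high >= 2 or low >= 2:
        return "reject"          # 2_2s already violated in the initial run
    if high + low == 0:
        return "accept"          # no 2SD warning
    if repeat is None:
        return "repeat"          # 2SD warning: analyze additional controls once
    # Simplified 2_2s check across initial plus repeat controls.
    high, low = count_beyond(list(initial) + list(repeat), 2.0)
    return "reject" if (high >= 2 or low >= 2) else "accept"


print(warning_repeat_decision([2.4, 0.3]))              # warning only
print(warning_repeat_decision([2.4, 0.3], [2.2, 0.1]))  # repeat confirms
```

Note that once repeat results are supplied the function can only return "accept" or "reject"; there is deliberately no path to a second repeat.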
This 12sW/22s QC procedure would be an improvement if the practice could be strictly employed as defined. However, it is dangerous to condone a “repeat, repeat” practice because it may lead to a “repeat, repeat, repeat” practice and maybe even a “repeat, repeat, repeat, repeat” practice as well. We need to consider the "human factors" as well as statistical power. If we condone a little bit of repeating, the bench technologists learn to distrust the QC signals, which over time can lead to ignoring the signals completely (every out-of-control flag is treated as a "false" rejection and repeated until it falls back in, with the result that true errors get missed).
We think it is still better to employ a 13s/22s/R4s/41s multirule procedure (with N=2 control measurements per run and the rules interpreted across 2 runs, or R=2), with or without a 12s warning. If additional controls are analyzed, this multirule combination, with its 41s rule, will provide even better detection of systematic errors. It also includes the R4s rule, which improves detection of random error.
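As an illustration, the multirule procedure with N=2 and R=2 might be evaluated along the following lines. This is a minimal sketch, not a validated implementation. It assumes control results are already converted to z-scores, that each run supplies exactly two control levels in the same order, and that the rules are interpreted in the commonly described way: 13s on the current run, 22s within the run or within one control level across the two runs, R4s within the run, and 41s across all four measurements. The function name and the rule labels (`1_3s`, `2_2s`, `R_4s`, `4_1s`, corresponding to 13s, 22s, R4s, 41s in the text) are our own.

```python
from typing import List, Sequence


def multirule_violations(prev: Sequence[float], curr: Sequence[float]) -> List[str]:
    """Return the list of violated rules for the current run.

    prev, curr: z-scores [level 1, level 2] of the previous and current run.
    An empty list means the run is accepted.
    """
    if len(prev) != 2 or len(curr) != 2:
        raise ValueError("this sketch expects exactly 2 controls per run")

    violations = []

    # 1_3s: one measurement in the current run beyond 3 SD.
    if any(abs(z) > 3 for z in curr):
        violations.append("1_3s")

    # 2_2s: two consecutive measurements beyond the same 2 SD limit,
    # either both levels in this run or one level across the two runs.
    within = all(z > 2 for z in curr) or all(z < -2 for z in curr)
    across = any((p > 2 and c > 2) or (p < -2 and c < -2)
                 for p, c in zip(prev, curr))
    if within or across:
        violations.append("2_2s")

    # R_4s: within the run, one control beyond +2 SD and the other beyond -2 SD.
    if max(curr) > 2 and min(curr) < -2:
        violations.append("R_4s")

    # 4_1s: all four measurements (2 runs x 2 levels) beyond the same 1 SD limit.
    last_four = list(prev) + list(curr)
    if all(z > 1 for z in last_four) or all(z < -1 for z in last_four):
        violations.append("4_1s")

    return violations


print(multirule_violations([0.2, -0.4], [0.5, 1.1]))
print(multirule_violations([1.3, 1.2], [1.5, 1.1]))
```

Keeping the systematic-error rules (22s, 41s) separate from the random-error rule (R4s) in the output makes it easier to see which kind of error a flag points to.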
Multiplex analysis refers to quantifying multiple measurands in a single analysis. We typically think about advanced analytic systems, such as mass spectrometry and molecular arrays, but remember that the Simultaneous Multitest Analyzers (Technicon SMAs) of the 70s and 80s also performed multiple tests on a single sample. In those days, it was popular to “screen” patients with panels of 12, 16, or 20 different tests. One difficulty in interpreting such screening tests was the 95% definition of normal range and the increasing probability of observing an abnormal test result as the number of tests increased. For example, with 12 tests, there is a 46% chance that at least 1 will exceed the 2SD normal limits; with 16, a 56% chance; with 20, a 64% chance! The more tests that are done, the more likely it is to find something abnormal. This same false rejection problem will be encountered with controls when using 2SD limits.
When controls are used across multiple tests, the false rejection problem will increase with both the number of controls per test and the number of tests. This was one of the issues that led to the development of multirule QC procedures. The strategy was to reduce the false rejection level for each test so that the effect across tests was also manageable. For multitest analyzers with 10 to 20 tests, it was effective to utilize multirule QC to keep false rejections low. Furthermore, it was possible to optimize the QC design on a test by test basis (to account for the quality required for the test and the precision and bias observed for the method). Such a strategy is particularly effective when most tests perform at the 5 to 6 sigma level, because single rules with very low false rejection rates, such as 13s or even 13.5s, can then be applied to many tests. Thus, for multiplex assays with up to 20 or 30 measurands, we could expect that a similar strategy would be effective: first design QC to monitor the quality needed for each test, making sure to keep the probability for false rejection low, then assess the probability of false rejection across all the tests.
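The second step, assessing false rejection across all the tests, can be roughed out numerically. The sketch below is only an approximation under stated assumptions: in-control results are Gaussian, all control measurements across all tests are independent, and each test uses a single control limit (2SD, 3SD, or 3.5SD) rather than a multirule combination. The function names are ours; `statistics.NormalDist` is from the Python standard library.

```python
from statistics import NormalDist


def p_within(limit_sd: float) -> float:
    """Chance that one in-control Gaussian measurement stays within +/- limit_sd."""
    return 2.0 * NormalDist().cdf(limit_sd) - 1.0


def p_false_reject_overall(limit_sd: float, n_controls: int, n_tests: int) -> float:
    """Chance of at least one false flag across all controls of all tests,
    assuming every measurement is independent (a simplification)."""
    return 1.0 - p_within(limit_sd) ** (n_controls * n_tests)


# Compare single-limit rules for N=2 controls per test.
for limit in (2.0, 3.0, 3.5):
    for tests in (1, 10, 20, 30):
        p = p_false_reject_overall(limit, n_controls=2, n_tests=tests)
        print(f"limit={limit} SD, tests={tests}: {100 * p:.1f}%")
```

Under these assumptions the 2SD limit becomes unworkable well before 20 tests, while the wider limits keep the overall false rejection chance far lower, which is the motivation for designing each test's QC first and checking the combined effect afterwards.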
To maintain a low probability of false rejection, it is expected that the control rules will need to change. For example, Table 4 in the original multirule paper recommended that the multirule algorithm be employed only up to 4 control measurements per run. For higher numbers of control measurements, mean and range rules, or mean and chi-square rules, were recommended to allow more optimal design of the probabilities for false rejection and error detection. In practice, we have often used multirule procedures with up to 6 control measurements, but recommend that the R4s counting rule be replaced by an exact Range rule, such as R0.01, to maintain suitably low false rejection rates. Higher Ns call for the use of control charts for the mean and the standard deviation, which are comparable to the use of tests of significance, e.g., the t-test for means, and the chi-square or F-test for variances.
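To illustrate what an "exact" range rule involves, the control limit for a range rule with a chosen false rejection probability can be estimated by simulation. The sketch below is a hedged illustration, not the procedure used in the cited work: it assumes R0.01 means a range limit set for a 1% false rejection probability, that control results are independent Gaussian z-scores, and that a Monte Carlo estimate is acceptable in place of tabulated values. The function name `range_limit` and its parameters are hypothetical.

```python
import random


def range_limit(n_controls: int,
                p_false_reject: float = 0.01,
                n_sim: int = 100_000,
                seed: int = 1) -> float:
    """Estimate the range (max - min, in SD units) that in-control
    z-scores exceed with probability p_false_reject."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    ranges = []
    for _ in range(n_sim):
        z = [rng.gauss(0.0, 1.0) for _ in range(n_controls)]
        ranges.append(max(z) - min(z))
    ranges.sort()
    # Empirical (1 - p_false_reject) quantile of the simulated ranges.
    index = min(int((1.0 - p_false_reject) * n_sim), n_sim - 1)
    return ranges[index]


# Estimated 1% range limit for N=2 controls per run.
print(round(range_limit(2), 2))
```

The estimated limit grows with N, so a rule with a fixed threshold does not hold its false rejection rate steady as more controls are added; recomputing the limit for each N is what keeps the false rejection rate at the chosen level.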
The use of the t-test and F-test for QC of multiplex analysis is one of the recommendations from the “Beyond Westgard Rules” workshop. This is entirely consistent with the principles of multirule QC to employ different rules for monitoring systematic and random errors. However, it will be important to optimize these new multiplex QC procedures for the clinical quality required for the intended use, not just employ statistical tests of significance. Past practices and existing tools for QC design should be applicable to these new multiplex analyses. Thus, multiplex assays with up to 20 or 30 measurands should be amenable to the use of multirule QC procedures that are optimized for the quality required for each test and the precision and bias observed for the measurement procedure. The multirules can include mean and SD rules to better control the probability of false rejections.
Multiplex assays with 50 or more measurands will require new QC strategies and algorithms, particularly as the number of measurands gets larger and larger. One approach suggested by Master is to apply some form of pattern analysis. It is useful to recognize that multirule QC can also be understood as pattern analysis where different control rules are used to recognize the different patterns resulting from random and systematic disturbances of the measurement process. An essential part of pattern analysis is to focus on the deviations from a known or expected distribution, whether that distribution be control measurements, multiple patient measurands on a single sample, or multiple measurands on multiple patients. The development and evaluation of such pattern recognition algorithms might benefit from earlier work by Cembrowski on the validation of patient data QC algorithms.
Another approach may be to focus on the possible failure-modes of the process and apply different control mechanisms to monitor each critical failure-mode, as outlined in our Six Sigma Risk Analysis methodology. The applicability of the risk analysis approach can be seen in the results of a recent study by Ellington et al. that investigated the critical variables in multiplex protein assays. Because multiple control mechanisms may be needed, a risk analysis QC Plan may provide an effective approach for monitoring multiplex assays.
- Kuchipudi L, Yundt-Pacheco J, Parvin C. Should I repeat my 1:2s QC rejection? Abstract A-136, 2011 AACC Annual Meeting Abstracts.
- Master S, Grant R. Beyond Westgard Rules: Quality Control for Multiplex Mass Spectrometry. Morning short course. AACC Annual Meeting, July 24, 2011, Atlanta, GA.
- Westgard JO, Groth T, Aronsson T, Falk H, deVerdier C-H. Performance characteristics of rules for internal quality control: Probabilities for false rejection and error detection. Clin Chem 1977;23:1857-67.
- Westgard JO, Barry PL, Hunt MR, Groth T. A multi-rule Shewhart chart for quality control in clinical chemistry. Clin Chem 1981;27:493-501.
- Westgard JO, Barry PL. Cost-Effective Quality Control: Managing the quality and productivity of analytical processes. Washington DC:AACC Press, 1986, p 87.
- Cembrowski GS, Carey RN. Laboratory Quality Management. Chicago:ASCP Press, 1989.
- Westgard JO. Six Sigma Risk Analysis: Designing analytic QC Plans for the medical laboratory. Madison WI:Westgard QC, 2011.
- Ellington AA, Kullo IJ, Bailey KR, Klee GG. Measurement and quality control issues in multiplex protein assays: A case study. Clin Chem 2009;55:1092-1099.