
Questions about QC for Multiple Instruments and Multiple Labs

In this world of ever-increasing volume and ever-consolidating labs, there is a growing desire to "simplify" QC designs by using similar means and similar SDs for multiple methods, instruments, and laboratories. Is this really a good idea? What are the benefits? What are the risks?

Seeking common ground on common means and common standard deviations for multiple instruments and multiple laboratories

James O. Westgard, PhD
January 2019

Late in 2018 and early in 2019, the AACC Artery discussion board has hosted a running discussion thread about the use of means and standard deviations for the same methods on multiple instruments and multiple laboratories. While many of the comments are instructive, it's clear there is a wide range of approaches currently in practice. In social media, the recommendations that get made and possibly adopted for laboratory practice usually result from "counting the votes." Right now the most popular choice seems to be common means and common SDs. While this may be the easiest and most convenient choice, there is no evidence that it is the appropriate solution to a scientific problem. And while everyone seems to agree that the discussion is restricted to a set of the same instruments, same lot of reagents, same lot of control materials, etc., there is always the danger that any informal consensus may spread beyond the stipulated conditions. Once we accept a common mean for similar instruments, it inexorably becomes more convenient to adopt that common mean for all instruments.

Selecting SQC strategies for multiple instruments is a sufficiently difficult problem that the most recent CLSI C24-Ed4 guidance document [1] did not address this issue, stating that “although significant advances in QC thinking have occurred, there are still important areas that could benefit from additional developments, such as QC strategy design and implementation for laboratories with multiple instruments of the same type performing the same measurement procedures.” The C24-Ed4 guideline deliberately restricts itself to the case of a single method where the individual mean and individual SD are the basis for planning an SQC strategy.

Possible recommendations

The possible practices mentioned in this discussion could be classified as follows:

  1. Use the individual mean and an individual SD
  2. Use the individual mean and a common SD (or group SD)
  3. Use a common mean and an individual SD
  4. Use a common mean and a common SD (or group SD, clinical SD, fixed SD, consensus SD)

There is ambiguity about how these different parameters should be determined, whether by calculation or some other mechanism; that is particularly true for the clinical SD, fixed SD, and consensus SD. This means there is a lot of "flexibility" in their assignment. Usually that flexibility is leveraged to widen the SD further and further, driving down the number of outliers and rejections. Using a less and less evidence-based SD inevitably introduces additional uncertainty and difficulty unless there is a defined mathematical approach that relates the common mean and common SD to the observed individual means and SDs of all the instruments in the group.
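
Of the terms above, the pooled SD is the one with a standard statistical definition. As a point of reference, here is a minimal sketch (in Python, with hypothetical data) of the conventional calculation, where each instrument's variance is weighted by its degrees of freedom. Note that this pooled SD reflects only within-instrument variation; a "group" SD calculated by lumping all results together would also absorb differences between the instrument means and would therefore be larger.

    import math

    def pooled_sd(sds, ns):
        """Conventional pooled SD: each instrument's variance is
        weighted by its degrees of freedom (n_i - 1)."""
        num = sum((n - 1) * s**2 for s, n in zip(sds, ns))
        den = sum(n - 1 for n in ns)
        return math.sqrt(num / den)

    # Hypothetical example: three instruments, same method and QC material
    sds = [2.0, 2.5, 3.1]   # observed individual SDs
    ns = [100, 120, 90]     # number of QC results per instrument
    print(f"Pooled SD = {pooled_sd(sds, ns):.2f}")   # 2.55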

These possible practices range from the design of a QC procedure for an individual method in an individual laboratory (using an individual mean and individual SD) to the application of a "network" QC procedure for a regional service (using a pooled mean and pooled SD). Evaluating those extremes may provide some insights into the problem.

Consideration of an individual method or instrument

C24-Ed4 is essentially the "Best Practice" guideline for establishing the mean and SD, but also for defining the quality required for intended use. By defining intended use and determining the actual performance of methods, laboratories can optimize the selection of control rules, the number of control measurements, and the frequency of QC. C24-Ed4 defines an SQC strategy as "the number of QC materials to measure, the number of QC results and the QC rule to use at each QC event, and the frequency of QC events" and provides a "road map" for planning such SQC strategies. The best practices for multiple instruments should also consider all these factors, not just the assessment of the appropriate mean and appropriate SD for calculating control limits. Thus, the issue of a common mean and SD is actually more complicated than it initially appears.

Consider this possible complication: a laboratory sets 2 SD control limits at 92 and 108 based on a pooled mean of 100 and a pooled SD of 4.0, but a particular instrument has an actual mean of 100 and an actual SD of 2.0. The lab thinks it is implementing a 1:2s rule and expects high error detection (as well as high false rejection). However, the control rule actually being implemented depends on the individual mean and individual SD, which in this case define a 1:4s rule (limit/SD = 8/2 = 4s) that provides very low error detection. After implementation, the lab may be happy to observe a low number of rejections using pooled means and pooled SDs, without realizing that its QC lacks the error detection necessary to assure the quality required for intended use. This gives the laboratory a false sense of security precisely when it is nearly blind to errors. The moral of this scenario: even if a lab is using pooled means and SDs, it must maintain information on the actual mean and SD of each individual instrument to assess SQC performance. (Fortunately, most SQC software automatically collects the data and will provide ongoing estimates of the individual means and SDs.) But the simplicity of a common mean or common SD is not accompanied by simplicity in data monitoring; we now have to track two sets of data. One set of our books is "common," or simplistic. The other set of books tells us reality and tracks each instrument individually.
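
To put numbers on this scenario, the following sketch (assuming normally distributed control results) computes the rule actually in force and its rejection probabilities:

    from scipy.stats import norm

    pooled_mean, pooled_sd = 100.0, 4.0    # basis for the posted limits
    actual_mean, actual_sd = 100.0, 2.0    # reality for this instrument

    # Limits the lab posted, intending a 1:2s rule on the pooled SD
    lower = pooled_mean - 2 * pooled_sd    # 92
    upper = pooled_mean + 2 * pooled_sd    # 108

    # Effective rule relative to the instrument's real SD: 8/2 = 4s
    k_eff = (upper - actual_mean) / actual_sd
    print(f"Effective rule: 1:{k_eff:.0f}s")

    def p_reject(shift_in_sd):
        """P(one control result exceeds the posted limits), given a
        systematic shift expressed in multiples of the actual SD."""
        m = actual_mean + shift_in_sd * actual_sd
        return norm.cdf(lower, m, actual_sd) + norm.sf(upper, m, actual_sd)

    print(f"False rejection, no shift: {p_reject(0):.5f}")   # ~0.00006
    print(f"Detection of a 2 SD shift: {p_reject(2):.3f}")   # ~0.023

A single control measurement against those limits catches a shift of 2 actual SDs only about 2% of the time; that is what "nearly blind to errors" means in practice.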

The above scenario may seem extreme, but I have witnessed this approach numerous times in the field. It illustrates the need to assess the expected SQC performance on the basis of the actual situation. Similarly, the idea that it may be useful to implement an off-center control chart to enhance error detection needs to be critically evaluated in terms of the actual control rule and its ability to detect medically important errors. An off-center control chart makes it difficult to utilize counting rules (such as 2:2s, 4:1s, 8:x, or 10:x), which may limit a laboratory's ability to provide effective error detection with "Westgard Rules". Again, there are two drawbacks: reduced error detection and possibly increased false rejection.

Assessing the error detection of an actual control rule requires knowledge of its rejection characteristics, which are often presented as power function graphs or power curves that describe the probability of error detection as a function of the size of the errors occurring. Such power curves are needed to select the control rules and numbers of control measurements, as well as to select an appropriate SQC frequency using Parvin's MaxE(Nuf) risk model [2]. Power curves for the case of an individual method (with an individual mean and individual SD) have been determined by simulation studies, documented in the scientific literature, and incorporated in some SQC packages.
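
For simple 1:ks rules, those rejection probabilities can be computed directly under a normal-distribution assumption rather than by simulation (multirule procedures still require simulation). A minimal sketch:

    import numpy as np
    from scipy.stats import norm

    def power_1ks(k, n, se_shift):
        """P(at least one of n control results exceeds +/-k SD limits),
        given a systematic shift of se_shift SDs."""
        p_in = norm.cdf(k - se_shift) - norm.cdf(-k - se_shift)
        return 1 - p_in**n

    shifts = np.arange(0, 4.5, 0.5)
    for k, n in [(2, 2), (3, 2)]:
        curve = [power_1ks(k, n, s) for s in shifts]
        print(f"1:{k}s, N={n}:", " ".join(f"{p:.2f}" for p in curve))
    # At zero shift, 1:2s with N=2 shows its familiar ~9% false rejection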

Consideration of multiple instruments

The available power curves, however, may not be applicable for a "network" QC procedure in a regional service, because an additional component of variation is expected between instruments and/or between laboratories. That additional component of variation will degrade the power curves and cause a loss of error detection. We documented that effect, and its significant magnitude, many years ago [3]. The additional component also makes it more difficult to optimize the design of SQC strategies using common means and common SDs, and it will require more information to develop a rigorous planning approach that is applicable to multiple instruments. For example, the detection of random errors may be improved by using a range rule having control limits based on the within-run, within-instrument variation. Separate control charts for means and ranges (or SDs) might be needed to improve SQC, rather than a single control rule as assumed in much of the Artery discussion.
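
A hypothetical calculation illustrates the degradation. When common limits are set on the total SD (within-instrument and between-instrument components added in quadrature), the rule becomes effectively wider for any single instrument:

    import math
    from scipy.stats import norm

    s_within = 2.0     # within-instrument SD (hypothetical)
    s_between = 1.5    # between-instrument component (hypothetical)
    s_total = math.sqrt(s_within**2 + s_between**2)   # 2.5

    # A 1:3s rule set on the common (total) SD, applied to one instrument,
    # is effectively a wider rule in that instrument's own SD units
    k_eff = 3 * s_total / s_within   # 3.75
    print(f"Total SD = {s_total:.2f}; effective rule = 1:{k_eff:.2f}s")

    shift = 2.0   # systematic shift of 2 within-instrument SDs, N=1
    p_common = norm.sf(k_eff - shift) + norm.cdf(-k_eff - shift)
    p_own = norm.sf(3 - shift) + norm.cdf(-3 - shift)
    print(f"Detection with common limits: {p_common:.3f}")   # ~0.04
    print(f"Detection with own limits:    {p_own:.3f}")      # ~0.16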

We should also recognize that rigorous SQC planning following the C24-Ed4 road map has not been widely implemented in laboratories today. For example, a recent survey of leading academic laboratories revealed the widespread use of 2 SD control limits in spite of the known false rejection problem with that control rule [4]. The most obvious explanation is that laboratories somehow inflate the size of the SD to minimize the false rejections (perhaps by using pooled, group, clinical, fixed, or consensus SDs?). Clearly, even our best laboratories are not following the C24-Ed4 guidance for best SQC practices.

The difficulty in implementing a planning process for the risk-based SQC strategies recommended in C24-Ed4 is related to the complexity of the theory and calculations. Graphical tools such as the Sigma Run Size Nomogram are now available for free download and are practical for use by "common" laboratory scientists [5-7]. Even simpler tools, such as the Sigma Run Size Matrix and Westgard Sigma Rules with Run Size, are also available [8]. Thus the first step in improving SQC practices in many laboratories should be to develop and implement an appropriate QC design process, first for individual methods using individual means and individual SDs, then for multiple methods using appropriate means and SDs.
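
As one illustration of how simple the core calculation is, the sigma-metric and a rough mapping to candidate SQC strategies can be sketched in a few lines. The thresholds below are an illustrative reading of the Westgard Sigma Rules; the published charts and nomogram [5-8] should be used for actual planning:

    def sigma_metric(tea_pct, bias_pct, cv_pct):
        """Sigma-metric: allowable total error minus bias, in CV units."""
        return (tea_pct - abs(bias_pct)) / cv_pct

    def suggest_sqc(sigma):
        """Illustrative mapping in the spirit of Westgard Sigma Rules."""
        if sigma >= 6:
            return "1:3s, N=2"
        if sigma >= 5:
            return "1:3s/2:2s/R:4s, N=2"
        if sigma >= 4:
            return "1:3s/2:2s/R:4s/4:1s, N=4"
        return "full multirule, N>=6, and consider more frequent QC"

    # Hypothetical method: TEa 10%, bias 1.5%, CV 1.7%
    s = sigma_metric(10.0, 1.5, 1.7)
    print(f"Sigma = {s:.1f} -> {suggest_sqc(s)}")   # Sigma = 5.0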

Best SQC practices today

Meanwhile, laboratories need guidance for handling this challenge, even when a rigorous solution is not yet known, or at least not yet scientifically documented. The motivation for using pooled means and pooled SDs seems to be simplicity and convenience. However, we must also recognize that patient safety may be a more important goal. If patient safety is the priority, i.e., assuring the quality of test results and minimizing patient risk, then the possible SQC practices should be ranked as follows:

  • The safest practice at this point in time is to apply the CLSI C24-Ed4 guidance to individual methods on individual instruments using individual means and individual SDs. This approach adheres to the basic principles of SQC, the “Best Practices” recommended by CLSI, and is supported by existing SQC software.
  • The next best approach would be to implement SQC strategies based on individual means and pooled SDs that are calculated from the observed SDs for all instruments, followed by careful assessment of the actual QC rules and their error detection capabilities, and also a critical assessment of the frequency of QC events.
  • The least predictable and most uncertain approach would be the use of pooled means and pooled SDs, particularly the use of SDs that are described as clinical, fixed, or consensus.

Quality assessment vs quality control

Finally, the assessment of quality for a network laboratory system should be considered a separate issue. It may involve calculation of network means and network SDs to characterize the variation that may be observed for patient test results. Comparability of test results should preferably be based on fresh patient samples rather than control samples, but that may be practical only in small networks and small geographic regions. Using an EQA/PT approach and a defined analytical performance specification (quality requirement) for intended use, it is possible to characterize the Sigma-metric for individual instruments, as well as the overall quality of the network system [9]. The important point is that such assessment of quality should not impose limitations on the assurance of quality provided by the SQC practices for individual instruments and methods in individual laboratories.
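
A minimal sketch of such an assessment, using hypothetical PT data: compute each instrument's bias against the EQA/PT target and its CV, convert those to a sigma-metric, and summarize the network.

    import statistics

    tea = 10.0       # allowable total error (%), hypothetical specification
    target = 100.0   # EQA/PT target value, hypothetical
    # observed (mean, SD) for each instrument on the PT material
    instruments = {"A": (101.0, 1.6), "B": (98.5, 2.0), "C": (100.4, 1.8)}

    for name, (mean, sd) in instruments.items():
        bias = 100 * abs(mean - target) / target   # % bias vs target
        cv = 100 * sd / mean                       # % CV
        sigma = (tea - bias) / cv
        print(f"Instrument {name}: bias={bias:.1f}%, CV={cv:.1f}%, sigma={sigma:.1f}")

    # Network-level view: dispersion of instrument means around the target
    means = [m for m, _ in instruments.values()]
    net_bias = 100 * abs(statistics.mean(means) - target) / target
    net_sd = statistics.stdev(means)   # between-instrument spread
    print(f"Network mean bias = {net_bias:.2f}%, between-instrument SD = {net_sd:.2f}")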

References

  1. CLSI C24-Ed4. Statistical Quality Control for Quantitative Measurement Procedures: Principles and Definitions. Clinical and Laboratory Standards Institute, 950 West Valley Road, Suite 2500, Wayne PA, 2016.
  2. Parvin CA. Assessing the impact of the frequency of quality control testing on the quality of reported patient results. Clin Chem 2008;54:2049-54.
  3. Westgard JO, Falk H, Groth T. Influence of a between-run component of variation, choice of control limits, and shape of error distribution on the performance characteristics of rules for internal quality control. Clin Chem 1979;25:394-400.
  4. Rosenbaum MW, Flood JG, Melanson SEF, et al. Quality control practices for chemistry and immunochemistry in a cohort of 21 large academic medical centers. Am J Clin Pathol 2018;150:96-104.
  5. Bayat H. Selecting multi-rule quality control procedures based on patient risk. Clin Chem Lab Med 2017;55:1702-8.
  6. Bayat H, Westgard SA, Westgard JO. Planning risk-based statistical quality control strategies: Graphical tools to support the new CLSI C24-Ed4 guidance. J Appl Lab Med 2017;2:211-221.
  7. Westgard JO, Bayat H, Westgard SA. Planning risk-based SQC schedules for bracketed operation of continuous production analyzers. Clin Chem 2018;64:289-296.
  8. Westgard JO, Westgard SA. Establishing evidence-based statistical quality control practices. Am J Clin Pathol 2019 (in press). DOI: 10.1093/ajcp/aqy159
  9. Westgard JO, Westgard SA. A graphical tool for assessing quality on the sigma-scale from proficiency testing and external quality assessment surveys. Clin Chem Lab Med 2015;53:1531-6.