Tools, Technologies and Training for Healthcare Laboratories

Precision QC - the latest adjectivization of Quality Control

Another year, another new model. Almost as regularly as the planet circles the sun, a new model emerges in the literature. The latest is called "Precision QC." Have we been practicing "Imprecision QC" for the last 60 years? Do we need to recreate the QC wheel again?

Planning Risk-Based SQC Strategies: The New Precision QC Model (PQC)

James O. Westgard, PhD
February 2023

In a recent web essay, I discussed the evolution of Statistical QC procedures and described how process control with statistical rules evolved into patient-focused quality control, in which control rules and the number of control measurements are selected to provide a high probability of detecting medically important errors (to reduce false negative results) and a low probability of false rejection (to reduce false positive results). Next, needing a specification for the frequency of QC, Parvin developed a patient risk model [1] to minimize the reporting of erroneous patient results by optimizing the run size between QC events. Run size is particularly important for the operation of the automated continuous production analyzers that are the workhorses in our high-volume laboratories. With that background, CLSI formalized an SQC planning “roadmap” for developing risk-based SQC strategies that define the control rules, the number of control measurements, and the frequency of QC events [2].

Now a new series of papers about risk-based SQC have been published in JALM [3-5]. Schmidt and colleagues present a new risk model called “Precision QC” (PQC). The authors have also published a formal optimization study of the PQC model in CCLM [6] that provides similar information.

Modification of Parvin’s Risk Model

The first JALM paper discusses the Parvin patient risk model and challenges the assumption that when an in-control (IC) state changes to an out-of-control (OOC) condition or state, the OOC state persists until it is detected and corrected. The PQC model assumes that an OOC state is not a stable state, but can transition back to an IC state or to a larger OOC state. Hence, Parvin’s model is described as NOOCTA (No Out-Of-Control Transition Allowed) and Schmidt’s PQC model as OOCTA (Out-of-Control Transition Allowed).

It is somewhat difficult to rationalize how such transitions from OOC to IC could happen without some operator intervention, but mathematically it is easy to postulate such transitions. If they do occur, it suggests a random type of error rather than the systematic type of error that is generally the focus of QC for automated analyzers. If it is a random error, then the design of the QC procedure may also need to consider different decision criteria, which in turn would require different power curves to describe the rejection characteristics for random errors.

In both models, the patient risk of false negative results can be described by an Expected Number of unreliable final patient results, which Parvin called E(Nuf). Parvin’s model assesses E(Nuf) for a wide uniform distribution of systematic shifts. As the shifts get larger, E(Nuf) increases, but so does the error detection capability of the QC procedure. At some point, the increased error detection offsets the increases in E(Nuf), leading to a maximum, called MaxE(Nuf), that is the critical error condition for managing patient risk. In contrast, the PQC model shows a continuing increase in E(Nuf) as the size of errors increases, leading to estimates of patient risk up to ten times higher.
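To make the logic behind MaxE(Nuf) concrete, here is a deliberately simplified sketch of the calculation. It is not Parvin's formulation; the control rule (1:3s with a single control), the run size of 100 patient results, the assumption that the shift begins at the start of a run, and the Sigma of 4.0 are all hypothetical choices made only to show why E(Nuf) first rises and then falls as the shift grows.

```python
# Simplified illustration of the MaxE(Nuf) peak (NOT Parvin's exact model).
# Hypothetical assumptions: 1:3s rule with one control per QC event, 100 patient
# results between QC events, the shift starts at the beginning of a run and
# persists until detected (NOOCTA), results in runs whose bracketing QC event
# accepts become "final," and a result is unreliable if its error exceeds TEa
# (Sigma = TEa/SD = 4).
from math import erf, sqrt

def phi(x):                                    # standard normal CDF
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def e_nuf(shift_sd, sigma=4.0, run_size=100, limit=3.0):
    p_unreliable = (1 - phi(sigma - shift_sd)) + phi(-sigma - shift_sd)
    p_detect = (1 - phi(limit - shift_sd)) + phi(-limit - shift_sd)
    expected_accepted_runs = (1 - p_detect) / p_detect    # geometric expectation
    return expected_accepted_runs * run_size * p_unreliable

for shift in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0):
    print(f"shift = {shift:3.1f} SD   E(Nuf) ~ {e_nuf(shift):5.1f}")
```

Under these assumptions the curve rises to a maximum near a 3 SD shift and then falls as error detection overtakes the growing defect rate; the numbers are illustrative, but the shape is the point.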

Differences between the models are largest at low Sigmas, and Schmidt and colleagues choose to characterize two cases, one with a Sigma of 2.0 and the other with a Sigma of 4.0. This is an unfortunate choice because methods with low Sigmas are not generally considered acceptable; e.g., the industrial rule of thumb is that Sigma must be 3.0 or larger for routine production, and for medical tests, Sigma should be greater than 3.5 to provide a manageable process. In addition, with today’s automated analyzers, test performance generally falls in the Sigma range from 4 to 6, which means that higher Sigmas are more relevant for study than low Sigmas.

Thus, there might be some concerns with this new risk model. The assumption that an OOC state can revert to an IC state on its own, without operator intervention, needs some empirical verification. The demonstration of differences between the models at low Sigma performance is not of any practical value. While the authors suggest that a 2.0 Sigma method provides “borderline” performance and a 4.0 Sigma method “relatively good capability,” they acknowledge that differences in performance between the models are more apparent at low Sigmas, which seems to be the real reason for their choice. Information about performance for high Sigma methods is needed.

Parameters Affecting the Estimate of Risk

The second paper in the JALM series presents a theoretical framework for developing risk models and provides guidance for both simulation and mathematical implementation, with much of this information being provided in supplementary material. Of more interest here is the identification of the important parameters that should be studied to provide a more comprehensive assessment of patient risk [4].

These parameters include:

  • Candidate QC procedures and related power curves
  • Frequency or probability of systematic shifts
  • Size and distribution of systematic shifts
  • Loss function

Candidate QC procedures. A variety of QC procedures have been studied and recommended for medical laboratories, as discussed in the CLSI C24-Ed4 guideline for SQC [2]. Power curves that describe the probability of rejection for errors of different sizes can be generated by computer simulation and may also be found in the scientific literature [7,8]. The same information is required by both models.
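As one illustration of the simulation approach, the sketch below estimates the power curve for the 1:3s rule with N = 2 controls per QC event; the rule and N are arbitrary choices, and any candidate QC procedure could be substituted.

```python
# Hedged sketch: estimate a power curve (probability of rejection vs size of
# systematic shift, in SD units) by Monte Carlo simulation of control results.
import random

def rejects_13s(controls, limit=3.0):
    """1:3s rule: reject the run if any control exceeds +/- 3 SD."""
    return any(abs(z) > limit for z in controls)

def power(shift_sd, n_controls=2, trials=50000):
    rejected = sum(
        rejects_13s([random.gauss(shift_sd, 1.0) for _ in range(n_controls)])
        for _ in range(trials)
    )
    return rejected / trials

for shift in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(f"shift = {shift:3.1f} SD   P(reject) ~ {power(shift):5.3f}")
```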

Frequency or probability of systematic shifts. Explicit definition of this critical parameter is required by the PQC model. This information is not generally known, but in principle might be estimated from long-term QC records. Qualitatively, laboratories may be able to classify methods based on their relative stability, e.g., those known to have frequent problems vs those known to have few problems. Traditional risk assessment often employs scales from 1 to 5 or 1 to 10 to estimate the probability of occurrence of errors. Regardless of the difficulty in knowing the probability of systematic shifts, the new risk model is promoted because it requires an explicit definition of this characteristic.

Size and distribution of systematic shifts. Similarly, the new risk model allows investigation of the effects of the size of shifts and their distribution. Again, that information is currently unknown and may be even more difficult to determine than the frequency or probability of systematic shifts. By comparison, the Parvin model characterizes performance at the size of shift that produces the maximum expected number of unreliable final patient results, in effect the worst-case scenario, called MaxE(Nuf). Our observation is that the MaxE(Nuf) shift is generally a bit smaller than the critical systematic error (∆SEcrit), which can be calculated from the allowable Total Error (TEa) for the test and the imprecision and bias observed for the measurement procedure. Thus, calculation of ∆SEcrit provides a good approximation of the size of systematic error that needs to be detected by the QC procedure and can readily be related to the Sigma quality of the testing process, which is now widely used to assess QC performance and to plan/design new QC procedures.
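For reference, a small worked sketch of those relationships is shown below, using hypothetical performance figures (TEa = 10%, bias = 1.0%, CV = 2.0%, all expressed in percent at the same decision level); the 1.65 z-value corresponds to allowing a maximum defect rate of 5%.

```python
# Worked sketch of the Sigma metric and the critical systematic error.
# The TEa, bias, and CV values are hypothetical and serve only to show the
# arithmetic; all three must be expressed in the same units.
def sigma_metric(tea, bias, cv):
    return (tea - abs(bias)) / cv

def delta_se_crit(tea, bias, cv):
    # shift (in SD units) that must be detected to hold the maximum defect
    # rate to 5%, hence the 1.65 z-value
    return sigma_metric(tea, bias, cv) - 1.65

tea, bias, cv = 10.0, 1.0, 2.0
print(f"Sigma   = {sigma_metric(tea, bias, cv):.2f}")    # 4.50
print(f"dSEcrit = {delta_se_crit(tea, bias, cv):.2f}")   # 2.85
```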

Loss function. This parameter describes the trade-off between the risk of false negative results and the risk of false positive results. The new model requires an explicit definition of a loss function that relates FN risk or cost to FP risk or cost. There are many ways to evaluate the “tradeoff” between FP and FN results, so again this may be difficult to quantify. The Parvin model considers only the risk of false negative results, in terms of E(Nuf), which is also used for FN risk in the new PQC model.

Laboratory scientists have not ignored this issue of tradeoffs between FN risk and FP risk, but generally have not provided an explicit risk or cost comparison for FN and FP results. Instead, they generally consider the cost of a single false negative result to be catastrophic for the health of a patient, whereas the cost of a false positive is likely not as large because there is an opportunity for intervention when the result is inspected and interpreted for patient treatment. Therefore, in selecting SQC procedures, labs attempt to maximize the probability of detecting medically important errors while minimizing the probability of false rejection. Thus, there is an intuitive and implicit loss function that weights FN results more heavily than FP results and underlies current QC planning processes.
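If that implicit trade-off were written down, one very generic form might look like the sketch below. This is a hypothetical illustration, not the loss function used in the PQC papers; the 100:1 weighting and the example numbers are invented solely to show the idea of weighing FN harm against FP workload.

```python
# Hypothetical sketch of an explicit loss function (not the PQC papers' form):
# weight the expected number of unreliable final results, E(Nuf), against the
# expected number of false rejections, and compare SQC designs by the sum.
def expected_loss(e_nuf, expected_false_rejections,
                  weight_fn=100.0, weight_fp=1.0):
    # the 100:1 weighting is purely illustrative, reflecting the view that a
    # false negative result is far more costly than a false rejection
    return weight_fn * e_nuf + weight_fp * expected_false_rejections

# Comparing two hypothetical SQC designs for the same test:
loose = expected_loss(e_nuf=2.0, expected_false_rejections=1.0)   # 201.0
tight = expected_loss(e_nuf=0.5, expected_false_rejections=5.0)   #  55.0
print(loose, tight)   # the tighter design wins under this weighting
```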

A more specific example of a practical loss function is use of the industrial quality-costs model to maximize the test yield of a process (productivity) and minimize the defect rate (improve quality) [8]. Recall that quality-costs consist of preventive costs, appraisal costs, internal failure costs, and external failure costs. This model has been used to demonstrate that the costs allocated to the laboratory alone, e.g., repeat analyses (internal failure costs), may pay for increased QC (appraisal costs) [9,10]. Thus, improved quality (a lower defect rate) may be achieved while increasing productivity (a higher test/process yield), demonstrating Deming’s theory that improved quality can also lead to reduced costs [8, chapter 1].

Implicit test/method analytical/quality characteristics? While the authors identify 4 critical parameters here, they add another - the allowable Total Error (TEa, the quality requirement for the test) - in the applications paper [5]. They express TEa as a multiple of the method SD, which is assigned a value of 1.0, allowing them to use the ratio TEa/SD as a value for Sigma. It would be better to explicitly define method quality characteristics to include TEa, bias, and imprecision in the concentration units of the test, as well as the medical decision levels (XC1, XC2, etc.) where performance is critical. More complete information could be provided by specifying the precision profile for the test, as well as the regression line for assessing method bias at the medical decision levels, in order to calculate Sigma performance explicitly. Thus, for all the criticism that laboratory scientists have not provided explicit statements of critical parameters of the risk model, the new PQC model is likewise incomplete and needs explicit statements of the test/method analytical/quality characteristics.

Impractical Applications

For the 3rd paper, the authors’ stated objective is to assess whether the new theoretical PQC framework can be applied in practice. They select 2 analytes – Cadmium (Cd) and Carbohydrate Deficient Transferrin (CDT). While they acknowledge these are esoteric examples, they rationalize their choice based on the difference in Sigma performance – a Sigma of 4.0 for Cd vs a Sigma of 1.4 for CDT. While this range of Sigmas may be of interest mathematically to demonstrate the model, neither application is useful for teaching laboratory personnel about the PQC model. In most labs, those tests will be send-outs and of no further interest. Examples such as cholesterol, glucose, and HbA1c would be of much more relevance to most laboratory analysts. In addition, a 1.4 Sigma test should never be implemented in a medical laboratory because it would produce an error rate of about 16% even when operating in control. Again, the relevant range of interest for Sigma performance is between 3.5 and 6.0 Sigma.
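The 16% figure follows directly from the Gaussian error model, assuming zero bias and Sigma = TEa/SD = 1.4:

```python
# Fraction of patient results whose error exceeds TEa when a 1.4 Sigma process
# is perfectly in control (zero bias, Gaussian error distribution assumed).
from math import erf, sqrt

def phi(x):                                    # standard normal CDF
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

sigma = 1.4
defect_rate = 2.0 * (1.0 - phi(sigma))
print(f"in-control defect rate at {sigma} Sigma: {defect_rate:.1%}")   # ~16.2%
```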

In describing the risk of FN and FP results, the authors illustrate the use of 3 different loss functions or trade-off curves:

  • FP Rate vs FN E(Nuf);
  • FP Cost vs FN E(Nuf);
  • Total Cost vs Control limit.

When Sigma is 1.4, these tradeoff curves are quite different for the Parvin vs PQC models, but when Sigma is 4.0, the differences are much smaller. One would expect them to be even smaller for higher Sigmas that are more representative of method performance for the automated analyzers in use in most high-volume medical laboratories.

At the end, they conclude that the PQC model may be best applied to consider “what if” scenarios, rather than determining specific QC control limits.

“We do not believe the PQC model should be used to determine control limits. Rather it should be used as a decision support tool in “what if” analysis. For example, if errors from assay X are more costly than those from assay Y, then the model would suggest tighter control limits on assay X.”

Actually, that’s not necessarily true. That neglects the Sigma quality of the method. It is possible that methods with higher Sigmas may be less costly and still be managed with more relaxed control limits. Even though the authors include TEa as a critical characteristic in the two example applications of the model, it is not clear that method performance is adequately represented relative to the quality required for intended use.

Finally, it seems evident that this series of papers is written more for mathematical interest than for practical applications to define laboratory SQC strategies (control rules, number of control measurements, frequency of QC events). This is clearly stated in the optimization study published in CCLM [6].

“Our model is not intended to be used directly. The mathematical calculations are complex and would most likely be carried out by a decision support system similar to Bio-Rad’s Mission Control… Also, we should state that this paper is not directed toward practitioners. Rather, it is directed towards researchers with expertise in the mathematical analysis of QC methods who can critically evaluate our approach and, hopefully, improve upon it.”

Meanwhile, back in the laboratory, what should you do?

Many US laboratories continue to use 2 SD control limits across all tests on multi-test continuous production processes, according to a survey of automated chemistry and immunoassay instruments in large academic laboratories [11]. They have yet to adopt Parvin’s patient risk model, even though practical graphical and calculator tools are currently available. While there is much interest in implementing Patient-Based Real-Time Quality Control (PBRTQC) procedures, planning and design of such procedures may be even more difficult. Thus, SQC practices in US laboratories tend to be stagnant, even though laboratories generally recognize that 2 SD control limits and one-size-fits-all QC are both sub-optimal.
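As a reminder of why across-the-board 2 SD limits are sub-optimal, the sketch below shows the false-rejection rate when any of N controls outside 2 SD limits triggers a rejection, assuming Gaussian, independent control results:

```python
# False-rejection rate of simple 2 SD limits as the number of controls per QC
# event grows (Gaussian, independent control results assumed).
from math import erf, sqrt

def phi(x):                                    # standard normal CDF
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

p_out = 2.0 * (1.0 - phi(2.0))                 # ~4.6% per control
for n in (1, 2, 3, 4):
    false_reject = 1.0 - (1.0 - p_out) ** n
    print(f"N = {n}: false-rejection rate ~ {false_reject:.1%}")
```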

What’s the point?

As stated by Farnsworth and Lyon [12] in an editorial accompanying the JALM series, laboratory QC is a risky business. Risky in a positive sense that risk models can improve the planning and design of SQC strategies, particularly by providing a rationale for specifying QC frequency in terms of the run size between QC events. Risky in a negative sense that risk models are complicated and proper use requires considerable mathematics expertise, as well as critical judgments to define important parameters. For example, the study of a 1.4 Sigma method reflects poor judgment related to the Sigma performance range that is of interest for medical laboratory applications.

Of course, the development of more comprehensive patient risk models would be valuable if there were practical tools to support their application. Keep in mind that a decade elapsed between the development of Parvin’s model and the availability of practical tools for implementation. The increasing complexity of these risk models makes it difficult for laboratory scientists to understand the theory, validate the behavior of the model for medical laboratory applications, and demonstrate the usefulness of the model for designing/planning SQC strategies for service laboratories. Because of this, laboratories are likely to be slow to accept, adopt, and apply new, more complicated models for risk-based SQC strategies.

Finally, recall the advice of the famous statistician George Box, who was Chairman of the Statistics Department at the University of Wisconsin. “All models are wrong, but some are useful.” In that context, Parvin’s model is useful for planning SQC strategies that define the control rules, number of control measurements, and the frequency of QC events for the continuous production analyzers found in high-volume automated medical laboratories. In contrast, the new PQC risk model is not even intended to define practical SQC strategies, nor provide any tools to support practical applications. There’s no “what if” about it.

References

  1. Parvin CA. Assessing the impact of the frequency of quality control testing on the quality of reported patient results. Clin Chem 2008;54:2049-2054.
  2. CLSI C24-Ed4. Statistical Quality Control for Quantitative Measurement Procedures: Principles and Definitions. Clinical and Laboratory Standards Institute, 950 West Valley Road, Suite 2500, Wayne PA, 2016.
  3. Schmidt RL, Moore RA, Walker BS, Rudolf JW. Risk analysis for quality control. Part 1: The impact of transition assumptions in the Parvin model. J Appl Lab Med 2023;8:14-22.
  4. Moore RA, Rudolf JW, Schmidt RL. Risk analysis for quality control. Part 2: Theoretical foundations for risk analysis. J Appl Lab Med 2023;8:23-33.
  5. Schmidt RL, Moore RA, Walker BS, Rudolf JW. Risk analysis for quality control. Part 3: Practical application of the precision quality control model. J Appl Lab Med 2023;8:34-40.
  6. Schmidt RL, Moore RA, Walker BS, Rudolf JW. Precision quality control: A dynamic model for risk-based analysis of analytical quality. Clin Chem Lab Med 2023; https://doi.org/10.1515/cclm-2022-1094.
  7. Westgard JO, Groth T. Power functions for statistical control rules. Clin Chem 1979;25:863-9.
  8. Westgard JO, Barry PL. Cost-Effective Quality Control: Managing the quality and productivity of analytical processes. Washington DC:AACC Press, 1986.
  9. Westgard JO, Hyltoft Petersen P, Groth T. The quality-costs of an analytical process: 1. Development of quality-costs models based on predictive value theory. Scand J Clin Lab Sci 1984;44(suppl 172):221-7.
  10. Westgard JO, Hyltoft Petersen P, Groth T. The quality-costs of an analytical process: 2. A test yield formulation of the predictive value quality-costs model. Scand J Clin Lab Sci 1984;44(suppl 172):228-36.
  11. Rosenbaum MW, Flood JG, Melanson SEF, Baumann NA, Marzinke MA, et al. Quality control practices for chemistry and immunochemistry in a cohort of 21 large academic medical centers. Am J Clin Pathol 2018;150:96-104.
  12. Farnsworth CW, Lyon OAS. QC a Risky Business: The development of novel risk-based tools for assessing QC methods. J Appl Lab Med 2023;8:3-6.