Multirules and QC Validator
A discussion of how multirule procedures are implemented in the QC Validator program. These answers also apply to the latest version of our QC Design software, EZ Rules 3.
Also included is a discussion of what a "run" is and how to define it for today's modern random access analyzers, plus patient averages and moving averages.
Analytical runs, placement of controls, patient averages, and moving averages
Application of multirules in the QC Validator program
A user of the QC Validator program raised some interesting questions about how the control rules for multirule procedures are simulated and applied.
How are the different levels of control incorporated in the power curves?
For simplicity, we have assumed that method performance and analytical errors are the same at each level of each control material, so all control measurements provide the same information about what's happening. In effect, the performance and errors are normalized relative to the mean and SD of the control material and treated much like an SDI (standard deviation index) value. The control rules can then be applied as if a single level of control material were being used; the particular level or material does not matter.
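As an illustrative sketch of this normalization (the means and SDs below are invented for the example, not taken from the program), each result is expressed against its own material's mean and SD, so every rule sees the same SDI units regardless of level:

```python
# Sketch: normalizing control measurements to SDI (standard deviation
# index) values so rules can be applied without regard to level.

def to_sdi(value, mean, sd):
    """Express a control result as deviations from its material's mean."""
    return (value - mean) / sd

# Two hypothetical control materials with different means and SDs
level1 = {"mean": 100.0, "sd": 2.0}
level2 = {"mean": 300.0, "sd": 6.0}

# Raw results of 104.0 and 312.0 both reduce to +2.0 SDI,
# so any control rule treats them identically.
z1 = to_sdi(104.0, **level1)
z2 = to_sdi(312.0, **level2)
print(z1, z2)  # 2.0 2.0
```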
If these simplifying assumptions are not made, the process becomes much more complicated and requires that the power curves be determined for each specific set of conditions. Using a QC simulation program on a microcomputer, this might take several minutes for each power curve. Given the variety of rules and Ns of interest, it could take several hours or even a day to generate all the information needed to select appropriate control rules. Those conditions would vary to some extent from test to test, thus making the job of selecting QC procedures too laborious to be practical.
How does the number of control materials affect automatic QC selection?
The control rules considered during automatic QC selection depend on the total number of control measurements available, which in turn depends on the number of control materials. The number of control materials sets the minimum number of control measurements to be considered, but multiples of that number are also considered, usually corresponding to measuring each control material twice. For example, for 2 Materials, the total Ns considered by the default settings for the automatic QC selection criteria are 2 and 4; for 3 Materials, the total Ns are 3 and 6. Users can modify the selection criteria to eliminate the higher total Ns.
As an example, with automatic selection for 3 Materials, the autoselect feature will consider the 2of32s and 31s rules rather than the 22s and 41s rules. With a total N of 6, the autoselect feature will consider a 6x rule rather than a 10x rule.
How does the number of runs (R) affect the selection of control rules?
R here refers to the number of runs across which the control rules are applied (don't confuse this with the R in R4s, which refers to a range control rule). Most control rules are applied within a single run, thus R usually is 1. However, with multirule procedures, some rules cannot be applied within a single run if the total N is too small. For example, with a 13s/22s/R4s/41s multirule procedure, only the 13s, 22s, and R4s rules can be applied in a single run (R=1) having 2 control measurements. If R were 2, the 41s rule could be applied by combining the 2 control measurements from the current run with the 2 control measurements from the previous run. In this way the 41s rule looks back at earlier control data and improves the detection of systematic errors that persist from run to run until detected and eliminated.
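The look-back idea can be sketched in a few lines; this is a simplified illustration in Python (the function name and the SD values are our own, not taken from the QC Validator program):

```python
# Sketch: applying the 4-1s rule across runs (R=2) when each run
# has only N=2 control measurements. Values are in SD units.

def violates_41s(observations):
    """True if 4 consecutive observations exceed the same 1s limit."""
    for i in range(len(observations) - 3):
        window = observations[i:i + 4]
        if all(x > 1.0 for x in window) or all(x < -1.0 for x in window):
            return True
    return False

previous_run = [1.2, 1.5]   # N=2 from the prior run
current_run = [1.1, 1.4]    # N=2 from the current run

# Within the current run alone, 4-1s cannot be evaluated (only 2
# observations); combining two runs (RxN = 4) makes it applicable.
print(violates_41s(current_run))                 # False
print(violates_41s(previous_run + current_run))  # True
```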
The default setting for R for the automatic selection criteria is always 1, regardless of whether the number of control materials is 1, 2, or 3. This means that multirule procedures will be selected only on their capabilities for detecting errors in the first run. However, in situations where ideal error detection cannot be achieved in the first run, users could change the setting for R with the objective of selecting a multirule procedure that would achieve the desired error detection within a set number of runs.
What are the implications of R in the simulation of power curves and the application of rules in a multirule procedure?
In generating power curves, the control rules that can be applied within a run are always used in the current run (R=1) if the total N is sufficient. If a rule cannot be applied in the current run and R>1, that rule will be applied if RxN is sufficient. However, the rules that were applied when R=1 are not applied again, because they would already have been applied to the earlier run.
In applying multirule procedures to laboratory testing processes, the rules that can be used within the current run should be applied first to decide the control status of the current run, then any rules that require more than one run are applied next to detect persistent errors that cannot be detected in a single run.
For example, if a 13s/2of32s/31s/6x multirule procedure were applied with N=3 and R=2, the 13s, 2of32s, and 31s rules would be applied to the 3 control observations in a single run, and the 6x rule would be applied across runs to consider the 3 previous control observations as well as the 3 current observations. If the control observations in the previous run were +0.5SD, +2.2SD, and +1.7SD, and those in the current run are +2.1SD, +1.8SD, and +1.7SD, in that order, then a 6x rule violation has occurred. Note, however, that because the 2of32s rule is applied within the run, there is no violation of that rule, even though there is a sequence of +2.2SD, +1.7SD, and +2.1SD that occurs across runs.
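This worked example can be checked with a short sketch (the helper functions are our own simplified formulations of the 6x and 2of32s rules, not code from the program):

```python
# Observations from the worked example, in SD units.
previous = [0.5, 2.2, 1.7]
current = [2.1, 1.8, 1.7]

def violates_6x(obs):
    """6x: six consecutive observations on the same side of the mean."""
    return len(obs) >= 6 and (all(x > 0 for x in obs[-6:])
                              or all(x < 0 for x in obs[-6:]))

def violates_2of32s(run):
    """2of3-2s, applied within a single run of 3 observations:
    at least 2 of the 3 exceed the same 2s limit."""
    return sum(1 for x in run if x > 2.0) >= 2 or \
           sum(1 for x in run if x < -2.0) >= 2

# 6x is applied across the two runs (R=2); 2of32s only within a run,
# so the +2.2/+1.7/+2.1 sequence spanning the runs is never examined.
print(violates_6x(previous + current))  # True: all six are on the high side
print(violates_2of32s(current))         # False: only one exceeds +2s
```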
What rules are used across runs?
Rules such as 41s, 6x, 8x, and 10x are often applied across runs to detect persistent systematic errors. The 22s and 31s could also be used within a material and across runs if it is of interest to monitor systematic changes at one level, such as the high or low end of the working or linear range.
Why isn't the R4s rule used across runs?
Remember that the R4s rule is intended to detect random error, whereas the 22s rule is aimed at systematic error. If a systematic change occurred between two runs, perhaps due to a calibration problem, the R4s rule would respond to this systematic change as well as any change in random error. Because we want to use these rules to give us some indication of the type of error occurring (which would help us in trouble-shooting the method), we prefer to use the R4s rule only within a run and detect between-run systematic changes with other rules such as the 22s or 41s. Not everyone agrees with this and some analysts choose to apply the R4s rule across runs, so be sure to look carefully at how the R4s rule is being applied in your own laboratory.
What about using the R4s across materials within a run?
We do apply the R4s rule across materials within a run. It could be argued that a systematic change might occur at one level and not at the other, so in principle the rule should not be used across materials, particularly if control limits were calculated on the basis of within-run SDs rather than a more long-term SD that represents the performance expected over many runs. Again, some judgment is needed here: carefully define how to apply the rule in your method protocols, or carefully read the laboratory procedure manual to understand its intended use in your laboratory.
Does the R4s rule require consecutive control measurements?
No, the R4s rule considers the highest and lowest observations in a group of control measurements, so there is no requirement for consecutive observations as with the 22s or 41s rules. "Consecutiveness" is helpful for observing a shift in the mean of a distribution, i.e., systematic error, whereas random error is observed in the width of the distribution, which is more easily seen from the range of a group of observations.
Is the R4s rule violated if one control is +2.4SD and another is -1.8SD?
No and yes, depending on whether the rule is defined as a qualitative counting rule or a quantitative range rule.
The original application of the multirule procedure was to count the number of measurements exceeding certain limits; it is therefore a counting type of algorithm. Does 1 measurement exceed a 3s limit, do 2 in a row exceed the same 2s limit, does 1 in a group exceed the +2s limit and another exceed the -2s limit, do 4 in a row exceed the same 1s limit, and do 10 in a row fall on one side of the mean? If R4s is used as a counting rule, observations of +2.4SD and -1.8SD do not represent an R4s violation.
If you want to be more quantitative and actually calculate the difference between the highest and lowest observations, then it is possible to use a quantitative range rule such as R0.05 or R0.01, in which case an observed range of 4.2SD would be a violation if N were 2-4 per run. These rules are usually used together with mean rules, which is the original QC recommendation developed by Shewhart and still widely used in industry today. QC Validator contains power curves for mean/range procedures, and the automatic selection criteria can be modified to select these procedures if they can be implemented in your laboratory.
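A minimal sketch of the two interpretations, using the values from the question above (the function names are illustrative, and the 4.0SD limit stands in for the tabulated critical value a real R0.05 or R0.01 rule would use for the chosen N):

```python
# Counting form of R4s vs. a quantitative range rule, in SD units.

def r4s_counting(obs):
    """Counting rule: one observation beyond +2s AND another beyond -2s."""
    return any(x > 2.0 for x in obs) and any(x < -2.0 for x in obs)

def range_rule(obs, limit):
    """Quantitative range rule: flag when (max - min) exceeds `limit`.
    In practice `limit` comes from a table for the desired
    false-rejection probability (e.g. R0.05, R0.01) and the N per run;
    4.0 below is only a stand-in for illustration."""
    return (max(obs) - min(obs)) > limit

obs = [2.4, -1.8]
print(r4s_counting(obs))     # False: -1.8 does not exceed the -2s limit
print(range_rule(obs, 4.0))  # True: range of 4.2SD exceeds 4.0SD
```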
Another user from the Netherlands provides a series of questions about how to define an analytical run, placement of controls within a run, and the use of patient data and moving average QC procedures.
- How is a "run" defined for automated random access analyzers?
- Does it make any difference whether a pair of control materials are analyzed immediately, one after the other, or in random order, separated by time, say one in the morning and one in the afternoon?
- Is it important to include patient averages to assure quality and detect preanalytical as well as analytical factors that may not be observed with control samples?
- How do QC procedures that make use of moving averages compare to multirule procedures?
How is a "run" defined for automated random access analyzers?
For instance, a run could be all patient samples between two controls, one full tray, or one shift. It makes quite a difference if results are considered validated only after the last controls have been analyzed.
We touched on a similar question earlier and acknowledged that "this is a tough question and we don't have an easy answer." That answer is still true, but it may be useful to discuss this a bit more.
The above question also implies that the definition of a run includes the practice of "bracketing" patient samples by controls. It is important to understand that the practice of bracketing dates back to early continuous flow analyzers that were not very stable and tended to drift over a rather short period, such as the time required to analyze a tray of samples. It became standard practice to place controls at the beginning of the run - right after the calibrators - and at the end of the samples or the end of the tray, whichever came first. If the controls at the end of the run were out, it was usually due to a problem of drift. The results on the patient samples had changed significantly from the beginning of the run, therefore, it was sound practice to repeat the samples in between the controls.
Today's fourth generation analyzers have quite different operating characteristics (see discussion of Future Directions in Quality Control), which suggests that the practice of "bracketing" a group of samples with controls may not be appropriate. Better guidance for defining a run is provided by NCCLS [Document C24-A. Internal Quality Control Testing: Principles and Definitions. National Committee for Clinical Laboratory Standards, 940 West Valley Road, Suite 1400, Wayne, PA 19087-1898], which provides the following definitions:
- "Analytical run: For purposes of quality control, an analytical run is an interval (i.e., period of time or series of measurements) within which the accuracy and precision of the measuring system is expected to be stable. Between analytical runs, events may occur causing the measurement process to be susceptible to variations that are important to detect.
- "Manufacturer's Recommended Run Length (MRRL): The manufacturer should recommend the period of time or series of measurements within which the accuracy and precision of the measuring system, including instruments and reagents, are expected to be stable.
- "User's Defined Run Length (UDRL): The user should define the period of time or series of measurements within which validation of the measurement process is important based on the stability, reporting intervals of patient results, cost of reanalysis, work flow patterns, operator characteristics, or similar nonanalytic considerations that are in addition to the expected stability of the accuracy and precision of the measuring system."
These statements suggest that the run be defined in units of time or units of samples based on the expected stability of the method, the size of changes that would be important to detect, and the changes in conditions that make the method susceptible to problems. While the maximum period of a run is defined by the manufacturer, the user is responsible for assessing laboratory factors that may require a shorter run; thus the definition of a run is a shared responsibility of the manufacturer and the user. Regulations sometimes set another maximum, such as CLIA's 24-hour period as the maximum run length. In addition, it should be recognized that manufacturers seldom deal with the issue of what size of changes would be important to detect, so the user is really left with the responsibility of defining both the quality requirement and the run length.
With today's high-stability, high-precision, random access analyzers, it often makes sense to define run length in units of time. It also is practical to analyze controls initially, before patient specimens, in order to assure the system is working properly before starting patient analyses, then to monitor periodically to check performance. This implies a multistage QC design, with high error detection during startup and low false rejections during monitoring.
With certain electrode type analyzers where exposure to specimens may in some way "build up" and cause problems, it may make sense to define the run length as a certain number of specimens.
Does it make any difference whether a pair of control materials are analyzed immediately, one after the other, or in random order, separated by time, say one in the morning and one in the afternoon?
In selecting control rules and numbers of control measurements, our QC planning approach is to determine what control rules and how many control measurements are necessary to assure that an out-of-control signal will be obtained if medically important errors are present. This means if N=2, those two measurements are needed to determine the control status of the method. If you wait till the afternoon to get the second measurement, you won't know if the method is working properly until then; meanwhile, you may have reported a lot of patient results. Again, with modern instrument systems, we would argue for a multistage QC procedure having a startup design that will assure the necessary quality is being achieved before analyzing any patient samples, then spacing controls over time to look for changes in performance. It makes sense that controls spaced out over the course of a run would provide the best potential for picking up a problem as early as possible.
Is it important to include patient averages to assure quality and detect preanalytical as well as analytical factors that may not be observed with control samples?
For tests where stable control materials are available, patient data QC procedures, such as Average of Normals (AON), usually provide a secondary and complementary method for monitoring method performance. They may be useful in picking up preanalytical problems that reflect improper processing and storage of specimens, as well as analytical problems that do not show up in the same way on control materials. In general, AON procedures are more complicated to design because additional factors need to be considered, such as the ratio of the population to analytical SDs and the truncation limits chosen [see Cembrowski GS, Chandler EP, Westgard JO. Assessment of 'Average of Normals' quality control procedures and guidelines for implementation. Am J Clin Pathol 1984;81:492-499]. They also tend to be more difficult to implement and are impractical in many laboratory situations because of the high N that is needed to provide the desired error detection.
However, power curves can be determined and then we can apply the same QC selection and design methodology using OPSpecs charts and/or critical-error graphs. We recently illustrated how to do this and recommended using AON as a way to measure run length in automated process control systems [see Westgard JO, Smith FA, Mountain PJ, Boss S. Design and assessment of average of normals (AON) patient data algorithms to maximize run lengths for automatic process control. Clin Chem 1996;42:1683-1688].
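A minimal AON sketch, assuming a deliberately simplified design (the truncation limits, target mean, and control limit below are illustrative inventions, not values from the cited papers, and a real design would derive the control limit from the population-to-analytical SD ratio and N):

```python
# Sketch: truncate patient results to a "normal" interval, average
# them, and flag a shift in the mean.

def aon_check(patient_results, trunc_low, trunc_high,
              target_mean, control_limit):
    """Average truncated patient results; True means out of control.
    `control_limit` is an absolute deviation in the measurement's
    units, set by the QC design."""
    kept = [x for x in patient_results if trunc_low <= x <= trunc_high]
    if not kept:
        return None  # nothing left to average
    mean = sum(kept) / len(kept)
    return abs(mean - target_mean) > control_limit

# e.g., sodium results (mmol/L), truncation 130-150, target 140;
# 152 and 128 are excluded before averaging.
results = [138, 141, 152, 139, 143, 128, 140, 142]
print(aon_check(results, 130, 150, 140.0, 2.0))  # False: mean 140.5 is in control
```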
How do QC procedures that make use of moving averages compare to multirule procedures?
QC procedures that employ a moving average would be expected to perform similarly to a traditional mean rule, which is expected to have at least as good error detection as multirule procedures and possibly better. We provide power curves for traditional mean/range QC procedures in the QC Validator program, along with power curves for a variety of multirule procedures. Parvin has recommended a multimean type of QC procedure that should have better error detection than a traditional multirule procedure [see Parvin CA. Comparing the power of quality-control rules to detect persistent systematic error. Clin Chem 1992;38:356-363].
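As a rough sketch of the moving-average idea (the window size, control limit, and data below are illustrative assumptions, not a recommended design):

```python
# Sketch: flag when the trailing moving average of observations
# (in SD units) drifts past a control limit, as a mean-rule-like
# monitor for persistent systematic error.

def moving_average_flags(values, window=4, limit=1.0):
    """Return indices where the trailing moving average exceeds limit."""
    flags = []
    for i in range(window - 1, len(values)):
        avg = sum(values[i - window + 1:i + 1]) / window
        if abs(avg) > limit:
            flags.append(i)
    return flags

# A persistent +1.5SD shift beginning at index 4 is flagged once
# enough shifted observations enter the window.
values = [0.2, -0.3, 0.1, 0.0, 1.6, 1.4, 1.7, 1.5]
print(moving_average_flags(values))  # [6, 7]
```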