Tips on Managing the Quality of Immunoassays

R. Neill Carey, Ph.D. Clinical Chemist Peninsula Regional Medical Center Salisbury, MD

Dr. R. Neill Carey shows how to derive quality requirements for immunoassays from the proficiency testing criteria. Theophylline, cortisol, thyroxine, and folate examples are illustrated.

Where to Find Specifications for Allowable Analytical Errors
CLIA Guidelines
Other Guidelines
Other Sources
Assay Precision and Bias
Practical Tools for Deciding What Rules to Use
QC for Immunoassays: Two Control Materials vs. Three
Manufacturer's Control Ranges vs. User's Calculated Limits
Single vs. Multistage QC Designs
Single vs. Multirule QC Procedures
Implementing Appropriate QC Procedures
Verify Performance Over Time
Example Analytes:
Theophylline
Cortisol
Thyroxine
Folate
References
Biography: R. Neill Carey, Ph.D.

There are only a few essentials for designing the quality control procedure for any assay. First, how good does the assay need to be; how much error is allowable? Second, what is the actual analytical performance of the assay; what is its precision and what is its bias relative to its peer group on proficiency testing or relative to some important reference method? If bias can be minimized, then allowable error and the assay’s precision ultimately define the minimum number of quality control samples that must be run and the decision rules for interpreting their results.

Where to Find Specifications for Allowable Analytical Errors

Guidelines for precision and accuracy can be derived from proficiency testing requirements and/or from requirements for medical usefulness. Since successful performance on proficiency testing is almost universally required to maintain laboratory accreditation, specifications derived from proficiency testing requirements define the minimum standards for analytical performance. For some analytes, however, requirements based on medical usefulness are more stringent, and require a higher level of quality.

CLIA Guidelines

For laboratories licensed to do testing in the United States, the CLIA '88 specifications for acceptable performance [1] are paramount. Acceptable performance on proficiency testing is defined in terms of total allowable error. In its simplest form, total allowable error is a fixed percentage. For theophylline, for example, total allowable error is target value +/- 25%. The target value is the group mean for the method. The group may be specific for a particular instrument and method if there are sufficient participants in a proficiency testing event. Total allowable error may also be defined as a fixed concentration, or as a percentage combined with a fixed concentration. The total allowable error for thyroxine is target value +/- 20% or 1.0 µg/dL, whichever is greater. However, for many immunoassay analytes, acceptable performance is defined as target value +/- 3 SD. Here SD is the group standard deviation for the participant’s peer group.
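The three forms of PT criterion described above can be sketched in a few lines of Python. This is only an illustration: the function name, its parameters, and the target values in the usage lines are hypothetical, and only the +/- 25%, +/- 20% or 1.0 µg/dL, and +/- 3 SD criteria come from the text.

```python
def pt_acceptance_limits(target, percent=None, absolute=None, group_sd=None):
    """Return (low, high) PT acceptance limits around a target value.

    percent  : allowable error as a percentage of the target
    absolute : allowable error as a fixed concentration
    group_sd : peer group SD, used for a target +/- 3 SD criterion
    When both percent and absolute are given, the greater one applies.
    """
    if group_sd is not None:
        allowable = 3.0 * group_sd
    else:
        candidates = []
        if percent is not None:
            candidates.append(abs(target) * percent / 100.0)
        if absolute is not None:
            candidates.append(absolute)
        if not candidates:
            raise ValueError("specify percent, absolute, or group_sd")
        allowable = max(candidates)  # "whichever is greater"
    return target - allowable, target + allowable


# Hypothetical target values, chosen only to exercise each form of criterion.
theophylline = pt_acceptance_limits(15.0, percent=25)                 # fixed percentage
thyroxine_low = pt_acceptance_limits(4.0, percent=20, absolute=1.0)   # 1.0 ug/dL applies
thyroxine_high = pt_acceptance_limits(8.0, percent=20, absolute=1.0)  # 20% applies
peer_based = pt_acceptance_limits(10.0, group_sd=0.5)                 # target +/- 3 SD
```

For the low thyroxine target, 20% of 4.0 is only 0.8 µg/dL, so the fixed 1.0 µg/dL allowance is the one that applies.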

Other Guidelines

Sometimes, the target value for a particular analyte may be defined by a particular method. For example, the German specification for acceptable performance for thyroxine is the isotope dilution gas chromatographic mass spectrometric method mean +/- 24% [2], which is a total error specification.

A European working group [3] has published an extensive compilation of quality specifications based on medical usefulness criteria. These physiologically based specifications are derived from the interindividual and intraindividual variations for each analyte, as originally suggested by Cotlove [4]. The same formula is used for all analytes. Specifications are for percent CV and percent bias. Westgard, Seehafer, and Barry have shown how European specifications for precision and bias compare with CLIA specifications for allowable total error for routine chemistry analytes by using OPSpecs® charts [5]. Readers interested in this comparison for immunoassay analytes could use the same approach.
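The essay does not spell out the formula. As a sketch, the widely quoted "desirable" form of biological-variation specifications sets allowable imprecision at half the intraindividual CV and allowable bias at one quarter of the combined intraindividual and interindividual variation. Treat this as an assumption about which formula is meant; the function name and the input values below are made up for illustration and are not published biological variation data.

```python
import math


def desirable_specifications(cv_intra, cv_inter):
    """Return (max_cv, max_bias) in percent from biological variation.

    cv_intra : intraindividual (within-subject) CV, percent
    cv_inter : interindividual (between-subject) CV, percent
    Assumes the commonly quoted "desirable" specifications:
      max CV   = 0.5  * cv_intra
      max bias = 0.25 * sqrt(cv_intra^2 + cv_inter^2)
    """
    max_cv = 0.5 * cv_intra
    max_bias = 0.25 * math.sqrt(cv_intra ** 2 + cv_inter ** 2)
    return max_cv, max_bias


# Illustrative inputs only (not taken from any published table).
max_cv, max_bias = desirable_specifications(cv_intra=6.0, cv_inter=8.0)
print(f"allowable CV {max_cv:.1f}%, allowable bias {max_bias:.1f}%")
```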

Other Sources

Specifications have also been developed for specific analytes. For thyrotropin (TSH), Klee and Hay [6] state that the overlap between the variation of the lower normal-value limit and the assay detection limit should be less than 1%. Spencer, Takeuchi, and Kazarosyan state that the total CV of a "third generation" TSH assay is defined to be 20% in the 0.01 - 0.02 mU/L range [7].

Assay Precision and Bias

The relationship between allowable error and the precision and bias of a particular assay determines how stringent the quality control procedure must be. The standard deviation of the assay must be known fairly accurately, since quality control rule performance is defined in terms of the standard deviation. An NCCLS evaluation protocol provides a procedure to estimate the precision of an assay reliably [8]. If routine quality control data are used, the standard deviation should be calculated over two or more months so that the data reflect the factors that affect precision over time. Ideally, several reagent lots and recalibrations should be represented.
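When routine QC data are used, the calculation itself is simple. The sketch below assumes the control results for one level have already been collected over the full period; the function name and the data values are hypothetical.

```python
import statistics


def precision_summary(results):
    """Return (mean, sd, cv_percent) for one control level.

    results should span two or more months so that reagent lot changes
    and recalibrations contribute to the estimated SD.
    """
    mean = statistics.mean(results)
    sd = statistics.stdev(results)  # sample SD (n - 1 denominator)
    cv_percent = 100.0 * sd / mean
    return mean, sd, cv_percent


# Hypothetical control results; a real data set would be far longer.
control_results = [9.8, 10.1, 10.4, 9.9, 10.2, 9.7, 10.0, 10.3]
mean, sd, cv = precision_summary(control_results)
print(f"mean {mean:.2f}, SD {sd:.2f}, CV {cv:.1f}%")
```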

If a laboratory has significant bias relative to the peer group in proficiency testing or a reference method which was used to determine medical decision concentrations, there will be less tolerance of increased analytical errors. It is essential to minimize bias to the extent possible. Bias can be detected by averaging the individual bias values of PT samples over several PT cycles. More rapid estimates of bias can be made by comparing the laboratory's means and standard deviations on control materials to the peer group means and standard deviations when the control material manufacturer provides an interlaboratory data comparison service.
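Averaging bias over PT cycles might look like the following sketch. It assumes each PT sample is recorded as a pair of the laboratory's result and the peer group mean; the function name and the numbers are illustrative only.

```python
def mean_pt_bias_percent(pt_results):
    """Average percent bias over PT samples from several PT cycles.

    pt_results : list of (lab_result, peer_group_mean) pairs
    A positive value means the laboratory runs high relative to its peers.
    """
    biases = [100.0 * (lab - peer) / peer for lab, peer in pt_results]
    return sum(biases) / len(biases)


# Hypothetical results from three PT events.
pt_results = [(10.5, 10.0), (21.0, 20.0), (30.0, 30.0)]
print(f"average bias {mean_pt_bias_percent(pt_results):+.1f}%")
```

Averaging percent bias treats every PT sample equally regardless of concentration. If bias differs between low and high samples, it may be more informative to average within concentration ranges.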

Practical Tools for Deciding What Rules to Use

Control procedures can be developed after allowable error has been specified, and the precision and bias of the assay have been measured. The ideal control procedure has high probability of detecting critical analytical errors. It also has low probability of rejecting runs falsely when performance is actually acceptable.
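Both probabilities can be estimated for a simple single-limit rule. The sketch below assumes Gaussian, independent control results and a rule that rejects the run when any one of N controls falls outside +/- k SD; the function names are hypothetical, and this is an approximation of what published power function curves show, not a substitute for them.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def rejection_probability(k, n_controls, shift_sd=0.0):
    """Probability that at least one of n_controls exceeds +/- k SD.

    shift_sd is the systematic shift of the method in SD units.
    With shift_sd = 0 the result is the false rejection probability;
    with a nonzero shift it is the probability of error detection.
    """
    p_inside = normal_cdf(k - shift_sd) - normal_cdf(-k - shift_sd)
    return 1.0 - p_inside ** n_controls


p_fr = rejection_probability(k=2.5, n_controls=3)                # no error present
p_ed = rejection_probability(k=2.5, n_controls=3, shift_sd=3.0)  # 3 SD shift
print(f"P_fr {p_fr:.3f}, P_ed {p_ed:.3f}")
```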

There is an orderly process for determining the optimal control procedure:

1. Determine the size of error which must be detected to minimize the probability of failing PT. This error is termed the critical-error [9]. For systematic error (bias), it is the size of bias that would lead to PT failures, measured in units of standard deviations. If the standard deviation is small relative to the allowable error, the bias can be large (in units of standard deviations) before bias is large enough to cause PT failures. High critical-error implies that QC procedures with low probabilities of false rejection will be sufficient.
2. Use graphical tools or tables to select combinations of rules and numbers of controls that will have high probability of detecting the presence of critical-error in an analytical run, and simultaneously have low probability of falsely rejecting. These tools include OPSpecs charts [10], critical-error graphs, power function curves [11] and error grids [12].
3. Software is available to automate this process. The Westgard EZ Rules® 3 software is a useful tool for deciding which rules to use for a particular assay. The user inputs the assay’s CV, percent bias (if known), allowable error, and the number of controls. The software has the capability to select the most appropriate rules automatically based on selection criteria that can be set by the user. Candidate rules can also be selected manually. Graphs are available in the form of OPSpecs® charts and power function charts. QC Validator software was used in the practical examples included here.
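The first step above can be sketched numerically. The formula below is the commonly used expression for critical systematic error, in which the 1.65 term corresponds to allowing a maximum of 5% of results to exceed the allowable error. That constant is an assumption here, since the essay does not state it, and the function name and input values are illustrative.

```python
def critical_systematic_error(allowable_error, bias, sd, z=1.65):
    """Critical systematic error in multiples of the assay SD.

    allowable_error, bias and sd must share the same units
    (all in percent, or all in concentration units).
    z = 1.65 assumes a 5% maximum defect rate.
    """
    return (allowable_error - abs(bias)) / sd - z


# Illustrative: 25% allowable error, 5% CV, with and without a 5% bias.
print(f"{critical_systematic_error(25.0, 0.0, 5.0):.2f}")
print(f"{critical_systematic_error(25.0, 5.0, 5.0):.2f}")
```

The larger the value, the easier the critical error is to detect, so a QC procedure with a low false rejection rate is more likely to suffice.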

When these tools are not available, Westgard and Burnett [13] have a simple criterion for deciding when multirule QC procedures with 2 or 3 controls will provide a probability of error detection (P_ed) of 50% or more for errors large enough to cause PT failures:

SD_analytical < 1/4 Allowable Error

When 2 or 3 controls are used, and there is no constant bias, either multirule procedures or the 1_2.5s rule will provide acceptable QC for most assays when the assay’s standard deviation is less than one quarter of the allowable total error.
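As a quick worked check of this criterion, the sketch below tests an assay against the one-quarter rule. The function name is hypothetical; the 25% figure is the theophylline PT criterion quoted earlier, and the CV values are made up.

```python
def meets_quarter_rule(sd, allowable_error):
    """True when the analytical SD is below one quarter of allowable error.

    sd and allowable_error must be in the same units
    (both percent, or both concentration).
    """
    return sd < allowable_error / 4.0


# With a 25% allowable error, the CV must be below 6.25%.
print(meets_quarter_rule(sd=5.0, allowable_error=25.0))
print(meets_quarter_rule(sd=7.0, allowable_error=25.0))
```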

QC for Immunoassays: Two Control Materials vs. Three

Quality control of immunoassays is generally done with three levels of control because immunoassay dose-response curves are nonlinear and require multiple calibrators. Since medical decisions are made at several concentrations for many analytes, the usual "normal and abnormal" controls often do not provide a high enough comfort level. Many analytes have multiple decision levels. There are three for TSH: below the lower end of the normal reference range (for diagnosing hyperthyroidism), just above the normal reference range (borderline hypothyroid), and elevated. The third control material also increases sensitivity of the quality control procedure to increased analytical errors. There are situations, however, in which two controls are sufficient.

Manufacturer’s Control Ranges vs. User’s Calculated Limits

Some instrument manufacturers market analyte-specific control products. These materials are useful after calibration to verify the calibration and to provide more control replicates when they are needed for effective control of the method. Some manufacturers produce these controls with relatively wide target ranges rather than assaying them and providing lot-specific mean concentrations or peer data comparison. To set up initial quality control ranges with these materials, the width of the manufacturer’s assigned range should be divided by 6 to estimate the standard deviation, which treats the assigned range as roughly +/- 3 SD. Used as is, the manufacturer’s range is too wide to serve as control limits.
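A minimal sketch of this starting estimate follows. It assumes the assigned range is symmetric, so that its midpoint can stand in for the mean until the laboratory has its own data; the function name and the range values are hypothetical.

```python
def initial_limits_from_range(low, high, n_sd=2.0):
    """Estimate provisional QC statistics from a manufacturer's range.

    The range width is divided by 6 to estimate the SD, and the midpoint
    is used as a provisional mean (an assumption, pending real data).
    Returns (mean, sd, (lower_limit, upper_limit)) at +/- n_sd SD.
    """
    mean = (low + high) / 2.0
    sd = (high - low) / 6.0
    return mean, sd, (mean - n_sd * sd, mean + n_sd * sd)


# Hypothetical assigned range of 7.0 to 13.0.
mean, sd, limits = initial_limits_from_range(7.0, 13.0)
print(mean, sd, limits)
```

These provisional values should be replaced by the laboratory's own mean and standard deviation once enough control results have accumulated.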

Single vs. Multistage QC Designs

Sometimes, more than 3 control measurements are required to achieve satisfactory control for analytes on automated systems. Multistage control procedures may be effective in these situations. Most automated systems are stable in routine operation. When problems do occur, it is usually following new lot calibration, recalibration, or major maintenance. On these occasions additional controls can be run to increase sensitivity to errors. The manufacturer’s control materials may be used as supplemental controls, or the laboratory’s usual multilevel controls may be run in duplicate. When the laboratory’s usual multilevel controls demonstrate a shift, the manufacturer’s control materials can help determine whether there is a calibration problem or a problem with the matrix of the multilevel controls (some control materials may behave differently than patient samples).

Single vs. Multirule QC Procedures

To achieve satisfactory error detection for immunoassay procedures, it is often necessary to use quality control rules with probability of false rejection (P_fr) of 2% to 4%. Typically for three controls, rules such as the 1_2.5s rule or 1_3s/(2of3)_2s/R_4s multirule are appropriate. For computerized laboratories multirule procedures may be preferable because the technologists can tell which kind of error is occurring (systematic vs. random) according to which rule is violated. When 2_2s rule failures occur, the cause is virtually always systematic error; something has caused the method to shift up or down. Failures of the 1_3s rule may be caused by either random or systematic errors. When 1_3s rule failures occur simultaneously with 2_2s rule failures, there is systematic error. Generally, 1_3s rule failures represent more serious errors. They really get your attention.
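The within-run logic of the 1_3s/(2of3)_2s/R_4s multirule can be sketched as follows. The sketch assumes three control results from a single run, each already converted to a z-score, (result - mean) / SD, and it interprets R_4s in the usual way as one control above +2 SD with another below -2 SD. The function name is hypothetical, and across-run applications of the rules are not covered.

```python
def evaluate_multirule(z_scores):
    """Return the list of rules violated by one run's control z-scores."""
    violations = []
    above_2s = sum(1 for z in z_scores if z > 2.0)
    below_2s = sum(1 for z in z_scores if z < -2.0)

    # 1_3s: any single control beyond 3 SD (random or systematic error).
    if any(abs(z) > 3.0 for z in z_scores):
        violations.append("1_3s")
    # (2of3)_2s: two controls beyond 2 SD on the same side (systematic error).
    if above_2s >= 2 or below_2s >= 2:
        violations.append("2of3_2s")
    # R_4s: controls beyond 2 SD on opposite sides (random error).
    if above_2s >= 1 and below_2s >= 1:
        violations.append("R_4s")
    return violations


print(evaluate_multirule([0.5, -1.0, 1.2]))   # in control
print(evaluate_multirule([2.3, 2.4, 0.1]))    # shift in one direction
print(evaluate_multirule([2.5, -2.2, 0.0]))   # wide scatter
```

An empty list means the run is accepted. The name of the violated rule gives the technologist a first indication of whether to look for a shift or for increased scatter.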

Implementing Appropriate QC Procedures

The ideal way to implement quality control procedures is through the laboratory’s computer system. The laboratory computer terminal is where patient results are entered, and where the instrument interfaces are controlled. Control samples can be intermixed with patient samples and handled fairly easily. Control data can be closely linked to patient data. The screen sessions can be universal throughout the laboratory, simplifying implementation and training.

Some compromises are necessary in order to take advantage of the laboratory computer system’s QC capabilities. In general, fewer rules are available on central laboratory computers than on PC software specifically designed for QC. The QC chart graphics on some laboratory computers are not as good as those on specialty PC software. QC for qualitative tests is difficult to implement in some laboratory computer systems. Documenting changes of QC ranges and rules is cumbersome in some systems. The number of significant figures of QC data may be limited by the interface between the computer and the instrument, diminishing the performance of quality control procedures [14].

Most automated analyzer instrument systems have QC software. Their capabilities vary widely. For small laboratories, it may be practical to use the software of the analyzer with the most methods as the laboratory’s QC software for as many other methods as possible. If this instrument’s software will not accept QC data from other methods, a hybrid with paper QC for the other methods may be acceptable.

Verify Performance Over Time

Control range updates are frequent during the first few months with a new assay or new lots of controls. Standard deviations widen over time, reducing the sensitivity of QC procedures to increased error. The impact of apparent biases and changes in standard deviation should be considered as QC ranges are updated. It is possible for these updates to slowly change an assay’s QC procedures into something quite different from what was originally set up. Occasionally it may be necessary to revalidate the QC procedure’s performance, particularly in situations where the original rule selection was marginal. Some assays require vigilance and frequent recalibrations to keep bias and precision within acceptable limits.

Example Analytes

Some examples are included to demonstrate how to apply these tools and thought processes. In each case, the assay is stable, and has a very low frequency of large errors (< 2%). These are real assays from the laboratory, and the QC data represent precision over a period of about 4 months. PT specifications were used for allowable error.

References

1. Federal Register February 28, 1992;57(40):7002-7186.
2. Steele BW. Thyroxine testing: can we find truth? Supplement to Summary Report. College of American Pathologists Survey Set 1996 K-C, February, 1997.
3. Fraser CG, Hyltoft Petersen P, Ricos C, Haeckel R. Quality specifications. In Haeckel R, ed. Evaluation methods in laboratory medicine. Weinheim: VCH, 1993;87-99.
4. Cotlove E, Harris EK, Williams GZ. Biological and analytic components of variation in long-term studies of serum constituents in normal subjects. III. Physiological and medical implications. Clin Chem 1970;16:1028-1032.
5. Westgard JO, Seehafer JJ, Barry PL. European specifications for imprecision and inaccuracy compared with operating specifications that assure the quality required by US CLIA proficiency testing criteria. Clin Chem 1994;40:1228-1232.
6. Klee GG, Hay ID. Sensitive thyrotropin assays: analytic and clinical performance criteria. Mayo Clin Proc 1988;63:1123-1132.
7. Spencer CA, Takeuchi M, Kazarosyan M. Current status and performance goals for serum thyrotropin (TSH) assays. Clin Chem 1996;42:140-145.
8. National Committee for Clinical Laboratory Standards. Evaluation of precision performance of clinical chemistry devices - Second Edition; Tentative Guideline. NCCLS document EP5-T2 (ISBN 1-56238-145-8). NCCLS, 771 East Lancaster Avenue, Villanova, Pennsylvania, 19805, 1992.
9. Westgard JO, Barry PL. Cost-effective quality control: managing the quality and productivity of analytical processes. Washington, DC: AACC Press, 1986:47-49.
10. Westgard JO. Charts of operational process specifications ("OPSpecs charts") for assessing the precision, accuracy, and quality control needed to satisfy proficiency testing criteria. Clin Chem 1992;38:1226-1233.
11. Westgard JO, Groth T. Power function graphs for statistical control rules. Clin Chem 1979;25:394-400.
12. Westgard JO, Quam EF, Barry PL. Selection grids for planning quality control procedures. J Clin Lab Sci 1990;3:271-278.
13. Westgard JO, Burnett RW. Precision requirements for cost-effective operation of analytical processes. Clin Chem 1990;36:1629-1632.
14. Cembrowski GS, Carey RN. Laboratory quality management: QC & QA. Chicago: ASCP Press, 1989;122-131.

Biography: R. Neill Carey, Ph.D.

Neill Carey is Clinical Chemist at Peninsula Regional Medical Center in Salisbury, Maryland. He received his Ph.D. in analytical chemistry from Duke University, although he completed his thesis research at the University of Wisconsin-Madison after his major professor moved there. In 1972 he joined Dr. Westgard as assistant director of the clinical chemistry laboratories at University Hospital. The long winters and short sailing season led to a move to Peninsula Regional in 1977. Currently he is supervising the automated and special chemistry laboratories there. He has continued to consult, work, and write with the Westgard group over the years. The second edition of Laboratory Quality Management, QC & QA with Dr. George Cembrowski is one of his current projects.
