SIX SIGMA - Medical tolerance limits can be defined from test interpretation guidelines that describe a gray zone between two values that lead to different medical decisions and treatments. Neonatal screening provides many examples where laboratories actually publish test interpretation and treatment guidelines for the physicians who are responsible for the care of the newborns. Intuitively, Six Sigma concepts should apply here since the whole purpose of such testing is to identify congenital defects that can be treated effectively when detected early. Millions of newborns are being tested. High analytical quality and a low defect rate in the testing processes are essential.
Neonatal screening programs are mandated in many countries and often operate under the auspices of the government. In the US, for example, most neonatal screening is provided through the official health laboratories of the individual states, often called State Public Health Laboratories. These laboratories need to provide high quality, high volume testing, with fast reports, and low costs. While these service requirements are similar to those in hospital clinical laboratories, newborn screening is often considered to be a special category of laboratory testing.
The history of neonatal screening generally begins in the 1960s with screening for phenylketonuria by use of a biological inhibition assay for phenylalanine that was introduced by Guthrie [1]. Screening for congenital hypothyroidism was introduced in the late 1970s by use of a chemical radioimmunoassay measurement of thyroxine. A variety of other microbiological and chemistry tests expanded the screening applications in the 1980s and 1990s. At the beginning of the new millennium, tandem mass spectrometry (TMS) is poised to provide a technological revolution in neonatal screening.
Practitioners in the field emphasize that tests for neonatal screening are different from other kinds of laboratory tests (is this refrain beginning to sound tired to you yet?). They point to special problems such as:
- the technique for specimen collection from a heel stick,
- whole blood as the specimen,
- time of collection of the specimen,
- collection and storage of samples on filter paper,
- effects of different types of filter paper,
- small volume of sample,
- relatively large imprecision of analytical methods,
- lack of standardization of laboratory methods,
- lack of control materials that are independent of a manufacturer's method,
- appropriateness of different statistical control rules,
- appropriateness of number of control measurements per analytical run,
- usefulness of patient data for quality control,
- limited availability of external quality assessment programs,
- sample size of spotted specimens on external quality assessment programs,
- difficulties with consistency from one reagent lot to another,
- choice of cutoff point for positive test results,
- test interpretation via a single cutoff point versus two points with a gray zone,
- cost of repeat testing and confirmation, and
- accuracy or reliability of the interpretation as the real measure of quality.
This list is not comprehensive, but it does suggest that many preanalytical and postanalytical issues are important. Most of these factors, except those related to sample collection on filter paper, are also important for testing in any clinical laboratory.
Developing a more quantitative and objective approach to analytical quality management requires that several key issues be addressed. The clinical quality-planning model provides the appropriate Six Sigma design tool for these applications.
1. How should specimen variability from filter paper collection be accounted for? Specimen collection and storage on filter paper introduce additional variability. Several factors are involved, including the preparation of the infant's heel, the care and skill in collecting the heel-stick specimen, the amount of specimen collected per spot, the effect of different filter papers, proper drying of the filter paper specimens, proper transport, and careful punching and elution of specimens. Preanalytical sources of error must be accounted for in neonatal screening, and the clinical quality-planning model makes that possible.
2. Should the format of test interpretation guidelines be standardized? Test interpretation guidelines are critical because of the potential cost of a false-negative result. A correct decision in neonatal screening depends on making the correct interpretation of a test that is properly performed on a specimen that is properly collected. Specific interpretative guidelines are often provided by the neonatal screening program, so the program exerts more influence on the interpretation of a test than is typical for tests performed in a clinical laboratory. Two types of guidelines are generally found:
- Single cut-off or decision point to classify positives and negatives;
- Two decision points and a gray zone to classify positives, intermediates, and negatives.
The clinical quality-planning model is particularly appropriate for applications where two decision points are included in the test interpretation and treatment guidelines.
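The two guideline formats can be sketched in a few lines of Python. This is a minimal illustration, not a screening program's actual logic: the function names are hypothetical, it assumes that higher values are abnormal (which would be reversed for a test such as biotinidase activity), and it assumes that a result exactly at a cutoff falls into the higher category.

```python
from enum import Enum


class Screen(Enum):
    NEGATIVE = "negative"
    INTERMEDIATE = "intermediate"
    POSITIVE = "positive"


def classify_single(value, cutoff):
    """Single decision point: at or above the cutoff is positive (assumed)."""
    return Screen.POSITIVE if value >= cutoff else Screen.NEGATIVE


def classify_gray_zone(value, lower, upper):
    """Two decision points: results between them fall in the gray zone."""
    if lower > upper:
        raise ValueError("lower cutoff must not exceed upper cutoff")
    if value >= upper:
        return Screen.POSITIVE
    if value >= lower:
        return Screen.INTERMEDIATE
    return Screen.NEGATIVE


# Illustrative cutoffs only (units of mg/dL assumed for the example).
for result in (1.0, 5.0, 9.0):
    print(result, classify_gray_zone(result, lower=2.0, upper=8.0).value)
```

With a single cutoff, a result of 5.0 would simply be called negative at a cutoff of 8.0; the gray-zone format instead routes it to repeat testing.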
3. How should quality specifications be defined? The outcome of the Stockholm conference was a consensus statement that recommends that a hierarchy of standards be used to define the quality requirement for a laboratory test [2]. This hierarchy includes clinical classification criteria for medically important changes in test results, biologic goals for imprecision and inaccuracy, proficiency testing criteria for the allowable total error, and state-of-the-art performance criteria. The use of clinical decision intervals with the clinical quality-planning model provides the highest form of quality standards.
4. How should quality specifications influence quality management practices? The NCCLS C24-A2 document describes how to plan a QC strategy on the basis of the quality required for a test [3]. The planning procedure takes into account the analytical performance factors observed for the method in the laboratory, as well as the statistical rejection characteristics of different control rules, with the objective of maximizing the detection of medically important errors and minimizing false rejections. The clinical OPSpecs chart provides a practical tool to implement the NCCLS planning guidelines.
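The rejection characteristics mentioned above can be approximated for a simple control rule. The sketch below is an illustration under stated assumptions, not the planning procedure of the guideline itself: it assumes Gaussian, independent control measurements with stable imprecision, and a rule that rejects a run when any one of N controls exceeds fixed SD limits. The function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rejection_probability(limit_sd, n_controls, shift_sd=0.0):
    """Chance that at least one of n_controls falls outside +/- limit_sd.

    shift_sd is the systematic shift of the process mean, in SD units.
    With shift_sd = 0 the result is the false-rejection probability;
    with a nonzero shift it is the probability of detecting that shift.
    """
    p_inside = normal_cdf(limit_sd - shift_sd) - normal_cdf(-limit_sd - shift_sd)
    return 1.0 - p_inside ** n_controls


p_false_reject = rejection_probability(3.0, 2)            # about 0.005
p_detect_3sd = rejection_probability(3.0, 2, shift_sd=3.0)  # about 0.75
print(f"false rejection: {p_false_reject:.4f}")
print(f"detection of a 3 SD shift: {p_detect_3sd:.2f}")
```

Comparing these two numbers across candidate rules and values of N is the basic trade-off that QC planning tries to optimize.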
5. What is the role of patient data in quality management? In the general structure of a multi-stage QC procedure, some measure of the central tendency (mean, mode, median) of the patient population is recommended as a monitor of the long-term stability of the measurement process. Patient data QC procedures are mainly useful for detecting systematic changes in the process. Given the high volume of testing performed in most neonatal screening laboratories, patient data should be useful as part of the overall QC measures. Speaking at the AACC/ACB conference on New Approaches to Quality Control, Howard Cuckle (Professor of Reproductive Epidemiology at the University of Leeds) recommended the use of the median MoM (multiple of median) as a potential patient population monitor, as well as ongoing estimates of the false positive rate and detection rate [4]. Six Sigma quality design can also be applied to patient data techniques, e.g., Average of Normals (AoN) patient data algorithms can be designed using a clinical quality requirement with the EZ Rules program.
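A median MoM monitor of the kind described above might look like the following sketch. The reference median, the 10% tolerance, and the function names are all hypothetical choices made for illustration; a laboratory would need to establish its own reference values and alert limits.

```python
from statistics import median


def multiples_of_median(results, reference_median):
    """Express each patient result as a multiple of the reference median."""
    if reference_median <= 0:
        raise ValueError("reference median must be positive")
    return [r / reference_median for r in results]


def median_mom_flag(results, reference_median, tolerance=0.10):
    """Return (median MoM, flagged).

    A stable process should give a median MoM near 1.0; the batch is
    flagged when the median drifts by more than the assumed tolerance.
    """
    mom = median(multiples_of_median(results, reference_median))
    return mom, abs(mom - 1.0) > tolerance


stable_batch = [1.8, 1.9, 2.0, 2.1, 2.2]
shifted_batch = [2.0, 2.2, 2.4, 2.6, 2.8]
print(median_mom_flag(stable_batch, reference_median=2.0))
print(median_mom_flag(shifted_batch, reference_median=2.0))
```

Because the median is insensitive to a few truly abnormal infants in a batch, a sustained drift in this statistic points to a systematic change in the measurement process rather than to the patients.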
All Six Sigma principles and goals, metrics and methods, and tools and technology should apply to the laboratory testing processes that are used in neonatal screening. If anything, the importance of preanalytical variables makes the clinical quality-planning model even more useful and valuable. The challenge is to develop receptivity for a quantitative quality management system. The driving force should be the laboratory's own consumer guidelines for test interpretation.
1. Define quality requirements in the form of clinical decision intervals. The development of an objective quality management system depends on defining the quality needed for each test. The strategy in neonatal screening should be to utilize the interpretive guidelines to define the quality requirement for each test. This can be done when interpretative guidelines are given in the form of two decision levels or cutoff points that define an intermediate gray zone. That gray zone has been called a clinical decision interval in previous work in clinical chemistry [6,7] where the test interpretation guidelines of the National Cholesterol Education Program have been utilized as clinical quality requirements.
An initial review of test interpretation guidelines available on the Internet has shown that the dual-cutoff gray-zone format is commonly, but not systematically, used. An individual laboratory may use gray zones on some tests and single cutoffs on others. Different laboratories may use different formats for a specific test; for example, one lab uses a gray zone for TSH and another uses a single cutoff for TSH.
The following examples are from the Wisconsin Newborn Screening Program (http://www.slh.wisc.edu/newborn/contents.html) and the Vermont Newborn Screening Program (http://www.vtmednet.org/vhgi/vhgi_mem/nbsman/intro.htm):

| Disorder | Test interpretation guideline | Clinical decision interval | Source |
| --- | --- | --- | --- |
| Phenylketonuria | Phenylalanine > 8 mg/dL: requires confirmatory test | 2-8 or 6 mg/dL | Vermont Newborn Screening Program |
| | Phenylalanine 5-8 mg/dL: requires repeat filter paper test | | |
| | Phenylalanine 2-4 mg/dL: requires repeat filter paper test | | |
| Congenital hypothyroidism | TSH < 30 uIU/ml: possible abnormal | 30-37 or 7 uIU/ml | Wisconsin Newborn Screening Program |
| | Age 26 to 96 hours, TSH > 37 uIU/ml: definite abnormal | | |
| Congenital Adrenal Hyperplasia | 17-OH > 135 ng/ml: definite | 115-135 or 30 ng/ml | Wisconsin Newborn Screening Program |
| | 17-OH 115-134 ng/ml: possible | | |
| Galactosemia | Galactose > 8 mg/dL: immediate treatment and confirmatory test | 4-8 or 4 mg/dL | Vermont Newborn Screening Program |
| | Galactose 4-8 mg/dL: repeat filter paper test | | |
| Biotinidase Deficiency | Enzyme activity < 10% adult: requires confirmatory test | 10-30% or 20% | Vermont Newborn Screening Program |
| | Enzyme activity 10%-30%: requires repeat filter paper test | | |
| Homocystinuria | Methionine > 2 mg/dL: requires confirmatory test | 1-2 mg/dL or 1 mg/dL | Vermont Newborn Screening Program |
| | Methionine 1-2 mg/dL: requires repeat filter paper test | | |
| Maple Syrup Urine Disease | Leucine > 4 mg/dL: requires treatment and a confirmatory test | 3-4 mg/dL | Vermont Newborn Screening Program |
| | Leucine 3-4 mg/dL: requires repeat filter paper test | | |

2. Translate clinical decision intervals into operating specifications. The specifications needed at the bench level are operating specifications that include the precision and accuracy that are allowable for the method and the quality control that is needed to detect changes in method performance. Because the clinical decision interval encompasses both analytical and preanalytical factors that influence the variability of a test result, preanalytic factors such as sampling variation and within-subject biological variation need to be taken into account. The existing clinical quality-planning model should be adaptable for neonatal screening because it includes several preanalytical terms that can accommodate sources of variability related to the collection of specimens on filter paper:
- Dint represents the decision interval, or gray zone;
- biasspec is a preanalytical bias term that might be used to characterize different sample volumes from different types of filter paper, e.g., a bias because a specimen is collected on 2992 filter paper and the calibrators are provided on 903 filter paper;
- sspec is a preanalytical variation term that might be used to describe the sampling variation due to collection on filter paper;
- nspec is the number of different specimens that are analyzed as part of one test, e.g., this term could accommodate the punching and analysis of 2 different blood spots.
In addition, the model contains terms to accommodate biological variation and, of course, analytical variation:
- sbiol represents the preanalytic biological variation, which could consider the group or population variation or the within-subject biological variation, depending on what is most appropriate for the application and the information available;
- ntest is the number of different tests that are averaged to provide a single result, e.g., if an infant was sampled at day 2 and then a repeat sample was obtained at a later date, individual variation could be reduced by averaging the results of the two tests;
- biasmeas is the analytical bias or inaccuracy of the method itself, which could be estimated from method validation experiments or from external quality assessment data;
- smeas is the analytical imprecision of the method itself, which is usually estimated from repeated analysis of stable control samples;
- nmeas is the number of replicate measurements, e.g., the number of analytical measurements performed on a single eluted blood spot.
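For orientation, the terms listed above might combine along the following lines. This particular arrangement is an assumption made for illustration and is not quoted from the published model: the multiplier z (a one-sided confidence factor), the way the n terms divide the variances, and the symbol ΔSE (the systematic shift, in multiples of smeas, that quality control must be able to detect) are all introduced here.

```latex
D_{int} = bias_{spec} + bias_{meas} + \Delta SE \cdot s_{meas}
        + z \sqrt{ \frac{s_{biol}^{2}}{n_{test}}
                 + \frac{s_{spec}^{2}}{n_{test}\, n_{spec}}
                 + \frac{s_{meas}^{2}}{n_{test}\, n_{spec}\, n_{meas}} }
```

Read this way, the bias terms and the random-variation terms together consume part of the decision interval, and whatever remains is the margin available for an undetected shift in the method.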
3. Provide computer support for quality design and control. The difficulty of using a clinical decision interval as the quality requirement is that the quality-planning model requires computer support to perform the complicated calculations correctly. The computer screen below illustrates an application for phenylalanine. A critical concentration for test interpretation is 8 mg/dL, according to the Vermont Newborn Screening Program, where a value of 8 or greater is taken as a positive test. Analytical performance factors are shown as 9.0% for the method CV and 4.0% for the method bias. The Vermont program recommends interpretation limits of 2 to 8 mg/dL, a width of 6 mg/dL at a decision level of 8 mg/dL, giving a value of 75% (6/8) for Dint. For illustrative purposes, the preanalytic factors are defined as biasspec = 5%, swsub = 15% (the within-subject biological variation, entered for the sbiol term), and sspec = 10%. The OPSpecs chart that follows shows the selection of a QC procedure having 3 SD limits and an N of 2.
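The arithmetic behind this phenylalanine example can be sketched in Python. The calculation below is an assumed reading of the model, not the software's actual algorithm: it subtracts the two biases and a z-multiple (z = 1.65 assumed) of the combined random variation from the decision interval, expresses the remainder in multiples of the method CV, and then adds 1.65 to obtain a Sigma-metric. Under these assumptions the result happens to match the Sigma-metric of 5.29 quoted later in the text, which is suggestive but not a confirmation. All function and parameter names are hypothetical.

```python
import math


def decision_interval_percent(lower, upper, decision_level):
    """Width of the gray zone as a percentage of the decision level."""
    return 100.0 * (upper - lower) / decision_level


def critical_systematic_error(d_int, bias_spec, bias_meas, s_wsub, s_spec,
                              s_meas, z=1.65, n_test=1, n_spec=1, n_meas=1):
    """Margin left for an undetected shift, in multiples of s_meas.

    All inputs are percentages. The nesting of the n terms is an
    assumption; with every n equal to 1 it has no effect on the result.
    """
    variance = (s_wsub ** 2 / n_test
                + s_spec ** 2 / (n_test * n_spec)
                + s_meas ** 2 / (n_test * n_spec * n_meas))
    remaining = d_int - bias_spec - bias_meas - z * math.sqrt(variance)
    return remaining / s_meas


d_int = decision_interval_percent(2.0, 8.0, 8.0)  # 6 mg/dL at 8 mg/dL
delta_se = critical_systematic_error(d_int, bias_spec=5.0, bias_meas=4.0,
                                     s_wsub=15.0, s_spec=10.0, s_meas=9.0)
sigma = delta_se + 1.65  # assumed relation between the two quantities
print(f"Dint = {d_int:.0f}%, critical shift = {delta_se:.2f}, sigma = {sigma:.2f}")
```

Changing n_spec to 2, for example, shows how punching and analyzing a second blood spot would enlarge the margin by shrinking the sampling and analytical variance terms.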
While there are obvious differences between neonatal screening and the more routine tests performed in healthcare laboratories, many of the problems and difficulties are similar and the same quality management approaches and techniques apply. The formulation of a quality management system in a neonatal screening laboratory can profit from the recent developments in hospital clinical laboratories and the methods and metrics of Six Sigma Quality Management. Some adaptations are needed to account for the collection of whole blood samples on filter paper, but existing quality-planning models, computer support, and training materials provide a solid foundation for optimizing quality management practices for neonatal screening applications.
There is a need to define the experimental protocols and data that are needed to provide the proper measures of the various bias and imprecision terms in the clinical quality-planning model. In particular, what is the proper way to characterize the variation due to filter paper samples? The analysis of control materials from prepared spots on filter paper includes the minimal sampling variation from well-controlled spotting, as well as the analytical variation of the method. Should those sources of variation be separated? Should the additional variation from specimen spotting under realistic sampling conditions be estimated and included?
Introduction of a new quality management approach will require detailed applications to guide individual laboratories and analysts through the quality-planning process. A series of example applications should be provided based on the expected performance of products that provide the various screening tests. Quality requirements can be taken from the published interpretative guidelines of various neonatal screening laboratories.
Given the variety of backgrounds, experience, and training of laboratory analysts, there is a need for training in analytical quality management, including statistical quality control, method validation, and quality planning. Along with training, analysts need practical data calculation and plotting tools to support their everyday activities. This means some form of computer support, either by PC, LIS, or the Internet. The EZ Rules program provides the most practical form of support for neonatal screening applications because it is the easiest to learn and the easiest to use. The phenylalanine example discussed earlier is shown below. The step-by-step methodology guides users through the data input while providing instructions at the same time.
Step 1: Enter the Test Name, Units, and System
Step 2: Select the Control Rules of Interest
Step 3: Select the Type of Quality Requirement
Step 4: Enter the Decision Level
Step 5: Enter the Clinical Decision Interval
Step 6: Enter the Biologic Variation as a %CV
Step 7: Enter the Pre-analytic factors
Step 8: Enter the method imprecision
Step 9: Enter the method bias
Step 10: Select method stability
Step 11: Select number of control materials
Step 12: View Critical-Error Graph with Sigma-Metric
Please note that this Sigma-metric is part of a special version of our EZ Rules(tm) software. Here, the Sigma-metric of the method is 5.29, which is good. This is why the method is easily controllable with the 1_3s control rule (control limits set at 3 SD).
1. Guthrie R, Susi A. A simple phenylalanine method for detecting phenylketonuria in large populations of newborn infants. Pediatrics 1963;32:338-343.
2. Strategies to set global analytical quality specifications in laboratory medicine. Scand J Clin Lab Invest 1999;59:475-586.
3. NCCLS. Statistical Quality Control for Quantitative Measurements: Principles and Definitions; Approved Guideline, Second Edition. NCCLS Document C24-A2. NCCLS, 940 West Valley Road, Suite 1400, Wayne, PA, USA, 1999.
4. AACC/ACB New Approaches to Quality Control workshop manual. May 11-12, 2000, Chicago, IL.
5. NCCLS. Quality Management for Unit-Use Testing; Proposed Guideline. NCCLS Document EP-18. NCCLS, 940 West Valley Road, Suite 1400, Wayne, PA, USA, 1999.
6. Westgard JO, Hytoft Petersen P, Wiebe DA. Laboratory process specifications for assuring quality in the U.S. National Cholesterol Education Program (NCEP). Clin Chem 1991;37:656-661.
7. Fallest-Strobl PC, Olafsdottir E, Wiebe DA, Westgard JO. Comparison of NCEP performance specifications for triglycerides, HDL, and LDL cholesterol with operating specifications based on NCEP clinical and analytical goals. Clin Chem 1997;43:2164-2168.
