Tools, Technologies and Training for Healthcare Laboratories

QC Past, Present and Future

Those who don't learn from the past are condemned to repeat it. That saying is as applicable to QC practices as it is to the lessons of history. At the 2011 AACC/ASCLS convention, Dr. Westgard reviewed the history of quality control in laboratories, as well as its present problems and possible futures.

Historical Perspective on Laboratory QC:  Where we’ve been and where we’re going!

James O. Westgard, PhD
September 2011

At the 2011 AACC/ASCLS National Meeting in Atlanta, I presented a short history of QC at the Bio-Rad workshop on “Quality Control for the Future.”  I thought people might find this rather boring, but was surprised that many of today’s laboratory scientists don’t know where QC came from and how it has been adapted to changes in technology.  This history is important to guide us into the future and prevent us from making the same mistakes again.  For example, during the exhibits, we got a number of questions from laboratory scientists about some presentations and posters on QC.  One concerned QC applications in multiplex testing and another on the use of 2s control limits and the practice of repeating controls.  The solution to those problems may be guided by the solutions to past problems with simultaneous multi-test analyzers.  We’ll come back to those particular problems in later discussions, but now is the time for history!

In the beginning there was Shewhart!

Industrial QC was introduced in the 1930s by Shewhart, who was a statistician at Bell Laboratories.  His classic text “Economic Control of Quality of Manufactured Product” provided the theory and practice guidelines for statistical quality control [1].  The recommended technique was to sample a group of products and determine the mean and range of critical characteristics.  Thus, the first tools for SQC were actually mean and range charts, where the average of several measurements was plotted on the mean chart and the range (the difference from the maximum to the minimum of the measurements) was plotted on the range chart.  That technique is still standard practice in industry today.
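
For readers who have not seen these industrial charts, here is a minimal Python sketch of how mean (X-bar) and range (R) chart limits are calculated from subgroups of measurements.  The data and subgroup size are hypothetical; A2, D3, and D4 are the standard control-chart factors for subgroups of four.

```python
# Minimal sketch: Shewhart mean (X-bar) and range (R) chart limits
# computed from subgroups of measurements (hypothetical data).
import statistics

A2, D3, D4 = 0.729, 0.0, 2.282   # standard factors for subgroup size n = 4

def xbar_r_limits(subgroups):
    """Return (mean-chart limits, range-chart limits) from a list of subgroups."""
    xbars = [statistics.mean(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    grand_mean = statistics.mean(xbars)
    rbar = statistics.mean(ranges)
    mean_chart = (grand_mean - A2 * rbar, grand_mean, grand_mean + A2 * rbar)
    range_chart = (D3 * rbar, rbar, D4 * rbar)
    return mean_chart, range_chart

# Example: subgroups of four measurements of some product characteristic
subgroups = [[99.8, 100.2, 100.1, 99.9],
             [100.3, 100.0, 99.7, 100.1],
             [99.9, 100.4, 100.0, 99.8]]
print(xbar_r_limits(subgroups))
```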

In the 1940s, Deming (who worked with Shewhart at one time) was charged with providing training in SQC to American industry to assure the quality of wartime production [2].  One of the strengths of American armaments was the quality of production and SQC became widely practiced in American industry.

In the late 40s and 1950s, Deming was asked to assist Japanese industry in improving the quality of production, particularly telephones, which were needed to improve communications.  In addition, Juran began to provide broader training in quality management [3].  Their efforts led to the principles and practices of Total Quality Management and Continuous Quality Improvement in industry.

Along came Levey and Jennings and 1st Generation QC

Also, in 1950, two pathologists – Levey and Jennings – introduced SQC in medical laboratories [4].  The practice was to utilize a patient sample and perform duplicate measurements.  Of course, the mean of these duplicates showed a lot of variation because of differences from patient to patient, but the range gave a measure of precision and a way to monitor what were then strictly manual measurement processes.

The limitation due to patient variation was quickly overcome by Henry and Segalove, who in 1952 recommended that patient pools be used as control samples to provide a specimen that was stable for a longer time [5].  With a stable control sample, they also recommended the use of an individual measurement, rather than duplicates, which is really the basis of today’s practice of “single value” QC.  Thus, the well-known Levey-Jennings chart is actually Henry and Segalove’s adaptation for use with single control measurements.   Common practice was to analyze a single control and utilize 2s control limits.
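
In today’s terms, that single-value practice amounts to flagging each new control result against limits set at the mean plus or minus 2 SD of prior control data.  A small Python sketch, with made-up control values, illustrates the calculation:

```python
# Minimal sketch: Levey-Jennings style single-value QC with 2s limits
# (baseline control values below are hypothetical).
import statistics

def lj_limits(baseline_values, z=2.0):
    """Compute mean and +/- z*SD control limits from baseline control data."""
    mean = statistics.mean(baseline_values)
    sd = statistics.stdev(baseline_values)
    return mean - z * sd, mean, mean + z * sd

def in_control(value, limits):
    low, _, high = limits
    return low <= value <= high

baseline = [100.2, 99.5, 101.1, 100.7, 99.9, 100.3, 98.8, 100.9,
            99.6, 100.4, 101.0, 99.2, 100.1, 100.6, 99.8, 100.0,
            100.5, 99.4, 100.8, 99.7]          # e.g., 20 prior control results
limits = lj_limits(baseline)                    # 2s limits by default
print(in_control(102.3, limits))                # False -> flag the run
```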

Then there was automation

In the 1960s, automation made its way into clinical laboratories. SQC became standard practice and manufacturers began to provide stable commercial controls to support routine laboratory QC.

The Technicon AutoAnalyzer was actually invented in 1951 by Leonard Skeggs, a biochemist who was working on renal dialysis [6].  Because he was also involved in a clinical laboratory, he became interested in how to automate blood testing, and his invention of the “dialyzer” was the breakthrough that allowed automation of the protein separation step of the testing process.  While relatively few AutoAnalyzers were produced in the 50s, they became the standard of practice in the 1960s as laboratory testing started to boom.

With automation, QC also became standard practice.  Keep in mind that the AutoAnalyzer was a continuous flow device, meaning that it was basically a pump with different sizes of tubing for the sample and reagents.  The speed of analysis was typically 40 samples per hour, which led to the practice of “batch testing,” where standards were loaded up front, followed by some controls, then patients, and at the end of the run, more controls.  You can appreciate that the pumping process would wear on the tubing, leading to drift over time, so batch control was essential for monitoring the stability of testing, even over the relatively short time of an hour.

Single-test AutoAnalyzers were quickly coupled together to automate high volume tests (glucose and BUN) and electrolytes.  Soon Simultaneous Multichannel Analyzers (the SMA series by Technicon) became the production workhorses of laboratories.  They grew in size from 6 channels to 12 channels to 20 channels during the 60s and 70s.  Meanwhile, QC practices continued to use 2 SD control limits, and difficulties arose in many laboratories due to the inherent false rejections.

Everyone knows that there is a probability of approximately 0.05, or a 5% chance, of exceeding 2 SD control limits even when a testing process is working properly.  As the number of controls increases, the probability for false rejection (Pfr) also increases.  With 2 controls, Pfr is about 0.095 or 9.5%, with 3 controls about 0.14 or 14%, and with 4 controls about 0.18 or 18%.  What is less well understood is that with simultaneous analysis, the number of different test channels causes the same effect.  As the number of test channels on SMA systems increased, the chance of a false rejection (on at least one test channel) also increased, thus the productivity of the analyzers was compromised by the need to frequently repeat at least one test in the panel.
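
The arithmetic behind those numbers can be sketched in a few lines of Python.  The calculation assumes each control (or test channel) is an independent comparison with roughly a 5% chance of exceeding its 2 SD limits, which gives values close to those quoted above:

```python
# Sketch: chance of at least one false rejection among n independent
# controls or test channels, each with ~5% chance of exceeding 2 SD limits.
def p_false_reject(n, p_single=0.05):
    """Probability that at least one of n independent results exceeds its limits."""
    return 1 - (1 - p_single) ** n

for n in (1, 2, 3, 4, 20):
    print(n, round(p_false_reject(n), 3))
# prints roughly 0.05, 0.10, 0.14, 0.19, and 0.64
```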

At the Bio-Rad workshop, I told the story about realizing that with a 20 channel analyzer and 1 SD control limits, we almost always had to repeat at least one test.  I figured that if we increased the number of controls to 2, we could probably just continue to analyze one original set of specimens forever, thus saving much time and money because we would never have to collect more patient specimens.  The story was supposed to be a joke, but few people understood what I was talking about, which is evidence that laboratorians today don’t know this history (and also shows how old I am) and that we’ll probably make some of these same mistakes again.  For example, the current difficulties with QC for multiplex analysis share this problem of false rejections for simultaneous measurements.

2nd generation QC

In 1976-77, while on sabbatical leave at Uppsala University in Sweden, I started to study industrial QC practices to understand the origin of laboratory practices and also to learn how laboratory practices might be improved.  Fresh from the experiences with the SMA analyzers, it was obvious that 2 SD limits had to be eliminated, but what should replace them?  It was standard practice in industry to use 3 SD limits and consider 2 SD only as a warning rule, which was also Shewhart’s original recommendation.  And while 3 SD limits kept the false rejections low, there was a danger that error detection might not be sufficient for the intended use of medical tests.

So, it seemed appropriate to use 2 SD as a warning rule, 3 SD as a rejection rule, and then consider other rules that could improve error detection.  Through computer simulation studies, we were able to characterize the rejection characteristics of most of the rules being employed in industry, then recommended a way to optimize QC performance by first minimizing false rejections, then building up error detection by use of a series of control rules [7].  A paper showing a detailed example was published in 1981 and described the use of a “Shewhart Multirule Control Chart” [8].
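
To give a concrete (and deliberately simplified) picture of how such a multirule procedure works, the sketch below applies a few of the commonly used rules to a series of control z-scores.  It is only an illustration for a single control material, not the complete published algorithm; in practice a 2 SD “warning” triggers inspection by the other rules rather than rejection on its own.

```python
# Simplified sketch (not the complete published algorithm): a few common
# multirule criteria applied to control z-scores, z = (value - mean) / SD,
# for a single control material, newest result last.

def multirule_reject(z):
    """Return the first violated rule, or None if the run is accepted."""
    last = z[-1]
    if abs(last) > 3:                                   # 1_3s: one value beyond 3 SD
        return "1_3s"
    if len(z) >= 2 and z[-1] > 2 and z[-2] > 2:         # 2_2s: two consecutive values
        return "2_2s"                                   #       beyond 2 SD, same side
    if len(z) >= 2 and z[-1] < -2 and z[-2] < -2:
        return "2_2s"
    if len(z) >= 2 and abs(z[-1] - z[-2]) > 4:          # R_4s: range of last two exceeds 4 SD
        return "R_4s"
    if len(z) >= 4 and all(v > 1 for v in z[-4:]):      # 4_1s: four consecutive beyond 1 SD,
        return "4_1s"                                   #       same side of the mean
    if len(z) >= 4 and all(v < -1 for v in z[-4:]):
        return "4_1s"
    if len(z) >= 10 and (all(v > 0 for v in z[-10:]) or # 10_x: ten consecutive on the
                         all(v < 0 for v in z[-10:])):  #       same side of the mean
        return "10_x"
    return None

print(multirule_reject([0.4, -1.1, 2.3, 2.1]))   # -> "2_2s"
```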

During the 80s, many laboratories implemented multirule QC as automated systems and laboratory information systems provided the necessary QC software.  Technicon was the first instrument manufacturer to do so and popularized the name “Westgard Rules.”  One reason for the widespread application was that the Westgard Rules were in the public domain, having been published in the scientific literature and being covered only by “copyright” protection.  That was still at a time when the purpose of university research was to solve problems, rather than to protect discoveries and developments to make money.

Multirule QC can be considered as 2nd generation QC for automated analyzers, following the 1st generation Levey-Jennings QC for manual methods.  The practice was to use multirule QC uniformly for all tests, just like Levey-Jennings QC had been applied uniformly to all tests.

3rd generation QC

During this same time, analytic systems were improving dramatically.  In the 80s, the DuPont ACA ushered in a new era of stability for automated random access analyzers.  Never before had analyzers been stable over an entire day, to say nothing of many days and even weeks.  While we ran controls diligently based on past experiences, it was apparent that this new analytic system needed less frequent control and that we needed to adapt our QC practices to improvements in technology.  The ACA taught us that lesson and led to the use of different QC procedures on different analytic systems.  This was the start of efforts to select or design QC procedures to fit the performance characteristics of different analytical systems and technologies, which might be considered 3rd generation QC.

TQM and 4th Generation QC

Fortunately, Total Quality Management (TQM) was emerging in American industry in the 1980s in an effort to compete with the high quality production of Japanese industry.  Deming and Juran were now leading American industries to implement the same techniques that had evolved in Japanese industries.  The effort was broader than SQC and particularly emphasized management responsibilities and commitments to quality.  Quality was now variously defined as the “totality of features and characteristics of a product or service that bear on its ability to satisfy given needs” [American Society for Quality, ASQ], “fitness for use” [Juran], “conformance to requirements” [Crosby,9], and “satisfying the needs of customers” [Deming].  All these definitions focused attention on “intended use” and the need to understand the customer’s requirements in order to provide the appropriate quality.

At this time, satisfying customer requirements depended on the initial validation of method performance; QC was then applied to monitor the performance available from the method.  While we had developed method validation protocols and criteria for judging performance in relation to quality requirements in the form of an “allowable Total Error” (TEa) [10-11], SQC procedures were not yet optimized for the quality required for a test and the precision and bias observed for a method.  We first outlined the general methodology to do this in 1986 in the book “Cost-Effective Quality Control,” which applied the principles of TQM to the practices of a medical laboratory [12].

By the 90s, there were high stability, high precision, high throughput random access analyzers, such as the Hitachi series.  In addition to adapting QC to the analytic system, it became apparent that different QC procedures (rules, N) were appropriate for different tests performed on the same system.  The principles of TQM guided us to optimize the performance of QC procedures on the basis of the quality required for the intended use of the test and the precision and bias observed for the particular method [13].  The improved throughput due to decreased false rejection rates demonstrated the cost-effectiveness of optimized QC designs [14].

Six Sigma and 5th Generation QC

A strong push for improved QC design was provided by Six Sigma Quality Management, which was introduced in industry in the 90s and came into practice in healthcare organizations and laboratories by the end of the decade [15].  Six Sigma emphasized the need to define “tolerance limits” to describe intended use, set a goal of 6-sigma for “world class quality,” and provided a uniform way of describing quality in terms of defects, defect rates, defects per million (DPM), and the sigma-scale itself.  A “sigma-metric QC selection tool” readily evolved from an earlier “critical-error” tool and was eventually included in the CLSI C24A3 guidance for “Statistical Quality Control for Quantitative Measurements” [16].  Thus, standard QC planning tools became available in the forms of manual tools, as well as computer programs.
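
The core of that sigma-metric tool is a simple calculation.  The sketch below shows the sigma metric, Sigma = (TEa − |bias|) / CV, together with the related critical systematic error as commonly defined in the QC planning literature; the numbers in the example are hypothetical.

```python
# Sketch of the sigma-metric calculation underlying the QC selection tools
# (percent units assumed throughout; example values are hypothetical).
def sigma_metric(tea_pct, bias_pct, cv_pct):
    """Sigma = (allowable total error - |bias|) / CV."""
    return (tea_pct - abs(bias_pct)) / cv_pct

def critical_systematic_error(tea_pct, bias_pct, cv_pct, z=1.65):
    """Size of the systematic shift (in SD units) that must be detected;
    z = 1.65 corresponds to a maximum 5% defect rate."""
    return sigma_metric(tea_pct, bias_pct, cv_pct) - z

# Hypothetical example: TEa = 10%, bias = 1.5%, CV = 2.0%
print(sigma_metric(10, 1.5, 2.0))              # 4.25 sigma
print(critical_systematic_error(10, 1.5, 2.0)) # 2.60 SD shift to be detected
```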

During this time, Parvin worked to improve QC design, particularly considering the frequency of QC [17].  He demonstrated the need to structure QC around known and unknown events.  Known events refer to changes in a measurement process that are known at the time they occur, e.g., a change in the bottle of reagent, maintenance of the analyzer, replacement of a mechanical part, etc.  QC samples should be scheduled for such known events, plus some samples should be analyzed routinely between events to detect unknown changes that may occur.  Consideration of practices for the continuous reporting of test results (vs batch reporting) also required ongoing process monitoring with controls.  Thus, a strategy for multi-stage QC emerged for 5th generation QC, wherein different QC designs were employed at different times during the routine operation of an analytical process [18].  One recommended strategy was to employ a “Startup” QC procedure having high error detection at the beginning of a run, then a “Monitor” design having low false rejection throughout the run, and finally adding patient data QC, such as Average of Normals (AoN) algorithms, to measure the length of the run [19].
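
Patient-data QC of that kind can be pictured with a small sketch: average the patient results that fall within truncation limits over a moving window and flag the run when that average drifts.  Everything here (the truncation limits, target, allowable shift, and window size) is hypothetical and chosen only for illustration.

```python
# Illustrative sketch (parameters hypothetical): a simple Average of Normals
# (AoN) style patient-data check.
from collections import deque

class AverageOfNormals:
    def __init__(self, low, high, target, allowed_shift, window=20):
        self.low, self.high = low, high           # truncation limits for patient results
        self.target = target                      # expected mean of truncated results
        self.allowed_shift = allowed_shift        # flag if the average drifts this far
        self.window = deque(maxlen=window)

    def add_result(self, value):
        """Add one patient result; return True if the AoN check flags a shift."""
        if self.low <= value <= self.high:        # keep only "normal-range" results
            self.window.append(value)
        if len(self.window) == self.window.maxlen:
            avg = sum(self.window) / len(self.window)
            return abs(avg - self.target) > self.allowed_shift
        return False

# Hypothetical use for a sodium channel: truncate to 135-145 mmol/L
aon = AverageOfNormals(low=135, high=145, target=140.0, allowed_shift=1.5, window=5)
for result in (143, 144, 142, 143, 144):      # a run of slightly elevated patients
    print(aon.add_result(result))             # flags (True) once the window fills
```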

CLIA Quality Compliance and “Equivalent QC”

In 1988, the Clinical Laboratory Improvement Amendments (CLIA) became law, followed in 1992 by the CLIA rules and regulations to implement that law.  However, one part of the law that required the FDA to “clear” manufacturers’ QC recommendations or instructions was delayed, hence the CLIA law provided a temporary minimum standard of analyzing at least two levels of QC per run, or 2 levels every 24 hours.  This temporary minimum QC was extended approximately every 2-3 years until 2003, when the provision for QC clearance was eliminated in the Final, Final, Final, Final, Final CLIA Rule [20].  By that time, the minimum QC of 2 levels per day had become the de facto standard of practice, setting back the advancement of QC practices to 1st generation Levey-Jennings single-rule QC in many laboratories.

During the 90s, POC devices had become popular and CMS was having difficulties enforcing even the minimum of 2 levels per day.  Manufacturers argued that built-in controls should suffice and CMS temporarily allowed “electronic QC” to substitute for the use of liquid controls in such devices until the QC clearance provision was implemented.  Electronic QC was really nothing new!  There had been similar instrument checks in use for many years, but never as a substitute for an independent surrogate control sample.  For example, the DuPont ACA utilized a daily “filter balance” procedure, which was an electrical check that preceded the routine analysis of patient samples and control materials.  However, the difficulties of implementing SQC in many POC applications led to the dependence primarily on the manufacturer’s built-in controls.

Given that QC clearance wasn’t implemented in the Final, Final, Final, Final, Final Rule, CMS came up with a new remedy for POC applications by requiring laboratories to perform validation protocols to lower the frequency of analyzing surrogate controls from two levels per day to two levels per week or possibly even two levels per month.  This was called “equivalent QC,” conveniently relating, but also confusing, this new practice with “electronic QC.”  CMS published the new EQC guidelines in the State Operations Manual (SOM) [21] and described the validation protocols for qualifying a device for reduced frequency of QC.  Unfortunately, these validation protocols were themselves not valid because the time periods were too short.  For example, for devices with built-in controls that monitor the whole testing process, the protocol called for testing external controls along with the built-in controls for a period of 10 days, then if no out-of-control problems had been observed, the frequency of surrogate controls could be reduced to once every 30 days.  Clearly 10 days of observed stable operation does not assure the device will be stable for 30 days, but it still qualifies the device for monthly EQC.

QC for the Future – 1st or 6th Generation?

The problems with “Equivalent QC” were obvious to both industry and laboratory users. In March, 2005, CLSI convened a conference on “Quality Control for the Future” [22].  The pre-ordained outcome was the formation of a CLSI committee to develop a new guidance document to deal with the problems arising from EQC.  After some 6 years, we are still awaiting the publication of that guidance.  A proposed document titled “Laboratory Quality Control Based on Risk Management” [23] was published in 2010 and a final “Approved” version is expected by the end of 2011.

Risk analysis involves the systematic review of an analytical measurement process to identify all possible failure modes.  Then the risk of each failure mode is estimated based on its probability of occurrence, its detectability if it occurs, and the severity of harm if it goes undetected.  High risk failure modes are identified, then a strategy is developed to minimize their harm by preventing occurrence and/or optimizing detection (and the corrective actions necessary for recovery).  In a medical laboratory, the primary option for risk mitigation is to improve detection, which means implementing appropriate control mechanisms for each of the high risk failure modes.  These control mechanisms are assembled in a “QC Plan” that prescribes their operation, frequency, and the corrective actions to be taken for recovery.  The “residual risk” of the QC Plan needs to be evaluated in order to determine its acceptability for the intended use of the laboratory test.  Once the plan is implemented, the quality and performance of the testing process must be monitored to identify failures that require improvements.
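
One common industrial way of combining those three factors is a risk priority number (RPN), the product of scores for occurrence, severity, and detectability.  The sketch below ranks a few failure modes this way; the failure modes and scores are invented for illustration and are not taken from any guideline.

```python
# Illustrative sketch (scores hypothetical): ranking failure modes by a
# risk priority number, RPN = occurrence x severity x detectability,
# a common industrial FMEA scoring scheme.  Higher scores = higher risk.
failure_modes = [
    # (failure mode, occurrence 1-10, severity 1-10, detectability 1-10;
    #  detectability is scored high when the failure is HARD to detect)
    ("Reagent lot degradation",        4, 7, 6),
    ("Sample clot / short sample",     6, 8, 3),
    ("Calibration drift",              3, 8, 5),
    ("Operator skips maintenance",     2, 6, 7),
]

ranked = sorted(failure_modes, key=lambda fm: fm[1] * fm[2] * fm[3], reverse=True)
for name, occ, sev, det in ranked:
    print(f"{name:32s} RPN = {occ * sev * det}")
```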

The tool for performing risk analysis is called an FMEA (Failure Modes and Effects Analysis), which has a long history of use in industry but is totally new to medical laboratories.  Therefore, there will be a steep learning curve if risk analysis is to provide a reliable approach for developing laboratory QC Plans.  Current ISO and CLSI guidelines describe qualitative methodologies with arbitrary decisions on the acceptability of residual risks.  A more quantitative methodology is possible through the integration of Six Sigma concepts and metrics [24], but that makes the risk analysis process even more demanding and mainly suitable for laboratories performing moderate and high complexity tests, not the POC devices that were the intended use for EQC and the motivation for development of the CLSI risk management QC guideline [23].

Risk analysis is not likely to replace EQC unless CMS eliminates the EQC options in the State Operations Manual.  The time and effort for the EQC validation is much less than what will be required to perform risk analysis, thus the real benefits of risk analysis will be for complex analytic systems where the development of a detailed QC Plan is necessary to adequately monitor critical failure modes.  It may be advantageous to manufacturers who want to document the appropriateness of “alternative QC” approaches and to laboratories that need to provide comprehensive QC Plans for Laboratory Developed Tests (LDTs).  It will facilitate more comprehensive monitoring of pre-analytic and post-analytic failure modes as part of routine QC.  Implementation will be aided by Auto-Verification software, but it will be critical to make sure that SQC is still part of the auto-verification rules that are implemented.

What to do?

If the current EQC options remain, then we’re back to square one - 1st generation QC - for those laboratories that are guided by regulations for compliance!  If comprehensive QC Plans are properly developed, they will lead to improvements and 6th generation QC!  Such an advancement is needed for today’s highly complex automated analytic systems, as well as for new complex measurement technologies where traditional SQC procedures need to be supplemented to provide comprehensive monitoring of pre-analytic, analytic, and post-analytic failure modes.

We’re at a fork in the road, with a choice of taking the high road or the low road.  Overcoming the last decade of quality compliance may now be a major problem in moving forward with improved quality systems, both for manufacturers and for medical laboratories.  Risk analysis offers a new approach for making improvements in quality systems, but it is also an approach that can be easily abused and misused to provide only an appearance of a quality system.  The problems with risk analysis on Wall Street and with deep water oil wells should be fresh in our minds and make us aware that risk analysis is itself risky business.

References

  1. Shewhart WA. Economic Control of Quality of Manufactured Product. New York: D Van Nostrand Company, 1931.
  2. Deming WE. Quality, Productivity, and Competitive Position. Boston: MIT Center for Advanced Engineering Study, 1982.
  3. Juran JM. Managerial Breakthrough. New York: McGraw-Hill Book Co, 1964.
  4. Levey S, Jennings ER. The use of control charts in the clinical laboratory. Am J Clin Pathol 1950;20:1059-66.
  5. Henry RJ, Segalove M. The running of standards in clinical chemistry and the use of the control chart. J Clin Pathol 1952;5:305-11.
  6. Lewis LA. Leonard Tucker Skeggs – A multifaceted diamond. Clin Chem 1981;27:1465-68.
  7. Westgard JO, Groth T, Aronsson T, Falk H, deVerdier C-H. Performance characteristics of rules for internal quality control: Probabilities for false rejection and error detection. Clin Chem 1977;23:1857-67.
  8. Westgard JO, Barry PL, Hunt MR, Groth T.  A multi-rule Shewhart chart for quality control in clinical chemistry.  Clin Chem 1981;27:493-501.
  9. Crosby PB. Quality is Free. New York: New American Press, 1979.
  10. Westgard JO, Hunt MR.  Use and interpretation of statistical tests in method-comparison studies. Clin Chem 1973;19:49-57.
  11. Westgard JO, Carey RN, Wold S. Criteria for judging precision and accuracy in method development and evaluation. Clin Chem 1974;20:825-33.
  12. Westgard JO, Barry PL. Cost-Effective Quality Control: Managing the quality and productivity of analytical processes.  Washington DC:AACC Press, 1986.
  13. Koch DD, Oryall JJ, Quam EF, Feldbruegge DH, Dowd DE, Barry PL, Westgard JO. Selection of medically useful QC procedures for individual tests on a multi-test analytical system.  Clin Chem 1990;36:230-3.
  14. Westgard JO, Oryall JJ, Koch DD. Predicting effects of QC practices on the cost-effective operation of a multitest analytic system. Clin Chem 1990;36:1760-4.
  15. Westgard JO.  Six Sigma Quality Design & Control: Desirable precision and requisite QC for laboratory measurement processes.  Madison WI:Westgard QC, 2001.
  16. CLSI C24A3. Statistical Quality Control for Quantitative Measurements, 3rd ed. Clinical Laboratory Standards Institute, Wayne PA, 2006.
  17. Parvin CA, Gronowski AM. Effect of analytical run length on quality control (QC) performance and the QC planning process. Clin Chem 1997;43:2149-54.
  18. Westgard JO. Assuring the Right Quality Right: Good laboratory practices for verifying the attainment of the intended quality of test results.  Madison WI:Westgard QC, 2007.
  19. Westgard JO, Smith FA, Mountain PJ, Boss S.  Design and assessment of average of normal (AON) patient data algorithms to maximize run lengths for automatic process control. Clin Chem 1996;42:1683-8.
  20. US Centers for Medicare & Medicaid Services (CMS). Medicare, Medicaid and CLIA Programs. Laboratory Requirements Relating to Quality Systems and Certain Personnel Qualifications. Final Rule. Fed Regist Jan 24 2003;68:3640-3714.
  21. CMS State Operations Manual Appendix C. Regulations and Interpretive Guidelines for Laboratories and Laboratory Services. http://www.cms.hhs.gov/CLIA/03_Interpretive_Gluidelines_for_Laboratories.asp#TopOfPage
  22. Quality Control for the Future. LabMedicine 2005;36:609-640.
  23. CLSI EP23P.  Laboratory Quality Control Based on Risk Management. Clinical Laboratory Standards Institute, Wayne PA, 2010.
  24. Westgard JO. Six Sigma Risk Analysis: Designing analytic QC Plans for the medical laboratory.  Madison WI:Westgard QC, 2011.