Ten Ways to do the Wrong QC Wrong

Think you're the only one who doesn't do QC perfectly? You're not alone. In this article, we look at numerous examples from readers, where the best intentions have gone astray. By seeing how QC practices go wrong in the real world, we can learn what we need to do better.

Detecting (and avoiding) errors of the third kind

In an earlier essay, we described the difference between the “ease” of theoretical QC implementation and the complications of QC implementation in the “real world” laboratory. Outside the confines of a textbook or a website, laboratory professionals face multiple layers of problems when they approach QC; they not only have to interpret the QC rule correctly, they have to choose the correct QC rule, and they must base that rule choice on the correct data. No wonder then, that we often find laboratories not only doing the wrong QC, but doing the wrong QC wrong.

With this background and perspective, let’s look at some of those real world situations. We’re going to look at several of the out-of-the-blue, over-the-transom questions we receive at Westgard Web and dissect them into their component problems. Let’s see if and how they fit into the “wrong QC wrong” model and how we might help identify the real problem.

Important Note: Anonymity has been preserved.

Names have been removed, unimportant details have been changed, and even the writing patterns have been altered to protect the identities of these questioners. We don’t want to discourage any feedback from customers and readers. The purpose of this essay is to examine typical real-world scenarios and discover common problems that all laboratories are facing.

Scenario #1:

“We are part of a laboratory network of approximately 20 labs, which use the same instruments and we are compared to one another. I use the Mean of the lab group then 2/3 of their SD's for my labs' Mean and SD's, and then adjust them through the first few months, as long as we stay in the groups' 2 SD. I would like your option of my use of the numbers.”

Here we see the problem with the use of the peer group statistics. Good information is being used for the wrong purpose. The peer group statistics should be used to compare performance between laboratories, for example, to determine bias – how each individual laboratory compares to the group, but shouldn’t be used to calculate the control limits for the individual laboratories. Each lab should use its own data to determine its own means and SDs for control limits.

The adjustment of the SDs, e.g. using 2/3 of the group SD, supports the idea that the group SD is too large for an individual laboratory, but use of an arbitrary factor to reduce the SD is just a guess. It’s best to get the means and SDs right for the individual laboratory. Otherwise, even if you “correctly” design QC and “correctly” interpret the control rules, you are still doing the wrong QC wrong. It’s like playing poker with alphabet flash cards.
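As a minimal sketch of what "use its own data" looks like in practice, the snippet below calculates a mean, SD, and control limits from one laboratory's own control results, and uses the peer group mean only to estimate bias. The control values, the peer group mean, and the function name are hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev

def control_limits(results, k=3.0):
    """Return (mean, sd, lower, upper) calculated from a lab's own control results."""
    m = mean(results)
    s = stdev(results)  # sample SD (n - 1 in the denominator)
    return m, s, m - k * s, m + k * s

# Hypothetical control results from this laboratory's own instrument.
own_results = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 97.0, 100.0, 100.0]
own_mean, own_sd, lower, upper = control_limits(own_results)

# Hypothetical peer group mean: used to estimate bias, not to set limits.
peer_group_mean = 101.5
bias = own_mean - peer_group_mean

print(f"own mean={own_mean:.2f}, own SD={own_sd:.2f}")
print(f"3s control limits: {lower:.2f} to {upper:.2f}")
print(f"bias versus peer group: {bias:+.2f}")
```

The peer group numbers still have a job here, but it is the comparison job (bias), not the limit-setting job.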

Scenario #2:

“My lab runs 3 QC levels for TSH. The QC was very much within range but on day 5, the QC for level 2 was suddenly +3s, but QC 1 and QC 3 were within range. I accepted the run for that batch. On the next run, the QC for level 2 became -3s but the QC 1 and QC 3 were within range. The problem was later identified as deterioration of QC 2 material, and thus changed to a new QC 2 (on the next run all QC were within range). However, can I accept the second batch of run for patient's sample (for QC 2 -3S but QC 1 & QC 3 within range)?”

The QC design here is apparently to use 3 SD limits with 3 different control materials (1_3s with N=3 in our terminology). The correct interpretation for that control procedure is to reject a run if any one of the controls exceeds a 3s limit. It’s very unlikely that a control outside of 3s is a false rejection. Something is definitely wrong, which turned out to be the control material itself. That problem should have been identified on day 5 rather than waiting another day to see the problem again. It sounds like the same vials of controls are being used for several days, which assumes the controls are properly stored, remain stable, and do not get contaminated in any way.
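The interpretation of a 1_3s rule with N=3 can be sketched in a few lines. This is only an illustration: the z-scores (control result minus the control mean, divided by the control SD) are hypothetical stand-ins for the day 5 results, and the function name is ours.

```python
def reject_1_3s(z_scores):
    """1_3s rule: reject the run if any single control is beyond +/-3 SD."""
    return any(abs(z) > 3.0 for z in z_scores)

# Hypothetical z-scores for control levels 1, 2 and 3 on day 5.
day5 = [0.4, 3.2, -0.6]
# Hypothetical z-scores for a run where every level is within limits.
quiet_day = [0.4, 1.1, -0.6]

print("day 5 rejected:", reject_1_3s(day5))
print("quiet day rejected:", reject_1_3s(quiet_day))
```

Note that levels 1 and 3 being within range does not rescue the run; one control beyond 3s is enough.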

Scenario #3

“1) We routinely accept assays with more than one QC out 2SD, if it is out on the side of positive screens. For instance, if we are screening for elevated phenylalanine and 2 or more QC are out +2SD, the run would be accepted and any patients that were elevated would be retested on the next assay.

“2) It had been standard practice that if one QC was out +2SD and one -2SD, they ‘evened out’ and the run could be accepted. (I see on the Westgard worksheets that this would violate the R4S rule and the run should be rejected.)

“Given that we are a screening lab as opposed to diagnostic, are these reasonable practices? CLIA took issue that we did not have published material stating that this was acceptable.”

It’s harder to give advice about this situation because of the use of 2SD control limits, without knowledge of how the control limits are being calculated and whether they really represent 2SD of the method. If it is assumed that the control limits are correct for the method, any situation where 2 or more controls exceed their 2SD limits should identify real analytical problems, either systematic errors (2 out in the same direction) or random errors (out in different directions). In the case of both out high, the patients get retested, which is good. In the case of one out high and one out low, the errors may even out for the laboratory, but they don’t for the individual patients. Some positive patients are likely misclassified as negative, while some negatives are misclassified as positive. Any positives are retested by a confirmatory laboratory, which should sort out the false positives. Unfortunately, there may be some false negatives that don’t get retested, and that is a serious issue since the purpose of screening is to detect all potential positives. A screening lab should err on the side of false positives and avoid any false negatives.

To assure the right QC rules and right number of controls are being used, it is possible to apply our QC design process, even for qualitative tests. There is a detailed example for phenylalanine in Chapter 14 of Six Sigma Quality Design and Control. Like many labs, this laboratory appears to be applying blanket rules for QC, which means that individual methods may be over-controlled or under-controlled.

The immediate answer is, if you have decided to use the 2s control rule or the R4s control rule, you need to enforce it every time, not just selectively. And the R4s rule does not have an “even out” clause.
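To make the "no even out clause" point concrete, here is a small sketch of the two within-run patterns discussed in this scenario: two controls out on the same side (treated here as a 2_2s violation) and one control out high with another out low (the R_4s pattern as the questioner describes it). The z-scores are hypothetical and the function names are ours.

```python
def violates_2_2s(z_scores):
    """Two or more controls in the run beyond 2 SD on the same side of the mean."""
    high = sum(1 for z in z_scores if z > 2.0)
    low = sum(1 for z in z_scores if z < -2.0)
    return high >= 2 or low >= 2

def violates_r_4s(z_scores):
    """One control beyond +2 SD and another beyond -2 SD in the same run."""
    return max(z_scores) > 2.0 and min(z_scores) < -2.0

both_high = [2.3, 2.1]       # hypothetical: suggests systematic error
high_and_low = [2.4, -2.2]   # hypothetical: suggests random error

print("both high:", violates_2_2s(both_high), violates_r_4s(both_high))
print("one high, one low:", violates_2_2s(high_and_low), violates_r_4s(high_and_low))
```

In this sketch the high-and-low pair does not cancel to an acceptable run; it trips the range rule instead.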

Scenario #4

“If I understand you correctly, we do nothing when the one qc is out and not even run it again. My friend says he runs it again to make sure it is random error.

“ We had a problem with direct bili’s and i must say that there were 2 levels out. It seems to me that when we do have problems that require hotline, there are 2 and sometimes all 3 levels out. I feel our inst may not have problems when only 1 is out. I will go back to the maintenance log of our analyzer and see if that has been the case.

“Regarding the practice of accepting some of these outlier (but less than 3sd) results, our ranges will be wider. However if we don't accept, the ranges will be tighter. I wonder if the ranges will stablize better and we will see the results fall above and below the mean if we did start accepting the outliers.”

I don’t think we’ve ever been on record as saying “do nothing when one qc is out.” What we do recommend is that 2 SD limits be used only when a single control material is being run, i.e., N=1, where false rejections will be about 5%. Other people may recommend doing nothing when one qc is out and that may become a common practice in some laboratories as a consequence of having a high level of false rejections.
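The arithmetic behind that false rejection problem is easy to sketch. Assuming Gaussian control results and independent controls (both assumptions on our part), the chance that at least one of N controls falls outside 2 SD limits when nothing is wrong is 1 minus the chance that all N stay inside.

```python
from math import erf, sqrt

def false_rejection_rate(n_controls, k=2.0):
    """Probability that at least one of n in-control results falls outside +/-k SD.

    Assumes Gaussian results and independence between controls.
    """
    p_within = erf(k / sqrt(2.0))  # P(|z| <= k) for a standard normal
    return 1.0 - p_within ** n_controls

for n in (1, 2, 3):
    print(f"N={n}: {100 * false_rejection_rate(n):.1f}% false rejections with 2 SD limits")
```

Under these assumptions the rate is roughly 5% for N=1, 9% for N=2, and 13% for N=3, which is why a "do nothing" habit tends to grow wherever 2 SD limits are applied to several controls.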

The third paragraph is a perfect example of wrong QC wrong: by using the wrong limits, the laboratory also adopts the wrong practice for calculating control limits. The fundamental principle of statistical QC is to characterize performance under stable operating conditions in order to identify changes due to unstable operation, i.e., when problems occur. Thus, in principle, the control data from any out of control run should not be included in the calculation of control limits because it does not represent stable operation. Even the College of American Pathologists has this problem because of trying to cope with the use of wrong control limits, thus this practice is actually widespread.

Here’s a case where proper design of the QC procedure is important so that any “out-of-control” signals represent real problems, not false rejections. If the control rules being used are tighter than necessary (for example, 2s limits), then the “outliers” may actually be acceptable if the QC were properly designed. And if the QC were properly designed, those outliers would actually be “in-liers” and they would rightly be used in the calculation of ranges. But a well-designed QC procedure that is properly implemented should not include data from “out-of-control” runs to calculate control limits. The problems of this scenario create a tortured logic that makes doing the wrong QC wrong seem like it’s right.
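A short sketch of the principle: control limits are calculated only from runs that were judged in control, so a rejected run does not widen the limits. The data, the in-control flags, and the function name below are hypothetical.

```python
from statistics import mean, stdev

def limits_from_stable_runs(runs, k=3.0):
    """runs is a list of (value, in_control) pairs; only in-control values are used."""
    stable = [value for value, in_control in runs if in_control]
    m, s = mean(stable), stdev(stable)
    return m, s, m - k * s, m + k * s

# Hypothetical control history; the last run was rejected by a properly designed rule.
runs = [(100.0, True), (101.0, True), (99.0, True), (102.0, True),
        (98.0, True), (100.0, True), (112.0, False)]

stable_mean, stable_sd, lower, upper = limits_from_stable_runs(runs)
sd_with_everything = stdev([value for value, _ in runs])

print(f"SD from stable runs only: {stable_sd:.2f}")
print(f"SD if the rejected run is included: {sd_with_everything:.2f}")
```

The second SD is inflated by the rejected run, which is exactly how the ranges in the questioner's laboratory would drift wider over time.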

Scenario #5

"I have fixed the lab limits based on the total allowable error, taking one fourth the value of the total allowable error for the analyte as 1SD. I have taken the target value from the assay sheet as my lab mean. Is my approach correct?"

Again, a little knowledge can be a dangerous thing. Using total allowable errors for quality requirements is good, but dividing them by four to generate a standard deviation figure is not correct. The standard deviation should be a measure of actual performance; total allowable error is a goal for the desired performance. You use your actual observed standard deviation to see if you are meeting the goal.

In this scenario, we also see that the target value from the assay sheet has been used for the laboratory mean. This may be acceptable when the control lot is first being introduced, but as soon as real data is available (from a short replication study, for example), that should be used instead of the assay sheet data.
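To illustrate the difference between a goal and a measurement, the sketch below calculates the observed CV from real control data and compares it with the allowable total error. It uses the sigma metric, (TEa - |bias|) / CV, which is one common way to make that comparison but is not spelled out in this essay, so treat it as our assumption. The data, TEa, and bias values are hypothetical.

```python
from statistics import mean, stdev

def observed_cv_pct(results):
    """CV (%) calculated from actual control results."""
    return 100.0 * stdev(results) / mean(results)

def sigma_metric(tea_pct, bias_pct, cv_pct):
    """Sigma metric, with all inputs expressed in percent."""
    return (tea_pct - abs(bias_pct)) / cv_pct

# Hypothetical replication data for one control level.
results = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 97.0, 100.0, 100.0]
tea_pct = 10.0   # hypothetical allowable total error (the goal)
bias_pct = 1.0   # hypothetical bias estimate

cv = observed_cv_pct(results)   # what the method actually does
assumed_cv = tea_pct / 4.0      # the TEa/4 shortcut from the scenario
sigma = sigma_metric(tea_pct, bias_pct, cv)

print(f"observed CV = {cv:.2f}%, TEa/4 shortcut = {assumed_cv:.2f}%")
print(f"sigma metric from observed performance = {sigma:.1f}")
```

The shortcut value says nothing about how the method performs; only the observed CV can tell you whether the goal is being met.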

Scenario #6:

“During daily practice in Clinical Chemistry lab if we get the 2 2s and after troubleshoot (recheck the analyser condition, the calibration, the reagent and the QC), we still get the same result as 2s . What should we do? If we still proceed doing the troubleshooting, I afraid the Turn Around Time of the urgent test will be longer than 45 minutes. This will effect the treatment of the patients.”

This situation is a bit ambiguous, but let’s assume that the “2 2s” means the 2_2s rule, rather than a 1_2s rule that was violated and the control then repeated. A 2_2s violation means there is a real problem, but evidently the trouble-shooting didn’t resolve it. The next QC was also out by 2s, most likely indicating that the problem hasn’t been fixed.

The other issue here is a very common one for laboratories: production pressure. The TAT on tests is probably the most measured element of performance – and the element that is most recognized and felt by the ordering physicians.

Here’s the real question: do you want to report possibly erroneous values to your physician just to make the TAT? Would it be acceptable to allow the doctor to make the wrong medical decision based on a wrong test result? Or do you want to get the values right the first time?

In a wrong QC wrong world, getting the numbers out the door becomes the highest priority, regardless of the quality of those numbers. That’s like fast food testing – it’s quick but it’s bad for you. In a right QC right world, you’ll have fewer alarms and out-of-control situations, so when you do get an error flag, you’ll know it’s a serious error and one that could affect patient values and treatment. You’ll know that you don’t want those values out the door.

Scenario #7

“In our laboratory value of many haematological parameters analysed few values are lying on one side of the mean in LJ graphs with no deviation. All the values are lying at one level with no deviation i.e more than 10. Our instrument is calibrated correctly. The parameters which are showing such pattern are MCHC, RDW. Kindly let me know the reason for this.”

This laboratory has noted that the control points don’t seem to show the expected distribution about the mean. There is possibly a very subtle issue here – data rounding. If precision is very good and the control results are being rounded, that could cause many points to show as the same value. It is often good practice to carry one extra significant figure in your control data so that the calculation of the mean and SD will be more accurate.

Another issue is using the right QC rules based on the quality required for the tests. It’s quite likely that the 10_x mean rule isn’t needed if the control data are showing high precision. Getting the right calculations for the mean and SD is important; then getting the right QC design is possible. It looks like there are issues with the right QC and also with implementing QC right.
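The rounding effect is easy to demonstrate. In this sketch, a set of hypothetical control values with very good precision is rounded to whole numbers, and every point collapses onto the same value; carrying an extra figure restores some of the spread.

```python
from statistics import stdev

# Hypothetical control results for a very precise method.
raw = [33.24, 33.26, 33.31, 33.28, 33.33, 33.27, 33.29, 33.34, 33.26, 33.31]

rounded_whole = [round(v) for v in raw]       # reported with no decimals
one_decimal = [round(v, 1) for v in raw]      # one extra figure carried

print("distinct values, whole numbers:", sorted(set(rounded_whole)))
print("SD of raw data:", round(stdev(raw), 3))
print("SD after rounding to whole numbers:", stdev(rounded_whole))
print("SD with one decimal carried:", round(stdev(one_decimal), 3))
```

With whole-number rounding the calculated SD is zero and the chart shows a flat line, even though the method still has some real variation.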

Scenario #8

“I have a case, my control measurement not exceeds a 2s control limit but 4 consecutive control measurement exceed the same mean plus 1s or the same mean minus 1s control limit ( 41s ) or and 10 consecutive control measurement fall on one side of the mean ( 10x ). What does it mean, accept run or reject run ? ( 12s (No) but 41s ( yes ) or 10x (Yes).? Accept run or reject run ?)”

This is actually a very straightforward “Westgard Rules” question. Nevertheless, we’re going to make it complicated.

In the “classic” version of the “Westgard Rules,” you only triggered the other rules after a 1_2s control rule was violated. So, strictly according to the classic rules, if there wasn’t a 1_2s violation, then you don’t use the other rules and everything is in.

Now, in the updated “Westgard Rules”, the 1_2s “warning rule” has been replaced by a 1_3s rejection rule – and all the rules are to be used as rejection rules. If you were using the updated rules, that 4_1s or 10_x would be a rejection signal – but only if QC Design mandated that those 4_1s and/or 10_x mean rules were necessary to monitor the test. It’s possible that the performance of this method, coupled to the quality required by the test, may make such rules unnecessary. That is, you might only need simple rules like 1_3s and can totally ignore those other data patterns.

Many laboratory professionals like to use the 10_x and 4_1s and similar rules as “warning rules,” using those trends and shifts as a way to get an early eye on a problem, even if QC design doesn’t mandate those rules. That’s fine, but if it starts to make you chase ghosts in the method, it’s counter-productive.
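Here is a rough sketch of how the same control history gives different answers depending on which rules QC design has selected. The z-score history (one control material, most recent result last), the rule names used as dictionary keys, and the helper functions are all hypothetical.

```python
def violates_1_3s(z):
    """Most recent result is beyond +/-3 SD."""
    return abs(z[-1]) > 3.0

def violates_4_1s(z):
    """Last 4 consecutive results are all beyond +1 SD, or all beyond -1 SD."""
    last = z[-4:]
    if len(last) < 4:
        return False
    return all(v > 1.0 for v in last) or all(v < -1.0 for v in last)

def violates_10_x(z):
    """Last 10 consecutive results all fall on the same side of the mean."""
    last = z[-10:]
    if len(last) < 10:
        return False
    return all(v > 0 for v in last) or all(v < 0 for v in last)

RULES = {"1_3s": violates_1_3s, "4_1s": violates_4_1s, "10_x": violates_10_x}

def evaluate(z_scores, selected_rules):
    """Return the names of the selected rules that are violated."""
    return [name for name in selected_rules if RULES[name](z_scores)]

# Hypothetical history: nothing beyond 2 SD, but a run of results above the mean.
history = [0.5, 0.3, 0.8, 0.2, 0.6, 0.4, 1.2, 1.3, 1.1, 1.4]

print("QC design chose 1_3s only:", evaluate(history, ["1_3s"]))
print("QC design chose a multirule:", evaluate(history, ["1_3s", "4_1s", "10_x"]))
```

The data are identical in both calls; only the selected rule set changes the accept-or-reject decision.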

Scenario #9:

“I have a question regarding the 10x rule. If there are assays that consistently run above the established mean, but remain with in 2SD, does the run have to be rejected? Does the control have to be repeated? Can I legitimately adjust the mean to reflect how the control is performing?

“For instance: If our mean for potassium is set at 6.0 and our values consistently run 6.1, all of the values will fall on the same side of the mean. It seems unreasonable that the run should be rejected.”

Believe it or not, this is the same problem as experienced by the laboratory professional in the previous scenario (Did you detect the similarities?).

Under the “classic” version of “Westgard Rules,” there are no out-of-control values in the scenario. With the updated “Westgard Rules,” we would in fact use the 10_x rule and declare that run “out” – if in fact the 10_x mean rule was necessary. However, if the mean was adjusted to reflect current performance, a QC Design process might determine that none of the extended multirules were necessary. If the test is performing with barely any variation at all, it’s more likely that a simple single rule, perhaps a 1_3s, will be chosen as the right QC procedure. Then, again, those 10_x “violations” wouldn’t count.

Perhaps it’s important to note this: just because a rule exists doesn’t mean it needs to be implemented. It’s tempting to lock on 10_x violations once you know about the 10_x rule. But there are times to use the rule and there are times to not use the rule.
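The potassium example can be sketched numerically. With the mean fixed at 6.0, a set of hypothetical results running around 6.1 all land on one side; once the mean is recalculated from the laboratory's own data, the results scatter on both sides. The values and the helper function are ours, for illustration only.

```python
from statistics import mean

def all_on_one_side(values, center):
    """True when every value is above the center, or every value is below it."""
    return all(v > center for v in values) or all(v < center for v in values)

# Hypothetical potassium control results that consistently run near 6.1.
results = [6.1, 6.2, 6.1, 6.15, 6.05, 6.1, 6.2, 6.1, 6.05, 6.15]

established_mean = 6.0             # mean as originally set
recalculated_mean = mean(results)  # mean from the laboratory's own data

print("one-sided against 6.0:", all_on_one_side(results, established_mean))
print(f"one-sided against {recalculated_mean:.2f}:",
      all_on_one_side(results, recalculated_mean))
```

Recalculating the mean is legitimate only when the new value reflects stable performance of the method, not a shift that should have been investigated.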

Scenario #10

“If I have 6 analyzers performing Glucose, I run my controls 10-15 times on each analyzer to establish my mean and SD on each analyzer. Then I pool all of them and establish a grand mean across the board with CV. When I use this it is too narrow, because not all analyzers behave the same way to the exact precision. In this case how much I can loosen up the SD or CV, so that I can monitor the QC without causing too much problems? Is there a guideline or a standard I can use? Or what should be the criteria. Is there a maximum CV analytical I can use across the board?”

First, this is one example where we can be more specific because we’re working with a single analyte: glucose. We know, for example, that the CLIA criterion for proficiency testing for glucose is 10%. That gives us an analytical goal for our QC Design process.

This laboratory is trying to do the right thing with their data. Pooling the data together to establish a grand mean can be useful – if it’s used to assess bias between the analyzers. But calculating a peer SD from that data and applying it to all the individual instruments is not a good practice. In general, peer SDs are wider than individual instrument SDs.

What’s surprising is that the laboratory has found this wider SD is still too narrow. Here is where we have to start making guesses. Let’s assume the “narrow” complaint actually stems from the use of 2s control limits. The number of false rejections caused by 2s limits is making the director conclude that the limits are too tight and should be loosened up.

Again, the end result of all this diligent effort and good intention is the wrong QC wrong. What actually needs to be done is that each instrument needs its own performance data, its own QC chart, and its own QC Design. That QC Design might actually result in those desired looser rules, and at the very least, the QC Design process eliminates 2s control limits. The pooled data can still be used to assess bias, which will be included in the QC Design process. While it may seem daunting to create specific rules for each instrument, it’s quite likely that they will all have very similar performance and end up with the same control rules anyway. But if one of those instruments is the runt of the litter, it had better get different treatment from the lab.

Conclusion: It's easy to get it wrong

There may only be 50 ways to leave your lover, but this list of just ten problems shows us that there are probably an infinite number of ways to do the wrong QC wrong.

Let us stress this one more time: the people who submitted these questions and problems were not trying to get it wrong. They were well intentioned professionals trying to do things correctly. Believe us when we say there are far worse practices out there: laboratories that don't care about quality at all and would never consider asking a question about what's the right thing to do.

Doing the right QC right is not easy. One misstep and your good intentions can be led astray. That's why quality is so valuable - because it means you've taken the care at every step. That's why doing the right things right is so important - because it means you're delivering the best possible results to the physician and patient.
