Tools, Technologies and Training for Healthcare Laboratories

FAQs on N, runs, control limits

It's time once again to post some of the questions we received in the last few months. These questions were important enough that we thought the answers should be shared with everyone. The definitions of N and a run are discussed, as well as z-values, defect rates, external QC when there is no peer group, and more.

What is N? What is a run?

In hematology we run 2 controls each 8-hour shift. We use a three-level control, but we alternate which level is run by which shift, so all three levels are run in a 24-hour period. What is N? What is a run (each 8-hour shift? each 24-hour period?)

In chemistry, our main analyzer has one control run each eight-hour shift. This is a three-level control, and each level is run once in a 24-hour day. What is N? What is a run? Is this too little QC? How do you know whether it is or isn't enough?

These are good questions, and many laboratories have trouble with these issues.

When you analyze one control per shift, as described for your chemistry operation, N is really 1 because you are making a decision on the performance of the method for that shift. Even though you collect a total N of 3 over a 24-hour period, it doesn't sound like the data are being interpreted and applied to the whole 24-hour period. If they were, an out-of-control result would mean going back and doing something about the work reported on the earlier shifts.

With hematology, it sounds like N is 2 for each shift, and each shift is probably the reporting period affected when the QC data are evaluated. These are indeed low Ns. Whether they are sufficient to detect medically important problems depends on the performance of your methods and the quality requirements for the medical application of the tests. We recommend using a "QC planning process" to make a quantitative assessment of these factors and their influence on the control rules and Ns that are necessary. See the QC planning applications on the website.


This question comes from the Sysmex Reagents Data Center in Los Alamitos, CA:

How do I find realistic quality requirements for hematology?

I agree that determining each test's Quality Requirement (Total Allowable Error) is the critical first step; unfortunately, it is also the bottleneck that I (and how many others?) cannot get past. I tried to come up with realistic quality requirements for hematology, short of using the grossly wide CLIA criteria for acceptable proficiency testing performance. NCCLS H26-A, Performance Goals for the Internal Quality Control of Multichannel Hematology Analyzers (December 1996), suggests limiting accuracy bias as required to maintain less than a 5% false-positive or false-negative rate at medical decision levels. It shows an example for WBC indicating that only 0.1 x 10^3/µL negative and 0.2 x 10^3/µL positive accuracy error can be allowed at a count of 3.0 (-3.3% and +6.0% bias, respectively). It does not elaborate on how one might replicate the necessary calculations. It would be very helpful for the math-challenged like myself to have this in detail.

Since that route to quality requirements was blocked by my lack of usable grey matter, I went with the concept from your European guest essays that allowable accuracy bias should be a fraction of the combined intra- and inter-individual biological variation. I went to the European Biological Variation Database and found that only a few heme parameters had data available.

For the available parameters, I calculated the "combined biological variation" as follows: WBC 22.66; NEUT# 40.41; LYMPH# 28.76; MONO# 27.97; RBC 7.31; HGB 7.07; PLT 24.98

I determined the average analyzer CV% from n=10 runs of 28 fresh whole blood samples on a Sysmex SE-9500. The average analytical CV% values are as follows: WBC 2.03; NEUT# 3.32; LYMPH# 3.45; MONO# 7.75; RBC 0.68; HGB 0.35; PLT 1.95

I used the biologic allowable total error (TEba) formula from Hyltoft Petersen's guest essay: (combined biologic variation x 0.25) + (1.65 x analytic imprecision). These are the TEba results: WBC 9.01; NEUT# 15.57; LYMPH# 12.89; MONO# 19.77; RBC 2.95; HGB 2.34; PLT 9.48

These quality requirements seem to be of the correct order of magnitude. What do you think of this approach? I am not so certain, and before I invest the time to establish biologic variability data on all hematology parameters, I would like to cross-check these outcomes against the NCCLS "medical decision level 5% false positive/negative" model.

Good to hear from you and hear about the work you're doing. You raise a number of issues:

(1) Let me start by asking whether you're trying to set instrument specs or select appropriate QC procedures? Those are two different applications, but both can be handled by using "charts of operating specifications" (OPSpecs charts). I'm assuming you're familiar with OPSpecs charts if you've been accessing material on our website. The best intro/overview to quality-planning is the discussion on Principles of QC Planning for Immunoassays.

(2) The CLIA requirements seem quite broad until you include QC performance; then they are actually quite reasonable. With a low number of controls (which is typical in laboratories today), it is difficult to detect systematic changes in the process unless the errors are 2 to 4 times the SD of the method. The relationship between the CLIA TE and the method performance characteristics approximates bias + 2SD + 2SD (the last 2SD being an allowance for the performance of a good QC procedure). Our OPSpecs charts provide a more exact allowance for different control rules and different numbers of control measurements.
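
To make that budget concrete, here is a minimal sketch in Python (our own illustration, using hypothetical numbers rather than figures from this exchange) of how the bias + 2SD + 2SD relationship converts an allowable total error into a rough ceiling on method imprecision:

    # Rough illustration of the TEa ~ bias + 2SD + 2SD budget described above.
    # The numbers below are hypothetical and only show the arithmetic.

    def max_allowable_sd(tea, bias):
        """Largest method SD that still fits within the bias + 2SD + 2SD allocation."""
        return (tea - abs(bias)) / 4.0

    # Example: a TEa of 10% with a 2% bias leaves room for a method CV of about 2%.
    print(max_allowable_sd(tea=10.0, bias=2.0))   # -> 2.0 (% units)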

(3) The NCCLS H26-A approach is based on George Klee's model, which only provides information about the effect of bias on diagnostic classification. There is really little detail on what to do with that bias or what it represents. From my perspective, that bias should be the critical systematic error that needs to be detected by a QC procedure. One approach for using that information to set performance specs or select QC procedures would be to use critical-error graphs (which are also discussed on our website).

(4) It looks like you're handling the European biologic recommendations properly. Once the biologic allowable total error is calculated, quality planning can proceed just as when using the CLIA TE. Again, OPSpecs charts are probably the best tool. We will facilitate the use of biologic goals in the next version of our QC Validator program: the user will enter the allowable SD and allowable bias, and the program will calculate the biologic allowable total error, which can then be used in the automatic QC selection process.
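
For readers who want to reproduce the TEba arithmetic quoted in the question, here is a minimal sketch in Python of the formula as stated there (an illustration only, using the CV values reported by the questioner):

    # Sketch of the TEba formula quoted in the question:
    #   TEba = 0.25 * combined biologic variation + 1.65 * analytic imprecision
    # The CV values below are those reported by the questioner for a Sysmex SE-9500.

    combined_biologic_cv = {"WBC": 22.66, "NEUT#": 40.41, "LYMPH#": 28.76,
                            "MONO#": 27.97, "RBC": 7.31, "HGB": 7.07, "PLT": 24.98}
    analytic_cv = {"WBC": 2.03, "NEUT#": 3.32, "LYMPH#": 3.45,
                   "MONO#": 7.75, "RBC": 0.68, "HGB": 0.35, "PLT": 1.95}

    for analyte in combined_biologic_cv:
        teba = 0.25 * combined_biologic_cv[analyte] + 1.65 * analytic_cv[analyte]
        print(f"{analyte}: TEba = {teba:.2f}%")   # e.g., WBC: TEba = 9.01%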

(5) Not sure that it's necessary for you to establish your own data on biologic variability, unless you can't find anything in the literature. We have included one "data-bank" on our website. Carmen Ricos from Spain is compiling a more complete "data-bank" which should become available soon.


A question from Turkey:

Can I make up my own control limits?

Anyway, I would like to ask a question about determining the control limits on Levey-Jennings charts. When we buy new control material, we must analyze it for 20 consecutive days and calculate the mean and SD. From these values, we calculate the control limits, for example ±2s.

My questions are: What if we buy a control material with an assigned value, but we cannot recover that value in our own measurements? Can we calculate the control limits using the allowable total error values and quality specifications?

An example of my proposal:
For glucose:
Assigned value: 100 mg/dL
TEa from CLIA: 6 mg/dL
Sa must be 6/3 = 2 mg/dL
Then the control limits must be 2 x 2 = 4 mg/dL for the ±2s limits.

For AST:
Assigned value: 40 U/L
TEa from CLIA: target ± 20%
Sa must be < TEa/3; TEa = 0.2 x 40 = 8 U/L, so Sa ≤ 8/3 ≈ 3 U/L
Then the control limits are 3 U/L (for 1s), 6 U/L (for 2s), and so on.

I don't think this is a good way to set control limits. Regardless of what rationale is used to decide where to draw the control limits, the lines still represent statistical control rules - it's just that the real rules are different from what you think they are.

Take a look at the discussion of "medical decision limits." It explains how to assess the actual statistical control rule that is being implemented, regardless of how the line is drawn. Error detection depends on the power function of the actual rule.
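
To illustrate why the "real" rule can differ from the intended one, here is a small sketch (our own example with assumed SD values, borrowing only the ±4 mg/dL glucose limit proposed in the question) that converts a limit drawn in concentration units into the multiple of the method's actual SD that it represents:

    # If control limits are drawn from TEa rather than from the observed SD,
    # the statistical rule actually in force depends on the method's real SD.
    # The SD values below are assumed for illustration.

    def effective_z(limit_width, actual_sd):
        """Multiple of the observed SD at which the drawn limit really sits."""
        return limit_width / actual_sd

    # Glucose example: limits drawn at +/- 4 mg/dL around the assigned value.
    # If the method's observed SD is 1.3 mg/dL, the intended "2s" limit is really
    # about a 3.1s rule; if the observed SD is 2.5 mg/dL, it is really about a
    # 1.6s rule, with many more false rejections.
    for sd in (1.3, 2.0, 2.5):
        print(f"observed SD = {sd} mg/dL -> limit sits at {effective_z(4.0, sd):.1f} SD")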

Hope this helps clear up this issue.


This question comes from Dr. Graham Jones of St. Vincent's Hospital in Sydney, Australia:

Z-value and the defect rate

I am interested in the choice of the value of z (Lesson: Critical-Error Graphs), which has been set at 1.65. This value is taken from the normal distribution to allow 5% of results in one tail to exceed this multiple of the SD. From the above, if the chance of non-detection of a shift in mean of X SD is 10% (i.e., using a QC protocol with a 90% detection rate), then the chance of producing a result exceeding (X + 1.65) SD is 10% x 5% = 0.5%. Are we then going to ridiculous lengths to avoid this level of change, or is this mathematics affected by the run length (the number of patient samples between QCs on an automated system)?

A very good question, which gets to the core of the matter of managing quality - the defect rate of the testing process.

By selecting a z-value of 1.65 in the quality-planning model, we will detect a problem run at the point where 5% of the specimens in the run would have errors that exceed the stated quality requirement. The rationale is to stop the method when it starts to produce medically important errors at a low rate.
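
For reference, the quantity being planned for here is the critical systematic error. Expressed in multiples of the method SD, it can be sketched as follows (a simplified rendering with hypothetical numbers; TEa, bias, and SD are assumed to be in the same units):

    # Critical systematic error: the size of shift (in multiples of the method SD)
    # at which 5% of results exceed the quality requirement (z = 1.65).
    # TEa, bias, and sd must all be in the same units (e.g., all in percent).

    def critical_se(tea, bias, sd, z=1.65):
        return (tea - abs(bias)) / sd - z

    # Hypothetical example: TEa = 10%, bias = 1%, CV = 2% -> a shift of 2.85 SD
    print(critical_se(tea=10.0, bias=1.0, sd=2.0))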

If we have 90% error detection, the maximum rate of errors would be 0.10 x 0.05, or 0.005, which is 5 defects per 1,000 or 5,000 defects per million. A maximum defect rate of 5,000 ppm would actually be considered ridiculously high in most production operations.

The actual defect rate should be less and depends on the frequency of problems with the method. Assuming a frequency of problems of 0.01, i.e., 1 run in 100, the actual defect rate should be 0.01 x 0.10 x 0.05, or 50 defects per million, which is 5 per 100,000 or 1 per 20,000. If the frequency of problems were 0.1, or 10%, then the expected defect rate would be 1 per 2,000.

The influence of method stability on defect rate may permit the use of lower error detection on very stable processes. For example, for the method above with a frequency of problems of only 1%, 50% error detection would still limit the expected defect rate to 5 per 20,000 or 1 per 4,000. However, if the frequency of problems were 10%, the expected defect rate would be 1 per 400 which is probably not acceptable.
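
The arithmetic in the last few paragraphs can be condensed into a small sketch (simply a restatement of the reasoning above, not a formal model):

    # Expected defect rate = (frequency of problem runs)
    #                      x (chance the QC procedure misses the problem)
    #                      x (fraction of results exceeding the requirement, 5% at z = 1.65)

    def expected_defect_rate(problem_freq, error_detection, exceed_fraction=0.05):
        return problem_freq * (1.0 - error_detection) * exceed_fraction

    print(expected_defect_rate(0.01, 0.90))  # 0.00005 -> 1 defect per 20,000
    print(expected_defect_rate(0.10, 0.90))  # 0.0005  -> 1 per 2,000
    print(expected_defect_rate(0.01, 0.50))  # 0.00025 -> 1 per 4,000
    print(expected_defect_rate(0.10, 0.50))  # 0.0025  -> 1 per 400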

There has been little discussion of what would be an acceptable defect rate, or for that matter, little discussion of the idea of defect rate pertaining to test results. Part of the reason is that the definition of a defect requires the definition of the quality desired in a test result, and most laboratories have yet to define the quality goals or requirements for their testing processes.

Hope this will be helpful for understanding the choices of z-values and probabilities of error detection. Note that the z-values can be changed in the QC Validator program, therefore the quality-planning models can be modified to your preference.


This series of questions comes from the Pathology Department of the University of Alabama at Birmingham:

How do you do external QC for a test where there is a small or no peer group?

With rapidly emerging technologies and the wide diversity of testing instruments available there is an increased incidence of tests that are included in an external QC program but which have either a very small or no peer group with which to compare. How can we best ensure that these tests are "in control"?

The main value of a peer comparison group is to provide an estimate of bias or systematic error, i.e., to monitor accuracy. The laboratory "internal" QC should be set up using the means and SDs observed within the laboratory, not peer means and SDs. When selecting the appropriate control rules and number of control measurements for internal QC, the difference between the laboratory's observed mean and the peer mean can be used as an estimate of method bias. For example, in the QC planning process that we recommend, the estimate of bias could come initially from method validation studies (comparison of methods experiment) and later from peer comparison data (to monitor accuracy long term).
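
As a small illustration of that use of peer data (a sketch with made-up numbers, not values from this question):

    # Estimate of method bias from peer-comparison data: the difference between
    # the laboratory's observed mean and the peer-group mean for the same lot.
    # The values below are hypothetical.

    def bias_percent(lab_mean, peer_mean):
        return 100.0 * (lab_mean - peer_mean) / peer_mean

    # e.g., lab mean 102.5 vs peer mean 100.0 -> +2.5% bias to carry into QC planning
    print(bias_percent(102.5, 100.0))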

Thank you for your prompt response. Given that QC values drift with time due to reagent changes, machine changes, etc., in the absence of a peer group, what is the best means of routinely determining the accuracy/bias of a particular test?

That's a more difficult question and problem for the laboratory.

I guess the first effort should be to monitor the stability of monthly control statistics, i.e., the monthly means and SDs. Changes in the observed mean should trigger additional investigation; there's no reason to expect or to accept a drift in control means. Careful "standardization" of the preparation and processing of the control materials will be important to minimize variation from the materials themselves. Using control materials from at least two different manufacturers should also help reduce material-related problems. That seems particularly good practice in light of the recent Abbott-FDA consent decree and the need for laboratories to establish "independent" QC procedures to assure the quality of the final test results. Some additional possibilities:

  • If a comparative method exists, you can make periodic checks of real patient samples between the two methods.
  • If the test is subject to proficiency testing, you can monitor the average bias for the group of PT samples.
  • If patient means can be calculated, they may provide another way to monitor stability over time (a rough sketch follows this list).
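
Here is a rough sketch of the patient-means idea mentioned in the last bullet (an assumption-laden illustration: the truncation limits, baseline, and variation figures are arbitrary examples, and any real implementation would need to be validated for the analyte and patient population in question):

    # Rough sketch of monitoring a daily patient mean ("average of normals"):
    # truncate extreme results, average the rest, and compare with a baseline.
    # The truncation limits, baseline, and SEM below are arbitrary examples.

    def truncated_mean(results, low, high):
        kept = [r for r in results if low <= r <= high]
        return sum(kept) / len(kept) if kept else None

    baseline_mean = 92.0   # patient mean established during a stable period (example)
    baseline_sem = 1.5     # expected day-to-day variation of that mean (example)

    todays_results = [88, 95, 101, 76, 250, 90, 93, 87, 99, 84]  # made-up values
    today = truncated_mean(todays_results, low=60, high=120)
    if today is not None and abs(today - baseline_mean) > 3 * baseline_sem:
        print(f"Patient mean {today:.1f} has shifted; investigate possible bias.")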

Lots of "ifs", but so much depends on the particular test and system under consideration.