METHOD VALIDATION - THE DECISION ON METHOD PERFORMANCE
James O. Westgard, Ph.D.
You've performed the experiments, tabulated the results, plotted
the data, and calculated the statistics. Now you have to make
a decision on the acceptability of the method. How do you decide
whether the method is good enough to use in your laboratory?
What's the right approach?
Remember the inner, hidden, deeper, secret
meaning of method validation - ERROR ASSESSMENT. The decision
on the acceptability of method performance depends on the size
of the observed errors relative to some "standard" or
quality requirement that defines the medically allowable error.
Method performance is acceptable when the observed errors are
smaller than the medically allowable error. Method performance
is NOT acceptable when the observed errors are larger than the
medically allowable error.
You should actually define the medically allowable errors at
important medical decision levels in the beginning to help guide
the design of the experiments and the collection of the data.
What will remain to be done, then, is to compare your observed
errors with the defined medically allowable errors.
How should a requirement for medically allowable
errors be stated?
In the scientific literature, requirements for analytical quality
have been defined in three different formats - allowable total
error, allowable SD, and allowable bias. An allowable total error
sets a limit on the combined effect of the random and systematic
errors of a method, whereas an allowable SD and an allowable bias
set separate limits for random and systematic errors, respectively.
Separate requirements for allowable SD and allowable bias would
appear to be useful because these statistics can be calculated
directly from the experimental data (e.g., an SD is calculated
for the data from a replication experiment and a bias for the
data from a comparison of methods experiment). However, the quality
of a patient test result is determined by the net or total effect
of both the random and systematic errors, so the total
error is more relevant medically [1]:
"The physician thinks rather in terms of the total analytical
error, which includes both random and systematic components.
From his point of view, all types of analytic error are acceptable
as long as the total analytic error is less than a specified
amount. Total error is medically more useful; after all, it makes
little difference to the patient whether a laboratory value is
in error because of random or systematic analytical error, and
ultimately he is the one who must live with the error."
Where do you find recommendations for
allowable total errors?
A common source is the external quality assessment survey or
proficiency testing program in which you participate. These programs
generally define a central "target value" and a range
of values around that target that are considered acceptable. Because
these programs usually ask for a single analysis on each survey
specimen, both the random and systematic errors of your method
will affect the results. The "acceptable range" is therefore
an analytical performance requirement in the format of an allowable
total error.
For US laboratories, the most readily available list of total
error criteria is provided by the CLIA proficiency testing criteria
for acceptable performance, which have been published in the Federal
Register [2] and provide recommendations for some 80 different
tests. See the list of criteria provided on
this website. These criteria are presented in three different
ways:
- As an absolute concentration limit, e.g., target value plus
or minus 1 mg/dL for calcium;
- As a percentage, e.g., target value plus or minus 10% for
albumin, cholesterol, and total protein;
- As the range determined from a survey group, e.g., target
value plus or minus 3 standard deviations for thyroid stimulating
hormone.
In a few cases, two sets of limits are given, e.g., the glucose
requirement is target value plus or minus 6 mg/dL or plus or minus
10%, whichever is greater. At a medical decision level of 50 mg/dL,
the allowable total error is 6 mg/dL or 12%. At a medical decision
level of 125 mg/dL, the allowable total error is 10% or 12.5 mg/dL.
For information on medical decision levels,
see Dr. Statland's guidelines on this website.
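The "whichever is greater" arithmetic for glucose can be sketched in a few lines of Python. This is only an illustration of the calculation described above; the function name `glucose_tea` is hypothetical and the limits are the 6 mg/dL and 10% figures quoted in the text.

```python
def glucose_tea(decision_level_mg_dl):
    """Return (TEa in mg/dL, TEa in percent) for the glucose criterion
    'target +/- 6 mg/dL or +/- 10%, whichever is greater'."""
    absolute_limit = 6.0                          # fixed limit, mg/dL
    percent_limit = 0.10 * decision_level_mg_dl   # 10% limit expressed in mg/dL
    tea_conc = max(absolute_limit, percent_limit)
    tea_pct = 100.0 * tea_conc / decision_level_mg_dl
    return tea_conc, tea_pct


# The two decision levels discussed in the text
for level in (50, 125):
    conc, pct = glucose_tea(level)
    print(f"{level} mg/dL: TEa = {conc:.1f} mg/dL = {pct:.1f}%")
```

The two limits are equal at 60 mg/dL, so the fixed 6 mg/dL limit governs below that concentration and the 10% limit governs above it.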
How are the observed errors compared
to a total allowable error?
To estimate the random error of the method from the replication
experiment, you will have calculated an SD or CV. To estimate
systematic error from the comparison of methods experiment, you
will calculate the bias between the means obtained by the test
and comparative methods, or will use regression statistics to
calculate the expected difference at particular medical decision
levels. These estimates of random and systematic errors need to
be combined to judge their total effect.
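As a rough sketch of these estimates, the Python below computes a percent CV from replicate results, a bias from the difference between method means, and a regression-based difference at a chosen decision level. The function names are hypothetical, ordinary least-squares regression is assumed (the text does not specify a regression technique), and the numbers in the usage lines are made up purely for illustration.

```python
from statistics import mean, stdev


def percent_cv(replicates):
    """Random error: SD of replicate results as a percentage of their mean."""
    return 100.0 * stdev(replicates) / mean(replicates)


def mean_bias(test_results, comparative_results):
    """Systematic error: difference between the means of the two methods."""
    return mean(test_results) - mean(comparative_results)


def bias_at_level(test_results, comparative_results, decision_level):
    """Systematic error at a decision level Xc from least-squares regression
    of test (y) on comparative (x): (intercept + slope * Xc) - Xc."""
    x_bar = mean(comparative_results)
    y_bar = mean(test_results)
    sxx = sum((x - x_bar) ** 2 for x in comparative_results)
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(comparative_results, test_results))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return (intercept + slope * decision_level) - decision_level


# Made-up illustrative data, not results from any real experiment
replicates = [198, 202, 200, 204, 196]
comparative = [100, 150, 200, 250, 300]
test = [103, 154, 205, 256, 307]

print(f"CV = {percent_cv(replicates):.2f}%")
print(f"bias between means = {mean_bias(test, comparative):.1f}")
print(f"bias at 200 = {bias_at_level(test, comparative, 200):.1f}")
```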
The literature provides three different recommendations on
how to combine random and systematic errors:
- Add bias plus 2 times the observed SD [1], i.e., bias + 2SD
< TEa;
- Add bias plus 3 times the observed SD [3], i.e., bias + 3SD
< TEa;
- Add bias plus 4 times the observed SD [4], i.e., bias + 4SD
< TEa.
Rather than choose between these recommendations, all three
can be utilized in a graphical decision tool, or a method decision
chart [5]. The chart is simple to construct, minimizes the need
for additional calculations, and provides a graphical picture
that simplifies the interpretation and judgment on method performance.
How do you construct a method decision
chart?
First, express the allowable total error as a percentage of
the medical decision concentration. Most CLIA allowable errors
are already given in percent. For those given in concentration
units, express the allowable error as a percent of the medical
decision concentration of interest, i.e., divide the allowable
error by the medical decision concentration and multiply by 100
to express as a percentage.
Next, take a sheet of graph paper
and do the following:
- Label the y-axis "Allowable inaccuracy, (bias,%)"
and scale from 0 to TEa, e.g., if TEa is 10%, scale the y-axis
from 0 to 10% in increments of 1%.
- Label the x-axis "Allowable imprecision, (s,%)"
and scale from 0 to 0.5 TEa, e.g., if TEa is 10%, scale the x-axis
from 0 to 5% in increments of 0.5%.
- Draw a line for bias + 2SD from TEa on the y-axis to 0.5
TEa on the x-axis, e.g., if TEa is 10%, draw the line from 10%
on the y-axis to 5% on the x-axis.
- Draw a line for bias + 3SD from TEa on the y-axis to 0.33
TEa on the x-axis, e.g., if TEa is 10%, draw the line from 10%
on the y-axis to 3.33% on the x-axis.
- Draw a line for bias + 4SD from TEa on the y-axis to 0.25
TEa on the x-axis, e.g., if TEa is 10%, draw the line from 10%
on the y-axis to 2.5% on the x-axis.
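- Label the regions "poor, marginal, good, and excellent,"
as shown in the accompanying figure: "poor" lies above the bias
+ 2SD line, "marginal" between the 2SD and 3SD lines, "good" between
the 3SD and 4SD lines, and "excellent" below the bias + 4SD line.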
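For readers who would rather compute the chart geometry than read it off graph paper, the construction steps above can be sketched as follows. The function names are hypothetical. The only inputs are the allowable total error in percent and the SD multipliers 2, 3 and 4 from the three criteria.

```python
def decision_chart_lines(tea_pct):
    """Return the end points of the three criterion lines of a method
    decision chart for an allowable total error given in percent.

    Each line runs from (0, TEa) on the y-axis (bias) to (TEa / k, 0) on
    the x-axis (imprecision), where k is the SD multiplier 2, 3 or 4.
    """
    return {
        f"bias + {k}SD": {
            "y_axis": (0.0, tea_pct),       # (x, y) where the line meets the y-axis
            "x_axis": (tea_pct / k, 0.0),   # (x, y) where the line meets the x-axis
        }
        for k in (2, 3, 4)
    }


def axis_limits(tea_pct):
    """Axis ranges as described above: y from 0 to TEa, x from 0 to 0.5 TEa."""
    return {"x": (0.0, 0.5 * tea_pct), "y": (0.0, tea_pct)}


for name, ends in decision_chart_lines(10.0).items():
    print(name, ends)
print(axis_limits(10.0))
```

For a TEa of 10%, the x-axis end points come out at 5%, 3.33% and 2.5%, matching the worked values in the steps above.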
How do you use the method decision chart?
Express your observed SD and bias in percent, then plot the
point whose x-coordinate is your observed imprecision and y-coordinate
is your observed inaccuracy. This point is called the "operating
point" because it describes how your method operates. You
judge the performance of your method on the basis of the location
of the operating point, as follows:
- A method with "poor performance" does not
meet your requirement for quality, even when the method is working
properly. It is not acceptable for routine operation.
- A method with "marginal performance" provides
the desired quality when everything is working correctly. However,
it will be very difficult to manage in routine operation and
will require a Total QC strategy that emphasizes well-trained
operators, reduced rotation of personnel, more expensive statistical
QC, more aggressive preventive maintenance, careful monitoring
of patient test results, and continual efforts to improve method
performance.
- A method with "good performance" meets your
requirement for quality and can be well-managed in routine service
if you plan the statistical QC procedure carefully and are willing
to spend the resources necessary to implement a multirule procedure
with 4-6 control measurements per run.
- A method with "excellent performance" is
clearly acceptable because it will be easy to manage in routine
service and can be controlled with minimum expense, usually with
single-rule control procedures and the minimum of 2 control measurements
per run.
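Locating the operating point can also be done numerically. The sketch below is one possible reading of the chart, not a definitive implementation: it checks the bias + 2SD, 3SD and 4SD criteria in turn. The function name is hypothetical. As an assumption, a point lying exactly on a line is assigned to the less favorable region, following the strict "less than TEa" form in which the criteria are stated.

```python
def classify_operating_point(cv_pct, bias_pct, tea_pct):
    """Classify method performance from the observed imprecision (CV, %),
    observed inaccuracy (bias, %) and allowable total error (TEa, %)."""
    bias = abs(bias_pct)  # the chart uses the size of the bias, not its sign
    if bias + 2 * cv_pct >= tea_pct:
        return "poor"        # fails even the bias + 2SD criterion
    if bias + 3 * cv_pct >= tea_pct:
        return "marginal"    # meets 2SD but not 3SD
    if bias + 4 * cv_pct >= tea_pct:
        return "good"        # meets 3SD but not 4SD
    return "excellent"       # meets bias + 4SD < TEa


print(classify_operating_point(cv_pct=2.0, bias_pct=0.0, tea_pct=10.0))
```

All three inputs must be in the same units, normally percent at the medical decision concentration of interest.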
Example applications
- A. Albumin method with a CV of 2.0%
and bias of 0.0% at 3.5 g/dL. The CLIA requirement for analytical
quality is 10%, therefore we can use a method decision chart
exactly like the one shown earlier. The operating point will
have a y-coordinate of 0.0% and an x-coordinate of 2.0%, as shown
by operating point "A". This method is clearly acceptable
and will be easy to control in routine operation.
- B. Cholesterol method with a CV of 2.0% and a bias of 2.0%
at 200 mg/dL. The CLIA requirement is also 10%, therefore the
same method decision chart can be used. The operating point will
have a y-coordinate of 2.0% and an x-coordinate of 2.0%, which
falls on the line between "excellent performance" and
"good performance," as shown by operating point "B"
on the accompanying chart. A careful assessment of this case
has shown that a multirule QC procedure with an N of 4 can guarantee
the desired quality will be achieved by this method [6]. See
the cholesterol QC planning application on this website.
- C. Cholesterol method with CV of 3.0% and bias of 3.0% at
200 mg/dL. The National Cholesterol Education Program (NCEP)
recommends that a routine method should have a CV of 3.0% or
better and a bias of 3.0% or better [7]. To assess whether these
performance specifications are adequate, an operating point can
be plotted with a y-coordinate of 3.0% and an x-coordinate of
3.0%, which is labeled "C" in the accompanying figure.
Such a method would have "marginal performance", which
means the quality will be okay if everything is working perfectly,
but it will be very difficult to detect problems and maintain
the desired quality during routine service operation. [Note that
we are at odds with CDC on the methodology that is appropriate
for setting performance specifications. Reference 7 provides
some of the discussion and debate. Reference 8 is a recent, expansive
description of the CDC-NCEP methodology.]
- D. Glucose method with CV of 4.0% and bias of 3.0% at 120
mg/dL. The CLIA requirement for glucose is also 10% when considering
any decision level greater than 60 mg/dL, therefore the same
method decision chart can be used. The operating point will have
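a y-coordinate of 3.0% and an x-coordinate of 4.0%, as shown
by operating point "D". Because bias + 2SD is 11%, which exceeds
the 10% requirement, the point falls in the "poor performance"
region and the method is not acceptable for routine operation.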
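One way to cross-check the four examples without drawing the chart is to ask how many SDs fit between the observed bias and the allowable total error, then compare that number with the multipliers 2, 3 and 4. This quantity is not used in the text above; it is introduced here only as an illustrative rearrangement of bias + kSD = TEa, and the function name is hypothetical. The CV and bias values are those given in the opening sentence of each example.

```python
TEA_PCT = 10.0  # allowable total error used in all four examples, percent

# (CV %, bias %) as stated in the first sentence of each example
EXAMPLES = {
    "A": (2.0, 0.0),  # albumin
    "B": (2.0, 2.0),  # cholesterol
    "C": (3.0, 3.0),  # cholesterol, NCEP specifications
    "D": (4.0, 3.0),  # glucose
}


def sd_multiples_within_tea(cv_pct, bias_pct, tea_pct):
    """Number of SDs that fit between the observed bias and TEa,
    i.e. the k that solves bias + k * SD = TEa."""
    return (tea_pct - abs(bias_pct)) / cv_pct


multiples = {
    label: sd_multiples_within_tea(cv, bias, TEA_PCT)
    for label, (cv, bias) in EXAMPLES.items()
}

for label, k in multiples.items():
    print(f"{label}: {k:.2f} SDs fit between bias and TEa")
```

A result above 4 corresponds to the "excellent" region, between 3 and 4 to "good", between 2 and 3 to "marginal", and below 2 to "poor". Example B comes out at exactly 4, which is why its operating point sits on the line between "excellent" and "good".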
Try it!
Define the allowable total error for your test. Construct a
method decision chart using a page of graph paper. Plot your observed
inaccuracy (percent bias) from the comparison of methods experiment
versus the observed imprecision (percent CV) from the replication
experiment. See where this operating point is located and judge
whether or not you want to implement the method for routine service.
Try it with our new web tools!
You can also use the Normalized Operating
Point Calculator on this website to calculate your observed
inaccuracy and observed imprecision as a percentage of the allowable
total error, print the accompanying Normalized
Method Decision Chart, and then plot your normalized operating
point on that normalized chart. Again, see where this operating
point is located and judge whether or not you want to implement
the method for routine service. The outcome will be the same as
from a regular (non-normalized) decision chart.
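The normalization itself is a simple rescaling, sketched below. This is not the website calculator. It assumes that the normalized chart is the regular chart with both axes divided by the allowable total error, so the y-axis runs to 100% and the criterion lines meet the x-axis at 50%, 33.3% and 25%. The function name is hypothetical.

```python
def normalized_operating_point(cv_pct, bias_pct, tea_pct):
    """Express observed imprecision and inaccuracy as percentages of the
    allowable total error. Returns (x, y) for a normalized chart."""
    x = 100.0 * cv_pct / tea_pct          # normalized imprecision
    y = 100.0 * abs(bias_pct) / tea_pct   # normalized inaccuracy
    return x, y


# A method with CV 3.0% and bias 3.0%, judged against TEa values of 10% and 12%
print(normalized_operating_point(3.0, 3.0, 10.0))
print(normalized_operating_point(3.0, 3.0, 12.0))
```

Because the operating point and the criterion lines are scaled by the same factor, the point falls in the same region as on a regular chart, and tests with different allowable total errors can be plotted together on one normalized chart.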
References:
1. Westgard JO, Carey RN, Wold S. Criteria for judging precision
and accuracy in method development and evaluation. Clin Chem
1974;20:825-33.
2. U.S. Department of Health and Human Services. Medicare,
Medicaid, and CLIA Programs: Regulations implementing the Clinical
Laboratory Improvement Amendments of 1988 (CLIA). Final Rule.
Fed Regist 1992(Feb 28);57:7002-7186.
3. Ehrmeyer SS, Laessig RH, Leinweber JE, Oryall JE. 1990 Medicare/CLIA
final rules for proficiency testing: Minimum interlaboratory
performance characteristics (CV and Bias) needed to pass. Clin
Chem 1990;36:1736-40.
4. Westgard JO, Burnett RW. Precision requirements for cost-effective
operation of analytical processes. Clin Chem 1990;36:1629-32.
5. Westgard JO. A method evaluation decision chart (MEDx Chart)
for judging method performance. Clin Lab Science 1995;8:277-83.
See PDF files on this website.
6. Westgard JO, Wiebe DA. Cholesterol operational process specifications
for assuring the quality required by CLIA proficiency testing.
Clin Chem 1991;37:1938-44.
7. Westgard JO, Wiebe DA. Adequacy of NCEP recommendations for
total cholesterol, triglycerides, HDLC, and LDLC measurements.
Clin Chem 1998;44:1064-6.
8. Caudill SP, Cooper GR, Smith SJ, Myers GL. Assessment of
current National Cholesterol Education Program guidelines for
total cholesterol, triglycerides, HDL-cholesterol, and LDL-cholesterol
measurements. Clin Chem 1998;44:1650-8.
Copyright © 2000. All rights reserved.
Westgard QC, 7614 Gray Fox Trail, Madison WI 53717
Call 608-833-4718 or e-mail us at westgard@westgard.com