METHOD VALIDATION - The detection limit experiment is intended to estimate the lowest concentration of an analyte that can be measured. This low concentration limit is obviously of interest in forensic drug testing, where the presence or absence of the drug may be the critical information desired from the test. Analytical performance at low concentrations is also important for tumor markers, such as prostate specific antigen (PSA), when patient values after treatment may be useful for monitoring "biochemical relapse" [1].
US laboratory regulations require that detection limit (or analytical sensitivity) be verified only for high complexity methods, modified moderate complexity methods, and moderate complexity methods that have not been cleared by FDA as meeting the CLIA requirements for quality control. Given that FDA has not implemented a QC clearance process, the requirement to verify detection limit for moderate complexity methods has been postponed. However, until QC clearance is implemented, good laboratory practice should dictate that detection limit be verified when relevant, e.g., for all forensic and therapeutic drug tests; TSH and similar immunoassay tests; and PSA and other cancer markers. Verification is less relevant for glucose, cholesterol, enzymes, and other constituents where the reference range matters more for interpretation of the test results.
Terminology in this area is a mess! In making their claims, manufacturers often use a wide variety of terms, such as sensitivity, analytical sensitivity, minimum detection limit, functional sensitivity, limit of detection, and limit of quantitation. At this time there are no accepted standard definitions of these terms. It is therefore necessary to find out what the actual experimental procedure was, how the data were calculated, how the estimate was made from the data, and whether this estimate is useful for medical application of the test.
A general description of the experimental procedure is provided
in the accompanying figure. Two different kinds of samples are
generally prepared. One sample is a "blank" that has
a zero concentration of the analyte of interest. The second is
a "spiked" sample that has a low concentration of the
analyte of interest. In some situations, several spiked samples
may be prepared. Both the blank and spiked samples are measured
repeatedly in a replication type of experiment, then the means
and SDs are usually calculated from the values observed for the
samples. Different estimates of detection limit may be calculated
from the data on blank and spiked samples.
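As a minimal sketch of the replication calculation described above, the following Python computes the mean and SD for a blank and a spiked sample. The response values and the helper name `summarize` are hypothetical, chosen only for illustration; the sample (n - 1) SD is assumed.

```python
from statistics import mean, stdev


def summarize(replicates):
    """Return (mean, sample SD) for a list of replicate measurements."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicates to estimate an SD")
    return mean(replicates), stdev(replicates)


# Hypothetical raw measurement responses (10 replicates each), illustration only.
blank_responses = [980, 1010, 1005, 990, 1020, 995, 1000, 1015, 985, 1000]
spiked_responses = [1950, 2050, 2000, 1980, 2020, 2010, 1990, 2030, 1970, 2000]

mean_blk, s_blk = summarize(blank_responses)
mean_spk, s_spk = summarize(spiked_responses)

print(f"blank:  mean = {mean_blk:.1f}, SD = {s_blk:.1f}")
print(f"spiked: mean = {mean_spk:.1f}, SD = {s_spk:.1f}")
```

These four summary statistics are the inputs to the different detection limit estimates.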
Blank solution. One aliquot of the blank solution is typically used for the blank and another aliquot is used to prepare the spiked sample. Ideally the blank solution should have the same matrix as the regular patient samples. However, it is common to use the "zero standard" from a series of calibrators as the blank and the lowest standard as the "spiked" sample.
Spiked sample. In validating the
performance of a method, the amount of analyte added to the blank
solution should represent the detection concentration claimed
by the manufacturer. In establishing a detection limit, it will
often be necessary to prepare several spiked samples whose concentrations
are in the analytical range of the expected detection limit. For
certain tests, there may also be an interest in using samples
from patients who are free of disease following treatment (e.g.,
PSA sera from patients treated for prostate cancer) [2].
Number of replicate measurements.
There is no hard and fast guideline, but 20 replicate measurements
are usually recommended in the literature. This number is reasonable
given that the detection limit experiment is a special case of
the replication experiment and that 20 is the minimum number of
measurements recommended for a replication study. Manufacturers
often recommend 10 measurements in their verification protocols
to minimize cost and laboratories often adopt this lower number
of measurements for practicality.
Time period of study. A within-run or short-term study is often carried out when the main focus is the method's performance on the blank solution. A longer time period, representing day-to-day assay performance, is recommended when the focus is on the "spiked" sample [2]. When day-to-day performance is considered, practicality may dictate using 10 measurements (over 10 days) rather than a longer time period.
Quantity to be estimated. Here's where
it gets confusing. There are at least three different concepts
(and terms) that are commonly used, as illustrated in the accompanying
figure. The determination of these different quantities involves
different calculations with the data from the blank and spiked
samples.
Consider an example application where the blank and the spiked samples are the zero and 10 ug/L standards. [For convenience and comparison, this example is similar to the one for PSA in Table 1 of reference 1.] Both samples were analyzed 10 times and the means and SDs calculated. For the zero standard, the mean is 1000 units and the SD is 100 units (raw measurement responses being used). For the 10 ug/L sample, the mean is 2000 units and the SD is 200 units.
Lower Limit of Detection (LLD). The manufacturer's claim makes use of a 2 SD definition of LLD (i.e., mean_blk + 2 s_blk, where mean_blk and s_blk are the mean and SD of the blank) based on 10 replicate measurements.
Biological Limit of Detection (BLD). The manufacturer's claim again makes use of a 2 SD definition of BLD (i.e., LLD + 2 s_spk, where s_spk is the SD of the spiked sample) based on 10 replicate measurements.
Functional sensitivity (FS). A manufacturer's claim is 10 ug/L for the functional sensitivity of a method based on 10 replicate measurements.
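The three quantities can be worked through with the example numbers given earlier (blank: mean 1000, SD 100; 10 ug/L sample: mean 2000, SD 200). The sketch below rests on two assumptions that are not stated in the text. First, raw responses are converted to concentration by a straight line through the two standards. Second, functional sensitivity is judged by the common convention of a 20% CV in concentration units. The function name and dictionary keys are hypothetical.

```python
def detection_limits(mean_blk, s_blk, mean_spk, s_spk, spike_conc, k=2.0):
    """Estimate LLD and BLD (k-SD definitions) plus the CV at the spiked level.

    Assumes a linear response between the blank and the spiked standard.
    """
    lld_response = mean_blk + k * s_blk        # LLD = mean_blk + 2 s_blk
    bld_response = lld_response + k * s_spk    # BLD = LLD + 2 s_spk

    # Two-point calibration (assumption): response units per concentration unit.
    slope = (mean_spk - mean_blk) / spike_conc

    def to_conc(response):
        return (response - mean_blk) / slope

    # SD of the spiked sample expressed in concentration units, then as a CV.
    cv_spk_percent = 100.0 * (s_spk / slope) / spike_conc

    return {
        "lld_response": lld_response,
        "lld_conc": to_conc(lld_response),
        "bld_response": bld_response,
        "bld_conc": to_conc(bld_response),
        "cv_spk_percent": cv_spk_percent,
    }


# Values from the example: zero standard and 10 ug/L standard.
result = detection_limits(mean_blk=1000, s_blk=100,
                          mean_spk=2000, s_spk=200, spike_conc=10)
for name, value in result.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions the LLD is 1200 response units (2 ug/L), the BLD is 1600 response units (6 ug/L), and the CV at 10 ug/L is 20%, which would be consistent with a functional sensitivity claim of 10 ug/L. The ordering LLD < BLD < FS shows why the form of the claim matters.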
In validating a manufacturer's performance claim for detection limit, it is important to recognize the specific form of the claim, the data needed to verify that claim, and the data calculations appropriate for that form of the claim. Many manufacturers seem to choose the LLD quantity because it is the simplest to estimate and also gives the lowest number, which is useful for marketing. For medical applications, it would generally be more useful to estimate BLD or FS.
As mentioned earlier, the terminology and experimental procedures in this area are not yet standardized and additional terms and alternate experimental procedures will likely be encountered. The National Committee for Clinical Laboratory Standards (NCCLS) is working to establish a standard of practice in this area and has a draft document under development [7]. This effort will hopefully lead to a more rational and systematic understanding of detection limit in the near future.
