Definitions of important terms are obtained from documents of professional societies and organizations such as the American Society for Quality (ASQ), the International Federation of Clinical Chemistry (IFCC), as summarized in the document NRSCL8-A “Terminology and definitions for use in CLSI Documents; Approved Standard,” published by the Clinical Laboratory Standards Institute, Wayne, PA, and the International Organization for Standardization (ISO). Bear in mind that, in some cases, the definitions are based on the use of these terms in the context of the approach to analytical quality management and the discussions in our lessons on QC practices, method validation, etc.
RE. A term that describes a change in random error from the stable imprecision of the method (smeas). A value of 2.0 indicates a doubling of the original method standard deviation, whereas a value of 1.0 represents the original stable standard deviation. Used in our quality planning models or error budget equations to indicate the increase in random error that can be detected by a control procedure. REcrit is a special case that represents the increase in random error that needs to be detected to maintain a defined quality requirement.
SE. A term that describes a change in systematic error in multiples of the standard deviation (smeas) observed for the method under stable conditions. A value of 2.0 would indicate a systematic shift equivalent to two times smeas. Used in our quality planning models or error budget equations to indicate the size of systematic error that can be detected by a control procedure. SEcrit is a special case that represents the systematic shift that needs to be detected to maintain a defined quality requirement.
Accuracy. “Closeness of the agreement between the result of a measurement and a true value of the measurand. Usually expressed in the same units as the result, as the difference between the true value and the value, or as a percentage of the true value that the difference represents; expressed this way the quantity is more correctly termed ‘inaccuracy.’” [CLSI] This is a total error definition of accuracy that encompasses both the random and systematic errors in a measurement, in contrast to the more traditional (older) concept that accuracy was related only to systematic error (and is now being called “trueness”). In the context of the method validation here, the terms random error, systematic error, and total error are generally used to clarify the intent and meaning.
Aliquot. Measured portion of a whole having the same composition. General term referring to part of a solution, sample, mixture, etc.
Allowable total error, TEa. An analytical quality requirement that sets a limit for both the imprecision (random error) and inaccuracy (systematic error) that are tolerable in a single measurement or single test result.
Analysis of variance, ANOVA. A statistical technique by which the observed variance can be divided into components, such as within-run and between-run variance which make up the total variance observed for an analytical method. Commonly used for analysis of the data from a replication study to make estimates of within-run and total imprecision when the data has been carefully collected over a series of runs and days.
Analyte. The substance to be measured. CLSI recommends using the term “measurand” “for a particular quantity subject to measurement.”
Analytical error. Difference between the estimated value of a quantity and its true value. This difference (positive or negative) may be expressed either in the units in which the quantity is measured, or as a percentage of the true value. Used here to mean the difference between a patient’s test result produced by the analytical process and the true or correct value for that sample.
Analytical measurement range, AMR. Defined by CAP (College of American Pathologists) as the range of numeric results a method can produce without any special specimen pre-treatment, such as dilution, that is not part of the usual analytic process. Same as reportable range in CLIA terminology.
Analytical quality assurance (AQA). Used with charts of operating specifications (OPSpecs charts) to indicate the level of assurance for detecting critical sized errors. For example, 90% AQA(SE) means there will be at least a 90% chance of detecting the critical systematic error when operating within the allowed limits for imprecision and inaccuracy for the given control rules and total number of control measurements (N).
Analytical quality requirement. Used here to refer to a quality requirement in the form of an allowable total error (TEa). Often defined on the basis of proficiency testing criteria for acceptable performance, such as the CLIA requirements for regulated analytes.
Analytical run. Generally defined by CLIA as an 8 hour to 24 hour interval during which control materials must be analyzed. According to CLSI C24, a run is “an interval (i.e., a period of time or series of measurements) within which the accuracy and precision of the measuring system is expected to be stable. In laboratory operations, control samples are analyzed during each analytical run to evaluate method performance, therefore the analytical run defines the interval (period of time or number of specimens) between evaluations of control results. Between quality control evaluations, events may occur causing the measurement process to be susceptible to variations that are important to detect.”
Analytical sensitivity. The ability of an analytical method to detect small quantities of the measured component. Numerically characterized by determination of detection limit.
Analytical specificity. The ability of an analytical method to measure only the sought-for analyte or measurand. Numerically characterized by determination of interferences and non-specific responses to other analytes or materials.
Analytic process, system. Refers to the part of the total testing process that involves measurement and analysis, as opposed to the pre-analytic part that deals with all the steps prior to performing the test and the post-analytic part that deals with all the steps after the test result is once available.
Application characteristics. Refer to those properties of a method which determine whether the method can be implemented in the particular laboratory of interest, assuming the method will perform acceptably. Characteristics such as cost, sample size, specimen type, etc.
Assay. Sometimes used to mean a test or test result, but other times as a synonym for the analytical method.
Assayed control material. A control solution or control material for which the manufacturer has measured the values expected for different tests and different methods. These “bottle values” are summarized on assay sheets and are useful for selecting control materials; however, they should not generally be used for setting the control limits because these values usually include a between-laboratory component of variation that makes the control limits too wide for use in an individual laboratory.
Bias. A systematic difference or systematic error between an observed value and some measure of the truth. Generally used to describe the inaccuracy of a method relative to a comparative method in a method comparison experiment. It also has a specific meaning in the statistical t-test, where bias equals the difference between the mean values of the two methods being compared or the average of all the differences between the paired sample values.
Biasmeas. Used here to represent the bias of a measurement procedure relative to a comparative method or a comparative group in proficiency testing.
Biological limit of detection, BLD. An older term that was commonly used to refer to an estimate of detection limit calculated from replicate measurements of a blank sample and replicate measurements of a low concentration sample. Typically the estimate is given as the mean of the blank sample plus 2 SD of the variation observed for the blank sample plus 2 SD of the variation observed for a low concentration sample. In effect, BLD equals LLD (the lower limit of detection) plus 2 SD of the variation observed for the low concentration sample.
Bland-Altman plot. The display of paired-data from a comparison of methods experiment by plotting the differences between the test and comparative results on the y-axis versus the average of the test and comparative results on the x-axis. Similar to a traditional “difference plot”, except that the average of the test and comparative results provides the x-value rather than the value of the comparative result alone.
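As a rough illustration of the Bland-Altman definition, the sketch below computes the plotted points only (no graphics). The function name and the sample values are hypothetical and chosen for illustration.

```python
def bland_altman_points(test, comparative):
    """Return (x, y) pairs: x = average of the paired results,
    y = test result minus comparative result."""
    if len(test) != len(comparative):
        raise ValueError("paired results must have equal length")
    return [((t + c) / 2.0, t - c) for t, c in zip(test, comparative)]


# Hypothetical paired patient results (test method, comparative method).
test_results = [102.0, 148.0, 205.0]
comparative_results = [100.0, 150.0, 200.0]
points = bland_altman_points(test_results, comparative_results)
```

A traditional difference plot would use the comparative result alone as the x-value instead of the average.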
Calibration. “The process of testing and adjustment of an instrument, kit, or test system, to provide a known relationship between the measurement response and the value of the substance being measured by the test procedure.” [CLSI]
Calibration verification. “The assaying of calibration materials in the same manner as patient samples to confirm that the calibration of the instrument, kit, or test system has remained stable throughout the laboratory’s reportable range for patient test results.” [CLSI]
Certified reference material, CRM. “A reference material that has one or more values certified by a technically valid procedure and is accompanied by, or is traceable to, a certificate or other document that is issued by a certifying body.” [CLSI]
Clinical quality requirement. Used here for a quality requirement that states the medically important change in a test result or describes the gray zone or decision interval for interpreting a test result.
Clinical reportable range, CRR. Defined by CAP (College of American Pathologists) as the lowest and highest numeric results that can be reported after accounting for any specimen dilution or concentration that is used to extend the analytical measurement range.
Clinical sensitivity, diagnostic sensitivity. A measure of how frequently a test is positive when a particular disease is present. Generally given as a percentage of individuals with a given disease who have a positive test result. Ideally, a test should have a sensitivity of 100%, i.e., the test should always give a positive result when the patient has the particular disease. CLSI prefers the term true positive ratio, TPR.
Clinical specificity, diagnostic specificity. A measure of how frequently a test is negative in the absence of a particular disease. Generally given as the percentage of individuals without a given disease who have a negative test result. Ideally, a test should have a specificity of 100%, i.e., the test should always give a negative result when the patient does not have the disease. CLSI prefers the term true negative ratio, TNR.
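The two percentages defined above for clinical sensitivity and clinical specificity can be sketched as follows. The function names and the counts in the usage lines are hypothetical.

```python
def clinical_sensitivity(true_positives, false_negatives):
    """Percentage of diseased individuals who have a positive test result."""
    return 100.0 * true_positives / (true_positives + false_negatives)


def clinical_specificity(true_negatives, false_positives):
    """Percentage of non-diseased individuals who have a negative test result."""
    return 100.0 * true_negatives / (true_negatives + false_positives)


# Hypothetical study: 100 diseased and 200 non-diseased individuals.
sensitivity = clinical_sensitivity(true_positives=90, false_negatives=10)
specificity = clinical_specificity(true_negatives=190, false_positives=10)
```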
Clinically significant, clinically significant difference. Used in method validation to describe a difference or error that is larger than allowable for the clinical or medical use of a test. An important conclusion when judging the acceptability of a method’s performance, in contrast to a statistically significant difference, which only implies that a difference is larger than the experimental uncertainty of the data.
Coefficient of variation, CV. The relative standard deviation, i.e., the standard deviation (s) expressed as a percentage of the mean (x̄) [CV = 100(s/x̄)].
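A minimal sketch of the CV calculation, assuming the sample standard deviation (N-1 in the denominator) is intended. The function name and replicate values are hypothetical.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation: sample SD as a percentage of the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample SD, N-1 degrees of freedom
    return 100.0 * s / mean


# Hypothetical replicate measurements: mean 100, SD 2, so CV is 2%.
cv = cv_percent([98.0, 100.0, 102.0])
```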
Combined standard uncertainty. Standard uncertainty of the result of a measurement when that result is obtained from the values of a number of other quantities, equal to the positive square root of a sum of terms, the terms being the variances or covariances of these other quantities weighted according to how the measurement result varies with changes in these quantities. [ISO]
Comparative method, comparative analytical method. Used here to indicate the analytical method to which the test method (the one under study) is compared in the comparison of methods experiment. This term makes no inference about the quality of the method. Other terms such as definitive method, reference method, designated comparative method, or field method can be used to make some inference about the quality of the comparative method.
Comparison of methods experiment. A method validation experiment in which a series of patient samples are analyzed both by the test method (the one under study) and a comparison method (an established method). The purpose is to assess whether a systematic difference (i.e., bias or inaccuracy) exists between the two methods. The differences in results by the two methods are interpreted as analytical errors between the methods.
Comparison graph, comparison plot. Used here to refer to the display of paired test results in which the test method values are plotted on the y-axis versus the comparison method values on the x-axis.
Confidence interval, confidence range. The interval or range of values which will contain the population parameter with a specified probability.
Constant systematic error, CE. An error that is always the same direction and magnitude even as the concentration of analyte changes.
Control chart. "A graphical method for evaluating whether a process is or is not in a 'state of statistical control.' The determinations are made through comparison of the values of some statistical measure(s) for an ordered series of samples, or subgroups, with control limits" [ASQ]. In healthcare laboratories, the Levey-Jennings chart is commonly used to plot the result observed for a stable control material versus time, usually the day or run number.
Control limits. "Limits on a control chart which are used as criteria for signalling the need for action, or for judging whether a set of data does or does not indicate a 'state of control'" [ASQ]. Used here to refer to the defined limits or ranges of results expected due to the random error of the method, and beyond which some course of action should be taken. It is common in clinical laboratories to use Levey-Jennings control charts with limits set as either the mean plus or minus 2 standard deviations, or the mean plus or minus 3 standard deviations.
Control material, control product. A control solution that is available, often commercially, liquid or lyophilized, and packaged in aliquots that can be prepared and used individually.
Control measurements, control observations. The analytical results obtained for control solutions or control materials (that are analyzed for purposes of quality control).
Control procedure, QC procedure. The protocol and materials that are necessary for an analyst to assess whether the method is working properly and patient test results can be reported - that part of an analytical process that is concerned with testing the quality of the analytical results, in contrast to the measurement procedure, which produces the result. A control procedure can be described by the number of control measurements and the decision criteria (control rules) used for judging the acceptability of the analytical results.
Control rule. A decision criterion for interpreting control data and making a judgement on the control status of an analytical run. Symbolized here by AL, where A is the abbreviation for a particular statistic or states the number of control measurements, and L is the control limit. An analytical run is rejected when the control measurements fulfill the stated conditions, i.e., when a certain statistic or number of control measurements exceeds the specified control limits.
Correlation coefficient, r. A statistic that estimates the degree of association between two variables, such as the measurement result by a test method and the measurement result by a comparison method. A value of 1.000 indicates perfect association, i.e., as one variable increases, the other increases proportionately. A value of 0.000 indicates no correlation, i.e., as one variable changes, the other may or may not change. A value of -1.000 indicates perfect negative correlation, i.e., as one variable increases, the other decreases proportionately. For data from a comparison of methods experiment, the correlation coefficient is often calculated along with regression statistics to assess whether the range of concentrations is wide and, therefore, whether the estimates of the slope and intercept from ordinary regression analysis will be reliable.
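For illustration, the correlation coefficient can be computed from paired results as sketched below. This is the standard Pearson formula; the function name and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for paired values x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical comparison data: perfect positive and perfect negative association.
r_positive = pearson_r([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
r_negative = pearson_r([1.0, 2.0, 3.0, 4.0], [9.0, 7.0, 5.0, 3.0])
```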
Criteria for acceptability. CLIA's term for the decision criteria applied to assess the validity of an analytical run. Another name for QC procedure, with emphasis on definition of the decision criteria or control rules for interpreting control measurements.
Critical-error graph. A power function graph on which is imposed the critical-error that needs to be detected. Facilitates the estimation and comparison of the probability for error detection by different QC procedures.
Critical random error, REcrit. The size of random error that causes a 5% maximum defect rate for the analytical process. Calculated as (TEa - biasmeas)/(1.65 smeas), where TEa is the allowable total error, biasmeas is the inaccuracy, and smeas is the imprecision (standard deviation) of the measurement procedure.
Critical systematic error, SEcrit. The size of the systematic error that needs to be detected to maintain a defined quality requirement. Calculated as [(TEa - biasmeas)/smeas] - 1.65, where TEa is the allowable total error, biasmeas is the inaccuracy, and smeas is the imprecision (standard deviation) of the measurement procedure.
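The critical-error formulas for REcrit and SEcrit can be sketched as below. All three inputs must be in the same units (all in percent, or all in concentration units). Taking the absolute value of the bias is an assumption made here so that a negative bias does not enlarge the error budget; the function names and example figures are hypothetical.

```python
def se_crit(tea, bias_meas, s_meas):
    """Critical systematic error, in multiples of s_meas."""
    return (tea - abs(bias_meas)) / s_meas - 1.65


def re_crit(tea, bias_meas, s_meas):
    """Critical random error, as a multiple of the stable s_meas."""
    return (tea - abs(bias_meas)) / (1.65 * s_meas)


# Hypothetical method: TEa = 10%, bias = 2%, CV = 2%.
delta_se = se_crit(10.0, 2.0, 2.0)  # (10 - 2)/2 - 1.65
delta_re = re_crit(10.0, 2.0, 2.0)  # (10 - 2)/(1.65 * 2)
```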
Cumulative control limits. Control limits calculated from estimates of the mean and standard deviation that represent a time period longer than a month. Common practice is for laboratories to calculate monthly control statistics. Cumulative statistics can be easily calculated from monthly statistics that tabulate the sum of the individual values and the sum of the squares of those values. See the lesson QC - The Calculations for more detailed discussion, equations, and examples.
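One way to pool monthly statistics, as described above, is sketched here. It assumes each month is stored as a tuple of (number of values, sum of values, sum of squared values); that layout, the function name, and the data are hypothetical, and the referenced lesson remains the authority for the equations.

```python
import math


def cumulative_stats(monthly_totals):
    """Pool monthly (n, sum_x, sum_x_squared) totals into a cumulative
    mean and sample standard deviation."""
    n = sum(m[0] for m in monthly_totals)
    sum_x = sum(m[1] for m in monthly_totals)
    sum_x2 = sum(m[2] for m in monthly_totals)
    mean = sum_x / n
    variance = (sum_x2 - sum_x * sum_x / n) / (n - 1)
    return mean, math.sqrt(variance)


# Hypothetical months: values 98, 100, 102 and values 99, 101.
month_1 = (3, 300.0, 30008.0)
month_2 = (2, 200.0, 20002.0)
cum_mean, cum_sd = cumulative_stats([month_1, month_2])
```

Cumulative control limits would then be set from `cum_mean` and `cum_sd` in the same way as monthly limits.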
Decision Interval, D_{int}. Used here to represent the gray zone, or interval of uncertainty, in interpreting a test result. An example is the NCEP (National Cholesterol Education Program) guidelines that indicate a cholesterol of less than or equal to 200 mg/dL is okay and that a value of more than or equal to 240 mg/dL should have follow-up testing, which defines a gray zone of 40 mg/dL between 200 and 240 mg/dL, which is 20% at a decision level of 200 mg/dL. D_{int} can also be defined more generally by the change in a test result that is judged to be medically significant.
Decision level for critical interpretation (X_{c}). See medical decision level.
Definitive method, DM. “An analytical method that has been subjected to thorough investigation and evaluation for sources of inaccuracy, including nonspecificity.” [CLSI]
Degrees of freedom, df. This is the number of independent comparisons that can be made among N observations. It may be thought of as the number of measurements in a series minus the number of restrictions on the series. For example, there are N-1 degrees of freedom for the standard deviation because the mean has already been calculated prior to the calculation of the standard deviation. There are N-2 degrees of freedom for the standard deviation about the regression line because the slope and intercept have already been calculated.
Deming regression. An alternate regression calculation that can be employed when ordinary linear regression may not be reliable. This technique takes into account the imprecision of both the test and comparative methods. When the analytical range of comparison results is narrow, ordinary linear regression may give a slope that is too low and a y-intercept that is too high. The correlation coefficient is used as a practical measure of when alternate regression techniques should be applied. When r is less than 0.99 or 0.975, depending on the source of the recommendation, Deming regression or Passing-Bablok regression should be used instead of ordinary linear regression.
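A minimal sketch of a Deming regression calculation follows, using the standard closed-form slope. The parameter `error_variance_ratio` is an assumption of this sketch: it is the ratio of the test method's error variance to the comparative method's error variance, and a value of 1.0 treats both methods as equally imprecise. Names and data are hypothetical.

```python
import math


def deming_regression(x, y, error_variance_ratio=1.0):
    """Return (slope, intercept) allowing for imprecision in both x and y.

    error_variance_ratio = (error variance of y) / (error variance of x).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    d = error_variance_ratio
    slope = (syy - d * sxx
             + math.sqrt((syy - d * sxx) ** 2 + 4.0 * d * sxy ** 2)) / (2.0 * sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical error-free data lying exactly on y = 2x + 1.
slope, intercept = deming_regression([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```

The formula divides by the x-y covariance term, so it is undefined when the two methods are completely uncorrelated.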
Designated comparison method, DCM. “A fully specified method, which, in the absence of an NRSCL-credentialed reference method, serves as the common basis for the comparison of ‘field’ reference materials and methods, and for the development of principal assigned values (PAVs) or principal assigned characteristics (PACs).” [CLSI]
Detection Limit. General concept meaning “the smallest test value that can be distinguished from zero.” Depending on how this is determined experimentally, there are different estimates that can be made. See Limit of Blank (LoB), Limit of Detection (LoD), and Limit of Quantification (LoQ). Older terms that may still be seen are Lower Limit of Detection (LLD), Biological Limit of Detection (BLD), and Functional Sensitivity (FS).
Difference graph, difference plot. Used here to refer to the display of paired test results in which the differences between the test values and the comparison values are plotted on the y-axis versus the comparison values on the x-axis.
Dispersion. Refers to the spread of values observed for a variable. The standard deviation is a measure of dispersion, in contrast to the mean which is a measure of central tendency or location.
Distribution. Refers to the spread and shape of a frequency curve of some variable. A histogram is one way to graphically display the distribution of test results by showing the frequency of observations on the y-axis versus the magnitude on the x-axis. The normal or gaussian curve is one form of a distribution.
Error. “1) Deviation from the truth or from an accepted, expected true or reference value; between the estimated value of a quantity and its true value. 2) Measurement error. The result of a measurement minus a true value of a measurand.” [CLSI]
Error detection. See probability for error detection.
Examination procedure. Set of operations having the objective of determining the value or characteristics of a property. [ISO]
Expanded uncertainty. Quantity defining an interval about the result of a measurement that may be expected to encompass a large fraction of the distribution of values that could reasonably be attributed to the measurand.
FDA-cleared or approved test system. A test system cleared or approved by the Food and Drug Administration through the pre-market notification (510k) or pre-market approval (PMA) process for in-vitro diagnostic use.
F-test. A statistical test of significance in which the difference between two variances is tested. A variance is the square of a standard deviation. The F-test is often used to compare the imprecision of two analytical methods. The hypothesis being tested (called a null hypothesis) is that there is no difference between the two variances. If the calculated F-value is greater than the critical value, which is obtained from a statistics table, then the null hypothesis is rejected. This means that a difference exists and that the difference is statistically significant, or real. If the calculated F-value is less than the critical value, the null hypothesis cannot be rejected; the data then provide no evidence of a difference between the two variances being tested, and the observed difference is not statistically significant.
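The F-test decision can be sketched as follows. Placing the larger variance in the numerator is an assumption of this sketch (a common convention), and the critical value must still be looked up in a statistics table for the chosen probability and degrees of freedom; the value passed below is only a placeholder. Names and data are hypothetical.

```python
import statistics


def f_ratio(results_a, results_b):
    """Ratio of the two sample variances, larger variance in the numerator."""
    var_a = statistics.variance(results_a)
    var_b = statistics.variance(results_b)
    return max(var_a, var_b) / min(var_a, var_b)


def variances_differ(results_a, results_b, critical_f):
    """True when the null hypothesis (no difference) is rejected."""
    return f_ratio(results_a, results_b) > critical_f


# Hypothetical replicates: variances of 4 and 1, so F = 4.
method_a = [98.0, 100.0, 102.0]
method_b = [99.0, 100.0, 101.0]
f_value = f_ratio(method_a, method_b)
rejected = variances_differ(method_a, method_b, critical_f=19.0)  # placeholder
```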
False rejection. See probability for false rejection.
Frequency of errors, frequency of occurrence of analytical errors, f. A performance characteristic of a measurement procedure that describes how frequently analytical errors are expected to occur. Related to the stability of the measurement procedure.
Functional sensitivity, FS. This term is commonly used to refer to an estimate of detection limit that is calculated from replicate measurements of low concentration patient samples. By definition, FS is the lowest concentration at which the method provides a 20% coefficient of variation (CV).
Gaussian curve. Gaussian distribution. Normal curve. Normal distribution. Refers to a symmetrical bell-shaped distribution whose shape is given by a specific equation (called the normal equation) in which the mean and standard deviation are variables. It is commonly assumed that the random error of an analytical method fits the Gaussian distribution and therefore can be characterized by calculating the standard deviation. The standard deviation is not a valid statistic if a distribution is not Gaussian.
High complexity tests. A CLIA category of tests that has the most demanding QC requirements. Includes any tests developed by the laboratory, modified by the laboratory, or manufacturer's tests that have been classified as high complexity.
Histogram. “A graph of a frequency distribution in which the rectangles on the horizontal or x-axis are given widths proportional to the intervals of the quantities being displayed, and heights proportional to the frequency of occurrence of quantities within that interval.” [CLSI]
Imprecision. “The random dispersion of a set of replicate measurements and/or values expressed quantitatively by a statistic, such as standard deviation or coefficient of variation.” [CLSI] IFCC has recommended that the mean value and number of replicates should also be stated, and the experimental design described in such a way that other workers can repeat it. This is particularly important whenever a specific term is used to denote a particular type of imprecision, such as within-run, within-day, day-to-day, total, or between-laboratories.
Inaccuracy. "Numerical difference between the mean of a set of replicate measurements and the true value. This difference (positive or negative) may be expressed in the units in which the quantity is measured, or as a percentage of the true value" [IFCC]. We use the term biasmeas to describe the average systematic difference between a measurement procedure and a comparative method and express it in percent on OPSpecs charts.
Inherent imprecision, inherent random error. The standard deviation or coefficient of variation of the results in a set of replicate measurements obtained when the measurement procedure is operating under stable conditions.
Intercept, y-intercept. The place at which a line on a graph intersects the axis. In regression analysis, this statistic refers specifically to the y-intercept, a.
Interference. “Artifactual increase or decrease in apparent concentration or intensity of an analyte due to the presence of a substance that reacts nonspecifically with either the detecting reagent or the signal itself.” [CLSI]
Interference experiment. A method validation experiment which estimates the systematic error resulting from interference and lack of specificity. One test sample is prepared by adding the suspected interferer to a sample containing the analyte of interest. A second aliquot of the original sample is diluted by the same amount, then both samples are analyzed by the test method. The average difference of replicate measurements and the average difference for a group of interference samples provide an estimate of constant systematic error.
Levey-Jennings control chart. A commonly used control chart in which individual control measurements are plotted directly on a control chart with limit lines drawn either as mean ± 2s or mean ± 3s. Time is displayed on the x-axis usually in terms of days or runs.
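The limit lines of a Levey-Jennings chart can be sketched as below, assuming the mean and standard deviation are estimated from the laboratory's own control measurements. The function name and data are hypothetical.

```python
import statistics


def levey_jennings_limits(control_values):
    """Return the mean and the (lower, upper) 2s and 3s control limits."""
    mean = statistics.mean(control_values)
    s = statistics.stdev(control_values)
    return {
        "mean": mean,
        "2s": (mean - 2 * s, mean + 2 * s),
        "3s": (mean - 3 * s, mean + 3 * s),
    }


# Hypothetical control data with mean 100 and SD 2.
limits = levey_jennings_limits([98.0, 100.0, 102.0])
```

In practice many more control measurements than three would be used to estimate the mean and standard deviation.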
Limit of Blank (LoB). Highest measurement result that is likely to be observed (with a stated probability) for a blank sample; typically estimated as a 95% one-sided confidence limit by the mean value of the blank plus 1.65 times the SD of the blank. [CLSI]
Limit of Detection (LoD). Lowest amount of analyte in a sample that can be detected with (stated) probability, although perhaps not quantified as an exact value; estimated as a 95% one-sided confidence limit by the mean of the blank plus 1.65 times the SD of the blank plus 1.65 times the SD of a low concentration sample. [CLSI]
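The LoB and LoD estimates described above can be sketched as follows, using replicate measurements of a blank sample and of a low concentration sample. The function names and replicate values are hypothetical.

```python
import statistics


def limit_of_blank(blank_results):
    """Mean of the blank plus 1.65 times the SD of the blank."""
    return statistics.mean(blank_results) + 1.65 * statistics.stdev(blank_results)


def limit_of_detection(blank_results, low_sample_results):
    """LoB plus 1.65 times the SD of a low concentration sample."""
    return limit_of_blank(blank_results) + 1.65 * statistics.stdev(low_sample_results)


# Hypothetical replicates: blank has mean 1 and SD 1; low sample has SD 1.
blank = [0.0, 1.0, 2.0]
low_sample = [4.0, 5.0, 6.0]
lob = limit_of_blank(blank)
lod = limit_of_detection(blank, low_sample)
```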
Limit of Quantification (LoQ)/Lower limit of quantification. Lowest amount of analyte that can be quantitatively determined with stated acceptable precision and trueness, under stated experimental conditions; the analyte concentration at which the 95% limit of total error, i.e., bias plus 2*SD, meets the required or stated goal for allowable error. [CLSI]
Linear regression analysis, least-squares analysis, ordinary regression analysis. A statistical technique for estimating the best linear relationship between two variables. The estimated line has the property that the sum of the squares of the deviations from the line is a minimum, hence the name least-squares analysis. This statistical technique is commonly applied to the data from a comparison of methods experiment, taking the test method values as the y-variable and the comparison method values as the x-variable. The statistics calculated usually include the slope (b), y-intercept (a), and standard deviation about the regression line, also termed the standard error of the regression line (sy/x) and also called the standard deviation of residuals (sres). These statistics provide information about the proportional, constant, and random errors between the methods, respectively.
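A minimal sketch of the three regression statistics named above. The standard deviation about the regression line uses N-2 degrees of freedom, because the slope and intercept are estimated first. The function name and data are hypothetical.

```python
import math


def ordinary_regression(x, y):
    """Return slope b, y-intercept a, and s_y/x for paired results."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residuals are the vertical deviations of each point from the line.
    residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(residual_ss / (n - 2))
    return b, a, s_yx


# Hypothetical data: x = comparison method values, y = test method values.
b, a, s_yx = ordinary_regression([1.0, 2.0, 3.0], [1.0, 3.0, 2.0])
```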
Lot number. “An alphanumeric and/or symbolic identification placed on the label by the manufacturer that enables the manufacturing history of the product to be traced.” [CLSI]
Lower limit of detection, LLD. Older term that was commonly used to refer to an estimate of detection limit calculated from replicate measurements of a blank sample. Typically the estimate is given as the mean of the blank sample plus 2 SD of the variation observed for the blank sample.
Matrix. “All components of a material system, except the analyte.” [CLSI] Used here to refer to the physical and chemical nature of the specimen, the substances present, and their concentrations. The matrix of a control material is an important consideration in selecting and implementing a QC procedure. In this context, the matrix refers to the substances and base from which the control material is prepared, in addition to all the additives such as spiking materials, preservatives, etc., necessary to make the product useful.
Mean. The arithmetic average of a set of values. A measure of central tendency of the distribution of a set of replicate results. Often abbreviated by an x with a bar over it.
Measurand. Quantity intended to be measured. [ISO]
Medical decision level, decision level, Xc. A concentration of analyte where medical interpretation is critical for patient care. There may be several different medical decision levels for a particular analyte. Xc should provide guidance for selecting relevant estimates of stable imprecision, stable inaccuracy, and matrix inaccuracy. This is analogous to identifying a critical Target Value (TV) for assessing test performance and validating QC design.
Medically important errors. Used here to indicate errors that, when added to the inherent imprecision and inaccuracy of a measurement procedure, cause the quality requirement to be exceeded. Medically important random errors are those increases in the standard deviation of the measurement procedure that cause the error distribution to exceed the quality requirement (see critical random errors). Medically important systematic errors are those shifts in the mean of the error distribution that cause the error distribution to exceed the quality requirement (see critical systematic error).
Method development. Refers to the process of formulating the materials, conditions, and protocol for measuring an analyte. Method development is mainly carried out by industry. Laboratories may make minor modifications to methods to improve performance, in which case, the modified methods should be subject to more rigorous testing and evaluation by the laboratory.
Method selection. The process of defining the laboratory requirements and choosing an analytical method which has the desired characteristics. Application and methodology characteristics must be carefully considered when selecting the method.
Method validation. The process of testing a measurement procedure to assess its performance and to validate that performance is acceptable. The magnitudes of the analytical errors are experimentally determined and their acceptability for the application of the method is judged versus defined requirements for quality in the form of maximum allowable errors.
Methodology characteristics. Those properties of a method which in principle should contribute to the best analytical performance in the measurement of the analyte of interest. Examples include the specificity of the chemical reaction and the optimization of the reaction conditions.
Model. A mathematical equation that describes the behavior of a process as a function of its important characteristics.
Moderate complexity tests. A CLIA category of tests that includes about 75% of all tests performed by healthcare laboratories, including most automated analytical systems. This category has more stringent requirements than for “waived tests” or “provider performed microscopy.”
Multi-rule quality-control procedure. A control procedure that uses two or more control rules for testing control measurements and determining control status. At least one rule is chosen for its ability to detect random errors and one to detect systematic errors.
Non-waived tests. Used in the Final CLIA Rule as a category of tests that encompasses both moderate complexity and high complexity tests. QC regulations in the Final CLIA Rule are the same for all non-waived tests, whereas earlier drafts provided different QC requirements for the classes of moderate and highly complex tests.
Null hypothesis. A hypothesis of the form: there is no difference between A and B. This form of the hypothesis is the basis for statistical tests of significance, such as the t-test and the F-test. In the t-test, A and B are mean values. In the F-test, A and B are variances (squares of standard deviations).
Number of control measurements, N. Used here to indicate the total number of control measurements available for use in assessing the quality of an analytical run. We consider N to be the total number of control measurements available for inspection when using common Levey-Jennings type QC charts or multirule type QC procedures where it is possible to combine the measurements from different materials to accumulate a higher N (and higher error detection) for evaluating control status. These measurements may be replicates on one level or material, individual measurements on two or more materials, or replicate measurements on two or more materials. For example, if you assay a single material and make two measurements on that material, N is 2. If you assay two materials (as required by US CLIA regulations) and make single measurements on each, N is 2. If you assay two materials and make duplicate measurements on each, N is 4. If you assay three materials and make single measurements on each, N is 3. If you assay three materials and make duplicate measurements on each, N is 6. With the use of mean/range or cusum type of QC procedures where it is more difficult to combine the measurements from different control materials, N is more likely to be the number of replicates on an individual material.
Operating specifications (OPSpecs). Used here to describe the imprecision and inaccuracy that are allowable and the QC that is necessary to assure, at a stated level, that a defined quality requirement will be achieved in routine operation.
OPSpecs chart. A plot of the inaccuracy (on the y-axis) and the imprecision (on the x-axis) that are allowable for different QC procedures. The chart is prepared for a defined quality requirement and for a stated level of analytical quality assurance (AQA). Readily available in workbook format in the OPSpecs Manual and also easily prepared by the QC Validator computer program.
Outliers. Discrepant values. Values which do not agree with the pattern of the majority of other values. They may be due to mistakes or they may represent a significant finding. When outliers are suspected, it is best to analyze the data set with and without the outlier values. If their presence changes the conclusion drawn from the data, then the experimental results are not reliable. It is possible to apply a wide variety of statistical tests or rules for purposes of rejecting outliers; however, the choice of rules is always subject to argument. It is always better to inspect the data as they are collected during the experiment, identify discrepant values, and determine their cause.
Passing-Bablok regression. An alternate regression calculation that can be employed when ordinary linear regression may not be reliable. This technique is non-parametric and therefore makes fewer assumptions about the nature of the data. It depends on calculating the slopes of all possible pairs of points, ranking those slopes, and selecting a median value. The correlation coefficient is used as a practical measure of when alternate regression techniques should be applied. When r is less than 0.99 or 0.975, depending on the source of the recommendation, Deming regression or Passing-Bablok regression should be used instead of ordinary linear regression.
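The pairwise-slope procedure described above can be sketched in Python. This is a simplified illustration, not a validated implementation: it assumes the commonly described shifted-median form of the slope estimate (slopes of exactly -1 are dropped and the median is offset by the number of slopes below -1), takes the intercept as the median of y - bx, and omits confidence intervals and the linearity check. The function name and the data are hypothetical.

```python
from itertools import combinations
from statistics import median

def passing_bablok(x, y):
    """Return (intercept, slope) from a simplified Passing-Bablok estimate."""
    slopes = []
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x2 - x1, y2 - y1
        if dx == 0 and dy == 0:
            continue  # identical points: slope is undefined
        if dx == 0:
            slopes.append(float("inf") if dy > 0 else float("-inf"))
            continue
        s = dy / dx
        if s == -1:
            continue  # slopes of exactly -1 are excluded (assumed convention)
        slopes.append(s)
    slopes.sort()  # rank the slopes
    n = len(slopes)
    k = sum(1 for s in slopes if s < -1)  # offset applied to the median position
    if n % 2:
        b = slopes[(n - 1) // 2 + k]
    else:
        b = 0.5 * (slopes[n // 2 - 1 + k] + slopes[n // 2 + k])
    a = median(yi - b * xi for xi, yi in zip(x, y))
    return a, b

# Hypothetical comparison data that follow y = 1 + 2x exactly
a, b = passing_bablok([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(f"intercept={a:.3f}, slope={b:.3f}")
```

The sketch assumes positively correlated method-comparison data; with many slopes below -1 the offset index could run past the end of the list, which a production routine would need to guard against.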
Paired t-test. A form of the t-test where the data consist of pairs of observations on one set of samples, either before and after experimental treatment, or by two different methods of measurement. This statistical test is often used in analyzing the data from a comparison of methods experiment. Information about the systematic error is provided by the bias statistic; information about the random error between methods is provided by the standard deviation of the differences. These estimates will be reliable if proportional error is absent, or if the estimate of bias is only interpreted for a medical decision concentration that is close to the mean of the data. The t-value itself is a ratio of the systematic and random error terms and is useful only to assess whether sufficient data has been collected to conclude that a real difference exists. The t-value should not be interpreted as an indicator of method acceptability.
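As a minimal sketch of the statistics named in this entry, the following Python computes the bias, the standard deviation of the differences, and the t-value as bias/(s_{diff}/N^{1/2}). The function name and the paired results are hypothetical, and the differences are taken as test method minus comparison method.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(comparison, test_method):
    """Return (bias, s_diff, t) for paired results from two methods."""
    diffs = [y - x for x, y in zip(comparison, test_method)]
    n = len(diffs)
    bias = mean(diffs)              # estimate of systematic error
    s_diff = stdev(diffs)           # random error between methods (n - 1 denominator)
    t = bias / (s_diff / sqrt(n))   # ratio of systematic to random error terms
    return bias, s_diff, t

# Hypothetical paired patient results
bias, s_diff, t = paired_t([10, 20, 30, 40], [11, 22, 31, 44])
print(f"bias={bias:.2f}, s_diff={s_diff:.2f}, t={t:.2f}")
```

The calculated t would then be compared with a critical t-value for N - 1 degrees of freedom from a statistics table.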
Performance characteristic. “A property of a test that is used to describe its quality.” [CLSI] In other words, the properties that describe how well a procedure performs. For a measurement procedure, the performance characteristics include reportable range (analytical range), imprecision, inaccuracy or bias, interference, recovery, detection limit, and reference interval, and also the frequency and duration of analytical errors. For a control procedure, the performance characteristics are the probabilities for error detection and false rejection, or the average run lengths for rejectable and acceptable quality.
Performance specification. “A value or range of values for a performance characteristic, established or verified by the laboratory, that is used to describe the quality of patient test results.” [CLSI]
Power curve. A line on a power function graph that describes the performance of a certain control rule and N.
Power-function graph. A graphical presentation of the performance characteristics of QC procedures that describes the probability for rejection (on the y-axis) versus the size of analytical error occurring (on the x-axis) for stated control rules and numbers of control measurements.
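To illustrate how the points on such a graph can be obtained, the sketch below computes the probability for rejection for a single-limit rule (mean ± k·s) as a function of the size of a systematic shift. It assumes independent, Gaussian control measurements and expresses the shift in multiples of s; the function name and defaults are hypothetical. With a shift of zero, the result is the probability for false rejection.

```python
from statistics import NormalDist

def p_reject(shift, k=3.0, n=1):
    """Probability that at least one of n control values exceeds mean +/- k*s
    when the mean has shifted by `shift` multiples of s."""
    std_normal = NormalDist()
    # Probability that one control value stays inside the limits
    p_inside = std_normal.cdf(k - shift) - std_normal.cdf(-k - shift)
    return 1.0 - p_inside ** n

# Points along one power curve (3s limits, N = 2)
for shift in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(shift, round(p_reject(shift, k=3.0, n=2), 3))
```

Plotting these probabilities on the y-axis against the shift on the x-axis gives one power curve; repeating the calculation for other limits and values of N gives the family of curves on a power-function graph.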
Precision. Closeness of agreement between quantity values obtained by replicate measurements of a quantity, under specified conditions. [ISO]
Preventive maintenance. The renewing or refurbishing of critical parts of a method on a regular basis to prevent malfunctions.
Primary standard material. "Substance of known chemical composition and sufficient purity to be used in preparing a primary standard solution" [IFCC].
Primary standard solution. "Solution used as calibration standard in which the concentration is determined solely by dissolving a weighed amount of primary standard material in an appropriate solvent, and making a stated volume or weight" [IFCC].
Probability, p. The likelihood an event will occur, usually stated as a decimal fraction between 0 and 1, 0 meaning that the event will never occur and 1 meaning that the event will always occur. For example, p=0.05 means there is a 5% chance that an event will occur. Commonly used in quality control to describe the chance that a run will be rejected.
Probability for error detection, P_{ed}. A performance characteristic of a QC procedure that describes how often an analytical run will be rejected when results contain errors in addition to the inherent imprecision of the measurement procedure. Ideally, P_{ed} should be 1.00 for errors that are medically significant. In practice, we generally aim for a P_{ed} of 0.90 when selecting and designing QC procedures.
Probability for false rejection, P_{fr}. A performance characteristic of a QC procedure that describes how often an analytical run will be rejected when there are no errors occurring, except for the inherent imprecision of the measurement procedure. Ideally, P_{fr} should be 0.00. In practice, we generally aim for a P_{fr} of 0.05 or less.
Process capability. An industrial term used to describe how the inherent variability of a production process under stable operation compares to the allowable variation. SEcrit is an index of process capability for an analytical testing process.
Process stability. Used here to characterize the performance of the measurement procedure in terms of the frequency of analytic runs having medically important errors (f) that invalidate the medical usefulness.
Proficiency sample. “A specimen containing analytes of unknown concentration or identification that is sent to laboratories participating in testing programs in order to independently verify the laboratory technical competency.” [CLSI]
Proficiency testing, PT. “A program in which multiple specimens are periodically sent to members of a group of laboratories for analysis and/or identification; in which each laboratory’s results are compared with those of other laboratories in the group and/or with an assigned value, and reported to the participating laboratory and others.” [CLSI]
Proficiency testing criteria for acceptable performance, PT criteria. Defined limits about a target value (TV) that are used to classify analytical performance as acceptable or not. CLIA defines PT criteria for about 80 regulated analytes, using a format of TV ± a stated %, TV ± a fixed concentration, or TV ± 3 SD, where the SD is usually a group standard deviation from a PT survey. These PT criteria should be interpreted as total error criteria because only single measurements can be made on PT specimens and the test result is subject to both random and systematic errors.
Provider performed microscopy, PPM. A special subset of moderately complex tests that may be performed by physicians, dentists, nurse practitioners and midwives, and physician assistants as part of a patient examination.
QC, Quality control. A generic term that refers to the monitoring and assessment of laboratory testing processes to identify problems and maintain performance.
QC acceptability criteria. The term used by CLIA to indicate the decision criteria or control rules used to monitor test performance during a run of patient specimens.
QC planning process, QC Design, quality planning process. The steps to be followed to select control rules, N, and a Total QC strategy that are appropriate for the quality needed and the imprecision and inaccuracy observed for a laboratory test.
Quality. The totality of characteristics of an entity that bear on its ability to satisfy stated and implied needs. [CLSI]
Quality assessment. CLIA’s term for the overall system for assuring the quality of laboratory test results. Includes the monitoring and assessment of general laboratory systems, as well as pre-analytic, analytic, and post-analytic systems, with the objective of identifying problems, making corrections, and improving the quality of testing services.
Quality assurance. Planned and systematic activities to provide adequate confidence that requirements for quality will be met. [CLSI, ISO]
Quality management. All activities of the overall management function that determine quality policy objectives and responsibilities; and implement them by means such as quality planning, quality processes, quality control, quality assessment, and quality improvement within the quality system. [CLSI, ISO]
Quality planning model. Term used to describe an equation that shows the additive effects of different factors that influence the variation of a test result. The analytical model relates the imprecision and inaccuracy of the measurement procedure and the sensitivity of the control procedure to the total analytical error that is allowable. The clinical model includes the analytical components plus pre-analytical components, such as within-subject biological variation, etc., and relates them to the clinical decision interval or gray zone for interpreting a test result.
Quality system. The organizational structure, resources, processes, and procedures needed to implement quality management. [CLSI, ISO]
Random error, RE. An error that can be either positive or negative, the direction and exact magnitude of which cannot be exactly predicted. In contrast, systematic errors are always in one direction.
Recovery. “The measurable increase in analyte concentration or activity in a sample after adding a known amount of that analyte to the sample.” [CLSI] Characterizes the ability of an analytical method to correctly measure pure analyte when added to the matrix routinely analyzed.
Recovery experiment. A method validation experiment performed to estimate proportional systematic error. A test sample is prepared by adding a standard solution of the analyte of interest to an aliquot of a patient specimen. A baseline sample is prepared by adding an equal amount of diluent or solvent to the same patient specimen. The two samples are analyzed and recovery estimated from the difference observed between the two samples divided by the amount added.
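The recovery calculation described above can be sketched as follows. The function name and the numbers are hypothetical, and the amount added is assumed to be expressed as the concentration it contributes to the final test sample (after dilution by the specimen).

```python
def percent_recovery(spiked_result, baseline_result, amount_added):
    """Recovery (%) = difference between test and baseline samples / amount added."""
    return 100.0 * (spiked_result - baseline_result) / amount_added

# Hypothetical results: baseline 100, spiked sample 148, 50 units added
recovery = percent_recovery(148.0, 100.0, 50.0)
proportional_error = recovery - 100.0   # deviation from ideal recovery, in percent
print(f"recovery={recovery:.1f}%, proportional error={proportional_error:.1f}%")
```

Treating the deviation from 100% as the estimate of proportional systematic error is an interpretation assumed here for illustration.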
Reference Change Value (RCV). An uncertainty term that expresses the difference that must be observed before a change of patient values should be considered clinically important. Defined by Fraser as a function of the analytical variation and the within-subject biologic variation.
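This entry does not give the formula, so the sketch below assumes the commonly cited form, RCV = 2^{1/2} × Z × (CV_a^2 + CV_i^2)^{1/2}, where CV_a is the analytical CV, CV_i is the within-subject biologic CV, and Z is 1.96 for a 95% probability level. The function name and the example CVs are hypothetical.

```python
from math import sqrt

def reference_change_value(cv_analytical, cv_within_subject, z=1.96):
    """RCV in percent, assuming the commonly cited formula (see lead-in)."""
    return sqrt(2.0) * z * sqrt(cv_analytical ** 2 + cv_within_subject ** 2)

# Hypothetical CVs of 3% (analytical) and 4% (within-subject)
print(round(reference_change_value(3.0, 4.0), 1))
```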
Reference interval. A particular statistical range, rather than the entire range of observed reference values. Commonly used to characterize the range of test results expected for a defined group of people.
Reference interval experiment. A method evaluation experiment in which specimens are collected from selected individuals in defined states of health in order to characterize the expected range of test values for that population.
Reference material, RM. “A material or substance, one or more of whose property values are sufficiently homogeneous and well established to be used for the calibration of an apparatus, the assessment of a measurement method, or for assigning values to materials.” [CLSI] See also certified reference material.
Reference method. “A thoroughly investigated method, in which exact and clear descriptions of the necessary conditions and procedures are given for the accurate determination of one or more property values, and in which documented accuracy and precision of the method are commensurate with the method’s use for assessing the accuracy of other methods for measuring the same property values, or for assigning reference method values to reference materials.” [CLSI]
Reference values. All of the values observed for a particular determination when sampling a population of individuals in defined states of health.
Regression equation. The equation for the line obtained in linear regression calculations (Y = a + bX). This equation is used to calculate the amount of systematic error from the comparison of methods experiment. For a concentration where medical interpretation of the test value is critical (called a medical decision level, X_{c}), the corresponding value by the test method can be calculated from the regression equation (Y_{c} = a + bX_{c}). The amount of systematic error, SE, is the difference between Y_{c} and X_{c}.
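A short worked example of this calculation, using hypothetical regression statistics (a = 2.0, b = 1.03) and a hypothetical medical decision level of 100:

```python
def systematic_error_at(xc, intercept, slope):
    """Return (Yc, SE) at decision level Xc, where Yc = a + b*Xc and SE = Yc - Xc."""
    yc = intercept + slope * xc
    return yc, yc - xc

yc, se = systematic_error_at(100.0, intercept=2.0, slope=1.03)
print(f"Yc={yc:.1f}, SE={se:.1f}")
```

Here the constant error (a) and the proportional error (3% of 100) both contribute to the systematic error at the decision level.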
Regression statistics. Used here to refer to the terms that are commonly calculated, i.e., the slope, y-intercept, and standard deviation about the regression line.
Repeatability (of results of measurements). Closeness of agreement between the results of successive measurements of the same measurand carried out under the same conditions of measurement. NOTE: Formerly, the term within-run precision was used. [CLSI EP15-A2]
Repeatability conditions. Conditions where independent test results are obtained with the same method on identical test material in the same laboratory by the same operator using the same equipment within a short interval of time. [CLSI EP15-A2]
Replication experiment. A method validation experiment that estimates the random analytical error. It is performed by making measurements on a series of aliquots of a test solution within a specified time period, usually within the time of an analytical run (within-run imprecision), within a day (within-day imprecision), and over a period of at least 20 days (day-to-day or total imprecision).
Reportable range. The range of concentration of the substance in the specimen for which method performance is reliable and test results can be reported.
Residual. “The difference between a given data point and its predicted value.” [CLSI]
Run. See analytical run.
Sample. "The appropriate representative part of a specimen which is used in the analysis." [IFCC]
Sensitivity, analytical sensitivity. The ability of an analytical method to detect small quantities of the measured component. It has no numerical value. See detection limit.
Sigma metric. A numeric value that characterizes method performance in terms of the number of standard deviations or sigmas that fit within the tolerance limit or quality requirement of a test. For analytic processes, calculated as Sigma = [(%TE_{a} – %bias)/%CV], where TE_{a} is the allowable total error or CLIA PT criterion for acceptable performance, %bias represents the observed inaccuracy or systematic error of the method, and %CV represents the observed imprecision or random error of the method.
Six Sigma. A concept for world-class quality and a goal for process performance that requires 6 SDs of process variation to fit within the tolerance limit or quality requirement of a process. Applied in the Method Decision Chart as a criterion that requires bias + 6 SDs to be less than TE_{a}, the allowable total error for the test.
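The Sigma metric and the Six Sigma criterion can be illustrated with a short sketch. All quantities are in percent, the function names and example values are hypothetical, and taking the absolute value of the bias is an assumption made here so that negative biases are treated the same as positive ones.

```python
def sigma_metric(tea_pct, bias_pct, cv_pct):
    """Sigma = (%TEa - |%bias|) / %CV."""
    return (tea_pct - abs(bias_pct)) / cv_pct

def meets_six_sigma(tea_pct, bias_pct, cv_pct):
    """True when bias + 6 SDs is less than the allowable total error."""
    return abs(bias_pct) + 6.0 * cv_pct < tea_pct

# Hypothetical test: TEa = 10%, bias = 2%, CV = 2%
print(sigma_metric(10.0, 2.0, 2.0), meets_six_sigma(10.0, 2.0, 2.0))
```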
Slope, b. “The relationship between the change in y and the change in x between any two points along a line.” [NCCLS] Used here to refer to a statistic that is calculated as part of linear regression analysis. Commonly calculated for the data from a comparison of methods experiment in a method validation study. The ideal value for the slope is 1.000. Deviations from this value are taken as estimates of proportional systematic error. For example, a slope of 0.950 would indicate a proportional error of 5.0%.
Specimen. "Material available for analysis." [IFCC]
Standard. "Material or solution with which the sample is compared in order to determine the concentration or other quantity. The compound term calibration standard should be used whenever needed to avoid confusion with other technical or colloquial meanings of the word standard." [IFCC]
Standard deviation, s. A statistic that describes the dispersion or spread of a set of measurements about the mean value of a Gaussian or normal distribution. Calculated from the equation:
s = [Σ(x_{i} – x̄)^{2}/(n – 1)]^{1/2}
where n is the number of measurements, x_{i} is an individual measurement, and x̄ is the mean of the measurements.
Standard deviation index, SDI. Generally used in reports from proficiency testing (PT) surveys to describe how far a PT result is from the target value (TV). An SDI of +2.0 means that the laboratory's reported result is 2 standard deviations of the group above the target value or mean of the group.
Standard deviation of y about the regression line, s_{y/x}. Also known as standard error about the regression line and standard deviation of the residuals (s_{res}). A statistic calculated as part of linear regression analysis. It is the standard deviation of the differences y_{i}-Y_{i}, where y_{i} is the observed value corresponding to x_{i} and Y_{i} is the value calculated from the regression equations (Y_{i} = a + bx_{i}). This statistic measures the dispersion or spread of the data points about the regression line. In the comparison of methods experiment, its ideal value would be zero. Values greater than zero describe the random error between the methods, which is composed of the random error or imprecision from both the test and comparison methods, as well as any matrix interferences that vary from sample to sample.
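A minimal sketch of ordinary least-squares regression statistics, including s_{y/x}, is given below. It assumes the usual n - 2 denominator for the standard deviation of the residuals; the function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean

def regression_statistics(x, y):
    """Return (intercept a, slope b, s_yx) from ordinary linear regression."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Residuals are the differences y_i - Y_i, with Y_i = a + b*x_i
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s_yx = sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    return a, b, s_yx

a, b, s_yx = regression_statistics([1, 2, 3, 4], [2, 4, 5, 8])
print(f"a={a:.3f}, b={b:.3f}, s_y/x={s_yx:.3f}")
```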
Standard deviation of the differences, s_{d}, s_{diff}. A statistic calculated as part of paired t-test analysis. It is the standard deviation of the individual differences between pairs of values (x_{i}, y_{i}), after those differences have been corrected for any systematic error or bias between the methods. When applied to data from a comparison of methods experiment, its properties are similar to those of the standard deviation about the regression line, except that the presence of proportional error contributes to s_{diff} and invalidates its quantitative interpretation.
Standard deviation of the intercept, s_{a}. Indicates the dispersion or spread of values for the estimate of the y-intercept in linear regression analysis.
Standard deviation of the slope, s_{b}. Indicates the dispersion or spread of values for the estimate of the slope in linear regression analysis.
Standard error of the mean, s_{x}. A statistic which indicates the dispersion or spread of values for a mean of a set of measurements.
Standard uncertainty. Uncertainty of the results of a measurement expressed as a standard deviation. [ISO]
State Operations Manual, SOM. CMS’s official document that provides guidelines for interpreting the CLIA regulations and suggests probes for inspectors to use when reviewing a laboratory.
Statistical quality control. Those aspects of quality control in which statistics are applied, in contrast to the broader scope of quality control which includes many other procedures, such as preventive maintenance, instrument function checks, and performance validation tests. Statistical QC procedures are often used to monitor routine performance of a method and to alert the laboratory when the performance of a method changes.
Statistical control limits. As used with Levey-Jennings and Westgard multirule types of QC procedures, these are the lines drawn on control charts to define the range of results expected due to the random error of the method. The limits are often obtained from a group of 20 or more measurements on a particular control material by calculating the mean and standard deviation, then using multiples such as the mean plus/minus 3s, 2s, or 1s to establish rejection limits for different control rules.
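The calculation of control limits from a set of control measurements can be sketched as follows. The function name and the 20 control values are hypothetical, chosen only to keep the arithmetic simple.

```python
from statistics import mean, stdev

def control_limits(values, multiples=(1, 2, 3)):
    """Return {k: (mean - k*s, mean + k*s)} for each multiple k."""
    m, s = mean(values), stdev(values)
    return {k: (m - k * s, m + k * s) for k in multiples}

# Hypothetical set of 20 measurements on one control material
control_values = [98, 102] * 10
for k, (low, high) in control_limits(control_values).items():
    print(f"{k}s limits: {low:.2f} to {high:.2f}")
```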
Statistical process control. A general term used to describe those aspects of a control system in which statistics are applied to determine whether observed measurements fall within the range expected due to the random variation of the process. Industrial process control procedures provided the basis for introduction of statistical control in healthcare laboratories; however, industrial process control procedures often use the mean and range of a group of control measurements (e.g., Shewhart mean and range charts), whereas healthcare applications tend towards the use of individual measurements or individual-value control charts, such as the Levey-Jennings chart.
Statistically significant, statistically significant difference. A conclusion that the difference observed is larger than that expected due to chance (or the uncertainty or random error in the experimental data). The statement usually includes a probability level, such as “statistically significant at p=0.05”, which means that a difference this large would be expected to arise from the uncertainty in the experimental data (or chance, as it is often called) only 5 percent of the time if no real difference existed.
Systematic error, SE. An error that is always in one direction and is predictable, in contrast to random errors that may be either positive or negative and whose direction cannot be predicted.
t-test. Often called Student’s t-test. A statistical test of significance in which the difference between two mean values is tested. The null hypothesis is that there is no difference between the two means. The test is carried out by calculating a t-value, then comparing the calculated t-value with a critical t-value which is obtained from a statistics table. If the calculated t-value is greater than the critical t-value, the null hypothesis is rejected; this means that a statistically significant or real difference exists between the mean values being compared. If the calculated t-value is less than the critical t-value, the null hypothesis stands, therefore no difference has been observed between the two mean values.
t-value, t. A statistic from the t-test. It is a ratio of a systematic error component divided by a random error component [bias/(s_{diff}/N^{1/2})].
Target measurement uncertainty. Measurement uncertainty formulated as a goal and decided on the basis of a specific intended use of measurement results.
Target value, TV. Used in proficiency testing to designate the correct value, usually estimated by the mean of all participant responses, after removal of outliers, or by the mean established by definitive or reference methods.
Test complexity. Refers to the CLIA system of classifying tests into categories on the basis of the difficulty of measurement and interpretation. Categories include waived, provider performed microscopy, moderate complexity, and high complexity.
Test method. The method which is selected for experimental testing to validate its performance characteristics.
Total error, TE. The net or combined effect of random and systematic errors.
Total error requirement. See allowable total error.
Total imprecision. The random error observable over a period of many runs and many days.
Total QC strategy. The balance of the efforts expended on statistical QC, preventive maintenance, instrument function checks, method performance tests, and quality improvement.
Total testing process. CLIA’s term for the entire testing process that includes pre-analytic, analytic, and post-analytic steps and procedures.
Traceability. “A property of the result of a measurement or the value of a standard whereby it can be related to stated references, usually national or international standards, through an unbroken chain of comparison, all having stated uncertainties.” [CLSI]
True value. Generally used to indicate that this is the correct analytical concentration or result.
Trueness (of measurement). Closeness of agreement between the average value obtained from a large series of test results and an accepted reference value. NOTE: The measurement of trueness is usually expressed in terms of bias. [CLSI EP15-A2]
Validation. Action or process of proving that a procedure, process, system, equipment, or method used works as expected and achieves the intended results. [CLSI]
Variable. A quantity of interest, whose value or magnitude fluctuates or changes.
Variance. The standard deviation squared. If there are independent sources of errors, the variance of the total error is the sum of the variances due to the individual sources of error.
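For example, the additivity of variances means that standard deviations from independent sources combine as the square root of the sum of their squares, not as a simple sum. A small sketch with hypothetical values:

```python
from math import sqrt

def combined_sd(*sds):
    """Total SD from independent error sources: sqrt of the summed variances."""
    return sqrt(sum(sd ** 2 for sd in sds))

# Hypothetical independent components with SDs of 3 and 4 units
print(combined_sd(3.0, 4.0))
```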
Verification. The confirmation by examination and provision of objective evidence that specified requirements have been fulfilled. [CLSI]
Waived tests. A specific category of tests defined by CLIA, such as dipstick tests, fecal occult blood, urine pregnancy tests, erythrocyte sedimentation rates, blood glucose monitoring devices, etc., which are subject to the lowest level of regulation. The main requirement for QC is to follow the manufacturer's directions.
Westgard rules, Westgard multi-rule control procedure. A control procedure that uses a series of control rules to test the control measurements. A 1_{2s} rule is used as a warning, followed by use of 1_{3s}, 2_{2s}, R_{4s}, 4_{1s}, and 10_{x} as rejection rules.
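A simplified sketch of how these rules might be applied to a series of control results expressed as z-scores follows. The rule interpretations are the commonly stated ones and are assumptions here, since this entry only names the rules: one value beyond ±3s; two consecutive values beyond the same 2s limit; one value beyond +2s and the other beyond -2s; four consecutive values beyond the same 1s limit; ten consecutive values on one side of the mean. The function name is hypothetical, and rules are applied only across consecutive values in a single series.

```python
def westgard_rejections(z_scores):
    """Return the rejection rules violated at the most recent control value."""
    z = list(z_scores)
    if not z or abs(z[-1]) <= 2:
        return []  # no 1_2s warning, so the rejection rules are not inspected

    def all_beyond(values, limit):
        # True when every value exceeds the same +limit or the same -limit
        return all(v > limit for v in values) or all(v < -limit for v in values)

    violated = []
    if abs(z[-1]) > 3:
        violated.append("1_3s")
    if len(z) >= 2 and all_beyond(z[-2:], 2):
        violated.append("2_2s")
    if len(z) >= 2 and max(z[-2:]) > 2 and min(z[-2:]) < -2:
        violated.append("R_4s")
    if len(z) >= 4 and all_beyond(z[-4:], 1):
        violated.append("4_1s")
    if len(z) >= 10 and all_beyond(z[-10:], 0):
        violated.append("10_x")
    return violated

# Hypothetical z-scores in chronological order
print(westgard_rejections([0.5, 2.3, 2.6]))
```

An empty list means the run would be accepted; a routine implementation would also need to handle multiple control materials and define which measurements belong to the same run.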
Within-laboratory precision. Precision over a defined time and operators, within the same facility and using the same equipment; calibration and reagents may vary. NOTE: Formerly, the term total precision was used. [CLSI EP15-A2]
Within-run imprecision. The random error observable within the time period of a single analytical run.
z-score, z-value. A calculated number that tells how many standard deviations a control result is from its mean value, e.g., a control result of 112 on a material having a mean of 100 and a standard deviation of 5 has a z-score of +2.4, i.e., it is 2.4 standard deviations above its mean.
Abbreviations
AMR Analytical Measurement Range
CAP College of American Pathologists.
CDC Centers for Disease Control and Prevention
CLIA Clinical Laboratory Improvement Amendments of 1988
CMS Centers for Medicare and Medicaid Services
COLA Originally, Commission for Office Laboratory Accreditation
CRR Clinically Reportable Range
CV Coefficient of Variation (SD divided by mean times 100)
FDA Food and Drug Administration
ISO International Organization for Standardization
JC Joint Commission, formerly Joint Commission on Accreditation of Healthcare Organizations (JCAHO)
LAP Laboratory Accreditation Program
CLSI Clinical and Laboratory Standards Institute, formerly National Committee for Clinical Laboratory Standards (NCCLS)