
Part V: The Quality of Calcium Testing

January 2005
with Sten Westgard, MS
The quality of calcium tests was a topic on the front page of Clinical Laboratory News in August of 2004 [1]. A report sponsored by NIST studied the effect of calibration errors on the cost of treatment and estimated the potential cost at between $60 million and $199 million [2] for systematic errors (biases) ranging from 0.1 mg/dL to 0.5 mg/dL.

One might think that calibration errors should be a thing of the past for calcium, given that calcium was a model test for the development of the “National Reference System for the Clinical Laboratory” [3]. This reference system includes reference materials (and calibrators), a reference method (atomic absorption) [4], and even a definitive method (isotope dilution mass spectrometry) for determining the true value of calcium in clinical materials [5]. The problem today is that laboratories use methods that are easy to automate, rather than methods like atomic absorption that were the standard of practice twenty years ago. These automated methods depend on serum-like calibrators, whose value assignment is critical and still problematic.

The quality of calcium tests should be a good exemplar of current laboratory practice and is of critical interest today because of the costs associated with calibration biases. To make a quantitative and objective assessment of quality [6], we use proficiency testing results that are required by CLIA and reported to CMS. Our assessment methodology has also been applied to cholesterol and glucose in previous reports in this series [7,8].

Materials and Methods

Sigma-metrics are estimated for calcium testing based on proficiency testing data collected during 2004. Current testing is based predominantly on three measurement principles: photometric analysis using o-cresolphthalein complexone, photometric analysis using arsenazo III dye, and ion-selective electrodes. The CAP survey data provide the mean and SD (CV) for the total survey group, as well as means and SDs for each of these three measurement principles.

  • National requirement for the quality of calcium is defined as an allowable total error (TEa) of 1.0 mg/dL. This is the CLIA criterion for acceptable performance in proficiency testing (PT).
  • Survey specimens were selected near the medically important decision concentration of 10.2 mg/dL. According to the NIST report, the reference range for calcium is typically 8.9-10.1 mg/dL, with critical action limits being 13.0 mg/dL. Levels outside the typical reference range require additional testing and were the basis for the cost estimates of calibration errors [2].
  • PT data comes from 2004 surveys performed by the American Academy of Family Physicians (AAFP), Medical Laboratory Evaluation (MLE), American Association of Bioanalysts (AAB), American Proficiency Institute (API), College of American Pathologists (CAP), and New York State (NY).
  • National Test Quality (NTQ) observed for a single proficiency testing sample is estimated from the CLIA allowable total error (TEa) divided by the group SD or CV, i.e., Sigma = TEa/CV, with TEa and the group variation expressed in the same units (both in mg/dL or both in percent). The average NTQ observed for multiple surveys is weighted for the number of laboratories participating in each survey.
  • Local Method Quality (LMQ) for a single proficiency testing sample is a weighted average of the Sigmas determined for each method subgroup without accounting for method bias, i.e., Sigma = TEa/CV(method subgroup). The average LMQ observed for multiple surveys is weighted for the number of laboratories participating in each survey.
  • National Method Quality (NMQ) observed for a single proficiency testing sample is a weighted average of the Sigmas determined for each method subgroup taking bias into account, i.e., Sigma = (TEa - bias(method subgroup))/CV(method subgroup). The average NMQ observed for multiple surveys is weighted for the number of laboratories participating in each survey.
Further details on the methodology are discussed in an earlier essay.
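
The three definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Subgroup` structure, the function names, and the example figures are hypothetical and do not come from any PT survey. TEa, SD, and bias are all taken in mg/dL, and bias is taken as the absolute difference between a subgroup mean and the overall group mean.

```python
from dataclasses import dataclass

TEA_CALCIUM = 1.0  # CLIA allowable total error for calcium, mg/dL


@dataclass
class Subgroup:
    """One method subgroup on a single PT sample (hypothetical structure)."""
    n_labs: int
    mean: float  # mg/dL
    sd: float    # mg/dL


def national_test_quality(group_sd, tea=TEA_CALCIUM):
    """NTQ: Sigma = TEa / SD of the whole survey group."""
    return tea / group_sd


def local_method_quality(subgroups, tea=TEA_CALCIUM):
    """LMQ: lab-weighted mean of TEa / SD per subgroup, ignoring bias."""
    total = sum(s.n_labs for s in subgroups)
    return sum(s.n_labs * tea / s.sd for s in subgroups) / total


def national_method_quality(subgroups, group_mean, tea=TEA_CALCIUM):
    """NMQ: lab-weighted mean of (TEa - |bias|) / SD per subgroup."""
    total = sum(s.n_labs for s in subgroups)
    return sum(
        s.n_labs * (tea - abs(s.mean - group_mean)) / s.sd for s in subgroups
    ) / total


# Illustrative numbers only (not taken from any PT survey).
subgroups = [Subgroup(600, 10.3, 0.25), Subgroup(400, 10.0, 0.20)]
group_mean, group_sd = 10.2, 0.33

ntq = national_test_quality(group_sd)
lmq = local_method_quality(subgroups)
nmq = national_method_quality(subgroups, group_mean)
print(f"NTQ={ntq:.2f}  LMQ={lmq:.2f}  NMQ={nmq:.2f}")
```

Because the bias term can only reduce the numerator, NMQ can never exceed LMQ for the same subgroups, which is the pattern seen between the LMQ and NMQ columns in the tables of this report.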


Estimates of quality. Table 1 provides the estimates of quality from 5 different PT programs, which are identified in Column 1. Column 2 shows the number of laboratories participating in each program, which totals 9,786 laboratories for all 5 programs. AAFP is the smallest with 164 labs, followed by MLE with 528, AAB with 1444, API with 2695, and CAP with 4955 participants.

Column 3 shows the group means for each of the survey samples from the different PT programs. We selected samples close to a concentration of 10.2 mg/dL, which is critical for diagnostic and treatment practices today. We observed that survey samples have a concentration range from about 7 to 12 mg/dL, but it was often difficult to find a sample within a few tenths of 10.2 mg/dL.

Column 4 shows the estimates of National Test Quality for each survey group. The weighted average of the estimates is only 2.84 Sigma, which indicates, on average, calcium test quality would not meet the minimum standard of quality for routine production in other industries.

Column 5 gives a more optimistic estimate of Local Method Quality based on method subgroups without consideration of bias between methods. Even so, the numbers range from 2.71 to 3.50 for half the labs, with the CAP figure of 4.30 boosting the weighted average to 3.86. Remember that this estimate of quality assumes that local reference ranges and medical cutoff points would be used to compensate for the bias between methods. If bias is not compensated in this way, then Column 6 provides a better estimate of quality with respect to standardized national treatment guidelines. Those figures are lower, ranging from 2.35 to 3.07, with a weighted average of 3.00 Sigma.

As a benchmark for comparison, we again include survey data from the NY program, which we believe sets the most demanding regulatory standards. These Sigmas are higher, being 3.57 for NTQ, 4.08 for NMQ, and 4.73 sigma for LMQ.

Table 1. Summary of calcium quality (weighted averages) from 5 national PT survey programs

PT Program          Labs   Group Mean   NatTestQ   LocMethodQ   NatMethodQ   Datasheet

AAFP 2004A           164        10.2       2.50         2.71         2.35    1
MLE 2004A            528        10.5       2.44         3.50         2.69    2
AAB 3rd 2004        1444        11.1       2.78         3.37         2.95    3
API 3rd 2004        2695        11.1       2.63         3.45         2.98    4
CAP 1st 2004 C-04   4955        10.4       3.03         4.30         3.07    5

Group summary       9786        10.7       2.84         3.86         3.00

NY Benchmark         391        9.94       3.57         4.73         4.08    6
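
As a check on the "Group summary" row, the lab-weighted averages can be recomputed from the five program rows of Table 1. The sketch below is a minimal illustration: the variable and function names are our own, and the inputs are the rounded values printed in the table. With those inputs it reproduces the published 2.84 and 3.86 and comes within 0.01 of the published 3.00, the small difference being attributable to rounding of the inputs.

```python
# (program, labs, NatTestQ, LocMethodQ, NatMethodQ) copied from Table 1
table1 = [
    ("AAFP", 164, 2.50, 2.71, 2.35),
    ("MLE", 528, 2.44, 3.50, 2.69),
    ("AAB", 1444, 2.78, 3.37, 2.95),
    ("API", 2695, 2.63, 3.45, 2.98),
    ("CAP", 4955, 3.03, 4.30, 3.07),
]


def weighted_average(rows, column):
    """Average of one Sigma column, weighted by the number of labs."""
    total_labs = sum(row[1] for row in rows)
    return sum(row[1] * row[column] for row in rows) / total_labs


total_labs = sum(row[1] for row in table1)
ntq = weighted_average(table1, 2)
lmq = weighted_average(table1, 3)
nmq = weighted_average(table1, 4)
print(f"labs={total_labs}  NTQ={ntq:.3f}  LMQ={lmq:.3f}  NMQ={nmq:.3f}")
```

The CAP program contributes about half of all laboratories, so its Sigmas dominate each weighted average.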

Variability of estimates. Table 2 provides a more complete assessment of CAP testing results for 2 events that include a total of 10 samples. The overall average Sigmas are 3.09 for NTQ and 3.67 for NMQ, with the most optimistic estimate being 4.77 for LMQ. Note that there seems to be some dependence of Sigma on the concentration of the PT sample, with higher Sigmas being observed for lower concentration samples. This is most likely a result of the quality requirement being less demanding at lower concentrations, since the CLIA PT criterion is set in concentration units rather than as a percentage. The overall average concentration for the 10 PT specimens is 9.98 mg/dL, so the overall average Sigmas should be representative of performance at the critical concentration of 10.2 mg/dL.
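
The concentration dependence can be made concrete by expressing the fixed 1.0 mg/dL criterion as a percentage of the specimen concentration. The short sketch below is illustrative only (the helper name `tea_percent` is our own); it uses three specimen means from Table 2 and shows the allowable error falling from about 12.5% at 8.01 mg/dL to about 8.7% at 11.44 mg/dL.

```python
TEA_MG_DL = 1.0  # CLIA criterion for calcium, fixed in concentration units


def tea_percent(concentration_mg_dl, tea=TEA_MG_DL):
    """Allowable total error expressed as a percentage of concentration."""
    return 100.0 * tea / concentration_mg_dl


# Lowest, near-decision-level, and highest specimen means from Table 2
for specimen, mean in [("C-07", 8.01), ("C-04", 10.36), ("C-10", 11.44)]:
    print(f"{specimen}: {mean:5.2f} mg/dL -> TEa = {tea_percent(mean):.1f}%")
```

If a method's CV in percent is roughly constant across this range, a larger percentage TEa at low concentration translates directly into a higher Sigma.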

The CAP surveys also provide subgroup means for the three major method principles: arsenazo III dye, o-cresolphthalein complexone, and ion-selective electrode measurements. When these subgroup means are used to calculate method bias (rather than the overall group mean), the estimates for NTQ and NMQ are higher, as shown in columns 7 and 8 in Table 2, with the overall averages being 3.88 Sigma for NTQ and 4.20 Sigma for NMQ.

Table 2. Summary of Calcium Quality from 10 CAP 2004 Specimens

                               Sigma Quality Performance Metrics
                               vs Group mean                    vs Subgroup means
PT Specimen   Number    Mean   NatTestQ   NatMethQ   LocMethQ   NatTestQ   NatMethQ   Datasheet

C-01            4954   11.43       2.33       2.86       4.12       3.31       3.52   7, 8
C-02            4950   11.10       2.56       2.93       4.42       3.57       3.81   9, 10
C-03            4927    8.42       3.45       3.90       5.17       4.29       4.65   11, 12
C-04            4955   10.36       3.03       3.07       4.30       3.81       4.09   13, 14
C-05            4940    9.60       3.33       3.77       4.83       4.01       4.32   15, 16
C-06            5048    9.14       3.57       4.05       5.10       4.15       4.52   17, 18
C-07            5047    8.01       3.85       4.78       5.55       4.55       4.98   19, 20
C-08            5049    9.96       3.23       4.22       5.06       3.99       4.38   21, 22
C-09            5057   10.34       3.13       3.92       4.78       3.79       4.10   23, 24
C-10            5036   11.44       2.44       3.24       4.36       3.35       3.61   25, 26

EVENT1 AV       4945   10.18       2.94       3.31       4.57       3.80       4.08
EVENT2 AV       5047    9.78       3.24       4.04       4.97       3.97       4.32
AV ALL          4996    9.98       3.09       3.67       4.77       3.88       4.20
SD             54.66    1.20       0.51       0.63       0.46       0.40       0.47

Estimates of bias. Given the significance of method bias and the potential costs of calibration errors, we also estimated the biases of the different subgroups vs the overall group. Table 3 shows the respective biases for arsenazo III dye (AZD) in column 4, o-cresolphthalein complexone (CPC) in column 5, and ion-selective electrodes (ISE) in column 6. The average biases observed for these subgroups over the 10 specimens are 0.21 mg/dL, 0.19 mg/dL, and 0.08 mg/dL, respectively. (Note that these biases are presented as absolute values; otherwise positive and negative values would cancel out and give unrealistically low estimates of the average biases.) Columns 7 and 8 provide weighted averages of the overall biases vs the group mean and vs subgroup means. The average bias vs the group mean is 0.24 mg/dL, whereas the average bias vs subgroup means is 0.13 mg/dL.

Table 3. Summary of Calcium Bias from 10 CAP 2004 Specimens

                                                                  Weighted Average
PT Specimen   Number    Mean   AZD Bias   CPC Bias   ISE Bias   Bias vs G   Bias vs SG   Datasheet
                                  mg/dL      mg/dL      mg/dL       mg/dL        mg/dL

C-01            4954   11.43       0.29       0.30       0.15        0.31         0.14   7, 8
C-02            4950   11.10       0.26       0.27       0.12        0.36         0.13   9, 10
C-03            4927    8.42       0.18       0.13       0.01        0.25         0.10   11, 12
C-04            4955   10.36       0.21       0.20       0.10        0.28         0.12   13, 14
C-05            4940    9.60       0.16       0.16       0.05        0.23         0.10   15, 16
C-06            5048    9.14       0.15       0.14       0.05        0.20         0.12   17, 18
C-07            5047    8.01       0.17       0.09       0.07        0.14         0.11   19, 20
C-08            5049    9.96       0.18       0.16       0.05        0.17         0.14   21, 22
C-09            5057   10.34       0.20       0.18       0.07        0.19         0.14   23, 24
C-10            5036   11.44       0.30       0.27       0.10        0.27         0.18   25, 26

EVENT1 AV       4945   10.18       0.22       0.21       0.09        0.29         0.12
EVENT2 AV       5047    9.78       0.20       0.17       0.07        0.20         0.14
AV ALL          4996    9.98       0.21       0.19       0.08        0.24         0.13
SD             54.66    1.20       0.05       0.07       0.04        0.07         0.02
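
The reason for averaging absolute biases can be shown with a short sketch. The signed values below are hypothetical and chosen only to illustrate cancellation; the second list is the AZD Bias column of Table 3, whose mean reproduces the reported 0.21 mg/dL.

```python
from statistics import mean

# Hypothetical signed biases (mg/dL) for one subgroup across four samples
signed_biases = [0.25, -0.20, 0.15, -0.20]
signed_mean = mean(signed_biases)                      # positives and negatives cancel
absolute_mean = mean([abs(b) for b in signed_biases])  # typical size of the bias

# AZD absolute biases (mg/dL) for specimens C-01 to C-10, copied from Table 3
azd_biases = [0.29, 0.26, 0.18, 0.21, 0.16, 0.15, 0.17, 0.18, 0.20, 0.30]
azd_mean = mean(azd_biases)

print(f"signed mean   = {signed_mean:.2f} mg/dL")
print(f"absolute mean = {absolute_mean:.2f} mg/dL")
print(f"AZD mean      = {azd_mean:.2f} mg/dL")
```

In the hypothetical case the signed mean is essentially zero even though every individual sample is off by 0.15 mg/dL or more, which is the understatement the absolute-value convention avoids.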


The quality of calcium tests appears to depend on the size of the laboratory and the sophistication of the methods and analysts involved, though these differences are not as large as those observed for cholesterol and glucose in earlier reports in this series. The overall estimate of National Test Quality for calcium is 2.84 Sigma, which compares with earlier estimates of 2.88 for cholesterol and 2.95 for glucose. When evaluated on the Sigma scale, none of these tests can be considered to provide acceptable quality or to be suitable for routine applications. As shown in the figure below, current QC practices that follow the CLIA minimum of 2 controls per day are clearly inadequate to assure the quality of routine calcium tests.

The assessment for calcium is particularly discouraging because most laboratory scientists consider the CLIA allowable total error of 1.0 mg/dL to be much too large. The NIST study suggests that the maximum allowable error should be 0.5 mg/dL or less based on the significant costs of misclassification and mistreatment of patients. Given the estimate of an average bias of 0.24 mg/dL for current calcium methods (based on CAP survey results), the associated cost would be somewhere between $60 million (for 0.1 mg/dL bias) and $199 million (for 0.5 mg/dL bias). Even if local reference limits and cutoff points were being used, the average bias of 0.13 mg/dL would have an associated cost of at least $60 million.


Strike three!


  1. Downer K. How Much Does Test Calibration Error Cost? NIST Report Suggests $60-$199M for Calcium Testing Alone. Clin Lab News 2004;30 (No. 8), pp 1, 8-9.
  2. Gallaher MP, Mobley LR, Klee GG, Schryver P. Planning Report 04-1. The Impact of Calibration Error in Medical Decision Making. NIST, April 2004.
  3. NRSCL 13-A. The Reference System for the Clinical Laboratory: Criteria for development and credentialing of methods and materials for harmonization of results. NCCLS, Wayne, PA, 2000.
  4. RS9-P. Calcium; Proposed Summary of Methods and Materials Credentialed by the NRSCL Council. NCCLS, Wayne, PA, 1989.
  5. Cali JP, Mandel J, Moore L, Young DS. A referee method for determination of calcium in serum. NBS Special Publication 260-236:1-121, 1972.
  6. Westgard JO, Westgard S. The Quality of Laboratory Testing. Part II. Touchstone Test Methodology.
  7. Westgard JO, Westgard S. The Quality of Laboratory Testing. Part III. Cholesterol.
  8. Westgard JO, Westgard S. The Quality of Laboratory Testing. Part IV. Glucose.

James O. Westgard, PhD, is a professor of pathology and laboratory medicine at the University of Wisconsin Medical School, Madison. He also is president of Westgard QC, Inc., (Madison, Wis.) which provides tools, technology, and training for laboratory quality management.