Tools, Technologies and Training for Healthcare Laboratories

Analysis of common cortisol assays

Clinical Chemistry published a comprehensive review of the performance of routine serum cortisol assays, including performance in different patient cohorts: males, nonpregnant females, pregnant females, and patients under different drug regimens. Is the analytical performance the same for all of these patients? If not, how big are the differences, and how do we finally judge the method performance? Does a Sigma-metric analysis clarify the picture at all?

Sigma-metrics of Serum Cortisol assays

September 2016
Sten Westgard, MS

[Note: This QC application is an extension of the lesson From Method Validation to Six Sigma: Translating Method Performance Claims into Sigma Metrics. This article assumes that you have read that lesson first, and that you are also familiar with the concepts of QC Design, Method Validation, and Six Sigma. If you aren't, follow the link provided.]

In most performance evaluations, the study authors (as well as our own analysis) concentrate on a single set of patients. We don't take into account the differences between males and females, or between healthy patients, unhealthy patients, and those under different medical treatments. It's rare that an evaluation study looks so deeply at an assay.

But in Clinical Chemistry, we have a study that did all of that for not just one, but 5 routine assays for Serum Cortisol:

Serum Cortisol: An Up-to-Date Assessment of Routine Assay Performance. Hawley JM, Owen LJ, Lockhart SJ, Monaghan PJ, Armson A, Chadwick CA, Wilshaw H, Freire M, Perry L, Keevil BG. Clin Chem 2016;62(9):1220-1229.

This study looked at cohorts of not only males, but also nonpregnant females, pregnant females, patients on metyrapone treatment (to manage Cushing syndrome), and patients on prednisolone treatment. Each of the patient sample cohorts was compared against an LC-MS/MS certified reference method procedure. Thus, we have bias figures for each cohort.

The Imprecision and Bias Data

"Inter-assay imprecision for all routine assays ranged from 1.6% to 7.5% over cortisol concentrations from 84 to 990 nmol/L." A summary of the imprecision at various levels, found in the supplemental data files, is shown below.

For the different cohorts, between 27 and 72 patient samples were used and compared against the certified reference method procedure (cRMP). The exact numbers are listed below. Passing-Bablok regression analysis was performed on the data. Using the regression equations, we are able to calculate the bias at each level where imprecision was measured.

Routinelevel = (slope * cRMPlevel ) + Y-intercept

As an example, let's take the low level of the ARCHITECT, 84 nmol/L, where the study determined a slope of 1.01 and y-intercept of -0.22.

Routinelevel1 = (1.01 * 84 ) - 0.22

Routinelevel1 = (84.84) -0.22

Routinelevel1 = 84.62

The bias between the routine level and the reference level is the absolute value of the difference: |84.62 - 84| = 0.62 nmol/L.

This is a 0.74% bias at the level of 84 nmol/L (0.62 / 84 = 0.0074).
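The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the calculation above, not code from the study; the function name `bias_from_regression` is our own, and the slope and intercept are the ARCHITECT values quoted in the example.

```python
def bias_from_regression(reference_level, slope, intercept):
    """Return (predicted routine level, absolute bias, percent bias) at one level.

    reference_level is the cRMP concentration; slope and intercept come from
    the Passing-Bablok regression of the routine method against the cRMP.
    """
    routine_level = slope * reference_level + intercept
    abs_bias = abs(routine_level - reference_level)
    pct_bias = 100.0 * abs_bias / reference_level
    return routine_level, abs_bias, pct_bias


# ARCHITECT low level from the worked example: 84 nmol/L, slope 1.01, intercept -0.22
routine, abs_bias, pct_bias = bias_from_regression(84, slope=1.01, intercept=-0.22)
print(f"routine = {routine:.2f} nmol/L, bias = {abs_bias:.2f} nmol/L ({pct_bias:.2f}%)")
```

Repeating the call with each level and each cohort's regression equation gives the bias columns of the table.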

The 2014 biological variation database sets desirable specifications for imprecision and bias at 7.6% and 10.3%, with an allowable total error of 22.8%.
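These three figures are linked. In the biological variation model, the desirable allowable total error is conventionally built as 1.65 times the desirable imprecision plus the desirable bias. The short check below assumes that convention (the 1.65 multiplier is not stated in this article), and the function name is hypothetical.

```python
def allowable_total_error(desirable_cv_pct, desirable_bias_pct, z=1.65):
    """Combine desirable imprecision and bias into an allowable total error (%).

    z = 1.65 is the conventional one-sided 95% multiplier; it is an assumption
    here, not a value taken from the article.
    """
    return z * desirable_cv_pct + desirable_bias_pct


tea = allowable_total_error(7.6, 10.3)
print(f"TEa = {tea:.1f}%")
```

With the rounded inputs quoted above, the result agrees with the 22.8% goal to one decimal place.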

We will highlight in red any number below that exceeds those recommendations.

| Assay Platform | Level (nmol/L) | CV% | Bias Males | Bias Non-pregnant Females (n=45) | Bias Pregnant Females (n=72) | Bias Metyrapone | Bias Prednisolone |
|---|---|---|---|---|---|---|---|
| Abbott Architect | 96 | 3.9% | 0.74% | 4.3% | <3 | 106.2% | 785% |
| | 434 | 1.6% | 0.96% | 8.75% | 28.6% | 8.2% | 800% |
| | 664 | 2.3% | 0.98% | 9.63% | 34.6% | 15.9% | 811% |
| Beckman Access | 133 | 7.5% | 0.49% | 1.92% | 25.7% | 134.4% | 947% |
| | 402 | 7.5% | 0.07% | 5.69% | 26.9% | 13.0% | 964% |
| | 892 | 7.5% | 0.04% | 6.2% | 30.4% | 4.8% | 966% |
| Roche E170 Generation I | 101 | 4.3% | 0.85% | 2.4% | 12.1% | 162.2% | >1000% |
| | 436 | 3.8% | 20.59% | 16.7% | 9.4% | 55.4% | >1000% |
| | 1095 | 2.5% | 21.92% | 17.99% | 9.3% | 48.2% | >1000% |
| Roche E170 Generation II | 161 | 2.1% | 3.62% | 1.82% | 10.2% | 35.5% | 122% |
| | 532 | 1.6% | 4.58% | 3.68% | 2.4% | 0.97% | 83% |
| | 837 | 1.8% | 5.14% | 3.8% | 1.8% | 1.36% | 80% |
| Siemens Centaur XP | 160 | 6.0% | 31.56% | 11.13% | 30.8% | 167.2% | |
| | 596 | 5.9% | 15.74% | 14.43% | 6.6% | 37.4% | >600% |
| | 990 | 4.9% | 14.67% | 14.65% | 9.1% | 28.7% | |

As you can see, the imprecision is acceptable at all levels for all methods, but the bias is a very different story.

If you just count up how many times one of these assays has bias in excess of the recommendation, the least biased method is the Roche E170 Generation II, followed by the Abbott ARCHITECT and Beckman Access. The Roche first generation assay has more bias than that, but the Siemens Centaur has the most bias, showing excessive bias for nearly every cohort at every level.

Overall, the study shows there is a significant bias problem for most assays. But when a single result is generated, and is impacted by both imprecision and bias, what's the story then?

Calculate Sigma metrics

The Sigma-metric approach takes both imprecision and bias into account in a single equation. We're going to calculate Sigma-metrics using the "Ricos goal" of 22.8%.

Remember, the equation for the Sigma-metric is (TEa - bias%) / CV%, with all three terms expressed in percent.

Example calculation: for the Abbott ARCHITECT, with a 22.8% quality requirement, given 3.9% imprecision and 0.74% bias for males:

(22.8 - 0.74) / 3.9 = 22.06 / 3.9 = 5.7 Sigma

Similar to the judgement we made earlier with the separate components, the Sigma-metric verdict on this assay platform is excellent, at least for males.
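For readers who want to repeat the calculation for other cohorts and levels, here is a minimal Python sketch of the Sigma-metric equation. The function and variable names are our own, and the inputs are the low-level ARCHITECT figures from the imprecision and bias table.

```python
def sigma_metric(tea_pct, bias_pct, cv_pct):
    """Sigma-metric = (TEa - |bias|) / CV, with all three terms in percent."""
    if cv_pct <= 0:
        raise ValueError("CV must be greater than zero")
    return (tea_pct - abs(bias_pct)) / cv_pct


TEA = 22.8  # "Ricos goal" for serum cortisol, in percent
CV_LOW = 3.9  # ARCHITECT imprecision at the low level, in percent

# Observed bias (%) by cohort at the same level
architect_low_bias = {"males": 0.74, "non-pregnant females": 4.3}

sigmas = {cohort: sigma_metric(TEA, bias, CV_LOW)
          for cohort, bias in architect_low_bias.items()}
for cohort, sigma in sigmas.items():
    print(f"{cohort}: {sigma:.1f} Sigma")
```

A bias larger than the allowable total error produces a negative result, which is reported simply as below 3 Sigma.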

So here's the table with all the Sigma-metrics, again according to the Ricos goal:

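| Assay Platform | Level (nmol/L) | CV% | Sigma-metric Males | Sigma-metric Non-pregnant Females | Sigma-metric Pregnant Females | Sigma-metric Metyrapone | Sigma-metric Prednisolone |
|---|---|---|---|---|---|---|---|
| Abbott Architect | 96 | 3.9% | 5.7 | 4.7 | <3 | <3 | <3 |
| | 434 | 1.6% | >6 | >6 | <3 | >6 | <3 |
| | 664 | 2.3% | >6 | 5.7 | <3 | 3.0 | <3 |
| Beckman Access | 133 | 7.5% | 3.0 | 2.8 | <3 | <3 | <3 |
| | 402 | 7.5% | 3.0 | 2.3 | <3 | 1.3 | <3 |
| | 892 | 7.5% | 3.0 | 2.2 | <3 | 2.4 | <3 |
| Roche E170 Generation I | 101 | 4.3% | 5.1 | 4.7 | 2.5 | <3 | <3 |
| | 436 | 3.8% | 0.6 | 1.6 | 3.5 | <3 | <3 |
| | 1095 | 2.5% | 0.4 | 1.9 | 5.4 | <3 | <3 |
| Roche E170 Generation II | 161 | 2.1% | >6 | >6 | 6.0 | <3 | <3 |
| | 532 | 1.6% | >6 | >6 | >6 | >6 | <3 |
| | 837 | 1.8% | >6 | >6 | >6 | >6 | <3 |
| Siemens Centaur XP | 160 | 6.0% | <3 | 1.9 | <3 | <3 | |
| | 596 | 5.9% | <3 | 1.4 | 2.8 | <3 | <3 |
| | 990 | 4.9% | <3 | 1.7 | 2.8 | <3 | |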

Overall, there are a lot of bad metrics here. For certain cohorts, the bias problem is industry-wide: no platform reaches 3 Sigma for patients on prednisolone.


Summary of Performance by Sigma-metrics Normalized Method Decision Chart

We can make visual assessments of this performance using a Normalized Sigma-metric Method Decision Chart:

[Figure: Normalized Method Decision Chart, cortisol, male cohort]

For the male cohort, there are two very good methods available, the Abbott and Roche II methods.
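A normalized chart places each method by expressing its imprecision and bias as percentages of the allowable total error, so that every point is judged against the same set of Sigma lines. The sketch below assumes that usual construction: x = 100 × CV / TEa, y = 100 × bias / TEa, with each Sigma line running from 100 on the y-axis down to 100 / Sigma on the x-axis. The function names are hypothetical and this is not the software used to draw the charts in this article.

```python
def normalized_point(tea_pct, bias_pct, cv_pct):
    """Return (x, y) chart coordinates as percentages of the allowable total error."""
    x = 100.0 * cv_pct / tea_pct          # normalized imprecision
    y = 100.0 * abs(bias_pct) / tea_pct   # normalized inaccuracy
    return x, y


def sigma_from_point(x, y):
    """Recover the Sigma-metric from normalized coordinates: (100 - y) / x."""
    return (100.0 - y) / x


# ARCHITECT low level, male cohort: TEa 22.8%, bias 0.74%, CV 3.9%
x, y = normalized_point(22.8, 0.74, 3.9)
print(f"x = {x:.1f}, y = {y:.1f}, Sigma = {sigma_from_point(x, y):.1f}")
```

Normalizing does not change the Sigma-metric; it only rescales the axes so that points computed against any quality requirement can share one chart.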


[Figure: Normalized Method Decision Chart, cortisol, non-pregnant female cohort]

For the cohort of non-pregnant females, the second generation Roche assay has the best performance.


[Figure: Normalized Method Decision Chart, cortisol, pregnant female cohort]

For the pregnant female cohort, we see that a majority of methods are biased when compared with the certified reference method procedure.


[Figure: Normalized Method Decision Chart, cortisol, metyrapone cohort]

For patients under metyrapone therapy, clinicians should be aware that most methods cannot meet a 22.8% allowable total error even at a three Sigma level. Switching to the use of a reference change value for serial test results may be the better approach.
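A reference change value (RCV) estimates how large the difference between two serial results must be before it exceeds what analytical and within-subject biological variation alone would produce. The sketch below uses the common formula RCV = √2 × Z × √(CVa² + CVi²), with Z = 1.96 for a two-sided 95% probability. The within-subject CV of 15.2% is an assumption, back-calculated from the 7.6% desirable imprecision quoted earlier (desirable imprecision is conventionally half of the within-subject CV). The analytical CVs are the lowest and highest values in the imprecision table, and the function name is our own.

```python
import math


def reference_change_value(cv_analytical_pct, cv_within_subject_pct, z=1.96):
    """RCV (%) = sqrt(2) * z * sqrt(CVa^2 + CVi^2)."""
    combined = math.sqrt(cv_analytical_pct ** 2 + cv_within_subject_pct ** 2)
    return math.sqrt(2.0) * z * combined


CVI_ASSUMED = 15.2  # assumed within-subject CV (%), inferred as 2 x 7.6%

rcv_best = reference_change_value(1.6, CVI_ASSUMED)   # lowest CV in the table
rcv_worst = reference_change_value(7.5, CVI_ASSUMED)  # highest CV in the table
print(f"RCV at CV 1.6%: {rcv_best:.1f}%")
print(f"RCV at CV 7.5%: {rcv_worst:.1f}%")
```

This approach only helps when both results come from the same method, since a bias that stays constant between the two measurements cancels out of their difference.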


[Figure: Normalized Method Decision Chart, cortisol, prednisolone cohort]


For patients under prednisolone, there's a clear issue, isn't there? No method can give reliable results. Clinicians monitoring patients under this therapy should be advised that the results are going to be significantly impacted by method factors.

Overall, the Roche second generation method fares best. No method here is perfect, but the Roche second generation method does better than other methods in pregnant and nonpregnant females.


The authors conclude: "[R]outine serum cortisol immunoassay performance remains highly variable. Although there is increasing awareness of the impact of assay-specific biases on result interpretation, it is unrealistic to expect that all users are familiar with the limitations of their assay. The case for sex- and assay-specific cutoffs has been made...Although these may be helpful in the short-term, reagent lot changes, recalibration and assay reformulation may increase result uncertainty and thus not represent a long-term solution....The recent drive towards standardization is welcome...and we reason that given the limitations evidenced here, serum cortisol assays should be considered for inclusion in this initiative."

Based on Sigma-metric analysis, we would agree. Bias is the major problem, and ultimately this can only be addressed by the manufacturers of these assays. In the short term, laboratories need to take care to inform clinicians of what constitutes a significant change in test results.