When manufacturers make mistakes

Are Sigma-metrics immune from manipulation? It should hardly be surprising that manufacturers will find ways to "boost" their Sigma-metric performance. A recent white paper provides an example of how manufacturers might mislead you - or make a mistake in their Sigma-metrics - and therefore alerts us to the safeguards we need to insist upon when we analyze performance data.

When Manufacturers Mislead (or Make Mistakes) with Sigma-metrics

July 2016
Sten Westgard, MS

[Important Update, October 2016: It was brought to our attention that the manufacturer in question here, still unnamed, was already in the process of recalling the white paper that is discussed below. So even before we noticed their mistake, they were in the process of correcting their paper. In light of this new information, we have re-edited the article.]

[Note: This lesson is an extension of From Method Validation to Six Sigma: Translating Method Performance Claims into Sigma Metrics. This article assumes that you have read that lesson first, and that you are also familiar with the concepts of QC Design, Method Validation, and Six Sigma. If you aren't, follow the link provided.]

Ronald H. Coase is known for saying, "If you torture the data long enough, it will confess," in his book Essays on Economics and Economists. It calls to mind our worst opinion of statistics, which is that they can be manipulated to give any outcome you desire.

When it comes to Six Sigma, the Sigma-metric is not immune from such torture and manipulation. A few years ago, we didn't have to worry about that happening because very few laboratories or manufacturers knew about Sigma-metrics, and most preferred to ignore them.

However, as Sigma-metrics have grown in popularity, have been published in more and more studies, and have been adopted aggressively by some companies, the lagging manufacturers have begun to pay attention. And for some manufacturers, it's become very tempting either to assert that all their assays are Six Sigma quality or, worse still, to goose the numbers in a study to make it appear as if their assays perform that way.

So now laboratories have to beware. Just as they can't trust everything a manufacturer tells them about downtime, maintenance, mean time between failures, etc., they need to be wary whenever the manufacturer provides a white paper with pristine Sigma-metrics. If the paper doesn't source the data, doesn't show where the goals came from, etc., there's a very real possibility that the numbers are being manipulated.

[Updated note: or, as may be the case, the manufacturer may have made a mistake in calculating and presenting their Sigma-metrics. In either case, the laboratory must be wary of simply accepting the results blindly.]

Muddying the Choice of Goals

Let's take a recent white paper that I came across this year. It contains a pretty solid treatment of Sigma-metrics, correctly noting how to calculate them and how to use them to redesign QC, and even discussing the "Westgard Sigma Rules." But then it shows a table with a single example of allowable total errors from different sources and draws a very strange conclusion: "The CLIA and Spanish Minimum Consensus goals are essentially the same and the Ricos and RCPA are also close. Rilibak is in the middle."

Whoa, wait. What did you just say? The German Rilibak goals are the "middle of the road" quality requirements? It's certainly great that the white paper compares quality goals from 5 different sources, but let's make sure this is done accurately.

Let's put that assertion to the test:

Analyte	CLIA Goal	Ricos Desirable Goal	RCPA Goal	Rilibak Goal	Spanish Minimum Goal	Rilibak Place	Rilibak Bigger than CLIA?
ALP	30%	11.7%	15 U/L < 125 U/L; 12% > 125 U/L	21%	31%	middle	no
Albumin	10%	4.07%	2.0 g/L < 33.0 g/L; 6% > 33.0 g/L	20%	14%	largest	yes
ALT	20%	27.48%	5 U/L < 40 U/L; 12% > 40 U/L	21%	23%	middle	yes
AST	20%	16.69%	5 U/L < 40 U/L; 12% > 40 U/L	21%	21%	largest	yes
Amylase	30%	--	--	--	--	--	--
T. Bili	0.4 mg/dL or 20% (greater)	26.94%	3 umol/L < 25 umol/L; 12% > 25 umol/L	22%	24%	middle	yes*
D. Bili	--	44.5%	3 umol/L < 15 umol/L; 20% > 15 umol/L	--	--	--	--
Calcium	1.0 mg/dL	2.55%	0.10 mmol/L < 2.5 mmol/L; 4% > 2.5 mmol/L	10%	11%	middle	yes*
Chloride	5.0%	1.5%	3.0 mmol/L < 100 mmol/L; 3% > 100 mmol/L	8%	9%	middle	yes
Cholesterol	10%	9.0%	0.3 mmol/L < 5 mmol/L; 6% > 5 mmol/L	13%	11%	largest	yes
Creatine Kinase	30%	30.3%	15 U/L < 125 U/L; 12% > 125 U/L	20%	24%	middle	no
Creatinine	0.3 mg/dL or 15% (greater)	8.87%	8 umol/L < 100 umol/L; 8% > 100 umol/L	--	20%	--	yes*
CO2	5 mm Hg or 8% (greater)	--	--	--	--	--	--
GGT	--	22.11%	5 U/L < 40 U/L; 12% > 40 U/L	21%	22%	middle	same
Glucose	6 mg/dL or 10% (greater)	6.96%	0.4 mmol/L < 5.0 mmol/L; 8% > 5.0 mmol/L	15%	11%	largest	yes*
Iron	20%	30.7%	3 umol/L < 25 umol/L; 12% > 25 umol/L	--	24%	--	--
LDH	20%	11.4%	20 U/L < 250 U/L; 8% > 250 U/L	18%	26%	middle	no
Magnesium	25%	--	0.1 mmol/L < 1.25 mmol/L; 8% > 1.25 mmol/L	15%	--	middle	no
Phosphate	--	10.11%	0.06 mmol/L < 0.75 mmol/L; 8% > 0.75 mmol/L	16%	17%	middle	--
Potassium	0.5 mmol/L	5.61%	0.2 mmol/L < 4.0 mmol/L; 5% > 4.0 mmol/L	8%	8%	largest	no*
Sodium	4.0 mmol/L	0.73%	3 mmol/L < 150 mmol/L; 2% > 150 mmol/L	5%	5%	largest	yes
Total Protein	10%	3.63%	3.0 g/L < 60 g/L; 5% > 60 g/L	10%	12%	middle	same
Urea Nitrogen	2 mg/dL or 9% (greater)	15.55%	0.5 mmol/L < 4.0 mmol/L; 12% > 4.0 mmol/L	--	19%	--	--
Triglycerides	25%	25.99%	0.2 mmol/L < 1.6 mmol/L; 12% > 1.6 mmol/L	16%	18%	middle	no
Uric Acid	17%	11.97%	0.03 mmol/L < 0.38 mmol/L; 8% > 0.38 mmol/L	--	17%	--	--

On the whole, the Rilibak goals are in the middle about 2/3 of the time, but the other 1/3 of the time they are the largest goals. Whether that makes it a stretch to say they're in the middle of the pack is up to you; perhaps that sentence was only meant to refer to the one example test, not to make a generalization. More importantly, when you compare the Rilibak goals to the CLIA goals, the Rilibak goals are bigger a majority of the time (58.8%, to be specific). Only about a third of the time are the CLIA goals bigger than the Rilibak goals (35.5%), and for 2 tests (11.8%) the goals are exactly the same. So from the US perspective, Rilibak isn't in the middle; it's on the larger side. [Note: an * means that the CLIA goal is unit-based, so at the low end of the range the CLIA goal might be bigger than the Rilibak goal. In some cases, it's possible to look at the reference range, determine the expected size of the CLIA unit goals, and still conclude that they will always be smaller than the Rilibak goals.]
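The asterisked rows are the ones where the CLIA goal is stated in units, or as "units or percent, whichever is greater," so whether it beats a flat Rilibak percentage depends on the concentration being measured. Here is a minimal sketch, assuming the "greater of" rule is evaluated at a single concentration. The function name `effective_tea_percent` and the concentrations tried are illustrative choices, not taken from the white paper; the glucose goals come from the table above.

```python
def effective_tea_percent(unit_limit, percent_limit, concentration):
    """Express a CLIA-style goal of '<unit_limit> or <percent_limit>% (greater)'
    as a percentage at one concentration (same units as unit_limit).
    Hypothetical helper for illustration only."""
    if concentration <= 0:
        raise ValueError("concentration must be positive")
    unit_as_percent = 100.0 * unit_limit / concentration
    return max(percent_limit, unit_as_percent)

# Glucose: CLIA goal 6 mg/dL or 10% (greater) vs. a flat Rilibak goal of 15%.
RILIBAK_GLUCOSE = 15.0
for conc in (30.0, 40.0, 60.0, 100.0, 200.0):  # illustrative concentrations, mg/dL
    clia = effective_tea_percent(6.0, 10.0, conc)
    print(f"{conc:5.0f} mg/dL: CLIA {clia:4.1f}% vs Rilibak {RILIBAK_GLUCOSE:.0f}%"
          f" -> Rilibak larger: {RILIBAK_GLUCOSE > clia}")
```

Below 40 mg/dL, the 6 mg/dL floor makes the CLIA goal the bigger one; above 40 mg/dL, Rilibak's 15% is larger. That concentration dependence is what each * in the table is flagging.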

This may not matter to everyone, but in a white paper, where you aren't pressed for space by an editor or by a journal's word-count requirements, it wouldn't be difficult to clarify the statement. Particularly for a white paper being circulated in the US, it's probably not useful to talk about Rilibak goals, since chemistry assays in particular are directly regulated by CLIA.

All right, let's move on. It may seem like I'm belaboring this point, but the problem becomes more evident in the next table.

Stoking the Sigma-metrics

The white paper provides a table of Sigma-metrics, then states unequivocally, "CLIA targets were used for the TEa."

But when the table is closely examined, you can see that's simply not true.

Analyte	Listed Goal	Source	Larger or smaller than CLIA?
ALP	21%	Rilibak	Smaller
Albumin	20%	Rilibak	Larger
ALT	21%	Rilibak	Larger
Amylase	30%	CLIA	(Rilibak's goal is smaller)
AST	21%	Rilibak	Larger
T. Bili	22%	Rilibak	Larger*
D. Bili	20.0%	?	(Neither Rilibak nor CLIA has a goal specified for this)
Calcium	10%	Rilibak	Smaller*
Chloride	5.0%	Rilibak	Larger
Cholesterol	10%	either	both sources have same goal
Creatine Kinase	20%	Rilibak	Smaller
Creatinine	20%	Rilibak	Larger*
CO2	10%	?	might be greater than CLIA goal, uncertain
GGT	21%	Rilibak	(CLIA has no goal for this)
Glucose	15%	Rilibak	Larger
Iron	20%	CLIA	(Rilibak has no goal for this)
LDH	20%	CLIA	(Rilibak's goal is smaller)
Magnesium	15%	Rilibak	Smaller*
Phosphate	16%	Rilibak	(CLIA has no goals for this)
Potassium	8%	Rilibak	Smaller*
Sodium	5%	Rilibak	Larger
Total Protein	10%	either	both sources have same goal
Urea Nitrogen	20%	Rilibak	Larger
Triglycerides	25%	CLIA	(Rilibak's goal is smaller)
Uric Acid	13%	Rilibak	Smaller

The table shows that most of the goals selected, for 17 out of 25 analytes, were in fact the German Rilibak goals, NOT the CLIA goals. For 9 of those analytes, the Rilibak goal is larger than the CLIA goal. For another two analytes (cholesterol and total protein), the CLIA and Rilibak goals are exactly the same, so there is no difference between the sources.

Now for 6 of those analytes, the Rilibak goals are actually smaller than the CLIA goals, so whatever metrics are calculated with those goals, the Sigma-metrics based on CLIA goals will be higher. For 4 analytes, the CLIA goals were used: in 3 of those cases (amylase, LDH and triglycerides) the CLIA goals were larger than the Rilibak goals, while for the other one (iron), no Rilibak goal has been specified.

In other words, the white paper statement is demonstrably false. In a majority of the analytes, the Rilibak goal was chosen, and for the majority of those Rilibak goals, the goals specified were larger than CLIA goals. Thus, the white paper was displaying a set of Sigma-metrics that were NOT representative of the performance expected in US labs under CLIA, and the Sigma-metrics presented are higher than what US labs should expect to experience.
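Why a larger goal produces a better-looking number follows from the standard Sigma-metric equation, with all terms expressed as percentages at the medical decision level. Because the method's CV and bias don't change when a different goal is chosen, any increase in TEa passes straight through to the metric, scaled by 1/CV:

$$\text{Sigma} = \frac{\text{TEa}\,(\%) - |\text{Bias}\,(\%)|}{\text{CV}\,(\%)}
\qquad\Longrightarrow\qquad
\Delta\text{Sigma} = \frac{\Delta\text{TEa}\,(\%)}{\text{CV}\,(\%)}$$

For albumin, the listed Rilibak goal (20%) is 10 percentage points above the CLIA goal (10%), so for a method with a hypothetical CV of 2% the reported Sigma-metric would be 5 sigma higher than under CLIA, whatever the bias.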

At this point, of course, it would be easy to substitute the CLIA goals and calculate out the Sigma-metrics according to US requirements. We still have imprecision and bias data listed in the paper (not shown here).
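Substituting the CLIA goals is mechanical once CV and bias are in hand. The sketch below uses made-up CV and bias values, since the paper's figures are not reproduced here, and converts the unit-based CLIA goals for sodium (4.0 mmol/L) and potassium (0.5 mmol/L) to percentages at assumed concentrations of 140 and 4.0 mmol/L. The listed TEa values are the ones in the table above; the `Analyte` class and its fields are hypothetical names for illustration.

```python
from dataclasses import dataclass

@dataclass
class Analyte:
    name: str
    listed_tea: float  # TEa% used in the white paper's table
    clia_tea: float    # CLIA TEa% (unit-based goals converted at an assumed level)
    cv: float          # HYPOTHETICAL CV%, not from the white paper
    bias: float        # HYPOTHETICAL bias%, not from the white paper

def sigma_metric(tea, bias, cv):
    """Sigma = (TEa% - |bias%|) / CV%."""
    if cv <= 0:
        raise ValueError("CV must be positive")
    return (tea - abs(bias)) / cv

analytes = [
    Analyte("Albumin", 20.0, 10.0, 2.0, 1.0),
    Analyte("ALT", 21.0, 20.0, 4.0, 2.0),
    Analyte("Sodium", 5.0, 100.0 * 4.0 / 140.0, 1.0, 0.5),  # 4.0 mmol/L at assumed 140 mmol/L
    Analyte("Potassium", 8.0, 100.0 * 0.5 / 4.0, 1.5, 1.0),  # 0.5 mmol/L at assumed 4.0 mmol/L
]

for a in analytes:
    listed = sigma_metric(a.listed_tea, a.bias, a.cv)
    clia = sigma_metric(a.clia_tea, a.bias, a.cv)
    print(f"{a.name:10s} listed TEa {a.listed_tea:5.2f}% -> {listed:4.2f} sigma; "
          f"CLIA TEa {a.clia_tea:5.2f}% -> {clia:4.2f} sigma")
```

With these made-up inputs, albumin drops from 9.5 to 4.5 sigma and sodium from 4.5 to about 2.4 once the CLIA goal is used, while potassium rises, matching the "Larger" and "Smaller" labels in the table.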

But if the white paper has been wrong or misleading about the choice of goals, how can we assume that's the only mistake present in the paper? Can we trust the other data (imprecision and bias) that have been supplied? Unfortunately, the CV and bias data are not sourced (not from a published study), nor are they attributed to any specific institution. There's no way to check the veracity of those numbers. If the author of this white paper chose "convenient" goals, they might also have chosen "convenient" imprecision and bias.

In a paper like this, once the trust is broken, it's hard to know which parts of the document are acceptable. [Updated Note: as it happens, the manufacturer has admitted to making a genuine mistake, recalled the white paper, and made corrections.]

[Note: if the white paper were written in German and distributed to a German audience, none of this would be a problem. But it is in English, presented to US audiences, and even broadcast on webinars to potential US customers.]

We used to advise laboratories to ask for Sigma-metrics from all their vendors. Now we are seeing more frequent cases of mistaken, manipulated or artificially optimized Sigma-metrics being submitted during RFPs and bids. So labs seeking out accurate information on vendor performance need to demand more:

1. Get data from a third party source, such as a Bio-Rad Unity report from a customer site. Ideally, while the vendor takes you to customer sites, ask those customers directly for something like a Unity peer report. That way you're getting the data directly from a customer, and it doesn't pass through the hands of the vendor.

2. If data is provided by the vendor, require that the data be attributed to a specific laboratory that can be called for reference.

3. Make sure the "raw" data is available: imprecision from 3-6 months of data, and bias calculated against a peer group mean from a peer group program, the peer group mean from a PT/EQA survey, or the difference between an assayed control's assigned/expected values and the laboratory's observed values. Also make sure that the TEa goals are specified.

4. Preferably, take the raw data and calculate the Sigma-metrics directly for at least one or two analytes. You need to double-check that the vendor is performing the calculations correctly. [Updated note: again, this is a way to make sure the manufacturer hasn't made a mistake in their calculations.]
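Steps 3 and 4 can be sketched in a few lines: estimate CV from the lab's own QC results, estimate bias against a comparison mean, recompute Sigma with the stated TEa, and compare the result with the vendor's figure. The QC values, the peer mean of 98 mg/dL, the claimed Sigma of 6.0, and the 0.5-sigma tolerance below are all hypothetical; real imprecision should come from 3-6 months of data, not eight points.

```python
import statistics

def cv_percent(values):
    """Imprecision as CV% (sample standard deviation / mean x 100)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

def bias_percent(lab_mean, comparison_mean):
    """Bias% of the lab mean against a comparison value: a peer-group mean,
    a PT/EQA peer mean, or an assayed control's assigned value."""
    return 100.0 * (lab_mean - comparison_mean) / comparison_mean

def sigma_metric(tea, bias, cv):
    """Sigma = (TEa% - |bias%|) / CV%."""
    return (tea - abs(bias)) / cv

def check_vendor_claim(qc_values, comparison_mean, tea, claimed_sigma, tolerance=0.5):
    """Recompute Sigma from raw QC data and report whether the vendor's
    claimed value agrees within a (hypothetical) tolerance."""
    cv = cv_percent(qc_values)
    bias = bias_percent(statistics.mean(qc_values), comparison_mean)
    sigma = sigma_metric(tea, bias, cv)
    return sigma, abs(sigma - claimed_sigma) <= tolerance

# Hypothetical glucose control results (mg/dL).
qc = [98.0, 101.0, 100.0, 102.0, 99.0, 97.0, 103.0, 100.0]
# Hypothetical peer mean of 98 mg/dL; CLIA TEa for glucose near 100 mg/dL is 10%.
sigma, consistent = check_vendor_claim(qc, comparison_mean=98.0, tea=10.0, claimed_sigma=6.0)
print(f"recomputed sigma = {sigma:.2f}, matches vendor claim: {consistent}")
```

Here the recomputed value, about 4.0 sigma (CV 2.0%, bias about 2.0%), falls well short of the hypothetical claim of 6, which is exactly the kind of gap step 4 is meant to catch.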

Conclusion

There was a recent tragic incident with a Tesla car driving on "autopilot." The driver trusted the car-maker so much that he didn't realize that the car was unable to detect a trailer truck crossing directly in front of the car. The self-driving mechanism slammed the car into the truck, killing the driver.

Automotive engineers have a term for this type of behavior. It's called "overtrust" - where consumers and customers assume that the car is capable of more safety than it is really able to deliver. The "Autopilot" feature on a Tesla car is really just a beta-version of enhanced cruise control, not fully perfected, and even in the documentation it tells drivers that they have to remain vigilant.

In the laboratory, we're in an era of increasing automation and informatics, where instruments are becoming more and more like "black boxes" that techs cannot adjust or modify. We're starting to behave like those instruments are running on "autopilot". More and more, this means the lab has to trust the vendor to deliver an instrument that is providing the right level of quality. It's okay to trust the vendor when that trust is warranted, but we have the same danger of overtrust.

As they used to say in the Cold War, "Trust, but verify." Proceeding with blind faith in the data that vendors provide may lead to a disastrous outcome: a lab operating a poor instrument with a false sense of security, churning out bad results without any suspicion that they are wrong.

[Updated note: Everyone makes mistakes. Manufacturers are not immune. When they make a mistake, we should laud them when they recognize it, recall it, and fix it. But that just underscores the need for us to remain vigilant about the data we get from all manufacturers.]
