*Posted by Sten Westgard, MS*

What's in a Sigma-metric of 3, 6, or even 11?

Sigma-metrics provide a useful way of classifying method performance and relating that performance to the QC that is necessary to “verify the attainment of the intended quality of test results,” which is a requirement of ISO 15189. But Sigma-metrics are not foolproof. Does that bother you?

Maybe it's better if we frame this by referencing the 1984 cult film, *This is Spinal Tap*.

There's a classic scene in *This is Spinal Tap* that has been on my mind recently. For those not familiar, this is a mock documentary about a fake rock band. In the following scene, the interviewer (Marty) raises questions about the band's amplifier:

Nigel Tufnel [member of the band]: The numbers all go to eleven. Look, right across the board, eleven, eleven, eleven and...

Marty DiBergi: Oh, I see. And most amps go up to ten?

Nigel Tufnel: Exactly.

Marty DiBergi: Does that mean it's louder? Is it any louder?

Nigel Tufnel: Well, it's one louder, isn't it? It's not ten. You see, most blokes, you know, will be playing at ten. You're on ten here, all the way up, all the way up, all the way up, you're on ten on your guitar. Where can you go from there? Where?

Marty DiBergi: I don't know.

Nigel Tufnel: Nowhere. Exactly. What we do is, if we need that extra push over the cliff, you know what we do?

Marty DiBergi: Put it up to eleven.

Nigel Tufnel: Eleven. Exactly. One louder.

Marty DiBergi: Why don't you just make ten louder and make ten be the top number and make that a little louder?

Nigel Tufnel: [pause] These go to eleven.

Whether you number your amplifier from 1 to 10, or 2 to 11, the output is really still the same, no matter what you call it. For Spinal Tap, it didn't matter how loud they played, they were still an awful band. High volume was not correlated with high quality. (Creeping back into the parlance of our laboratory world: having a constant bias of 1 does not improve the band's result.)

To return to our laboratory, what’s the real meaning of a Sigma-metric of 3, or 6, or even 11? The typical Sigma benchmarks range from 3 to 6: a Sigma-metric of 6 indicates world-class quality, whereas a Sigma of 3 or less indicates that a process is not suitable for routine production because of its low quality. A Sigma-metric of 11 matters only in that it shows the method achieves world-class quality. Given an analytic system with one method at 3 Sigma and another at 11 Sigma, one method is not suitable for routine use and the other is. It doesn’t follow that, on average, both methods achieve world-class quality, even though the arithmetic mean of 3 and 11 is 7.

If we accept that Sigma-metrics are useful in the laboratory, we have a new challenge. As with *Spinal Tap*'s amplifier, there is a concern about how the Sigma-metric numbers are generated. The final Sigma-metric is a number, but how we get that number is just as important. We need to ensure that the Sigma-metric is a product of the best measurements, not simply a creature of manufactured biases and optimistic assumptions. And even then, we also need to recognize that the Sigma-metric is not the only number. It's a statistic that measures one component of quality in the Total Testing Process.

The main components of the Sigma-metric are the quality requirement, the medical decision level that is the source of the quality requirement, the estimate of imprecision (CV), and the estimate of inaccuracy (bias). Failure to get the best number for each one of these components can render the resulting calculation useless.
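To make those components concrete, here is a minimal sketch of the calculation. It assumes the commonly used form of the equation, Sigma = (TEa − |bias|) / CV, with the quality requirement expressed as an allowable total error (TEa) and all three terms given in percent at the same medical decision level. The function name, parameter names, and the numbers are illustrative assumptions, not values taken from any real method.

```python
def sigma_metric(tea_pct: float, bias_pct: float, cv_pct: float) -> float:
    """Sigma-metric, assuming Sigma = (TEa - |bias|) / CV.

    tea_pct  : quality requirement (allowable total error), in percent
    bias_pct : estimate of inaccuracy, in percent (sign is ignored)
    cv_pct   : estimate of imprecision, in percent
    All three must refer to the same medical decision level.
    """
    if cv_pct <= 0:
        raise ValueError("CV must be a positive percentage")
    return (tea_pct - abs(bias_pct)) / cv_pct


# Hypothetical method: TEa 10%, bias 1%, CV 1.5%
example_sigma = sigma_metric(tea_pct=10.0, bias_pct=1.0, cv_pct=1.5)
print(example_sigma)  # 6.0
```

Every input sits directly in the numerator or denominator, which is why a poor estimate of any one of them carries straight through to the final number.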

It's certainly possible to "gin up" a high Sigma-metric. Use only a within-run precision study to determine your CV for a lower estimate of imprecision. Zero your bias or only compare yourself to the most similar method. Choose a single medical decision level, preferably one at the point where performance is best (whether or not any interpretation takes place in that region). Finally, shop around for quality requirements among the different organizations (Rilibak, RSCP, CAP, CLIA) until you find the biggest one.
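A rough illustration of how much those choices can move the result, again assuming the common form Sigma = (TEa − |bias|) / CV. All of the numbers below are hypothetical and describe an imaginary assay. The effect of picking a flattering medical decision level is not modeled here, since it would change all three inputs at once.

```python
def sigma_metric(tea_pct, bias_pct, cv_pct):
    """Assumed form: Sigma = (TEa - |bias|) / CV, all in percent."""
    return (tea_pct - abs(bias_pct)) / cv_pct


# (label, TEa %, bias %, CV %): hypothetical values for one imaginary assay.
# Each row applies one more of the shortcuts described above.
scenarios = [
    ("honest estimate",     10.0, 2.0, 2.5),  # long-term CV, real bias
    ("within-run CV only",  10.0, 2.0, 1.5),  # smaller CV from a short study
    ("... and bias zeroed", 10.0, 0.0, 1.5),  # bias ignored
    ("... and largest TEa", 15.0, 0.0, 1.5),  # most generous requirement
]

results = {label: sigma_metric(tea, bias, cv) for label, tea, bias, cv in scenarios}

for label, sigma in results.items():
    print(f"{label:<22} Sigma = {sigma:.1f}")
```

In this made-up case the same assay moves from roughly 3 Sigma to 10 Sigma without any change in how it actually performs.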

But does the fact that it is possible to make a bad calculation mean that the equation itself should be jettisoned? It is possible to abuse many statistics - look at the correlation coefficient. While this statistic is often misused as an indicator of method acceptability, that does not eliminate its usefulness (it does indicate whether or not method comparison data cover a wide enough range so that regression statistics are reliable for estimating bias). The fact that people abuse statistics doesn't mean we should abolish them. It simply means we must be aware of the problem and be careful about how we use the calculations.

Here's the same question in a more current frame: because of Toyota's 2009-2010 quality problems, should we abandon Lean and the Toyota Production System? Have all those concepts and calculations been rendered invalid? Certainly the Lean consulting industry has been quick to separate the behavior of Toyota from the practices of the "Toyota" Production System. The truth is, every hot new management system, and also every new calculation or graph, is open to manipulation and failure. The saying "Garbage In, Garbage Out", which once referred to the nascent computer programming industry, still applies. Just because one person is able to misuse a practice doesn't mean that we should condemn the practice. There is a difference between the individual use of a thing and the thing itself. [note: there is a more valid argument against a "new approach" if, in practice, almost everyone misuses it in a particularly catastrophic way. See derivatives, subprime mortgages, etc. ]

There seems to be an extremely demanding purity test for new approaches in the laboratory field. If a new statistic isn't both perfect and incorruptible, it is not worthy to replace the status quo. It would be more realistic to assess the calculation for its practicality: can the new statistic tell us something that other statistics can't? And/or can the new statistic make it easier for us to learn something about the method?

The main strength of the Sigma-metric, in my view, is that it is a tool that differentiates between good and bad performance in a way that anyone can readily grasp. While people may be able to look at regression statistics and estimates of bias and imprecision and reach the same conclusion, it's faster to look at a simple number between 1 and 6.

But while the Sigma-metric equation has a practical use, that doesn't mean it is infallible. Care needs to be taken in the components and estimates that lead up to the calculation. I intend to explore some "best practices" for the calculation of Sigma-metrics in future articles and posts. Stay tuned.