Posted by Sten Westgard, MS
In the iconic western film, The Magnificent Seven, there is a famous scene about marksmanship. [Quick setup: The Magnificent Seven are - you guessed it, seven - gunmen hired to protect a peasant village from a much larger group of bandits.] Early in the film, the heroic gunmen detect three bandit scouts and want to capture them. In an abrupt exchange of gunfire, two bandits are killed, but the third bandit mounts his horse and attempts to escape. As the bandit flees, one of the gunmen, Britt, steadies his pistol and takes aim. The escaping bandit gallops farther and farther away. But just as he is about to disappear behind a hill, Britt shoots, hitting the bandit square in the back, killing him. The youngest of the gunmen, Chico, shouts:
Chico: (in awe) That was the greatest shot I've ever seen!
Britt: (sternly) The worst! I was aiming at the horse.
This scene reminds us that what one person considers great performance may not be acceptable by another person's standards. Indeed, what appears to be an accomplishment may actually be an error.
And what, you may ask, does this have to do with Six Sigma?
In the laboratory, this is particularly true, because outside observers find it easy to be in awe of the test results we produce. On the face of it, laboratory test results look very impressive. They appear so definitive: numerical, with decimal points and significant figures. Usually the results arrive without any mention of uncertainty or a confidence interval. Clinicians often accept the latest test results as if there were no errors at all in the measurements.
Of course, we know better. We know there is some amount of error in every measurement. Perfection simply isn't possible. But how much error is present - and whether or not that error makes any difference in the clinical scenario - is harder to know.
How much error is allowable for a given method? Getting an answer to this question is harder than it seems. The difficulty is compounded by conflicting models and differing terminology. At various times, quality requirements have been expressed as medically allowable (maximum) inaccuracy, medically allowable (maximum) imprecision, total error, and so on.
A recent review of this issue by Dr George G Klee of the Mayo Clinic concluded:
"There is no consensus currently about the preferred methods for establishing medically necessary analytic performance limits. The various methods give considerably different performance limits."
Klee GG. Establishment of Outcome-Related Analytic Performance Goals. Clin Chem 2010;56(5):714-722.
The Sigma-metric approach is predicated on a quality requirement that is expressed as a total allowable error, but does not specify how this quality requirement should be determined or obtained.
Thus, a quality requirement could be obtained from the US CLIA proficiency testing criteria, the RCPA guidelines, the Rilibak rules, the Ricos et al database of desirable specifications for total error based on within-subject biologic variation, an ISO standard, a peer group specification, or a locally-determined specification. The laboratory chooses which quality requirement to use, but the choice of the appropriate quality requirement is critical to the subsequent utility of the Sigma-metric calculation.
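To make this concrete, here is a minimal sketch (in Python, with hypothetical performance figures) of the Sigma-metric calculation itself, using the standard formula Sigma = (TEa - |bias|) / CV, with all terms expressed in percent at a given decision level:

    def sigma_metric(tea_pct, bias_pct, cv_pct):
        """Sigma metric: the number of analytical SDs that fit between
        the observed bias and the allowable total error (TEa)."""
        return (tea_pct - abs(bias_pct)) / cv_pct

    # Hypothetical method performance at a single decision level:
    # allowable total error 10%, observed bias 2%, observed CV 1.5%
    print(sigma_metric(tea_pct=10.0, bias_pct=2.0, cv_pct=1.5))  # ~5.3 Sigma

The formula makes the dependence plain: the quality requirement (TEa) sits in the numerator, so a different choice of TEa changes the Sigma metric even when the method's bias and imprecision are unchanged.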
In this regard, the Sigma-metric approach is similar to the ISO standards that are "horizontal," not "vertical." The Sigma-metric approach gives you general guidance (establish a quality requirement) without dictating an exclusive way to do so (i.e., it does not insist that all quality requirements be determined by CLIA, or that all tests meet a goal of 15%, etc.). The idea behind a horizontal standard is that it allows flexibility as well as adaptability. If local use of a laboratory test is particularly strict, the Sigma-metric approach allows us to define a tighter requirement than might otherwise be applied. This general nature also allows the approach to evolve: as testing usage changes in the future, different quality requirements can be applied.
For laboratories that dislike the ambiguity of "free will" in the choice of quality requirements, the Stockholm Consensus Hierarchy provides guidance: a framework that ranks which sources of information are preferred for quality requirements. Some quality requirements are better than others. In general, local, evidence-based quality requirements are preferred over survey-derived or expert-group-driven goals. If we can actually get our clinicians to explain how they use a test result, that is the best quality requirement; understanding the use of the test allows us to work backward to assure that a "normal" result does not become an "abnormal" result because of the bias and imprecision of the analytical method.
One consequence of the freedom to choose quality requirements is that Sigma metrics from different studies may not be comparable. While one study may report Six Sigma performance, another study of the same data may not - if a different quality requirement is applied. This means that we can't accept Sigma-metric results blindly; we must also know what quality requirements were used (as well as how the imprecision and bias were estimated, at what decision levels the metrics were calculated, etc.).
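A small numerical sketch of that pitfall (same hypothetical method as above): one set of performance data, two plausible quality requirements, two very different verdicts.

    # Same hypothetical method (bias 2%, CV 1.5%) judged against two
    # different quality requirements (both TEa values are illustrative only)
    bias_pct, cv_pct = 2.0, 1.5
    for source, tea_pct in [("wider TEa, e.g. PT-derived", 10.0),
                            ("tighter TEa, e.g. biologic variation", 7.0)]:
        sigma = (tea_pct - abs(bias_pct)) / cv_pct
        print(f"{source}: {sigma:.1f} Sigma")
    # wider TEa   -> 5.3 Sigma ("world class" by Six Sigma convention)
    # tighter TEa -> 3.3 Sigma (marginal) - same data, different verdict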
Admittedly, this means the Sigma metric is not quite as "universal" as it first appears. If every method validation study chooses different quality requirements, the Sigma metrics can't be compared. But even in the traditional defect-counting approach of Six Sigma, this type of goal-setting challenge occurs. In studies that measure the Sigma metric of turnaround time (TAT), the result is often heavily influenced by how the defects are counted (for example, some studies count only the fastest 95% of turnaround times, excluding the extreme 5% of results - which makes a higher Sigma metric much easier to achieve).
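To see how that trimming flatters the result, here is a hedged sketch using the conventional defect-rate-to-Sigma conversion (the normal quantile of the process yield plus the customary 1.5-SD shift); the TAT counts are invented purely for illustration.

    from scipy.stats import norm

    def sigma_from_defects(defects, opportunities, shift=1.5):
        """Short-term Sigma level from a defect rate, using the
        conventional 1.5-SD long-term shift."""
        yield_fraction = 1 - defects / opportunities
        return norm.ppf(yield_fraction) + shift

    # Hypothetical month: 10,000 results, of which 800 miss the TAT goal
    print(sigma_from_defects(800, 10_000))  # ~2.9 Sigma
    # Trim the slowest 5% (500 results, all of them misses) before counting:
    print(sigma_from_defects(300, 9_500))   # ~3.4 Sigma - same process, kinder arithmetic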
As always, the possibility of different quality requirements means laboratories must be vigilant in the use of Sigma metrics. Sigma metric studies can't be accepted simply at face value - we have to read the study to make sure the results are valid for our laboratory. But really, this is hardly a new requirement for scientific papers.
Thus, when someone compares Sigma metrics from different studies, we have to critically assess the papers to confirm that comparability. Someone may claim an apples-to-apples comparison, but upon closer examination, we may find instead that they're backing the wrong horse.