Science News is a respectable magazine in the mould of New Scientist. So I was surprised to be pointed to a recent article by Tom Siegfried which comes close to blaming all the world’s woes on…us humble statisticians. I suggest that you read the article from beginning to end before returning to my comments.
It is easy to be defensive when statistics is polemically described as
a mutant form of math that has deflected science’s heart from the modes of calculation that had long served so faithfully. Science was seduced by statistics, the math rooted in the same principles that guarantee profits for Las Vegas casinos
and I will not disappoint! But ultimately a better response is to understand how others view and misinterpret statistics and sketch out an appropriate response.
The first paragraph seems to contain a naïve suggestion that we had some better system in the past. We didn’t. Prior to statistics there was no systematic way of dealing with inherently variable systems. Scientists could just about handle astronomical measurement error by averaging it out. But evaluating alternative medical treatments? Not a hope! In a world rife with unaccounted-for variance, science cannot operate without analysis of uncertainty.
My main criticisms of the position put forward by Siegfried are, first, that he offers no alternatives to current best practice; second, that the problems described (and they are mostly genuine problems) are a consequence of scientific practice, not statistics; and third, that statisticians are at the forefront of solving, or at least addressing, these problems.
The most prominent claim of the piece – Siegfried calls it science’s “dirtiest secret” – is that many, perhaps the majority, of statistically significant results that are published in the literature are false, by which he presumably means that the null hypothesis is true. Leaving aside how you would ever determine this, the source of the problem – and I think it is a genuine one – is publication bias. Insignificant P-values are not published, so we only get to see those that are smaller than 0.05. If there are no real effects left to discover then these would be 100% false alarms. If real effects are rare, then we would not be surprised to find that most low P-values are false alarms.
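To put a number on that: if a fraction π of tested hypotheses are real effects, the tests have power 1−β, and significance is declared at level α, then the fraction of significant results that are false alarms is

\[
\frac{\alpha(1-\pi)}{\alpha(1-\pi) + (1-\beta)\pi}.
\]

Plugging in purely illustrative numbers – α = 0.05, power 0.9, and real effects rare at π = 0.01 – gives 0.0495/(0.0495 + 0.009) ≈ 0.85, so roughly 85% of the significant findings would be false alarms.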
Under the null hypothesis, then, the distribution of a published P-value is presumably uniform on the interval [0,0.05] rather than [0,1]. If this is as big a problem as claimed then journals might start demanding significance thresholds stricter than 0.05. But the best way to weed out false alarms is with replication. For the replicated experiment the P-value is, under the null, uniform on the whole of [0,1], since it will be automatically published (you would think) regardless of its value. If the original claim was at all important then it will be replicated. If it isn’t, it will sit there as an incorrect claim about something nobody cares about. Sounds like science at work to me.
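A quick simulation makes both points at once. Everything here is invented for illustration – 1% of studies chase a real effect, each study is a one-sided one-sample t-test with 50 subjects and a true shift of 0.5 where an effect exists – but the shape of the result does not depend much on the details.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_studies, n = 50_000, 50
real = rng.random(n_studies) < 0.01          # only 1% of tested effects are real

def run_all_experiments():
    # one-sample t-test per study: true mean shift 0.5 where an effect exists
    x = rng.normal(np.where(real, 0.5, 0.0)[:, None], 1.0, (n_studies, n))
    t = x.mean(axis=1) / (x.std(axis=1, ddof=1) / np.sqrt(n))
    return stats.t.sf(t, df=n - 1)           # one-sided P-values

published = run_all_experiments() < 0.05     # the publication filter
# rerun every study and keep only those that clear the 0.05 bar twice
replicated = published & (run_all_experiments() < 0.05)

for name, mask in [("published", published), ("replicated", replicated)]:
    print(f"{name}: {(~real & mask).sum() / mask.sum():.0%} false alarms")
```

With these made-up inputs the false-alarm share comes out at roughly 85% among published results, dropping to roughly 20% after a single replication.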
The article also criticizes meta-analysis, which is an attempt to get the benefits of replication. Of course, meta-analyses are only applied to published studies, which are all subject to publication bias. Statisticians have some tools for detecting and modelling publication bias – the funnel plot, for instance – but a lack of genuine replication is the responsibility of the scientists.
Confused conditional: What does a P-value mean?
It is true that many people interpret a P-value as the probability of the null given the data. I actually think that in a lot of contexts this confusion doesn’t matter much, i.e. in those contexts where you might have a prior probability of 1/2 for the null. But in science, where most nulls are probably true, it is seriously misleading. So I agree that, on top of the publication bias problem, the likely preponderance of true nulls makes this interpretation a problem. However, what alternative is Siegfried offering here?
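To spell the confusion out: a P-value is (a tail version of) the first factor in the numerator below, not the left-hand side.

\[
P(H_0 \mid \text{data}) \;=\; \frac{P(\text{data} \mid H_0)\,P(H_0)}{P(\text{data} \mid H_0)\,P(H_0) \;+\; P(\text{data} \mid H_1)\,P(H_1)}
\]

When P(H₀) is around 1/2 the two quantities at least move in the same direction; when P(H₀) is close to 1, as when real effects are rare, a small P-value is entirely compatible with the null still being the best bet.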
I can’t see a better solution to this than better education of scientists. The problem, though, is that many scientists are not that interested. They want a P-value less than 0.05 to get their publication accepted in order to progress their careers. In the real world of a scientist, a P-value does not actually mean the probability of the observed result, or worse, under the null. What it actually means is
the probability of this research being published has just increased from 0 to a much larger value!
Not good epistemology perhaps but the practice of science nevertheless.
With many scientists out there looking for significant P-values, the harder they look the more they will find. This is really a variant of the publication bias problem. But it is probably better understood by the scientists who are involved in this kind of data dredging for P-values.
When searching for genetic markers of disease, no scientist seriously thinks that the P-value represents the chance that the gene is a cause. Again, what is the alternative being offered here?
There are numerous adjustments to P-values to make them better represent significance. There is a relatively recent emphasis on false discovery rates. This is still a hot research area for statisticians, and it is we statisticians who are the most likely to solve the scientists’ problem. We are surely not the cause.
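For concreteness, here is a minimal sketch of the Benjamini–Hochberg false discovery rate procedure on invented data: 10,000 genetic markers, of which 100 (unknown to the analyst) are real. Naively flagging P < 0.05 drowns the real markers in false positives; the step-up procedure keeps the false share of discoveries near 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
m, m_real = 10_000, 100                     # markers tested; truly associated ones
z = rng.normal(size=m)
z[:m_real] += 4.0                           # the real markers get a shifted z-score
p = stats.norm.sf(z)                        # one-sided P-values

def benjamini_hochberg(p, q=0.05):
    # step-up rule: find the largest k with p_(k) <= q*k/m, reject those k
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, len(p) + 1) / len(p)
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0
    keep = np.zeros(len(p), dtype=bool)
    keep[order[:k]] = True
    return keep

naive = p < 0.05
fdr = benjamini_hochberg(p)
# markers beyond index m_real are nulls, so hits there are false positives
print(f"naive: {naive.sum()} hits, {naive[m_real:].sum()} false")
print(f"BH:    {fdr.sum()} hits, {fdr[m_real:].sum()} false")
```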
Clinical trials: Randomization and the average patient

Siegfried makes two further points, both about clinical trials. One is that randomization does not get rid of selection bias. This appears to be nonsense. A properly randomized clinical trial gives samples that are unbiased, under the sampling scheme, with respect to all unobserved confounders. And observed confounders can be deliberately balanced. What I think Siegfried is actually arguing is that the researcher may – by bad luck – end up with the two samples not being balanced with respect to some unobserved confounder.
But the P-value takes this into account. It is one of the great early achievements of statistics that, by the device of randomization, we can make specific and unequivocal statements about the likelihood of a given result. False alarms can still arise from unlucky distributions of unobserved confounders, but the P-value tells us exactly how likely that is to happen.
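This is easiest to see in a randomization (permutation) test, where the P-value is computed from the randomization distribution itself. A toy sketch, with invented data:

```python
import numpy as np

rng = np.random.default_rng(3)

def randomization_pvalue(outcome, treated, n_rerandom=10_000):
    # How often would re-running the randomization produce a treatment-control
    # difference at least this large, if the treatment did nothing? Every
    # "unlucky" assignment of labels is one of the draws being counted.
    observed = outcome[treated].mean() - outcome[~treated].mean()
    hits = 0
    for _ in range(n_rerandom):
        relabel = rng.permutation(treated)
        hits += outcome[relabel].mean() - outcome[~relabel].mean() >= observed
    return hits / n_rerandom

# toy trial: 40 patients, half treated at random, no real effect
treated = rng.permutation(np.repeat([True, False], 20))
outcome = rng.normal(size=40)
print(randomization_pvalue(outcome, treated))
```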
I’ll say it again: What is the alternative being offered here?
The second point is that clinical trial results only tell us which treatments are appropriate for the average patient. He makes the point that
…trial results are reported as averages that may obscure individual differences, masking beneficial or harmful effects and possibly leading to approval of drugs that are deadly for some and denial of effective treatment to others.
Well, any statistical analysis I know of will include all known patient-specific information, not only to adjust for these factors and reduce variability in the assessment of the treatment, but also to identify interactions of the treatment with these factors. While it may be ideal to have treatments individually tailored to each and every patient, in a world where we cannot conduct experiments on our own personal clones we will have to make do with less individualized medicine.
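As a concrete (and entirely invented) example, a standard regression with a treatment-by-age interaction estimates how the treatment effect varies across patients, not just its average:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 500
df = pd.DataFrame({
    "treatment": rng.integers(0, 2, n),      # randomized 0/1 assignment
    "age": rng.uniform(20, 80, n),
})
# invented truth: the drug helps on average, but less so for older patients
df["outcome"] = (2.0 * df["treatment"]
                 - 0.03 * df["treatment"] * df["age"]
                 + rng.normal(0, 1, n))

# 'treatment * age' expands to treatment + age + treatment:age, so the
# treatment:age coefficient directly estimates the patient-specific variation
fit = smf.ols("outcome ~ treatment * age", data=df).fit()
print(fit.params[["treatment", "treatment:age"]])
```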
Siegfried quotes scientists as saying that
…reporting a single number gives the misleading impression that the treatment-effect is a property of the drug rather than of the interaction between the drug and the complex risk-benefit profile of a particular group of patients.
This is possibly a fair call, though I have just noted that many analyses will specifically include terms which measure this and are actually called interactions. But I think this touches on a deep but irrational psychological reaction that people have to medicine. If I tell someone that a particular drug I give them has a 1% chance of killing them and 99% chance of curing them, they will probably take it. If I tell them that 1 in 100 people carry a gene which interacts with the drug and they will definitely be killed by the drug, while everyone else is definitely cured by the drug, most people somehow feel different. The thought of that genetic marker invisibly tattooed on their forehead as the doctor sticks the syringe into their vein just freaks them out.
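The two descriptions assign exactly the same overall risk, of course. By the law of total probability,

\[
P(\text{death}) = P(\text{carrier}) \times 1 + P(\text{non-carrier}) \times 0 = 0.01 \times 1 + 0.99 \times 0 = 0.01,
\]

yet the second framing feels worse to almost everyone.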
Of course, a one-sided rant against conventional statistics would not be complete without rolling out some false claims about the wonders of Bayesian statistics. Frequentist statisticians are, according to Siegfried, unable to take into account disease prevalence in the way Bayesian math can. Sigh! However, he then goes on to complain that Bayesians have introduced
confusion into the actual meaning of the mathematical concept of “probability” in the real world.
Now why is probability in quotes? Perhaps even he realises that in his “real world” probability is not such a well-defined concept as it is in smoky casinos. Bayes is a red herring in the entire discussion. Bayesian methods do not, of themselves, overcome publication bias, multiplicity, or treatment interactions. They have really become just an additional set of tools and models in the statistical armoury.
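And for the record, the prevalence calculation Siegfried presumably has in mind is nothing more than conditional probability on known rates, available to frequentist and Bayesian alike. With an illustrative disease prevalence of 1% and a test with 95% sensitivity and 95% specificity,

\[
P(\text{disease} \mid \text{positive}) = \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.05 \times 0.99} \approx 0.16.
\]

No philosophical commitment about the “meaning” of probability is required to do that arithmetic.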