The problems, as Tabarrok points out, are numerous. For one, the 5% cutoff is completely arbitrary. There's no reason to think a paper reporting a p-value of 0.06 provides much less evidence of an effect than one reporting 0.05. But more than that, by settling on a level of significance that we feel "comfortable" with, we lose sight of what statistical significance is actually telling us: even if there were no real effect at all, there would still be a 5% chance of getting a result at least this striking purely by chance. That's right: if the drug doesn't work at all, if the school program isn't effective, or if two groups are exactly the same, roughly one study in twenty testing it will still clear the 5% bar. Tabarrok concludes, based on the work of John Ioannidis, that with a very large population of researchers at work, it becomes possible that many or most published papers report false results. The argument is complicated, so you'll have to read it yourself, but it's certainly worth thinking about.
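For readers who want the arithmetic behind Ioannidis' argument, here is a minimal sketch in Python. It is a simplified, no-bias version of the positive-predictive-value calculation: what share of "significant" findings reflect a real effect, given the 5% threshold. The function name and every input number (the share of tested hypotheses that are actually true, and each study's power) are hypothetical, chosen only for illustration.

```python
def positive_predictive_value(prior: float, power: float, alpha: float = 0.05) -> float:
    """Share of 'significant' findings that reflect a real effect.

    prior: assumed fraction of tested hypotheses that are actually true
    power: assumed probability that a study detects a real effect
    alpha: significance threshold (false-positive rate when there is no effect)
    """
    true_positives = power * prior          # real effects correctly detected
    false_positives = alpha * (1 - prior)   # null effects that clear the bar by chance
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Illustrative scenarios only; these numbers are assumptions, not estimates.
    for prior, power in [(0.5, 0.8), (0.1, 0.8), (0.1, 0.2)]:
        ppv = positive_predictive_value(prior, power)
        print(f"prior={prior:.1f} power={power:.1f} -> PPV={ppv:.2f}")
```

With these made-up inputs, the share of significant results that are real falls from about 0.94 to 0.64 to 0.31. When few of the hypotheses being tested are true and studies are underpowered, most "significant" findings can be false even though every test used the 5% cutoff.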

I, however, find this issue less compelling than another one he raises. Generally, if scientists have reason to believe the model they're testing is well specified, they have a good sense of what is a real effect and what isn't. This gut confidence grows with repetition, as Tabarrok points out, because the likelihood of several independent studies all finding an effect where none exists is even smaller than it is for any single study. No, the real problem, to me, is that most studies aren't very well specified, which is what Tabarrok discusses in this piece. A test for statistical significance does test for some difference between two groups, but if we think X causes that difference, and our model leaves out a factor Z that actually causes it, the resulting publication can be badly misleading. We too often expect statistical significance to do our heavy lifting for us, and believe me, you can hear economics grad students celebrating whenever they get across the magic 5% line. But as I said, all statistical significance tells us is that there is a difference; it doesn't tell us *why*.
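The omitted-variable problem is easy to reproduce in a small simulation. The sketch below is purely hypothetical: it invents a world in which a factor Z drives both X and the outcome Y, while X has no effect on Y at all. Regressing Y on X alone yields a slope near 1 with a t-statistic far beyond the conventional 1.96 cutoff. Once Z is partialled out (here via the Frisch-Waugh residualizing trick, a standard technique standing in for a full multiple regression), the apparent effect of X disappears. All function names and parameters are illustrative, not from Tabarrok.

```python
import math
import random


def ols_slope_and_t(x, y):
    """Simple regression y = a + b*x; returns slope b and its t-statistic (H0: b = 0)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid_ss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(resid_ss / (n - 2) / sxx)
    return b, b / se_b


def residualize(v, on):
    """Remove the part of v linearly explained by `on` (regression with intercept)."""
    b, _ = ols_slope_and_t(on, v)
    n = len(v)
    a = sum(v) / n - b * sum(on) / n
    return [vi - a - b * oi for vi, oi in zip(v, on)]


def simulate(n=1000, seed=0):
    """Hypothetical data: Z drives both X and Y; X itself has NO effect on Y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y = [2.0 * zi + rng.gauss(0, 1) for zi in z]
    return x, y, z


if __name__ == "__main__":
    x, y, z = simulate()
    b, t = ols_slope_and_t(x, y)
    print(f"Y on X, Z omitted:    slope={b:.2f}, t={t:.1f}")
    # Frisch-Waugh: regressing residualized Y on residualized X recovers X's
    # coefficient from Y ~ X + Z (the t-statistic here uses n-2 df, a close approximation).
    b_ctrl, t_ctrl = ols_slope_and_t(residualize(x, z), residualize(y, z))
    print(f"Y on X, Z controlled: slope={b_ctrl:.2f}, t={t_ctrl:.1f}")
```

Both regressions would pass a mechanical significance check on the first line, yet only the second reflects how this simulated world actually works. The test was never the problem; the missing Z was.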

For more on this last point, see our earlier piece on correlation not being causation.

[An earlier version of this piece incorrectly attributed the above posts to Tyler Cowen, who co-writes the MR blog. They are by Alex Tabarrok.]

