Article from fivethirtyeight.com:
“If you follow the headlines, your confidence in science may have taken a hit lately.
(..) International Journal of Advanced Computer Technology recently accepted for publication a paper titled “Get Me Off Your Fucking Mailing List,” whose text was nothing more than those seven words, repeated over and over for 10 pages. Two other journals allowed an engineer posing as Maggie Simpson and Edna Krabappel to publish a paper, “Fuzzy, Homogeneous Configurations.”
Revolutionary findings? Possibly fabricated. In May, a couple of University of California, Berkeley, grad students discovered irregularities in Michael LaCour’s influential paper suggesting that an in-person conversation with a gay person could change how people felt about same-sex marriage. The journal Science retracted the paper shortly after, when LaCour’s co-author could find no record of the data.
Taken together, headlines like these might suggest that science is a shady enterprise that spits out a bunch of dressed-up nonsense. But I’ve spent months investigating the problems hounding science, and I’ve learned that the headline-grabbing cases of misconduct and fraud are mere distractions. The state of our science is strong, but it’s plagued by a universal problem: Science is hard — really fucking hard.
If we’re going to rely on science as a means for reaching the truth — and it’s still the best tool we have — it’s important that we understand and respect just how difficult it is to get a rigorous result. I could pontificate about all the reasons why science is arduous, but instead I’m going to let you experience one of them for yourself. Welcome to the wild world of p-hacking.
If you tweaked the variables until you proved that Democrats are good for the economy, congrats; go vote for Hillary Clinton with a sense of purpose. But don’t go bragging about that to your friends. You could have proved the same for Republicans.
The data in our interactive tool can be narrowed and expanded (p-hacked) to make either hypothesis appear correct. That’s because answering even a simple scientific question — which party is correlated with economic success — requires lots of choices that can shape the results. This doesn’t mean that science is unreliable. It just means that it’s more challenging than we sometimes give it credit for.
Which political party is best for the economy seems like a pretty straightforward question. But as you saw, it’s much easier to get a result than it is to get an answer. The variables in the data sets you used to test your hypothesis had 1,800 possible combinations. Of these, 1,078 yielded a publishable p-value, but that doesn’t mean they showed that which party was in office had a strong effect on the economy. Most of them didn’t.
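The article’s interactive tool and its underlying data aren’t reproduced here, but a minimal Python sketch with made-up data shows how a handful of analysis choices multiply into dozens (or, in the article’s case, 1,800) specifications, and how scanning them for p < 0.05 works in practice. Every variable name and number below is illustrative, not FiveThirtyEight’s:

```python
# Minimal sketch of a specification scan (synthetic data, not the article's dataset).
# A few analysis choices multiply into many tests; scanning them all for
# p < 0.05 is the mechanical core of p-hacking.
from itertools import product

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_quarters = 200

# Made-up quarterly data: which party holds each office, plus several
# economic measures that are pure noise here (no true party effect).
parties = {name: rng.integers(0, 2, n_quarters)
           for name in ["president", "governors", "congress"]}
measures = {name: rng.normal(size=n_quarters)
            for name in ["employment", "inflation", "gdp", "stock_prices"]}

# Researcher degrees of freedom: which measure, whose party counts, how much
# lag, and whether to trim "bad" quarters. Four choices already give
# 4 * 3 * 2 * 2 = 48 specifications; the article's tool offered 1,800.
specs = product(measures, parties, [0, 4], [True, False])

significant, total = 0, 0
for measure, office, lag, trim in specs:
    y = measures[measure]
    x = np.roll(parties[office], lag)        # crude stand-in for lagging the effect
    if trim:                                  # crude stand-in for excluding recessions
        keep = y > np.percentile(y, 10)
        x, y = x[keep], y[keep]
    p = stats.ttest_ind(y[x == 1], y[x == 0]).pvalue
    total += 1
    significant += p < 0.05

print(f"{significant} of {total} specifications reach p < 0.05 on pure noise")
```

Even with no real effect in the data, a small fraction of specifications will clear the bar by chance alone, and the more choices you allow yourself, the more chances you get.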
The p-value reveals almost nothing about the strength of the evidence, yet a p-value of 0.05 has become the ticket to get into many journals. “The dominant method used [to evaluate evidence] is the p-value,” said Michael Evans, a statistician at the University of Toronto, “and the p-value is well known not to work very well.”
Scientists’ overreliance on p-values has led at least one journal to decide it has had enough of them. In February, Basic and Applied Social Psychology announced that it will no longer publish p-values. “We believe that the p < .05 bar is too easy to pass and sometimes serves as an excuse for lower quality research,” the editors wrote in their announcement. Instead of p-values, the journal will require “strong descriptive statistics, including effect sizes.”
After all, what scientists really want to know is whether their hypothesis is true, and if so, how strong the finding is. “A p-value does not give you that — it can never give you that,” said Regina Nuzzo, a statistician and journalist in Washington, D.C., who wrote about the p-value problem in Nature last year. Instead, you can think of the p-value as an index of surprise. How surprising would these results be if you assumed your hypothesis was false?
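A small simulation makes the “index of surprise” reading concrete: generate many worlds where the hypothesis is false, and ask how often a result at least as extreme as the observed one turns up by chance. The observed difference below is a made-up number, used only to illustrate the idea:

```python
# Toy illustration of the "index of surprise" reading of a p-value:
# simulate many worlds where the null is true, and ask how often a result
# at least as extreme as the observed one appears by chance alone.
import numpy as np

rng = np.random.default_rng(1)

observed_diff = 0.4        # hypothetical observed difference between two group means
n_per_group = 50
n_null_worlds = 100_000

# In each null world both groups come from the same distribution,
# so any difference in means is pure sampling noise.
a = rng.normal(0, 1, size=(n_null_worlds, n_per_group))
b = rng.normal(0, 1, size=(n_null_worlds, n_per_group))
null_diffs = a.mean(axis=1) - b.mean(axis=1)

p_value = np.mean(np.abs(null_diffs) >= observed_diff)
print(f"Two-sided p-value ~ {p_value:.3f}: how often chance alone produces "
      "a difference this large when there is no real effect")
```

Note what the number does and doesn’t say: it measures how unusual the data would be in a world with no effect, not the probability that the hypothesis is true or how big the effect is.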
As you manipulated all those variables in the p-hacking exercise above, you shaped your result by exploiting what psychologists Uri Simonsohn, Joseph Simmons and Leif Nelson call “researcher degrees of freedom,” the decisions scientists make as they conduct a study. These choices include things like which observations to record, which ones to compare, which factors to control for, or, in your case, whether to measure the economy using employment or inflation numbers (or both). Researchers often make these calls as they go, and often there’s no obviously correct way to proceed, which makes it tempting to try different things until you get the result you’re looking for.
Scientists who fiddle around like this — just about all of them do, Simonsohn told me — aren’t usually committing fraud, nor are they intending to. They’re just falling prey to natural human biases that lead them to tip the scales and set up studies to produce false-positive results.
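To see how that innocent-looking flexibility tips the scales, here is a toy simulation, not taken from Simmons, Nelson and Simonsohn’s paper, comparing a single pre-chosen outcome against “report whichever of five outcomes looks best” on data that contains no real effect at all:

```python
# Toy simulation of how "researcher degrees of freedom" inflate false positives:
# on pure-noise data, test several outcome measures and count a study as a
# "finding" if any of them crosses p < 0.05. Numbers are illustrative only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_studies, n_subjects, n_outcomes = 5_000, 40, 5

false_pos_honest = 0     # analyze one outcome chosen in advance
false_pos_flexible = 0   # try five outcomes, keep the smallest p-value
for _ in range(n_studies):
    group = rng.integers(0, 2, n_subjects)                 # treatment vs. control
    outcomes = rng.normal(size=(n_outcomes, n_subjects))   # all pure noise
    p_values = [stats.ttest_ind(y[group == 0], y[group == 1]).pvalue
                for y in outcomes]
    false_pos_honest += p_values[0] < 0.05
    false_pos_flexible += min(p_values) < 0.05

print(f"honest:   {false_pos_honest / n_studies:.1%} false positives")
print(f"flexible: {false_pos_flexible / n_studies:.1%} false positives")
```

The pre-chosen analysis produces false positives at roughly the advertised 5 percent rate; cherry-picking among five independent outcomes pushes that toward 20-25 percent, with no fraud required.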
Since publishing novel results can garner a scientist rewards such as tenure and jobs, there’s ample incentive to p-hack. Indeed, when Simonsohn analyzed the distribution of p-values in published psychology papers, he found that they were suspiciously concentrated around 0.05. “Everybody has p-hacked at least a little bit,” Simonsohn told me.”
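The kind of check Simonsohn describes can be sketched in a few lines: bin the p-values reported in a literature and look for a pile-up just under 0.05. The values below are invented for illustration, and his actual “p-curve” analysis is considerably more careful:

```python
# Rough sketch of checking reported p-values for bunching just below 0.05.
# The p-values here are made up; a real analysis would be drawn from
# published papers and handled with far more care.
import numpy as np

reported = np.array([0.049, 0.011, 0.046, 0.032, 0.048, 0.041, 0.003,
                     0.044, 0.022, 0.047, 0.038, 0.049, 0.045, 0.027])

bins = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05]
counts, _ = np.histogram(reported, bins=bins)
for lo, hi, c in zip(bins, bins[1:], counts):
    print(f"{lo:.2f}-{hi:.2f}: {'#' * c}")
# A spike in the 0.04-0.05 bin is the signature of results nudged just past
# the publishable threshold; honestly reported true effects tend to pile up
# at much smaller p-values.
```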
The article continues beyond this excerpt; read the full piece at fivethirtyeight.com for the rest.