# Why Are So Many Scientific Studies Flawed And Poorly Understood?

## Flawed, misleading research is costly to society because much of it is the result of poorly spent government funding, and it often gives rise to unwise regulation.

Should we believe the *USA Today* headline, “Drinking four cups of coffee daily lowers risk of death”? And what should we make of, “Mouthwash May Trigger Diabetes...”? Should we really eat more, not less, fat? These sorts of conclusions, supposedly from “scientific studies,” seem to vary from month to month, leading to ever-shifting “expert” recommendations. However, most of their admonitions are based on dubious “research” that lacks a valid scientific basis and should be relegated to the realm of folklore and anecdotes.

Such flawed, misleading research is costly to society: much of it represents poorly spent government funding, and it often gives rise to unwise regulation. One remedy would be greater statistical literacy, which would enable the public – and their elected leaders – to reject “junk” science.

Statistics is a mathematical tool used in many scientific disciplines to analyze data. It is intended to reveal something about the data that is not otherwise obvious, which we will refer to as a “finding” or a “claim.” An essential part of the process is that *before* undertaking the analysis, the researcher formulates a hypothesis, his best guess for what he expects to happen, which the analysis will then tend to support or undermine. The result is summarized by a “p-value”: the probability of obtaining data at least as extreme as those observed if there were, in fact, no real effect. The lower the p-value, the harder the result is to explain away as mere chance.

Usually a “strawman” hypothesis is advanced, for example that treatments A and B are equally effective. The two treatments are then compared, and any p-value less than 0.05 (p<.05) is, by convention, considered “statistically significant” and tends to *disprove* the strawman hypothesis that the effects of the treatments are the same. The alternative hypothesis, that A differs from B (for example, that aspirin relieves headaches better than a sugar pill), is then accepted.
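To make this concrete, here is a minimal sketch of one common way to compute such a p-value: a permutation test, which asks how often a difference at least as large as the observed one would arise if the treatment labels were meaningless. The pain-relief scores below are invented for illustration.

```python
import random

def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the fraction of random relabelings whose mean difference
    is at least as large as the observed one (an estimated p-value).
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # pretend the group labels mean nothing
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical pain-relief scores: treatment A vs. sugar pill B.
a = [7, 6, 8, 7, 9, 6, 8, 7]
b = [4, 5, 3, 5, 4, 6, 4, 5]
p = permutation_test(a, b)
print(f"p = {p:.4f}")  # a small p tends to disprove "A equals B"
```

A p-value below 0.05 here would, by the convention described above, count as “statistically significant” evidence against the strawman.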

However – and this is a key point – a p-value less than 0.05 (p<.05) can occur by chance even when there is no real difference between A and B; this is known as a false positive. By the 0.05 convention, roughly one such test in twenty will appear “significant” purely by chance. The standard scientific approach to identifying a false positive is to attempt to replicate the result. If the original results don’t replicate, it is assumed that they were false, and we’re left with the original “strawman” hypothesis that there is no difference between A and B.
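The one-in-twenty false-positive rate can be seen directly by simulation. The sketch below (all numbers are illustrative) runs thousands of imaginary studies in which the strawman hypothesis is *true*, and counts how often a standard test nonetheless declares significance.

```python
import random

def false_positive_rate(n_trials=2_000, n=200, seed=1):
    """Simulate many studies where there is truly NO difference,
    and count how often a z-test still reports p < 0.05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Both groups drawn from the SAME 50/50 coin: no real effect.
        a = sum(rng.random() < 0.5 for _ in range(n))
        b = sum(rng.random() < 0.5 for _ in range(n))
        # Normal-approximation z-test for a difference in proportions.
        p1, p2 = a / n, b / n
        pool = (a + b) / (2 * n)
        se = (2 * pool * (1 - pool) / n) ** 0.5
        z = abs(p1 - p2) / se if se else 0.0
        if z > 1.96:  # two-sided p < 0.05
            hits += 1
    return hits / n_trials

rate = false_positive_rate()
print(f"false positive rate ~ {rate:.3f}")  # typically close to 0.05
```

About 5% of these no-effect studies come out “significant,” which is exactly why replication matters: a genuine effect should show up again, while a fluke usually won’t.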

But things can get complicated, because the p-value analysis can be manipulated so that it *appears* to support a false claim. For example, a scientist can look at a great many questions at once, which is known as “data dredging,” and then formulate a hypothesis *after* the analysis is done, which is known as HARKing (Hypothesizing After the Results are Known). Together these violate the fundamental scientific principle that a scientist must *start* with a hypothesis, not concoct one after the data set has undergone analysis. […]
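Data dredging can also be demonstrated with a toy simulation. The sketch below asks 100 unrelated “questions” of pure random noise; because each question has roughly a 5% chance of crossing the p<.05 threshold, a handful will look “significant” even though nothing real is going on.

```python
import random

def dredge(n_questions=100, n=50, seed=2):
    """Ask many unrelated 'questions' of pure noise and collect
    those that come out 'statistically significant' by chance."""
    rng = random.Random(seed)
    findings = []
    for q in range(n_questions):
        # Each "question": does group A differ from group B? (It never does.)
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        mean_diff = sum(a) / n - sum(b) / n
        se = (2 / n) ** 0.5  # z-test with known unit variance
        if abs(mean_diff) / se > 1.96:  # "p < 0.05"
            findings.append(q)
    return findings

sig = dredge()
print(f"{len(sig)} of 100 noise questions look 'significant': {sig}")
# On average about 5 will; each is an invitation to HARK a plausible story.
```

A dredger then reports only the “hits,” dresses each one in an after-the-fact hypothesis, and the resulting paper looks indistinguishable from honest research.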

Spurious FFQ studies are published constantly. The inventor of the FFQ has to his credit (?) more than 1,700 papers, and the original FFQ paper has been cited over 3,300 times. It appears that virtually none of the researchers using FFQs corrects the analysis for the statistical phenomena discussed here, and the authors of FFQ papers are remarkably creative in providing plausible rationales for the “associations” they discover – in other words, HARKing.

This situation creates a kind of self-licking ice cream cone: Researchers have been thriving by churning out this dubious research since the early 1990s, and inasmuch as most of the work on Food Frequency Questionnaires is government funded – by the National Cancer Institute, among other federal entities – it’s ripping off taxpayers as well as misleading them. Curiously, the editors and peer reviewers of research articles have not recognized and ended this statistical malpractice, so it will fall to government funding agencies to cut off support for studies with flawed designs, and to universities to stop rewarding the publication of bad research. We are not optimistic.