Is there something wrong with the scientific method?
New Yorker, by Jonah Lehrer, December 13, 2010
Many results that are rigorously proved and accepted start shrinking in later studies.
On September 18, 2007, a few dozen neuroscientists, psychiatrists, and drug-company executives gathered in a hotel conference room in Brussels to hear some startling news. It had to do with a class of drugs known as atypical or second-generation antipsychotics, which came on the market in the early nineties. The drugs, sold under brand names such as Abilify, Seroquel, and Zyprexa, had been tested on schizophrenics in several large clinical trials, all of which had demonstrated a dramatic decrease in the subjects' psychiatric symptoms. As a result, second-generation antipsychotics had become one of the fastest-growing and most profitable pharmaceutical classes. By 2001, Eli Lilly's Zyprexa was generating more revenue than Prozac. It remains the company's top-selling drug.
But the data presented at the Brussels meeting made it clear that something strange was happening: the therapeutic power of the drugs appeared to be steadily waning. A recent study showed an effect that was less than half of that documented in the first trials, in the early nineteen-nineties. Many researchers began to argue that the expensive pharmaceuticals weren't any better than first-generation antipsychotics, which have been in use since the fifties. "In fact, sometimes they now look even worse," John Davis, a professor of psychiatry at the University of Illinois at Chicago, told me.
Before the effectiveness of a drug can be confirmed, it must be tested and tested again. Different scientists in different labs need to repeat the protocols and publish their results. The test of replicability, as it's known, is the foundation of modern research. Replicability is how the community polices itself. It's a safeguard against the creep of subjectivity. Most of the time, scientists know what results they want, and that can influence the results they get. The premise of replicability is that the scientific community can correct for these flaws.
But now all sorts of well-established, multiply confirmed findings have started to look increasingly uncertain. It's as if our facts were losing their truth: claims that have been enshrined in textbooks are suddenly unprovable. This phenomenon doesn't yet have an official name, but it's occurring across a wide range of fields, from psychology to ecology. In the field of medicine, the phenomenon seems extremely widespread, affecting not only antipsychotics but also therapies ranging from cardiac stents to Vitamin E and antidepressants: Davis has a forthcoming analysis demonstrating that the efficacy of antidepressants has gone down as much as threefold in recent decades.
See also a study by Stanley Young of NISS, "Everything Is Dangerous," in which he suggests that widespread overfitting and multiple testing in the analysis of medical data create the perception that everything has dangerous side effects.
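Young's multiple-testing point can be illustrated with a minimal simulation (illustrative only; the sample sizes and the number of hypotheses are assumptions, not figures from his study). If a dataset with no real effects is screened for many potential "side effects" at the conventional 5% significance level, a steady stream of spurious findings appears:

```python
import random

random.seed(42)

N_TESTS = 1000   # hypothetical number of side effects screened
N = 50           # hypothetical patients per arm
Z_CRIT = 1.96    # two-sided 5% threshold for a z-like statistic

false_positives = 0
for _ in range(N_TESTS):
    # Both arms are drawn from the SAME distribution: no true effect exists.
    treated = [random.gauss(0, 1) for _ in range(N)]
    control = [random.gauss(0, 1) for _ in range(N)]
    diff = sum(treated) / N - sum(control) / N
    se = (2 / N) ** 0.5          # standard error of the mean difference (sigma = 1)
    if abs(diff / se) > Z_CRIT:
        false_positives += 1

print(false_positives)  # roughly 5% of 1000 tests come back "significant"
```

With no real effect anywhere, around fifty associations still clear the significance bar, which is exactly the mechanism behind "everything is dangerous."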
www.kdnuggets.com/2010/03/pub-most-medical-studies-are-wrong.html December 28, 2010
If this problem appears whenever there are small samples, then all medical and social science findings are suspect! January 04, 2011
Not to divert attention from the main point of the article, but many social scientists routinely deal with sample sizes in the hundreds of thousands (e.g., the Current Population Survey) or millions (the Census). So please do not disparage an entire discipline by unfairly characterizing its methods as suspect.
January 05, 2011
There are a few large datasets, like the Framingham Heart Study, but my impression is that most social science papers are based on small datasets. The main point of the article is that poor data analysis and confirmation bias tend to create stronger results than warranted, and thus many social science findings can be suspect. The Census was not designed for individual-level analysis.
January 05, 2011
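The interaction of small samples with confirmation bias described above can be sketched in a short simulation (an illustration only; the true effect size, sample size, and number of studies are assumed values). When only studies that reach significance are reported, the average published effect from small studies greatly overstates the true effect:

```python
import random

random.seed(1)

TRUE_EFFECT = 0.2   # assumed true standardized mean difference
N = 20              # small per-arm sample size
published = []      # effect estimates from "significant" studies only

for _ in range(5000):
    treated = [random.gauss(TRUE_EFFECT, 1) for _ in range(N)]
    control = [random.gauss(0, 1) for _ in range(N)]
    diff = sum(treated) / N - sum(control) / N
    se = (2 / N) ** 0.5
    if diff / se > 1.96:         # only significant results get reported
        published.append(diff)

avg_published = sum(published) / len(published)
print(round(avg_published, 2))  # well above the true effect of 0.2
```

The reported average lands far above the true effect, because a small study must overestimate the effect by chance in order to reach significance at all. Later, larger replications then regress toward the truth, which is one plausible mechanism for the "decline effect" the article describes.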
Here is another example of the trend of poor evaluation in the social sciences: a paper claiming evidence of ESP (extrasensory perception), which looks like another case of bad statistics: www.nytimes.com/2011/01/06/science/06esp.html
January 05, 2011