
We're so good at medical studies that most of them are wrong

A survey of the recent medical literature found that 95 percent of the results of observational studies on human health had failed replication when tested in rigorous, double-blind trials. Given massive data sets and the ability to perform many tests on them, researchers easily fall into the trap of reporting "significant" results that are due to random chance.


Ars Technica, by John Timmer, March 2010

It's possible to get the mental equivalent of whiplash from the latest medical findings, as risk factors are identified one year and exonerated the next. According to a panel at the American Association for the Advancement of Science, this isn't a failure of medical research; it's a failure of statistics, and one that is becoming more common in fields ranging from genomics to astronomy. The problem is that our statistical tools for evaluating the probability of error haven't kept pace with our own successes, in the form of our ability to obtain massive data sets and perform multiple tests on them. Even given a low tolerance for error, the sheer number of tests performed ensures that some of them will produce erroneous results at random.

Gregory PS: here is an excellent essay by S. Stanley Young of the National Institute of Statistical Sciences: Everything Is Dangerous: A Controversy (PDF)

...

The problem now is that we're rapidly expanding our ability to do tests. Various speakers pointed to data sources as diverse as gene expression chips and the Sloan Digital Sky Survey, which provide tens of thousands of individual data points to analyze. At the same time, the growth of computing power has meant that we can ask many questions of these large data sets at once, and each of these tests increases the prospects that an error will occur in a study; as Shaffer put it, "every decision increases your error prospects." She pointed out that dividing data into subgroups, which can often identify susceptible subpopulations, is also a decision, and increases the chances of a spurious finding. Smaller populations are also more prone to random associations.
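To make the multiple-testing trap concrete, here is a small illustrative simulation, not from the article: the patient and factor counts are made up, and the data are pure noise, so every "significant" association it finds is spurious.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Pure noise: 1,000 "patients", 100 candidate risk factors, zero real effects.
n_patients, n_factors = 1000, 100
exposures = rng.normal(size=(n_patients, n_factors))
outcome = rng.normal(size=n_patients)  # independent of every exposure

# Test each factor against the outcome; count hits at the usual threshold.
false_positives = sum(
    stats.pearsonr(exposures[:, j], outcome)[1] < 0.05
    for j in range(n_factors)
)
print(f"{false_positives} of {n_factors} factors 'significant' at p < 0.05")
# On average ~5 factors pass, despite there being no real associations at all.
```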

In the end, Young noted, by the time you reach 61 tests, there's a 95 percent chance that you'll get a significant result at random. And, let's face it: researchers want to see a significant result, so there's a strong, unintentional bias toward trying different tests until something pops out.
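Young's figure follows from the standard independence calculation: if each test has a 5 percent false-positive rate, the chance that at least one of n independent tests comes up significant by chance is 1 - 0.95^n. A quick check (assuming fully independent tests, which real analyses only approximate):

```python
# Chance of at least one spurious "significant" result across n independent
# tests, each run at the conventional alpha = 0.05 threshold.
alpha, n_tests = 0.05, 61
p_spurious = 1 - (1 - alpha) ** n_tests
print(f"{p_spurious:.3f}")  # 0.956 -- roughly the 95 percent Young cites
```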

...

Consequences and solutions

It's pretty obvious that these factors create a host of potential problems, but Young provided the best measure of where the field stands. In a survey of the recent literature, he found that 95 percent of the results of observational studies on human health had failed replication when tested in a rigorous, double-blind trial. So, how do we fix this?

The consensus seems to be that we simply can't rely on the researchers to do it. As Shaffer noted, the experimentalists who produce the raw data want it to generate results, and the statisticians do what they can to help them find those results. The problems with this are well recognized within the statistics community, but its members are loath to engage in the sort of self-criticism that could make a difference. (The attitude, as Young described it, is "We're both living in glass houses, we both have bricks.")

Shaffer described how there were tools (corrections that control the "family-wise error rate") that were once used for large studies, but they were so stringent that researchers who applied them couldn't claim much in the way of positive results. The statistics community started working on less stringent alternatives about 15 years ago but, despite a few promising ideas, none of them gained significant traction within the community.
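The article doesn't say which alternatives were proposed, but the 15-year timeline matches false discovery rate (FDR) procedures such as Benjamini-Hochberg (1995). As an illustrative sketch of the stringency trade-off Shaffer describes, here is a Bonferroni correction (which controls the family-wise error rate) beside a Benjamini-Hochberg adjustment on the same made-up p-values:

```python
import numpy as np

# Hypothetical p-values from 10 tests; the values are illustrative only.
p = np.array([0.001, 0.008, 0.012, 0.020, 0.030,
              0.040, 0.045, 0.200, 0.500, 0.900])
alpha = 0.05
m = len(p)

# Family-wise error rate control (Bonferroni): compare to alpha / m.
bonferroni_hits = p < alpha / m

# Benjamini-Hochberg FDR: find the largest k with p_(k) <= (k/m) * alpha,
# then reject the k smallest p-values.
order = np.argsort(p)
thresholds = (np.arange(1, m + 1) / m) * alpha
passing = np.where(p[order] <= thresholds)[0]
bh_hits = np.zeros(m, dtype=bool)
if passing.size:
    bh_hits[order[: passing.max() + 1]] = True

print("Bonferroni rejections:", bonferroni_hits.sum())  # 1
print("BH (FDR) rejections:  ", bh_hits.sum())          # 4
```

On these ten hypothetical values, Bonferroni rejects only one null hypothesis while Benjamini-Hochberg rejects four: the FDR approach trades the guarantee against any false positive for tolerating a controlled fraction of them, which is what makes it workable on large studies.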

Both Moolgavkar and Young argued that the impetus for change had to come from funding agencies and the journals in which the results are published. These are the only groups that are in a position to force some corrections, such as compelling researchers to share both data sets and the code for statistical models.

Read more.

