I’m a data scientist – mind if I do surgery on your heart?
If I walked into an operating room and said I was going to start dabbling in surgery, I would be thrown out immediately. But people do that with statistics and data analysis all the time.
There has been a lot of recent interest, from scientific journals and from others, in creating checklists for data science and data analysis. The idea is that a checklist will help keep results that won't reproduce or replicate out of the literature. One analogy I hear frequently is to the checklists surgeons use, which can help reduce patient mortality.

You would never let me do surgery on you. I have no medical training at all. Yet I'm frequently asked to review papers that include complicated and technical data analyses but have no trained data analysts or statisticians among the authors. The most common approach is that a postdoc or graduate student in the group is assigned to do the analysis, even if they don't have much formal training. Whenever this happens, red flags go up all over the place. Just as I wouldn't trust someone without years of training and a medical license to do surgery on me, I wouldn't let someone without years of training and credentials in data analysis draw major conclusions from a complex data analysis.
You might argue that the consequences of surgery and of complex data analysis are on completely different scales. I'd agree with you, but not in the direction you might think. I would argue that high-pressure, complex data analysis can have much larger consequences than surgery. In surgery there is usually only one person who can be hurt. But a bad data analysis, say one claiming that vaccines cause autism, can have massive consequences for hundreds or even thousands of people. So complex data analysis, especially for important results, should be treated with at least as much care as surgery.

If I walked into an operating room and said I was going to start dabbling in surgery, I would be thrown out immediately. But people do that with statistics and data analysis all the time. What is really needed is a requirement for careful training and expertise in data analysis on each paper that analyzes data. Until we treat data analysis as a first-class component of the scientific process, we'll continue to see retractions, falsifications, and irreproducible results flourish.
Jeff Leek is an assistant professor in the Biostatistics Department of the Johns Hopkins Bloomberg School of Public Health. His work focuses on statistical methods for high-dimensional data and genomics.
Original. Reposted with permission.