
Guidelines for statistical education


Data science has grown in importance to the point where statistics education should begin to integrate data science into the core statistics curriculum, as opposed to treating data science as a separate strand.



I would bet that soon, if not already, most uses of statistical modeling methods will be in a data science context. The guidelines include a section on data science. However, I think data science is becoming important enough that statistics courses need to go further: rather than teaching data science as a separate strand, they should integrate it throughout the curriculum.

For example, regression is a tool, and it can be used in research statistics to explain data (in which case R-squared and other goodness-of-fit statistics are important), or in data mining to predict new values (in which case predictive performance on a hold-out sample is the key metric).
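To make the contrast concrete, here is a minimal sketch in plain Python. It fits the same simple regression twice: once on all the data, reporting R-squared as an explanatory analysis would, and once on a training portion only, reporting error on a hold-out sample as a predictive analysis would. The synthetic data, the 150/50 split, the helper names, and the choice of RMSE as the hold-out metric are all assumptions made for illustration; they do not come from the article.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in ys accounted for by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def rmse(xs, ys, intercept, slope):
    """Root mean squared prediction error of the line on (xs, ys)."""
    sq_err = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sq_err / len(ys))


# Synthetic data, invented for this sketch: y = 2 + 3x + Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 2.0) for x in xs]

# Explanatory use: fit on all the data and report goodness of fit.
b0, b1 = fit_line(xs, ys)
r2_full = r_squared(xs, ys, b0, b1)

# Predictive use: fit on a training portion, then score on unseen rows.
# The rows were generated in random order, so a simple slice is a fair split.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]
t0, t1 = fit_line(train_x, train_y)
holdout_rmse = rmse(test_x, test_y, t0, t1)

print(f"R-squared on all data: {r2_full:.3f}")
print(f"RMSE on hold-out data: {holdout_rmse:.3f}")
```

The fitting routine is identical in both uses; only the question asked of the fitted model changes, which is the point of the paragraph above.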

Statistics courses generally teach regression in the former context. Any data science (predictive modeling) angle comes later, if at all. As a result, when approaching a data mining problem, "statistically minded" analysts tend to get tangled up in various technically elegant but substantively unimportant issues -- this reinforces the perception in the data science community that statisticians are not relevant to their needs.

We need to embrace the idea that there are (at least) two communities that use the contents of statistical toolkits -- data scientists and research statisticians. We should be the teachers of the tools and of how to use them appropriately in each of the two contexts, in ways that make sense given the real-world needs of each community.

Peter Bruce is the President of The Institute for Statistics Education at Statistics.com. He is the developer of Resampling Stats software (originated by Julian Simon in the 1970s), and taught resampling statistics at the University of Maryland and elsewhere. He is the co-author of Data Mining for Business Intelligence (Wiley, 2006, 2nd ed. 2010), Introductory Statistics: A Resampling Perspective (Wiley, 2014) and many journal articles.
