# Guidelines for statistical education

Tags: Data Science Education, Statistics

Data science has grown in importance to the point where statistics education should begin to integrate data science into the core statistics curriculum, as opposed to treating data science as a separate strand.

I would bet that soon, if not already, most uses of statistical modeling methods will be in a data science context. The guidelines do include a section on data science. However, given how important data science is becoming, statistics courses need to go further: rather than teaching data science as a separate strand, they should integrate it throughout the curriculum.

For example, regression is a tool, and it can be used in research statistics to explain data (in which case R-squared and other goodness-of-fit statistics are important), or in data mining to predict new values (in which case predictive performance on a hold-out sample is the key metric).

Statistics courses generally teach regression in the former context. Any data science (predictive modeling) angle comes later, if at all. When approaching a data mining problem, "statistically-minded" analysts are trained to get tangled up in various technically elegant but substantively unimportant issues -- this reinforces the perception in the data science community that statisticians are not relevant to their needs.
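To make the contrast concrete, here is a minimal sketch of the same regression tool evaluated in both ways. It is an illustration only: the data are synthetic, and the helper names (`fit_ols`, `predict`, `r_squared`, `rmse`), the sample size, and the 150/50 split are assumptions made for this example rather than anything prescribed by the guidelines.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares coefficients, with an intercept column prepended."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Predicted values for X, using coefficients returned by fit_ols."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def r_squared(y, y_hat):
    """Proportion of variance in y accounted for by y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y, y_hat):
    """Root mean squared prediction error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


# Synthetic data (assumed for illustration): y depends only on the first
# of five predictors, plus noise.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = 2.0 + 1.5 * X[:, 0] + rng.normal(scale=1.0, size=n)

# Explanatory use: fit on all the data and report in-sample goodness of fit.
coef_all = fit_ols(X, y)
r2_in_sample = r_squared(y, predict(coef_all, X))

# Predictive use: fit on a training subset, then score on a hold-out sample
# that played no part in the fit.
idx = rng.permutation(n)
train, test = idx[:150], idx[150:]
coef_train = fit_ols(X[train], y[train])
rmse_holdout = rmse(y[test], predict(coef_train, X[test]))

print(f"In-sample R-squared: {r2_in_sample:.3f}")
print(f"Hold-out RMSE:       {rmse_holdout:.3f}")
```

The fitting code is identical in both cases; what changes is the question asked of the fitted model and therefore the metric used to judge it.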

We need to embrace the idea that there are (at least) two communities that use the contents of statistical toolkits -- data scientists and research statisticians. We should be the teachers of the tools, and of how to use them appropriately in these two distinct contexts, in ways that make sense given the real-world needs of each community.

**Peter Bruce** is the President of The Institute for Statistics Education at Statistics.com. He is the developer of Resampling Stats software (originated by Julian Simon in the 1970s), and taught resampling statistics at the University of Maryland and elsewhere. He is the co-author of *Data Mining for Business Intelligence* (Wiley, 2006, 2nd ed. 2010), *Introductory Statistics: A Resampling Perspective* (Wiley, 2014) and many journal articles.

**Related:**

- Exclusive Interview: Peter Bruce, President Statistics.com
- ASA – American Statistical Association and Big Data: Why statistical community is disconnected from Big Data and how to fix it.
- Statistical Community and Big Data disconnect: Discussion Highlights
- What is Wrong with the Definition of Data Science