Guidelines for statistical education
Tags: Data Science Education, Statistics
Data science has grown in importance to the point where statistics education should begin to integrate data science into the core statistics curriculum, as opposed to treating data science as a separate strand.
I would bet that soon, if not today, most uses of statistical modeling methods will be in a data science context. The guidelines include a section on data science. However, I think data science will become important enough that statistics courses need to go further: rather than teaching it as a separate strand, they should integrate it throughout the curriculum.
For example, regression is a tool, and it can be used in research statistics to explain data (in which case R-squared and other goodness-of-fit statistics are important), or in data mining to predict new values (in which case predictive performance on a holdout sample is the key metric).
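To make the contrast concrete, here is a minimal sketch in Python that evaluates the same one-predictor regression both ways. The synthetic data, the 150/50 split, and the helper names (fit_line, predict, r_squared, rmse) are assumptions made for illustration; they do not come from the guidelines or from any particular library.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(model, xs):
    """Apply a fitted (intercept, slope) pair to new x values."""
    intercept, slope = model
    return [intercept + slope * x for x in xs]


def r_squared(ys, preds):
    """Share of the variance in ys accounted for by preds."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def rmse(ys, preds):
    """Root mean squared prediction error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))


# Synthetic data, assumed purely for illustration.
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 1.5 * x + random.gauss(0, 2.0) for x in xs]

# Explanatory use: fit on all the data and judge goodness of fit.
model_all = fit_line(xs, ys)
r2_in_sample = r_squared(ys, predict(model_all, xs))

# Predictive use: fit on a training portion, score on a holdout sample.
# The points were generated in random order, so a simple slice is a random split.
x_train, y_train = xs[:150], ys[:150]
x_hold, y_hold = xs[150:], ys[150:]
model_train = fit_line(x_train, y_train)
rmse_holdout = rmse(y_hold, predict(model_train, x_hold))

print(f"in-sample R-squared: {r2_in_sample:.3f}")
print(f"holdout RMSE: {rmse_holdout:.3f}")
```

The fitting step is identical in both uses; only the question asked of the fitted model changes. The first number describes how well the line accounts for the data already in hand, while the second estimates how far off its predictions are likely to be on data it has not seen.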
Statistics courses generally teach regression in the former context. Any data science (predictive modeling) angle comes later, if at all. When approaching a data mining problem, "statistically-minded" analysts are trained to get tangled up in various technically elegant but substantively unimportant issues; this reinforces the perception in the data science community that statisticians are not relevant to their needs.
We need to embrace the idea that there are (at least) two communities that use the contents of statistical toolkits: data scientists and research statisticians. We should be the teachers of the tools, and of how to use them appropriately in the two distinct contexts, in ways that make sense given the real-world needs of the two communities.
Peter Bruce is the President of The Institute for Statistics Education at Statistics.com. He is the developer of Resampling Stats software (originated by Julian Simon in the 1970s), and has taught resampling statistics at the University of Maryland and elsewhere. He is the coauthor of Data Mining for Business Intelligence (Wiley, 2006, 2nd ed. 2010), Introductory Statistics: A Resampling Perspective (Wiley, 2014), and many journal articles.
Related:
 Exclusive Interview: Peter Bruce, President Statistics.com
 ASA – American Statistical Association and Big Data: Why statistical community is disconnected from Big Data and how to fix it.
 Statistical Community and Big Data disconnect: Discussion Highlights
 What is Wrong with the Definition of Data Science

