Check out the courses in Data Mining from Statistics.com - Gregory PS, Editor
Online courses in Data Mining at The Institute for Statistics Education
Predictive Modeling and Forecasting
In predictive modeling (also called predictive analytics) we seek to predict the value of a variable of interest (purchase/no purchase, fraudulent/not fraudulent, malignant/benign, amount of spending, etc.) by using "training" data where the value of this variable is known. Once a statistical model is built with the training data ("trained"), it is then applied to data where the value is unknown. Predictive modeling is also termed "supervised learning" and is covered in the following courses:
- Introduction to Predictive Modeling
- Data Mining in R
- Decision Trees & Rule-Based Segmentation
- Statistical Analysis of Microarray Data in R
- Support Vector Machines in R
- Forecasting Analytics (Time Series)
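As an illustration of the train-then-apply workflow described above, here is a minimal Python sketch. It uses k-nearest neighbors, one of many supervised methods, chosen only because it is short; it is not presented as the method taught in these courses. The customer records, the features (age, number of prior visits), and the `predict` function are all hypothetical.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical training data: (features, known outcome).
# Features are (age, number of prior visits); the outcome is already known.
training = [
    ((25, 1), "no purchase"),
    ((32, 2), "no purchase"),
    ((41, 8), "purchase"),
    ((38, 7), "purchase"),
    ((29, 6), "purchase"),
    ((50, 1), "no purchase"),
]

def predict(x, train, k=3):
    """Predict the outcome for record x by majority vote among
    its k nearest training records (Euclidean distance)."""
    neighbors = sorted(train, key=lambda rec: dist(rec[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Apply the model to new records whose outcome is unknown.
new_records = [(35, 7), (27, 1)]
for rec in new_records:
    print(rec, "->", predict(rec, training))
```

With this toy data, the record `(35, 7)` is predicted as "purchase" and `(27, 1)` as "no purchase". Because the two features are on different scales, a real application would normally standardize them before computing distances; the sketch skips that step.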
Segmentation/Clustering
In clustering, we seek to identify groups of customers, records, etc. that are similar to one another. "Clustering" is the general statistical technique; when we apply it to customers, it is the statistical component of customer segmentation. Clustering is an "unsupervised" data mining method: there is no known outcome that serves to train a model.
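To show what "finding similar records without a known outcome" looks like in practice, here is a minimal Python sketch of k-means, one common clustering algorithm (not necessarily the one any particular course emphasizes). The customer data (annual spend in hundreds of dollars, visits per month) and the choice of two clusters are hypothetical. Note that no labels are used anywhere.

```python
from math import dist

def kmeans(points, k, iters=100):
    # Deterministic initialization with the first k points, a simplification;
    # practical implementations usually use random or k-means++ starts.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (annual spend in $100s, visits per month).
customers = [(2, 1), (3, 2), (2, 2), (20, 9), (22, 10), (21, 8)]
centroids, clusters = kmeans(customers, k=2)
for c, members in zip(centroids, clusters):
    print(c, members)
```

On this data the algorithm separates the three low-spend, infrequent customers from the three high-spend, frequent ones. The number of clusters `k` must be chosen by the analyst, which is one of the judgment calls clustering involves.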
Recommender Systems
The purpose of a recommender system is to identify, statistically, "what goes with what." These systems lie behind the notices you see on web sites advising you that "customers who bought X also bought Y." The general statistical terms for the methods used are affinity analysis and association rules; these are unsupervised methods.
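The "customers who bought X also bought Y" idea rests on two standard association-rule measures: support (the share of all transactions containing both X and Y) and confidence (the share of transactions containing X that also contain Y). The minimal Python sketch below computes them for single-item rules only; it is not a full association-rule algorithm such as Apriori. The transactions, the thresholds, and the `pair_rules` function are hypothetical.

```python
from collections import Counter
from itertools import permutations

# Hypothetical transactions: each set holds the items one customer bought.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return rules X -> Y as (X, Y, support, confidence), strongest first."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter()
    for t in transactions:
        # Ordered pairs, because X -> Y and Y -> X have different confidence.
        for x, y in permutations(sorted(t), 2):
            pair_counts[(x, y)] += 1
    rules = []
    for (x, y), count in pair_counts.items():
        support = count / n                  # baskets containing both X and Y
        confidence = count / item_counts[x]  # X-baskets that also contain Y
        if support >= min_support and confidence >= min_confidence:
            rules.append((x, y, support, confidence))
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

for x, y, s, c in pair_rules(transactions):
    print(f"customers who bought {x} also bought {y}: "
          f"support={s:.2f}, confidence={c:.2f}")
```

Here "jam -> bread" has confidence 1.0 (every jam buyer also bought bread), while "bread -> jam" is dropped because only half of bread buyers bought jam. This asymmetry is why rules are counted in both directions.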
Before You Start...
Surprisingly, most of your work will involve preparing the data for analysis. And if you're not answering the right question, using an appropriate method, and avoiding the common pitfalls, all your work may be in vain.
- Data Prep & Cleaning for Analytics
- Data Mining Mistakes and How to Avoid Them
- Interactive Data Visualization
Text Mining and Sentiment Analysis
The most rapid data growth is not in numerical data but in text (Twitter feeds, the contents of Facebook pages, emails, etc.), which must be pre-processed before it can be analyzed.
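To make "pre-processing" concrete, here is a minimal Python sketch of three common first steps: lowercasing, tokenizing, and removing stop words, followed by a bag-of-words count that turns each document into numbers a model can use. The stop-word list, the sample reviews, and the function names are hypothetical; real projects use larger stop-word lists and often add steps such as stemming.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; real projects use a much larger one.
STOP_WORDS = {"the", "a", "an", "is", "it", "and", "to", "of", "this", "i"}

def preprocess(text):
    # Lowercase, then keep runs of letters, digits, and apostrophes as tokens.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # Drop stop words, which carry little meaning on their own.
    return [t for t in tokens if t not in STOP_WORDS]

def bag_of_words(docs):
    # One term-frequency Counter per document: a simple numeric representation.
    return [Counter(preprocess(d)) for d in docs]

docs = [
    "Love this phone! The battery is great.",
    "The battery died after a day. Not great.",
]
for counts in bag_of_words(docs):
    print(dict(counts))
```

For sentiment analysis, the stop-word list needs care: "not" is deliberately left out of it here, since dropping negations would make the second review look as positive as the first.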
Data Mining Webinar with Peter Bruce
This webinar (47 MB) will give an overview of data mining techniques. The presenter is Peter Bruce, President of Statistics.com and co-author of Data Mining for Business Intelligence (Wiley, 2010).