
Data Mining for Beginners Boot Camp, Salford video series


This series shows how to apply the SPM software suite to your predictive modeling projects, using a modern banking application as an example. The series is pitched at the beginner level and suits first-time users as well as anyone who needs a refresher in model building and data analysis.



Data Mining for Beginners Boot Camp, a new video series from Salford Systems.


1. Opening and Working with a Data File: We introduce a typical banking dataset in an Excel spreadsheet and make initial observations. Our modeling goal is to determine what drives an account to go delinquent and which factors are associated with delinquency. We will review the key predictive variables available and use the View Data button to look inside the data and/or generate graphs and statistics before building a model.

2. Build a CART Model: We recommend starting with a simple decision tree so that you can make initial judgments about the quality of the data and get a top-level view of what is going on. We will cover how to set up your model, the importance of selecting a test sample, and how to set limits on segment size. We will build a classification model on the banking dataset, make some initial observations about model performance, and explore the various models within the resulting modeling sequence.

3. Build a TreeNet Gradient Boosting Model: We will build a binary logistic model with TreeNet on the same banking dataset to gain additional insights and obtain predicted probabilities. This approach has several advantages: the model is generally more accurate than a single decision tree, the resulting predictions are smooth in form, and you can look inside the model to understand the relationships between the variables. Additionally, TreeNet fits the logistic model natively, so it generates predicted probabilities for your data directly.

4. Save, Score, and Translate Predictive Models for Future Deployment: Even in the simplest deployment, you need to be able to save your model and score new data with the saved model. You may then want to translate the model into another language or format, such as SAS, PMML, C, or Java. We will review how to accomplish all of these tasks within the SPM software suite.
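
For readers who want a feel for what saving, scoring, and translating a model involves, here is a minimal Python sketch. It is not SPM's API or file format: the tiny tree, its field names (`months_past_due`, `utilization`), the thresholds, and the leaf probabilities are all invented for illustration, and JSON is simply one convenient assumption for a saved-model format.

```python
import json
import os
import tempfile

# Hypothetical hand-written tree. Fields, thresholds, and leaf
# probabilities are invented; they do not come from the banking dataset.
TREE = {
    "feature": "months_past_due",
    "threshold": 1.5,
    "left": {"prob": 0.05},
    "right": {
        "feature": "utilization",
        "threshold": 0.8,
        "left": {"prob": 0.30},
        "right": {"prob": 0.75},
    },
}


def save_model(tree, path):
    """Save the tree as JSON (an assumed format, not SPM's)."""
    with open(path, "w") as f:
        json.dump(tree, f)


def load_model(path):
    """Load a tree previously written by save_model."""
    with open(path) as f:
        return json.load(f)


def round_trip(tree):
    """Save to a temporary file, load it back, and clean up."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.json")
        save_model(tree, path)
        return load_model(path)


def score(tree, record):
    """Walk the tree and return the leaf probability for one record."""
    node = tree
    while "prob" not in node:
        goes_left = record[node["feature"]] <= node["threshold"]
        node = node["left"] if goes_left else node["right"]
    return node["prob"]


def collect_features(tree):
    """Return the set of feature names used anywhere in the tree."""
    if "prob" in tree:
        return set()
    return ({tree["feature"]}
            | collect_features(tree["left"])
            | collect_features(tree["right"]))


def to_c(tree, indent=1):
    """Render the tree as nested C if/else statements."""
    pad = "    " * indent
    if "prob" in tree:
        return f"{pad}return {tree['prob']};\n"
    return (
        f"{pad}if ({tree['feature']} <= {tree['threshold']}) {{\n"
        + to_c(tree["left"], indent + 1)
        + f"{pad}}} else {{\n"
        + to_c(tree["right"], indent + 1)
        + f"{pad}}}\n"
    )


def translate_to_c(tree, name="score"):
    """Wrap the rendered tree in a C function taking one double per feature."""
    args = ", ".join(f"double {f}" for f in sorted(collect_features(tree)))
    return f"double {name}({args}) {{\n{to_c(tree)}}}\n"
```

Calling `score(round_trip(TREE), {"months_past_due": 3, "utilization": 0.9})` scores a new record with the reloaded model, and `print(translate_to_c(TREE))` emits the same logic as a standalone C function. The point of the design is that the saved model is plain data, so scoring and translation are just two different walks over the same structure.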
