Interpreting Model Performance with Cost Functions

Cost functions are critical for the correct assessment of performance of data mining and predictive models. This series goes deep into the statistical properties and mathematical understanding of each cost function and explores their similarities and differences.

Salford Systems, Jan 9, 2014.

In this 10-part video series from Salford Systems, we discuss cost functions, which are used to measure the performance of data mining and predictive models. We go deep into the statistical properties and mathematical understanding of each cost function and explore their similarities and differences.

Cost functions are important because there are many ways to design a machine learning algorithm and to interpret its performance. This series will help the analyst discover the underpinnings of the algorithms and the usefulness of each one. After completing the series, you will have a deep understanding of a range of cost functions available for classification and regression models, and you will be able to interpret your predictive models from an expert's point of view.

Part 1: An Introduction to Understanding Cost Functions (12 minutes)

  • The supervised learning problem: what is it and how is it applied in machine learning?
  • How cost functions are used to solve the supervised learning problem
  • Evaluating the 'fit' of the current response surface on the data available
  • What is special about predicting the response in a regression problem?

Part 2: Least Squares Deviation Cost for a Regression Problem (11 minutes)

  • What is the Least Squares Deviation Cost function?
  • What are the advantages of LS?
  • Introducing the statistical properties of LS at the formula level
  • What are the disadvantages of LS?

Part 3: Least Absolute Deviation and Huber M Costs for a Regression Problem (14 minutes)

  • What is the Least Absolute Deviation Cost function?
  • How is LAD different from LS (Part 2)?
  • Understanding how LAD handles outliers
  • What are LAD's negative attributes?
  • What is Huber-M Cost, and how does it compare to LS and LAD?
  • Conclusion to cost functions used for a regression problem
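
The three regression costs named in Parts 2 and 3 can be compared on a small example. This is a minimal sketch, not the series' own material: the function names, the averaging over observations, the Huber threshold `delta`, and the sample data are all assumptions chosen for illustration, and the videos may scale the costs differently.

```python
def ls_cost(actual, predicted):
    """Least squares: mean of squared residuals."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def lad_cost(actual, predicted):
    """Least absolute deviation: mean of absolute residuals."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def huber_cost(actual, predicted, delta=1.0):
    """Huber: quadratic for small residuals, linear beyond delta."""
    total = 0.0
    for a, p in zip(actual, predicted):
        r = abs(a - p)
        if r <= delta:
            total += 0.5 * r ** 2
        else:
            total += delta * (r - 0.5 * delta)
    return total / len(actual)


# Hypothetical data: three perfect predictions and one large outlier.
actual = [1.0, 2.0, 3.0, 100.0]
predicted = [1.0, 2.0, 3.0, 4.0]

print(ls_cost(actual, predicted))     # dominated by the squared outlier
print(lad_cost(actual, predicted))    # grows only linearly with the outlier
print(huber_cost(actual, predicted))  # close to LAD for large residuals
```

On this data the single outlier drives the LS cost far above the other two, which is the sensitivity to outliers that LAD and Huber-M are meant to reduce.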

Part 4: Introducing the Binary Classification Problem (10 minutes)

  • Why is binary classification commonly used among data analysts?
  • What are the fundamentals of the binary classification problem?
  • How to construct a simple response surface using linear regression
  • How to use decision rules to make predictions

Part 5: Evaluating Prediction Success with Precision and Recall (13 minutes)

  • Recap: working with a binary classification problem
  • Evaluating the 'positive' group of predictions
  • Define: precision, recall (sensitivity), and specificity
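
The three measures defined in this part follow directly from the counts in a prediction success table. The sketch below uses the standard definitions; the helper names and the sample labels are hypothetical, and class 1 is assumed to be the 'positive' group.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn) for 0/1 labels, treating 1 as positive."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp, fp, fn, tn


def precision(tp, fp, fn, tn):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp)


def recall(tp, fp, fn, tn):
    """Share of true positives that were found (sensitivity)."""
    return tp / (tp + fn)


def specificity(tp, fp, fn, tn):
    """Share of true negatives that were correctly rejected."""
    return tn / (tn + fp)


# Hypothetical labels: 4 positives and 6 negatives.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

counts = confusion_counts(actual, predicted)
print(counts)
print(precision(*counts), recall(*counts), specificity(*counts))
```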

Part 6: Measuring Performance with the ROC Curve (19 minutes)

  • Review: prediction success table
  • Sensitivity vs. Specificity
  • What is the ROC Curve, and how is it used to evaluate model performance?
  • Advantages of evaluating based on ROC
  • How to utilize the Area Under Curve (AUC)
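
One way to compute the area under the ROC curve without plotting it is the rank interpretation: AUC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The sketch below uses that identity; the function name and sample scores are hypothetical, and the series may well compute AUC from the curve itself.

```python
def auc(actual, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical model scores; higher means "more likely positive".
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print(auc(actual, scores))
```

A value of 0.5 corresponds to random ordering and 1.0 to a perfect separation of the two classes. The pairwise loop is quadratic in the sample size, so it is only suitable for small illustrations.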

Part 7: Measuring Performance with Gains and Lift (21 minutes)

  • Introduction to how gains and lift can be applied to a direct marketing application
  • Score the sample dataset and plot the Gains curve
  • Define: base rate and lift
  • Similarities and differences between ROC and Gains curve
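
Gains and lift can be sketched by sorting records from highest to lowest score and tracking how many responders are captured as more of the list is targeted. The function name, the row layout, and the sample data below are assumptions for illustration, not the dataset used in the video.

```python
def gains_table(actual, scores):
    """For each cutoff depth return (fraction targeted, gain, lift).

    gain = share of all positives captured in the targeted records
    lift = gain / fraction targeted
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(actual)
    rows = []
    captured = 0
    for rank, i in enumerate(order, start=1):
        captured += actual[i]
        frac_targeted = rank / len(actual)
        gain = captured / total_pos
        rows.append((frac_targeted, gain, gain / frac_targeted))
    return rows


# Hypothetical responses (1 = responded) and model scores.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

base_rate = sum(actual) / len(actual)  # overall response rate
print("base rate:", base_rate)
for frac, gain, lift in gains_table(actual, scores):
    print(round(frac, 3), round(gain, 3), round(lift, 3))
```

Lift at a given depth is also the response rate among the targeted records divided by the base rate, so targeting the whole list always gives a lift of 1.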

Part 8: Direct Interpretation of Response using Logistic Function (19 minutes)

  • Introducing the mathematical structure behind the algorithm
  • Direct interpretation of likelihood as a cost function
  • Two ways to write the logistic cost function
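
The outline does not say which two forms the video presents. A common pair, assumed here, is the negative log-likelihood with the response coded 0/1 and the margin form with the response coded -1/+1. The sketch below checks numerically that these two forms agree for a raw model score `f` (a hypothetical name for the log-odds output).

```python
import math


def logistic_cost_01(y01, f):
    """Negative log-likelihood with response coded 0/1."""
    p = 1.0 / (1.0 + math.exp(-f))  # predicted probability of class 1
    return -(y01 * math.log(p) + (1 - y01) * math.log(1.0 - p))


def logistic_cost_pm1(ypm, f):
    """Margin form with response coded -1/+1."""
    return math.log(1.0 + math.exp(-ypm * f))


# The two forms give the same cost once the response is recoded.
for f in (-2.0, 0.0, 1.5):
    for y01 in (0, 1):
        ypm = 2 * y01 - 1  # map 0 -> -1 and 1 -> +1
        print(f, y01, logistic_cost_01(y01, f), logistic_cost_pm1(ypm, f))
```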

Part 9: Multinomial Classification - Expected Cost (23 minutes)

  • What is multinomial classification?
  • Assigning prior probabilities to your model
  • Interpreting the expected cost
  • Define: baseline, relative cost, and unit costs
  • Alternative approaches to finding the expected cost
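
One common formulation of expected cost weights each class's misclassification rates by its prior probability and by a cost matrix, then compares the result with the cost of always predicting a single class. The sketch below follows that formulation as an assumption; the video's exact definitions of baseline and relative cost may differ, and the confusion table, priors, and function names are hypothetical.

```python
def expected_cost(confusion, priors, costs):
    """Sum over actual class i and predicted class j of
    prior[i] * P(predict j | actual i) * cost[i][j]."""
    total = 0.0
    for i, row in enumerate(confusion):
        n_i = sum(row)  # number of records whose actual class is i
        for j, count in enumerate(row):
            total += priors[i] * (count / n_i) * costs[i][j]
    return total


def baseline_cost(priors, costs):
    """Cost of the best rule that assigns every record to one class."""
    k = len(priors)
    return min(sum(priors[i] * costs[i][j] for i in range(k)) for j in range(k))


# Hypothetical 3-class problem: rows are actual classes, columns predictions.
confusion = [[40, 5, 5],
             [10, 30, 10],
             [0, 10, 40]]
priors = [0.5, 0.3, 0.2]
unit_costs = [[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]]  # every misclassification costs 1

cost = expected_cost(confusion, priors, unit_costs)
base = baseline_cost(priors, unit_costs)
print(cost, base, cost / base)  # relative cost below 1 beats the baseline
```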

Part 10: Multinomial Classification - Log Likelihood (19 minutes)

  • Build a probability response model
  • Evaluate the performance of your multinomial classification model with Log-Likelihood
  • Applying Log Likelihood to ensemble modeling scenarios
  • Alternative method: Margin-based Cost Function
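
Log-likelihood evaluation of a probability model can be sketched as the average negative log of the probability the model assigned to each record's actual class. The function name, the clipping constant `eps`, and the sample probabilities are assumptions for illustration.

```python
import math


def mean_neg_log_likelihood(actual, probs, eps=1e-15):
    """Average of -log(probability assigned to the true class).

    actual: class indices (0, 1, 2, ...)
    probs:  one list of class probabilities per record
    """
    total = 0.0
    for label, row in zip(actual, probs):
        p = min(max(row[label], eps), 1.0)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(actual)


# Hypothetical 3-class predictions; each row sums to 1.
actual = [0, 1, 2]
probs = [[0.50, 0.25, 0.25],
         [0.25, 0.50, 0.25],
         [0.25, 0.25, 0.50]]

print(mean_neg_log_likelihood(actual, probs))
```

Lower values are better. A model that assigns equal probability to each of the three classes scores log(3), which gives a simple reference point for comparison.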

Additional information:

Cost Functions 101: The Underpinnings of Model Performance blog