# Tag: Cross-validation (11)

**Training Sets, Test Sets, and 10-fold Cross-validation** - Jan 9, 2018.

More generally, in evaluating any data mining algorithm, if our test set is a subset of our training data, the results will be optimistic, and often overly so. So that doesn’t seem like a great idea.

**How (and Why) to Create a Good Validation Set** - Nov 24, 2017.

The definitions of training, validation, and test sets can be fairly nuanced, and the terms are sometimes inconsistently used. In the deep learning community, “test-time inference” is often used to refer to evaluating on data in production, which is not the technical definition of a test set.

**Top KDnuggets tweets, Sep 06-12: Visualizing Cross-validation Code; Intro to #Blockchain and #BigData** - Sep 13, 2017.

Also: WTF #Python - A collection of interesting and tricky Python examples; Thoughts after taking @AndrewYNg #Deeplearning #ai course; Another #Keras Tutorial For #NeuralNetwork Beginners.

**Visualizing Cross-validation Code** - Sep 5, 2017.

Cross-validation helps you improve your predictions using the K-Fold strategy. What is K-Fold, you ask? Check out this post for a visualized explanation.

**Understanding overfitting: an inaccurate meme in Machine Learning** - Aug 23, 2017.

“Applying cross-validation prevents overfitting” is a popular meme, but it is not actually true; it is more of an urban legend. We examine what is true and how overfitting differs from overtraining.

**Making Predictive Models Robust: Holdout vs Cross-Validation** - Aug 11, 2017.

The validation step helps you find the best parameters for your predictive model and prevent overfitting. We examine pros and cons of two popular validation strategies: the hold-out strategy and k-fold.

**Understanding the Bias-Variance Tradeoff: An Overview** - Aug 8, 2016.

A model's ability to minimize bias and its ability to minimize variance are often thought of as two opposing ends of a spectrum. Understanding these two types of error is critical to diagnosing model results.

**How to Compute the Statistical Significance of Two Classifiers Performance Difference** - Mar 30, 2016.

To determine whether a result is statistically significant, a researcher calculates a p-value: the probability of observing an effect at least as extreme as the one measured, given that the null hypothesis is true. Here we demonstrate how to use it to test the difference in performance between two models.

**3 Things About Data Science You Won’t Find In Books** - May 11, 2015.

There are many courses on Data Science that teach the latest logistic regression or deep learning methods, but what happens in practice? A data scientist shares his main practical insights that are not taught in universities.

**11 Clever Methods of Overfitting and how to avoid them** - Jan 2, 2015.

Overfitting is the bane of Data Science in the age of Big Data. John Langford reviews "clever" methods of overfitting, including traditional, parameter tweak, brittle measures, bad statistics, and human-loop overfitting, and gives suggestions and directions for avoiding it.

**Top KDnuggets tweets, Apr 18-20** - Apr 22, 2014.

Cross-validation pitfalls for regression/classification and how to avoid them; Data Workflows for Machine Learning; Apache Spark, the hot new trend in Big Data; Visual Analysis Best Practices - download a free guidebook from Tableau.