Cross-validation (22)

Dealing with Data Leakage - Oct 8, 2021.

Target leakage and data leakage represent challenging problems in machine learning. Be prepared to recognize and avoid these potentially messy problems.

Cross-validation, Data Science, Datasets, Machine Learning, Modeling, Training Data
Full cross-validation and generating learning curves for time-series models - Jul 23, 2021.

Standard cross-validation on time series data is not possible because the data model is sequential, which does not lend itself well to splitting the data into statistically useful training and validation sets. However, a new approach called Reconstructive Cross-validation may pave the way toward performing this type of important analysis for predictive models with temporal datasets.

Cross-validation, Time Series
Can you trust AutoML? - Dec 23, 2020.

Automated Machine Learning, or AutoML, tries hundreds or even thousands of different ML pipelines to deliver models that often beat the experts and win competitions. But, is this the ultimate goal? Can a model developed with this approach be trusted without guarantees of predictive performance? The issue of overfitting must be closely considered because these methods can lead to overestimation -- and the Winner's Curse.

Accuracy, AutoML, Cross-validation, Machine Learning, Model Performance, Overfitting
20 Core Data Science Concepts for Beginners - Dec 8, 2020.

With so much to learn and so many advancements to follow in the field of data science, there is a core set of foundational concepts that remain essential. Twenty of these ideas are highlighted here that are key to review when preparing for a job interview or just to refresh your appreciation of the basics.

Beginners, Bias, Cross-validation, Data Science, Data Visualization, Data Wrangling, Outliers, PCA, Variance
Key Machine Learning Technique: Nested Cross-Validation, Why and How, with Python code - Oct 5, 2020.

Selecting the best-performing machine learning model with optimal hyperparameters can sometimes still result in poorer performance once in production. This phenomenon might be the result of tuning the model and evaluating its performance on the same train and test data. So, validating your model more rigorously can be key to a successful outcome.

Cross-validation, Machine Learning, Python
Data Validation for Machine Learning - Jan 31, 2020.

While the validation process cannot directly find what is wrong, it can sometimes show us that there is a problem with the stability of the model.

Cross-validation, Data Science, Machine Learning
Common Machine Learning Obstacles - Sep 9, 2019.

In this blog, Seth DeLand of MathWorks discusses two of the most common obstacles, which relate to choosing the right classification model and eliminating data overfitting.

Cross-validation, Decision Trees, Logistic Regression, Machine Learning, MathWorks, Overfitting, SVM
Feature selection by random search in Python - Aug 6, 2019.

Feature selection is one of the most important tasks in machine learning. Learn how to use a simple random search in Python to get good results in less time.

Collinearity, Cross-validation, Feature Selection, Python, Random
7 Tips for Dealing With Small Data - Jul 29, 2019.

At my workplace, we produce a lot of functional prototypes for our clients. Because of this, I often need to make Small Data go a long way. In this article, I’ll share 7 tips to improve your results when prototyping with small datasets.

Cross-validation, Data Models, Ensemble Methods, Modeling, Tips, Transfer Learning
7 Steps to Mastering Intermediate Machine Learning with Python — 2019 Edition - Jun 3, 2019.

This is the second part of this new learning path series for mastering machine learning with Python. Check out these 7 steps to help master intermediate machine learning with Python!

7 Steps, Classification, Cross-validation, Dimensionality Reduction, Feature Engineering, Feature Selection, Image Classification, K-nearest neighbors, Machine Learning, Modeling, Naive Bayes, numpy, Pandas, PCA, Python, scikit-learn, Transfer Learning
Careful! Looking at your model results too much can cause information leakage - May 24, 2019.

We are all aware of the issue of overfitting, where the model you build fits the training data so closely that it does not generalise to the population the data comes from, with catastrophic and very odd results when you feed in new data.

Cross-validation, Modeling, Overfitting, Validation
How To Fine Tune Your Machine Learning Models To Improve Forecasting Accuracy - Jan 23, 2019.

We explain how to retrieve estimates of a model's performance using scoring metrics, before taking a look at finding and diagnosing the potential problems of a machine learning algorithm.

Cross-validation, Forecasting, Machine Learning, Overfitting, Time Series
5 Reasons Why You Should Use Cross-Validation in Your Data Science Projects - Oct 2, 2018.

In cross-validation, we do more than one split. We can do 3, 5, 10 or any K number of splits. Those splits are called folds, and there are many strategies we can use to create them.

Cross-validation, Data Science, Machine Learning
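
As a rough illustration of the K splits described in the summary above, here is a minimal sketch in plain Python. The function name `k_fold_indices` and the choice of contiguous, unshuffled folds are assumptions made for this example; they are not taken from the linked post, and real projects would more likely use a library splitter with shuffling or stratification.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k contiguous folds.

    Hypothetical helper for illustration only: no shuffling, no stratification.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        stop = start + base + (1 if fold < extra else 0)
        validation = list(range(start, stop))
        # Every sample outside the current fold is used for training.
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


folds = list(k_fold_indices(n_samples=10, k=3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Each sample lands in exactly one validation fold, so every observation is used for validation once and for training K - 1 times.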
Building Reliable Machine Learning Models with Cross-validation - Aug 9, 2018.

Cross-validation is frequently used to train, measure and finally select a machine learning model for a given dataset because it helps assess how the results of a model will generalize to an independent data set in practice.

Comet.ml, Cross-validation, Machine Learning, Modeling, scikit-learn
Training Sets, Test Sets, and 10-fold Cross-validation - Jan 9, 2018.

More generally, in evaluating any data mining algorithm, if our test set is a subset of our training data the results will be optimistic and often overly optimistic. So that doesn’t seem like a great idea.

Cross-validation, Data Mining, Datasets, Machine Learning
How (and Why) to Create a Good Validation Set - Nov 24, 2017.

The definitions of training, validation, and test sets can be fairly nuanced, and the terms are sometimes inconsistently used. In the deep learning community, “test-time inference” is often used to refer to evaluating on data in production, which is not the technical definition of a test set.

Cross-validation, Datasets, Rachel Thomas, Training Data, Validation
Visualizing Cross-validation Code - Sep 5, 2017.

Cross-validation helps to improve your prediction using the K-Fold strategy. What is K-Fold, you ask? Check out this post for a visualized explanation.

Cross-validation, Machine Learning, Python, scikit-learn
Understanding overfitting: an inaccurate meme in Machine Learning - Aug 23, 2017.

"Applying cross-validation prevents overfitting" is a popular meme, but it is not actually true; it is more of an urban legend. We examine what is true and how overfitting is different from overtraining.

Cross-validation, Machine Learning, Overfitting
Making Predictive Models Robust: Holdout vs Cross-Validation - Aug 11, 2017.

The validation step helps you find the best parameters for your predictive model and prevent overfitting. We examine pros and cons of two popular validation strategies: the hold-out strategy and k-fold.

Cross-validation, Dataiku, Overfitting
Understanding the Bias-Variance Tradeoff: An Overview - Aug 8, 2016.

A model's ability to minimize bias and its ability to minimize variance are often thought of as two opposing ends of a spectrum. Being able to understand these two types of error is critical to diagnosing model results.

Bias, Cross-validation, Model Performance, Variance
How to Compute the Statistical Significance of Two Classifiers Performance Difference - Mar 30, 2016.

To determine whether a result is statistically significant, a researcher would have to calculate a p-value, which is the probability of observing an effect at least as large as the one measured, given that the null hypothesis is true. Here we demonstrate how you can use it to assess the difference between two models.

Classifier, Cross-validation, Model Performance, Statistical Significance
11 Clever Methods of Overfitting and how to avoid them - Jan 2, 2015.

Overfitting is the bane of Data Science in the age of Big Data. John Langford reviews "clever" methods of overfitting, including traditional, parameter tweak, brittle measures, bad statistics, human-loop overfitting, and gives suggestions and directions for avoiding overfitting.

Cross-validation, John Langford, Overfitting
