- Dealing with Data Leakage - Oct 8, 2021.
Target leakage and data leakage represent challenging problems in machine learning. Be prepared to recognize and avoid these potentially messy problems.
- Full cross-validation and generating learning curves for time-series models - Jul 23, 2021.
Standard cross-validation on time series data is not possible because the data model is sequential, which does not lend well to splitting the data into statistically useful training and validation sets. However, a new approach called Reconstructive Cross-validation may pave the way toward performing this type of important analysis for predictive models with temporal datasets.
- Can you trust AutoML? - Dec 23, 2020.
Automated Machine Learning, or AutoML, tries hundreds or even thousands of different ML pipelines to deliver models that often beat the experts and win competitions. But, is this the ultimate goal? Can a model developed with this approach be trusted without guarantees of predictive performance? The issue of overfitting must be closely considered because these methods can lead to overestimation -- and the Winner's Curse.
- 20 Core Data Science Concepts for Beginners - Dec 8, 2020.
With so much to learn and so many advancements to follow in the field of data science, there are a core set of foundational concepts that remain essential. Twenty of these ideas are highlighted here that are key to review when preparing for a job interview or just to refresh your appreciation of the basics.
- Key Machine Learning Technique: Nested Cross-Validation, Why and How, with Python code - Oct 5, 2020.
Selecting the best performing machine learning model with optimal hyperparameters can sometimes still end up with a poorer performance once in production. This phenomenon might be the result of tuning the model and evaluating its performance on the same sets of train and test data. So, validating your model more rigorously can be key to a successful outcome.
- Data Validation for Machine Learning - Jan 31, 2020.
While the validation process cannot directly find what is wrong, the process can show us sometimes that there is a problem with the stability of the model.
- Common Machine Learning Obstacles - Sep 9, 2019.
In this blog, Seth DeLand of MathWorks discusses two of the most common obstacles relate to choosing the right classification model and eliminating data overfitting.
- Feature selection by random search in Python - Aug 6, 2019.
Feature selection is one of the most important tasks in machine learning. Learn how to use a simple random search in Python to get good results in less time.
- 7 Tips for Dealing With Small Data - Jul 29, 2019.
At my workplace, we produce a lot of functional prototypes for our clients. Because of this, I often need to make Small Data go a long way. In this article, I’ll share 7 tips to improve your results when prototyping with small datasets.
- 7 Steps to Mastering Intermediate Machine Learning with Python — 2019 Edition - Jun 3, 2019.
This is the second part of this new learning path series for mastering machine learning with Python. Check out these 7 steps to help master intermediate machine learning with Python!
- Careful! Looking at your model results too much can cause information leakage - May 24, 2019.
We all are aware of the issue of overfitting, which is essentially where the model you build replicates the training data results so perfectly its fitted to the training data and does not generalise to better represent the population the data comes to, with catastrophic results when you feed in new data and get very odd results.
- What my first Silver Medal taught me about Text Classification and Kaggle in general? - May 13, 2019.
A first-hand account of ideas tried by a competitor at the recent kaggle competition 'Quora Insincere questions classification', with a brief summary of some of the other winning solutions.
- How To Fine Tune Your Machine Learning Models To Improve Forecasting Accuracy - Jan 23, 2019.
We explain how to retrieve estimates of a model's performance using scoring metrics, before taking a look at finding and diagnosing the potential problems of a machine learning algorithm.
- 5 Reasons Why You Should Use Cross-Validation in Your Data Science Projects - Oct 2, 2018.
In cross-validation, we do more than one split. We can do 3, 5, 10 or any K number of splits. Those splits called Folds, and there are many strategies we can create these folds with.
- Building Reliable Machine Learning Models with Cross-validation - Aug 9, 2018.
Cross-validation is frequently used to train, measure and finally select a machine learning model for a given dataset because it helps assess how the results of a model will generalize to an independent data set in practice.
- Training Sets, Test Sets, and 10-fold Cross-validation - Jan 9, 2018.
More generally, in evaluating any data mining algorithm, if our test set is a subset of our training data the results will be optimistic and often overly optimistic. So that doesn’t seem like a great idea.
- How (and Why) to Create a Good Validation Set - Nov 24, 2017.
The definitions of training, validation, and test sets can be fairly nuanced, and the terms are sometimes inconsistently used. In the deep learning community, “test-time inference” is often used to refer to evaluating on data in production, which is not the technical definition of a test set.
- Top KDnuggets tweets, Sep 06-12: Visualizing Cross-validation Code; Intro to #Blockchain and #BigData - Sep 13, 2017.
Also: WTF #Python - A collection of interesting and tricky Python examples; Thoughts after taking @AndrewYNg #Deeplearning #ai course; Another #Keras Tutorial For #NeuralNetwork Beginners.
- Visualizing Cross-validation Code - Sep 5, 2017.
Cross-validation helps to improve your prediction using the K-Fold strategy. What is K-Fold you asked? Check out this post for a visualized explanation.
- Understanding overfitting: an inaccurate meme in Machine Learning - Aug 23, 2017.
Applying cross-validation prevents overfitting is a popular meme, but is not actually true – it more of an urban legend. We examine what is true and how overfitting is different from overtraining.
- Making Predictive Models Robust: Holdout vs Cross-Validation - Aug 11, 2017.
The validation step helps you find the best parameters for your predictive model and prevent overfitting. We examine pros and cons of two popular validation strategies: the hold-out strategy and k-fold.
- Understanding the Bias-Variance Tradeoff: An Overview - Aug 8, 2016.
A model's ability to minimize bias and minimize variance are often thought of as 2 opposing ends of a spectrum. Being able to understand these two types of errors are critical to diagnosing model results.
- How to Compute the Statistical Significance of Two Classifiers Performance Difference - Mar 30, 2016.
To determine whether a result is statistically significant, a researcher would have to calculate a p-value, which is the probability of observing an effect given that the null hypothesis is true. Here we are demonstrating how you can compute difference between two models using it.
- 3 Things About Data Science You Won’t Find In Books - May 11, 2015.
There are many courses on Data Science that teach the latest logistic regression or deep learning methods, but what happens in practice? Data Scientist shares his main practical insights that are not taught in universities.
Pages: 1 2
- 11 Clever Methods of Overfitting and how to avoid them - Jan 2, 2015.
Overfitting is the bane of Data Science in the age of Big Data. John Langford reviews "clever" methods of overfitting, including traditional, parameter tweak, brittle measures, bad statistics, human-loop overfitting, and gives suggestions and directions for avoiding overfitting.
- Top KDnuggets tweets, Apr 18-20 - Apr 22, 2014.
Cross-validation pitfalls for regression/classification and how to avoid them; Data Workflows for Machine Learning ; Apache Spark, the hot new trend in Big Data ; Visual Analysis Best Practices - download a free guidebook from Tableau.