- Data Validation in Machine Learning is Imperative, Not Optional - May 24, 2021.
Before we reach model training in the pipeline, there are various components like data ingestion, data versioning, data validation, and data pre-processing that need to be executed. In this article, we will discuss data validation, why it is important, its challenges, and more.
- Data Validation and Data Verification – From Dictionary to Machine Learning - Mar 16, 2021.
In this article, we will understand the difference between data verification and data validation, two terms which are often used interchangeably when we talk about data quality. However, these two terms are distinct.
- Dataset Splitting Best Practices in Python - May 26, 2020.
If you are splitting your dataset into training and testing data, you need to keep some things in mind. This discussion of 3 best practices for doing so includes a demonstration of how to implement them in Python.
- Adversarial Validation Overview - Feb 13, 2020.
Learn how to implement adversarial validation that builds a classifier to determine if your data is from the training or testing sets. If you can do this, then your data has issues, and your adversarial validation model can help you diagnose the problem.
- Reproducibility, Replicability, and Data Science - Nov 19, 2019.
As cornerstones of scientific processes, reproducibility and replicability ensure results can be verified and trusted. These two concepts are also crucial in data science, and as a data scientist, you must follow the same rigor and standards in your projects.
- Careful! Looking at your model results too much can cause information leakage - May 24, 2019.
We are all aware of overfitting: the model you build replicates the training data so closely that it fits that data alone and does not generalise to the population the data comes from, with catastrophic results when you feed in new data.
- How to do Machine Learning Efficiently - Mar 13, 2018.
I now believe that there is an art, or craftsmanship, to structuring machine learning work, and none of the math-heavy books I tended to binge on seem to mention this.
- 5 Things to Know About Machine Learning - Mar 7, 2018.
This post will point out 5 things to know about machine learning: things which you may not know, may not have been aware of, or may have once known and now forgotten.
- Top KDnuggets tweets, Nov 22-28: Reinforcement Learning: An Introduction by Sutton and Barto – Complete Second Draft - Nov 29, 2017.
Also #DeepLearning Specialization by Andrew Ng - 21 Lessons Learned; How (and Why) to Create a Good Validation Set; Predicting Cryptocurrency Prices With #DeepLearning
- How (and Why) to Create a Good Validation Set - Nov 24, 2017.
The definitions of training, validation, and test sets can be fairly nuanced, and the terms are sometimes inconsistently used. In the deep learning community, “test-time inference” is often used to refer to evaluating on data in production, which is not the technical definition of a test set.
- Adversarial Validation, Explained - Oct 7, 2016.
This post proposes and outlines adversarial validation, a method for selecting training examples most similar to test examples and using them as a validation set, and provides a practical scenario for its usefulness.
- NPD: Head of Global Validation & Input - Mar 20, 2015.
Initial focus on standardizing and developing efficient processes to validate data from input through client delivery.