- Dealing with Data Leakage - Oct 8, 2021.
Target leakage and data leakage represent challenging problems in machine learning. Be prepared to recognize and avoid these potentially messy problems.
- Budgeting For Your AI Training Data: Consider These 3 Factors - May 26, 2021.
Before you even plan to procure the data, one of the most important considerations in determining how much you should spend on your AI training data. In this article, we will give you insights to develop an effective budget for AI training data.
- Continuous Training for Machine Learning – a Framework for a Successful Strategy - Apr 14, 2021.
A basic appreciation by anyone who builds machine learning models is that the model is not useful without useful data. This doesn't change after a model is deployed to production. Effectively monitoring and retraining models with updated data is key to maintaining valuable ML solutions, and can be accomplished with effective approaches to production-level continuous training that is guided by the data.
- 5 Essential Papers on AI Training Data - Jun 4, 2020.
Data pre-processing is not only the largest time sink for most Data Scientists, but it is also the most crucial aspect of the work. Learn more about training data and data processing tasks from 5 leading academic papers.
- Dataset Splitting Best Practices in Python - May 26, 2020.
If you are splitting your dataset into training and testing data you need to keep some things in mind. This discussion of 3 best practices to keep in mind when doing so includes demonstration of how to implement these particular considerations in Python.
- Achieving Accuracy with your Training Dataset - Mar 5, 2020.
How do we make sure our training data is more accurate than the rest? Partners like Supahands eliminate the headache that comes with labeling work by providing end-to-end managed labeling solutions, completed by a fully managed workforce that is trained to work on your model specifics.
- Hand labeling is the past. The future is #NoLabel AI - Feb 19, 2020.
Data labeling is so hot right now… but could this rapidly emerging market face disruption from a small team at Stanford and the Snorkel open source project, which enables highly efficient programmatic labeling that is 10 to 1,000x as efficient as hand labeling?
- Why are Machine Learning Projects so Hard to Manage? - Feb 3, 2020.
What makes deploying a machine learning project so difficult? Is it the expectations? The people? The tech? There are common threads to these challenges, and best practices exist to deal with them.
- The Ultimate Guide to Model Retraining - Dec 16, 2019.
Once you have deployed your machine learning model into production, differences in real-world data will result in model drift. So, retraining and redeploying will likely be required. In other words, deployment should be treated as a continuous process. This guide defines model drift and how to identify it, and includes approaches to enable model training.
- Generalization in Neural Networks - Nov 18, 2019.
When training a neural network in deep learning, its performance on processing new data is key. Improving the model's ability to generalize relies on preventing overfitting using these important methods.
- 5 Fundamental AI Principles - Oct 3, 2019.
While AI may appear magical at times, these five principles will help guide you to avoid pitfalls when leveraging this tech.
- 6 Tips for Building a Training Data Strategy for Machine Learning - Sep 2, 2019.
Without a well-defined approach for collecting and structuring training data, launching an AI initiative becomes an uphill battle. These six recommendations will help you craft a successful strategy.
- 6 Key Concepts in Andrew Ng’s “Machine Learning Yearning” - Aug 12, 2019.
If you are diving into AI and machine learning, Andrew Ng's book is a great place to start. Learn about six important concepts covered to better understand how to use these tools from one of the field's best practitioners and teachers.
- Overview of Different Approaches to Deploying Machine Learning Models in Production - Jun 12, 2019.
Learn the different methods for putting machine learning models into production, and to determine which method is best for which use case.
- How the Lottery Ticket Hypothesis is Challenging Everything we Knew About Training Neural Networks - May 30, 2019.
The training of machine learning models is often compared to winning the lottery by buying every possible ticket. But if we know how winning the lottery looks like, couldn’t we be smarter about selecting the tickets?
- Acquiring Labeled Data to Train Your Models at Low Costs - Feb 27, 2019.
We discuss groundbreaking and unique methods to acquire labeled data at low cost, including 3rd-Party Plug-and-Play AI Model, Zero-Shot Learning, and Restructuring the Existing Data Set.
- What to do when your training and testing data come from different distributions - Jan 4, 2019.
However, sometimes only a limited amount of data from the target distribution can be collected. It may not be sufficient to build the needed train/dev/test sets. What to do in such a case? Let us discuss some ideas!
- Key Takeaways from AI Conference SF, Day 2: AI and Security, Adversarial Examples, Innovation - Oct 30, 2018.
Highlights and key takeaways from selected keynote sessions on day 2 of AI Conference San Francisco 2018.
- KDnuggets™ News 18:n23, Jun 13: Did Python declare victory over R?; Master the Netflix Interview; Deep Learning Projects DIY Style - Jun 13, 2018.
Also: Command Line Tricks For Data Scientists; How (dis)similar are my train and test data?; 5 Machine Learning Projects You Should Not Overlook, June 2018; Introduction to Game Theory; Human Interpretable Machine Learning
- Why you need to improve your training data, and how to do it - Jun 11, 2018.
This article examines the way you need to improve your training data and how it can be accomplished, including speech commands, choosing the right data, picking a model fast and more.
Pages: 1 2
- How (dis)similar are my train and test data? - Jun 7, 2018.
This articles examines a scenario where your machine learning model can fail.
- How to Organize Data Labeling for Machine Learning: Approaches and Tools - May 16, 2018.
The main challenge for a data science team is to decide who will be responsible for labeling, estimate how much time it will take, and what tools are better to use.
Pages: 1 2
- Learning Curves for Machine Learning - Jan 17, 2018.
But how do we diagnose bias and variance in the first place? And what actions should we take once we've detected something? In this post, we'll learn how to answer both these questions using learning curves.
Pages: 1 2
- How (and Why) to Create a Good Validation Set - Nov 24, 2017.
The definitions of training, validation, and test sets can be fairly nuanced, and the terms are sometimes inconsistently used. In the deep learning community, “test-time inference” is often used to refer to evaluating on data in production, which is not the technical definition of a test set.
- How to squeeze the most from your training data - Jul 27, 2017.
In many cases, getting enough well-labelled training data is a huge hurdle for developing accurate prediction systems. Here is an innovative approach which uses SVM to get the most from training data.
- 7 Ways to Get High-Quality Labeled Training Data at Low Cost - Jun 13, 2017.
Having labeled training data is needed for machine learning, but getting such data is not simple or cheap. We review 7 approaches including repurposing, harvesting free sources, retrain models on progressively higher quality data, and more.
- Do We Need More Training Data or More Complex Models? - Mar 23, 2015.
Do we need more training data? Which models will suffer from performance saturation as data grows large? Do we need larger models or more complicated models, and what is the difference?