Getting Deep Learning working in the wild: A Data-Centric Course
Data-centric learning resources are somewhat scattered today, and that’s why we developed a new Data Centric Deep Learning course on the co:rise education platform. It is an introduction to a set of approaches and best practices, for people who are trying to do deep learning in the wild.
By Andrew Maas, Deep Learning Leader at Apple and Lecturer at Stanford, and Mike Wu, PhD Stanford
Have you been excited by recent high-profile deep learning successes, but are not sure how to keep deep learning models working in practice for your project? We’ve developed a distilled set of materials on data-centric deep learning approaches, which are often among the most impactful tools for getting deep learning models working on new tasks.
Data-centric deep learning is a relatively new area and a broad term. For us, being data-centric means taking a different perspective on deep learning, one centered on building and maintaining the datasets that define and evaluate deep learning models. The real-world applications and successes of deep learning systems are growing by the day. On some benchmark tasks, deep learning image and text categorization performance is on par with human experts, and deep learning models process complex transactional or financial data to make inferences about future behavior.
Recent years have seen a proliferation of deep learning model architectures to meet the needs of different tasks. In parallel with these model-centric developments, experience building and deploying deep learning systems has repeatedly demonstrated the need for a data-centric focus on issues like obtaining large amounts of high-quality training data, labeling data efficiently, and monitoring and retraining deployed systems to adjust to changes in the world. From building speech recognition systems and image classifiers to consulting on enterprise applications of machine learning, we have found that building good systems always requires repeated experiments, high-quality data, and iterative data improvement.
In real-world applications, data is continuously evolving; it is not something you use once to train a model and then set aside. Data is also critical to evaluating systems’ readiness for real-world usage. As data scientists and ML engineers, we need to make sure that things work well on average, which is what we focused on for the last ten years in deep learning. As the real-world usage of ML systems has grown in recent years, it’s become clear we also need to ensure good performance across data subgroups. These are data-centric questions. Data-centric techniques offer practical approaches to ensure that our ML systems perform well over time when deployed, that our training set covers expected production settings, and that we evaluate and improve models in targeted ways.
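To make the gap between average and subgroup performance concrete, here is a minimal sketch in Python. The function name `accuracy_by_subgroup` and the toy labels, predictions, and subgroup tags are our own illustrative assumptions, not material from the course.

```python
from collections import defaultdict

def accuracy_by_subgroup(labels, predictions, subgroups):
    """Return overall accuracy and a dict of accuracy per subgroup."""
    if not (len(labels) == len(predictions) == len(subgroups)):
        raise ValueError("inputs must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, y_hat, group in zip(labels, predictions, subgroups):
        total[group] += 1
        correct[group] += int(y == y_hat)
    per_group = {g: correct[g] / total[g] for g in total}
    overall = sum(correct.values()) / len(labels) if labels else 0.0
    return overall, per_group

# Hypothetical toy data: subgroup "b" is rarer and harder for the model.
labels      = [1, 0, 1, 1, 0, 1, 0, 0]
predictions = [1, 0, 1, 1, 0, 0, 1, 0]
subgroups   = ["a", "a", "a", "a", "a", "b", "b", "b"]

overall, per_group = accuracy_by_subgroup(labels, predictions, subgroups)
print(f"overall: {overall:.2f}")
for group, acc in sorted(per_group.items()):
    print(f"subgroup {group}: {acc:.2f}")
```

In this toy case the average looks acceptable while subgroup "b" is served poorly, which is the kind of issue an average-only evaluation hides.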
Organizations are deploying deep learning more widely, in applications such as self-driving cars and recommendation systems, and there are many applications yet to be built. As a practitioner, you’ll develop best practices over time, and these standards will differ depending on the needs of your organization and project. Consequently, data-centric learning resources are somewhat scattered today, and that’s why we developed a new Data Centric Deep Learning course on the co:rise education platform. It is an introduction to a set of approaches and best practices for people who are trying to do deep learning in the wild. Instead of leaving you to learn them through years of trial and error on the job, we have condensed them into four project-driven weeks.
In the course, you’ll learn through a mix of live classes and hands-on projects; the content is inspired by industry and designed to hone skills you can use right now in your data scientist or ML engineer role. One of the projects we are excited about has students build continuous annotation pipelines. As new data comes into your deployed product, it is continuously annotated and used to retrain your model. You then identify the retrained model's weaknesses and use them to decide which new data to collect, and the cycle repeats. Typically, you study data annotation and model training in isolation. Tackling these challenges together in a cycle is the most practical way to learn how to make deep learning approaches work well.
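As a rough illustration of that cycle, here is a toy sketch in Python. Everything in it is a hypothetical stand-in rather than the course's actual project code: `annotate` simply reads a stored label where a real pipeline would involve human annotators, the "model" is a majority-label lookup per data slice, and the "day" and "night" slices are invented for the example.

```python
from collections import Counter, defaultdict

def annotate(example):
    """Hypothetical labeling step; a real pipeline would ask annotators."""
    return example["true_label"]

def train(labeled):
    """Toy model: predict the majority label seen so far for each slice.
    Assumes at least one labeled example is available."""
    per_slice, overall = defaultdict(Counter), Counter()
    for example, label in labeled:
        per_slice[example["slice"]][label] += 1
        overall[label] += 1
    default = overall.most_common(1)[0][0]  # fallback for unseen slices
    table = {s: c.most_common(1)[0][0] for s, c in per_slice.items()}
    return lambda example: table.get(example["slice"], default)

def slice_accuracy(model, eval_set):
    """Accuracy of the model on each slice of a held-out labeled set."""
    correct, total = Counter(), Counter()
    for example, label in eval_set:
        total[example["slice"]] += 1
        correct[example["slice"]] += int(model(example) == label)
    return {s: correct[s] / total[s] for s in total}

def run_cycle(pool, eval_set, start_slice, rounds, batch_size):
    """Annotate a batch, retrain, find the weakest slice, collect from it next."""
    train_set, history, target = [], [], start_slice
    for _ in range(rounds):
        take = min(batch_size, len(pool[target]))
        batch = [pool[target].pop() for _ in range(take)]
        train_set.extend((ex, annotate(ex)) for ex in batch)
        model = train(train_set)
        scores = slice_accuracy(model, eval_set)
        history.append((target, scores))
        target = min(scores, key=scores.get)  # weakest slice drives collection
    return history

def make(slice_name, label, n):
    return [{"slice": slice_name, "true_label": label} for _ in range(n)]

# Hypothetical data: the product launches with "day" data only.
pool = {"day": make("day", 1, 4), "night": make("night", 0, 4)}
eval_set = [(ex, ex["true_label"]) for ex in make("day", 1, 2) + make("night", 0, 2)]
history = run_cycle(pool, eval_set, start_slice="day", rounds=2, batch_size=2)
for collected_from, scores in history:
    print(collected_from, scores)
```

The point of the sketch is the shape of the loop, not the model: evaluation on slices decides what gets annotated next, so annotation and training are no longer separate steps.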
As the usage of ML systems continues to expand, ensuring high-quality data and accurate monitoring of production systems can help protect against accidental bias or harmful effects of ML systems. This data-centric course focuses on practical approaches to help people in any industry working with ML. We hope to see you when class starts May 30!
co:rise is on a mission to upskill the world's workforce. We partner with top industry leaders to create courses in high demand and rapidly changing fields like ML, data science, data engineering, and Web3. We leverage community and technology to achieve outcomes at scale. Learn more: https://corise.com