- 7 of The Coolest Machine Learning Topics of 2021 at ODSC West - Nov 3, 2021.
At our upcoming event this November 16th-18th in San Francisco, ODSC West 2021 will feature a plethora of talks, workshops, and training sessions on machine learning topics, deep learning, NLP, MLOps, and so on. You can register now for 20% off all ticket types, or register for a free AI Expo Pass to see what some big names in AI are doing now.
Machine Learning, ODSC
- Visual Scoring Techniques for Classification Models - Nov 3, 2021.
Read this article on assessing a model's performance in a broader context.
Classification, Knime, Low-Code, Machine Learning, Metrics, Visualization
- Design Patterns for Machine Learning Pipelines - Nov 2, 2021.
ML pipeline design has undergone several evolutions in the past decade with advances in memory and processor performance, storage systems, and the increasing scale of data sets. We describe how these design patterns changed, what processes they went through, and their future direction.
Data Preprocessing, ETL, Machine Learning, Pipeline
- Machine Learning Model Development and Model Operations: Principles and Practices - Oct 27, 2021.
ML model management and the delivery of a high-performing model are as important as the initial build of the model with the right dataset. The concepts of model retraining, model versioning, model deployment, and model monitoring are the basis for machine learning operations (MLOps), which helps data science teams deliver high-performing models.
Algorithms, Deployment, Feature Engineering, Machine Learning, MLOps
- Getting Started with PyTorch Lightning - Oct 26, 2021.
As a library designed for production research, PyTorch Lightning streamlines hardware support and distributed training as well, and we’ll show how easy it is to move training to a GPU toward the end.
Deep Learning, Machine Learning, Python, PyTorch, PyTorch Lightning
- Guide To Finding The Right Predictive Maintenance Machine Learning Techniques - Oct 25, 2021.
What happens to a life so dependent on machines, when that particular machine breaks down? This is precisely why there’s a dire need for predictive maintenance with machine learning.
Machine Learning, Maintenance, Monitoring
- Introduction to AutoEncoder and Variational AutoEncoder (VAE) - Oct 22, 2021.
Autoencoders and their variants are interesting and powerful artificial neural networks used in unsupervised learning scenarios. Learn how the different autoencoder approaches perform and how to implement them with Keras on the MNIST digits instructional data set.
Autoencoder, Deep Learning, Machine Learning, Python
- KDnuggets™ News 21:n40, Oct 20: The 20 Python Packages You Need For Machine Learning and Data Science; Ace Data Science Interviews with Portfolio Projects - Oct 20, 2021.
The 20 Python Packages You Need For Machine Learning and Data Science; How to Ace Data Science Interview by Working on Portfolio Projects; Deploying Your First Machine Learning API; Real Time Image Segmentation Using 5 Lines of Code; What is Clustering and How Does it Work?
Clustering, Computer Vision, Data Science, Image Recognition, Interview, Machine Learning, Portfolio, Python
- Real Time Image Segmentation Using 5 Lines of Code - Oct 18, 2021.
PixelLib is a library created to allow easy integration of object segmentation in images and videos using a few lines of Python code. PixelLib now supports a PyTorch backend to perform faster, more accurate segmentation and extraction of objects in images and videos using the PointRend segmentation architecture.
Computer Vision, Image Processing, Machine Learning, Python, Segmentation
- Serving ML Models in Production: Common Patterns - Oct 18, 2021.
Over the past couple of years, we've seen 4 common patterns of machine learning in production: pipeline, ensemble, business logic, and online learning. In the ML serving space, implementing these patterns typically involves a tradeoff between ease of development and production readiness. Ray Serve was built to support these patterns by being both easy to develop with and production ready.
FastAPI, Machine Learning, Production, Python, Ray
- How to calculate confidence intervals for performance metrics in Machine Learning using an automatic bootstrap method - Oct 15, 2021.
Are your model performance measurements very precise due to a “large” test set, or very uncertain due to a “small” or imbalanced test set?
Machine Learning, Metrics, Statistics
- Deploying Your First Machine Learning API - Oct 14, 2021.
An effortless way to develop and deploy your machine learning API using FastAPI and Deta.
API, Deployment, FastAPI, Machine Learning, Python, spaCy

- The 20 Python Packages You Need For Machine Learning and Data Science - Oct 14, 2021.
Do you do Python? Do you do data science and machine learning? Then, you need to do these crucial Python libraries that enable nearly all you will want to do.
Data Science, Keras, Machine Learning, Matplotlib, numpy, Pandas, Plotly, Python, PyTorch, scikit-learn, TensorFlow
- Dealing with Data Leakage - Oct 8, 2021.
Target leakage and data leakage represent challenging problems in machine learning. Be prepared to recognize and avoid these potentially messy problems.
Cross-validation, Data Science, Datasets, Machine Learning, Modeling, Training Data
- Building and Operationalizing Machine Learning Models: Three tips for success - Oct 7, 2021.
With more enterprises implementing machine learning to improve revenue and operations, properly operationalizing the ML lifecycle in a holistic way is crucial for data teams to make their projects efficient and effective.
Deployment, Machine Learning, Machine Learning Engineer, Tips
- 20 Machine Learning Projects That Will Get You Hired - Sep 22, 2021.
If you want to break into the machine learning and data science job market, then you will need to demonstrate your proficiency, especially if you are self-taught through online courses and bootcamps. A project portfolio is a great way to practice your new craft and offer convincing evidence that an employer should hire you over the competition.
Career, Machine Learning, Project

- Nine Tools I Wish I Mastered Before My PhD in Machine Learning - Sep 22, 2021.
Whether you are building a startup or making scientific breakthroughs, these tools will bring your ML pipeline to the next level.
AI, Data Science, Data Science Tools, Machine Learning, Programming
- KDnuggets™ News 21:n36, Sep 22: The Machine & Deep Learning Compendium Open Book; Easy SQL in Native Python - Sep 22, 2021.
The Machine & Deep Learning Compendium Open Book; Easy SQL in Native Python; Introduction to Automated Machine Learning; How to be a Data Scientist without a STEM degree; What Is The Real Difference Between Data Engineers and Data Scientists?
Automated Machine Learning, AutoML, Books, Data Engineer, Data Scientist, Machine Learning, Python, SQL
- How to Find Weaknesses in your Machine Learning Models - Sep 20, 2021.
FreaAI: a new method from researchers at IBM.
Interpretability, Machine Learning, Modeling, Statistics
- Adventures in MLOps with Github Actions, Iterative.ai, Label Studio and NBDEV - Sep 16, 2021.
This article documents the authors' experience building their custom MLOps approach.
GitHub, Machine Learning, MLOps, Pipeline, Python, Workflow
- The Machine & Deep Learning Compendium Open Book - Sep 16, 2021.
After years in the making, this extensive and comprehensive ebook resource is now available and open for data scientists and ML engineers. Learn from and contribute to this tome of valuable information to support all your work in data science from engineering to strategy to management.
Deep Learning, ebook, GitHub, Machine Learning, Open Source
- Introduction to Automated Machine Learning - Sep 15, 2021.
AutoML enables developers with limited ML expertise (and coding experience) to train high-quality models specific to their business needs. For this article, we will focus on AutoML systems which cater to everyday business and technology applications.
Automated Machine Learning, AutoML, Machine Learning, Python
- Top 18 Low-Code and No-Code Machine Learning Platforms - Sep 8, 2021.
Machine learning becomes more accessible to companies and individuals when there is less coding involved. Especially if you are just starting your path in ML, then check out these low-code and no-code platforms to help expedite your capabilities in learning and applying AI.
AutoML, Data Science Platforms, Low-Code, Machine Learning, No-Code
- Math 2.0: The Fundamental Importance of Machine Learning - Sep 8, 2021.
Machine learning is not just another way to program computers; it represents a fundamental shift in the way we understand the world. It is Math 2.0.
AI, Machine Learning, Mathematics
- KDnuggets™ News 21:n34, Sep 8: Do You Read Excel Files with Python? There is a 1000x Faster Way; Hypothesis Testing Explained - Sep 8, 2021.
Do You Read Excel Files with Python? There is a 1000x Faster Way; Hypothesis Testing Explained; Data Science Cheat Sheet 2.0; 6 Cool Python Libraries That I Came Across Recently; Best Resources to Learn Natural Language Processing in 2021
AI, Cheat Sheet, Data Science, Excel, Hypothesis Testing, Machine Learning, Python, Statistics
- How Machine Learning Leverages Linear Algebra to Solve Data Problems - Sep 7, 2021.
Why you should learn the fundamentals of linear algebra.
Data Science, Linear Algebra, Machine Learning, Mathematics
- Fast AutoML with FLAML + Ray Tune - Sep 6, 2021.
Microsoft Researchers have developed FLAML (Fast Lightweight AutoML) which can now utilize Ray Tune for distributed hyperparameter tuning to scale up FLAML’s resource-efficient & easily parallelizable algorithms across a cluster.
Automated Machine Learning, AutoML, Hyperparameter, Machine Learning, Microsoft, Python, Ray
- 6 Cool Python Libraries That I Came Across Recently - Sep 3, 2021.
Check out these awesome Python libraries for Machine Learning.
Data Science, Machine Learning, Python
- How to solve machine learning problems in the real world - Sep 2, 2021.
Is becoming a machine learning engineer pro your goal? Sure, online ML courses and Kaggle-style competitions are great resources to learn the basics. However, the daily job of an ML engineer requires an additional layer of skills that you won’t master through these approaches.
Advice, Business, Data Quality, Machine Learning, SQL, Tips, XGBoost
- How is Machine Learning Beneficial in Mobile App Development? - Sep 1, 2021.
Mobile app developers have a lot to gain by implementing AI & Machine Learning from the revolutionary changes that these disruptive technologies can offer. This is due to AI and ML's potential to strengthen mobile applications, providing for smoother user experiences capable of leveraging powerful features.
App, Development, Machine Learning, Mobile
- Automated Data Labeling with Machine Learning - Aug 26, 2021.
Labeling training data is the one step in the data pipeline that has resisted automation. It’s time to change that.
Data Labeling, Data Preparation, Machine Learning
- Learning Data Science and Machine Learning: First Steps After The Roadmap - Aug 24, 2021.
Just getting into learning data science may seem as daunting as (if not more than) trying to land your first job in the field. With so many options and resources online and in traditional academia to consider, these pre-requisites and pre-work are recommended before diving deep into data science and AI/ML.
Data Science, Machine Learning, Mathematics, Python, Roadmap, Statistics
- Enhancing Machine Learning Personalization through Variety - Aug 19, 2021.
Personalization drives growth and is a touchstone of good customer experience. Personalization driven through machine learning can enable companies to improve this experience while improving ROI for marketing campaigns. However, these techniques face challenges in determining when personalization makes sense and how and when specific options are recommended.
Machine Learning, Personalization, Recommender Systems
- Model Drift in Machine Learning – How To Handle It In Big Data - Aug 17, 2021.
Rendezvous Architecture helps you run and choose outputs from a Champion model and many Challenger models running in parallel without much overhead. The original approach works well for smaller data sets, so how can this idea adapt to big data pipelines?
Big Data, Data Engineering, Data Preparation, Machine Learning, Model Drift
- Agile Data Labeling: What it is and why you need it - Aug 16, 2021.
The notion of Agile in software development has made waves across industries with its revolution for productivity. Can the same benefits be applied to the often arduous task of annotating data sets for machine learning?
Agile, Data Labeling, Machine Learning, Tesla
- Introduction to Statistical Learning Second Edition - Aug 13, 2021.
The second edition of the classic "An Introduction to Statistical Learning, with Applications in R" was published very recently, and is now freely available as a PDF on the book's website.
Books, Data Science, Machine Learning, R, Statistical Learning, Statistics
- MLOps And Machine Learning Roadmap - Aug 12, 2021.
A 16–20 week roadmap to review machine learning and learn MLOps.
Courses, DataRobot, Deployment, DevOps, Kubeflow, Kubernetes, Machine Learning, Microsoft Azure, MLOps
- How to Detect and Overcome Model Drift in MLOps - Aug 12, 2021.
This article takes a look at model drift, and how to detect and overcome it in production MLOps.
Machine Learning, MLOps, Production
- 2021 State of Production Machine Learning Survey - Aug 11, 2021.
We invite you to take the 2021 State of Production Machine Learning survey and help shed light on the latest trends in the adoption of machine learning (ML) in the industry.
Anyscale, Machine Learning, Production, Survey
- Visualizing Bias-Variance - Aug 10, 2021.
In this article, we'll explore some different perspectives of what the bias-variance trade-off really means with the help of visualizations.
Bias, Machine Learning, Variance, Visualization
- Artificial Intelligence vs Machine Learning in Cybersecurity - Aug 5, 2021.
Artificial Intelligence and Machine Learning are next-gen technologies used in various fields. With the rise in online threats, it has become essential to include these technologies in cybersecurity. In this post, we will look at what roles AI and ML play in cybersecurity.
AI, Cybersecurity, Machine Learning, Security
- Mastering Clustering with a Segmentation Problem - Aug 3, 2021.
The one stop shop for implementing the most widely used models in Python for unsupervised clustering.
Clustering, DBSCAN, K-means, Machine Learning, Segmentation, Unsupervised Learning
- 30 Most Asked Machine Learning Questions Answered - Aug 3, 2021.
There is always a lot to learn in machine learning. Whether you are new to the field or a seasoned practitioner and ready for a refresher, understanding these key concepts will keep your skills honed in the right direction.
Beginners, Interview Questions, Machine Learning, Regression, scikit-learn
- 10 Machine Learning Model Training Mistakes - Jul 30, 2021.
These common ML model training mistakes are easy to overlook but costly to fix.
Machine Learning, Modeling, Training
- Building Machine Learning Pipelines using Snowflake and Dask - Jul 28, 2021.
In this post, I want to share some of the tools that I have been exploring recently and show you how I use them and how they helped improve the efficiency of my workflow. The two I will talk about in particular are Snowflake and Dask. They are two very different tools, but they complement each other well, especially as part of the ML lifecycle.
Dask, Machine Learning, Pipeline, Snowflake
- Machine Learning Skills – Update Yours This Summer - Jul 27, 2021.
The process of mastering new knowledge often requires multiple passes to ensure the information is deeply understood. If you already began your journey into machine learning and data science, then you are likely ready for a refresher on topics you previously covered. This eight-week self-learning path will help you recapture the foundations and prepare you for future success in applying these skills.
Computer Vision, Data Science Skills, Deep Learning, Machine Learning, Skills
- ColabCode: Deploying Machine Learning Models From Google Colab - Jul 22, 2021.
New to ColabCode? Learn how to use it to start a VS Code Server, Jupyter Lab, or FastAPI.
Deployment, FastAPI, Google Colab, Machine Learning, Python
- Design patterns in machine learning - Jul 21, 2021.
Can we abstract best practices to real design patterns yet?
Design, Machine Learning, Programming
- When to Retrain a Machine Learning Model? Run these 5 checks to decide on the schedule - Jul 20, 2021.
Machine learning models degrade with time, and need to be regularly updated. In the article, we suggest how to approach retraining and plan for it in advance.
Data Science, Deployment, Machine Learning, MLOps
- How Much Memory is your Machine Learning Code Consuming? - Jul 19, 2021.
Learn how to quickly check the memory footprint of your machine learning function/module with one line of command. Generate a nice report too.
Machine Learning, Programming, Python

- Advice for Learning Data Science from Google’s Director of Research - Jul 19, 2021.
Surfing the professional career wave in data science is a hot prospect for many looking to get their start in the world. The digital revolution continues to create many exciting new opportunities. But, jumping in too fast without fully establishing your foundational skills can be detrimental to your success, as is suggested by this advice for data science newbies from Peter Norvig, the Director of Research at Google.
Advice, Beginners, Data Science, Data Science Education, Machine Learning, Peter Norvig
- How to Create Unbiased Machine Learning Models - Jul 16, 2021.
In this post, we discuss the concepts of bias and fairness in the machine learning world, and show how ML biases often reflect existing biases in society. Additionally, we discuss various methods for testing and enforcing fairness in ML models.
AI, Bias, Ethics, Machine Learning, Trust
- High-Performance Deep Learning: How to train smaller, faster, and better models – Part 5 - Jul 16, 2021.
Training efficient deep learning models with any software tool is nothing without an infrastructure of robust and performant compute power. Here, current software and hardware ecosystems are reviewed that you might consider in your development when the highest performance possible is needed.
Deep Learning, Efficiency, Google, Hardware, Machine Learning, NVIDIA, PyTorch, Scalability, TensorFlow
- Pushing No-Code Machine Learning to the Edge - Jul 16, 2021.
Discover the power of no-code machine learning, and what it can accomplish when pushed to edge devices.
Cloud Computing, Edge Analytics, Machine Learning, No-Code
- High-Performance Deep Learning: How to train smaller, faster, and better models – Part 4 - Jul 9, 2021.
With the right software, hardware, and techniques at your fingertips, your capability to effectively develop high-performing models now hinges on leveraging automation to expedite the experimental process and building with the most efficient model architectures for your data.
Attention, Convolution, Deep Learning, Efficiency, Hyperparameter, Machine Learning, Scalability
- MLOps is an Engineering Discipline: A Beginner’s Overview - Jul 8, 2021.
MLOps = ML + DEV + OPS. MLOps is the idea of combining the long-established practice of DevOps with the emerging field of Machine Learning.
Data Engineering, Deployment, Machine Learning, MLOps, Modeling
- Predict Customer Churn (the right way) using PyCaret - Jul 5, 2021.
A step-by-step guide on how to predict customer churn the right way using PyCaret that actually optimizes the business objective and improves ROI.
Churn, Machine Learning, PyCaret, Python
- High-Performance Deep Learning: How to train smaller, faster, and better models – Part 3 - Jul 2, 2021.
Now that you are ready to efficiently build advanced deep learning models with the right software and hardware tools, the techniques involved in implementing such efforts must be explored to improve model quality and obtain the performance that your organization desires.
Compression, Deep Learning, Efficiency, Machine Learning, Scalability
- From Scratch: Permutation Feature Importance for ML Interpretability - Jun 30, 2021.
Use permutation feature importance to discover which features in your dataset are useful for prediction — implemented from scratch in Python.
Feature Selection, Interpretability, Machine Learning, Python
- High-Performance Deep Learning: How to train smaller, faster, and better models – Part 2 - Jun 25, 2021.
As your organization begins to consider building advanced deep learning models with efficiency in mind to improve the power delivered through your solutions, the software and hardware tools required for these implementations are foundational to achieving high-performance.
Deep Learning, Efficiency, Machine Learning, Scalability
- In-Warehouse Machine Learning and the Modern Data Science Stack - Jun 24, 2021.
As your organization matures its data science portfolio and capabilities, establishing a modern data stack is vital to enabling such growth. Here, we overview various in-data warehouse machine learning services, and discuss each of their benefits and requirements.
Amazon Redshift, Analytics, BigQuery, Cloud, Data Science, Data Warehouse, Machine Learning, Modern Data Stack
- Create and Deploy Dashboards using Voila and Saturn Cloud - Jun 23, 2021.
Working with large datasets, training models on them, maintaining everything in one place, and deploying to production is a challenging job. In this article, we cover what Saturn Cloud is, how it can speed up your end-to-end pipeline, and how to create dashboards using Voila and Python and publish them to production in just a few easy steps.
Analytics, Cloud, Dashboard, Data Science, Machine Learning, Python
- Amazing Low-Code Machine Learning Capabilities with New Ludwig Update - Jun 22, 2021.
Integrations with Ray, MLflow, and TabNet are among the top features of this release.
Low-Code, Machine Learning, Open Source, Uber
- High Performance Deep Learning, Part 1 - Jun 18, 2021.
Advances in deep learning techniques continue to demonstrate incredible potential to deliver exciting new AI-enhanced software and systems. But training the most powerful models is expensive: financially, computationally, and environmentally. Increasing the efficiency of such models will have profound impacts in many ways, so developing future models with this intention in mind will only help to further expand the reach, applicability, and value of what deep learning has to offer.
Deep Learning, Efficiency, History, Machine Learning
- An introduction to Explainable AI (XAI) and Explainable Boosting Machines (EBM) - Jun 16, 2021.
Understanding why your AI-based models make the decisions they do is crucial for deploying practical solutions in the real world. Here, we review some techniques in the field of Explainable AI (XAI), explain why explainability is important, show example models of explainable AI using LIME and SHAP, and demonstrate how Explainable Boosting Machines (EBMs) can make explainability even easier.
AI, Deep Learning, Explainability, Gradient Boosting, Interpretability, LIME, Machine Learning, SHAP
- Feature Selection – All You Ever Wanted To Know - Jun 10, 2021.
Although your data set may contain a lot of information about many different features, selecting only the "best" of these to be considered by a machine learning model can mean the difference between a model that performs well--with better performance, higher accuracy, and more computational efficiency--and one that falls flat. The process of feature selection guides you toward working with only the data that may be the most meaningful, and to accomplish this, a variety of feature selection types, methodologies, and techniques exist for you to explore.
Feature Engineering, Feature Selection, Machine Learning
- The only Jupyter Notebooks extension you truly need - Jun 8, 2021.
Now you don’t need to restart the kernel after editing the code in your custom imports.
Deployment, Jupyter, Machine Learning, Python
- 5 Data Science Open-source Projects You Should Consider Contributing to - Jun 7, 2021.
As you prepare to interview for a position in data science or look to jump to the next level, now is the time to enhance your skills and your resume by working on real, open-source projects. Here, we suggest a great selection of projects you can contribute to and help build something awesome, so all you need to do is choose one and tackle it head on.
Caffe, Data Science, Data Science Skills, GitHub, Google, Machine Learning, Open Source
- PyCaret 101: An introduction for beginners - Jun 7, 2021.
This article is a great overview of how to get started with PyCaret for all your machine learning projects.
Machine Learning, PyCaret, Python
- Machine Learning Model Interpretation - Jun 2, 2021.
Read this overview of using Skater to build machine learning visualizations.
Explainability, Interpretability, Machine Learning, Python

- How I Doubled My Income with Data Science and Machine Learning - Jun 1, 2021.
Many career opportunities exist in the ever-expanding domain of data. Finding your place -- and finding your salary -- is largely up to your dedication, focus, and drive to learn. If you are an aspiring Data Scientist or have already started your professional journey, there are multiple strategies for maximizing your earning potential.
Career Advice, Data Science, Data Science Skills, Machine Learning, Salary
- Supercharge Your Machine Learning Experiments with PyCaret and Gradio - May 31, 2021.
A step-by-step tutorial to develop and interact with machine learning pipelines rapidly.
Deployment, Machine Learning, Pipeline, PyCaret, Python
- Where Did You Apply Analytics, Data Science, Machine Learning in 2020/2021? - May 25, 2021.
Take part in the latest KDnuggets survey, and let us know where you have been applying Analytics, Data Science, Machine Learning in 2020/2021.
Analytics, Data Science, Machine Learning, Poll, Survey
- Write and train your own custom machine learning models using PyCaret - May 25, 2021.
A step-by-step, beginner-friendly tutorial on how to write and train custom machine learning models in PyCaret.
Machine Learning, Modeling, PyCaret, Python, Training
- Data Validation in Machine Learning is Imperative, Not Optional - May 24, 2021.
Before we reach model training in the pipeline, there are various components like data ingestion, data versioning, data validation, and data pre-processing that need to be executed. In this article, we will discuss data validation, why it is important, its challenges, and more.
Data Quality, Machine Learning, Production, Validation
- Easy MLOps with PyCaret + MLflow - May 18, 2021.
A beginner-friendly, step-by-step tutorial on integrating MLOps in your Machine Learning experiments using PyCaret.
Machine Learning, MLflow, MLOps, PyCaret, Python
- Best Python Books for Beginners and Advanced Programmers - May 14, 2021.
Let's take a look at nine of the best Python books for both beginners and advanced programmers, covering topics such as data science, machine learning, deep learning, NLP, and more.
Analytics, Books, Data Science, Deep Learning, Machine Learning, Python
- The Explainable Boosting Machine - May 13, 2021.
As accurate as gradient boosting, as interpretable as linear regression.
Decision Trees, Explainability, Gradient Boosting, Interpretability, Machine Learning
- A Comprehensive Guide to Ensemble Learning – Exactly What You Need to Know - May 6, 2021.
This article covers ensemble learning methods, and exactly what you need to know in order to understand and implement them.
CatBoost, Ensemble Methods, Machine Learning, Python, random forests algorithm, scikit-learn, XGBoost
- Feature stores – how to avoid feeling that every day is Groundhog Day - May 6, 2021.
Feature stores stop the duplication of each task in the ML lifecycle. You can reuse features and pipelines for different models, monitor models consistently, and sidestep data leakage with this MLOps technology that everyone is talking about.
Data Preparation, Feature Store, Machine Learning, MLOps
- What makes a winning entry in a Machine Learning competition? - May 5, 2021.
So you want to show your grit in a Kaggle-style competition? Many, many others have the same idea, including domain experts and non-experts, and academic and corporate teams. What does it take for your bright ideas and skills to come out on top of thousands of competitors?
Challenge, Competition, Kaggle, Machine Learning, PyTorch, TensorFlow
- XGBoost Explained: DIY XGBoost Library in Less Than 200 Lines of Python - May 3, 2021.
Understand how XGBoost works with a simple 200 lines of code that implement gradient boosting for decision trees.
Algorithms, Machine Learning, Python, XGBoost
- Gradient Boosted Decision Trees – A Conceptual Explanation - Apr 30, 2021.
Gradient boosted decision trees involve implementing several models and aggregating their results. These boosted models have become popular thanks to their performance in machine learning competitions on Kaggle. In this article, we’ll see what gradient boosted decision trees are all about.
CatBoost, Decision Trees, Gradient Boosting, Machine Learning, Python, scikit-learn, XGBoost
- FluDemic – using AI and Machine Learning to get ahead of disease - Apr 30, 2021.
We are amidst a healthcare data explosion. AI/ML will be more vital than ever in the prevention and handling of future pandemics. Here, we walk you through the different facets of modeling infectious diseases, focusing on influenza and COVID-19.
AI, COVID-19, Healthcare, Machine Learning
- Feature Engineering of DateTime Variables for Data Science, Machine Learning - Apr 29, 2021.
Learn how to make more meaningful features from DateTime type variables to be used by Machine Learning Models.
Data Science, Feature Engineering, Machine Learning, Python
- Best Podcasts for Machine Learning - Apr 28, 2021.
Podcasts, especially those featuring interviews, are great for learning about the subfields and tools of AI, as well as the rock stars and superheroes of the AI world. Here, we highlight some of the best podcasts today that are perfect for both those learning about machine learning and seasoned practitioners.
AI, Data Science, Machine Learning, Podcast
- Multiple Time Series Forecasting with PyCaret - Apr 27, 2021.
A step-by-step tutorial to forecast multiple time series with PyCaret.
Forecasting, Machine Learning, PyCaret, Python, Time Series
- Improving model performance through human participation - Apr 23, 2021.
Certain industries, such as medicine and finance, are sensitive to false positives. Using human input in the model inference loop can increase the final precision and recall. Here, we describe how to incorporate human feedback at inference time, so that Machines + Humans = Higher Precision & Recall.
Data Science Platform, Humans, Machine Learning, Model Performance, Precision, Recall
- Data Science Books You Should Start Reading in 2021 - Apr 23, 2021.
Check out this curated list of the best data science books for any level.
Books, Data Science, Data Scientist, Deep Learning, Machine Learning
- The Three Edge Case Culprits: Bias, Variance, and Unpredictability - Apr 22, 2021.
Edge cases occur for three basic reasons: Bias – the ML system is too ‘simple’; Variance – the ML system is too ‘inexperienced’; Unpredictability – the ML system operates in an environment full of surprises. How do we recognize these edge case situations, and what can we do about them?
Bias, iMerit, Machine Learning, Variance
- Top 10 Must-Know Machine Learning Algorithms for Data Scientists – Part 1 - Apr 22, 2021.
New to data science? Interested in the must-know machine learning algorithms in the field? Check out the first part of our list and introductory descriptions of the top 10 algorithms for data scientists to know.
Algorithms, Bagging, Data Science, Data Scientist, Decision Trees, Linear Regression, Machine Learning, SVM, Top 10
- Time Series Forecasting with PyCaret Regression Module - Apr 21, 2021.
PyCaret is an alternative low-code library that can replace hundreds of lines of code with only a few. See how to use PyCaret's Regression Module for Time Series Forecasting.
Machine Learning, PyCaret, Python, Regression, Time Series
- Free From Stanford: Machine Learning with Graphs - Apr 19, 2021.
Check out the freely available Stanford course Machine Learning with Graphs, taught by Jure Leskovec, and see how a world-renowned researcher teaches their topic of expertise. Accessible materials include slides, videos, and more.
Courses, Free, Graphs, Jure Leskovec, Machine Learning, Stanford
- 6 Mistakes To Avoid While Training Your Machine Learning Model - Apr 15, 2021.
Training an AI model involves multi-stage activities that aim to make the best use of the training data, so that the outcomes are satisfactory. Here are the 6 common mistakes you need to understand to make sure your AI model is successful.
Computer Vision, Data Labeling, Machine Learning, Mistakes
- Continuous Training for Machine Learning – a Framework for a Successful Strategy - Apr 14, 2021.
Anyone who builds machine learning models appreciates that a model is not useful without useful data. This doesn't change after a model is deployed to production. Effectively monitoring and retraining models with updated data is key to maintaining valuable ML solutions, and can be accomplished with effective approaches to production-level continuous training that is guided by the data.
Machine Learning, MLOps, Model Performance, Production, Real-time, Training Data
- 7 Must-Haves in your Data Science CV - Apr 13, 2021.
If you are looking for a new role as a Data Scientist -- either as a first job fresh out of school, a career change, or a shift to another organization -- then check off as many of these critical points as possible to stand out in the crowd and pass the hiring manager's initial CV screen.
Business, Career Advice, Data Scientist, Machine Learning
- How Noisy Labels Impact Machine Learning Models - Apr 6, 2021.
Not all training data labeling errors have the same impact on the performance of the Machine Learning system. The structure of the labeling errors makes a difference. Read iMerit’s latest blog to learn how to minimize the impact of labeling errors.
Data Labeling, Data Preparation, iMerit, Machine Learning
- How to Dockerize Any Machine Learning Application - Apr 6, 2021.
How can you -- an awesome Data Scientist -- also be known as an awesome software engineer? Docker. Follow these 3 simple steps to use it for your solutions over and over again.
Advice, Applications, Containers, Deployment, Docker, Machine Learning
How to deploy Machine Learning/Deep Learning models to the web - Apr 5, 2021.
The full value of your deep learning models comes from enabling others to use them. Learn how to deploy your model to the web and access it as a REST API, and begin to share the power of your machine learning development with the world.
Deep Learning, Deployment, Machine Learning, RESTful API
Awesome Tricks And Best Practices From Kaggle - Apr 5, 2021.
Easily learn what is otherwise learned only through hours of searching and exploration.
Data Science, Kaggle, Machine Learning, Tips
Shapash: Making Machine Learning Models Understandable - Apr 2, 2021.
Establishing an expectation for trust around AI technologies may soon become one of the most important skills provided by Data Scientists. Significant research investments are underway in this area, and new tools are being developed, such as Shapash, an open-source Python library that helps Data Scientists make machine learning models more transparent and understandable.
Explainability, Machine Learning, Python, SHAP
- Easy AutoML in Python - Apr 1, 2021.
We’re excited to announce that a new open-source project has joined the Alteryx open-source ecosystem. EvalML is a library for automated machine learning (AutoML) and model understanding, written in Python.
Automated Machine Learning, AutoML, Machine Learning, Open Source, Python
- Overview of MLOps - Mar 26, 2021.
Building a machine learning model is great, but to provide real business value, it must be made useful and maintained to remain useful over time. Machine Learning Operations (MLOps), overviewed here, is a rapidly growing space that encompasses everything required to deploy a machine learning model into production, and is a crucial aspect of delivering this sought-after value.
Data Science, Deployment, Machine Learning, MLOps, Monitoring
- Data Science Curriculum for Professionals - Mar 25, 2021.
If you are looking to expand or transition your current professional career that is buried in spreadsheet analysis into one powered by data science, then you are in for an exciting but complex journey with much to explore and master. To begin your adventure, follow this complete road map, which will guide you from a gnome in the forest of spreadsheets to an AI wizard known far and wide throughout the kingdom.
Cloud Computing, Data Science Education, Data Visualization, Machine Learning, Python, R, Roadmap, Statistics
- Top YouTube Machine Learning Channels - Mar 23, 2021.
These are the top 15 YouTube channels for machine learning as determined by our stated criteria, along with some additional data on the channels to help you decide if they may have some content useful for you.
Machine Learning, Youtube
The Best Machine Learning Frameworks & Extensions for Scikit-learn - Mar 22, 2021.
Learn how to use a selection of packages to extend the functionality of Scikit-learn estimators.
Machine Learning, Python, scikit-learn
- Learning from machine learning mistakes - Mar 19, 2021.
Read this article and discover how to find weak spots of a regression model.
Machine Learning, Mistakes, Modeling, Regression
- Data Validation and Data Verification – From Dictionary to Machine Learning - Mar 16, 2021.
In this article, we will understand the difference between data verification and data validation, two terms which are often used interchangeably when we talk about data quality. However, these two terms are distinct.
Data Quality, Machine Learning, Validation
10 Amazing Machine Learning Projects of 2020 - Mar 15, 2021.
So much progress in AI and machine learning happened in 2020, especially in the areas of AI-generating creativity and low-to-no-code frameworks. Check out these trending and popular machine learning projects released last year, and let them inspire your work throughout 2021.
Chatbot, Deep Learning, Image Processing, Machine Learning, Project, Trends
- A Beginner’s Guide to the CLIP Model - Mar 11, 2021.
CLIP is a bridge between computer vision and natural language processing. I'm here to break CLIP down for you in an accessible and fun read! In this post, I'll cover what CLIP is, how CLIP works, and why CLIP is cool.
CLIP, Computer Vision, Machine Learning, NLP
A Machine Learning Model Monitoring Checklist: 7 Things to Track - Mar 11, 2021.
Once you deploy a machine learning model in production, you need to make sure it performs. In this article, we suggest how to monitor your models and which open-source tools to use.
Checklist, Data Science, Deployment, Machine Learning, MLOps, Monitoring
4 Machine Learning Concepts I Wish I Knew When I Built My First Model - Mar 9, 2021.
Diving into building your first machine learning model will be an adventure -- one in which you will learn many important lessons the hard way. However, by following these four tips, your first and subsequent models will be put on a path toward excellence.
Feature Selection, Gradio, Hyperparameter, Machine Learning, Metrics, Python
- Speeding up Scikit-Learn Model Training - Mar 5, 2021.
If your scikit-learn models are taking a bit of time to train, then there are several techniques you can use to make the processing more efficient. From optimizing your model configuration to leveraging libraries to speed up training through parallelization, you can build the best scikit-learn model possible in the least amount of time.
Distributed Computing, Machine Learning, Optimization, scikit-learn
- Bayesian Hyperparameter Optimization with tune-sklearn in PyCaret - Mar 5, 2021.
PyCaret, a low-code Python ML library, offers several ways to tune the hyperparameters of a created model. In this post, I'd like to show how Ray Tune is integrated with PyCaret, and how easy it is to leverage its algorithms and distributed computing to achieve results superior to the default random search method.
Bayesian, Hyperparameter, Machine Learning, Optimization, PyCaret, Python, scikit-learn
- Reducing the High Cost of Training NLP Models With SRU++ - Mar 4, 2021.
The increasing computation time and costs of training natural language processing (NLP) models highlight the importance of inventing computationally efficient models that retain top modeling power with reduced or accelerated computation. A single experiment training a top-performing language model on the 'Billion Word' benchmark would take 384 GPU days and cost as much as $36,000 using AWS on-demand instances.
Deep Learning, Machine Learning, Neural Networks, NLP
- Getting Started with Distributed Machine Learning with PyTorch and Ray - Mar 3, 2021.
Ray is a popular framework for distributed Python that can be paired with PyTorch to rapidly scale machine learning applications.
Distributed Systems, Machine Learning, Python, PyTorch
Machine Learning Systems Design: A Free Stanford Course - Feb 26, 2021.
This freely-available course from Stanford should give you a toolkit for designing machine learning systems.
Courses, Deployment, Design, Machine Learning, Maintenance, Stanford
- Feature Store as a Foundation for Machine Learning - Feb 19, 2021.
With so many organizations now taking the leap into building production-level machine learning models, many lessons learned are coming to light about the supporting infrastructure. For a variety of important types of use cases, maintaining a centralized feature store is essential for higher ROI and faster delivery to market. In this review, the current feature store landscape is described, and you can learn how to architect one into your MLOps pipeline.
Data Engineering, Data Infrastructure, Data Lake, Feature Engineering, Feature Store, Machine Learning, Metadata, MLOps, Pipeline
Approaching (Almost) Any Machine Learning Problem - Feb 18, 2021.
This freely-available book is a fantastic walkthrough of practical approaches to machine learning problems.
Deep Learning, Free ebook, Machine Learning, Python
- Distributed and Scalable Machine Learning [Webinar] - Feb 17, 2021.
Mike McCarty and Gil Forsyth work at the Capital One Center for Machine Learning, where they are building internal PyData libraries that scale with Dask and RAPIDS. For this webinar, on Feb 23 at 2 pm PST (5 pm EST), they’ll join Hugo Bowne-Anderson and Matthew Rocklin to discuss their journey to scale data science and machine learning in Python.
Capital One, Dask, Distributed, Machine Learning, Python, scikit-learn, XGBoost
- Easy, Open-Source AutoML in Python with EvalML - Feb 16, 2021.
We’re excited to announce that a new open-source project has joined the Alteryx open-source ecosystem. EvalML is a library for automated machine learning (AutoML) and model understanding, written in Python.
Automated Machine Learning, AutoML, Machine Learning, Open Source, Python
- How to Speed up Scikit-Learn Model Training - Feb 11, 2021.
Scikit-learn is an easy-to-use Python library for machine learning. However, sometimes scikit-learn models can take a long time to train. The question becomes, how do you create the best scikit-learn model in the least amount of time?
Distributed Systems, Hyperparameter, Machine Learning, Optimization, Parallelism, Python, scikit-learn, Training
- Machine Learning – it’s all about assumptions - Feb 11, 2021.
Just as with most things in life, assumptions can directly lead to success or failure. Similarly in machine learning, appreciating the assumed logic behind machine learning techniques will guide you toward applying the best tool for the data.
Algorithms, Decision Trees, K-nearest neighbors, Linear Regression, Logistic Regression, Machine Learning, Naive Bayes, SVM, XGBoost
- A Critical Comparison of Machine Learning Platforms in an Evolving Market - Feb 11, 2021.
There’s a clear inclination towards the MLaaS model across industries, given the fact that companies today have an option to select from a wide range of solutions that can cater to diverse business needs. Here is a look at 3 of the top ML platforms for data excellence.
Google Cloud, IBM Watson, Machine Learning, Microsoft Azure, Platform
- My machine learning model does not learn. What should I do? - Feb 10, 2021.
This article presents 7 hints on how to get out of the quicksand.
Algorithms, Business Context, Data Quality, Hyperparameter, Machine Learning, Modeling, Tips
- Microsoft Explores Three Key Mysteries of Ensemble Learning - Feb 8, 2021.
A new paper studies three key puzzling characteristics of deep learning ensembles and some potential explanations.
Ensemble Methods, Machine Learning, Microsoft
- Saving and loading models in TensorFlow — why it is important and how to do it - Feb 3, 2021.
So much time and effort can go into training your machine learning models. But, shut down the notebook or system, and all those trained weights and more vanish with the memory flush. Saving your models to maximize reusability is key for efficient productivity.
Deep Learning, Machine Learning, TensorFlow
- Machine learning adversarial attacks are a ticking time bomb - Jan 29, 2021.
Software developers and cyber security experts have long fought the good fight against vulnerabilities in code to defend against hackers. A new, subtle approach to maliciously targeting machine learning models has been a recent hot topic in research, but its statistical nature makes it difficult to find and patch these so-called adversarial attacks. Such threats in the real world are becoming imminent as the adoption of machine learning spreads, and a systematic defense must be implemented.
Adversarial, Generative Adversarial Network, Machine Learning
- Top 5 Reasons Why Machine Learning Projects Fail - Jan 28, 2021.
A rise in machine learning project implementations is coming, as is a rise in the number of failures, due to several implementation and maintenance challenges. The first step in closing this gap lies in understanding the reasons for the failures.
Data Preparation, Data Science, Failure, Implementation, Machine Learning
- Machine learning is going real-time - Jan 28, 2021.
Extracting immediate predictions from machine learning algorithms on the spot based on brand-new data can offer a next level of interaction and potential value to its consumers. The infrastructure and tech stack required to implement such real-time systems is also next level, and many organizations -- especially in the US -- seem to be resisting. But, what even is real-time ML, and how can it deliver a better experience?
China, Machine Learning, MLOps, Real-time, Stream Processing