- 3 More Free Top Notch Natural Language Processing Courses, by Matthew Mayo - Mar 31, 2021.
Are you looking to continue your learning of natural language processing? This small collection of 3 free top notch courses will allow you to do just that.
Andrew Ng, CMU, Coursera, Courses, deeplearning.ai, Neural Networks, NLP
- Introduction to the White-Box AI: the Concept of Interpretability, by SciForce - Mar 31, 2021.
The interpretability of ML models can be seen as “the ability to explain or to present in understandable terms to a human.” Read this article and learn to go beyond the black box of AI, where algorithms make predictions but the underlying explanation remains unknown and untraceable.
AI, Explainability, Explainable AI, Sciforce
- Software Engineering Best Practices for Data Scientists, by Madison Hunter - Mar 30, 2021.
This is a crash course on how to bridge the gap between data science and software engineering.
Data Science, Data Scientist, Programming, Python, Software Engineering
- Why So Many Data Scientists Quit Good Jobs at Great Companies, by Adam Sroka - Mar 30, 2021.
The role of the Data Scientist continues to offer many great opportunities as a career. However, the 'sexiest job of the 21st century' has lost some of its appeal because of unrealized expectations and how organizations might leverage this type of work. Having a better understanding of how data science typically plays out in the business world can help you achieve the success you want.
Career Advice, Data Science Skills, Data Scientist, Jobs
- Explainable Visual Reasoning: How MIT Builds Neural Networks that can Explain Themselves, by Jesus Rodriguez - Mar 30, 2021.
New MIT research attempts to close the gap between state-of-the-art performance and interpretable models in computer vision tasks.
Explainability, Explainable AI, MIT, Neural Networks
- How to break a model in 20 days — a tutorial on production model analytics, by Dral & Samuylova - Mar 29, 2021.
This is an article on how models fail in production, and how to spot it.
Analytics, Data Science, Data Visualization, Modeling, Production
- MongoDB in the Cloud: Three Solutions for 2021, by Krueger & Franklin - Mar 26, 2021.
An overview of pricing and compatibility for MongoDB Atlas, AWS DocumentDB, Azure Cosmos DB.
Cloud, Database, MongoDB, NoSQL
- Overview of MLOps, by Steve Shwartz - Mar 26, 2021.
Building a machine learning model is great, but to provide real business value, it must be made useful and maintained to remain useful over time. Machine Learning Operations (MLOps), overviewed here, is a rapidly growing space that encompasses everything required to deploy a machine learning model into production, and is a crucial aspect of delivering this sought-after value.
Data Science, Deployment, Machine Learning, MLOps, Monitoring
- Multilingual CLIP with Huggingface + PyTorch Lightning, by Sachin Abeywardana - Mar 26, 2021.
An overview of training OpenAI's CLIP on Google Colab.
CLIP, Google Colab, Hugging Face, Image Recognition, NLP, OpenAI, PyTorch, PyTorch Lightning
- Extraction of Objects In Images and Videos Using 5 Lines of Code, by Ayoola Olafenwa - Mar 25, 2021.
PixelLib is a library created for easy integration of image and video segmentation in real-life applications. Learn to use PixelLib to extract objects in images and videos with minimal code.
Computer Vision, Image Processing, Object Detection, Python, Segmentation, Video
- Top 10 Python Libraries Data Scientists should know in 2021, by Terence Shin - Mar 24, 2021.
So many Python libraries exist that offer powerful and efficient foundations for supporting your data science work and machine learning model development. While the list may seem overwhelming, there are certain libraries you should focus your time on, as they are some of the most commonly used today.
Data Science, Keras, numpy, Pandas, Python, scikit-learn, Seaborn, TensorFlow
- Rejection Sampling with Python, by Michael Grogan - Mar 24, 2021.
Read this article on rejection sampling with examples using the Normal and Cauchy Distributions.
Distribution, Probability, Python, Sampling, Statistics
- Metric Matters, Part 2: Evaluating Regression Models, by Susan Sivek - Mar 23, 2021.
In this second part of a review of the many options available for choosing metrics to evaluate machine learning models, learn how to select the most appropriate metric for your analysis of regression models.
Data Science, Metrics, Model Performance, Regression
- Top YouTube Machine Learning Channels, by Matthew Mayo - Mar 23, 2021.
These are the top 15 YouTube channels for machine learning as determined by our stated criteria, along with some additional data on the channels to help you decide if they may have some content useful for you.
Machine Learning, Youtube
- The Best Machine Learning Frameworks & Extensions for Scikit-learn, by Derrick Mwiti - Mar 22, 2021.
Learn how to use a selection of packages to extend the functionality of Scikit-learn estimators.
Machine Learning, Python, scikit-learn
- The Portfolio Guide for Data Science Beginners, by Navid Mashinchi - Mar 22, 2021.
Whether you are an aspiring or seasoned Data Scientist, establishing a clear and well-designed online portfolio presence will help you stand out in the industry, and provide potential employers a powerful understanding of your work and capabilities. These tips will help you brainstorm and launch your first data science portfolio.
Beginners, Data Science Skills, Data Scientist, Portfolio
- Teaching AI to See Like a Human, by Jesus Rodriguez - Mar 22, 2021.
DeepMind Generative Query Networks can infer knowledge as they navigate a visual environment.
Agents, AI, Humans, Training
- Learning from machine learning mistakes, by Emeli Dral - Mar 19, 2021.
Read this article and discover how to find weak spots of a regression model.
Machine Learning, Mistakes, Modeling, Regression
- How to build a DAG Factory on Airflow, by Axel Furlan - Mar 19, 2021.
A guide to building efficient DAGs with half of the code.
Data Engineering, Data Workflow, Graphs, Python, Workflow
- More Data Science Cheatsheets, by Matthew Mayo - Mar 18, 2021.
It's time again to look at some data science cheatsheets. Here you can find a short selection of such resources which can cater to different existing levels of knowledge and breadth of topics of interest.
Cheat Sheet, Data Science, Interview Questions, Machine Learning, Probability
- How to frame the right questions to be answered using data, by Benjamin Obi Tayo - Mar 18, 2021.
Understanding your data first is a key step before going too far into any data science project. But, you can't fully understand your data until you know the right questions to ask of it.
Advice, Data Analysis, Data Exploration, Data Science, Data Visualization
- A Simple Way to Time Code in Python, by Krueger & Franklin - Mar 18, 2021.
Read on to find out how to use a decorator to time your functions.
Optimization, Programming, Python
- Automating Machine Learning Model Optimization, by Himanshu Sharma - Mar 17, 2021.
This article presents an overview of using Bayesian Tuning and Bandits for machine learning.
Bayesian, Hyperparameter, Machine Learning, Optimization
- How to Begin Your NLP Journey, by Diego Lopez Yse - Mar 17, 2021.
In this blog post, learn how to process text using Python.
NLP, Python, Text Analytics
- Natural Language Processing Pipelines, Explained, by Ram Tavva - Mar 16, 2021.
This article presents a beginner's view of NLP, as well as an explanation of how a typical NLP pipeline might look.
Explained, NLP, NLTK, Python, Text Analytics
- Metric Matters, Part 1: Evaluating Classification Models, by Susan Sivek - Mar 16, 2021.
You have many options when choosing metrics for evaluating your machine learning models. Select the right one for your situation with this guide that considers metrics for classification models.
Accuracy, Classification, Metrics, Precision, Recall, ROC-AUC
- Data Validation and Data Verification – From Dictionary to Machine Learning, by Aggarwal & Bose - Mar 16, 2021.
In this article, we will understand the difference between data verification and data validation, two terms which are often used interchangeably when we talk about data quality. However, these two terms are distinct.
Data Quality, Machine Learning, Validation
- 10 Amazing Machine Learning Projects of 2020, by Anupam Chugh - Mar 15, 2021.
So much progress in AI and machine learning happened in 2020, especially in the areas of AI-generating creativity and low-to-no-code frameworks. Check out these trending and popular machine learning projects released last year, and let them inspire your work throughout 2021.
Chatbot, Deep Learning, Image Processing, Machine Learning, Project, Trends
- Forget Telling Stories; Help People Navigate, by Stan Pugsley - Mar 15, 2021.
When designing reporting & visualizations, think of them as part of a navigation framework rather than stand-alone information.
Data Analysis, Data Science, Infographic, KPI, Storytelling
- Kedro-Airflow: Orchestrating Kedro Pipelines with Airflow, by Jo Stitchbury - Mar 12, 2021.
The Kedro team and Astronomer have released Kedro-Airflow 0.4.0 to help you develop modular, maintainable & reproducible code with orchestration superpowers!
Data Science, Interview, Pipeline, Python, Workflow
- Must Know for Data Scientists and Data Analysts: Causal Design Patterns, by Emily Riederer - Mar 12, 2021.
Industry is a prime setting for observational causal inference, but many companies are blind to causal measurement beyond A/B tests. This formula-free primer illustrates analysis design patterns for measuring causal effects from observational data.
Causality, Data Science, Design, Design of Experiments, Statistics
- Know your data much faster with the new Sweetviz Python library, by Francois Bertrand - Mar 12, 2021.
One of the latest exploratory data analysis libraries is Sweetviz, a new open-source Python library built for finding out data types, missing information, distribution of values, correlations, etc. Find out more about the library and how to use it here.
Data Analysis, Data Exploration, Data Visualization, Python
- A Beginner’s Guide to the CLIP Model, by Matthew Brems - Mar 11, 2021.
CLIP is a bridge between computer vision and natural language processing. I'm here to break CLIP down for you in an accessible and fun read! In this post, I'll cover what CLIP is, how CLIP works, and why CLIP is cool.
CLIP, Computer Vision, Machine Learning, NLP
- The Inferential Statistics Data Scientists Should Know, by Nagesh Chauhan - Mar 11, 2021.
The foundations of Data Science and machine learning algorithms are in mathematics and statistics. To be the best Data Scientists you can be, your skills in statistical understanding should be well-established. The more you appreciate statistics, the better you will understand how machine learning performs its apparent magic.
Data Science Education, Statistics
- A Machine Learning Model Monitoring Checklist: 7 Things to Track, by Emeli Dral & Elena Samuylova - Mar 11, 2021.
Once you deploy a machine learning model in production, you need to make sure it performs. In this article, we suggest how to monitor your models and which open-source tools to use.
Checklist, Data Science, Deployment, Machine Learning, MLOps, Monitoring
- How to Speed Up Pandas with Modin, by Michael Galarnyk - Mar 10, 2021.
The Modin library can scale your pandas workflows by changing one line of code, and it integrates with the Python ecosystem and Ray clusters. This tutorial goes over how to get started with Modin and how it can speed up your pandas workflows.
Data Science, Distributed Systems, Modin, Pandas, Python, Workflow
- How To Overcome The Fear of Math and Learn Math For Data Science, by Arnuld On Data - Mar 10, 2021.
Many aspiring Data Scientists, especially when self-learning, fail to learn the necessary math foundations. These recommendations for learning approaches along with references to valuable resources can help you overcome a personal sense of not being "the math type" or belief that you "always failed in math."
Advice, Career, Data Science Education, Mathematics, Statistics
- DeepMind’s AlphaFold & the Protein Folding Problem, by Kevin Vu - Mar 10, 2021.
Recently, DeepMind's AlphaFold made impressive headway in the protein structure prediction problem. Read this for an overview and explanation.
AI, Biology, DeepMind, Protein
- Document Databases, Explained, by Alex Williams - Mar 9, 2021.
Out of all the NoSQL database types, document stores are considered the most sophisticated ones. They store data in a JSON format, as opposed to a classic rows-and-columns structure.
Beginners, Databases, NoSQL
- Beautiful decision tree visualizations with dtreeviz, by Eryk Lewinson - Mar 8, 2021.
Improve the old way of plotting the decision trees and never go back!
Algorithms, Data Visualization, Decision Trees, Python
- 11 Essential Code Blocks for Complete EDA (Exploratory Data Analysis), by Susan Maina - Mar 5, 2021.
This article is a practical guide to exploring any data science project and gaining valuable insights.
Data Analysis, Data Exploration, Data Visualization, Pandas, Python
- Speeding up Scikit-Learn Model Training, by Michael Galarnyk - Mar 5, 2021.
If your scikit-learn models are taking a bit of time to train, then there are several techniques you can use to make the processing more efficient. From optimizing your model configuration to leveraging libraries to speed up training through parallelization, you can build the best scikit-learn model possible in the least amount of time.
Distributed Computing, Machine Learning, Optimization, scikit-learn
- Bayesian Hyperparameter Optimization with tune-sklearn in PyCaret, by Antoni Baum - Mar 5, 2021.
PyCaret, a low-code Python ML library, offers several ways to tune the hyper-parameters of a created model. In this post, I'd like to show how Ray Tune is integrated with PyCaret, and how easy it is to leverage its algorithms and distributed computing to achieve results superior to the default random search method.
Bayesian, Hyperparameter, Machine Learning, Optimization, PyCaret, Python, scikit-learn
- Reducing the High Cost of Training NLP Models With SRU++, by Tao Lei, PhD - Mar 4, 2021.
The increasing computation time and costs of training natural language processing (NLP) models highlight the importance of inventing computationally efficient models that retain top modeling power with reduced or accelerated computation. A single experiment training a top-performing language model on the 'Billion Word' benchmark would take 384 GPU days and cost as much as $36,000 using AWS on-demand instances.
Deep Learning, Machine Learning, Neural Networks, NLP
- Evaluating Object Detection Models Using Mean Average Precision, by Ahmed Gad - Mar 3, 2021.
In this article we will see how precision and recall are used to calculate the Mean Average Precision (mAP).
Computer Vision, Metrics, Modeling, Object Detection
- 15 common mistakes data scientists make in Python (and how to fix them), by Gerold Csendes - Mar 3, 2021.
Writing Python code that works for your data science project and performs the task you expect is one thing. Ensuring your code is readable by others (including your future self), reproducible, and efficient are entirely different challenges that can be addressed by minimizing common bad practices in your development.
Best Practices, Data Scientist, Jupyter, Mistakes, Programming, Python
- Getting Started with Distributed Machine Learning with PyTorch and Ray, by Galarnyk, Liaw & Nishihara - Mar 3, 2021.
Ray is a popular framework for distributed Python that can be paired with PyTorch to rapidly scale machine learning applications.
Distributed Systems, Machine Learning, Python, PyTorch
- Speech to Text with Wav2Vec 2.0, by Dhilip Subramanian - Mar 2, 2021.
Facebook recently introduced and open-sourced their new framework for self-supervised learning of representations from raw audio data called Wav2Vec 2.0. Learn more about it and how to use it here.
Hugging Face, NLP, Python, PyTorch, Transformer
- 3 Mathematical Laws Data Scientists Need To Know, by Cornellius Yudha Wijaya - Mar 2, 2021.
Machine learning and data science are founded on important mathematics in statistics and probability. A few interesting mathematical laws you should understand will especially help you perform better as a Data Scientist, including Benford's Law, the Law of Large Numbers, and Zipf's Law.
Benford's Law, Data Science, Mathematics, Zipf's Law
- Google’s Model Search is a New Open Source Framework that Uses Neural Networks to Build Neural Networks, by Jesus Rodriguez - Mar 1, 2021.
The new framework brings state-of-the-art neural architecture search methods to TensorFlow.
Automated Machine Learning, AutoML, Google, Neural Networks, Open Source
- Top YouTube Channels for Data Science, by Matthew Mayo - Mar 1, 2021.
Have a look at the top 15 YouTube channels for data science by number of subscribers, along with some additional data on the channels to help you decide if they may have some content useful for you.
Data Science, Youtube