2021 Mar Tutorials, Overviews

3 More Free Top Notch Natural Language Processing Courses

Are you looking to continue learning natural language processing? This small collection of 3 free top-notch courses will allow you to do just that.

By Matthew Mayo on Mar 31, 2021 in Andrew Ng, CMU, Coursera, Courses, deeplearning.ai, Neural Networks, NLP
Software Engineering Best Practices for Data Scientists

This is a crash course on how to bridge the gap between data science and software engineering.

By Madison Hunter on Mar 30, 2021 in Data Science, Data Scientist, Programming, Python, Software Engineering
Why So Many Data Scientists Quit Good Jobs at Great Companies

The role of the Data Scientist continues to offer many great opportunities as a career. However, the 'sexiest job of the 21st century' has lost some of its appeal because of unrealized expectations and how organizations might leverage this type of work. Having a better understanding of how data science typically plays out in the business world can help you achieve the success you want.

By Adam Sroka on Mar 30, 2021 in Career Advice, Data Science Skills, Data Scientist, Jobs
Overview of MLOps

Building a machine learning model is great, but to provide real business value, it must be made useful and maintained to remain useful over time. Machine Learning Operations (MLOps), overviewed here, is a rapidly growing space that encompasses everything required to deploy a machine learning model into production, and is a crucial aspect of delivering this sought-after value.

By Steve Shwartz on Mar 26, 2021 in Data Science, Deployment, Machine Learning, MLOps, Monitoring
Multilingual CLIP with Huggingface + PyTorch Lightning

An overview of training OpenAI's CLIP on Google Colab.

By Sachin Abeywardana on Mar 26, 2021 in CLIP, Google Colab, Hugging Face, Image Recognition, NLP, OpenAI, PyTorch, PyTorch Lightning
Top 10 Python Libraries Data Scientists should know in 2021

So many Python libraries exist that offer powerful and efficient foundations for supporting your data science work and machine learning model development. While the list may seem overwhelming, there are certain libraries you should focus your time on, as they are some of the most commonly used today.

By Terence Shin on Mar 24, 2021 in Data Science, Keras, numpy, Pandas, Python, scikit-learn, Seaborn, TensorFlow
Top YouTube Machine Learning Channels

These are the top 15 YouTube channels for machine learning as determined by our stated criteria, along with some additional data on the channels to help you decide if they may have some content useful for you.

By Matthew Mayo on Mar 23, 2021 in Machine Learning, Youtube
The Best Machine Learning Frameworks & Extensions for Scikit-learn

Learn how to use a selection of packages to extend the functionality of Scikit-learn estimators.

By Derrick Mwiti on Mar 22, 2021 in Machine Learning, Python, scikit-learn
The Portfolio Guide for Data Science Beginners

Whether you are an aspiring or seasoned Data Scientist, establishing a clear and well-designed online portfolio presence will help you stand out in the industry, and provide potential employers a powerful understanding of your work and capabilities. These tips will help you brainstorm and launch your first data science portfolio.

By Navid Mashinchi on Mar 22, 2021 in Beginners, Data Science Skills, Data Scientist, Portfolio
Learning from machine learning mistakes

Read this article and discover how to find weak spots of a regression model.

By Emeli Dral on Mar 19, 2021 in Machine Learning, Mistakes, Modeling, Regression
How to build a DAG Factory on Airflow

A guide to building efficient DAGs with half of the code.

By Axel Furlan on Mar 19, 2021 in Data Engineering, Data Workflow, Graphs, Python, Workflow
How to frame the right questions to be answered using data

Understanding your data first is a key step before going too far into any data science project. But, you can't fully understand your data until you know the right questions to ask of it.

By Benjamin Obi Tayo on Mar 18, 2021 in Advice, Data Analysis, Data Exploration, Data Science, Data Visualization
A Simple Way to Time Code in Python

Read on to find out how to use a decorator to time your functions.

By Krueger & Franklin on Mar 18, 2021 in Optimization, Programming, Python
How to Begin Your NLP Journey

In this blog post, learn how to process text using Python.

By Diego Lopez Yse on Mar 17, 2021 in NLP, Python, Text Analytics
Natural Language Processing Pipelines, Explained

This article presents a beginner's view of NLP, as well as an explanation of how a typical NLP pipeline might look.

By Ram Tavva on Mar 16, 2021 in Explained, NLP, NLTK, Python, Text Analytics
Metric Matters, Part 1: Evaluating Classification Models

You have many options when choosing metrics for evaluating your machine learning models. Select the right one for your situation with this guide that considers metrics for classification models.

By Susan Sivek on Mar 16, 2021 in Accuracy, Classification, Metrics, Precision, Recall, ROC-AUC
Data Validation and Data Verification – From Dictionary to Machine Learning

In this article, we explain the difference between data verification and data validation, two terms that are often used interchangeably in discussions of data quality but are in fact distinct.

By Aggarwal & Bose on Mar 16, 2021 in Data Quality, Machine Learning, Validation
10 Amazing Machine Learning Projects of 2020

So much progress in AI and machine learning happened in 2020, especially in the areas of AI-generating creativity and low-to-no-code frameworks. Check out these trending and popular machine learning projects released last year, and let them inspire your work throughout 2021.

By Anupam Chugh on Mar 15, 2021 in Chatbot, Deep Learning, Image Processing, Machine Learning, Project, Trends
Forget Telling Stories; Help People Navigate

When designing reporting & visualizations, think of them as part of a navigation framework rather than stand-alone information.

By Stan Pugsley on Mar 15, 2021 in Data Analysis, Data Science, Infographic, KPI, Storytelling
Kedro-Airflow: Orchestrating Kedro Pipelines with Airflow

The Kedro team and Astronomer have released Kedro-Airflow 0.4.0 to help you develop modular, maintainable & reproducible code with orchestration superpowers!

By Jo Stitchbury on Mar 12, 2021 in Data Science, Interview, Pipeline, Python, Workflow
Must Know for Data Scientists and Data Analysts: Causal Design Patterns

Industry is a prime setting for observational causal inference, but many companies are blind to causal measurement beyond A/B tests. This formula-free primer illustrates analysis design patterns for measuring causal effects from observational data.

By Emily Riederer on Mar 12, 2021 in Causality, Data Science, Design, Design of Experiments, Statistics
Know your data much faster with the new Sweetviz Python library

One of the latest exploratory data analysis tools is Sweetviz, a new open-source Python library built for finding out data types, missing information, distributions of values, correlations, and more. Find out more about the library and how to use it here.

By Francois Bertrand on Mar 12, 2021 in Data Analysis, Data Exploration, Data Visualization, Python
A Beginner’s Guide to the CLIP Model

CLIP is a bridge between computer vision and natural language processing. I'm here to break CLIP down for you in an accessible and fun read! In this post, I'll cover what CLIP is, how CLIP works, and why CLIP is cool.

By Matthew Brems on Mar 11, 2021 in CLIP, Computer Vision, Machine Learning, NLP
The Inferential Statistics Data Scientists Should Know

The foundations of Data Science and machine learning algorithms are in mathematics and statistics. To be the best Data Scientists you can be, your skills in statistical understanding should be well-established. The more you appreciate statistics, the better you will understand how machine learning performs its apparent magic.

By Nagesh Chauhan on Mar 11, 2021 in Data Science Education, Statistics
A Machine Learning Model Monitoring Checklist: 7 Things to Track

Once you deploy a machine learning model in production, you need to make sure it performs. In this article, we suggest how to monitor your models and which open-source tools to use.

By Emeli Dral & Elena Samuylova on Mar 11, 2021 in Checklist, Data Science, Deployment, Machine Learning, MLOps, Monitoring
How to Speed Up Pandas with Modin

The Modin library can scale your pandas workflows with a one-line code change, and it integrates with the Python ecosystem and Ray clusters. This tutorial goes over how to get started with Modin and how it can speed up your pandas workflows.

By Michael Galarnyk on Mar 10, 2021 in Data Science, Distributed Systems, Modin, Pandas, Python, Workflow
DeepMind’s AlphaFold & the Protein Folding Problem

Recently, DeepMind's AlphaFold made impressive headway in the protein structure prediction problem. Read this for an overview and explanation.

By Kevin Vu on Mar 10, 2021 in AI, Biology, DeepMind, Protein
Document Databases, Explained

Out of all the NoSQL database types, document stores are considered the most sophisticated. They store data in a JSON format, as opposed to a classic rows-and-columns structure.

By Alex Williams on Mar 9, 2021 in Beginners, Databases, NoSQL
Beautiful decision tree visualizations with dtreeviz

Improve on the old way of plotting decision trees and never go back!

By Eryk Lewinson on Mar 8, 2021 in Algorithms, Data Visualization, Decision Trees, Python
11 Essential Code Blocks for Complete EDA (Exploratory Data Analysis)

This article is a practical guide to exploring any data science project and gaining valuable insights.

By Susan Maina on Mar 5, 2021 in Data Analysis, Data Exploration, Data Visualization, Pandas, Python
Speeding up Scikit-Learn Model Training

If your scikit-learn models are taking a bit of time to train, then there are several techniques you can use to make the processing more efficient. From optimizing your model configuration to leveraging libraries to speed up training through parallelization, you can build the best scikit-learn model possible in the least amount of time.

By Michael Galarnyk on Mar 5, 2021 in Distributed Computing, Machine Learning, Optimization, scikit-learn
Bayesian Hyperparameter Optimization with tune-sklearn in PyCaret

PyCaret, a low-code Python ML library, offers several ways to tune the hyperparameters of a created model. In this post, I'd like to show how Ray Tune is integrated with PyCaret, and how easy it is to leverage its algorithms and distributed computing to achieve results superior to the default random search method.

By Antoni Baum on Mar 5, 2021 in Bayesian, Hyperparameter, Machine Learning, Optimization, PyCaret, Python, scikit-learn
Reducing the High Cost of Training NLP Models With SRU++

The increasing computation time and costs of training natural language processing (NLP) models highlight the importance of inventing computationally efficient models that retain top modeling power with reduced or accelerated computation. A single experiment training a top-performing language model on the 'Billion Word' benchmark would take 384 GPU days and as much as $36,000 using AWS on-demand instances.

By Tao Lei, PhD on Mar 4, 2021 in Deep Learning, Machine Learning, Neural Networks, NLP
Evaluating Object Detection Models Using Mean Average Precision

In this article, we will see how precision and recall are used to calculate the Mean Average Precision (mAP).

By Ahmed Gad on Mar 3, 2021 in Computer Vision, Metrics, Modeling, Object Detection
15 common mistakes data scientists make in Python (and how to fix them)

Writing Python code that works for your data science project and performs the task you expect is one thing. Ensuring your code is readable by others (including your future self), reproducible, and efficient is an entirely different challenge, one that can be addressed by minimizing common bad practices in your development.

By Gerold Csendes on Mar 3, 2021 in Best Practices, Data Scientist, Jupyter, Mistakes, Programming, Python
Getting Started with Distributed Machine Learning with PyTorch and Ray

Ray is a popular framework for distributed Python that can be paired with PyTorch to rapidly scale machine learning applications.

By Galarnyk, Liaw & Nishihara on Mar 3, 2021 in Distributed Systems, Machine Learning, Python, PyTorch
Speech to Text with Wav2Vec 2.0

Facebook recently introduced and open-sourced Wav2Vec 2.0, its new framework for self-supervised learning of representations from raw audio data. Learn more about it and how to use it here.

By Dhilip Subramanian on Mar 2, 2021 in Hugging Face, NLP, Python, PyTorch, Transformer
3 Mathematical Laws Data Scientists Need To Know

Machine learning and data science are founded on important mathematics in statistics and probability. A few interesting mathematical laws you should understand will especially help you perform better as a Data Scientist, including Benford's Law, the Law of Large Numbers, and Zipf's Law.

By Cornellius Yudha Wijaya on Mar 2, 2021 in Benford's Law, Data Science, Mathematics, Zipf's Law
Google’s Model Search is a New Open Source Framework that Uses Neural Networks to Build Neural Networks

The new framework brings state-of-the-art neural architecture search methods to TensorFlow.

By Jesus Rodriguez on Mar 1, 2021 in Automated Machine Learning, AutoML, Google, Neural Networks, Open Source
Top YouTube Channels for Data Science

Have a look at the top 15 YouTube channels for data science by number of subscribers, along with some additional data on the channels to help you decide if they may have some content useful for you.

By Matthew Mayo on Mar 1, 2021 in Data Science, Youtube