- From Scratch: Permutation Feature Importance for ML Interpretability - Jun 30, 2021.
Use permutation feature importance to discover which features in your dataset are useful for prediction — implemented from scratch in Python.
Feature Selection, Interpretability, Machine Learning, Python
- KDnuggets™ News 21:n24, Jun 30: What will the demand for Data Scientists be in 10 years?; Add A New Dimension To Your Photos Using Python - Jun 30, 2021.
What will the demand for Data Scientists be in 10 years? Will Data Scientists be extinct?; Add A New Dimension To Your Photos Using Python; Data Scientists are from Mars and Software Developers are from Venus; How to Train a Joint Entities and Relation Extraction Classifier using BERT Transformer with spaCy 3; In-Warehouse Machine Learning and the Modern Data Science Stack
Data Science, Data Scientist, Data Warehouse, Image Processing, Machine Learning, NLP, Python, Software Developer
- Add A New Dimension To Your Photos Using Python - Jun 28, 2021.
Read this to learn how to breathe new life into your photos with a 3D Ken Burns Effect.
Google Colab, Image Generation, Image Processing, Python
- How to Train a Joint Entities and Relation Extraction Classifier using BERT Transformer with spaCy 3 - Jun 28, 2021.
A step-by-step guide on how to train a relation extraction classifier using Transformer and spaCy 3.
BERT, NLP, Python, spaCy, Text Analytics, Transformer
- Applied Language Technology: A No-Nonsense Approach - Jun 25, 2021.
Here is a free entry-level applied natural language processing course that can fit into any beginner's roadmap to understanding NLP. Check it out.
NLP, Python, Text Analytics
- How to create an interactive 3D chart and share it easily with anyone - Jun 25, 2021.
This is a short tutorial on a great Plotly feature.
Data Visualization, Graph, Python
- 10 Python Code Snippets We Should All Know - Jun 24, 2021.
Check out these Python code snippets and start using them to solve everyday problems.
Programming, Python
- Workflow Orchestration with Prefect and Coiled - Jun 23, 2021.
Coiled helps data scientists use Python for ambitious problems, scaling to the cloud for computing power, ease, and speed—all tuned for the needs of teams and enterprises. In this demo example, see how to spin up a Coiled cluster to execute Prefect jobs during runtime.
Coiled.io, Modern Data Stack, Orchestration, Prefect, Python, Workflow
- Create and Deploy Dashboards using Voila and Saturn Cloud - Jun 23, 2021.
Working with and training large datasets, maintaining them all in one place, and deploying them to production is a challenging job. In this article, we covered what Saturn Cloud is and how it can speed up your end-to-end pipeline, how to create dashboards using Voila and Python and publish them to production in just a few easy steps.
Analytics, Cloud, Dashboard, Data Science, Machine Learning, Python
- Fine-Tuning Transformer Model for Invoice Recognition - Jun 23, 2021.
The author presents a step-by-step guide from annotation to training.
Business Analytics, Image Classification, NLP, Python, Transformer
- KDnuggets™ News 21:n23, Jun 23: Pandas vs SQL: When Data Scientists Should Use Each Tool; How to Land a Data Analytics Job in 6 Months - Jun 23, 2021.
Pandas vs SQL: When Data Scientists Should Use Each Tool; How to Land a Data Analytics Job in 6 Months; A Graph-based Text Similarity Method with Named Entity Information in NLP; The Best Way to Learn Practical NLP?; An introduction to Explainable AI (XAI) and Explainable Boosting Machines (EBM)
Analytics, Career Advice, Data Scientist, Explainability, NLP, Pandas, Python, SQL
- How to troubleshoot memory problems in Python - Jun 21, 2021.
Memory problems are hard to diagnose and fix in Python. This post goes through a step-by-step process for how to pinpoint and fix memory leaks using popular open source Python packages.
Programming, Python
- Dashboards for Interpreting & Comparing Machine Learning Models - Jun 17, 2021.
This article discusses using Interpret to create dashboards for machine learning models.
Interpretability, Machine Learning, Modeling, Python
- KDnuggets™ News 21:n22, Jun 16: Data Scientists Extinct in 10 Years? Generate Automated PDF Documents with Python - Jun 16, 2021.
Will Data Scientists be extinct in 10 years?; How to generate PDF Documents with Python; Top 10 Data Science Projects for Beginners; Five types of thinking for a high performing data scientist; and how to get interactive plots directly with Pandas.
Career Advice, Data Scientist, PDF, Project, Python, Trends
- Get Interactive Plots Directly With Pandas - Jun 14, 2021.
Telling a story with data is a core function for any Data Scientist, and creating data visualizations that are simultaneously illuminating and appealing can be challenging. This tutorial reviews how to create Plotly and Bokeh plots directly through Pandas plotting syntax, which will help you convert static visualizations into interactive counterparts -- and take your analysis to the next level.
Bokeh, Data Visualization, Pandas, Plotly, Python
- Building a Knowledge Graph for Job Search Using BERT - Jun 14, 2021.
A guide on how to create knowledge graphs using NER and Relation Extraction.
BERT, Careers, Data Science Skills, Knowledge Graph, NLP, Python, Search, Transformer

- How to Generate Automated PDF Documents with Python - Jun 10, 2021.
Discover how to leverage automation to create dazzling PDF documents effortlessly.
Data Visualization, PDF, Programming, Python
- KDnuggets™ News 21:n21, Jun 9: 5 Tasks To Automate With Python; How I Doubled My Income with Data Science and Machine Learning - Jun 9, 2021.
5 Tasks To Automate With Python; How I Doubled My Income with Data Science and Machine Learning; Will There Be a Shortage of Data Science Jobs in the Next 5 Years?; How to Make Python Code Run Incredibly Fast; Stop (and Start) Hiring Data Scientists
Automation, Career Advice, Data Science, Data Scientist, Deployment, Machine Learning, Modeling, Programming, Python
- The only Jupyter Notebooks extension you truly need - Jun 8, 2021.
Now you don’t need to restart the kernel after editing the code in your custom imports.
Deployment, Jupyter, Machine Learning, Python
- How to Fine-Tune BERT Transformer with spaCy 3 - Jun 7, 2021.
A step-by-step guide on how to create a knowledge graph using NER and Relation Extraction.
BERT, Knowledge Graph, NLP, Python, spaCy, Transformer
- PyCaret 101: An introduction for beginners - Jun 7, 2021.
This article is a great overview of how to get started with PyCaret for all your machine learning projects.
Machine Learning, PyCaret, Python

- 5 Tasks To Automate With Python - Jun 4, 2021.
Here are 5 tasks you can automate with Python, and how to do it.
Automation, Programming, Python
- Machine Learning Model Interpretation - Jun 2, 2021.
Read this overview of using Skater to build machine learning visualizations.
Explainability, Interpretability, Machine Learning, Python
- How to Make Python Code Run Incredibly Fast - Jun 2, 2021.
In this article, I have explained some tips and tricks to optimize and speed up Python code.
Optimization, Performance, Programming, Python
- How to Create and Deploy a Simple Sentiment Analysis App via API - Jun 1, 2021.
In this article we will create a simple sentiment analysis app using the HuggingFace Transformers library, and deploy it using FastAPI.
FastAPI, Hugging Face, NLP, Python, Sentiment Analysis, Transformer
- Make Pandas 3 Times Faster with PyPolars - May 31, 2021.
Learn how to speed up your Pandas workflow using the PyPolars library.
Pandas, Performance, Python
- Supercharge Your Machine Learning Experiments with PyCaret and Gradio - May 31, 2021.
A step-by-step tutorial to develop and interact with machine learning pipelines rapidly.
Deployment, Machine Learning, Pipeline, PyCaret, Python
- Topic Modeling with Streamlit - May 26, 2021.
What does it take to create and deploy a topic modeling web application quickly? Read this post to see how the author uses Python NLP packages for topic modeling, Streamlit for the web application framework, and Streamlit Sharing for deployment.
Deployment, NLP, Python, spaCy, Streamlit, Text Analytics, Topic Modeling
- Write and train your own custom machine learning models using PyCaret - May 25, 2021.
A step-by-step, beginner-friendly tutorial on how to write and train custom machine learning models in PyCaret.
Machine Learning, Modeling, PyCaret, Python, Training
- How to Deal with Categorical Data for Machine Learning - May 24, 2021.
Check out this guide to implementing different types of encoding for categorical data, including a cheat sheet on when to use what type.
Data Preparation, Data Preprocessing, Feature Engineering, Machine Learning, Pandas, Python, scikit-learn
- Building RESTful APIs using Flask - May 21, 2021.
Learn about using the lightweight web framework in Python from this article.
API, Flask, Python, RESTful API
- How to Determine if Your Machine Learning Model is Overtrained - May 20, 2021.
WeightWatcher is based on theoretical research (done jointly with UC Berkeley) into Why Deep Learning Works, based on our Theory of Heavy Tailed Self-Regularization (HT-SR). It uses ideas from Random Matrix Theory (RMT), Statistical Mechanics, and Strongly Correlated Systems.
Learning, Modeling, Python, Training
- Differentiable Programming from Scratch - May 19, 2021.
In this article, we are going to explain what Differentiable Programming is by developing from scratch all the tools needed for this exciting new kind of programming.
Mathematics, Programming, Python
- KDnuggets™ News 21:n19, May 19: Vaex: Pandas but 1000x faster; The Most In Demand Skills for Data Engineers in 2021 - May 19, 2021.
Vaex: Pandas but 1000x faster; Best Python Books for Beginners and Advanced Programmers; The Most In Demand Skills for Data Engineers in 2021; The next-generation of AutoML frameworks; and more.
Data Engineer, Python, Vaex
- Animated Bar Chart Races in Python - May 18, 2021.
A quick, step-by-step beginner's project to create an animated bar graph for an amazing Covid dataset.
COVID-19, Data Science, Data Visualization, Pandas, Python, Visualization
- The Most In Demand Skills for Data Engineers in 2021 - May 18, 2021.
If you are preparing to make a career in data or are looking for opportunities to skill-up in your current data-centric role, then this analysis of in-demand skills for 2021, based on over 17,000 Data Engineer job postings, should offer you a good idea as to which programming languages and software tools are increasing and decreasing in importance.
Apache Spark, AWS, Data Engineer, Data Science Skills, Data Scientist, Python, Skills, SQL
- Easy MLOps with PyCaret + MLflow - May 18, 2021.
A beginner-friendly, step-by-step tutorial on integrating MLOps in your Machine Learning experiments using PyCaret.
Machine Learning, MLflow, MLOps, PyCaret, Python
- Best Python Books for Beginners and Advanced Programmers - May 14, 2021.
Let's take a look at nine of the best Python books for both beginners and advanced programmers, covering topics such as data science, machine learning, deep learning, NLP, and more.
Analytics, Books, Data Science, Deep Learning, Machine Learning, Python
- Super Charge Python with Pandas on GPUs Using Saturn Cloud - May 12, 2021.
Saturn Cloud is a tool that gives you 10 hours of GPU computing and 3 hours of Dask Cluster computing a month for free. In this tutorial, you will learn how to use these free resources to process data using Pandas on a GPU. The experiments show that Pandas is over 1,000,000% slower on a CPU as compared to running Pandas on a Dask cluster of GPUs.
Cloud, GPU, Pandas, Python
- KDnuggets™ News 21:n18, May 12: Data Preparation in SQL, with Cheat Sheet!; Rebuilding 7 Python Projects - May 12, 2021.
Data Preparation in SQL, with Cheat Sheet!; Rebuilding My 7 Python Projects; Applying Python’s Explode Function to Pandas DataFrames; Essential Linear Algebra for Data Science and Machine Learning; Similarity Metrics in NLP
Cheat Sheet, Data Preparation, Data Science, Linear Algebra, Machine Learning, Metrics, NLP, Pandas, Project, Python, SQL

- Essential Linear Algebra for Data Science and Machine Learning - May 10, 2021.
Linear algebra is foundational in data science and machine learning. Beginners starting out along their learning journey in data science--as well as established practitioners--must develop a strong familiarity with the essential concepts in linear algebra.
Data Science Education, Data Visualization, Linear Algebra, Linear Regression, Mathematics, Python
- Ensemble Methods Explained in Plain English: Bagging - May 10, 2021.
Understand the intuition behind bagging with examples in Python.
Algorithms, Bagging, Ensemble Methods, Python
- Applying Python’s Explode Function to Pandas DataFrames - May 7, 2021.
Read about this applied Python method for solving the issue of accessing columns by date/year using the Pandas library and the functions lambda(), list(), map() & explode().
Data Analysis, Pandas, Programming, Python
- A Comprehensive Guide to Ensemble Learning – Exactly What You Need to Know - May 6, 2021.
This article covers ensemble learning methods, and exactly what you need to know in order to understand and implement them.
CatBoost, Ensemble Methods, Machine Learning, Python, random forests algorithm, scikit-learn, XGBoost
- Rebuilding My 7 Python Projects - May 5, 2021.
This is how I rebuilt My Python Projects: Data Science, Web Development & Android Apps.
Data Science, Programming, Project, Python
- How To Generate Meaningful Sentences Using a T5 Transformer - May 3, 2021.
Read this article to see how to develop a text generation API using the T5 transformer.
API, Hugging Face, Natural Language Generation, NLP, Python, Transformer
- XGBoost Explained: DIY XGBoost Library in Less Than 200 Lines of Python - May 3, 2021.
Understand how XGBoost works with a simple 200 lines of code that implement gradient boosting for decision trees.
Algorithms, Machine Learning, Python, XGBoost
- Gradient Boosted Decision Trees – A Conceptual Explanation - Apr 30, 2021.
Gradient boosted decision trees involve implementing several models and aggregating their results. These boosted models have become popular thanks to their performance in machine learning competitions on Kaggle. In this article, we’ll see what gradient boosted decision trees are all about.
CatBoost, Decision Trees, Gradient Boosting, Machine Learning, Python, scikit-learn, XGBoost
- Feature Engineering of DateTime Variables for Data Science, Machine Learning - Apr 29, 2021.
Learn how to make more meaningful features from DateTime type variables to be used by Machine Learning Models.
Data Science, Feature Engineering, Machine Learning, Python
- Multiple Time Series Forecasting with PyCaret - Apr 27, 2021.
A step-by-step tutorial to forecast multiple time series with PyCaret.
Forecasting, Machine Learning, PyCaret, Python, Time Series
- Production-Ready Machine Learning NLP API with FastAPI and spaCy - Apr 21, 2021.
Learn how to implement an API based on FastAPI and spaCy for Named Entity Recognition (NER), and see why the author used FastAPI to quickly build a fast and robust machine learning API.
API, FastAPI, NLP, Production, Python, spaCy
- Time Series Forecasting with PyCaret Regression Module - Apr 21, 2021.
PyCaret is an alternate low-code library that can be used to replace hundreds of lines of code with only a few lines. See how to use PyCaret's Regression Module for Time Series Forecasting.
Machine Learning, PyCaret, Python, Regression, Time Series
- Top 10 Data Science Courses to Take in 2021 - Apr 20, 2021.
Whether you are getting started with Data Science / Machine Learning or are an experienced professional looking to learn something new, check out these top 10 data science courses for 2021.
Coursera, Data Science Education, Google Analytics, IBM, Online Education, Python, SQL, Stanford
- Data Analysis Using Tableau - Apr 20, 2021.
Read this overview of using Tableau for sales data analysis, and see how visualization can help tell the business story.
Business, Data Analysis, Ecommerce, Python, Sales, Tableau
- Essential Math for Data Science: Linear Transformation with Matrices - Apr 16, 2021.
You’ll start seeing matrices, not only as operations on numbers, but also as a way to transform vector spaces. This conception will give you the foundations needed to understand more complex linear algebra concepts like matrix decomposition.
Data Science, Linear Algebra, Mathematics, Python
- The Most In-Demand Skills for Data Scientists in 2021 - Apr 15, 2021.
If you are preparing to make a career as a Data Scientist or are looking for opportunities to skill-up in your current role, this analysis of in-demand skills for 2021, based on over 15,000 Data Scientist job postings, should offer you a good idea as to which programming languages and software tools are increasing and decreasing in importance.
AWS, Data Science Skills, Python, PyTorch, R, scikit-learn, SQL, TensorFlow
- Is Your Model Overtrained? - Apr 14, 2021.
WeightWatcher is based on theoretical research (done jointly with UC Berkeley) into Why Deep Learning Works, based on our Theory of Heavy Tailed Self-Regularization (HT-SR). It uses ideas from Random Matrix Theory (RMT), Statistical Mechanics, and Strongly Correlated Systems.
Learning, Modeling, Python, Training
- Automated Anomaly Detection Using PyCaret - Apr 13, 2021.
Learn to automate anomaly detection using the open source machine learning library PyCaret.
Anomaly Detection, Machine Learning, PyCaret, Python
- How to Apply Transformers to Any Length of Text - Apr 12, 2021.
Read on to find how to restore the power of NLP for long sequences.
BERT, NLP, Python, Text Analytics, Transformer
- E-commerce Data Analysis for Sales Strategy Using Python - Apr 7, 2021.
Check out this informative and concise case study applying data analysis using Python to a well-defined e-commerce scenario.
Business, Data Analysis, Ecommerce, Python, Sales
- KDnuggets™ News 21:n13, Apr 7: Top 10 Python Libraries Data Scientists should know in 2021; KDnuggets Top Blogs Reward Program; Making Machine Learning Models Understandable - Apr 7, 2021.
Top 10 Python Libraries Data Scientists should know in 2021; KDnuggets Top Blogs Reward Program; Shapash: Making Machine Learning Models Understandable; Easy AutoML in Python; The 8 Most Common Data Scientists; A/B Testing: 7 Common Questions and Answers in Data Science Interviews, Part 1
Automated Machine Learning, AutoML, Data Science, Data Scientist, Explainable AI, Interpretability, Machine Learning, MLOps, Python
- Automated Text Classification with EvalML - Apr 6, 2021.
Learn how EvalML leverages Woodwork, Featuretools and the nlp-primitives library to process text data and create a machine learning model that can detect spam text messages.
Automated Machine Learning, AutoML, NLP, Python, Text Analytics, Text Classification
- The Best Machine Learning Frameworks & Extensions for TensorFlow - Apr 5, 2021.
Check out this curated list of useful frameworks and extensions for TensorFlow.
Machine Learning, Python, TensorFlow
- Shapash: Making Machine Learning Models Understandable - Apr 2, 2021.
Establishing an expectation for trust around AI technologies may soon become one of the most important skills provided by Data Scientists. Significant research investments are underway in this area, and new tools are being developed, such as Shapash, an open-source Python library that helps Data Scientists make machine learning models more transparent and understandable.
Explainability, Machine Learning, Python, SHAP
- Easy AutoML in Python - Apr 1, 2021.
We’re excited to announce that a new open-source project has joined the Alteryx open-source ecosystem. EvalML is a library for automated machine learning (AutoML) and model understanding, written in Python.
Automated Machine Learning, AutoML, Machine Learning, Open Source, Python
- Software Engineering Best Practices for Data Scientists - Mar 30, 2021.
This is a crash course on how to bridge the gap between data science and software engineering.
Data Science, Data Scientist, Programming, Python, Software Engineering
- Data Science Curriculum for Professionals - Mar 25, 2021.
If you are looking to expand or transition your current professional career that is buried in spreadsheet analysis into one powered by data science, then you are in for an exciting but complex journey with much to explore and master. To begin your adventure, follow this complete road map, which will guide you from a gnome in the forest of spreadsheets to an AI wizard known far and wide throughout the kingdom.
Cloud Computing, Data Science Education, Data Visualization, Machine Learning, Python, R, Roadmap, Statistics
- Extraction of Objects In Images and Videos Using 5 Lines of Code - Mar 25, 2021.
PixelLib is a library created for easy integration of image and video segmentation in real-life applications. Learn to use PixelLib to extract objects in images and videos with minimal code.
Computer Vision, Image Processing, Object Detection, Python, Segmentation, Video

- Top 10 Python Libraries Data Scientists should know in 2021 - Mar 24, 2021.
So many Python libraries exist that offer powerful and efficient foundations for supporting your data science work and machine learning model development. While the list may seem overwhelming, there are certain libraries you should focus your time on, as they are some of the most commonly used today.
Data Science, Keras, numpy, Pandas, Python, scikit-learn, Seaborn, TensorFlow
- Rejection Sampling with Python - Mar 24, 2021.
Read this article on rejection sampling with examples using the Normal and Cauchy Distributions.
Distribution, Probability, Python, Sampling, Statistics
- The Best Machine Learning Frameworks & Extensions for Scikit-learn - Mar 22, 2021.
Learn how to use a selection of packages to extend the functionality of Scikit-learn estimators.
Machine Learning, Python, scikit-learn
- How to build a DAG Factory on Airflow - Mar 19, 2021.
A guide to building efficient DAGs with half of the code.
Data Engineering, Data Workflow, Graphs, Python, Workflow
- A Simple Way to Time Code in Python - Mar 18, 2021.
Read on to find out how to use a decorator to time your functions.
Optimization, Programming, Python
- How to Begin Your NLP Journey - Mar 17, 2021.
In this blog post, learn how to process text using Python.
NLP, Python, Text Analytics
- KDnuggets™ News 21:n11, Mar 17: Is Data Scientist still a satisfying job? How To Overcome The Fear of Math and Learn Math For Data Science - Mar 17, 2021.
Must Know for Data Scientists and Data Analysts: Causal Design Patterns; Know your data much faster with the new Sweetviz Python library; The Inferential Statistics Data Scientists Should Know; Natural Language Processing Pipelines, Explained
Career, Data Science, Data Scientist, Data Visualization, Mathematics, Python, Statistics, Survey
- Natural Language Processing Pipelines, Explained - Mar 16, 2021.
This article presents a beginner's view of NLP, as well as an explanation of how a typical NLP pipeline might look.
Explained, NLP, NLTK, Python, Text Analytics
- Kedro-Airflow: Orchestrating Kedro Pipelines with Airflow - Mar 12, 2021.
The Kedro team and Astronomer have released Kedro-Airflow 0.4.0 to help you develop modular, maintainable & reproducible code with orchestration superpowers!
Data Science, Interview, Pipeline, Python, Workflow
- Know your data much faster with the new Sweetviz Python library - Mar 12, 2021.
One of the latest exploratory data analysis libraries is a new open-source Python library called Sweetviz, built for finding out data types, missing information, distribution of values, correlations, etc. Find out more about the library and how to use it here.
Data Analysis, Data Exploration, Data Visualization, Python
- How to Speed Up Pandas with Modin - Mar 10, 2021.
The Modin library can scale your pandas workflows with a one-line code change, and it integrates with the Python ecosystem and Ray clusters. This tutorial goes over how to get started with Modin and how it can speed up your pandas workflows.
Data Science, Distributed Systems, Modin, Pandas, Python, Workflow
- 4 Machine Learning Concepts I Wish I Knew When I Built My First Model - Mar 9, 2021.
Diving into building your first machine learning model will be an adventure -- one in which you will learn many important lessons the hard way. However, by following these four tips, your first and subsequent models will be put on a path toward excellence.
Feature Selection, Gradio, Hyperparameter, Machine Learning, Metrics, Python
- Beautiful decision tree visualizations with dtreeviz - Mar 8, 2021.
Improve the old way of plotting the decision trees and never go back!
Algorithms, Data Visualization, Decision Trees, Python
- 11 Essential Code Blocks for Complete EDA (Exploratory Data Analysis) - Mar 5, 2021.
This article is a practical guide to exploring any data science project and gaining valuable insights.
Data Analysis, Data Exploration, Data Visualization, Pandas, Python
- Bayesian Hyperparameter Optimization with tune-sklearn in PyCaret - Mar 5, 2021.
PyCaret, a low-code Python ML library, offers several ways to tune the hyper-parameters of a created model. In this post, I'd like to show how Ray Tune is integrated with PyCaret, and how easy it is to leverage its algorithms and distributed computing to achieve results superior to the default random search method.
Bayesian, Hyperparameter, Machine Learning, Optimization, PyCaret, Python, scikit-learn
- 9 Skills You Need to Become a Data Engineer - Mar 4, 2021.
Data engineering is a fast-growing profession with amazing challenges and rewards. Which skills do you need to become a data engineer? In this post, we’ll take a look at both hard and soft skills.
AWS, Career Advice, Data Engineer, Data Engineering, Data Science Skills, Hadoop, NoSQL, Python, SQL
- 15 common mistakes data scientists make in Python (and how to fix them) - Mar 3, 2021.
Writing Python code that works for your data science project and performs the task you expect is one thing. Ensuring your code is readable by others (including your future self), reproducible, and efficient are entirely different challenges that can be addressed by minimizing common bad practices in your development.
Best Practices, Data Scientist, Jupyter, Mistakes, Programming, Python
- Getting Started with Distributed Machine Learning with PyTorch and Ray - Mar 3, 2021.
Ray is a popular framework for distributed Python that can be paired with PyTorch to rapidly scale machine learning applications.
Distributed Systems, Machine Learning, Python, PyTorch
- Speech to Text with Wav2Vec 2.0 - Mar 2, 2021.
Facebook recently introduced and open-sourced their new framework for self-supervised learning of representations from raw audio data called Wav2Vec 2.0. Learn more about it and how to use it here.
Hugging Face, NLP, Python, PyTorch, Transformer

- Are You Still Using Pandas to Process Big Data in 2021? Here are two better options - Mar 1, 2021.
When it’s time to handle a lot of data -- so much that you are in the realm of Big Data -- what tools can you use to wrangle the data, especially in a notebook environment? Pandas doesn’t handle really Big Data very well, but two other libraries do. So, which one is better and faster?
Big Data, Dask, Data Preparation, Pandas, Python, Vaex
- Data Science Learning Roadmap for 2021 - Feb 26, 2021.
Venturing into the world of Data Science is an exciting, interesting, and rewarding path to consider. There is a great deal to master, and this self-learning recommendation plan will guide you toward establishing a solid understanding of all that is foundational to data science as well as a solid portfolio to showcase your developed expertise.
Data Engineering, Data Preparation, Data Science, Data Science Education, Python, Roadmap, SQL
- Pandas Profiling: One-Line Magical Code for EDA - Feb 24, 2021.
EDA can be automated using a Python library called Pandas Profiling. Let’s explore Pandas Profiling to do EDA in a very short time and with just a single line of code.
Data Analysis, Data Exploration, Data Science, Pandas, Python
- KDnuggets™ News 21:n08, Feb 24: Powerful Exploratory Data Analysis in just two lines of code; Cartoon: Data Scientist vs Data Engineer - Feb 24, 2021.
Powerful Exploratory Data Analysis in just two lines of code; Cartoon: Data Scientist vs Data Engineer; Evaluating Deep Learning Models: The Confusion Matrix, Accuracy, Precision, and Recall; Feature Store as a Foundation for Machine Learning; Approaching (Almost) Any Machine Learning Problem
Cartoon, Data Analysis, Data Engineering, Data Science, Deep Learning, Machine Learning, Metrics, Python
- Powerful Exploratory Data Analysis in just two lines of code - Feb 22, 2021.
EDA is a fundamental early process for any Data Science investigation. Typical approaches for visualization and exploration are powerful, but can be cumbersome for getting to the heart of your data. Now, you can get to know your data much faster with only a few lines of code... and it might even be fun!
Data Analysis, Data Exploration, Data Visualization, Python
- Multidimensional multi-sensor time-series data analysis framework - Feb 19, 2021.
This blog post provides an overview of the package “msda”, which is useful for time-series sensor data analysis. A quick introduction to time-series data is also provided.
Data Analysis, Python, Sensors, Time Series
- Approaching (Almost) Any Machine Learning Problem - Feb 18, 2021.
This freely-available book is a fantastic walkthrough of practical approaches to machine learning problems.
Deep Learning, Free ebook, Machine Learning, Python
- Distributed and Scalable Machine Learning [Webinar] - Feb 17, 2021.
Mike McCarty and Gil Forsyth work at the Capital One Center for Machine Learning, where they are building internal PyData libraries that scale with Dask and RAPIDS. For this webinar, Feb 23 @ 2 pm PST, 5pm EST, they’ll join Hugo Bowne-Anderson and Matthew Rocklin to discuss their journey to scale data science and machine learning in Python.
Capital One, Dask, Distributed, Machine Learning, Python, scikit-learn, XGBoost
- 10 resources for data science self-study - Feb 17, 2021.
Many resources exist for the self-study of data science. In our modern age of information technology, an enormous amount of free learning resources are available to anyone, and with effort and dedication, you can master the fundamentals of data science.
Data Science, Data Science Certificate, Data Science Education, Kaggle, MOOC, Python, Youtube
- Easy, Open-Source AutoML in Python with EvalML - Feb 16, 2021.
We’re excited to announce that a new open-source project has joined the Alteryx open-source ecosystem. EvalML is a library for automated machine learning (AutoML) and model understanding, written in Python.
Automated Machine Learning, AutoML, Machine Learning, Open Source, Python
- How to Speed up Scikit-Learn Model Training - Feb 11, 2021.
Scikit-Learn is an easy-to-use Python library for machine learning. However, sometimes scikit-learn models can take a long time to train. The question becomes, how do you create the best scikit-learn model in the least amount of time?
Distributed Systems, Hyperparameter, Machine Learning, Optimization, Parallelism, Python, scikit-learn, Training
- 7 Most Recommended Skills to Learn to be a Data Scientist - Feb 10, 2021.
The Data Scientist profession has emerged as a true interdisciplinary role that spans a variety of skills, theoretical and practical. For the core, day-to-day activities, many critical requirements that enable the delivery of real business value reach well outside the realm of machine learning, and should be mastered by those aspiring to the field.
Career Advice, Data Science Skills, Data Scientist, Data Visualization, Docker, Pandas, Python, SQL
- KDnuggets™ News 21:n06, Feb 10: The Best Data Science Project to Have in Your Portfolio; Deep learning doesn’t need to be a black box - Feb 10, 2021.
The Best Data Science Project to Have in Your Portfolio; Deep learning doesn’t need to be a black box; Build Your First Data Science Application; How to create stunning visualizations using python from scratch; How to Get Your First Job in Data Science without Any Work Experience
Career Advice, Data Science, Data Visualization, Deep Learning, Explainability, Portfolio, Python
- How to Deploy a Flask API in Kubernetes and Connect it with Other Micro-services - Feb 9, 2021.
A hands-on tutorial on how to implement your micro-service architecture using the powerful container orchestration tool Kubernetes.
API, Containers, Flask, Kubernetes, MySQL, Python, SQL
Essential Math for Data Science: Introduction to Matrices and the Matrix Product - Feb 5, 2021.
Like vectors, matrices are data structures that let you organize numbers. They are square or rectangular arrays containing values organized in two dimensions: rows and columns. You can think of them as a spreadsheet. Learn more here.
Data Science, Linear Algebra, Mathematics, numpy, Python
Build Your First Data Science Application - Feb 4, 2021.
Check out these seven Python libraries to make your first data science MVP application.
API, Data Science, Jupyter, Keras, numpy, Pandas, Plotly, Python, PyTorch, scikit-learn

How to create stunning visualizations using python from scratch - Feb 4, 2021.
Data science and data analytics can be beautiful things. Not only because of the insights and enhancements to decision-making they can provide, but because of the rich visualizations about the data that can be created. Following this step-by-step guide using the Matplotlib and Seaborn libraries will help you improve the presentation and effective communication of your work.
Data Visualization, Matplotlib, Python, Seaborn
Getting Started with 5 Essential Natural Language Processing Libraries - Feb 3, 2021.
This article is an overview of how to get started with 5 popular Python NLP libraries, from those for linguistic data visualization, to data preprocessing, to multi-task functionality, to state-of-the-art language modeling, and beyond.
Data Preparation, Data Preprocessing, Data Visualization, Hugging Face, NLP, Python, spaCy, Text Analytics, Transformer
- Working With The Lambda Layer in Keras - Jan 28, 2021.
In this tutorial we'll cover how to use the Lambda layer in Keras to build, save, and load models which perform custom operations on your data.
Architecture, Keras, Neural Networks, Python
- Mastering TensorFlow Variables in 5 Easy Steps - Jan 20, 2021.
Learn how to use TensorFlow Variables, their differences from plain Tensor objects, and when they are preferred over these Tensor objects | Deep Learning with TensorFlow 2.x.
Neural Networks, Python, TensorFlow
- Loglet Analysis: Revisiting COVID-19 Projections - Jan 20, 2021.
We will show that the decomposition of growth into S-shaped logistic components, also known as Loglet analysis, is more accurate because it takes into account the evolution of multiple COVID-19 waves.
COVID-19, Data Analysis, Python
- Comprehensive Guide to the Normal Distribution - Jan 18, 2021.
Drop in for some tips on how this fundamental statistics concept can improve your data science.
Distribution, Normal Distribution, Python, SciPy, Statistics
- Snowflake and Saturn Cloud Partner To Bring 100x Faster Data Science to Millions of Python Users - Jan 15, 2021.
Snowflake, the cloud data platform, is partnering, integrating products, and pursuing a joint go-to-market with Saturn Cloud to help data science teams get 100x faster results. Read more about developments and how to get started here.
Data Science, Python, Saturn Cloud, Snowflake
Cleaner Data Analysis with Pandas Using Pipes - Jan 15, 2021.
Check out this practical guide on Pandas pipes.
Data Analysis, Data Cleaning, Pandas, Pipeline, Python
- KDnuggets™ News 21:n02, Jan 13: Best Python IDEs and Code Editors; 10 Underappreciated Python Packages for Machine Learning Practitioners - Jan 13, 2021.
Best Python IDEs and Code Editors You Should Know; 10 Underappreciated Python Packages for Machine Learning Practitioners; Top 10 Computer Vision Papers 2020; CatalyzeX: A must-have browser extension for machine learning engineers and researchers
Computer Vision, Data Science, DevOps, IDE, Machine Learning, MLOps, Python, Research
- Creating Good Meaningful Plots: Some Principles - Jan 12, 2021.
Here are some thought starters to help you create meaningful plots.
Charts, Data Visualization, Python, R
- 5 Tools for Effortless Data Science - Jan 11, 2021.
The sixth tool is coffee.
Data Science, Data Science Tools, Keras, Machine Learning, MLflow, PyCaret, Python
Best Python IDEs and Code Editors You Should Know - Jan 8, 2021.
Developing machine learning algorithms requires implementing countless libraries and integrating many supporting tools and software packages. All this magic must be written by you in yet another tool -- the IDE -- that is fundamental to all your code work and can drive your productivity. These top Python IDEs and code editors are among the best tools available for you to consider, and are reviewed with their noteworthy features.
IDE, Jupyter, PyCharm, Python, Visual Studio Code
10 Underappreciated Python Packages for Machine Learning Practitioners - Jan 7, 2021.
Here are 10 underappreciated Python packages covering neural architecture design, calibration, UI creation and dissemination.
Deployment, Neural Networks, Python, UI/UX
- KDnuggets™ News 21:n01, Jan 6: All machine learning algorithms you should know in 2021; Monte Carlo integration in Python; MuZero – the most important ML system ever created? - Jan 6, 2021.
The first issue in 2021 brings you a great blog about Monte Carlo Integration - in Python; An overview of main Machine Learning algorithms you need to know in 2021; SQL vs NoSQL: 7 Key Takeaways; Generating Beautiful Neural Network Visualizations - how to; MuZero - may be the most important Machine Learning system ever created; and much more!
Algorithms, Monte Carlo, MuZero, NoSQL, Python, SQL
15 Free Data Science, Machine Learning & Statistics eBooks for 2021 - Dec 31, 2020.
We present a curated list of 15 free eBooks compiled in a single location to close out the year.
Automated Machine Learning, Data Science, Deep Learning, Free ebook, Machine Learning, NLP, Python, R, Statistics
Generating Beautiful Neural Network Visualizations - Dec 30, 2020.
If you are looking to easily generate visualizations of neural network architectures, PlotNeuralNet is a project you should check out.
Neural Networks, Python, Visualization
Monte Carlo integration in Python - Dec 24, 2020.
A famous casino-inspired trick for data science, statistics, and all of science. How to do it in Python?
Monte Carlo, Python, Simulation, Statistics
- Production Machine Learning Monitoring: Outliers, Drift, Explainers & Statistical Performance - Dec 21, 2020.
A practical deep dive on production monitoring architectures for machine learning at scale using real-time metrics, outlier detectors, drift detectors, metrics servers and explainers.
AI, Deployment, Explainable AI, Machine Learning, Modeling, Outliers, Production, Python
- Fast and Intuitive Statistical Modeling with Pomegranate - Dec 21, 2020.
Pomegranate is a delicious fruit. It can also be a super useful Python library for statistical analysis. We will show how in this article.
Distribution, Markov Chains, Probability, Python, Statistical Modeling
- How to use Machine Learning for Anomaly Detection and Conditional Monitoring - Dec 16, 2020.
This article explains the goals of anomaly detection and outlines the approaches used to solve specific use cases for anomaly detection and condition monitoring.
Anomaly Detection, Machine Learning, Python, scikit-learn, Unsupervised Learning
- KDnuggets™ News 20:n47, Dec 16: A Rising Library Beating Pandas in Performance; R or Python? Why Not Both? - Dec 16, 2020.
Also: 10 Python Skills They Don't Teach in Bootcamp; Data Science Volunteering: Ways to Help; A Journey from Software to Machine Learning Engineer; Data Science and Machine Learning: The Free eBook
Data Science, Free ebook, IDE, Machine Learning, Machine Learning Engineer, Pandas, Python, R
- Data Science and Machine Learning: The Free eBook - Dec 15, 2020.
Check out the newest addition to our free eBook collection, Data Science and Machine Learning: Mathematical and Statistical Methods, and start building your statistical learning foundation today.
Data Science, Free ebook, Machine Learning, Python
- How to Create Custom Real-time Plots in Deep Learning - Dec 14, 2020.
How to generate real-time visualizations of custom metrics while training a deep learning model using Keras callbacks.
Data Visualization, Deep Learning, Keras, Metrics, Neural Networks, Python
- Matrix Decomposition Decoded - Dec 11, 2020.
This article covers matrix decomposition and the underlying concepts of eigenvalues (lambdas) and eigenvectors, and discusses the purpose behind using matrices and vectors in linear algebra.
Linear Algebra, Mathematics, numpy, PCA, Python
A Rising Library Beating Pandas in Performance - Dec 11, 2020.
This article compares the performance of the well-known pandas library with pypolars, a rising DataFrame library written in Rust. See how they compare.
Data Processing, Pandas, Performance, Python
- 10 Python Skills They Don’t Teach in Bootcamp - Dec 11, 2020.
Ascend to new heights in Data Science and Machine Learning with this thrilling list of coding tips.
Bootcamp, Programming, Python
- Implementing the AdaBoost Algorithm From Scratch - Dec 10, 2020.
AdaBoost builds on decision tree models with a depth of one, so it is a forest of stumps rather than full trees. It works by putting more weight on difficult-to-classify instances and less on those already handled well. The algorithm was developed to solve both classification and regression problems. Learn to build the algorithm from scratch here.
Adaboost, Algorithms, Ensemble Methods, Machine Learning, Python