- Parallelizing Python Code - Oct 4, 2021.
This article reviews some common options for parallelizing Python code, including process-based parallelism, specialized libraries, ipython parallel, and Ray.
Distributed Computing, Parallelism, Programming, Python, Ray
- Teaching AI to Classify Time-series Patterns with Synthetic Data - Oct 1, 2021.
How to build and train an AI model to identify various common anomaly patterns in time-series data.
AI, Classification, Python, Synthetic Data, Time Series
- How to Auto-Detect the Date/Datetime Columns and Set Their Datatype When Reading a CSV File in Pandas - Oct 1, 2021.
When read_csv() reads values such as “2021-03-04” and “2021-03-04 21:37:01.123” as mere “object” datatypes, you can often auto-convert them all at once to true datetime datatypes.
Data Processing, Pandas, Python
- How To Build A Database Using Python - Sep 28, 2021.
Implement your database without handling the SQL using the Flask-SQLAlchemy library.
Databases, Flask, Python, SQL
- Building a Structured Financial Newsfeed Using Python, SpaCy and Streamlit - Sep 28, 2021.
Getting started with NLP by building a Named Entity Recognition (NER) application.
Finance, NLP, Python, spaCy, Streamlit

- Path to Full Stack Data Science - Sep 27, 2021.
Start your journey toward mastering all aspects of the field of Data Science with this focused list of in-depth self-learning resources. Curated with the beginner in mind, these recommendations will help you learn efficiently, and can also offer existing professionals useful highlights for review or help filling in any gaps in skills.
Career Advice, Data Science, Data Science Education, Data Visualization, Mathematics, Python, R, Roadmap
- Zero to RAPIDS in Minutes with NVIDIA GPUs + Saturn Cloud - Sep 27, 2021.
Managing large-scale data science infrastructure presents significant challenges. With Saturn Cloud, managing GPU-based infrastructure is made easier, allowing practitioners and enterprises to focus on solving their business challenges.
GPU, NVIDIA, Python, Saturn Cloud
- How To Deal With Imbalanced Classification, Without Re-balancing the Data - Sep 23, 2021.
Before considering oversampling your skewed data, try adjusting your classification decision threshold, in Python.
Balancing Classes, Classification, Python, Unbalanced
- 9 Outstanding Reasons to Learn Python for Finance - Sep 23, 2021.
Is Python good for learning finance and working in the financial world? The answer is not only a resounding YES, but yes for nine very good reasons. This article gets into the details behind why Python is a must-know programming language for anyone who wants to work in the financial sector.
Finance, Python
- KDnuggets™ News 21:n36, Sep 22: The Machine & Deep Learning Compendium Open Book; Easy SQL in Native Python - Sep 22, 2021.
The Machine & Deep Learning Compendium Open Book; Easy SQL in Native Python; Introduction to Automated Machine Learning; How to be a Data Scientist without a STEM degree; What Is The Real Difference Between Data Engineers and Data Scientists?
Automated Machine Learning, AutoML, Books, Data Engineer, Data Scientist, Machine Learning, Python, SQL
- 15 Must-Know Python String Methods - Sep 21, 2021.
It is not always about numbers.
Data Processing, NLP, Python, Text Analytics
- If You Can Write Functions, You Can Use Dask - Sep 21, 2021.
This article is the second article of an ongoing series on using Dask in practice. Each article in this series will be simple enough for beginners, but provide useful tips for real work. The first article in the series is about using LocalCluster.
Cloud, Dask, Python, Saturn Cloud
- How to be a Data Scientist without a STEM degree - Sep 20, 2021.
Breaking into data science as a professional does require technical skills, a well-honed knack for problem-solving, and a willingness to swim in oceans of data. Maybe you are coming in as a career change or ready to take a new learning path in life--without having previously earned an advanced degree in a STEM field. Follow these tips to find your way into this high-demand and interesting field.
Career Advice, Data Science Education, Data Scientist, Project, Python, SQL
- Adventures in MLOps with Github Actions, Iterative.ai, Label Studio and NBDEV - Sep 16, 2021.
This article documents the authors' experience building their custom MLOps approach.
GitHub, Machine Learning, MLOps, Pipeline, Python, Workflow
- Introduction to Automated Machine Learning - Sep 15, 2021.
AutoML enables developers with limited ML expertise (and coding experience) to train high-quality models specific to their business needs. For this article, we will focus on AutoML systems which cater to everyday business and technology applications.
Automated Machine Learning, AutoML, Machine Learning, Python
- How to get Python PCAP Certification: Roadmap, Resources, Tips For Success, Based On My Experience - Sep 15, 2021.
Follow this journey of personal experience -- with useful tips and learning resources -- to help you achieve the PCAP Certification, one of the most reputed Python Certifications, to validate your knowledge against International Standards.
Advice, Certification, Python, Tips
- 5 Must Try Awesome Python Data Visualization Libraries - Sep 15, 2021.
The goal of data visualization is to communicate data or information clearly and effectively to readers. Here are 5 must try awesome Python libraries for helping you do so, with overviews and links to quick start guides for each.
Data Visualization, Matplotlib, Plotly, Python, Seaborn
- KDnuggets™ News 21:n35, Sep 15: A Data Science Portfolio That Will Land You The Job; Top 18 Low-Code and No-Code Machine Learning Platforms - Sep 15, 2021.
Here is a Data Science Portfolio that will land you the job; Review the top 18 Low-Code and No-Code Machine Learning platforms; Try these 8 Deep Learning Project Ideas for Beginners; Very useful - working with Python APIs for data science project.
API, Deep Learning, Low-Code, No-Code, Portfolio, Project, Python
- An Introduction to Reinforcement Learning with OpenAI Gym, RLlib, and Google Colab - Sep 14, 2021.
Get an Introduction to Reinforcement Learning by attempting to balance a virtual CartPole with OpenAI Gym, RLlib, and Google Colab.
Google Colab, OpenAI, Python, Reinforcement Learning
- The Prefect Way to Automate & Orchestrate Data Pipelines - Sep 13, 2021.
I am migrating all my ETL work from Airflow to this super-cool framework.
Airflow, Data Workflow, Pipeline, Prefect, Python
- Working with Python APIs For Data Science Project - Sep 10, 2021.
In this article, we will work with the YouTube Python API to collect video statistics from our channel, using the requests Python library to make an API call and save the result as a Pandas DataFrame.
API, Data Science, Project, Python
- How to Create an AutoML Pipeline Optimization Sandbox - Sep 9, 2021.
In this article, we will implement an automated machine learning pipeline optimization sandbox web app using Streamlit and TPOT.
Automated Machine Learning, AutoML, Python, Streamlit
- KDnuggets™ News 21:n34, Sep 8: Do You Read Excel Files with Python? There is a 1000x Faster Way; Hypothesis Testing Explained - Sep 8, 2021.
Do You Read Excel Files with Python? There is a 1000x Faster Way; Hypothesis Testing Explained; Data Science Cheat Sheet 2.0; 6 Cool Python Libraries That I Came Across Recently; Best Resources to Learn Natural Language Processing in 2021
AI, Cheat Sheet, Data Science, Excel, Hypothesis Testing, Machine Learning, Python, Statistics

- How to Create Stunning Web Apps for your Data Science Projects - Sep 7, 2021.
Data scientists do not have to learn HTML, CSS, and JavaScript to build web pages.
Apps, Data Science, Python, Streamlit
- Fast AutoML with FLAML + Ray Tune - Sep 6, 2021.
Microsoft Researchers have developed FLAML (Fast Lightweight AutoML) which can now utilize Ray Tune for distributed hyperparameter tuning to scale up FLAML’s resource-efficient & easily parallelizable algorithms across a cluster.
Automated Machine Learning, AutoML, Hyperparameter, Machine Learning, Microsoft, Python, Ray
- 6 Cool Python Libraries That I Came Across Recently - Sep 3, 2021.
Check out these awesome Python libraries for Machine Learning.
Data Science, Machine Learning, Python

- Do You Read Excel Files with Python? There is a 1000x Faster Way - Sep 1, 2021.
In this article, I’ll show you five ways to load data in Python, achieving a speedup of three orders of magnitude.
Excel, Microsoft, Pandas, Python, Scalability
- KDnuggets™ News 21:n33, Sep 1: Top Industries Hiring Data Scientists; The Most Important Tool for Data Engineers - Sep 1, 2021.
The top industries hiring Data Scientists; The most important tool for data engineers (hint - it is not technical); How to Engineer Date Features in Python; 15 Python Snippets to Optimize your Data Science Pipeline
Data Engineer, Data Science, Hiring, Industry, Pipeline, Python
- NLP Insights for the Penguin Café Orchestra - Aug 31, 2021.
We give an example of how to use Expert.ai and Python to investigate favorite music albums.
Expert.ai, Music, NLP, Python
- CSV Files for Storage? No Thanks. There’s a Better Option - Aug 31, 2021.
Saving data to CSVs is costing you both money and disk space. It’s time to end it.
Data Management, Pandas, Parquet, Python
- A Python Data Processing Script Template - Aug 31, 2021.
Here's a skeleton general purpose template for getting a Python command line script fleshed out as quickly as possible.
Programming, Python
- Introducing Packed BERT for 2x Training Speed-up in Natural Language Processing - Aug 30, 2021.
Check out this new BERT packing algorithm for more efficient training.
BERT, NLP, Python, Training
- How causal inference lifts augmented analytics beyond flatland - Aug 27, 2021.
In our quest to better understand and predict business outcomes, traditional predictive modeling tends to fall flat. However, causal inference techniques along with business analytics approaches can unravel what truly changes your KPIs.
Analytics, Causality, Data Science, Python, Regression
- 15 Python Snippets to Optimize your Data Science Pipeline - Aug 25, 2021.
Quick Python solutions to help your data science cycle.
Data Science, Optimization, Pipeline, Python
- KDnuggets™ News 21:n32, Aug 25: Open Source Datasets for Computer Vision; Django’s 9 Most Common Applications - Aug 25, 2021.
Open Source Datasets for Computer Vision; Django’s 9 Most Common Applications; How to Select an Initial Model for your Data Science Problem; Automate Microsoft Excel and Word Using Python; Stack Overflow Survey Data Science Highlights
Computer Vision, Datasets, Django, Microsoft, Modeling, Open Source, Python, StackOverflow
- Learning Data Science and Machine Learning: First Steps After The Roadmap - Aug 24, 2021.
Just getting into learning data science may seem as daunting as (if not more than) trying to land your first job in the field. With so many options and resources online and in traditional academia to consider, these pre-requisites and pre-work are recommended before diving deep into data science and AI/ML.
Data Science, Machine Learning, Mathematics, Python, Roadmap, Statistics
- Django’s 9 Most Common Applications - Aug 23, 2021.
Django is a Python web application framework enjoying widespread adoption in the data science community. But what else can you use Django for? Read this article for 9 use cases where you can put Django to work.
Django, Programming, Python
- 5 Things That Make My Job as a Data Scientist Easier - Aug 23, 2021.
After working as a Data Scientist for a year, I am here to share some things I learnt along the way that I feel are helpful and have increased my efficiency. Hopefully some of these tips can help you in your journey :)
Data Science, Data Scientist, Metrics, Pandas, Plotly, Python, Time Series, Visualization
- Data Scientist’s Guide to Efficient Coding in Python - Aug 18, 2021.
Read this fantastic collection of tips and tricks the author uses for writing clean code on a day-to-day basis.
Programming, Python, Tips
- Linear Algebra for Natural Language Processing - Aug 17, 2021.
Learn about representing word semantics in vector space.
Linear Algebra, Mathematics, NLP, Python
- Prefect: How to Write and Schedule Your First ETL Pipeline with Python - Aug 16, 2021.
Workflow management systems made easy — both locally and in the cloud.
Cloud, ETL, Pipeline, Python
- Writing Your First Distributed Python Application with Ray - Aug 16, 2021.
Using Ray, you can take Python code that runs sequentially and transform it into a distributed application with minimal code changes. Read on to find out why you should use Ray, and how to get started.
Distributed Computing, Parallelism, Python, Workflow
- How to Train a BERT Model From Scratch - Aug 13, 2021.
Meet BERT’s Italian cousin, FiliBERTo.
BERT, Hugging Face, NLP, Python, Training
- How to Query Your Pandas Dataframe - Aug 9, 2021.
A Data Scientist’s perspective on SQL-like Python functions.
Data Preprocessing, Data Processing, Pandas, Python, SQL
- GPU-Powered Data Science (NOT Deep Learning) with RAPIDS - Aug 2, 2021.
How to utilize the power of your GPU for regular data science and machine learning even if you do not do a lot of deep learning work.
Data Science, GPU, Python
- KDnuggets™ News 21:n28, Jul 28: Design patterns in machine learning; The Best NLP Course is Free - Jul 28, 2021.
What are the design patterns for Machine Learning and why should you know them? For more advanced readers, how to use Kafka Connect to create an open source data pipeline for processing real-time data; The state-of-the-art NLP course is freely available; Python Data Structures Compared; Update your Machine Learning skills this summer.
Kafka, Machine Learning, NLP, Python
- Python Data Structures Compared - Jul 27, 2021.
Let's take a look at 5 different Python data structures and see how they could be used to store data we might be processing in our everyday tasks, as well as the relative memory they use for storage and time they take to create and access.
Data Science, Programming, Python
- Why and how should you learn “Productive Data Science”? - Jul 26, 2021.
What is Productive Data Science and what are some of its components?
Books, Career Advice, Courses, Data Science, Python
- Top Python Data Science Interview Questions - Jul 23, 2021.
Six must-know technical concepts and two types of questions to test them.
Data Science, Interview Questions, Programming, Python
- Overview of Albumentations: Open-source library for advanced image augmentations - Jul 22, 2021.
With code snippets on augmentations and integrations with PyTorch and TensorFlow pipelines.
Image Processing, Open Source, Python, PyTorch, TensorFlow
- ColabCode: Deploying Machine Learning Models From Google Colab - Jul 22, 2021.
New to ColabCode? Learn how to use it to start a VS Code Server, Jupyter Lab, or FastAPI.
Deployment, FastAPI, Google Colab, Machine Learning, Python
- Understanding BERT with Hugging Face - Jul 20, 2021.
We don’t really understand something before we implement it ourselves. So in this post, we will implement a Question Answering Neural Network using BERT and a Hugging Face Library.
BERT, Hugging Face, NLP, Python
- How Much Memory is your Machine Learning Code Consuming? - Jul 19, 2021.
Learn how to quickly check the memory footprint of your machine learning function/module with a single command. Generate a nice report too.
Machine Learning, Programming, Python

- Top 6 Data Science Online Courses in 2021 - Jul 15, 2021.
As an aspiring data scientist, it is easy to get overwhelmed by the abundance of resources available on the Internet. With these 6 online courses, you can develop from a novice into an experienced practitioner in less than a year, and gain the skills necessary to land a job in data science.
Data Science Education, Online Education, Programming, Python, SQL
- Date Processing and Feature Engineering in Python - Jul 15, 2021.
Have a look at some code to streamline the parsing and processing of dates in Python, including the engineering of some useful and common features.
Beginners, Data Preprocessing, Data Processing, Feature Engineering, Python, Time Series
- KDnuggets™ News 21:n26, Jul 14: Pandas not enough? Here are a few good alternatives to processing larger and faster data in Python; 5 Python Data Processing Tips - Jul 14, 2021.
If Pandas is not enough, here are a few good alternatives to processing larger and faster data in Python; 5 Python Data Processing Tips and Code Snippets; Relax! Data Scientists will not go extinct in 10 years, but the role will change; How to Get Practical Data Science Experience to be Career-Ready.
Pandas, Python, Trends
- How to Tell if You Have Trained Your Model with Enough Data - Jul 12, 2021.
WeightWatcher is an open-source, diagnostic tool for evaluating the performance of (pre)-trained and fine-tuned Deep Neural Networks. It is based on state-of-the-art research into Why Deep Learning Works.
Learning, Neural Networks, Python, Training
- 5 Python Data Processing Tips & Code Snippets - Jul 9, 2021.
This is a small collection of Python code snippets that a beginner might find useful for data processing.
Data Preprocessing, Data Processing, Pandas, Programming, Python

- Pandas not enough? Here are a few good alternatives to processing larger and faster data in Python - Jul 8, 2021.
While the Pandas library remains a crucial workhorse in data processing and management for data science, some limitations exist that can impact efficiencies, especially with very large data sets. Here, a few interesting alternatives to Pandas are introduced to improve your large data handling performance.
Dask, Modin, Pandas, Python, Scalability
- How to Build An Image Classifier in Few Lines of Code with Flash - Jul 7, 2021.
Introducing Flash: The high-level deep learning framework for beginners.
Deep Learning, Image Classification, Image Recognition, Neural Networks, Python
- KDnuggets™ News 21:n25, Jul 7: Data Scientists and ML Engineers Are Luxury Employees; 5 Lessons from McKinsey That Will Make You a Better Data Scientist - Jul 7, 2021.
Are Data Scientists and ML Engineers Luxury Employees? 5 Lessons McKinsey Taught Me That Will Make You a Better Data Scientist; Managing Your Reusable Python Code as a Data Scientist; GitHub Copilot: Your AI pair programmer - what is all the fuss about? and more.
Career Advice, Data Science Skills, Data Scientist, Machine Learning Engineer, Python
- ROC Curve Explained - Jul 6, 2021.
Learn to visualise a ROC curve in Python.
Data Visualization, Metrics, Python, ROC-AUC
- Predict Customer Churn (the right way) using PyCaret - Jul 5, 2021.
A step-by-step guide on how to predict customer churn the right way using PyCaret that actually optimizes the business objective and improves ROI.
Churn, Machine Learning, PyCaret, Python
- From Scratch: Permutation Feature Importance for ML Interpretability - Jun 30, 2021.
Use permutation feature importance to discover which features in your dataset are useful for prediction — implemented from scratch in Python.
Feature Selection, Interpretability, Machine Learning, Python
- KDnuggets™ News 21:n24, Jun 30: What will the demand for Data Scientists be in 10 years?; Add A New Dimension To Your Photos Using Python - Jun 30, 2021.
What will the demand for Data Scientists be in 10 years? Will Data Scientists be extinct?; Add A New Dimension To Your Photos Using Python; Data Scientists are from Mars and Software Developers are from Venus; How to Train a Joint Entities and Relation Extraction Classifier using BERT Transformer with spaCy 3; In-Warehouse Machine Learning and the Modern Data Science Stack
Data Science, Data Scientist, Data Warehouse, Image Processing, Machine Learning, NLP, Python, Software Developer
- Add A New Dimension To Your Photos Using Python - Jun 28, 2021.
Read this to learn how to breathe new life into your photos with a 3D Ken Burns Effect.
Google Colab, Image Generation, Image Processing, Python
- How to Train a Joint Entities and Relation Extraction Classifier using BERT Transformer with spaCy 3 - Jun 28, 2021.
A step-by-step guide on how to train a relation extraction classifier using Transformer and spaCy 3.
BERT, NLP, Python, spaCy, Text Analytics, Transformer
- Applied Language Technology: A No-Nonsense Approach - Jun 25, 2021.
Here is a free entry-level applied natural language processing course that can fit into any beginner's roadmap to understanding NLP. Check it out.
NLP, Python, Text Analytics
- How to create an interactive 3D chart and share it easily with anyone - Jun 25, 2021.
This is a short tutorial on a great Plotly feature.
Data Visualization, Graph, Python
- 10 Python Code Snippets We Should All Know - Jun 24, 2021.
Check out these Python code snippets and start using them to solve everyday problems.
Programming, Python
- Workflow Orchestration with Prefect and Coiled - Jun 23, 2021.
Coiled helps data scientists use Python for ambitious problems, scaling to the cloud for computing power, ease, and speed—all tuned for the needs of teams and enterprises. In this demo example, see how to spin up a Coiled cluster to execute Prefect jobs during runtime.
Coiled.io, Modern Data Stack, Orchestration, Prefect, Python, Workflow
- Create and Deploy Dashboards using Voila and Saturn Cloud - Jun 23, 2021.
Working with and training large datasets, maintaining them all in one place, and deploying them to production is a challenging job. In this article, we covered what Saturn Cloud is and how it can speed up your end-to-end pipeline, how to create dashboards using Voila and Python and publish them to production in just a few easy steps.
Analytics, Cloud, Dashboard, Data Science, Machine Learning, Python
- Fine-Tuning Transformer Model for Invoice Recognition - Jun 23, 2021.
The author presents a step-by-step guide from annotation to training.
Business Analytics, Image Classification, NLP, Python, Transformer
- KDnuggets™ News 21:n23, Jun 23: Pandas vs SQL: When Data Scientists Should Use Each Tool; How to Land a Data Analytics Job in 6 Months - Jun 23, 2021.
Pandas vs SQL: When Data Scientists Should Use Each Tool; How to Land a Data Analytics Job in 6 Months; A Graph-based Text Similarity Method with Named Entity Information in NLP; The Best Way to Learn Practical NLP?; An introduction to Explainable AI (XAI) and Explainable Boosting Machines (EBM)
Analytics, Career Advice, Data Scientist, Explainability, NLP, Pandas, Python, SQL
- How to troubleshoot memory problems in Python - Jun 21, 2021.
Memory problems are hard to diagnose and fix in Python. This post goes through a step-by-step process for how to pinpoint and fix memory leaks using popular open source Python packages.
Programming, Python
- Dashboards for Interpreting & Comparing Machine Learning Models - Jun 17, 2021.
This article discusses using Interpret to create dashboards for machine learning models.
Interpretability, Machine Learning, Modeling, Python
- KDnuggets™ News 21:n22, Jun 16: Data Scientists Extinct in 10 Years? Generate Automated PDF Documents with Python - Jun 16, 2021.
Will Data Scientists be extinct in 10 years? How to generate PDF Documents with Python; Top 10 Data Science Projects for Beginners; Five types of thinking for a high performing data scientist; and how to get interactive plots directly with Pandas.
Career Advice, Data Scientist, PDF, Project, Python, Trends
- Get Interactive Plots Directly With Pandas - Jun 14, 2021.
Telling a story with data is a core function for any Data Scientist, and creating data visualizations that are simultaneously illuminating and appealing can be challenging. This tutorial reviews how to create Plotly and Bokeh plots directly through Pandas plotting syntax, which will help you convert static visualizations into interactive counterparts -- and take your analysis to the next level.
Bokeh, Data Visualization, Pandas, Plotly, Python
- Building a Knowledge Graph for Job Search Using BERT - Jun 14, 2021.
A guide on how to create knowledge graphs using NER and Relation Extraction.
BERT, Careers, Data Science Skills, Knowledge Graph, NLP, Python, Search, Transformer

- How to Generate Automated PDF Documents with Python - Jun 10, 2021.
Discover how to leverage automation to create dazzling PDF documents effortlessly.
Data Visualization, PDF, Programming, Python
- KDnuggets™ News 21:n21, Jun 9: 5 Tasks To Automate With Python; How I Doubled My Income with Data Science and Machine Learning - Jun 9, 2021.
5 Tasks To Automate With Python; How I Doubled My Income with Data Science and Machine Learning; Will There Be a Shortage of Data Science Jobs in the Next 5 Years?; How to Make Python Code Run Incredibly Fast; Stop (and Start) Hiring Data Scientists
Automation, Career Advice, Data Science, Data Scientist, Deployment, Machine Learning, Modeling, Programming, Python
- The only Jupyter Notebooks extension you truly need - Jun 8, 2021.
Now you don’t need to restart the kernel after editing the code in your custom imports.
Deployment, Jupyter, Machine Learning, Python
- How to Fine-Tune BERT Transformer with spaCy 3 - Jun 7, 2021.
A step-by-step guide on how to create a knowledge graph using NER and Relation Extraction.
BERT, Knowledge Graph, NLP, Python, spaCy, Transformer
- PyCaret 101: An introduction for beginners - Jun 7, 2021.
This article is a great overview of how to get started with PyCaret for all your machine learning projects.
Machine Learning, PyCaret, Python
- Machine Learning Model Interpretation - Jun 2, 2021.
Read this overview of using Skater to build machine learning visualizations.
Explainability, Interpretability, Machine Learning, Python
- How to Create and Deploy a Simple Sentiment Analysis App via API - Jun 1, 2021.
In this article we will create a simple sentiment analysis app using the HuggingFace Transformers library, and deploy it using FastAPI.
FastAPI, Hugging Face, NLP, Python, Sentiment Analysis, Transformer
- Make Pandas 3 Times Faster with PyPolars - May 31, 2021.
Learn how to speed up your Pandas workflow using the PyPolars library.
Pandas, Performance, Python
- Supercharge Your Machine Learning Experiments with PyCaret and Gradio - May 31, 2021.
A step-by-step tutorial to develop and interact with machine learning pipelines rapidly.
Deployment, Machine Learning, Pipeline, PyCaret, Python
- Topic Modeling with Streamlit - May 26, 2021.
What does it take to create and deploy a topic modeling web application quickly? Read this post to see how the author uses Python NLP packages for topic modeling, Streamlit for the web application framework, and Streamlit Sharing for deployment.
Deployment, NLP, Python, spaCy, Streamlit, Text Analytics, Topic Modeling
- Write and train your own custom machine learning models using PyCaret - May 25, 2021.
A step-by-step, beginner-friendly tutorial on how to write and train custom machine learning models in PyCaret.
Machine Learning, Modeling, PyCaret, Python, Training
- Building RESTful APIs using Flask - May 21, 2021.
Learn about using the lightweight web framework in Python from this article.
API, Flask, Python, RESTful API
- How to Determine if Your Machine Learning Model is Overtrained - May 20, 2021.
WeightWatcher is based on theoretical research (done jointly with UC Berkeley) into Why Deep Learning Works, based on our Theory of Heavy Tailed Self-Regularization (HT-SR). It uses ideas from Random Matrix Theory (RMT), Statistical Mechanics, and Strongly Correlated Systems.
Learning, Modeling, Python, Training
- Differentiable Programming from Scratch - May 19, 2021.
In this article, we are going to explain what Differentiable Programming is by developing from scratch all the tools needed for this exciting new kind of programming.
Mathematics, Programming, Python
- KDnuggets™ News 21:n19, May 19: Vaex: Pandas but 1000x faster; The Most In Demand Skills for Data Engineers in 2021 - May 19, 2021.
Vaex: Pandas but 1000x faster; Best Python Books for Beginners and Advanced Programmers; The Most In Demand Skills for Data Engineers in 2021; The next-generation of AutoML frameworks; and more.
Data Engineer, Python, Vaex
- Animated Bar Chart Races in Python - May 18, 2021.
A quick, step-by-step beginner's project to create an animated bar graph for an amazing Covid dataset.
COVID-19, Data Science, Data Visualization, Pandas, Python, Visualization
- The Most In Demand Skills for Data Engineers in 2021 - May 18, 2021.
If you are preparing to make a career in data or are looking for opportunities to skill-up in your current data-centric role, then this analysis of in-demand skills for 2021, based on over 17,000 Data Engineer job postings, should offer you a good idea as to which programming languages and software tools are increasing and decreasing in importance.
Apache Spark, AWS, Data Engineer, Data Science Skills, Data Scientist, Python, Skills, SQL
- Easy MLOps with PyCaret + MLflow - May 18, 2021.
A beginner-friendly, step-by-step tutorial on integrating MLOps in your Machine Learning experiments using PyCaret.
Machine Learning, MLflow, MLOps, PyCaret, Python
- Best Python Books for Beginners and Advanced Programmers - May 14, 2021.
Let's take a look at nine of the best Python books for both beginners and advanced programmers, covering topics such as data science, machine learning, deep learning, NLP, and more.
Analytics, Books, Data Science, Deep Learning, Machine Learning, Python
- Super Charge Python with Pandas on GPUs Using Saturn Cloud - May 12, 2021.
Saturn Cloud is a tool that gives you 10 hours of GPU computing and 3 hours of Dask Cluster computing a month for free. In this tutorial, you will learn how to use these free resources to process data using Pandas on a GPU. The experiments show that Pandas is over 1,000,000% slower on a CPU as compared to running Pandas on a Dask cluster of GPUs.
Cloud, GPU, Pandas, Python
- KDnuggets™ News 21:n18, May 12: Data Preparation in SQL, with Cheat Sheet!; Rebuilding 7 Python Projects - May 12, 2021.
Data Preparation in SQL, with Cheat Sheet!; Rebuilding My 7 Python Projects; Applying Python’s Explode Function to Pandas DataFrames; Essential Linear Algebra for Data Science and Machine Learning; Similarity Metrics in NLP
Cheat Sheet, Data Preparation, Data Science, Linear Algebra, Machine Learning, Metrics, NLP, Pandas, Project, Python, SQL

- Essential Linear Algebra for Data Science and Machine Learning - May 10, 2021.
Linear algebra is foundational in data science and machine learning. Beginners starting out along their learning journey in data science--as well as established practitioners--must develop a strong familiarity with the essential concepts in linear algebra.
Data Science Education, Data Visualization, Linear Algebra, Linear Regression, Mathematics, Python
- Ensemble Methods Explained in Plain English: Bagging - May 10, 2021.
Understand the intuition behind bagging with examples in Python.
Algorithms, Bagging, Ensemble Methods, Python
- Applying Python’s Explode Function to Pandas DataFrames - May 7, 2021.
Read this applied Python method to solve the issue of accessing columns by date/year using the Pandas library and the functions lambda(), list(), map() & explode().
Data Analysis, Pandas, Programming, Python
- A Comprehensive Guide to Ensemble Learning – Exactly What You Need to Know - May 6, 2021.
This article covers ensemble learning methods, and exactly what you need to know in order to understand and implement them.
CatBoost, Ensemble Methods, Machine Learning, Python, random forests algorithm, scikit-learn, XGBoost
- Rebuilding My 7 Python Projects - May 5, 2021.
This is how I rebuilt My Python Projects: Data Science, Web Development & Android Apps.
Data Science, Programming, Project, Python
- How To Generate Meaningful Sentences Using a T5 Transformer - May 3, 2021.
Read this article to see how to develop a text generation API using the T5 transformer.
API, Hugging Face, Natural Language Generation, NLP, Python, Transformer
- XGBoost Explained: DIY XGBoost Library in Less Than 200 Lines of Python - May 3, 2021.
Understand how XGBoost works with 200 simple lines of code that implement gradient boosting for decision trees.
Algorithms, Machine Learning, Python, XGBoost
- Gradient Boosted Decision Trees – A Conceptual Explanation - Apr 30, 2021.
Gradient boosted decision trees involve implementing several models and aggregating their results. These boosted models have become popular thanks to their performance in machine learning competitions on Kaggle. In this article, we’ll see what gradient boosted decision trees are all about.
CatBoost, Decision Trees, Gradient Boosting, Machine Learning, Python, scikit-learn, XGBoost
- Feature Engineering of DateTime Variables for Data Science, Machine Learning - Apr 29, 2021.
Learn how to make more meaningful features from DateTime type variables to be used by Machine Learning Models.
Data Science, Feature Engineering, Machine Learning, Python
- Multiple Time Series Forecasting with PyCaret - Apr 27, 2021.
A step-by-step tutorial to forecast multiple time series with PyCaret.
Forecasting, Machine Learning, PyCaret, Python, Time Series
- Production-Ready Machine Learning NLP API with FastAPI and spaCy - Apr 21, 2021.
Learn how to implement an API based on FastAPI and spaCy for Named Entity Recognition (NER), and see why the author used FastAPI to quickly build a fast and robust machine learning API.
API, FastAPI, NLP, Production, Python, spaCy
- Time Series Forecasting with PyCaret Regression Module - Apr 21, 2021.
PyCaret is an alternate low-code library that can replace hundreds of lines of code with only a few. See how to use PyCaret's Regression Module for Time Series Forecasting.
Machine Learning, PyCaret, Python, Regression, Time Series
- Top 10 Data Science Courses to Take in 2021 - Apr 20, 2021.
Whether you are getting started with Data Science / Machine Learning or are an experienced professional looking to learn something new, check out these top 10 data science courses for 2021.
Coursera, Data Science Education, Google Analytics, IBM, Online Education, Python, SQL, Stanford
- Data Analysis Using Tableau - Apr 20, 2021.
Read this overview of using Tableau for sales data analysis, and see how visualization can help tell the business story.
Business, Data Analysis, Ecommerce, Python, Sales, Tableau
- Essential Math for Data Science: Linear Transformation with Matrices - Apr 16, 2021.
You’ll start seeing matrices, not only as operations on numbers, but also as a way to transform vector spaces. This conception will give you the foundations needed to understand more complex linear algebra concepts like matrix decomposition.
Data Science, Linear Algebra, Mathematics, Python
The Most In-Demand Skills for Data Scientists in 2021 - Apr 15, 2021.
If you are preparing to make a career as a Data Scientist or are looking for opportunities to skill-up in your current role, this analysis of in-demand skills for 2021, based on over 15,000 Data Scientist job postings, should offer you a good idea as to which programming languages and software tools are increasing and decreasing in importance.
AWS, Data Science Skills, Python, PyTorch, R, scikit-learn, SQL, TensorFlow
- Is Your Model Overtrained? - Apr 14, 2021.
WeightWatcher is based on theoretical research (done jointly with UC Berkeley) into Why Deep Learning Works, based on our Theory of Heavy Tailed Self-Regularization (HT-SR). It uses ideas from Random Matrix Theory (RMT), Statistical Mechanics, and Strongly Correlated Systems.
Learning, Modeling, Python, Training
- Automated Anomaly Detection Using PyCaret - Apr 13, 2021.
Learn to automate anomaly detection using the open source machine learning library PyCaret.
Anomaly Detection, Machine Learning, PyCaret, Python
- How to Apply Transformers to Any Length of Text - Apr 12, 2021.
Read on to find out how to restore the power of NLP for long sequences.
BERT, NLP, Python, Text Analytics, Transformer
- E-commerce Data Analysis for Sales Strategy Using Python - Apr 7, 2021.
Check out this informative and concise case study applying data analysis using Python to a well-defined e-commerce scenario.
Business, Data Analysis, Ecommerce, Python, Sales
- KDnuggets™ News 21:n13, Apr 7: Top 10 Python Libraries Data Scientists should know in 2021; KDnuggets Top Blogs Reward Program; Making Machine Learning Models Understandable - Apr 7, 2021.
Top 10 Python Libraries Data Scientists should know in 2021; KDnuggets Top Blogs Reward Program; Shapash: Making Machine Learning Models Understandable; Easy AutoML in Python; The 8 Most Common Data Scientists; A/B Testing: 7 Common Questions and Answers in Data Science Interviews, Part 1
Automated Machine Learning, AutoML, Data Science, Data Scientist, Explainable AI, Interpretability, Machine Learning, MLOps, Python
- Automated Text Classification with EvalML - Apr 6, 2021.
Learn how EvalML leverages Woodwork, Featuretools and the nlp-primitives library to process text data and create a machine learning model that can detect spam text messages.
Automated Machine Learning, AutoML, NLP, Python, Text Analytics, Text Classification
- The Best Machine Learning Frameworks & Extensions for TensorFlow - Apr 5, 2021.
Check out this curated list of useful frameworks and extensions for TensorFlow.
Machine Learning, Python, TensorFlow
Shapash: Making Machine Learning Models Understandable - Apr 2, 2021.
Establishing an expectation for trust around AI technologies may soon become one of the most important skills provided by Data Scientists. Significant research investments are underway in this area, and new tools are being developed, such as Shapash, an open-source Python library that helps Data Scientists make machine learning models more transparent and understandable.
Explainability, Machine Learning, Python, SHAP
- Easy AutoML in Python - Apr 1, 2021.
We’re excited to announce that a new open-source project has joined the Alteryx open-source ecosystem. EvalML is a library for automated machine learning (AutoML) and model understanding, written in Python.
Automated Machine Learning, AutoML, Machine Learning, Open Source, Python
- Software Engineering Best Practices for Data Scientists - Mar 30, 2021.
This is a crash course on how to bridge the gap between data science and software engineering.
Data Science, Data Scientist, Programming, Python, Software Engineering
- Data Science Curriculum for Professionals - Mar 25, 2021.
If you are looking to expand or transition your current professional career that is buried in spreadsheet analysis into one powered by data science, then you are in for an exciting but complex journey with much to explore and master. To begin your adventure, follow this complete road map to guide you from a gnome in the forest of spreadsheets to an AI wizard known far and wide throughout the kingdom.
Cloud Computing, Data Science Education, Data Visualization, Machine Learning, Python, R, Roadmap, Statistics
- Extraction of Objects In Images and Videos Using 5 Lines of Code - Mar 25, 2021.
PixelLib is a library created for easy integration of image and video segmentation in real-life applications. Learn to use PixelLib to extract objects in images and videos with minimal code.
Computer Vision, Image Processing, Object Detection, Python, Segmentation, Video

Top 10 Python Libraries Data Scientists should know in 2021 - Mar 24, 2021.
So many Python libraries exist that offer powerful and efficient foundations for supporting your data science work and machine learning model development. While the list may seem overwhelming, there are certain libraries you should focus your time on, as they are some of the most commonly used today.
Data Science, Keras, numpy, Pandas, Python, scikit-learn, Seaborn, TensorFlow
- Rejection Sampling with Python - Mar 24, 2021.
Read this article on rejection sampling with examples using the Normal and Cauchy Distributions.
Distribution, Probability, Python, Sampling, Statistics