- Pandas Profiling: One-Line Magical Code for EDA - Feb 24, 2021.
EDA can be automated using a Python library called Pandas Profiling. Let’s explore Pandas Profiling to do EDA in a very short time and with just a single line of code.
Tags: Data Analysis, Data Exploration, Data Science, Pandas, Python
- KDnuggets™ News 21:n08, Feb 24: Powerful Exploratory Data Analysis in just two lines of code; Cartoon: Data Scientist vs Data Engineer - Feb 24, 2021.
Powerful Exploratory Data Analysis in just two lines of code; Cartoon: Data Scientist vs Data Engineer; Evaluating Deep Learning Models: The Confusion Matrix, Accuracy, Precision, and Recall; Feature Store as a Foundation for Machine Learning; Approaching (Almost) Any Machine Learning Problem
Tags: Cartoon, Data Analysis, Data Engineering, Data Science, Deep Learning, Machine Learning, Metrics, Python
- Powerful Exploratory Data Analysis in just two lines of code - Feb 22, 2021.
EDA is a fundamental early process for any Data Science investigation. Typical approaches for visualization and exploration are powerful, but can be cumbersome for getting to the heart of your data. Now, you can get to know your data much faster with only a few lines of code... and it might even be fun!
Tags: Data Analysis, Data Exploration, Data Visualization, Python
- Multidimensional multi-sensor time-series data analysis framework - Feb 19, 2021.
This blog post provides an overview of the “msda” package, which is useful for time-series sensor data analysis. A quick introduction to time-series data is also provided.
Tags: Data Analysis, Python, Sensors, Time Series
- Approaching (Almost) Any Machine Learning Problem - Feb 18, 2021.
This freely-available book is a fantastic walkthrough of practical approaches to machine learning problems.
Tags: Deep Learning, Free ebook, Machine Learning, Python
- Distributed and Scalable Machine Learning [Webinar] - Feb 17, 2021.
Mike McCarty and Gil Forsyth work at the Capital One Center for Machine Learning, where they are building internal PyData libraries that scale with Dask and RAPIDS. For this webinar, Feb 23 @ 2 pm PST, 5pm EST, they’ll join Hugo Bowne-Anderson and Matthew Rocklin to discuss their journey to scale data science and machine learning in Python.
Tags: Capital One, Dask, Distributed, Machine Learning, Python, scikit-learn, XGBoost
- 10 resources for data science self-study - Feb 17, 2021.
Many resources exist for the self-study of data science. In our modern age of information technology, an enormous number of free learning resources are available to anyone, and with effort and dedication, you can master the fundamentals of data science.
Tags: Data Science, Data Science Certificate, Data Science Education, Kaggle, MOOC, Python, Youtube
- Easy, Open-Source AutoML in Python with EvalML - Feb 16, 2021.
We’re excited to announce that a new open-source project has joined the Alteryx open-source ecosystem. EvalML is a library for automated machine learning (AutoML) and model understanding, written in Python.
Tags: Automated Machine Learning, AutoML, Machine Learning, Open Source, Python
- How to Speed up Scikit-Learn Model Training - Feb 11, 2021.
Scikit-Learn is an easy-to-use Python library for machine learning. However, scikit-learn models can sometimes take a long time to train. The question becomes: how do you create the best scikit-learn model in the least amount of time?
Tags: Distributed Systems, Hyperparameter, Machine Learning, Optimization, Parallelism, Python, scikit-learn, Training
- 7 Most Recommended Skills to Learn to be a Data Scientist - Feb 10, 2021.
The Data Scientist professional has emerged as a true interdisciplinary role that spans a variety of skills, theoretical and practical. For the core, day-to-day activities, many critical requirements that enable the delivery of real business value reach well outside the realm of machine learning, and should be mastered by those aspiring to the field.
Tags: Career Advice, Data Science Skills, Data Scientist, Data Visualization, Docker, Pandas, Python, SQL
- KDnuggets™ News 21:n06, Feb 10: The Best Data Science Project to Have in Your Portfolio; Deep learning doesn’t need to be a black box - Feb 10, 2021.
The Best Data Science Project to Have in Your Portfolio; Deep learning doesn’t need to be a black box; Build Your First Data Science Application; How to create stunning visualizations using python from scratch; How to Get Your First Job in Data Science without Any Work Experience
Tags: Career Advice, Data Science, Data Visualization, Deep Learning, Explainability, Portfolio, Python
- How to Deploy a Flask API in Kubernetes and Connect it with Other Micro-services - Feb 9, 2021.
A hands-on tutorial on how to implement your micro-service architecture using the powerful container orchestration tool Kubernetes.
Tags: API, Containers, Flask, Kubernetes, MySQL, Python, SQL
- Essential Math for Data Science: Introduction to Matrices and the Matrix Product - Feb 5, 2021.
Like vectors, matrices are data structures that allow you to organize numbers. They are square or rectangular arrays containing values organized in two dimensions: rows and columns. You can think of them as a spreadsheet. Learn more here.
Tags: Data Science, Linear Algebra, Mathematics, numpy, Python
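The matrix product mentioned above pairs each row of the first matrix with each column of the second. As a minimal sketch, assuming matrices are stored as plain Python lists of rows (the function name `matmul` is hypothetical; with NumPy arrays the same product is written `A @ B`):

```python
def matmul(a, b):
    """Multiply an m x n matrix `a` by an n x p matrix `b` (both lists of rows)."""
    n = len(b)
    # The number of columns in `a` must equal the number of rows in `b`.
    if any(len(row) != n for row in a):
        raise ValueError("inner dimensions must match")
    p = len(b[0])
    # Entry (i, j) is the dot product of row i of `a` and column j of `b`.
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(p)]
        for i in range(len(a))
    ]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```

Because every entry is a dot product, an m×n matrix times an n×p matrix yields an m×p result.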
- Build Your First Data Science Application - Feb 4, 2021.
Check out these seven Python libraries to make your first data science MVP application.
Tags: API, Data Science, Jupyter, Keras, numpy, Pandas, Plotly, Python, PyTorch, scikit-learn
- How to create stunning visualizations using python from scratch - Feb 4, 2021.
Data science and data analytics can be beautiful things. Not only because of the insights and enhancements to decision-making they can provide, but because of the rich visualizations about the data that can be created. Following this step-by-step guide using the Matplotlib and Seaborn libraries will help you improve the presentation and effective communication of your work.
Tags: Data Visualization, Matplotlib, Python, Seaborn
- Getting Started with 5 Essential Natural Language Processing Libraries - Feb 3, 2021.
This article is an overview of how to get started with 5 popular Python NLP libraries, from those for linguistic data visualization, to data preprocessing, to multi-task functionality, to state of the art language modeling, and beyond.
Tags: Data Preparation, Data Preprocessing, Data Visualization, Hugging Face, NLP, Python, spaCy, Text Analytics, Transformer
- Working With The Lambda Layer in Keras - Jan 28, 2021.
In this tutorial we'll cover how to use the Lambda layer in Keras to build, save, and load models which perform custom operations on your data.
Tags: Architecture, Keras, Neural Networks, Python
- Mastering TensorFlow Variables in 5 Easy Steps - Jan 20, 2021.
Learn how to use TensorFlow Variables, their differences from plain Tensor objects, and when they are preferred over these Tensor objects | Deep Learning with TensorFlow 2.x.
Tags: Neural Networks, Python, TensorFlow
- Loglet Analysis: Revisiting COVID-19 Projections - Jan 20, 2021.
We will show that decomposing growth into S-shaped logistic components, also known as Loglet analysis, is more accurate, as it takes into account the evolution of multiple COVID-19 waves.
Tags: COVID-19, Data Analysis, Python
- Comprehensive Guide to the Normal Distribution - Jan 18, 2021.
Drop in for some tips on how this fundamental statistics concept can improve your data science.
Tags: Distribution, Normal Distribution, Python, SciPy, Statistics
- Snowflake and Saturn Cloud Partner To Bring 100x Faster Data Science to Millions of Python Users - Jan 15, 2021.
Snowflake, the cloud data platform, is partnering, integrating products, and pursuing a joint go-to-market with Saturn Cloud to help data science teams get 100x faster results. Read more about developments and how to get started here.
Tags: Data Science, Python, Saturn Cloud, Snowflake
- Cleaner Data Analysis with Pandas Using Pipes - Jan 15, 2021.
Check out this practical guide on Pandas pipes.
Tags: Data Analysis, Data Cleaning, Pandas, Pipeline, Python
- KDnuggets™ News 21:n02, Jan 13: Best Python IDEs and Code Editors; 10 Underappreciated Python Packages for Machine Learning Practitioners - Jan 13, 2021.
Best Python IDEs and Code Editors You Should Know; 10 Underappreciated Python Packages for Machine Learning Practitioners; Top 10 Computer Vision Papers 2020; CatalyzeX: A must-have browser extension for machine learning engineers and researchers
Tags: Computer Vision, Data Science, DevOps, IDE, Machine Learning, MLOps, Python, Research
- Creating Good Meaningful Plots: Some Principles - Jan 12, 2021.
Here are some thought starters to help you create meaningful plots.
Tags: Charts, Data Visualization, Python, R
- 5 Tools for Effortless Data Science - Jan 11, 2021.
The sixth tool is coffee.
Tags: Data Science, Data Science Tools, Keras, Machine Learning, MLflow, PyCaret, Python
- Best Python IDEs and Code Editors You Should Know - Jan 8, 2021.
Developing machine learning algorithms requires implementing countless libraries and integrating many supporting tools and software packages. All this magic must be written by you in yet another tool -- the IDE -- that is fundamental to all your code work and can drive your productivity. These top Python IDEs and code editors are among the best tools available for you to consider, and are reviewed with their noteworthy features.
Tags: IDE, Jupyter, PyCharm, Python, Visual Studio Code
- 10 Underappreciated Python Packages for Machine Learning Practitioners - Jan 7, 2021.
Here are 10 underappreciated Python packages covering neural architecture design, calibration, UI creation and dissemination.
Tags: Deployment, Neural Networks, Python, UI/UX
- KDnuggets™ News 21:n01, Jan 6: All machine learning algorithms you should know in 2021; Monte Carlo integration in Python; MuZero – the most important ML system ever created? - Jan 6, 2021.
The first issue in 2021 brings you a great blog about Monte Carlo Integration - in Python; An overview of main Machine Learning algorithms you need to know in 2021; SQL vs NoSQL: 7 Key Takeaways; Generating Beautiful Neural Network Visualizations - how to; MuZero - may be the most important Machine Learning system ever created; and much more!
Tags: Algorithms, Monte Carlo, MuZero, NoSQL, Python, SQL
- 15 Free Data Science, Machine Learning & Statistics eBooks for 2021 - Dec 31, 2020.
We present a curated list of 15 free eBooks compiled in a single location to close out the year.
Tags: Automated Machine Learning, Data Science, Deep Learning, Free ebook, Machine Learning, NLP, Python, R, Statistics
- Generating Beautiful Neural Network Visualizations - Dec 30, 2020.
If you are looking to easily generate visualizations of neural network architectures, PlotNeuralNet is a project you should check out.
Tags: Neural Networks, Python, Visualization
- Monte Carlo integration in Python - Dec 24, 2020.
A famous casino-inspired trick for data science, statistics, and all of science. How do you do it in Python?
Tags: Monte Carlo, Python, Simulation, Statistics
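The trick is to estimate an integral by averaging the function at random points and scaling by the width of the interval. A minimal sketch using only Python's standard `random` module (the helper name `mc_integrate`, the sample count, and the seed are illustrative choices, not taken from the article):

```python
import random

def mc_integrate(f, a, b, n_samples=100_000, seed=0):
    """Estimate the integral of f over [a, b] with plain Monte Carlo sampling."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    # Average f at uniformly drawn points, then scale by the interval width.
    total = sum(f(rng.uniform(a, b)) for _ in range(n_samples))
    return (b - a) * total / n_samples

# The exact value of the integral of x**2 over [0, 1] is 1/3.
print(mc_integrate(lambda x: x * x, 0.0, 1.0))
```

The typical error shrinks roughly in proportion to 1/√n, so quadrupling the number of samples about halves it.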
- Production Machine Learning Monitoring: Outliers, Drift, Explainers & Statistical Performance - Dec 21, 2020.
A practical deep dive on production monitoring architectures for machine learning at scale using real-time metrics, outlier detectors, drift detectors, metrics servers and explainers.
Tags: AI, Deployment, Explainable AI, Machine Learning, Modeling, Outliers, Production, Python
- Fast and Intuitive Statistical Modeling with Pomegranate - Dec 21, 2020.
Pomegranate is a delicious fruit. It can also be a super useful Python library for statistical analysis. We will show how in this article.
Tags: Distribution, Markov Chains, Probability, Python, Statistical Modeling
- How to use Machine Learning for Anomaly Detection and Conditional Monitoring - Dec 16, 2020.
This article explains the goals of anomaly detection and outlines the approaches used to solve specific use cases for anomaly detection and condition monitoring.
Tags: Anomaly Detection, Machine Learning, Python, scikit-learn, Unsupervised Learning
- KDnuggets™ News 20:n47, Dec 16: A Rising Library Beating Pandas in Performance; R or Python? Why Not Both? - Dec 16, 2020.
Also: 10 Python Skills They Don't Teach in Bootcamp; Data Science Volunteering: Ways to Help; A Journey from Software to Machine Learning Engineer; Data Science and Machine Learning: The Free eBook
Tags: Data Science, Free ebook, IDE, Machine Learning, Machine Learning Engineer, Pandas, Python, R
- Data Science and Machine Learning: The Free eBook - Dec 15, 2020.
Check out the newest addition to our free eBook collection, Data Science and Machine Learning: Mathematical and Statistical Methods, and start building your statistical learning foundation today.
Tags: Data Science, Free ebook, Machine Learning, Python
- How to Create Custom Real-time Plots in Deep Learning - Dec 14, 2020.
How to generate real-time visualizations of custom metrics while training a deep learning model using Keras callbacks.
Tags: Data Visualization, Deep Learning, Keras, Metrics, Neural Networks, Python
- Matrix Decomposition Decoded - Dec 11, 2020.
This article covers matrix decomposition and the underlying concepts of eigenvalues (lambdas) and eigenvectors, and discusses the purpose behind using matrices and vectors in linear algebra.
Tags: Linear Algebra, Mathematics, numpy, PCA, Python
- A Rising Library Beating Pandas in Performance - Dec 11, 2020.
This article compares the performance of the well-known pandas library with pypolars, a rising DataFrame library written in Rust. See how they compare.
Tags: Data Processing, Pandas, Performance, Python
- 10 Python Skills They Don’t Teach in Bootcamp - Dec 11, 2020.
Ascend to new heights in Data Science and Machine Learning with this thrilling list of coding tips.
Tags: Bootcamp, Programming, Python
- Implementing the AdaBoost Algorithm From Scratch - Dec 10, 2020.
The AdaBoost technique builds on decision trees with a depth of one, called stumps: an AdaBoost model is essentially a forest of stumps rather than of full trees. AdaBoost works by putting more weight on difficult-to-classify instances and less on those already handled well. The algorithm can be applied to both classification and regression problems. Learn to build it from scratch here.
Tags: Adaboost, Algorithms, Ensemble Methods, Machine Learning, Python
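The reweighting described above can be sketched in plain Python. This is a minimal sketch of discrete AdaBoost for binary labels in {-1, +1} using decision stumps; the function names and toy data are hypothetical, it covers classification only, and it is not the linked article's implementation.

```python
import math

def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best one-split stump."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                # The stump predicts +polarity when row[f] >= thr, else -polarity.
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (polarity if row[f] >= thr else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, polarity)
    return best

def stump_predict(stump, row):
    _, f, thr, polarity = stump
    return polarity if row[f] >= thr else -polarity

def adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost with decision stumps; labels must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = max(stump[0], 1e-10)  # guard against log(0) for a perfect stump
        if err >= 0.5:
            break  # no stump beats chance on the current weights
        alpha = 0.5 * math.log((1 - err) / err)  # this stump's vote weight
        ensemble.append((alpha, stump))
        # Raise weights of misclassified samples, lower the rest, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy 1-D data that no single stump separates: positives at both ends.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, n_rounds=3)
print([predict(model, row) for row in X])  # [1, 1, -1, -1, 1, 1]
```

Each stump's vote weight `alpha` grows as its weighted error shrinks, and the multiplicative update makes the next stump focus on the samples the ensemble still gets wrong.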
- Data Compression via Dimensionality Reduction: 3 Main Methods - Dec 10, 2020.
Lift the curse of dimensionality by mastering the application of three important techniques that will help you reduce the dimensionality of your data, even if it is not linearly separable.
Tags: Compression, Dimensionality Reduction, LDA, PCA, Python
- R or Python? Why Not Both? - Dec 9, 2020.
Do you use both R and Python, either in different projects or in the same one? Check out prython, an IDE designed to handle your needs.
Tags: Data Analysis, Data Science, IDE, Programming, Python, R
- Merging Pandas DataFrames in Python - Dec 8, 2020.
A quick how-to guide for merging Pandas DataFrames in Python.
Tags: Data Preparation, Data Preprocessing, Data Processing, Pandas, Python
- Change the Background of Any Video with 5 Lines of Code - Dec 7, 2020.
Learn to blur, color, grayscale and create a virtual background for a video with PixelLib.
Tags: Computer Vision, Image Processing, Machine Learning, Python, Segmentation, Video
- Pruning Machine Learning Models in TensorFlow - Dec 4, 2020.
Read this overview to learn how to make your models smaller via pruning.
Tags: Machine Learning, Modeling, Python, TensorFlow
- 10 Python Skills for Beginners - Dec 3, 2020.
Python is the fastest-growing, most-beloved programming language. Get started with these Data Science tips.
Tags: Data Science, Programming, Python, Tips
- KDnuggets™ News 20:n45, Dec 2: TabPy: Combining Python and Tableau; Learn Deep Learning with this Free Course from Yann LeCun - Dec 2, 2020.
Combine Python and Tableau with TabPy; Learn Deep Learning with this Free Course from Yann LeCun; Find 15 Exciting AI Project Ideas for Beginners; Read about the Rise of the Machine Learning Engineer; See How to Incorporate Tabular Data with HuggingFace Transformers
Tags: AI, Courses, Deep Learning, Hugging Face, Machine Learning Engineer, Python, Tableau, Transformer, Yann LeCun
- Object-Oriented Programming Explained Simply for Data Scientists - Dec 1, 2020.
Read this simple but effective guide to start using Classes in Python 3.
Tags: Data Science, Data Scientist, Explained, Programming, Python
- Deploying Trained Models to Production with TensorFlow Serving - Nov 30, 2020.
TensorFlow provides a way to move a trained model to a production environment for deployment with minimal effort. In this article, we’ll use a pre-trained model, save it, and serve it using TensorFlow Serving.
Tags: Deployment, Modeling, Neural Networks, Python, TensorFlow
- Data Science History and Overview - Nov 30, 2020.
In this era of big data that is only getting bigger, a huge amount of information from different fields is gathered and stored. Analyzing it and extracting value from it has become one of the most attractive tasks for companies and society in general, and this work is taken on by the new professional role of the Data Scientist.
Tags: About Gregory Piatetsky, Data Science, Data Scientist, History, Python
- Essential Math for Data Science: Integrals And Area Under The Curve - Nov 25, 2020.
In this article, you’ll learn about integrals and the area under the curve using the practical data science example of the area under the ROC curve used to compare the performances of two machine learning models.
Tags: Machine Learning, Mathematics, Metrics, numpy, Python, Unbalanced
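The area under the ROC curve is an integral of the true positive rate over the false positive rate, which can be approximated numerically. The article is tagged numpy; this minimal sketch stays in plain Python and assumes 0/1 labels with at least one example of each class. The helper names are hypothetical.

```python
from itertools import groupby

def roc_points(labels, scores):
    """ROC points (FPR, TPR), lowering the threshold one distinct score at a time."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    # Tied scores cross the threshold together, producing a single diagonal step.
    for _, group in groupby(ranked, key=lambda t: t[0]):
        for _, label in group:
            if label == 1:
                tp += 1
            else:
                fp += 1
        points.append((fp / neg, tp / pos))
    return points

def trapezoid_auc(points):
    """Integrate TPR over FPR with the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(trapezoid_auc(roc_points(labels, scores)))  # 0.75
```

Comparing two models then amounts to comparing their AUC values on the same labels; 1.0 is a perfect ranking and 0.5 matches random guessing.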
- How to Incorporate Tabular Data with HuggingFace Transformers - Nov 25, 2020.
In real-world scenarios, we often encounter data that includes both text and tabular features. Leveraging the latest advances in transformers to handle both kinds of data effectively can improve your models' performance.
Tags: Data Preparation, Deep Learning, Machine Learning, NLP, Python, Transformer
- Simple Python Package for Comparing, Plotting & Evaluating Regression Models - Nov 25, 2020.
This package aims to help users plot evaluation metric graphs for widely used regression model metrics with a single line of code, comparing them at a glance. The utility package also significantly lowers the barrier for practitioners, even amateurs, to evaluate different machine learning algorithms on their everyday predictive regression problems.
Tags: Data Visualization, Metrics, Modeling, Python, Regression
- TabPy: Combining Python and Tableau - Nov 24, 2020.
This article demonstrates how to get started using Python in Tableau.
Tags: Data Visualization, Python, Tableau
- Computer Vision at Scale With Dask And PyTorch - Nov 23, 2020.
A tutorial on conducting image classification inference with the ResNet50 deep learning model at scale, using GPU clusters on Saturn Cloud. The result: 40x faster computer vision that cut a 3+ hour PyTorch model run to just 5 minutes.
Tags: Computer Vision, Dask, Python, PyTorch, Scalability
- Top 6 Data Science Programs for Beginners - Nov 20, 2020.
Udacity has the best industry-leading programs in data science. Here are the top six data science courses for beginners to help you get started.
Tags: Beginners, Certificate, Data Engineer, Data Science Education, Data Visualization, Online Education, Python, R, SQL, Udacity
- KDnuggets™ News 20:n44, Nov 18: How to Acquire the Most Wanted Data Science Skills; Learn to build an end to end data science project - Nov 18, 2020.
How to get the most wanted Data Science skills; How to build an end-to-end Data Science project; How to get into Data Science without a degree; Top Python Libraries for Deep Learning, Natural Language Processing, and Computer Vision; Is Data Science for you? 14 self-examination questions to consider; and more
Tags: Career Advice, Data Science Skills, Portfolio, Python
- Algorithms for Advanced Hyper-Parameter Optimization/Tuning - Nov 17, 2020.
In informed search, each iteration learns from the last, whereas in grid and random search, all the modeling is done at once and then the best model is picked. For small datasets, grid search or random search would be fast and sufficient. AutoML approaches provide a neat solution to properly select the required hyperparameters that improve the model’s performance.
Tags: Automated Machine Learning, AutoML, Hyperparameter, Optimization, Python
- 5 Things You Are Doing Wrong in PyCaret - Nov 16, 2020.
PyCaret is an alternative low-code library that can be used to replace hundreds of lines of code with only a few words. This makes experiments exponentially faster and more efficient. Find out 5 ways to improve your usage of the library.
Tags: Machine Learning, PyCaret, Python, Tips
- Top Python Libraries for Deep Learning, Natural Language Processing & Computer Vision - Nov 16, 2020.
This article compiles the 30 top Python libraries for deep learning, natural language processing & computer vision, as best determined by KDnuggets staff.
Tags: Computer Vision, Data Science, Deep Learning, Machine Learning, Neural Networks, NLP, Python
- From Y=X to Building a Complete Artificial Neural Network - Nov 13, 2020.
In this tutorial, we will start with the simplest artificial neural network (ANN) and move to something much more complex. We begin by building a machine learning model with no parameters, which is Y=X.
Tags: Bias, Neural Networks, Optimization, Python
- tensorflow + dalex = :) , or how to explain a TensorFlow model - Nov 13, 2020.
Having a machine learning model that generates interesting predictions is one thing. Understanding why it makes these predictions is another. For a TensorFlow predictive model, it can be straightforward and convenient to develop explainable AI by leveraging the dalex Python package.
Tags: Dalex, Explainability, Explainable AI, Machine Learning, Python, TensorFlow
- Learn to build an end to end data science project - Nov 11, 2020.
Appreciating the process you must work through for any Data Science project is valuable before you land your first job in this field. With a well-honed strategy, such as the one outlined in this example project, you will remain productive and consistently deliver valuable machine learning models.
Tags: Data Preparation, Data Science, GitHub, Portfolio, Python, Regression, Salary
- Mastering TensorFlow Tensors in 5 Easy Steps - Nov 11, 2020.
Discover how the building blocks of TensorFlow work at the lower level and learn how to make the most of Tensor objects.
Tags: Deep Learning, Python, Tensor, TensorFlow
- KDnuggets™ News 20:n43, Nov 11: The Best Data Science Certification You’ve Never Heard Of; Essential data science skills that no one talks about - Nov 11, 2020.
The Best Data Science Certification You've Never Heard Of; Essential data science skills that no one talks about; Pandas on Steroids: End to End Data Science in Python with Dask; How to Build a Football Dataset with Web Scraping; 2 Coding-free Ways to Extract Content From Websites to Boost Web Traffic
Tags: Certification, Courses, Dask, Data Science, Data Science Skills, Football, Pandas, Python, Soccer, Web Scraping
- Every Complex DataFrame Manipulation, Explained & Visualized Intuitively - Nov 10, 2020.
Most Data Scientists might hail the power of Pandas for data preparation, but many may not be capable of leveraging all that power. Manipulating data frames can quickly become a complex task, so eight of these techniques within Pandas are presented with an explanation, visualization, code, and tricks to remember how to do it.
Tags: Data Preparation, Pandas, Python
- Change the Background of Any Image with 5 Lines of Code - Nov 9, 2020.
Blur, color, grayscale and change the background of any image with a picture using PixelLib.
Tags: Computer Vision, Image Processing, Machine Learning, Python, Segmentation
- Pandas on Steroids: End to End Data Science in Python with Dask - Nov 6, 2020.
End to end parallelized data science from reading big data to data manipulation to visualisation to machine learning.
Tags: Dask, Data Science, Pandas, Python
- How to Build a Football Dataset with Web Scraping - Nov 5, 2020.
This article covers using Selenium to scrape JavaScript rendered content.
Tags: Javascript, Python, Selenium, Soccer, Web Scraping
- How to deploy PyTorch Lightning models to production - Nov 5, 2020.
A complete guide to serving PyTorch Lightning models at scale.
Tags: Deployment, Neural Networks, Production, Python, PyTorch
- KDnuggets™ News 20:n42, Nov 4: Top Python Libraries for Data Science, Data Visualization & Machine Learning; Mastering Time Series Analysis - Nov 4, 2020.
Top Python Libraries for Data Science, Data Visualization, Machine Learning; Mastering Time Series Analysis with Help From the Experts; Explaining the Explainable AI: A 2-Stage Approach; The Missing Teams For Data Scientists; and more.
Tags: Career Advice, Data Science Team, Explainable AI, Python, Time Series
- Building Deep Learning Projects with fastai — From Model Training to Deployment - Nov 4, 2020.
A getting-started guide to developing computer vision applications with fastai.
Tags: Deep Learning, Deployment, fast.ai, Modeling, Python, Training
- Top Python Libraries for Data Science, Data Visualization & Machine Learning - Nov 2, 2020.
This article compiles the 38 top Python libraries for data science, data visualization & machine learning, as best determined by KDnuggets staff.
Tags: Automated Machine Learning, AutoML, Data Exploration, Data Processing, Data Science, Data Visualization, Explainability, Machine Learning, Python
- Building Neural Networks with PyTorch in Google Colab - Oct 30, 2020.
Combining PyTorch and Google's cloud-based Colab notebook environment can be a good solution for building neural networks with free access to GPUs. This article demonstrates how to do just that.
Tags: Deep Learning, Google Colab, Neural Networks, Python, PyTorch
- Dealing with Imbalanced Data in Machine Learning - Oct 29, 2020.
This article presents tools & techniques for handling data when it's imbalanced.
Tags: Balancing Classes, Machine Learning, Python
- Stop Running Jupyter Notebooks From Your Command Line - Oct 28, 2020.
Instead, run your Jupyter Notebook as a standalone web app.
Tags: App, Data Science, Docker, Jupyter, Python
- Which flavor of BERT should you use for your QA task? - Oct 22, 2020.
Check out this guide to choosing and benchmarking BERT models for question answering.
Tags: BERT, NLP, Python, Question answering
- 10 Underrated Python Skills - Oct 21, 2020.
Tips for feature analysis, hyperparameter tuning, data visualization and more.
Tags: Data Analysis, Data Science Skills, Data Visualization, MLflow, Pandas, Programming, Python, Time Series
- KDnuggets™ News 20:n40, Oct 21: fastcore: An Underrated Python Library; Goodhart’s Law for Data Science: what happens when a measure becomes a target? - Oct 21, 2020.
fastcore: An Underrated Python Library; Goodhart's Law for Data Science and what happens when a measure becomes a target?; Text Mining with R: The Free eBook; Free From MIT: Intro to Computational Thinking and Data Science; How to ace the data science coding challenge
Tags: Challenge, Courses, Data Science, Free ebook, Measurement, MIT, Python, R, Text Mining
- Deploying Streamlit Apps Using Streamlit Sharing - Oct 20, 2020.
Read this sneak peek into Streamlit’s new deployment platform.
Tags: Deployment, Python, Streamlit
- Data Science in the Cloud with Dask - Oct 20, 2020.
Scaling large data analyses for data science and machine learning is growing in importance. Dask and Coiled are making it easy and fast for folks to do just that. Read on to find out how.
Tags: Cloud, Dask, Data Science, Python
- Feature Ranking with Recursive Feature Elimination in Scikit-Learn - Oct 19, 2020.
This article covers using scikit-learn to obtain the optimal number of features for your machine learning project.
Tags: Feature Selection, Machine Learning, Python, scikit-learn
- Roadmap to Natural Language Processing (NLP) - Oct 19, 2020.
Check out this introduction to some of the most common techniques and models used in Natural Language Processing (NLP).
Tags: Data Preprocessing, LDA, NLP, Python, Roadmap, Sentiment Analysis, Transformer, Word Embeddings
- Fast Gradient Boosting with CatBoost - Oct 16, 2020.
In this piece, we’ll take a closer look at a gradient boosting library called CatBoost.
Tags: CatBoost, Gradient Boosting, Machine Learning, Python
- fastcore: An Underrated Python Library - Oct 15, 2020.
A unique Python library that extends the Python programming language and provides utilities that enhance productivity.
Tags: Development, fast.ai, Programming, Python
- Free From MIT: Intro to Computational Thinking and Data Science - Oct 14, 2020.
This free course from MIT will help in your transition to thinking computationally, and ultimately solving complex data science problems.
Tags: Computer Science, Courses, Data Science, MIT, Python
- Getting Started with PyTorch - Oct 14, 2020.
A practical walkthrough on how to use PyTorch for data analysis and inference.
Tags: Neural Networks, Python, PyTorch
- Exploring The Brute Force K-Nearest Neighbors Algorithm - Oct 12, 2020.
This article discusses a simple approach to increasing the accuracy of k-nearest neighbors models in a particular subset of cases.
Tags: Algorithms, K-nearest neighbors, Machine Learning, Python
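Brute-force k-nearest neighbors compares a query against every training point. A minimal sketch, assuming Euclidean distance and a simple majority vote (the function name is hypothetical, it requires Python 3.8+ for `math.dist`, and it does not reproduce the article's specific accuracy-improving approach):

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Brute force: measure the distance from the query to every training point.
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest; sorting on distance only avoids comparing labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (0.5, 0.5)))  # a
```

Every prediction scans all n training points, which is simple and exact but grows costly for large datasets; an odd k avoids tied votes in two-class problems.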
- Here are the Most Popular Python IDEs/Editors - Oct 6, 2020.
Jupyter Notebook continues to lead as the most popular Python IDE, but its share has declined since the last poll. The top 4 contenders have remained the same, but only one has significantly improved its share. We also examine the breakdown by employment and region.
Tags: IDE, Jupyter, Poll, PyCharm, Python, Visual Studio Code
- 10 Best Machine Learning Courses in 2020 - Oct 6, 2020.
If you are ready to take your career in machine learning to the next level, then these top 10 Machine Learning Courses covering both practical and theoretical work will help you excel.
Tags: Courses, DataCamp, Deep Learning, fast.ai, Machine Learning, Online Education, Python, Stanford
- Your Guide to Linear Regression Models - Oct 5, 2020.
This article explains linear regression and how to program linear regression models in Python.
Tags: Linear Regression, Python
- Key Machine Learning Technique: Nested Cross-Validation, Why and How, with Python code - Oct 5, 2020.
Selecting the best performing machine learning model with optimal hyperparameters can sometimes still end up with a poorer performance once in production. This phenomenon might be the result of tuning the model and evaluating its performance on the same sets of train and test data. So, validating your model more rigorously can be key to a successful outcome.
Tags: Cross-validation, Machine Learning, Python
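The remedy described above can be sketched without any libraries: an inner cross-validation loop chooses a hyperparameter, and an outer loop scores that choice on folds the tuning never saw. This minimal sketch assumes a k-nearest-neighbors classifier with k as the hyperparameter; all helper names and the toy data are hypothetical. In scikit-learn, the same pattern is usually written by passing a `GridSearchCV` object to `cross_val_score`.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k (point, label) pairs closest to `query`."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def folds(data, n_folds):
    """Yield (train, test) splits; fold i takes every n_folds-th item starting at i."""
    for i in range(n_folds):
        test = data[i::n_folds]
        train = [item for j, item in enumerate(data) if j % n_folds != i]
        yield train, test

def accuracy(train, test, k):
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

def nested_cv(data, candidate_ks, outer_folds=5, inner_folds=3):
    """Estimate accuracy with hyperparameter tuning kept inside each outer training split."""
    outer_scores = []
    for train, test in folds(data, outer_folds):
        # Inner loop: pick k using only the outer training data.
        best_k = max(
            candidate_ks,
            key=lambda k: sum(accuracy(tr, te, k) for tr, te in folds(train, inner_folds)),
        )
        # Outer loop: score the chosen k on data it was never tuned on.
        outer_scores.append(accuracy(train, test, best_k))
    return sum(outer_scores) / len(outer_scores)

# Two well-separated, interleaved clusters labeled "a" and "b".
data = []
for i in range(10):
    data.append(((i * 0.1, 0.0), "a"))
    data.append(((10 + i * 0.1, 10.0), "b"))
print(nested_cv(data, candidate_ks=(1, 3, 5)))  # 1.0
```

Because each outer test fold is held out from the inner search, the averaged outer score is not inflated by the tuning itself.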
- Data Science Minimum: 10 Essential Skills You Need to Know to Start Doing Data Science - Oct 1, 2020.
Data science is ever-evolving, so mastering its foundational technical and soft skills will help you be successful in a career as a Data Scientist, as well as pursue advanced concepts, such as deep learning and artificial intelligence.
Tags: Algorithms, Communication, Data Preprocessing, Data Science, Data Science Skills, Data Visualization, Ethics, Mathematics, Python, R
- KDnuggets™ News 20:n37, Sep 30: Introduction to Time Series Analysis in Python; How To Improve Machine Learning Model Accuracy - Sep 30, 2020.
Learn how to work with time series in Python; Tips for improving Machine Learning model accuracy from 80% to over 90%; Geographical Plots with Python; Best methods for making Python programs blazingly fast; Read a complete guide to PyTorch; KDD Best Paper Awards and more.
Tags: Accuracy, Geospatial, KDD, Performance, Python, PyTorch, Time Series
- Looking Inside The Blackbox: How To Trick A Neural Network - Sep 28, 2020.
In this tutorial, I’ll show you how to use gradient ascent to figure out how to misclassify an input.
Tags: Neural Networks, Python, PyTorch
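The gradient-ascent idea can be illustrated without a deep learning framework. This sketch makes one substitution: it uses a logistic regression model with hypothetical fixed weights `w` and `b` in place of the article's PyTorch neural network.

It repeatedly nudges the input in the direction that increases the loss for its true label, stopping once the predicted class flips. For a real network, the gradient with respect to the input would come from automatic differentiation rather than the closed-form expression used here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fool_logistic(x, y_true, w, b, lr=0.1, max_steps=1000):
    """Gradient ascent on binary cross-entropy with respect to the input x."""
    x_adv = x.astype(float).copy()
    for step in range(max_steps):
        p = sigmoid(w @ x_adv + b)
        if int(p >= 0.5) != y_true:
            return x_adv, step          # prediction has flipped
        # For a logistic model, d(loss)/d(x) = (p - y) * w
        grad = (p - y_true) * w
        x_adv += lr * grad              # ascend the loss
    return x_adv, max_steps

# Usage with hypothetical "trained" weights
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([1.0, 0.5])               # initially classified as class 1
x_adv, steps = fool_logistic(x, y_true=1, w=w, b=b)
```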
- Geographical Plots with Python - Sep 28, 2020.
When your data includes geographical information, rich map visualizations can help you understand your data and help end users interpret analytical results.
Tags: Choropleth, Data Visualization, Geospatial, Plotly, Python
- Making Python Programs Blazingly Fast - Sep 25, 2020.
Let’s look at the performance of our Python programs and see how to make them up to 30% faster!
Tags: Development, Optimization, Programming, Python
- Create and Deploy your First Flask App using Python and Heroku - Sep 25, 2020.
Flask is a straightforward and lightweight web application framework for Python applications. This guide walks you through how to write an application using Flask with a deployment on Heroku.
Tags: App, Deployment, Flask, Heroku, Python
- Introduction to Time Series Analysis in Python - Sep 24, 2020.
Time-series data, such as data that is updated in real time, requires additional handling and special care to prepare it for machine learning models. Pandas can handle most of this work, and this tutorial guides you through analyzing time-series data with it.
Tags: Pandas, Python, time, Time Series
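Several standard Pandas operations are commonly used to prepare time series for modelling:
- resampling
- rolling windows
- differencing
- lag features

The sketch below applies them to hypothetical hourly readings. It illustrates those Pandas methods and is not the tutorial's own code or dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings: one value per hour for two days
idx = pd.date_range("2020-09-01", periods=48, freq="60min")
readings = pd.Series(np.arange(48, dtype=float), index=idx)

daily_mean = readings.resample("D").mean()      # downsample hourly -> daily
rolling_3h = readings.rolling(window=3).mean()  # 3-observation moving average
diffs = readings.diff()                         # change between consecutive readings

# Lag feature for supervised learning: predict "value" from the previous reading
lagged = pd.DataFrame({"value": readings, "lag_1": readings.shift(1)}).dropna()
```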
- The Most Complete Guide to PyTorch for Data Scientists - Sep 24, 2020.
All the PyTorch functionality you will ever need while doing Deep Learning, from an experimentation/research perspective.
Tags: Data Science, Data Scientist, Neural Networks, Python, PyTorch
- KDnuggets™ News 20:n36, Sep 23: New Poll: What Python IDE / Editor you used the most in 2020?; Automating Every Aspect of Your Python Project - Sep 23, 2020.
New Poll: What Python IDE / Editor you used the most in 2020?; Automating Every Aspect of Your Python Project; Autograd: The Best Machine Learning Library You're Not Using?; Implementing a Deep Learning Library from Scratch in Python; Online Certificates/Courses in AI, Data Science, Machine Learning; Can Neural Networks Show Imagination?
Tags: Automation, Certificate, Courses, Data Science, Deep Learning, DeepMind, Machine Learning, Neural Networks, Python
- New Poll: What Python IDE / Editor you used the most in 2020? - Sep 22, 2020.
The latest KDnuggets poll asks which Python IDE / Editor you have used the most in 2020. Participate now and share your experiences with the community.
Tags: Data Science, Development, IDE, Poll, Programming, Python
- Statistical and Visual Exploratory Data Analysis with One Line of Code - Sep 21, 2020.
If EDA is not executed correctly, it can cause us to start modeling with “unclean” data. See how to use Pandas Profiling to perform EDA with a single line of code.
Tags: Data Exploration, Data Visualization, Pandas, Python
- Automating Every Aspect of Your Python Project - Sep 18, 2020.
Every Python project can benefit from automation using a Makefile, optimized Docker images, well-configured CI/CD, code quality tools, and more.
Tags: Development, DevOps, Docker, Programming, Python
- Implementing a Deep Learning Library from Scratch in Python - Sep 17, 2020.
A beginner’s guide to understanding the fundamental building blocks of deep learning platforms.
Tags: Deep Learning, Neural Networks, Python
- Autograd: The Best Machine Learning Library You’re Not Using? - Sep 16, 2020.
If there is a Python library that is emblematic of the simplicity, flexibility, and utility of differentiable programming, it has to be Autograd.
Tags: Deep Learning, Neural Networks, Python, PyTorch
- KDnuggets™ News 20:n35, Sep 16: Data Science Skills: Core, Emerging, and Most Wanted; Free From MIT: Intro to CS, Programming in Python - Sep 16, 2020.
Check out the analysis of the latest KDnuggets Poll: which data science skills are core, which are emerging, and which skill readers most want to learn; Free From MIT: Intro to CS and Programming in Python; 8 AI/Machine Learning Projects To Make Your Portfolio Stand Out; Statistics with Julia: The Free eBook; and more.
Tags: Deep Learning, Julia, Kaggle, MIT, Python
- Visualization Of COVID-19 New Cases Over Time In Python - Sep 15, 2020.
Inspired by another concise data visualization, the author of this article has crafted and shared the code for a heatmap which visualizes the COVID-19 pandemic in the United States over time.
Tags: Coronavirus, COVID-19, Data Visualization, Python, Time Series, Visualization
- An Introduction to NLP and 5 Tips for Raising Your Game - Sep 11, 2020.
This article is a collection of things the author would like to have known when they started out in NLP. Perhaps it will be useful for you.
Tags: Beginners, NLP, Python
- Free From MIT: Intro to Computer Science and Programming in Python - Sep 9, 2020.
This free introductory computer science and programming course is available via MIT's OpenCourseWare platform. It's a great resource for mastering the fundamentals of one of data science's major requirements.
Tags: Computer Science, Courses, MIT, Programming, Python
- Modern Data Science Skills: 8 Categories, Core Skills, and Hot Skills - Sep 8, 2020.
We analyze the results of the Data Science Skills poll, including:
- 8 categories of skills
- 13 core skills that over 50% of respondents have
- the emerging/hot skills that data scientists want to learn
- the top skill that Data Scientists want to learn
Tags: Communication, Data Preparation, Data Science Skills, Data Visualization, Excel, GitHub, Mathematics, Poll, Python, Reinforcement Learning, scikit-learn, SQL, Statistics
- 4 Tricks to Effectively Use JSON in Python - Sep 8, 2020.
Working with JSON in Python is a breeze; this will get you started right away.
Tags: Programming, Python, Tips
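This summary doesn't name the article's four tricks, so the sketch below shows some widely used features of Python's standard `json` module instead:
- pretty-printing with `indent` and `sort_keys`
- serializing an unsupported type through the `default` hook
- decoding it back with `object_hook`
- the file-oriented `dump`/`load` pair

The `__date__` tagging convention is a hypothetical choice for illustration.

```python
import io
import json
from datetime import date

record = {"name": "Ada", "joined": date(2020, 9, 8), "tags": ["python", "json"]}

def encode_extra(obj):
    # json calls this for objects it cannot serialize natively
    if isinstance(obj, date):
        return {"__date__": obj.isoformat()}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

def decode_extra(d):
    # json calls this for every decoded JSON object (dict)
    if "__date__" in d:
        return date.fromisoformat(d["__date__"])
    return d

# Pretty-print with a stable key order
pretty = json.dumps(record, default=encode_extra, indent=2, sort_keys=True)

# Round-trip the custom type from a string
restored = json.loads(pretty, object_hook=decode_extra)

# dump/load (no "s") work on file-like objects
buf = io.StringIO()
json.dump(record, buf, default=encode_extra)
buf.seek(0)
from_file = json.load(buf, object_hook=decode_extra)
```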
- 10 Things You Didn’t Know About Scikit-Learn - Sep 3, 2020.
Check out these 10 things you didn’t know about Scikit-Learn... until now.
Tags: Machine Learning, Python, scikit-learn
- Computer Vision Recipes: Best Practices and Examples - Sep 2, 2020.
This is an overview of a great computer vision resource from Microsoft, which demonstrates best practices and implementation guidelines for a variety of tasks and scenarios.
Tags: Best Practices, Computer Vision, Microsoft, Python
- Which methods should be used for solving linear regression? - Sep 2, 2020.
As a foundational set of algorithms in any machine learning toolbox, linear regression can be solved with a variety of approaches. Here, we discuss four methods, with code examples, and demonstrate how they should be used.
Tags: Gradient Descent, Linear Regression, numpy, Python, Statistics, SVD
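This summary does not name the article's four methods. Judging from its tags, the sketch below assumes these four and fits all of them to the same synthetic data so their coefficients can be compared:
- the normal equations
- NumPy's least-squares solver
- an explicit SVD pseudo-inverse
- batch gradient descent

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept column + 2 features
true_w = np.array([0.5, 2.0, -1.0])
y = X @ true_w + rng.normal(scale=0.1, size=n)

# 1. Normal equations: solve (X^T X) w = X^T y
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# 2. NumPy least squares (backed by an SVD-based LAPACK routine)
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# 3. Explicit SVD pseudo-inverse: w = V diag(1/s) U^T y
U, s, Vt = np.linalg.svd(X, full_matrices=False)
w_svd = Vt.T @ ((U.T @ y) / s)

# 4. Batch gradient descent on mean squared error
w_gd = np.zeros(X.shape[1])
lr = 0.1
for _ in range(2000):
    grad = (2.0 / n) * X.T @ (X @ w_gd - y)
    w_gd -= lr * grad
```

The three direct solvers agree up to floating-point error. Forming X^T X squares the condition number, which is why the SVD-based routes are usually more numerically robust. Gradient descent reaches the same answer only with enough iterations and a suitable learning rate.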
- PyCaret 2.1 is here: What’s new? - Sep 1, 2020.
PyCaret is an open-source, low-code machine learning library in Python that automates the machine learning workflow. It is an end-to-end machine learning and model management tool that speeds up the machine learning experiment cycle and makes you 10x more productive. Read about what's new in PyCaret 2.1.
Tags: Machine Learning, PyCaret, Python
- Explainable and Reproducible Machine Learning Model Development with DALEX and Neptune - Aug 27, 2020.
With ML models serving real people, misclassified cases (which are a natural consequence of using ML) are affecting people’s lives and sometimes treating them very unfairly. This makes the ability to explain your models’ predictions a requirement rather than just a nice-to-have.
Tags: Dalex, Explainability, Explainable AI, Interpretability, Python, SHAP
- Working with Spark, Python or SQL on Azure Databricks - Aug 27, 2020.
Here we look at some ways to work interchangeably with Python, PySpark, and SQL using Azure Databricks, Microsoft's Apache Spark-based big data analytics service designed for data science and data engineering.
Tags: Apache Spark, Databricks, Microsoft Azure, Python, SQL
- Data Science Tools Illustrated Study Guides - Aug 25, 2020.
These data science tools illustrated guides are broken up into four distinct categories: data retrieval, data manipulation, data visualization, and engineering tips. Both online and PDF versions of these guides are available.
Tags: Cheat Sheet, Data Preprocessing, Data Processing, Data Science, Data Science Tools, Data Visualization, Python, R, SQL
- Rapid Python Model Deployment with FICO Xpress Insight - Aug 20, 2020.
The biggest hurdle in using data to create business value is the ability to operationalize analytics throughout the organization. Xpress Insight is geared to reduce the burden on IT and address its critical requirements while empowering business users to take ownership of decisions and change management.
Tags: AI, Deployment, FICO, Machine Learning, Optimization, Python
- Build Your Own AutoML Using PyCaret 2.0 - Aug 20, 2020.
In this post we present a step-by-step tutorial on how PyCaret can be used to build an Automated Machine Learning Solution within Power BI, thus allowing data scientists and analysts to add a layer of machine learning to their Dashboards without any additional license or software costs.
Tags: Automated Machine Learning, AutoML, Power BI, PyCaret, Python
- The List of Top 10 Lists in Data Science - Aug 14, 2020.
The list of Top 10 lists that data scientists must know to smoothly navigate a path through this field, whether they are enthusiasts or want to jump-start a career.
Tags: Algorithms, Data Science, Data Science Skills, Datasets, Influencers, LinkedIn, Python, Top 10
- Bring your Pandas Dataframes to life with D-Tale - Aug 13, 2020.
Bring your Pandas dataframes to life with D-Tale. D-Tale is an open-source solution with which you can visualize, analyze, and learn how to code Pandas data structures. In this tutorial, you'll learn how to open the grid, build columns, create charts, and view code exports.
Tags: Data Exploration, Data Science, Data Visualization, Pandas, Python
- 5 Different Ways to Load Data in Python - Aug 13, 2020.
Data is the bread and butter of a Data Scientist, so knowing many approaches to loading data for analysis is crucial. Here, five Python techniques to bring in your data are reviewed with code examples for you to follow.
Tags: Beginners, Data Preparation, Python
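This summary doesn't name the article's five techniques. The sketch below shows three common options, assuming a small CSV held in memory:
- the standard-library `csv` module
- `numpy.loadtxt`
- `pandas.read_csv`

In practice you would pass a file path instead of the `io.StringIO` wrapper.

```python
import csv
import io

import numpy as np
import pandas as pd

CSV_TEXT = "x,y\n1,2.5\n2,3.5\n3,4.5\n"  # hypothetical sample data

# 1. Standard library: each row becomes a dict of strings keyed by the header
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# 2. NumPy: a numeric 2-D array, skipping the header row
arr = np.loadtxt(io.StringIO(CSV_TEXT), delimiter=",", skiprows=1)

# 3. pandas: a typed DataFrame with column names taken from the header
df = pd.read_csv(io.StringIO(CSV_TEXT))
```

The trade-off is mostly about types: `csv` leaves conversion to you, NumPy gives a single homogeneous array, and pandas infers a dtype per column.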
- GitHub is the Best AutoML You Will Ever Need - Aug 12, 2020.
This article uses PyCaret 2.0, an open source, low-code machine learning library in Python to develop a simple AutoML solution and deploy it as a Docker container using GitHub actions.
Tags: Automated Machine Learning, AutoML, GitHub, PyCaret, Python
- Will Reinforcement Learning Pave the Way for Accessible True Artificial Intelligence? - Aug 11, 2020.
Python Machine Learning, Third Edition covers the essential concepts of reinforcement learning, starting from its foundations, and how RL can support decision making in complex environments. Read more on the topic from the book's author Sebastian Raschka.
Tags: Machine Learning, Packt Publishing, Python, Reinforcement Learning, Sebastian Raschka
- Setting Up Your Data Science & Machine Learning Capability in Python - Aug 4, 2020.
Python and its rich, dynamic ecosystem continue to lead among programming languages for data science and machine learning, so establishing and maintaining a cost-effective development environment is crucial to your business impact. So, do you rent or buy? This overview considers the hidden and obvious factors involved in selecting and implementing your Python platform.
Tags: Cloud Computing, Data Science, Machine Learning, Python, Saturn Cloud
- Announcing PyCaret 2.0 - Aug 3, 2020.
PyCaret 2.0 has been released! Find out about all of the updates and see examples of how to use them right here.
Tags: Machine Learning, PyCaret, Python
- The Machine Learning Field Guide - Aug 3, 2020.
This straightforward guide offers a structured overview of all machine learning prerequisites needed to start working on your project, including the complete data pipeline from importing and cleaning data to modelling and production.
Tags: Data Preparation, Machine Learning, Pandas, Predictive Modeling, Python