Deep Learning for Detecting Pneumonia from X-ray Images - Jun 5, 2020.
This article covers an end-to-end pipeline for pneumonia detection from X-ray images.
Deep Learning, Healthcare, Image Recognition, Python
- Machine Learning Experiment Tracking - Jun 4, 2020.
Why is experiment tracking so important for doing real-world machine learning?
Experimentation, Machine Learning, Python
- Introduction to Convolutional Neural Networks - Jun 3, 2020.
This article explains the key components of a CNN and its implementation using the Keras Python library.
Convolutional Neural Networks, Keras, Neural Networks, Python
- Introduction to Pandas for Data Science - Jun 1, 2020.
The Pandas library is core to any Data Science work in Python. This introduction will walk you through the basics of data manipulation and covers many of Pandas' important features.
Data Science, Pandas, Python
Model Evaluation Metrics in Machine Learning - May 28, 2020.
A detailed explanation of the metrics used to evaluate a classification machine learning model.
Classification, Confusion Matrix, Machine Learning, Metrics, Python, Regression
- Taming Complexity in MLOps - May 28, 2020.
A greatly expanded v2.0 of the open-source Orbyter toolkit helps data science teams continue to streamline machine learning delivery pipelines, with an emphasis on seamless deployment to production.
Best Practices, Docker, MLOps, Python
- KDnuggets™ News 20:n21, May 27: The Best NLP with Deep Learning Course is Free; Your First Machine Learning Web App - May 27, 2020.
Also: Python For Everybody: The Free eBook; Complex logic at breakneck speed: Try Julia for data science; An easy guide to choose the right Machine Learning algorithm; Dataset Splitting Best Practices in Python; Appropriately Handling Missing Values for Statistical Modelling and Prediction
Algorithms, Course, Deep Learning, Free ebook, Julia, Machine Learning, NLP, Python
- Dataset Splitting Best Practices in Python - May 26, 2020.
If you are splitting your dataset into training and testing data, you need to keep some things in mind. This discussion of 3 best practices for doing so includes a demonstration of how to implement these particular considerations in Python.
Datasets, Python, scikit-learn, Training Data, Validation
- 10 Useful Machine Learning Practices For Python Developers - May 25, 2020.
While you may be a data scientist, you are still a developer at the core. This means your code should be skillful. Follow these 10 tips to make sure you quickly deliver bug-free machine learning solutions.
Best Practices, Machine Learning Engineer, Python
Python For Everybody: The Free eBook - May 25, 2020.
Get back to fundamentals with this free eBook, Python For Everybody, approaching the learning of programming from a data analysis perspective.
Algorithms, Free ebook, Programming, Python
Build and deploy your first machine learning web app - May 22, 2020.
A beginner’s guide to training and deploying machine learning pipelines in Python using PyCaret.
App, Flask, Heroku, Machine Learning, Modeling, Open Source, Pipeline, PyCaret, Python
- Dimensionality Reduction with Principal Component Analysis (PCA) - May 21, 2020.
This article focuses on design principles of the PCA algorithm for dimensionality reduction and its implementation in Python from scratch.
Dimensionality Reduction, numpy, PCA, Python
- Pandas in action! - May 20, 2020.
Pandas is instantly familiar to anyone who’s used spreadsheet software, whether that’s Google Sheets or good old Excel. It’s got columns, it’s got grids, it’s got rows; but pandas is far more powerful. Save 40% with code nlkdpandas40 on this book, and other Manning books and videos.
Book, Manning, Pandas, Python
- Complex logic at breakneck speed: Try Julia for data science - May 20, 2020.
We benchmark the performance of Julia against equivalent Python code to show why Julia is great for data science and machine learning.
Benchmark, Data Science, Julia, numpy, Python
- Sparse Matrix Representation in Python - May 19, 2020.
Leveraging sparse matrix representations for your data when appropriate can save memory. Have a look at the reasons why, see how to create sparse matrices in Python using SciPy, and compare the memory requirements for standard and sparse representations of the same data.
numpy, Python, scikit-learn, SciPy, Sparse data
- Easy Text-to-Speech with Python - May 18, 2020.
Python comes with a lot of handy and easily accessible libraries and we’re going to look at how we can deliver text-to-speech with Python in this article.
NLP, Python, Speech
- 5 Great New Features in Scikit-learn 0.23 - May 15, 2020.
Check out 5 new features of the latest Scikit-learn release, including the ability to visualize estimators in notebooks, improvements to both k-means and gradient boosting, some new linear model implementations, and sample weight support for a pair of existing regressors.
Gradient Boosting, Jupyter, K-means, Machine Learning, Python, Regression, scikit-learn
- Machine Learning in Power BI using PyCaret - May 12, 2020.
Check out this step-by-step tutorial for implementing machine learning in Power BI within minutes.
Clustering, K-means, Machine Learning, Microsoft, Power BI, PyCaret, Python
- Text Mining in Python: Steps and Examples - May 12, 2020.
The majority of data exists in textual form, which is a highly unstructured format. To produce meaningful insights from text data, we need to follow a method called Text Analysis.
NLP, Python, Text Mining
- Hyperparameter Optimization for Machine Learning Models - May 7, 2020.
Check out this comprehensive guide to model optimization techniques.
Hyperparameter, Machine Learning, Modeling, Optimization, Python
- Explaining “Blackbox” Machine Learning Models: Practical Application of SHAP - May 6, 2020.
Train a "blackbox" GBM model on a real dataset and make it explainable with SHAP.
Explainability, Interpretability, Python, SHAP
- KDnuggets™ News 20:n18, May 6: Five Cool Python Libraries for Data Science; NLP Recipes: Best Practices - May 6, 2020.
5 cool Python libraries for Data Science; NLP Recipes: Best Practices and Examples; Deep Learning: The Free eBook; Demystifying the AI Infrastructure Stack; and more.
Deep Learning, GIS, NLP, Python
- Getting Started with Spectral Clustering - May 5, 2020.
This post will unravel a practical example to illustrate and motivate the intuition behind each step of the spectral clustering algorithm.
Clustering, Machine Learning, Python
- Optimize Response Time of your Machine Learning API In Production - May 1, 2020.
This article demonstrates how building a smarter API serving Deep Learning models minimizes the response time.
API, Machine Learning, Optimization, Production, Python
Natural Language Processing Recipes: Best Practices and Examples - May 1, 2020.
Here is an overview of another great natural language processing resource, this time from Microsoft, which demonstrates best practices and implementation guidelines for a variety of tasks and scenarios.
Best Practices, Microsoft, NLP, Python
Five Cool Python Libraries for Data Science - Apr 30, 2020.
Check out these 5 cool Python libraries that the author has come across during an NLP project, and which have made their life easier.
Data Science, NLP, Python
Coronavirus COVID-19 Genome Analysis using Biopython - Apr 29, 2020.
In this article, we will interpret and analyze the COVID-19 DNA sequence data and try to gain as many insights as possible about the proteins that make it up. Later, we will compare COVID-19 DNA with MERS and SARS to understand the relationship among them.
Analysis, Coronavirus, COVID-19, Python
- Announcing PyCaret 1.0.0 - Apr 21, 2020.
An open source low-code machine learning library in Python. PyCaret is an alternative low-code library that can replace hundreds of lines of code with only a few words. This makes experiments much faster and more efficient.
Machine Learning, Modeling, Open Source, PyCaret, Python
- The Benefits & Examples of Using Apache Spark with PySpark - Apr 21, 2020.
Apache Spark runs fast, offers robust, distributed, fault-tolerant data objects, and integrates beautifully with the world of machine learning and graph analytics. Learn more here.
Apache Spark, Data Management, Python, SQL
- Dockerize Jupyter with the Visual Debugger - Apr 17, 2020.
A step-by-step guide to enabling and using visual debugging in Jupyter in a Docker container.
Data Science, Docker, Jupyter, Python
- Better notebooks through CI: automatically testing documentation for graph machine learning - Apr 16, 2020.
In this article, we’ll walk through the detailed and helpful continuous integration (CI) that supports us in keeping StellarGraph’s demos current and informative.
Graphs, Integration, Jupyter, Machine Learning, Python, Software Engineering
- Top KDnuggets tweets, Apr 08-14: Mathematics for #MachineLearning: The Free eBook – KDnuggets - Apr 15, 2020.
Also: Exploratory Data Analysis for Natural Language Processing: A Complete Guide to Python Tools; A professor with 20 years of experience to all high school seniors (and their parents): If you were planning to enroll in college next fall - don't.
Coronavirus, Education, Mathematics, NLP, Python, Top tweets
- Pandas in action - Apr 15, 2020.
Pandas is instantly familiar to anyone who’s used spreadsheet software, whether that’s Google Sheets or good old Excel. It’s got columns, it’s got grids, it’s got rows; but pandas is far more powerful. Save 40% with code nlkdpandas40 on this book, and other Manning books and videos.
Book, Manning, Pandas, Python
- Visualizing Decision Trees with Python (Scikit-learn, Graphviz, Matplotlib) - Apr 15, 2020.
Learn how to visualize decision trees using Matplotlib and Graphviz.
Algorithms, Decision Trees, Matplotlib, Python, Visualization
- KDnuggets™ News 20:n15, Apr 15: How to Do Hyperparameter Tuning on Any Python Script; 10 Must-read Machine Learning Articles - Apr 15, 2020.
Learn how to do hyperparameter tuning on python ML scripts; Read 10 must-read Machine Learning Articles; Understand the process for Data Science project review; see how data science is used to understand COVID-19; and stay safe and healthy!
Hyperparameter, Machine Learning, Python, Research
- Free Metis Corporate Training Series: Intro to Python, Continued - Apr 14, 2020.
Metis Corporate Training is offering Intro to Python, a free, live online training series specially created for business professionals, and an excellent way for a team to begin their Python journey. Classes are taught live, and participants will be able to ask questions in real time. Register now.
Metis, Online Education, Python
- Build PyTorch Models Easily Using torchlayers - Apr 9, 2020.
torchlayers aims to do what Keras did for TensorFlow, providing a higher-level model-building API and some handy defaults and add-ons useful for crafting PyTorch neural networks.
API, Keras, Neural Networks, Python, PyTorch
How to Do Hyperparameter Tuning on Any Python Script in 3 Easy Steps - Apr 8, 2020.
With your machine learning model in Python just working, it's time to optimize it for performance. Follow this guide to set up automated tuning using any optimization library in three steps.
Hyperparameter, Machine Learning, Optimization, Python
- KDnuggets™ News 20:n14, Apr 8: Free Mathematics for Machine Learning eBook; Epidemiology Courses for Data Scientists - Apr 8, 2020.
Stop Hurting Your Pandas!; Python for data analysis... is it really that simple?!?; Introducing MIDAS: A New Baseline for Anomaly Detection in Graphs; Build an app to generate photorealistic faces using TensorFlow and Streamlit; 5 Ways Data Scientists Can Help Respond to COVID-19 and 5 Actions to Avoid
Anomaly Detection, Data Analysis, Data Science, Free ebook, Healthcare, Mathematics, MOOC, Pandas, Python
- Simple Question Answering (QA) Systems That Use Text Similarity Detection in Python - Apr 7, 2020.
How exactly are smart algorithms able to engage and communicate with us like humans? The answer lies in Question Answering systems that are built on a foundation of Machine Learning and Natural Language Processing. Let's build one here.
NLP, Python, Question answering, Similarity, Text Analytics
- Build an app to generate photorealistic faces using TensorFlow and Streamlit - Apr 7, 2020.
We’ll show you how to quickly build a Streamlit app to synthesize celebrity faces using GANs, TensorFlow, and st.cache.
App, GANs, Generative Adversarial Network, Python, Streamlit, TensorFlow
Stop Hurting Your Pandas! - Apr 3, 2020.
This post will address the issues that can arise when Pandas slicing is used improperly. If you see the warning that reads "A value is trying to be set on a copy of a slice from a DataFrame", this post is for you.
Pandas, Programming, Python
- Free Metis Corporate Training Series: Intro to Python - Apr 2, 2020.
Metis Corporate Training is offering Intro to Python, a free, live online training series specially created for business professionals, and an excellent way for a team to begin their Python journey. Classes are taught live, and participants will be able to ask questions in real time. Register now.
Metis, Online Education, Python
Python for data analysis… is it really that simple?!? - Apr 2, 2020.
The article addresses a simple data analytics problem, comparing a Python and Pandas solution to an R solution (using plyr, dplyr, and data.table), as well as kdb+ and BigQuery solutions. Performance improvement tricks for these solutions are then covered, as are parallel/cluster computing approaches and their limitations.
Data Analysis, Pandas, Python, R, SQL
- Introduction to the K-nearest Neighbour Algorithm Using Examples - Apr 1, 2020.
Read this concise summary of KNN, a supervised learning algorithm for pattern classification that finds which class a new input belongs to by choosing its k nearest neighbours according to the distance calculated between them.
Algorithms, K-nearest neighbors, Machine Learning, Python, scikit-learn
- KDnuggets™ News 20:n13, Apr 1: Effective visualizations for pandemic storytelling; Machine learning for time series forecasting - Apr 1, 2020.
This week, read about the power of effective visualizations for pandemic storytelling; see how (not) to use machine learning for time series forecasting; learn about a deep learning breakthrough: a sub-linear deep learning algorithm that does not need a GPU?; familiarize yourself with how to painlessly analyze your time series; check out what we can learn from the latest coronavirus trends; and... KDnuggets topics?!? Also, much more.
Coronavirus, Data Visualization, Deep Learning, Distributed, Machine Learning, Python, Time Series
- How To Painlessly Analyze Your Time Series - Mar 26, 2020.
The Matrix Profile is a powerful tool to help solve the dual problem of anomaly detection and motif discovery. Matrix Profile is robust, scalable, and largely parameter-free: we’ve seen it work for a wide range of metrics including website user data, order volume and other business-critical applications.
Anomaly Detection, API, Python, Time Series
- Evaluating Ray: Distributed Python for Massive Scalability - Mar 25, 2020.
If your team has started using Ray and you’re wondering what it is, this post is for you. If you’re wondering if Ray should be part of your technical strategy for Python-based applications, especially ML and AI, this post is for you.
Domino, Python, Scalability
- Build an Artificial Neural Network From Scratch: Part 2 - Mar 20, 2020.
The second article in this series focuses on building an Artificial Neural Network using the Numpy Python library.
Neural Networks, numpy, Python
- The 4 Best Jupyter Notebook Environments for Deep Learning - Mar 19, 2020.
Many cloud providers and other third-party services see the value of a Jupyter notebook environment, which is why many companies now offer cloud-hosted notebooks. Let's have a look at 4 such environments.
Deep Learning, Google Colab, Jupyter, Python, Saturn Cloud
- Exploring the Adoption of Python in the Workplace – Free Metis Corporate Training Webinar - Mar 18, 2020.
Metis will break down Python for data science and analytics, explain what is driving adoption in the field, and discuss how industries and companies are reacting to the shift.
Industry, Metis, Python
- Five Interesting Data Engineering Projects - Mar 17, 2020.
As the role of the data engineer continues to grow in the field of data science, so does the number of tools being developed to support wrangling all that data. Five of these tools (along with a few bonus tools) that you should pay attention to for your data pipeline work are reviewed here.
Dask, Data Engineering, dbt, DVC, Python
- Python Pandas For Data Discovery in 7 Simple Steps - Mar 10, 2020.
Just getting started with Python's Pandas library for data analysis? Or, ready for a quick refresher? These 7 steps will help you become familiar with its core features so you can begin exploring your data in no time.
Beginners, Data Preparation, Pandas, Python
- Generate Realistic Human Face using GAN - Mar 10, 2020.
This article contains a brief intro to Generative Adversarial Networks (GANs) and shows how to build a human face generator.
GANs, Generative Adversarial Network, Neural Networks, Python
- Tokenization and Text Data Preparation with TensorFlow & Keras - Mar 6, 2020.
This article will look at tokenizing and further preparing text data for feeding into a neural network using TensorFlow and Keras preprocessing tools.
Data Preprocessing, Keras, NLP, Python, TensorFlow, Text Analytics, Tokenization
- TensorFlow 2.0 Tutorial: Optimizing Training Time Performance - Mar 5, 2020.
Tricks to improve TensorFlow training time with tf.data pipeline optimizations, mixed precision training and multi-GPU strategies.
Neural Networks, Optimization, Python, TensorFlow, Training
- Recreating Fingerprints using Convolutional Autoencoders - Mar 4, 2020.
The article gets you started working with fingerprints using Deep Learning.
Autoencoder, Convolutional Neural Networks, Neural Networks, Python
- KDnuggets™ News 20:n09, Mar 4: When Will AutoML replace Data Scientists (if ever) – vote; 20 AI, DS, ML Terms You Need to Know (part 2) - Mar 4, 2020.
AutoML, Data Science Education, Decision Trees, Key Terms, Probability, Python, R
- 5 Google Colaboratory Tips - Mar 2, 2020.
Are you looking for some tips for using Google Colab for your projects? This article presents five you may find useful.
Google, Google Colab, Jupyter, Python, Tips
- Hands on Hyperparameter Tuning with Keras Tuner - Feb 28, 2020.
Or how hyperparameter tuning with Keras Tuner can boost your object classification network's accuracy by 10%.
Automated Machine Learning, AutoML, Keras, Python
Python and R Courses for Data Science - Feb 26, 2020.
Since Python and R are a must for today's data scientists, continuous learning is paramount. Online courses are arguably the best and most flexible way to upskill throughout one's career.
Coursera, Data Science, edX, MOOC, Programming, Python, R
- Audio Data Analysis Using Deep Learning with Python (Part 2) - Feb 25, 2020.
This is a followup to the first article in this series. Once you are comfortable with the concepts explained in that article, you can come back and continue with this.
Audio, Data Preprocessing, Deep Learning, Python
- The Forgotten Algorithm - Feb 20, 2020.
This article explores Monte Carlo Simulation with Streamlit.
Monte Carlo, Python, Simulation, Streamlit
Audio Data Analysis Using Deep Learning with Python (Part 1) - Feb 19, 2020.
A brief introduction to audio data processing and genre classification using neural networks and Python.
Audio, Data Processing, Deep Learning, Python
- KDnuggets™ News 20:n07, Feb 19: 20 AI, Data Science, Machine Learning Terms for 2020; Why Did I Reject a Data Scientist Job? - Feb 19, 2020.
This week on KDnuggets: 20 AI, Data Science, Machine Learning Terms You Need to Know in 2020; Why Did I Reject a Data Scientist Job?; Fourier Transformation for a Data Scientist; Math for Programmers; Deep Neural Networks; Practical Hyperparameter Optimization; and much more!
AI, API, Data Science, Data Scientist, Health, Key Terms, Machine Learning, Mathematics, Neural Networks, Python
- Using the Fitbit Web API with Python - Feb 18, 2020.
Fitbit provides a Web API for accessing data from Fitbit activity trackers. Check out this updated tutorial on accessing Fitbit data using the API with Python.
API, Fitness, Health, Python
Fourier Transformation for a Data Scientist - Feb 14, 2020.
The article contains a brief mathematical introduction to the Fourier transformation and its applications in AI.
Data Science, Data Scientist, Mathematics, Python
- Adversarial Validation Overview - Feb 13, 2020.
Learn how to implement adversarial validation that builds a classifier to determine if your data is from the training or testing sets. If you can do this, then your data has issues, and your adversarial validation model can help you diagnose the problem.
Adversarial, Kaggle, Machine Learning, Python, Validation
- Practical Hyperparameter Optimization - Feb 13, 2020.
An introduction to fine-tuning machine learning and deep learning models using techniques such as random search, automated hyperparameter tuning, and artificial neural network tuning.
Automated Machine Learning, AutoML, Deep Learning, Hyperparameter, Machine Learning, Optimization, Python, scikit-learn
- Easy Image Dataset Augmentation with TensorFlow - Feb 13, 2020.
What can we do when we don't have a substantial amount of varied training data? This is a quick intro to using data augmentation in TensorFlow to perform in-memory image transformations during model training to help overcome this data impediment.
Data Preprocessing, Image Processing, Image Recognition, Python, TensorFlow
- Sharing your machine learning models through a common API - Feb 12, 2020.
DEEPaaS API is a software component developed to expose machine learning models through a REST API. In this article we describe how to do it.
API, Deep Learning, Machine Learning, Open Source, Python
- Intent Recognition with BERT using Keras and TensorFlow 2 - Feb 10, 2020.
TL;DR Learn how to fine-tune the BERT model for text classification. Train and evaluate it on a small dataset for detecting seven intents. The results might surprise you!
BERT, Keras, NLP, Python, TensorFlow
- Getting up and Running with Python: Installing Anaconda on Windows - Feb 6, 2020.
This tutorial covers how to download and install Anaconda on Windows; how to test your installation; how to fix common installation issues; and what to do after installing Anaconda.
Anaconda, Python
- Create Your Own Computer Vision Sandbox - Feb 5, 2020.
This post covers a wide array of computer vision tasks, from automated data collection to CNN model building.
Computer Vision, Convolutional Neural Networks, Python
- Audio File Processing: ECG Audio Using Python - Feb 4, 2020.
In this post, we will look into an application of audio file processing for a good cause: analyzing ECG heartbeat audio, with code written in Python.
Audio, Data Processing, Health, Python
How to Optimize Your Jupyter Notebook - Jan 30, 2020.
This article walks through some simple tricks on improving your Jupyter Notebook experience, and covers useful shortcuts, adding themes, automatically generated table of contents, and more.
Jupyter, Optimization, Python
- Generating English Pronoun Questions Using Neural Coreference Resolution - Jan 29, 2020.
This post will introduce a practical method for generating English pronoun questions from any story or article. Learn how to take an additional step toward computationally understanding language.
NLP, Python, spaCy, Text Analytics
- Exoplanet Hunting Using Machine Learning - Jan 28, 2020.
Search for exoplanets — those planets beyond our own solar system — using machine learning, and implement these searches in Python.
Cosmology, Machine Learning, Python
- The 5 Most Useful Techniques to Handle Imbalanced Datasets - Jan 22, 2020.
This post explains the various techniques you can use to handle imbalanced datasets.
Balancing Classes, Datasets, Metrics, Python, Sampling, Unbalanced
- Random Forest® — A Powerful Ensemble Learning Algorithm - Jan 22, 2020.
The article explains the Random Forest algorithm and how to build and optimize a Random Forest classifier.
Algorithms, Ensemble Methods, Python, random forests algorithm
- 10 Python String Processing Tips & Tricks - Jan 20, 2020.
Pursuing a text analytics path but don't know where to start? Try this string processing primer to first gain an understanding of using Python to manipulate and process strings at a basic level.
Data Processing, Programming, Python, Text Analytics
- Geovisualization with Open Data - Jan 15, 2020.
In this post I want to show how to use publicly available (open) data to create geo visualizations in Python. Maps are a great way to communicate and compare information when working with geolocation data. There are many frameworks for plotting maps; here I focus on matplotlib and geopandas (and give a glimpse of mplleaflet).
Germany, Maps, Open Data, Python, Visualization
- KDnuggets™ News 20:n02, Jan 15: Top 5 Must-have Data Science Skills; Learn Machine Learning with THIS Book - Jan 15, 2020.
This week: learn the 5 must-have data science skills for the new year; find out which book is THE book to get started learning machine learning; pick up some Python tips and tricks; learn SQL, but learn it the hard way; and find an introductory guide to learning common NLP techniques.
Books, Data Science, Data Science Skills, Machine Learning, NLP, Programming, Python, SQL, Tips
- KDnuggets™ News 20:n01, Jan 8: How to “Ultralearn” Data Science; How teams do AutoML? - Jan 8, 2020.
The first issue of 2020 brings you a summary of how to "Ultralearn" Data Science, for those in a hurry; an explanation of how teams work on AutoML projects; why Python is a preferred language for Data Science; and a cartoon on teaching ethics to AI.
AutoML, Data Science Team, Python, Ultralearn
10 Python Tips and Tricks You Should Learn Today - Jan 8, 2020.
Check out this collection of 10 Python snippets that can be taken as a reference for your daily work.
Programming, Python, Tips
- H2O Framework for Machine Learning - Jan 6, 2020.
This article is an overview of H2O, a scalable and fast open-source platform for machine learning. We will apply it to perform classification tasks.
Automated Machine Learning, AutoML, H2O, Machine Learning, Python
- How to Convert a Picture to Numbers - Jan 6, 2020.
Reducing images to numbers makes them amenable to computation. Let's take a look at the why and the how using Python and Numpy.
Computer Vision, Image Processing, numpy, Python
- Why Python is One of the Most Preferred Languages for Data Science? - Jan 3, 2020.
Why do most data scientists love Python? Learn more about how so many well-developed Python packages can help you accomplish your crucial data science tasks.
Data Exploration, Data Science, Programming Languages, Python
Predict Electricity Consumption Using Time Series Analysis - Jan 2, 2020.
Time series forecasting is a technique for predicting events through a sequence of time. In this post, we will take a small forecasting problem and work through it from start to finish, learning time series forecasting along the way.
ARIMA, Electricity, Python, Time Series
- Top KDnuggets tweets, Dec 18-30: A Gentle Introduction to Math Behind Neural Networks - Dec 31, 2019.
A Gentle Introduction to #Math Behind #NeuralNetworks; Learn How to Quickly Create UIs in Python; I wanna be a data scientist, but... how!?; I created my own deepfake in two weeks
Career, Deepfakes, Mathematics, Neural Networks, Python, Top tweets
- Fighting Overfitting in Deep Learning - Dec 27, 2019.
This post outlines an attack plan for fighting overfitting in neural networks.
Deep Learning, Keras, Neural Networks, Overfitting, Python, Regularization, Transfer Learning
- Market Basket Analysis: A Tutorial - Dec 24, 2019.
This article is about Market Basket Analysis & the Apriori algorithm that works behind it.
Apriori, Association Rules, Data Mining, Python
- How to Convert an RGB Image to Grayscale - Dec 18, 2019.
This post is about working with a mixture of color and grayscale images and needing to transform them into a uniform format - all grayscale. We'll be working in Python using the Pillow, Numpy, and Matplotlib packages.
Compression, Computer Vision, Image Processing, Python
- KDnuggets™ News 19:n48, Dec 18: Build Pipelines with Pandas Using pdpipe; AI, Analytics, ML, DS, Technology Main Developments, Key Trends; Poll on AutoML - Dec 18, 2019.
Build Pipelines with Pandas Using pdpipe; AI, Analytics, ML, DS, Technology Main Developments, Key Trends; New Poll: Does AutoML work? Ultralearn Data Science; Python Dictionary How-To; Top stories of 2019 and more.
2020 Predictions, Data Science Education, Pandas, Python
- Pedestrian Detection Using Non Maximum Suppression Algorithm - Dec 17, 2019.
Read this overview of a complete pipeline for detecting pedestrians on the road.
Computer Vision, Image Recognition, Object Detection, Python
- Let’s Build an Intelligent Chatbot - Dec 17, 2019.
Check out this step by step approach to building an intelligent chatbot in Python.
Chatbot, NLP, NLTK, Python
Build Pipelines with Pandas Using pdpipe - Dec 13, 2019.
We show how to build intuitive and useful pipelines with Pandas DataFrame using a wonderful little library called pdpipe.
Data Preparation, Data Preprocessing, Pandas, Pipeline, Python
Plotnine: Python Alternative to ggplot2 - Dec 12, 2019.
Python's plotting libraries such as matplotlib and seaborn do allow the user to create elegant graphics as well, but the lack of a standardized syntax for implementing the grammar of graphics, compared to the simple, readable, layered approach of ggplot2 in R, makes it more difficult to implement in Python.
Data Science, Data Visualization, Python, R
- Python Dictionary Guide: 10 Python Dictionary Methods & Examples - Dec 12, 2019.
Master Python Dictionaries and their essential functions in 15 minutes with this introductory guide.
Programming, Python
- Top KDnuggets tweets, Dec 04-10: AI, Analytics, Machine Learning, Data Science, Deep Learning Research Main Developments in 2019 and Key Trends for 2020 - Dec 11, 2019.
AI, Analytics, Machine Learning, Data Science, Deep Learning Research Main Developments and Key Trends; Down with technical debt! Clean #Python for #DataScientists; Calculate Similarity - the most relevant Metrics in a Nutshell.
2020 Predictions, Metrics, Python, Similarity, Top tweets
- Interpretability: Cracking open the black box, Part 2 - Dec 11, 2019.
The second part in a series on leveraging techniques to take a look inside the black box of AI, this guide considers post-hoc interpretation that is useful when the model is not transparent.
Explainability, Explainable AI, Feature Selection, Interpretability, Python
- 5 Great New Features in Latest Scikit-learn Release - Dec 10, 2019.
From not sweating missing values, to determining feature importance for any estimator, to support for stacking, and a new plotting API, here are 5 new features of the latest release of Scikit-learn which deserve your attention.
Data Preparation, Data Preprocessing, Ensemble Methods, Feature Selection, Gradient Boosting, K-nearest neighbors, Machine Learning, Missing Values, Python, scikit-learn, Visualization
10 Free Top Notch Machine Learning Courses - Dec 6, 2019.
Are you interested in studying machine learning over the holidays? This collection of 10 free top notch courses will allow you to do just that, with something for every approach to improving your machine learning skills.
Books, Computer Vision, Courses, Deep Learning, Explainability, Graph Analytics, Interpretability, Machine Learning, NLP, Python
- Lit BERT: NLP Transfer Learning In 3 Steps - Nov 29, 2019.
PyTorch Lightning is a lightweight framework which allows anyone using PyTorch to scale deep learning code easily while making it reproducible. In this tutorial we’ll use Hugging Face's implementation of BERT to do a fine-tuning task in Lightning.
BERT, NLP, Python, PyTorch Lightning, Transfer Learning
- Open Source Projects by Google, Uber and Facebook for Data Science and AI - Nov 28, 2019.
Open source is becoming the standard for sharing and improving technology. Some of the largest organizations in the world, namely Google, Facebook and Uber, are open sourcing the technologies they use in their own workflows.
Advice, AI, Data Science, Data Scientist, Data Visualization, Deep Learning, Facebook, Google, Open Source, Python, Uber
- Getting Started with Automated Text Summarization - Nov 28, 2019.
This article will walk through an extractive text summarization process, using a simple word frequency approach, implemented in Python.
NLP, Python, Text Analytics, Text Summarization
- KDnuggets™ News 19:n45, Nov 27: Interpretable vs black box models; Advice for New and Junior Data Scientists - Nov 27, 2019.
This week: Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead; Advice for New and Junior Data Scientists; Python Tuples and Tuple Methods; Can Neural Networks Develop Attention? Google Thinks they Can; Three Methods of Data Pre-Processing for Text Classification
Advice, Attention, Data Scientist, Machine Learning, Modeling, Neural Networks, NLP, Programming, Python, Text Classification
- Content-based Recommender Using Natural Language Processing (NLP) - Nov 26, 2019.
A guide to build a content-based movie recommender model based on NLP.
Movies, Netflix, NLP, Python, Recommender Systems
- Automated Machine Learning Project Implementation Complexities - Nov 22, 2019.
To demonstrate the implementation complexity differences along the AutoML highway, let's have a look at how 3 specific software projects approach the implementation of just such an AutoML "solution," namely Keras Tuner, AutoKeras, and automl-gs.
Automated Machine Learning, Keras, Pipeline, Python
- Python, Selenium & Google for Geocoding Automation: Free and Paid - Nov 21, 2019.
This tutorial will take you through two options that automate the geocoding process using Python, Selenium and the Google Geocoding API.
Automation, Geocode, Geoscience, Geospatial, Google, Python, Selenium, Web Scraping
- The Notebook Anti-Pattern - Nov 21, 2019.
This article aims to explain why the drive towards the use of notebooks in production is an anti-pattern, giving some suggestions along the way.
Jupyter, Python
- Python Tuples and Tuple Methods - Nov 21, 2019.
Brush up on your Python basics with this post on creating, using, and manipulating tuples.
Programming, Python
- Data Science for Managers: Programming Languages - Nov 19, 2019.
In this article, we are going to talk about popular languages for Data Science and briefly describe each of them.
Data Science, Manager, MATLAB, Octave, Programming Languages, Python, R, Scala
- GitHub Repo Raider and the Automation of Machine Learning - Nov 18, 2019.
Since X never, ever marks the spot, this article raids the GitHub repos in search of quality automated machine learning resources. Read on for projects and papers to help understand and implement AutoML.
Automated Machine Learning, GitHub, Machine Learning, Movies, Python
- Python Lists and List Manipulation - Nov 15, 2019.
In Python, lists store an ordered collection of items which can be of different types. This post is an overview of lists and their manipulation.
Programming, Python
- How to Visualize Data in Python (and R) - Nov 14, 2019.
Producing accessible data visualizations is a key data science skill. The following guidelines will help you create the best representations of your data using R and Python's Pandas library.
Data Visualization, Matplotlib, Python, R, SuperDataScience
- Testing Your Machine Learning Pipelines - Nov 14, 2019.
Let’s take a look at traditional testing methodologies and how we can apply these to our data/ML pipelines.
Machine Learning, Pipeline, Python
- Python Workout / Practices of a Python Pro / Classic Computer Science Problems in Python - Nov 13, 2019.
Whether you’re a beginner or an expert, there are always new ways you can improve your Python coding. Save 40% on this trio of Manning Python books today! Just enter the code nlpropython40 at checkout when you buy from manning.com.
Book, Manning, Python
- Beginners Guide to the Three Types of Machine Learning - Nov 13, 2019.
The following article is an introduction to classification and regression (known as supervised learning) and to unsupervised learning (which in the context of machine learning applications often refers to clustering), and includes a walkthrough in the popular Python library scikit-learn.
Beginners, Classification, Machine Learning, Python, Regression, scikit-learn, Supervised Learning, Unsupervised Learning
- KDnuggets™ News 19:n43, Nov 13: Dynamic Reports in Python and R; Creating NLP Vocabularies; What is Data Science? - Nov 13, 2019.
On KDnuggets this week: Orchestrating Dynamic Reports in Python and R with Rmd Files; How to Create a Vocabulary for NLP Tasks in Python; What is Data Science?; The Complete Data Science LinkedIn Profile Guide; Set Operations Applied to Pandas DataFrames; and much, much more.
Data Science, Deep Learning, Facebook, LinkedIn, NLP, Pandas, Python, R, Reinforcement Learning, Report

- How to Speed up Pandas by 4x with one line of code - Nov 12, 2019.
While Pandas is the library for data processing in Python, it isn't really built for speed. Learn more about the new library, Modin, developed to distribute Pandas' computation to speed up your data prep.
Data Preparation, Data Preprocessing, Modin, Pandas, Python
- Understanding Boxplots - Nov 8, 2019.
A boxplot can tell you about your outliers and what their values are. It can also tell you if your data is symmetrical, how tightly your data is grouped, and if and how your data is skewed.
Data Visualization, Matplotlib, Pandas, Python, Seaborn
- Orchestrating Dynamic Reports in Python and R with Rmd Files - Nov 8, 2019.
Do you want to extract CSV files with Python and visualize them in R? How does preparing everything in R and making conclusions with Python sound? Both are possible if you know the right libraries and techniques. Here, we’ll walk through a use-case using both languages in one analysis.
Python, R, Report
- Data Cleaning and Preprocessing for Beginners - Nov 7, 2019.
Careful preprocessing of data for your machine learning project is crucial. This overview describes the process of data cleaning and dealing with noise and missing data.
Beginners, Data Cleaning, Data Preprocessing, Pandas, Python, Sciforce
- Set Operations Applied to Pandas DataFrames - Nov 7, 2019.
In this tutorial, we show how to apply mathematical set operations (union, intersection, and difference) to Pandas DataFrames with the goal of easing the task of comparing the rows of two datasets.
Data Preparation, Data Science, Pandas, Python
- How to Create a Vocabulary for NLP Tasks in Python - Nov 7, 2019.
This post will walk through a Python implementation of a vocabulary class for storing processed text data and related metadata in a manner useful for subsequently performing NLP tasks.
Data Preparation, Data Preprocessing, NLP, Python
- Customer Segmentation Using K Means Clustering - Nov 4, 2019.
Customer Segmentation can be a powerful means to identify unsatisfied customer needs. This technique can be used by companies to outperform the competition by developing uniquely appealing products and services.
Clustering, Customer Analytics, K-means, Python, Segmentation
- Build an Artificial Neural Network From Scratch: Part 1 - Nov 1, 2019.
This article focuses on building an Artificial Neural Network using the NumPy Python library.
Neural Networks, numpy, Python
- How to Build Your Own Logistic Regression Model in Python - Oct 31, 2019.
A hands-on guide to Logistic Regression for aspiring data scientists and machine learning engineers.
Logistic Regression, Machine Learning, Python
- KDnuggets™ News 19:n41, Oct 30: Feature Selection: Beyond feature importance?; Time Series Analysis Using KNIME and Spark - Oct 30, 2019.
This week in KDnuggets: Feature Selection: Beyond feature importance?; Time Series Analysis: A Simple Example with KNIME and Spark; 5 Advanced Features of Pandas and How to Use Them; How to Measure Foot Traffic Using Data Analytics; Introduction to Natural Language Processing (NLP); and much, much more!
Apache Spark, Data Analytics, Feature Selection, Knime, NLP, Pandas, Python, scikit-learn, Time Series