Activation maps for deep learning models in a few lines of code - Oct 10, 2019.
We illustrate how to show the activation maps of various layers in a deep CNN model with just a couple of lines of code.
Architecture, Deep Learning, Neural Networks, Python
The 4 Quadrants of Data Science Skills and 7 Principles for Creating a Viral Data Visualization - Oct 7, 2019.
As a data scientist, your most important skill is creating meaningful visualizations to disseminate knowledge and impact your organization or client. These seven principles will guide you toward developing charts with clarity, as exemplified with data from a recent KDnuggets poll.
Data Science, Data Science Skills, Data Visualization, Excel, Java, Python, Skills, TensorFlow
- What is Hierarchical Clustering? - Sep 27, 2019.
The article contains a brief introduction to various concepts related to the hierarchical clustering algorithm.
Clustering, Machine Learning, Python
- Natural Language in Python using spaCy: An Introduction - Sep 26, 2019.
This article provides a brief introduction to working with natural language (sometimes called “text analytics”) in Python using spaCy and related libraries.
NLP, Paco Nathan, Python, spaCy
- A Single Function to Streamline Image Classification with Keras - Sep 23, 2019.
We show, step-by-step, how to construct a single, generalized, utility function to pull images automatically from a directory and train a convolutional neural net model.
Image Classification, Image Recognition, Keras, Python
- A Gentle Introduction to PyTorch 1.2 - Sep 20, 2019.
This comprehensive tutorial aims to introduce the fundamentals of PyTorch building blocks for training neural networks.
Neural Networks, Python, PyTorch
- Applying Data Science to Cybersecurity Network Attacks & Events - Sep 19, 2019.
Check out this detailed tutorial on applying data science to the cybersecurity domain, written by an individual with backgrounds in both fields.
Cybersecurity, Data Science, Machine Learning, Python, Security
- 5 Beginner Friendly Steps to Learn Machine Learning and Data Science with Python - Sep 19, 2019.
“I want to learn machine learning and artificial intelligence, where do I start?” Here.
Beginners, Data Science, Machine Learning, Python
Which Data Science Skills are core and which are hot/emerging ones? - Sep 17, 2019.
We identify two main groups of Data Science skills: A: 13 core, stable skills that most respondents have and B: a group of hot, emerging skills that most do not have (yet) but want to add. See our detailed analysis.
Career, Data Science Skills, Data Visualization, Deep Learning, Excel, Machine Learning, Poll, Python, PyTorch, Scala, Skills, Statistics, TensorFlow
Explore the world of Bioinformatics with Machine Learning - Sep 17, 2019.
The article contains a brief introduction to bioinformatics and shows how a machine learning classification algorithm can be used to classify the type of cancer in each patient by their gene expressions.
Bioinformatics, Machine Learning, Python
- 5 Step Guide to Scalable Deep Learning Pipelines with d6tflow - Sep 16, 2019.
How to turn a typical PyTorch script into a scalable d6tflow DAG for faster research & development.
Deep Learning, Pipeline, Python, PyTorch, Workflow
- Ensemble Methods for Machine Learning: AdaBoost - Sep 12, 2019.
It turns out that if we ask a weak algorithm to create a whole bunch of classifiers (all weak by definition) and then combine them all, what may emerge is a stronger classifier.
Adaboost, Ensemble Methods, Machine Learning, Python
Train sklearn 100x Faster - Sep 11, 2019.
As compute gets cheaper and time to market for machine learning solutions becomes more critical, we’ve explored options for speeding up model training. One of those solutions is to combine elements from Spark and scikit-learn into our own hybrid solution.
Distributed Systems, Machine Learning, Python, scikit-learn, Training
The 5 Graph Algorithms That Data Scientists Should Know - Sep 10, 2019.
In this post, I am going to be talking about some of the most important graph algorithms you should know and how to implement them using Python.
Algorithms, Data Science, Data Scientist, Graph, Python
- OpenStreetMap Data to ML Training Labels for Object Detection - Sep 9, 2019.
I am really interested in creating a tight, clean pipeline for disaster relief applications, where we can use something like crowd sourced building polygons from OSM to train a supervised object detector to discover buildings in an unmapped location.
Geospatial, Machine Learning, Object Detection, Python

10 Great Python Resources for Aspiring Data Scientists - Sep 9, 2019.
This is a collection of 10 interesting resources in the form of articles and tutorials for the aspiring data scientist new to Python, meant to provide both insight and practical instruction when starting on your journey.
Data Science, Data Scientist, Programming, Python
- Build Your First Voice Assistant - Sep 6, 2019.
Hone your practical speech recognition application skills with this overview of building a voice assistant using Python.
Machine Learning, NLP, Python, Speech Recognition
- An Easy Introduction to Machine Learning Recommender Systems - Sep 4, 2019.
Recommender systems are an important class of machine learning algorithms that offer "relevant" suggestions to users. They are categorized as either collaborative filtering or content-based systems; check out how these approaches work, along with implementations to follow from example code.
Beginners, Machine Learning, Python, Recommendation Engine, Recommender Systems
Python Libraries for Interpretable Machine Learning - Sep 4, 2019.
In the following post, I am going to give a brief guide to four of the most established packages for interpreting and explaining machine learning models.
Bias, Interpretability, LIME, Machine Learning, Python, SHAP
- An Overview of Topics Extraction in Python with Latent Dirichlet Allocation - Sep 4, 2019.
A recurring subject in NLP is to understand a large corpus of texts through topic extraction. Whether you analyze users’ online reviews, products’ descriptions, or text entered in search bars, understanding key topics will always come in handy.
LDA, NLP, Python, Text Analytics, Topic Modeling
- Automate your Python Scripts with Task Scheduler: Windows Task Scheduler to Scrape Alternative Data - Sep 3, 2019.
In this tutorial, you will learn how to run Task Scheduler to scrape data from the Lazada (e-commerce) website and dump it into an SQLite database.
Data Science, Python, Web Scraping
Object-oriented programming for data scientists: Build your ML estimator - Aug 30, 2019.
Implement some of the core OOP principles in a machine learning context by building your own Scikit-learn-like estimator, and making it better.
Data Scientist, Machine Learning, Programming, Python
- 4 Tips for Advanced Feature Engineering and Preprocessing - Aug 29, 2019.
Techniques for creating new features, detecting outliers, handling imbalanced data, and imputing missing values.
Data Preprocessing, Feature Engineering, Python, Tips
Nothing but NumPy: Understanding & Creating Neural Networks with Computational Graphs from Scratch - Aug 23, 2019.
Entirely implemented with NumPy, this extensive tutorial provides a detailed review of neural networks followed by guided code for creating one from scratch with computational graphs.
Backpropagation, Neural Networks, numpy, Python
- Comparing Decision Tree Algorithms: Random Forest® vs. XGBoost - Aug 21, 2019.
Check out this tutorial walking you through a comparison of XGBoost and Random Forest. You'll learn how to create a decision tree, how to do tree bagging, and how to do tree boosting.
ActiveState, Decision Trees, Python, random forests algorithm, XGBoost
- Understanding Decision Trees for Classification in Python - Aug 21, 2019.
This tutorial covers decision trees for classification, also known as classification trees, including the anatomy of classification trees, how classification trees make predictions, using scikit-learn to make classification trees, and hyperparameter tuning.
Classification, Decision Trees, Python, scikit-learn
- Automate Stacking In Python: How to Boost Your Performance While Saving Time - Aug 21, 2019.
Utilizing stacking (stacked generalizations) is a very hot topic when it comes to pushing your machine learning algorithm to new heights. For instance, most if not all winning Kaggle submissions nowadays make use of some form of stacking or a variation of it.
Algorithms, Big Data, Data Science, Python
- An Overview of Python’s Datatable package - Aug 20, 2019.
Modern machine learning applications need to process a humongous amount of data and generate multiple features. Python’s datatable module was created to address this issue. It is a toolkit for performing big data (up to 100GB) operations on a single-node machine, at the maximum possible speed.
Big Data, Data Science, Python
Deep Learning for NLP: Creating a Chatbot with Keras! - Aug 19, 2019.
Learn how to use Keras to build a Recurrent Neural Network and create a Chatbot! Who doesn’t like a friendly-robotic personal assistant?
Chatbot, Deep Learning, Keras, NLP, Python
- Pytorch Lightning vs PyTorch Ignite vs Fast.ai - Aug 16, 2019.
Here, I will attempt an objective comparison between all three frameworks. This comparison comes from laying out similarities and differences objectively found in tutorials and documentation of all three frameworks.
fast.ai, Neural Networks, Python, PyTorch, PyTorch Lightning
- Learn how to use PySpark in under 5 minutes (Installation + Tutorial) - Aug 13, 2019.
Apache Spark is one of the hottest and largest open source projects in data processing, a framework with rich high-level APIs for programming languages like Scala, Python, Java and R. It realizes the potential of bringing together both Big Data and machine learning.
Apache Spark, Big Data, Data Science, Python
- A 2019 Guide to Semantic Segmentation - Aug 12, 2019.
Semantic segmentation refers to the process of linking each pixel in an image to a class label. These labels could include a person, car, flower, piece of furniture, etc., just to mention a few. We’ll now look at a number of research papers covering state-of-the-art approaches to building semantic segmentation models.
Image Classification, Image Recognition, Python, Segmentation
- Keras Callbacks Explained In Three Minutes - Aug 9, 2019.
A gentle introduction to callbacks in Keras. Learn about EarlyStopping, ModelCheckpoint, and other callback functions with code examples.
Explained, Keras, Neural Networks, Python
- Introduction to Image Segmentation with K-Means clustering - Aug 9, 2019.
Image segmentation is the classification of an image into different groups. Many kinds of research have been done in the area of image segmentation using clustering. In this article, we will explore using the K-Means clustering algorithm to read an image and cluster different regions of the image.
Clustering, Computer Vision, Image Recognition, K-means, Python, Segmentation
- Exploratory Data Analysis Using Python - Aug 7, 2019.
In this tutorial, you’ll use Python and Pandas to explore a dataset and create visual distributions, identify and eliminate outliers, and uncover correlations between two datasets.
ActiveState, Data Analysis, Data Exploration, Pandas, Python
- Feature selection by random search in Python - Aug 6, 2019.
Feature selection is one of the most important tasks in machine learning. Learn how to use a simple random search in Python to get good results in less time.
Collinearity, Cross-validation, Feature Selection, Python, Random
- 25 Tricks for Pandas - Aug 6, 2019.
Check out this video (and Jupyter notebook) which outlines a number of Pandas tricks for working with and manipulating data, covering topics such as string manipulations, splitting and filtering DataFrames, combining and aggregating data, and more.
Pandas, Python, Tips
- Lagrange multipliers with visualizations and code - Aug 6, 2019.
In this story, we’re going to take an aerial tour of optimization with Lagrange multipliers. When do we need them? Whenever we have an optimization problem with constraints.
Analytics, Mathematics, Optimization, Python
- Pytorch Cheat Sheet for Beginners and Udacity Deep Learning Nanodegree - Aug 2, 2019.
This cheatsheet should be easier to digest than the official documentation and should serve as a transitional tool to help students and beginners start reading the documentation sooner.
Beginners, Cheat Sheet, Deep Learning, Google Colab, Python, PyTorch, Udacity
- How a simple mix of object-oriented programming can sharpen your deep learning prototype - Aug 1, 2019.
By mixing simple concepts of object-oriented programming, like functionalization and class inheritance, you can add immense value to deep learning prototyping code.
Deep Learning, Keras, Programming, Python
- Here’s how you can accelerate your Data Science on GPU - Jul 30, 2019.
Data Scientists need computing power. Whether you’re processing a big dataset with Pandas or running some computation on a massive matrix with Numpy, you’ll need a powerful machine to get the job done in a reasonable amount of time.
Big Data, Data Science, DBSCAN, Deep Learning, GPU, NVIDIA, Python
Convolutional Neural Networks: A Python Tutorial Using TensorFlow and Keras - Jul 26, 2019.
Different neural network architectures excel in different tasks. This particular article focuses on crafting convolutional neural networks in Python using TensorFlow and Keras.
Convolutional Neural Networks, Keras, Neural Networks, Python, TensorFlow
- Easy, One-Click Jupyter Notebooks - Jul 24, 2019.
All of the setup for software, networking, security, and libraries is automatically taken care of by the Saturn Cloud system. Data Scientists can then focus on the actual Data Science and not the tedious infrastructure work that falls around it.
Big Data, Cloud, Data Science, Data Scientist, DevOps, Jupyter, Python, Saturn Cloud
- Kaggle Kernels Guide for Beginners: A Step by Step Tutorial - Jul 23, 2019.
This is an attempt to hold the hands of a complete beginner and walk them through the world of Kaggle Kernels — for them to get started.
Kaggle, Python, R
- Computer Vision for Beginners: Part 1 - Jul 17, 2019.
Image processing is performing some operations on images to get an intended manipulation. Think about what we do when we start a new data analysis. We do some data preprocessing and feature engineering. It’s the same with image processing.
Computer Vision, Deep Learning, Image Processing, Python
Dealing with categorical features in machine learning - Jul 16, 2019.
Many machine learning algorithms require that their input is numerical and therefore categorical features must be transformed into numerical features before we can use any of these algorithms.
Data Cleaning, Data Preprocessing, Feature Engineering, Machine Learning, Python
Training a Neural Network to Write Like Lovecraft - Jul 11, 2019.
In this post, the author attempts to train a neural network to generate Lovecraft-esque prose, known to be awkward and irregular at best. Did it end in success? If not, any suggestions on how it might have? Read on to find out.
Keras, LSTM, Natural Language Generation, Neural Networks, Python, TensorFlow
- 10 Simple Hacks to Speed up Your Data Analysis in Python - Jul 11, 2019.
This article lists some curated tips for working with Python and Jupyter Notebooks, covering topics such as easily profiling data, formatting code and output, debugging, and more. Hopefully you can find something useful within.
Data Analysis, Jupyter, Pandas, Python, Tips
- A Gentle Guide to Starting Your NLP Project with AllenNLP - Jul 10, 2019.
For those who aren’t familiar with AllenNLP, I will give a brief overview of the library and let you know the advantages of integrating it into your project.
Allen Institute, NLP, Python, Sentiment Analysis
- Practical Speech Recognition with Python: The Basics - Jul 9, 2019.
Do you fear implementing speech recognition in your Python apps? Read this tutorial for a simple approach to getting practical with speech recognition using open source Python libraries.
Google, NLP, Python, Speech Recognition
- Annotated Heatmaps of a Correlation Matrix in 5 Simple Steps - Jul 9, 2019.
A heatmap is a graphical representation of data in which data values are represented as colors. That is, it uses color in order to communicate a value to the reader. This is a great tool to guide the audience towards the areas that matter the most when you have a large volume of data.
Data Visualization, Python, Statistics
- XGBoost and Random Forest® with Bayesian Optimisation - Jul 8, 2019.
This article will explain how to use XGBoost and Random Forest with Bayesian Optimisation, and will discuss the main pros and cons of these methods.
Bayesian, Optimization, Python, random forests algorithm, XGBoost
- Classifying Heart Disease Using K-Nearest Neighbors - Jul 8, 2019.
I have written this post for developers, and it assumes no background in statistics or mathematics. The focus is mainly on how the k-NN algorithm works and how to use it for predictive modeling problems.
Healthcare, K-nearest neighbors, Machine Learning, Medical, Python
- Building a Recommender System, Part 2 - Jul 3, 2019.
This post explores a technique for collaborative filtering that uses latent factor models and naturally generalizes to deep learning approaches. Our approach will be implemented using TensorFlow and Keras.
Movies, Python, Recommendation Engine, Recommender Systems
- How do you check the quality of your regression model in Python? - Jul 2, 2019.
Linear regression is rooted strongly in the field of statistical learning and therefore the model must be checked for the ‘goodness of fit’. This article shows you the essential steps of this task in a Python ecosystem.
Data Science, Multicollinearity, Python, Regression, Statistics
- PySyft and the Emergence of Private Deep Learning - Jun 27, 2019.
PySyft is an open-source framework that enables secured, private computations in deep learning, by combining federated learning and differential privacy in a single programming model integrated into different deep learning frameworks such as PyTorch, Keras or TensorFlow.
Deep Learning, Differential Privacy, Privacy, Python, Security
- An Overview of Outlier Detection Methods from PyOD – Part 1 - Jun 27, 2019.
PyOD is an outlier detection package developed with a comprehensive API to support multiple techniques. This post will showcase Part 1 of an overview of techniques that can be used to analyze anomalies in data.
Algorithms, Big Data, Outliers, Python
- Optimization with Python: How to make the most amount of money with the least amount of risk? - Jun 26, 2019.
Learn how to apply Python data science libraries to develop a simple optimization problem based on a Nobel-prize winning economic theory for maximizing investment profits while minimizing risk.
Finance, Investment, Optimization, Python, Risk Modeling, Stocks
7 Steps to Mastering Data Preparation for Machine Learning with Python — 2019 Edition - Jun 24, 2019.
Interested in mastering data preparation with Python? Follow these 7 steps which cover the concepts, the individual tasks, as well as different approaches to tackling the entire process from within the Python ecosystem.
7 Steps, Data Preparation, Data Preprocessing, Data Science, Data Wrangling, Machine Learning, Pandas, Python
- One Simple Trick for Speeding up your Python Code with Numpy - Jun 19, 2019.
Looping over Python arrays, lists, or dictionaries can be slow. Vectorized operations in NumPy, by contrast, are mapped to highly optimized C code, making them much faster than their standard Python counterparts.
Big Data, numpy, Python
- K-means Clustering with Dask: Image Filters for Cat Pictures - Jun 18, 2019.
How to recreate an original cat image with the fewest possible colors. An interesting use case of unsupervised machine learning with K-Means clustering in Python.
Clustering, Dask, Image Classification, Image Recognition, K-means, Python, Unsupervised Learning

Data Science Jobs Report 2019: Python Way Up, TensorFlow Growing Rapidly, R Use Double SAS - Jun 17, 2019.
Data science jobs continue to grow in 2019, and this report shares the change and spread of jobs by software over recent years.
Data Science, indeed, Jobs, Python, R, SAS, TensorFlow
- How to Use Python’s datetime - Jun 17, 2019.
Python's datetime package is a convenient set of tools for working with dates and times. With just the five tricks that I’m about to show you, you can handle most of your datetime processing needs.
Programming, Python, Time Series
How to Learn Python for Data Science the Right Way - Jun 14, 2019.
The biggest mistake you can make while learning Python for data science is to learn Python programming from courses meant for programmers. Avoid this mistake, and learn Python the right way by following this approach.
Advice, Data Science, Jupyter, Matplotlib, Pandas, Python, scikit-learn, StatsModels
- Become a Pro at Pandas, Python’s Data Manipulation Library - Jun 13, 2019.
Pandas is one of the most popular Python libraries for cleaning, transforming, manipulating and analyzing data. Learn how to efficiently handle large amounts of data using Pandas.
Matplotlib, numpy, Pandas, Python, SQL
- Scalable Python Code with Pandas UDFs: A Data Science Application - Jun 13, 2019.
There is still a gap between the corpus of libraries that developers want to apply in a scalable runtime and the set of libraries that support distributed execution. This post discusses how to bridge this gap using the functionality provided by Pandas UDFs in Spark 2.3+.
Apache Spark, Big Data, Pandas, Python
- How to Automate Hyperparameter Optimization - Jun 12, 2019.
A step-by-step guide to performing a hyperparameter optimization task on a deep learning model by employing Bayesian Optimization that uses the Gaussian Process. We used the gp_minimize function provided by the Scikit-Optimize (skopt) library to perform this task.
Bayesian, Deep Learning, Hyperparameter, Machine Learning, Neural Networks, Optimization, Python, TensorFlow
What you need to know: The Modern Open-Source Data Science/Machine Learning Ecosystem - Jun 10, 2019.
We identify the 6 tools in the modern open-source Data Science ecosystem, examine the Python vs R question, and determine which tools are used the most with Deep Learning and Big Data.
Anaconda, Apache Spark, Big Data Software, Deep Learning, Excel, Keras, Poll, Python, R, RapidMiner, scikit-learn, Software, SQL, Tableau, TensorFlow
PyViz: Simplifying the Data Visualisation Process in Python - Jun 6, 2019.
There are Python libraries suitable for basic data visualizations but not for complicated ones, and there are libraries suitable only for complex visualizations. Is there a single library that handles both these tasks efficiently? The answer is yes. It's PyViz.
Data Visualization, GitHub, Matplotlib, Python
- The Whole Data Science World in Your Hands - Jun 5, 2019.
Testing MatrixDS capabilities on different languages and tools: Python, R and Julia. If you work with data you have to check this out.
Data Science, Data Scientist, Julia, Jupyter, MatrixDS, Python, R
- KDnuggets™ News 19:n21, Jun 5: Transitioning your Career to Data Science; 11 top Data Science, Machine Learning platforms; 7 Steps to Mastering Intermediate ML w. Python - Jun 5, 2019.
The results of KDnuggets 20th Annual Software Poll; How to transition to a Data Science career; Mastering Intermediate Machine Learning with Python; Understanding Natural Language Processing (NLP); Backprop as applied to LSTM, and much more.
Backpropagation, Data Science Platform, LSTM, Machine Learning, NLP, Python
- The Hitchhiker’s Guide to Feature Extraction - Jun 3, 2019.
Check out this collection of tricks and code for Kaggle and everyday work.
Feature Engineering, Feature Extraction, Feature Selection, Kaggle, Python
7 Steps to Mastering Intermediate Machine Learning with Python — 2019 Edition - Jun 3, 2019.
This is the second part of this new learning path series for mastering machine learning with Python. Check out these 7 steps to help master intermediate machine learning with Python!
7 Steps, Classification, Cross-validation, Dimensionality Reduction, Feature Engineering, Feature Selection, Image Classification, K-nearest neighbors, Machine Learning, Modeling, Naive Bayes, numpy, Pandas, PCA, Python, scikit-learn, Transfer Learning

Python leads the 11 top Data Science, Machine Learning platforms: Trends and Analysis - May 30, 2019.
Python continues to lead the top Data Science platforms, but R and RapidMiner hold their share; Almost 50% have used Deep Learning tools; SQL is steady; Consolidation continues.
Anaconda, Apache Spark, Deep Learning, Excel, Keras, Poll, Python, R, RapidMiner, scikit-learn, Software, SQL, TensorFlow
- Who is your Golden Goose?: Cohort Analysis - May 30, 2019.
Step-by-step tutorial on how to perform customer segmentation using RFM analysis and K-Means clustering in Python.
Clustering, Data Analysis, K-means, Python, Retail
- Animations with Matplotlib - May 30, 2019.
Animations make even more sense when depicting time series data like stock prices over the years, climate change over the past decade, or seasonalities and trends, since we can then see how a particular parameter behaves with time.
Data Science, Data Visualization, Matplotlib, Python
- Boost Your Image Classification Model - May 27, 2019.
Check out this collection of tricks to improve the accuracy of your classifier.
fast.ai, Generative Adversarial Network, Image Classification, Image Recognition, Python
- Analyzing Tweets with NLP in Minutes with Spark, Optimus and Twint - May 24, 2019.
Social media has been gold for studying the way people communicate and behave. In this article, I’ll show you the easiest way of analyzing tweets without the Twitter API, in a way that scales to Big Data.
Apache Spark, Big Data, Deep Learning, Machine Learning, NLP, Optimus, Python, Twint
- PyCharm for Data Scientists - May 17, 2019.
This article is a discussion of some of PyCharm's features, and a comparison with Spyder, another popular IDE for Python. Read on to find the benefits and drawbacks of PyCharm, and an outline of when to prefer it to Spyder and vice versa.
Data Science, Data Scientist, Programming, PyCharm, Python
- A Complete Exploratory Data Analysis and Visualization for Text Data: Combine Visualization and NLP to Generate Insights - May 9, 2019.
Visually representing the content of a text document is one of the most important tasks in the field of text mining as a Data Scientist or NLP specialist. However, there are some gaps between visualizing unstructured (text) data and structured data.
Data Visualization, NLP, Plotly, Python, Text Analytics
- [White Paper] Unlocking the Power of Data Science & Machine Learning with Python - May 8, 2019.
This guide from ActiveState provides an executive overview of how you can implement Python for your team’s data science and machine learning initiatives.
ActiveState, Data Science, Machine Learning, Python, White Paper
- Linear Programming and Discrete Optimization with Python using PuLP - May 8, 2019.
Knowledge of such optimization techniques is extremely useful for data scientists and machine learning (ML) practitioners as discrete and continuous optimization lie at the heart of modern ML and AI systems as well as data-driven business analytics processes.
Linear Programming, Optimization, Python
- Naive Bayes: A Baseline Model for Machine Learning Classification Performance - May 7, 2019.
We can use Pandas to apply Bayes' theorem and scikit-learn to implement the Naive Bayes algorithm. We take a step-by-step approach to understanding Bayes and implementing the different options in scikit-learn.
Algorithms, Data Science, Machine Learning, Naive Bayes, Python, scikit-learn, Statistics
- How to Automate Tasks on GitHub With Machine Learning for Fun and Profit - May 3, 2019.
Check out this tutorial on how to build a GitHub App that predicts and applies issue labels using TensorFlow and public datasets.
Datasets, GitHub, Python, TensorFlow
- Modeling Price with Regularized Linear Model & XGBoost - May 2, 2019.
We are going to implement regularization techniques for linear regression of house pricing data. Our goal in price modeling is to model the pattern and ignore the noise.
Modeling, Python, Regularization, XGBoost
- Which Deep Learning Framework is Growing Fastest? - May 1, 2019.
In September 2018, I compared all the major deep learning frameworks in terms of demand, usage, and popularity. TensorFlow was the champion of deep learning frameworks and PyTorch was the youngest framework. How has the landscape changed?
Data Science, Data Scientist, Deep Learning, fast.ai, Keras, Python, PyTorch, TensorFlow
- Build Your First Chatbot Using Python & NLTK - May 1, 2019.
Today we will learn to create a simple chat assistant or chatbot using Python’s NLTK library.
Chatbot, NLP, NLTK, Python
Pandas DataFrame Indexing - Apr 29, 2019.
The goal of this post is to identify a single strategy for pulling data from a DataFrame using the Pandas Python library that is straightforward to interpret and produces reliable results.
Data Science, Pandas, Python
- Graduating in GANs: Going From Understanding Generative Adversarial Networks to Running Your Own - Apr 25, 2019.
Read how generative adversarial network (GAN) research and evaluation have developed, then implement your own GAN to generate handwritten digits.
Deep Learning, GANs, Generative Adversarial Network, Generative Models, MNIST, Neural Networks, Python

Data Visualization in Python: Matplotlib vs Seaborn - Apr 19, 2019.
Seaborn and Matplotlib are two of Python's most powerful visualization libraries. Seaborn requires less syntax and has stunning default themes, while Matplotlib is more easily customizable through direct access to its classes.
Advice, Data Visualization, Matplotlib, Python, Seaborn
- Building a Flask API to Automatically Extract Named Entities Using SpaCy - Apr 17, 2019.
This article discusses how to use the Named Entity Recognition module in spaCy to identify people, organizations, or locations in text, then deploy a Python API with Flask.
API, Flask, NLP, Python
Data Science with Optimus Part 2: Setting your DataOps Environment - Apr 16, 2019.
Breaking down data science with Python, Spark and Optimus. Today: Data Operations for Data Science. Here we’ll learn to set up Git, Travis CI and DVC for our project.
Apache Spark, Data Operations, Data Science, Python, Workflow
- Data Science with Optimus Part 1: Intro - Apr 15, 2019.
With Optimus you can clean your data, prepare it, analyze it, create profilers and plots, and perform machine learning and deep learning, all in a distributed fashion, because on the back-end we have Spark, TensorFlow, Sparkling Water and Keras. It’s super easy to use.
Apache Spark, Data Science, Python, Workflow
- All you need to know about text preprocessing for NLP and Machine Learning - Apr 9, 2019.
We present a comprehensive introduction to text preprocessing, covering the different techniques including stemming, lemmatization, noise removal, and normalization, with examples and explanations of when you should use each of them.
Data Preprocessing, Machine Learning, NLP, Python, Text Analysis, Text Mining
- Building a Recommender System - Apr 4, 2019.
A beginner's guide to building a recommendation system, with a step-by-step guide on how to create a content-based filtering system to recommend movies for a user to watch.
Movies, Python, Recommendation Engine, Recommender Systems
Predict Age and Gender Using Convolutional Neural Network and OpenCV - Apr 4, 2019.
Age and gender estimation from a single face image are important tasks in intelligent applications. As such, let's build a simple age and gender detection model in this detailed article.
Computer Vision, Convolutional Neural Networks, OpenCV, Python
- Which Face is Real? - Apr 2, 2019.
Which Face Is Real? is a web application built on Generative Adversarial Networks in which users pick which of two images they believe shows a real person and which was synthetically generated. The person in the synthetically generated photo does not exist.
Deep Learning, GANs, Generative Adversarial Network, Neural Networks, NVIDIA, Python
Explaining Random Forest® (with Python Implementation) - Mar 29, 2019.
We provide an in-depth introduction to Random Forest, with an explanation of how it works, its advantages and disadvantages, important hyperparameters and a full example Python implementation.
Explained, Machine Learning, Python, random forests algorithm
- A Beginner’s Guide to Linear Regression in Python with Scikit-Learn - Mar 29, 2019.
Learn what linear regression is and how it can be implemented for both two variables and multiple variables using Scikit-Learn, one of the most popular machine learning libraries for Python.
Beginners, Linear Regression, Python, scikit-learn
- Data Pipelines, Luigi, Airflow: Everything you need to know - Mar 27, 2019.
This post focuses on the workflow management system (WMS) Airflow: what it is, what you can do with it, and how it differs from Luigi.
Data Workflow, Pipeline, Python, Workflow
R vs Python for Data Visualization - Mar 25, 2019.
This article demonstrates creating similar plots in R and Python using two of the most prominent data visualization packages on the market, namely ggplot2 and Seaborn.
Data Visualization, ggplot2, Matplotlib, Python, Python vs R, R, Seaborn
- Feature Reduction using Genetic Algorithm with Python - Mar 25, 2019.
This tutorial discusses how to use the genetic algorithm (GA) for reducing the feature vector extracted from the Fruits360 dataset in Python mainly using NumPy and Sklearn.
Deep Learning, Feature Engineering, Genetic Algorithm, Neural Networks, numpy, Python, scikit-learn
- Deploy your PyTorch model to Production - Mar 20, 2019.
This tutorial aims to teach you how to deploy your recently trained model in PyTorch as an API using Python.
Data Science Education, Data Scientist, Deep Learning, Flask, Programming, Python, PyTorch
- Mastering Fast Gradient Boosting on Google Colaboratory with free GPU - Mar 19, 2019.
CatBoost is a fast implementation of GBDT with GPU support out-of-the-box. Google Colaboratory is a very useful tool with free GPU support.
CatBoost, Google Colab, GPU, Gradient Boosting, Machine Learning, Python, Yandex
- How to Train a Keras Model 20x Faster with a TPU for Free - Mar 19, 2019.
This post shows how to train an LSTM model using Keras and Google Colaboratory with TPUs to substantially reduce training time compared to a GPU on your local machine.
Deep Learning, Google Colab, Keras, Python, TensorFlow, TPU
Artificial Neural Networks Optimization using Genetic Algorithm with Python - Mar 18, 2019.
This tutorial explains the usage of the genetic algorithm for optimizing the network weights of an Artificial Neural Network for improved performance.
AI, Algorithms, Deep Learning, Machine Learning, Neural Networks, numpy, Optimization, Python
- Building NLP Classifiers Cheaply With Transfer Learning and Weak Supervision - Mar 15, 2019.
In this blog, I’ll walk you through a personal project in which I cheaply built a classifier to detect anti-semitic tweets, with no public dataset available, by combining weak supervision and transfer learning.
Bias, fast.ai, NLP, Python, Text Classification, Transfer Learning, Twitter, ULMFiT
- Object Detection with Luminoth - Mar 13, 2019.
In this article you will learn about Luminoth, an open source computer vision library which sits atop Sonnet and TensorFlow and provides object detection for images and video.
Computer Vision, Image Recognition, Object Detection, Python

Who is a typical Data Scientist in 2019? - Mar 11, 2019.
We investigate what a typical data scientist looks like and see how this differs from this time last year, looking at skill set, programming languages, industry of employment, country of employment, and more.
Career, Data Science Skills, Data Scientist, Industry, MATLAB, Python, R, SQL
- Neural Networks with Numpy for Absolute Beginners: Introduction - Mar 5, 2019.
In this tutorial, you will get a brief understanding of what Neural Networks are and how they have been developed. In the end, you will gain a brief intuition as to how the network learns.
Beginners, Neural Networks, numpy, Python
4 Reasons Why Your Machine Learning Code is Probably Bad - Feb 26, 2019.
Your current ML workflow probably chains together several functions executed linearly. Instead of linearly chaining functions, data science code is better written as a set of tasks with dependencies between them. That is, your data science workflow should be a DAG.
Data Science, Machine Learning, Programming, Python, Workflow
- Simple Yet Practical Data Cleaning Codes - Feb 26, 2019.
Real-world data is messy and needs to be cleaned before it can be used for analysis. Industry experts say the data preprocessing step can easily take 70% to 80% of a data scientist's time on a project.
Data Cleaning, Data Preprocessing, Python
Artificial Neural Network Implementation using NumPy and Image Classification - Feb 21, 2019.
This tutorial builds an artificial neural network from scratch in Python using NumPy and applies it to image classification on the Fruits360 dataset.
Deep Learning, Machine Learning, Neural Networks, numpy, Python
Python Data Science for Beginners - Feb 20, 2019.
Python’s syntax is clean and concise. Python is an open-source, portable language with a large standard library. But why Python for data science? Read on to find out more.
Beginners, Data Science, Matplotlib, numpy, Pandas, Python, scikit-learn, SciPy
Running R and Python in Jupyter - Feb 19, 2019.
The Jupyter Project began in 2014 for interactive and scientific computing. Fast forward 5 years, and Jupyter is now one of the most widely adopted data science IDEs on the market, giving the user access to both Python and R.
IPython, Jupyter, Python, R
How to Setup a Python Environment for Machine Learning - Feb 18, 2019.
In this tutorial, you will learn how to set up a stable Python Machine Learning development environment. You’ll be able to get right down into the ML and never have to worry about installing packages ever again.
Machine Learning, Programming, Python
An Introduction to Scikit Learn: The Gold Standard of Python Machine Learning - Feb 13, 2019.
If you’re going to do Machine Learning in Python, Scikit-learn is the gold standard. Scikit-learn provides a wide selection of supervised and unsupervised learning algorithms. Best of all, it’s by far the easiest and cleanest ML library.
Machine Learning, Python, scikit-learn
- From Good to Great Data Science, Part 1: Correlations and Confidence - Feb 5, 2019.
With the aid of some hospital data, part one describes how just a little inexperience in statistics could result in two common mistakes.
Correlation, Data Science, Python, Statistics
Intuitive Visualization of Outlier Detection Methods - Feb 5, 2019.
Check out this visualization for outlier detection methods, and the Python project from which it comes, a toolkit for easily implementing outlier detection methods on your own.
Cheat Sheet, Outliers, Python
- Exploring Python Basics - Jan 31, 2019.
This free eBook is a great resource for any beginner, providing a good introduction to Python, a look at the basics of learning a programming language, and an exploration of modelling and predictions.
Beginners, Book, Manning, Python
- ELMo: Contextual Language Embedding - Jan 31, 2019.
Create a semantic search engine using deep contextualised language representations from ELMo, and learn why context is everything in NLP.
Data Visualization, NLP, Plotly, Python, Word Embeddings

7 Steps to Mastering Basic Machine Learning with Python — 2019 Edition - Jan 29, 2019.
With a new year upon us, I thought it would be a good time to revisit the concept and put together a new learning path for mastering machine learning with Python. With these 7 steps you can master basic machine learning with Python!
7 Steps, Classification, Clustering, Jupyter, Machine Learning, Python, Regression
- Automated Machine Learning in Python - Jan 18, 2019.
An organization can also reduce the cost of hiring many experts by applying AutoML in its data pipeline. AutoML also reduces the amount of time it takes to develop and test a machine learning model.
Automated Machine Learning, AutoML, H2O, Keras, Machine Learning, Python, scikit-learn
How to build an API for a machine learning model in 5 minutes using Flask - Jan 17, 2019.
Flask is a micro web framework written in Python. It can create a REST API that allows you to send data, and receive a prediction as a response.
API, Flask, Machine Learning, Python
- The 6 Most Useful Machine Learning Projects of 2018 - Jan 15, 2019.
Let’s take a look at the top 6 most practically useful ML projects over the past year. These projects have published code and datasets that allow individual developers and smaller teams to learn and immediately create value.
Automated Machine Learning, Facebook, fast.ai, Google, Keras, Machine Learning, Object Detection, Python, Reinforcement Learning, Word Embeddings
- Python Patterns: max Instead of if - Jan 10, 2019.
I often have to loop over a set of objects to find the one with the greatest score. You can use an if statement and a placeholder, but there are more elegant ways!
Programming, Python
- 3 More Google Colab Environment Management Tips - Jan 2, 2019.
This is a short collection of lessons learned using Colab as my main coding learning environment for the past few months. Some tricks are Colab specific, others are general Jupyter tips, and still more are filesystem related, but all have proven useful for me.
Google, Google Colab, Jupyter, Machine Learning, Python
- Synthetic Data Generation: A must-have skill for new data scientists - Dec 27, 2018.
A brief rundown of methods, packages, and ideas for generating synthetic data for self-driven data science projects and for deep dives into machine learning methods.
Classification, Clustering, Datasets, Machine Learning, Python, Synthetic Data
A Guide to Decision Trees for Machine Learning and Data Science - Dec 24, 2018.
What makes decision trees special in the realm of ML models is really their clarity of information representation. The “knowledge” learned by a decision tree through training is directly formulated into a hierarchical structure.
Algorithms, Data Science, Decision Trees, Machine Learning, Python, scikit-learn
Top Python Libraries in 2018 in Data Science, Deep Learning, Machine Learning - Dec 19, 2018.
Here are the top 15 Python libraries across Data Science, Data Visualization, Deep Learning, and Machine Learning.
Data Science, Deep Learning, Machine Learning, Pandas, Python, PyTorch, TensorFlow