- KDnuggets™ News 19:n28, Jul 31: Top 13 Skills To Become a Rockstar Data Scientist; Best Podcasts on AI, Analytics, Data Science - Jul 31, 2019.
Learn the essential skills needed to become a Data Science rockstar; understand CNNs with a Python + TensorFlow + Keras tutorial; discover the best podcasts about AI, Analytics, Data Science; and find out where you can get the best Certificates in the field.
Convolutional Neural Networks, Data Preparation, Data Science Certificate, Data Science Skills, Podcast, Python, TensorFlow
- Here’s how you can accelerate your Data Science on GPU - Jul 30, 2019.
Data Scientists need computing power. Whether you’re processing a big dataset with Pandas or running some computation on a massive matrix with Numpy, you’ll need a powerful machine to get the job done in a reasonable amount of time.
Big Data, Data Science, DBSCAN, Deep Learning, GPU, NVIDIA, Python
- Exploring Python Basics. - Jul 29, 2019.
This free ebook is a great resource for data science beginners, providing a good introduction to Python, coding with Raspberry Pi, and using Python to build predictive models.
Beginners, Book, Manning, Python
- Convolutional Neural Networks: A Python Tutorial Using TensorFlow and Keras - Jul 26, 2019.
Different neural network architectures excel in different tasks. This particular article focuses on crafting convolutional neural networks in Python using TensorFlow and Keras.
Convolutional Neural Networks, Keras, Neural Networks, Python, TensorFlow
- Easy, One-Click Jupyter Notebooks - Jul 24, 2019.
All of the setup for software, networking, security, and libraries is automatically taken care of by the Saturn Cloud system. Data Scientists can then focus on the actual Data Science and not the tedious infrastructure work that surrounds it.
Big Data, Cloud, Data Science, Data Scientist, DevOps, Jupyter, Python, Saturn Cloud
- Kaggle Kernels Guide for Beginners: A Step by Step Tutorial - Jul 23, 2019.
This guide takes a complete beginner by the hand and walks them through the world of Kaggle Kernels so they can get started.
Kaggle, Python, R
- Things I Learned From the SciPy 2019 Lightning Talks - Jul 22, 2019.
This post summarizes the interesting aspects of Day One of the SciPy 2019 lightning talks, a flash round of a dozen ~3-minute talks covering a wide variety of topics.
Presentation, Python, SciPy
- Computer Vision for Beginners: Part 1 - Jul 17, 2019.
Image processing is performing some operations on images to get an intended manipulation. Think about what we do when we start a new data analysis. We do some data preprocessing and feature engineering. It’s the same with image processing.
Computer Vision, Deep Learning, Image Processing, Python
- Dealing with categorical features in machine learning - Jul 16, 2019.
Many machine learning algorithms require that their input is numerical and therefore categorical features must be transformed into numerical features before we can use any of these algorithms.
Data Cleaning, Data Preprocessing, Feature Engineering, Machine Learning, Python
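As a rough illustration of the categorical-to-numerical transformation described above, the sketch below one-hot encodes a single categorical column in plain Python. The helper name `one_hot_encode` and the sample colors are hypothetical; in practice pandas (`get_dummies`) and scikit-learn (`OneHotEncoder`) provide production-ready versions, and the article may cover other encodings as well.

```python
def one_hot_encode(values):
    """Map each categorical value to a binary indicator vector.

    Categories are sorted so the column order is deterministic.
    Returns (categories, encoded_rows).
    """
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # switch on the column for this value's category
        encoded.append(row)
    return categories, encoded


colors = ["red", "green", "blue", "green"]
categories, rows = one_hot_encode(colors)
print(categories)  # ['blue', 'green', 'red']
print(rows)        # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

One-hot encoding avoids implying an order between categories, at the cost of one new column per distinct value.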
- Training a Neural Network to Write Like Lovecraft - Jul 11, 2019.
In this post, the author attempts to train a neural network to generate Lovecraft-esque prose, known to be awkward and irregular at best. Did it end in success? If not, what might have made it work? Read on to find out.
Keras, LSTM, Natural Language Generation, Neural Networks, Python, TensorFlow
- 10 Simple Hacks to Speed up Your Data Analysis in Python - Jul 11, 2019.
This article lists some curated tips for working with Python and Jupyter Notebooks, covering topics such as easily profiling data, formatting code and output, debugging, and more. Hopefully you can find something useful within.
Data Analysis, Jupyter, Pandas, Python, Tips
- How to Learn Python without First Needing to Learn Python - Jul 10, 2019.
Learn how data scientists and anyone coding with Python can set up a made-to-order runtime in minutes - not days. Read the 3-minute blog post.
ActiveState, Python
- A Gentle Guide to Starting Your NLP Project with AllenNLP - Jul 10, 2019.
For those who aren’t familiar with AllenNLP, I will give a brief overview of the library and let you know the advantages of integrating it into your project.
Allen Institute, NLP, Python, Sentiment Analysis
- Practical Speech Recognition with Python: The Basics - Jul 9, 2019.
Do you fear implementing speech recognition in your Python apps? Read this tutorial for a simple approach to getting practical with speech recognition using open source Python libraries.
Google, NLP, Python, Speech Recognition
- Annotated Heatmaps of a Correlation Matrix in 5 Simple Steps - Jul 9, 2019.
A heatmap is a graphical representation of data in which data values are represented as colors. That is, it uses color to communicate a value to the reader. This is a great tool for directing the audience toward the areas that matter most when you have a large volume of data.
Data Visualization, Python, Statistics
- XGBoost and Random Forest® with Bayesian Optimisation - Jul 8, 2019.
This article will explain how to use XGBoost and Random Forest with Bayesian Optimisation, and will discuss the main pros and cons of these methods.
Bayesian, Optimization, Python, random forests algorithm, XGBoost
- Classifying Heart Disease Using K-Nearest Neighbors - Jul 8, 2019.
This post is written for developers and assumes no background in statistics or mathematics. The focus is mainly on how the k-NN algorithm works and how to use it for predictive modeling problems.
Healthcare, K-nearest neighbors, Machine Learning, Medical, Python
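The core k-NN idea (find the k closest training points, then take a majority vote of their labels) fits in a few lines of plain Python. This is a minimal sketch, not the article's code: the function names and the two clusters of toy points labelled "A" and "B" are made up, and real data (such as heart-disease features) would normally be scaled first because k-NN is distance based.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest neighbors."""
    # Sort training points by distance to the query and keep the k closest.
    neighbors = sorted(
        zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy, made-up points: two well-separated clusters.
X = [[1, 1], [1, 2], [2, 1], [6, 6], [7, 6], [6, 7]]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X, y, [1.5, 1.5]))  # A
print(knn_predict(X, y, [6.5, 6.2]))  # B
```

Choosing an odd k for two-class problems avoids tied votes.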
- Building a Recommender System, Part 2 - Jul 3, 2019.
This post explores a technique for collaborative filtering that uses latent factor models, an approach that naturally generalizes to deep learning. The approach is implemented using TensorFlow and Keras.
Movies, Python, Recommendation Engine, Recommender Systems
- How do you check the quality of your regression model in Python? - Jul 2, 2019.
Linear regression is rooted strongly in the field of statistical learning and therefore the model must be checked for the ‘goodness of fit’. This article shows you the essential steps of this task in a Python ecosystem.
Data Science, Multicollinearity, Python, Regression, Statistics
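One widely used goodness-of-fit measure for a regression model is the coefficient of determination, R^2 = 1 - SS_res / SS_tot, which compares the model's squared errors to those of simply predicting the mean. The sketch below computes it in plain Python; the function name and sample numbers are illustrative, and the article itself likely covers further checks (its tags mention multicollinearity).

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total variation around the mean
    return 1 - ss_res / ss_tot


observed = [3.0, 5.0, 7.0, 9.0]
predicted = [2.8, 5.1, 7.2, 8.9]
print(round(r_squared(observed, predicted), 4))  # 0.995
```

An R^2 near 1 means the model explains most of the variance, but it says nothing on its own about whether the residuals behave as the linear model assumes.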
- Make your Data Talk! - Jun 28, 2019.
Matplotlib and Seaborn are two of the most powerful and popular data visualization libraries in Python. Read on to learn how to create some of the most frequently used graphs and charts using Matplotlib and Seaborn.
Data Visualization, Matplotlib, Python, Seaborn, Storytelling
- PySyft and the Emergence of Private Deep Learning - Jun 27, 2019.
PySyft is an open-source framework that enables secured, private computations in deep learning, by combining federated learning and differential privacy in a single programming model integrated into different deep learning frameworks such as PyTorch, Keras or TensorFlow.
Deep Learning, Differential Privacy, Privacy, Python, Security
- An Overview of Outlier Detection Methods from PyOD – Part 1 - Jun 27, 2019.
PyOD is an outlier detection package developed with a comprehensive API to support multiple techniques. This post, Part 1 of an overview, showcases techniques that can be used to analyze anomalies in data.
Algorithms, Big Data, Outliers, Python
- Top KDnuggets Tweets, Jun 19 – 25: Learn how to efficiently handle large amounts of data using #Pandas; The biggest mistake while learning #Python for #datascience - Jun 26, 2019.
Also: Data Science Jobs Report 2019; Harvard CS109 #DataScience Course, Resources #Free and Online; Google launches TensorFlow; Mastering SQL for Data Science
Advice, Pandas, Python, Top tweets
- Optimization with Python: How to make the most amount of money with the least amount of risk? - Jun 26, 2019.
Learn how to apply Python data science libraries to develop a simple optimization problem based on a Nobel-prize winning economic theory for maximizing investment profits while minimizing risk.
Finance, Investment, Optimization, Python, Risk Modeling, Stocks
- KDnuggets™ News 19:n24, Jun 26: Understand Cloud Services; Pandas Tips & Tricks; Master Data Preparation w/ Python - Jun 26, 2019.
Happy summer! This week on KDnuggets: Understanding Cloud Data Services; How to select rows and columns in Pandas using [ ], .loc, .iloc, .at and .iat; 7 Steps to Mastering Data Preparation for Machine Learning with Python; Examining the Transformer Architecture: The OpenAI GPT-2 Controversy; Data Literacy: Using the Socratic Method; and much more!
Cloud, Data Preparation, Machine Learning, NLP, OpenAI, Pandas, Python
- 7 Steps to Mastering Data Preparation for Machine Learning with Python — 2019 Edition - Jun 24, 2019.
Interested in mastering data preparation with Python? Follow these 7 steps which cover the concepts, the individual tasks, as well as different approaches to tackling the entire process from within the Python ecosystem.
7 Steps, Data Preparation, Data Preprocessing, Data Science, Data Wrangling, Machine Learning, Pandas, Python
- Top KDnuggets Tweets, Jun 12 – 18: The biggest mistake while learning #Python for #datascience; 5 practical statistical concepts for data scientists - Jun 19, 2019.
Also: Resources for developers transitioning into data science; Best Data Visualization Techniques for small and large data; Top Data Science and Machine Learning Methods Used in 2018, 2019
Advice, Python, Statistics, Top tweets
- One Simple Trick for Speeding up your Python Code with Numpy - Jun 19, 2019.
Looping over Python arrays, lists, or dictionaries can be slow. Vectorized operations in NumPy, by contrast, are mapped to highly optimized C code, making them much faster than their standard Python counterparts.
Big Data, numpy, Python
- KDnuggets™ News 19:n23, Jun 19: Useful Stats for Data Scientists; Python, TensorFlow & R Winners in Latest Job Report - Jun 19, 2019.
This week on KDnuggets: 5 Useful Statistics Data Scientists Need to Know; Data Science Jobs Report 2019: Python Way Up, TensorFlow Growing Rapidly, R Use Double SAS; How to Learn Python for Data Science the Right Way; The Machine Learning Puzzle, Explained; Scalable Python Code with Pandas UDFs; and much more!
Data Science, Data Scientist, Machine Learning, Pandas, Python, R, Report, SAS, Scalability, Statistics, TensorFlow
- Python Users Come From All Sorts of Backgrounds - Jun 18, 2019.
Python users come from all sorts of backgrounds, but computer science skills make the difference between a Python apprentice and a Python master. Save 50% off Classic Computer Science Problems in Python today, using the code kdcsprob50 when you buy from manning.com.
Book, Computer Science, Manning, Python
- K-means Clustering with Dask: Image Filters for Cat Pictures - Jun 18, 2019.
How to recreate an original cat image with the fewest possible colors: an interesting use case of unsupervised machine learning with K-Means clustering in Python.
Clustering, Dask, Image Classification, Image Recognition, K-means, Python, Unsupervised Learning

- Data Science Jobs Report 2019: Python Way Up, TensorFlow Growing Rapidly, R Use Double SAS - Jun 17, 2019.
Data science jobs continue to grow in 2019, and this report shares the change and spread of jobs by software over recent years.
Data Science, indeed, Jobs, Python, R, SAS, TensorFlow
- How to Use Python’s datetime - Jun 17, 2019.
Python's datetime package is a convenient set of tools for working with dates and times. With just the five tricks that I’m about to show you, you can handle most of your datetime processing needs.
Programming, Python, Time Series
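The article's five tricks are not listed here, so the sketch below simply shows a few standard `datetime` operations that come up in most date-handling work: parsing a string with `strptime`, shifting a date with `timedelta`, subtracting two datetimes, and formatting with `strftime`. The dates are arbitrary examples, not taken from the article.

```python
from datetime import datetime, timedelta

# Parse a date string into a datetime object using an explicit format.
start = datetime.strptime("2019-06-17 09:30", "%Y-%m-%d %H:%M")

# Add a duration with timedelta; month and year rollovers are handled automatically.
deadline = start + timedelta(days=20, hours=3)

# Subtracting two datetimes yields a timedelta.
elapsed = deadline - start

print(deadline.strftime("%A, %d %B %Y at %H:%M"))  # weekday/month names depend on locale
print(elapsed.days, elapsed.seconds // 3600)  # 20 3
print(start.weekday())  # 0 (Monday)
```

Note that `timedelta` stores days and seconds separately, which is why the hours are recovered from `elapsed.seconds` here.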
- How to Learn Python for Data Science the Right Way - Jun 14, 2019.
The biggest mistake you can make while learning Python for data science is to learn Python programming from courses meant for programmers. Avoid this mistake, and learn Python the right way by following this approach.
Advice, Data Science, Jupyter, Matplotlib, Pandas, Python, scikit-learn, StatsModels
- Become a Pro at Pandas, Python’s Data Manipulation Library - Jun 13, 2019.
Pandas is one of the most popular Python libraries for cleaning, transforming, manipulating and analyzing data. Learn how to efficiently handle large amounts of data using Pandas.
Matplotlib, numpy, Pandas, Python, SQL
- Scalable Python Code with Pandas UDFs: A Data Science Application - Jun 13, 2019.
There is still a gap between the corpus of libraries that developers want to apply in a scalable runtime and the set of libraries that support distributed execution. This post discusses how to bridge this gap using the functionality provided by Pandas UDFs in Spark 2.3+.
Apache Spark, Big Data, Pandas, Python
- Top KDnuggets Tweets, Jun 5 – 11: A New Extension to Organize your Code on Jupyter Notebooks; Data Science Cheat Sheet - Jun 12, 2019.
Also: Cognitive Biases are Making Sure You Aren’t So Smart; 3 Machine Learning Books that Helped me Level Up as a Data Scientist; Mastering Intermediate Machine Learning with Python
Cheat Sheet, Jupyter, Python, Top tweets
- How to Automate Hyperparameter Optimization - Jun 12, 2019.
A step-by-step guide to performing a hyperparameter optimization task on a deep learning model by employing Bayesian Optimization that uses a Gaussian Process. We used the gp_minimize function provided by the Scikit-Optimize (skopt) library to perform this task.
Bayesian, Deep Learning, Hyperparameter, Machine Learning, Neural Networks, Optimization, Python, TensorFlow
- What you need to know: The Modern Open-Source Data Science/Machine Learning Ecosystem - Jun 10, 2019.
We identify the 6 tools in the modern open-source Data Science ecosystem, examine the Python vs R question, and determine which tools are used the most with Deep Learning and Big Data.
Anaconda, Apache Spark, Big Data Software, Deep Learning, Excel, Keras, Poll, Python, R, RapidMiner, scikit-learn, Software, SQL, Tableau, TensorFlow
- PyViz: Simplifying the Data Visualisation Process in Python - Jun 6, 2019.
There are Python libraries suitable for basic data visualizations but not for complicated ones, and there are libraries suitable only for complex visualizations. Is there a single library that handles both of these tasks efficiently? The answer is yes: PyViz.
Data Visualization, GitHub, Matplotlib, Python
- The Whole Data Science World in Your Hands - Jun 5, 2019.
Testing MatrixDS capabilities on different languages and tools: Python, R and Julia. If you work with data you have to check this out.
Data Science, Data Scientist, Julia, Jupyter, MatrixDS, Python, R
- KDnuggets™ News 19:n21, Jun 5: Transitioning your Career to Data Science; 11 top Data Science, Machine Learning platforms; 7 Steps to Mastering Intermediate ML w. Python - Jun 5, 2019.
The results of KDnuggets 20th Annual Software Poll; How to transition to a Data Science career; Mastering Intermediate Machine Learning with Python; Understanding Natural Language Processing (NLP); Backprop as applied to LSTM, and much more.
Backpropagation, Data Science Platform, LSTM, Machine Learning, NLP, Python
- The Hitchhiker’s Guide to Feature Extraction - Jun 3, 2019.
Check out this collection of tricks and code for Kaggle and everyday work.
Feature Engineering, Feature Extraction, Feature Selection, Kaggle, Python
- 7 Steps to Mastering Intermediate Machine Learning with Python — 2019 Edition - Jun 3, 2019.
This is the second part of this new learning path series for mastering machine learning with Python. Check out these 7 steps to help master intermediate machine learning with Python!
7 Steps, Classification, Cross-validation, Dimensionality Reduction, Feature Engineering, Feature Selection, Image Classification, K-nearest neighbors, Machine Learning, Modeling, Naive Bayes, numpy, Pandas, PCA, Python, scikit-learn, Transfer Learning

- Python leads the 11 top Data Science, Machine Learning platforms: Trends and Analysis - May 30, 2019.
Python continues to lead the top Data Science platforms, but R and RapidMiner hold their share; Almost 50% have used Deep Learning tools; SQL is steady; Consolidation continues.
Anaconda, Apache Spark, Deep Learning, Excel, Keras, Poll, Python, R, RapidMiner, scikit-learn, Software, SQL, TensorFlow
- Who is your Golden Goose?: Cohort Analysis - May 30, 2019.
Step-by-step tutorial on how to perform customer segmentation using RFM analysis and K-Means clustering in Python.
Clustering, Data Analysis, K-means, Python, Retail
- Animations with Matplotlib - May 30, 2019.
Animations make even more sense when depicting time series data like stock prices over the years, climate change over the past decade, seasonalities and trends, since we can then see how a particular parameter behaves over time.
Data Science, Data Visualization, Matplotlib, Python
- Boost Your Image Classification Model - May 27, 2019.
Check out this collection of tricks to improve the accuracy of your classifier.
fast.ai, Generative Adversarial Network, Image Classification, Image Recognition, Python
- Stylight: Sr Python Engineer – Evolving Systems (d/f/m) [Munich, Germany] - May 24, 2019.
Seeking an experienced Python engineer to join our team in Munich. Are you interested in further optimizing and automating our pipeline as well as delivering high-quality software? Then Stylight is your place to be.
Engineer, Germany, Munich, Python, Stylight
- Analyzing Tweets with NLP in Minutes with Spark, Optimus and Twint - May 24, 2019.
Social media has been gold for studying the way people communicate and behave. In this article I’ll show you the easiest way to analyze tweets without the Twitter API, in a way that scales to Big Data.
Apache Spark, Big Data, Deep Learning, Machine Learning, NLP, Optimus, Python, Twint
- Extracting Knowledge from Knowledge Graphs Using Facebook’s Pytorch-BigGraph - May 22, 2019.
We are using state-of-the-art Deep Learning tools to build a model that predicts a word using the surrounding words as labels.
Deep Learning, Facebook, Machine Learning, NLP, Python, PyTorch, word2vec
- PyCharm for Data Scientists - May 17, 2019.
This article is a discussion of some of PyCharm's features, and a comparison with Spyder, another popular IDE for Python. Read on to find the benefits and drawbacks of PyCharm, and an outline of when to prefer it to Spyder and vice versa.
Data Science, Data Scientist, Programming, PyCharm, Python
- Mathematical programming — Key Habit to Build Up for Advancing Data Science - May 15, 2019.
We show how, by simulating the random throw of a dart, you can compute the value of pi approximately. This is a small step towards building the habit of mathematical programming, which should be a key skill in the repertoire of a budding data scientist.
Data Science, Mathematics, Programming, Python
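The dart-throwing idea can be sketched as a Monte Carlo simulation. This version assumes the common quarter-circle setup (darts land uniformly in the unit square, and a dart counts as a hit when x^2 + y^2 <= 1); the article's exact setup may differ. Because the quarter circle covers pi/4 of the square, four times the hit fraction approximates pi. The function name and seed are illustrative.

```python
import random


def estimate_pi(num_darts, seed=None):
    """Estimate pi by throwing random darts at the unit square.

    A dart at (x, y) with x, y uniform in [0, 1) lands inside the quarter circle
    of radius 1 when x**2 + y**2 <= 1. That region covers pi/4 of the square,
    so 4 * (hits / throws) approximates pi.
    """
    rng = random.Random(seed)  # dedicated generator so results are reproducible
    hits = 0
    for _ in range(num_darts):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4 * hits / num_darts


for n in (1_000, 10_000, 100_000):
    print(n, estimate_pi(n, seed=42))
```

The error shrinks roughly in proportion to 1/sqrt(n), so each extra digit of accuracy costs about 100 times more darts.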
- What my first Silver Medal taught me about Text Classification and Kaggle in general? - May 13, 2019.
A first-hand account of ideas tried by a competitor in the recent Kaggle competition 'Quora Insincere Questions Classification', with a brief summary of some of the other winning solutions.
Advice, Competition, Cross-validation, Kaggle, Python, Text Classification
- A Complete Exploratory Data Analysis and Visualization for Text Data: Combine Visualization and NLP to Generate Insights - May 9, 2019.
Visually representing the content of a text document is one of the most important tasks in the field of text mining as a Data Scientist or NLP specialist. However, there are some gaps between visualizing unstructured (text) data and structured data.
Data Visualization, NLP, Plotly, Python, Text Analytics
- [White Paper] Unlocking the Power of Data Science & Machine Learning with Python - May 8, 2019.
This guide from ActiveState provides an executive overview of how you can implement Python for your team’s data science and machine learning initiatives.
ActiveState, Data Science, Machine Learning, Python, White Paper
- Linear Programming and Discrete Optimization with Python using PuLP - May 8, 2019.
Knowledge of optimization techniques such as linear programming is extremely useful for data scientists and machine learning (ML) practitioners, as discrete and continuous optimization lie at the heart of modern ML and AI systems as well as data-driven business analytics processes.
Linear Programming, Optimization, Python
- Naive Bayes: A Baseline Model for Machine Learning Classification Performance - May 7, 2019.
We can use Pandas to apply Bayes' Theorem and scikit-learn to implement the Naive Bayes algorithm. We take a step-by-step approach to understanding Bayes and implementing the different options in scikit-learn.
Algorithms, Data Science, Machine Learning, Naive Bayes, Python, scikit-learn, Statistics
- How to Automate Tasks on GitHub With Machine Learning for Fun and Profit - May 3, 2019.
Check out this tutorial on how to build a GitHub App that predicts and applies issue labels using TensorFlow and public datasets.
Datasets, GitHub, Python, TensorFlow
- Modeling Price with Regularized Linear Model & XGBoost - May 2, 2019.
We are going to implement regularization techniques for linear regression of house pricing data. Our goal in price modeling is to model the pattern and ignore the noise.
Modeling, Python, Regularization, XGBoost
- Which Deep Learning Framework is Growing Fastest? - May 1, 2019.
In September 2018, I compared all the major deep learning frameworks in terms of demand, usage, and popularity. TensorFlow was the champion of deep learning frameworks and PyTorch was the youngest framework. How has the landscape changed?
Data Science, Data Scientist, Deep Learning, fast.ai, Keras, Python, PyTorch, TensorFlow
- Build Your First Chatbot Using Python & NLTK - May 1, 2019.
Today we will learn to create a simple chat assistant or chatbot using Python’s NLTK library.
Chatbot, NLP, NLTK, Python
- KDnuggets™ News 19:n17, May 1: The most desired skill in data science; Seeking KDnuggets Editors, work remotely - May 1, 2019.
This week, find out about the most desired skill in data science, learn which projects to include in your portfolio, identify a single strategy for pulling data from a Pandas DataFrame (once and for all), read the results of our Top Data Science and Machine Learning Methods poll, and much more.
Algorithms, Data Science, Generative Adversarial Network, Machine Learning, Pandas, Portfolio, Python, Recurrent Neural Networks
- Powerful like your local notebook. Sharable like a Google Doc. - Apr 30, 2019.
Mode is the only analytics platform with native Python and R Notebooks. Get everyone up and running in minutes by delivering Notebook-powered results right in your browser. Now anyone on your team can re-run R- and Python-powered reports themselves—without ever touching code.
Mode Analytics, Python, R, SQL
- Normalization vs Standardization — Quantitative analysis - Apr 30, 2019.
Moving away from scikit-learn's StandardScaler as your default feature scaling method can get you a 7% boost in accuracy, even when your hyperparameters are tuned!
Data Preprocessing, Data Science, Feature Engineering, Machine Learning, Normalization, Python, Standardization
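To make the two scaling methods concrete, here is a minimal plain-Python sketch of min-max normalization, (x - min) / (max - min), and standardization, (x - mean) / std. The function names and sample values are illustrative. Population standard deviation is used because that is what scikit-learn's StandardScaler uses. Neither function guards against a constant feature, which would cause a division by zero.

```python
import statistics


def min_max_normalize(values):
    """Rescale values linearly to the [0, 1] range: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Center values at mean 0 with unit (population) standard deviation: (x - mean) / std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, population std 2
print(min_max_normalize(feature))  # starts at 0.0, ends at 1.0
print(standardize(feature))        # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Min-max scaling bounds the output but is sensitive to outliers through min and max; standardization does not bound the range, but it keeps a feature's outliers from squashing all of its other values into a narrow band.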
- Pandas DataFrame Indexing - Apr 29, 2019.
The goal of this post is to identify a single strategy for pulling data from a DataFrame using the Pandas Python library that is straightforward to interpret and produces reliable results.
Data Science, Pandas, Python
- Graduating in GANs: Going From Understanding Generative Adversarial Networks to Running Your Own - Apr 25, 2019.
Read how research on and evaluation of generative adversarial networks (GANs) have developed, then implement your own GAN to generate handwritten digits.
Deep Learning, GANs, Generative Adversarial Network, Generative Models, MNIST, Neural Networks, Python
- Top 10 Python Use Cases - Apr 24, 2019.
This paper covers 10 of the most common Python use cases, by industry, that ActiveState has seen its customers implement.
ActiveState, ebook, Python
- KDnuggets™ News 19:n16, Apr 24: Data Visualization in Python with Matplotlib & Seaborn; Getting Into Data Science: The Ultimate Q&A - Apr 24, 2019.
Best Data Visualization Techniques for small and large data; The Rise of Generative Adversarial Networks; Approach pre-trained deep learning models with caution; How Optimization Works; Building a Flask API to Automatically Extract Named Entities Using SpaCy
Data Science, Data Visualization, Generative Adversarial Network, Matplotlib, Optimization, Python, R, Seaborn

- Data Visualization in Python: Matplotlib vs Seaborn - Apr 19, 2019.
Seaborn and Matplotlib are two of Python's most powerful visualization libraries. Seaborn requires less syntax and has stunning default themes, while Matplotlib is more easily customizable through access to its classes.
Advice, Data Visualization, Matplotlib, Python, Seaborn
- Unleash a faster Python on your data - Apr 18, 2019.
Intel’s optimized Python packages deliver quick repeatable results compared to standard Python packages. Intel offers optimized Scikit-learn, Numpy, and SciPy to help data scientists get rapid results on their Intel® hardware. Download now.
Data Science, Intel, numpy, Python, scikit-learn, SciPy
- Building a Flask API to Automatically Extract Named Entities Using SpaCy - Apr 17, 2019.
This article discusses how to use the Named Entity Recognition module in spaCy to identify people, organizations, or locations in text, then deploy a Python API with Flask.
API, Flask, NLP, Python
- Data Science with Optimus Part 2: Setting your DataOps Environment - Apr 16, 2019.
Breaking down data science with Python, Spark and Optimus. Today: Data Operations for Data Science. Here we’ll learn to set up Git, Travis CI and DVC for our project.
Apache Spark, Data Operations, Data Science, Python, Workflow
- Data Science with Optimus Part 1: Intro - Apr 15, 2019.
With Optimus you can clean your data, prepare it, analyze it, create profilers and plots, and perform machine learning and deep learning, all in a distributed fashion, because on the back-end we have Spark, TensorFlow, Sparkling Water and Keras. It’s super easy to use.
Apache Spark, Data Science, Python, Workflow
- Because analysis is more than just dashboards - Apr 11, 2019.
Where traditional BI tools often make it easy to build dashboards, Mode makes it easy for you to answer any follow-up questions when you see changes in those dashboards. Choose the level of abstraction you want for a given dataset and quickly get to the story behind the change.
Analysis, Dashboard, Data Visualization, Mode Analytics, Python, R, SQL
- Build Python for Data Science in Just a Few Clicks - Apr 10, 2019.
There is only one Python distro that lets you add new versions of packages, remove unused packages, and rebuild in minutes. Yes, for free. Download ActiveState Python 3.6 build now.
ActiveState, free download, Python
- All you need to know about text preprocessing for NLP and Machine Learning - Apr 9, 2019.
We present a comprehensive introduction to text preprocessing, covering the different techniques including stemming, lemmatization, noise removal, and normalization, with examples and explanations of when you should use each of them.
Data Preprocessing, Machine Learning, NLP, Python, Text Analysis, Text Mining
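A minimal sketch of two of the steps named above, noise removal and normalization, using only the standard-library `re` module. The regular expressions and the tiny stop-word set are illustrative assumptions, not the article's choices. Stemming and lemmatization usually rely on a library such as NLTK or spaCy and are left out here.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to"}


def preprocess(text):
    """Lowercase, strip HTML tags and URLs, drop non-letters, remove stop words."""
    text = text.lower()                        # normalization: case folding
    text = re.sub(r"<[^>]+>", " ", text)       # noise removal: HTML tags
    text = re.sub(r"https?://\S+", " ", text)  # noise removal: URLs (before punctuation is stripped)
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace only
    tokens = text.split()                      # simple whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]


print(preprocess("<p>The Quick Brown Fox</p> jumps to https://example.com in 2019!"))
# ['quick', 'brown', 'fox', 'jumps', 'in']
```

The order of the steps matters: removing punctuation before URLs would break each URL into fragments that the URL pattern no longer matches.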
- Advanced Keras — Constructing Complex Custom Losses and Metrics - Apr 8, 2019.
In this tutorial I cover a simple trick that will allow you to construct custom loss functions in Keras which can receive arguments other than y_true and y_pred.
Keras, Metrics, Neural Networks, Python
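The blurb does not spell out the trick. One common way to pass extra arguments to a Keras loss is a closure: an outer function takes the extra parameters and returns an inner function with the (y_true, y_pred) signature that Keras expects. The sketch below shows only that pattern, framework-free on plain Python lists so it runs without TensorFlow. `weighted_mse` and its `weight` rule are hypothetical. In Keras the inner function would operate on tensors and would be passed as, for example, `model.compile(loss=weighted_mse(2.0))`.

```python
def weighted_mse(weight):
    """Return a loss function with the (y_true, y_pred) signature, closing over `weight`.

    Here `weight` scales the squared error of samples whose true value is positive,
    a made-up rule chosen only to show an extra argument reaching the loss.
    """
    def loss(y_true, y_pred):
        total = 0.0
        for t, p in zip(y_true, y_pred):
            w = weight if t > 0 else 1.0
            total += w * (t - p) ** 2
        return total / len(y_true)

    return loss


loss_fn = weighted_mse(2.0)                # the extra argument is bound here
print(loss_fn([1.0, 0.0], [0.5, 0.5]))     # (2.0 * 0.25 + 1.0 * 0.25) / 2 = 0.375
```

Because the returned function still takes exactly (y_true, y_pred), the training loop can call it unchanged while the extra parameter travels along inside the closure.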
- Building a Recommender System - Apr 4, 2019.
A beginner's guide to building a recommendation system, with a step-by-step guide on how to create a content-based filtering system to recommend movies for a user to watch.
Movies, Python, Recommendation Engine, Recommender Systems
- Predict Age and Gender Using Convolutional Neural Network and OpenCV - Apr 4, 2019.
Age and gender estimation from a single face image are important tasks in intelligent applications. As such, let's build a simple age and gender detection model in this detailed article.
Computer Vision, Convolutional Neural Networks, OpenCV, Python
- Which Face is Real? - Apr 2, 2019.
Which Face Is Real? was developed based on Generative Adversarial Networks as a web application in which users can select which image they believe is a true person and which was synthetically generated. The person in the synthetically generated photo does not exist.
Deep Learning, GANs, Generative Adversarial Network, Neural Networks, NVIDIA, Python
- Explaining Random Forest® (with Python Implementation) - Mar 29, 2019.
We provide an in-depth introduction to Random Forest, with an explanation of how it works, its advantages and disadvantages, important hyperparameters and a full example Python implementation.
Explained, Machine Learning, Python, random forests algorithm
- A Beginner’s Guide to Linear Regression in Python with Scikit-Learn - Mar 29, 2019.
What linear regression is and how it can be implemented for both two variables and multiple variables using Scikit-Learn, which is one of the most popular machine learning libraries for Python.
Beginners, Linear Regression, Python, scikit-learn
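For the two-variable case, ordinary least squares has a closed form: slope = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2) and intercept = y_mean - slope * x_mean. The plain-Python sketch below computes it so the idea is visible without any library. Scikit-Learn's `LinearRegression` fits the same least-squares model and also handles multiple variables. The function name and the exactly linear toy data are made up for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor).

    slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]  # made-up and exactly linear: 50 + 2 * hours
print(fit_simple_linear_regression(hours, scores))  # (50.0, 2.0)
```

With noisy real data the fitted line will not pass through every point; the formulas minimize the sum of squared vertical distances.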
- [PDF] Python: The Programmer’s Lingua Franca - Mar 27, 2019.
This paper presents the case that Python is the language best suited to becoming a programmer’s lingua franca.
ActiveState, ebook, Python
- Data Pipelines, Luigi, Airflow: Everything you need to know - Mar 27, 2019.
This post focuses on the workflow management system (WMS) Airflow: what it is, what you can do with it, and how it differs from Luigi.
Data Workflow, Pipeline, Python, Workflow
- R vs Python for Data Visualization - Mar 25, 2019.
This article demonstrates creating similar plots in R and Python using two of the most prominent data visualization packages on the market, namely ggplot2 and Seaborn.
Data Visualization, ggplot2, Matplotlib, Python, Python vs R, R, Seaborn
- Feature Reduction using Genetic Algorithm with Python - Mar 25, 2019.
This tutorial discusses how to use the genetic algorithm (GA) for reducing the feature vector extracted from the Fruits360 dataset in Python mainly using NumPy and Sklearn.
Deep Learning, Feature Engineering, Genetic Algorithm, Neural Networks, numpy, Python, scikit-learn
- Deploy your PyTorch model to Production - Mar 20, 2019.
This tutorial aims to teach you how to deploy your recently trained model in PyTorch as an API using Python.
Data Science Education, Data Scientist, Deep Learning, Flask, Programming, Python, PyTorch
- Mastering Fast Gradient Boosting on Google Colaboratory with free GPU - Mar 19, 2019.
CatBoost is a fast implementation of GBDT with GPU support out-of-the-box. Google Colaboratory is a very useful tool with free GPU support.
CatBoost, Google Colab, GPU, Gradient Boosting, Machine Learning, Python, Yandex
- How to Train a Keras Model 20x Faster with a TPU for Free - Mar 19, 2019.
This post shows how to train an LSTM model using Keras and Google Colaboratory with TPUs to dramatically reduce training time compared to a GPU on your local machine.
Deep Learning, Google Colab, Keras, Python, TensorFlow, TPU
- Artificial Neural Networks Optimization using Genetic Algorithm with Python - Mar 18, 2019.
This tutorial explains the usage of the genetic algorithm for optimizing the network weights of an Artificial Neural Network for improved performance.
AI, Algorithms, Deep Learning, Machine Learning, Neural Networks, numpy, Optimization, Python
- Building NLP Classifiers Cheaply With Transfer Learning and Weak Supervision - Mar 15, 2019.
In this blog, I’ll walk you through a personal project in which I cheaply built a classifier to detect anti-semitic tweets, with no public dataset available, by combining weak supervision and transfer learning.
Bias, fast.ai, NLP, Python, Text Classification, Transfer Learning, Twitter, ULMFiT
- Advanced Keras — Accurately Resuming a Training Process - Mar 14, 2019.
This practical article on advanced Keras use covers how to accurately resume training in nontrivial cases where custom callbacks are used.
Keras, Neural Networks, Python, Training
- Object Detection with Luminoth - Mar 13, 2019.
In this article you will learn about Luminoth, an open source computer vision library which sits atop Sonnet and TensorFlow and provides object detection for images and video.
Computer Vision, Image Recognition, Object Detection, Python

- Who is a typical Data Scientist in 2019? - Mar 11, 2019.
We investigate what a typical data scientist looks like and see how this differs from this time last year, looking at skill set, programming languages, industry of employment, country of employment, and more.
Career, Data Science Skills, Data Scientist, Industry, MATLAB, Python, R, SQL
- Neural Networks with Numpy for Absolute Beginners — Part 2: Linear Regression - Mar 7, 2019.
In this tutorial, you will learn in detail how to implement Linear Regression for prediction using NumPy, and visualize how the algorithm learns epoch by epoch. You will also explore two-layer Neural Networks.
Gradient Descent, Linear Regression, Neural Networks, numpy, Python
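At its core, linear regression trained this way fits a weight and a bias by gradient descent on mean squared error. Below is a minimal NumPy sketch. The single feature, the synthetic data, and the learning rate and epoch count are all illustrative assumptions, not values from the tutorial.

```python
import numpy as np

def fit_linear_regression(x, y, lr=0.1, epochs=500):
    """Fit y ~ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        error = (w * x + b) - y
        # Gradients of MSE = mean(error**2) with respect to w and b
        grad_w = (2.0 / n) * np.dot(error, x)
        grad_b = (2.0 / n) * error.sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic data: true slope 3.0, true intercept 0.5, small Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=200)
w, b = fit_linear_regression(x, y)
print(f"w={w:.2f}, b={b:.2f}")  # expect values near w=3, b=0.5
```

Recording `w`, `b`, or the loss inside the loop is one way to reproduce the "epoch by epoch" visualization the tutorial describes.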
- Top KDnuggets tweets, Feb 27 – Mar 05: How to Setup a Python Environment for Machine Learning; How to do Everything in Computer Vision - Mar 6, 2019.
Also Python Data Science for Beginners; Deep Learning for Natural Language Processing (NLP) - using RNNs and CNNs.
NLP, Python, Top tweets
- Neural Networks with Numpy for Absolute Beginners: Introduction - Mar 5, 2019.
In this tutorial, you will get a brief overview of what Neural Networks are and how they have been developed. By the end, you will have an intuition for how the network learns.
Beginners, Neural Networks, numpy, Python
- Python 2 support ends this year. Are you ready to migrate? - Feb 27, 2019.
Support for Python 2 ends on Jan 1, 2020. Migrating from Python 2 to 3 can be a scary process, so get this solution sheet with different options for moving your existing packages and applications from Python 2 to 3, along with best-practice guidelines.
ActiveState, Python
- 4 Reasons Why Your Machine Learning Code is Probably Bad - Feb 26, 2019.
Your current ML workflow probably chains together several functions executed linearly. Instead of linearly chaining functions, data science code is better written as a set of tasks with dependencies between them. That is, your data science workflow should be a DAG (directed acyclic graph).
Data Science, Machine Learning, Programming, Python, Workflow
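To make the DAG idea concrete, here is a minimal sketch using the standard library's `graphlib` (Python 3.9+). The task names and functions are hypothetical placeholders, not from the article. Each task declares the tasks it depends on, and a topological sort then decides the execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline tasks; each receives the outputs of its dependencies
def load():
    return [3, 1, 2, None, 5]

def clean(raw):
    return [v for v in raw if v is not None]

def features(cleaned):
    return [v * v for v in cleaned]

def summary(cleaned, feats):
    return {"n": len(cleaned), "total": sum(feats)}

# task name -> (function, names of the tasks it depends on)
TASKS = {
    "load": (load, []),
    "clean": (clean, ["load"]),
    "features": (features, ["clean"]),
    "summary": (summary, ["clean", "features"]),
}

def run(tasks):
    graph = {name: deps for name, (_, deps) in tasks.items()}
    results = {}
    # static_order() yields each task only after all of its dependencies
    for name in TopologicalSorter(graph).static_order():
        func, deps = tasks[name]
        results[name] = func(*(results[d] for d in deps))
    return results

results = run(TASKS)
print(results["summary"])  # {'n': 4, 'total': 39}
```

Declaring dependencies explicitly also catches mistakes early. A cyclic dependency makes `static_order()` raise `graphlib.CycleError` instead of silently running tasks in the wrong order.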
- Simple Yet Practical Data Cleaning Codes - Feb 26, 2019.
Real world data is messy and needs to be cleaned before it can be used for analysis. Industry experts say the data preprocessing step can easily take 70% to 80% of a data scientist's time on a project.
Data Cleaning, Data Preprocessing, Python
- Don’t do analysis in a vacuum - Feb 22, 2019.
Traditional tools force analysts to play the import-and-export game, so it's difficult to keep data fresh and accessible. Every Mode report or dashboard lives at a unique URL for future sharing, iterating, and building upon. Mode brings your entire team together in one platform.
Analytics, Dashboard, Mode Analytics, Platform, Python, R
- Artificial Neural Network Implementation using NumPy and Image Classification - Feb 21, 2019.
This tutorial builds an artificial neural network from scratch in Python using NumPy to classify images from the Fruits360 dataset.
Deep Learning, Machine Learning, Neural Networks, numpy, Python
- Python Data Science for Beginners - Feb 20, 2019.
Python’s syntax is clean and concise. Python is an open-source, portable language with a large standard library. But why Python for data science? Read on to find out more.
Beginners, Data Science, Matplotlib, numpy, Pandas, Python, scikit-learn, SciPy
- KDnuggets™ News 19:n08, Feb 20: The Gold Standard of Python Machine Learning; The Analytics Engineer – new role in the data team - Feb 20, 2019.
Intro to scikit-learn; how to set up a Python ML environment; why there should be a new role in the Data Science team; how to learn one of the hardest parts of being a Data Scientist; and how explainable is BERT?
BERT, Python, scikit-learn
- Running R and Python in Jupyter - Feb 19, 2019.
The Jupyter Project began in 2014 for interactive and scientific computing. Fast forward five years, and Jupyter is now one of the most widely adopted Data Science IDEs on the market, giving users access to Python and R.
IPython, Jupyter, Python, R
- How to Setup a Python Environment for Machine Learning - Feb 18, 2019.
In this tutorial, you will learn how to set up a stable Python Machine Learning development environment. You’ll be able to get right down into the ML and never have to worry about installing packages ever again.
Machine Learning, Programming, Python
- An Introduction to Scikit Learn: The Gold Standard of Python Machine Learning - Feb 13, 2019.
If you’re going to do Machine Learning in Python, Scikit Learn is the gold standard. Scikit-learn provides a wide selection of supervised and unsupervised learning algorithms. Best of all, it’s by far the easiest and cleanest ML library.
Machine Learning, Python, scikit-learn
- 10 Trending Data Science Topics at ODSC East 2019 - Feb 7, 2019.
ODSC East 2019, Boston, Apr 30 - May 3, will host 300+ of the leading experts in data science and AI. Here are a few standout topics and presentations in this rapidly evolving field. Register for ODSC East at 50% off until Feb 8.
Apache Spark, Boston, Data Science, LSTM, Machine Learning, ODSC, Python
- Top KDnuggets tweets, Jan 30 – Feb 05: state-of-the-art in #AI, #MachineLearning - Feb 6, 2019.
Also: Brilliant tour de force! Reinforcement Learning to solve Rubik's Cube; Dask, Pandas, and GPUs: first steps; Neural network AI is simple. So stop pretending you are a genius.
Dask, GPU, Pandas, Python, Reinforcement Learning, Top tweets
- From Good to Great Data Science, Part 1: Correlations and Confidence - Feb 5, 2019.
With the aid of some hospital data, part one describes how just a little inexperience in statistics could result in two common mistakes.
Correlation, Data Science, Python, Statistics
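As a general illustration of reporting a correlation together with its uncertainty, the standard Fisher z-transformation gives an approximate 95% confidence interval for Pearson's r. This is a sketch on hypothetical toy data. It does not reproduce the article's hospital data or its specific analysis.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for r via Fisher's z-transform (requires n > 3 and |r| < 1)."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Hypothetical toy data, not from the article
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2, 4, 5, 4, 5, 7, 8, 9, 10, 12]
r = pearson_r(xs, ys)
lo, hi = fisher_ci(r, len(xs))
print(f"r = {r:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Because the standard error is 1/sqrt(n - 3), the interval narrows only slowly as the sample grows. With small samples, a correlation can therefore look far more precise than it is.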
- Intuitive Visualization of Outlier Detection Methods - Feb 5, 2019.
Check out this visualization for outlier detection methods, and the Python project from which it comes, a toolkit for easily implementing outlier detection methods on your own.
Cheat Sheet, Outliers, Python
- Exploring Python Basics - Jan 31, 2019.
This free eBook is a great resource for any beginner, providing a good introduction to Python, a look at the basics of learning a programming language, and an exploration of modelling and predictions.
Beginners, Book, Manning, Python
- ELMo: Contextual Language Embedding - Jan 31, 2019.
Learn how to create a semantic search engine using deep contextualised language representations from ELMo, and why context is everything in NLP.
Data Visualization, NLP, Plotly, Python, Word Embeddings
- KDnuggets™ News 19:n05, Jan 30: Your AI skills are worth less than you think; 7 Steps to Mastering Basic Machine Learning - Jan 30, 2019.
Also: Logistic Regression: A Concise Technical Overview; AI is a Big Fat Lie; How To Fine Tune Your Machine Learning Models To Improve Forecasting Accuracy; Airbnb Rental Listings Dataset Mining; Data Science Project Flow for Startups
AI, Hype, Logistic Regression, Machine Learning, Modeling, Python, Skills, Workflow

- 7 Steps to Mastering Basic Machine Learning with Python — 2019 Edition - Jan 29, 2019.
With a new year upon us, I thought it would be a good time to revisit the concept and put together a new learning path for mastering machine learning with Python. With these 7 steps you can master basic machine learning with Python!
7 Steps, Classification, Clustering, Jupyter, Machine Learning, Python, Regression
- KDnuggets™ News 19:n04, Jan 23: Top 7 Python Libraries for Data Science and AI; Ontology and Data Science - Jan 23, 2019.
Also Cartoon: Is this how you do the blockchain thing?; Data Scientist's Dilemma: The Cold Start Problem; Why Ice Cream Is Linked to Shark Attacks - Correlation/Causation Smackdown.
Data Science, Flask, Ontology, Python
- 2018’s Top 7 Python Libraries for Data Science and AI - Jan 21, 2019.
This is a list of the best libraries that changed our lives this year, compiled from my weekly digests.
AI, AutoML, Data Science, Python, SHAP, spaCy
- On Points Insights: Senior Python Developer with Big Data skills [Remote, US] - Jan 18, 2019.
Seeking a Senior Python Developer with Big Data skills (work remotely), to interpret internal or external business issues and recommend best practices, solve complex problems, and take a broad perspective to identify innovative solutions.
Big Data, Developer, On Points Insights, Python, Telecommute
- Automated Machine Learning in Python - Jan 18, 2019.
An organization can also reduce the cost of hiring many experts by applying AutoML in their data pipeline. AutoML also reduces the amount of time it would take to develop and test a machine learning model.
Automated Machine Learning, AutoML, H2O, Keras, Machine Learning, Python, scikit-learn
- How to build an API for a machine learning model in 5 minutes using Flask - Jan 17, 2019.
Flask is a micro web framework written in Python. It can create a REST API that allows you to send data, and receive a prediction as a response.
API, Flask, Machine Learning, Python
- The 6 Most Useful Machine Learning Projects of 2018 - Jan 15, 2019.
Let’s take a look at the top 6 most practically useful ML projects over the past year. These projects have published code and datasets that allow individual developers and smaller teams to learn and immediately create value.
Automated Machine Learning, Facebook, fast.ai, Google, Keras, Machine Learning, Object Detection, Python, Reinforcement Learning, Word Embeddings
- Python Patterns: max Instead of if - Jan 10, 2019.
I often have to loop over a set of objects to find the one with the greatest score. You can use an if statement and a placeholder, but there are more elegant ways!
Programming, Python
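The pattern in the post title can be sketched as follows. The `Candidate` class and its `score` field are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    score: float

candidates = [Candidate("a", 0.7), Candidate("b", 0.9), Candidate("c", 0.4)]

# Loop-and-placeholder version
best = None
for c in candidates:
    if best is None or c.score > best.score:
        best = c

# Same result with the built-in max and a key function
best_via_max = max(candidates, key=lambda c: c.score)

print(best.name, best_via_max.name)  # b b
```

Like the loop with a strict `>`, `max` returns the first of several tied items. On an empty iterable, `max` raises `ValueError` unless you pass `default=`, for example `max(candidates, key=lambda c: c.score, default=None)`.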
- 3 More Google Colab Environment Management Tips - Jan 2, 2019.
This is a short collection of lessons learned using Colab as my main coding and learning environment for the past few months. Some tricks are Colab-specific, others are general Jupyter tips, and still others are filesystem-related, but all have proven useful for me.
Google, Google Colab, Jupyter, Machine Learning, Python
- Manning Countdown to 2019 – Big Deals on AI, Data Science, Machine Learning books and videos - Dec 28, 2018.
Introducing the Manning countdown to 2019, where each day you’ll be able to get a different one-day deal on some of their biggest books and video courses.
Book, Deep Learning, ebook, Manning, Python
- Synthetic Data Generation: A must-have skill for new data scientists - Dec 27, 2018.
A brief rundown of methods/packages/ideas to generate synthetic data for self-driven data science projects and deep diving into machine learning methods.
Classification, Clustering, Datasets, Machine Learning, Python, Synthetic Data
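One of the simplest synthetic-data recipes for classification and clustering practice is to sample Gaussian "blobs" around chosen centers. This is a minimal NumPy sketch. The centers, spread, and sample counts are arbitrary illustrative choices, not taken from the article. scikit-learn's `sklearn.datasets.make_blobs` offers a ready-made version of the same idea.

```python
import numpy as np

def make_gaussian_blobs(centers, n_per_class=100, std=0.5, seed=0):
    """Sample labelled points from an isotropic Gaussian around each center."""
    rng = np.random.default_rng(seed)
    centers = np.asarray(centers, dtype=float)
    n_dims = centers.shape[1]
    X = np.vstack([rng.normal(loc=c, scale=std, size=(n_per_class, n_dims))
                   for c in centers])
    # Label i for every point drawn around centers[i]
    y = np.repeat(np.arange(len(centers)), n_per_class)
    return X, y

X, y = make_gaussian_blobs([[0, 0], [3, 3], [-3, 3]])
print(X.shape, y.shape)  # (300, 2) (300,)
```

Because the generating process is known, you can vary `std` to make the classes more or less separable and check whether a model's behavior matches what you would expect.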
- A Guide to Decision Trees for Machine Learning and Data Science - Dec 24, 2018.
What makes decision trees special in the realm of ML models is really their clarity of information representation. The “knowledge” learned by a decision tree through training is directly formulated into a hierarchical structure.
Algorithms, Data Science, Decision Trees, Machine Learning, Python, scikit-learn
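That hierarchical structure is built by repeatedly choosing the split that most reduces impurity. Here is a minimal pure-Python sketch of Gini impurity and an exhaustive threshold search on one feature. Gini is one common criterion and is the default in scikit-learn's `DecisionTreeClassifier`. This is an illustration on hypothetical toy data, not the article's code.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_impurity) of the best 'value <= threshold' split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # no split at all
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated by a threshold
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, weighted)
    return best

# Hypothetical toy data
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["no", "no", "no", "yes", "yes", "yes"]
print(best_split(values, labels))  # (6.5, 0.0)
```

A full tree applies this search across all features, then recurses on each side of the chosen split until the nodes are pure or a stopping rule kicks in.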
- Top Python Libraries in 2018 in Data Science, Deep Learning, Machine Learning - Dec 19, 2018.
Here are the top 15 Python libraries across Data Science, Data Visualization, Deep Learning, and Machine Learning.
Data Science, Deep Learning, Machine Learning, Pandas, Python, PyTorch, TensorFlow
- Exploring the Data Jungle Free eBook - Dec 18, 2018.
This free eBook by Brian Godsey will provide you with real-world examples in Python, R, and other languages suitable for data science.
Data Preparation, Data Science, Data Visualization, Free ebook, Manning, Python, R
- Solve any Image Classification Problem Quickly and Easily - Dec 13, 2018.
This article teaches you how to use transfer learning to solve image classification problems. A practical example using Keras and its pre-trained models is given for demonstration purposes.
Classification, Computer Vision, Image Recognition, Keras, Python