2019 Nov

All (61) | News (3) | Opinions (15) | Tutorials, Overviews (43)

Markov Chains: How to Train Text Generation to Write Like George R. R. Martin

Read this article on training Markov chains to generate George R. R. Martin-style text.

on Nov 29, 2019 in Generative Models, Markov Chains, NLP, Text Analytics
Lit BERT: NLP Transfer Learning In 3 Steps

PyTorch Lightning is a lightweight framework which allows anyone using PyTorch to scale deep learning code easily while making it reproducible. In this tutorial we’ll use Huggingface's implementation of BERT to do a finetuning task in Lightning.

on Nov 29, 2019 in BERT, NLP, Python, PyTorch Lightning, Transfer Learning
Open Source Projects by Google, Uber and Facebook for Data Science and AI

Open source is becoming the standard for sharing and improving technology. Some of the largest organizations in the world, namely Google, Facebook, and Uber, are open sourcing the technologies they use in their own workflows.

on Nov 28, 2019 in Advice, AI, Data Science, Data Scientist, Data Visualization, Deep Learning, Facebook, Google, Open Source, Python, Uber
A Doomed Marriage of Machine Learning and Agile

Sebastian Thrun, the founder of Udacity, ruined my machine learning project and wedding.

on Nov 28, 2019 in Agile, Machine Learning, Udacity
The Future of Careers in Data Science & Analysis

As the fields of data science and analysis continue to expand, the next crop of bright minds is always needed. Learn more about the nuances of these jobs and find where you can fit in for a rewarding and interesting career.

on Nov 27, 2019 in Careers, Data Analyst, Data Science, Data Scientist
Spark NLP 101: LightPipeline

A Pipeline is specified as a sequence of stages, and each stage is either a Transformer or an Estimator. These stages are run in order, and the input DataFrame is transformed as it passes through each stage. Now let’s see how this can be done in Spark NLP using Annotators and Transformers.

on Nov 27, 2019 in Apache Spark, NLP, Pipeline, Spark NLP
Task-based effectiveness of basic visualizations

This is a summary of a recent paper on an age-old topic: what visualisation should I use? No prizes for guessing “it depends!” Is this the paper to finally settle the age-old debate surrounding pie charts?

on Nov 27, 2019 in Charts, Data Visualization, Visualization
Machine Learning 101: The What, Why, and How of Weighting

Weighting is a technique for improving models. In this article, learn more about what weighting is, why you should (and shouldn’t) use it, and how to choose optimal weights to minimize business costs.

on Nov 26, 2019 in Accuracy, Balancing Classes, Machine Learning, Model Performance, Sports
Content-based Recommender Using Natural Language Processing (NLP)

A guide to build a content-based movie recommender model based on NLP.

on Nov 26, 2019 in Movies, Netflix, NLP, Python, Recommender Systems
Probability Learning: Naive Bayes

This post describes various simplifications of Bayes' Theorem that make it more practical and applicable to real-world problems; these simplifications are known as Naive Bayes. To make things concrete, it also works through an illustrative example of how Naive Bayes can be applied to classification.

on Nov 26, 2019 in Bayes Theorem, Learning, Naive Bayes, Probability
Top 8 Data Science Use Cases in Marketing

In this article, we highlight some key data science use cases in marketing, concentrating on several that are of particular interest and have proven their effectiveness over time.

on Nov 25, 2019 in Data Science, Marketing, Use Cases
Can Neural Networks Develop Attention? Google Thinks they Can

Google recently published some work about modeling attention mechanisms in deep neural networks.

on Nov 25, 2019 in Attention, Google, Neural Networks
Advice for New and Junior Data Scientists

If you are a new Data Scientist early in your professional journey, and you’re a bit confused and lost, then follow this advice to figure out how to best contribute to your company.

on Nov 22, 2019 in Advice, Beginners, Career, Data Scientist
Automated Machine Learning Project Implementation Complexities

To demonstrate the implementation complexity differences along the AutoML highway, let's have a look at how 3 specific software projects approach the implementation of just such an AutoML "solution," namely Keras Tuner, AutoKeras, and automl-gs.

on Nov 22, 2019 in Automated Machine Learning, Keras, Pipeline, Python
Text Encoding: A Review

We will focus here exactly on that part of the analysis that transforms words into numbers and texts into number vectors: text encoding.

on Nov 22, 2019 in Data Preprocessing, NLP, Representation, Rosaria Silipo, Text Analytics, Word Embeddings
Three Methods of Data Pre-Processing for Text Classification

This blog shows how text data representations can be used to build a classifier to predict a developer’s deep learning framework of choice based on the code that they wrote, via examples of TensorFlow and PyTorch projects.

on Nov 21, 2019 in Data Preparation, IBM, Text Classification
Python, Selenium & Google for Geocoding Automation: Free and Paid

This tutorial will take you through two options for automating the geocoding process using Python, Selenium, and the Google Geocoding API.

on Nov 21, 2019 in Automation, Geocode, Geoscience, Geospatial, Google, Python, Selenium, Web Scraping
Neural Networks 201: All About Autoencoders

Autoencoders can be a very powerful tool for leveraging unlabeled data to solve a variety of problems, such as learning a "feature extractor" that helps build powerful classifiers, finding anomalies, or imputing missing values.

on Nov 21, 2019 in Autoencoder, Machine Learning, Missing Values, Neural Networks
The Notebook Anti-Pattern

This article aims to explain why the drive towards the use of notebooks in production is an anti-pattern, giving some suggestions along the way.

on Nov 21, 2019 in Jupyter, Python
Python Tuples and Tuple Methods

Brush up on your Python basics with this post on creating, using, and manipulating tuples.

on Nov 21, 2019 in Programming, Python
Pro Tips: How to deal with Class Imbalance and Missing Labels

Your spectacularly-performing machine learning model could be subject to the common culprits of class imbalance and missing labels. Learn how to handle these challenges with techniques that remain open areas of new research for addressing real-world machine learning problems.

on Nov 20, 2019 in Balancing Classes, Data Preparation, Missing Values, Tips, Unbalanced
Deep Learning for Image Classification with Less Data

In this blog I will be demonstrating how deep learning can be applied even if we don’t have enough data.

on Nov 20, 2019 in Deep Learning, Image Classification, Neural Networks, Small Data
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead

The two main takeaways from this paper: firstly, a sharpening of my understanding of the difference between explainability and interpretability, and why the former may be problematic; and secondly some great pointers to techniques for creating truly interpretable models.

on Nov 20, 2019 in Interpretability, Machine Learning, Modeling
Why write for KDnuggets? Calling for original blogs and new authors

KDnuggets is calling for original blogs and contributions from new authors on AI, Data Science, Machine Learning, and related topics. The authors of the most popular of these blogs in December will be profiled in KDnuggets.

on Nov 19, 2019 in About KDnuggets, Blogs, Top bloggers
Reproducibility, Replicability, and Data Science

As cornerstones of scientific processes, reproducibility and replicability ensure results can be verified and trusted. These two concepts are also crucial in data science, and as a data scientist, you must follow the same rigor and standards in your projects.

on Nov 19, 2019 in Best Practices, Data Science, Overfitting, Reproducibility, Trust, Validation
The Math Behind Bayes

This post will be dedicated to explaining the maths behind Bayes Theorem, when its application makes sense, and its differences with Maximum Likelihood.

on Nov 19, 2019 in Bayes Theorem, Mathematics, Probability
Generalization in Neural Networks

When training a neural network in deep learning, its performance on processing new data is key. Improving the model's ability to generalize relies on preventing overfitting using these important methods.

on Nov 18, 2019 in Complexity, Deep Learning, Dropout, Neural Networks, Overfitting, Regularization, Training Data
The Reinforcement-Learning Methods that Allow AlphaStar to Outcompete Almost All Human Players at StarCraft II

The new AlphaStar achieved Grandmaster level at StarCraft II, overcoming some of the limitations of the previous version. How did it do it?

on Nov 18, 2019 in DeepMind, Reinforcement Learning
GitHub Repo Raider and the Automation of Machine Learning

Since X never, ever marks the spot, this article raids the GitHub repos in search of quality automated machine learning resources. Read on for projects and papers to help understand and implement AutoML.

on Nov 18, 2019 in Automated Machine Learning, GitHub, Machine Learning, Movies, Python
On the sensationalism of artificial intelligence news

With artificial intelligence and machine learning now a mainstay of our daily awareness, news organizations have been seen to overstate the reality behind progress in the field. Learn more about recent examples of media hyperbole and explore why this may be happening.

on Nov 15, 2019 in AI, Hype, Media, Misconceptions
Python Lists and List Manipulation

In Python, lists store an ordered collection of items which can be of different types. This post is an overview of lists and their manipulation.

on Nov 15, 2019 in Programming, Python
How to Visualize Data in Python (and R)

Producing accessible data visualizations is a key data science skill. The following guidelines will help you create the best representations of your data using R and Python's Pandas library.

on Nov 14, 2019 in Data Visualization, Matplotlib, Python, R, SuperDataScience
Topics Extraction and Classification of Online Chats

This article covers how to automatically identify the topics within a corpus of textual data using unsupervised topic modelling, and then how to apply a supervised classification algorithm that assigns a topic label to each document, using the output of the first step as target labels.

on Nov 14, 2019 in Chat, NLP, Topic Modeling
Testing Your Machine Learning Pipelines

Let’s take a look at traditional testing methodologies and how we can apply these to our data/ML pipelines.

on Nov 14, 2019 in Machine Learning, Pipeline, Python
Python Workout / Practices of a Python Pro / Classic Computer Science Problems in Python

Whether you’re a beginner or an expert, there are always new ways you can improve your Python coding. Save 40% on this trio of Manning Python books today! Just enter the code nlpropython40 at checkout when you buy from manning.com.

on Nov 13, 2019 in Book, Manning, Python
Transfer Learning Made Easy: Coding a Powerful Technique

While the revolution of deep learning now impacts our daily lives, these networks are expensive. Approaches in transfer learning promise to ease this burden by enabling the re-use of trained models -- and this hands-on tutorial will walk you through a transfer learning technique you can run on your laptop.

on Nov 13, 2019 in Accuracy, Deep Learning, Image Classification, Keras, Machine Learning, TensorFlow, Transfer Learning
Beginners Guide to the Three Types of Machine Learning

The following article is an introduction to classification and regression (together known as supervised learning) and to unsupervised learning (which in the context of machine learning applications often refers to clustering), and includes a walkthrough in the popular Python library scikit-learn.

on Nov 13, 2019 in Beginners, Classification, Machine Learning, Python, Regression, scikit-learn, Supervised Learning, Unsupervised Learning
How to Speed up Pandas by 4x with one line of code

While Pandas is the library for data processing in Python, it isn't really built for speed. Learn more about the new library, Modin, developed to distribute Pandas' computation to speed up your data prep.

By George Seif on Nov 12, 2019 in Data Preparation, Data Preprocessing, Modin, Pandas, Python
Understanding NLP and Topic Modeling Part 1

In this post, we seek to understand why topic modeling is important and how it helps us as data scientists.

on Nov 12, 2019 in NLP, Topic Modeling
Research Guide for Depth Estimation with Deep Learning

In this guide, we’ll look at papers aimed at solving the problems of depth estimation using deep learning.

on Nov 12, 2019 in Deep Learning, Neural Networks, Research
How to Extract Google Maps Coordinates

In this article, I will show you how to quickly extract Google Maps coordinates with a simple and easy method.

on Nov 11, 2019 in Google, Maps, Octoparse, Web Scraping
The Complete Data Science LinkedIn Profile Guide

With so many Data Scientists showing up on LinkedIn, it's time to make sure your profile is top-notch because your talent is still highly sought after. Recruitment specialists want to find you fast, and this guide will help you create the best profile to feature your expertise.

on Nov 11, 2019 in Career, Career Advice, Data Science Skills, Data Scientist, LinkedIn, Recruitment
How Data Analytics Can Assist in Fraud Detection

A primary advantage of data analytics tools is that they can handle massive quantities of information at once. These solutions typically learn what's normal within a collection of information and how to spot anomalies.

on Nov 11, 2019 in Analytics, Fraud, Fraud Detection
Facebook Adds This New Framework to Its Reinforcement Learning Arsenal

ReAgent is a new framework that streamlines the implementation of reasoning systems.

on Nov 11, 2019 in Facebook, Reinforcement Learning
What is Data Science?

Data Science is pitched as a modern and exciting job offering high satisfaction. Does its reality really live up to the hype? Here, we show what it's really like to work as a Data Scientist.

on Nov 8, 2019 in Career, Data Science, Data Science Skills, Explained
Understanding Boxplots

A boxplot can tell you about your outliers and what their values are. It can also tell you if your data is symmetrical, how tightly your data is grouped, and if and how your data is skewed.

on Nov 8, 2019 in Data Visualization, Matplotlib, Pandas, Python, Seaborn
Orchestrating Dynamic Reports in Python and R with Rmd Files

Do you want to extract CSV files with Python and visualize them in R? How does preparing everything in R and drawing conclusions with Python sound? Both are possible if you know the right libraries and techniques. Here, we’ll walk through a use case using both languages in one analysis.

on Nov 8, 2019 in Python, R, Report
Data Cleaning and Preprocessing for Beginners

Careful preprocessing of data for your machine learning project is crucial. This overview describes the process of data cleaning and dealing with noise and missing data.

on Nov 7, 2019 in Beginners, Data Cleaning, Data Preprocessing, Pandas, Python, Sciforce
Set Operations Applied to Pandas DataFrames

In this tutorial, we show how to apply mathematical set operations (union, intersection, and difference) to Pandas DataFrames with the goal of easing the task of comparing the rows of two datasets.

on Nov 7, 2019 in Data Preparation, Data Science, Pandas, Python
How to Create a Vocabulary for NLP Tasks in Python

This post will walk through a Python implementation of a vocabulary class for storing processed text data and related metadata in a manner useful for subsequently performing NLP tasks.

on Nov 7, 2019 in Data Preparation, Data Preprocessing, NLP, Python
An Eight-Step Checklist for An Analytics Project

Follow these eight headings of an audit sheet that business analysts should address before submitting the results of their analytics project. One recommended approach is to rewrite each step as a question, answer it, and then attach it to your project.

on Nov 6, 2019 in Analytics, Checklist, Deployment, Feature Selection, Statistics
Research Guide: Advanced Loss Functions for Machine Learning Models

This guide explores research centered on a variety of advanced loss functions for machine learning models.

on Nov 6, 2019 in Machine Learning, Research
10 Free Must-read Books on AI

Artificial Intelligence continues to fill the media headlines while scientists and engineers rapidly expand its capabilities and applications. With such explosive growth in the field, there is a great deal to learn. Dive into these 10 free books that are must-reads to support your AI study and work.

on Nov 5, 2019 in AI, Books, ebook, Free ebook
Probability Learning: Maximum Likelihood

The maths behind Bayes will be better understood if we first cover the theory and maths underlying another fundamental method of probabilistic machine learning: Maximum Likelihood. This post will be dedicated to explaining it.

on Nov 5, 2019 in Learning, Probability, Statistics
How to Become a Successful Healthcare Data Analyst

Are you interested in starting your career in the data analysis domain? Read this informative blog on how to get your career off the ground.

on Nov 5, 2019 in Career Advice, Data Analyst, Healthcare
Designing Your Neural Networks

Check out this step-by-step walkthrough of some of the more confusing aspects of neural nets to guide you in making smart decisions about your neural network architecture.

on Nov 4, 2019 in Beginners, Classification, Dropout, Gradient Descent, Neural Networks, Regression
Customer Segmentation Using K Means Clustering

Customer Segmentation can be a powerful means to identify unsatisfied customer needs. This technique can be used by companies to outperform the competition by developing uniquely appealing products and services.

on Nov 4, 2019 in Clustering, Customer Analytics, K-means, Python, Segmentation
Facebook Has Been Quietly Open Sourcing Some Amazing Deep Learning Capabilities for PyTorch

The new release of PyTorch includes some impressive open source projects for deep learning researchers and developers.

on Nov 4, 2019 in Deep Learning, Facebook, PyTorch
Top Machine Learning Software Tools for Developers

As a developer who is excited about leveraging machine learning for faster and more effective development, these software tools are worth trying out.

on Nov 1, 2019 in Developers, Machine Learning
Build an Artificial Neural Network From Scratch: Part 1

This article focuses on building an Artificial Neural Network using the NumPy Python library.

on Nov 1, 2019 in Neural Networks, numpy, Python
What is Machine Learning on Code?

Not only can MLonCode help companies streamline their codebase and software delivery processes, but it also helps organizations better understand and manage their engineering talent.

on Nov 1, 2019 in Machine Learning, Programming, Software