If you want to solve real-world problems and design a cool product or algorithm, then having machine learning skills is not enough; you also need a good working knowledge of data structures.
This post will take a different approach to constructing pipelines. As the title gives away, instead of hand-crafting pipelines, optimizing hyperparameters, and performing model selection ourselves, we will automate these processes.
As a data scientist — or someone interested in the field — you know the industry is constantly evolving. If you want to remain competitive, you need to keep up with popular trends.
In order for a data scientist to grow, they need to be challenged beyond the technical aspects of their jobs. They need to question their data sources, be concise in their insights, know their business and help guide their leaders.
In this post, we will use grid search to optimize models built from a number of different types of estimators; we will then compare the models and properly evaluate the best hyperparameters each has to offer.
In this tutorial, we are going to show you how to work with Excel files in pandas, covering computer setup, reading data from Excel files into pandas, exploring that data, and more.
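A minimal sketch of the reading and exploration steps (the workbook and sheet names below are hypothetical, and `read_excel` assumes an Excel engine such as openpyxl is installed):

```python
import pandas as pd

# Hypothetical workbook and sheet name, purely for illustration.
df = pd.read_excel("sales_data.xlsx", sheet_name="2023")

# Quick first look at what came in.
print(df.head())       # first few rows
print(df.dtypes)       # inferred column types
print(df.describe())   # summary statistics for numeric columns
```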
In this tutorial, I want to show how you can implement a skip-gram model in TensorFlow to generate word vectors for any text you are working with, and then use TensorBoard to visualize them.
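Not the tutorial's exact code, but a minimal tf.keras sketch of the idea: generate (target, context) skip-gram pairs with sampled negatives and fit a small embedding model on them (the corpus ids and sizes are toy placeholders):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import skipgrams

# Toy corpus already mapped to integer ids (placeholder values).
corpus_ids = [1, 2, 3, 4, 2, 5, 1, 3, 4, 5]
vocab_size, embed_dim = 10, 8

# Positive (target, context) pairs plus sampled negatives, labeled 1/0.
pairs, labels = skipgrams(corpus_ids, vocabulary_size=vocab_size, window_size=2)
targets, contexts = (np.array(side)[:, None] for side in zip(*pairs))

target_in = tf.keras.Input(shape=(1,))
context_in = tf.keras.Input(shape=(1,))
embedding = tf.keras.layers.Embedding(vocab_size, embed_dim)

# Sigmoid of the dot product: the skip-gram negative-sampling objective.
dot = tf.keras.layers.Dot(axes=-1)([embedding(target_in), embedding(context_in)])
out = tf.keras.layers.Activation("sigmoid")(tf.keras.layers.Flatten()(dot))

model = tf.keras.Model([target_in, context_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit([targets, contexts], np.array(labels), epochs=5, verbose=0)

word_vectors = embedding.get_weights()[0]  # one learned vector per vocabulary id
```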
This article is about implementing Deep Learning (DL) using the H2O package in R. We start with a background on DL, then cover some features of H2O's DL framework, and finish with an implementation in R.
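The article itself works in R; the same estimator is also exposed through H2O's Python API, sketched below (the file name and the `label` target column are assumptions for illustration):

```python
import h2o
from h2o.estimators import H2ODeepLearningEstimator

h2o.init()

# Hypothetical CSV with a binary "label" column as the target.
frame = h2o.import_file("train.csv")
frame["label"] = frame["label"].asfactor()  # treat the target as categorical
features = [c for c in frame.columns if c != "label"]

# A small feed-forward network; hidden layer sizes are illustrative.
model = H2ODeepLearningEstimator(hidden=[64, 64], epochs=10)
model.train(x=features, y="label", training_frame=frame)
print(model.auc())
```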
In this tutorial, we will see how to apply a Genetic Algorithm (GA) to find the optimal window size and number of units for a Long Short-Term Memory (LSTM) based Recurrent Neural Network (RNN).
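The GA itself can be sketched without any deep learning code. Below, each genome is a (window_size, n_units) pair, and the fitness function is a placeholder standing in for training an LSTM and returning its validation error:

```python
import random

def fitness(window_size, n_units):
    # Placeholder score; in the tutorial's setting this would train an LSTM
    # on windows of the given size and return validation error (lower = better).
    return abs(window_size - 12) + abs(n_units - 64) / 10

def mutate(genome):
    w, u = genome
    return (max(1, w + random.randint(-2, 2)), max(1, u + random.randint(-8, 8)))

def crossover(a, b):
    return (a[0], b[1])  # take window size from one parent, units from the other

population = [(random.randint(1, 30), random.randint(8, 128)) for _ in range(10)]
for generation in range(20):
    population.sort(key=lambda g: fitness(*g))
    parents = population[:4]  # keep the fittest genomes
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(len(population) - len(parents))]
    population = parents + children

best = min(population, key=lambda g: fitness(*g))
print("best (window_size, n_units):", best)
```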
Another simple yet powerful technique we can pair with pipelines to improve performance is grid search, which exhaustively evaluates combinations of model hyperparameters to find the best-performing configuration.
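A minimal scikit-learn sketch of pairing the two (the dataset and parameter grid are chosen purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])

# Each "<step name>__<param>" value below is tried in every combination,
# with the whole pipeline re-fit and cross-validated each time.
grid = GridSearchCV(pipe, param_grid={"clf__C": [0.1, 1, 10]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```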
Propensity scores are an alternative method to estimate the effect of receiving treatment when random assignment of treatments to subjects is not feasible.
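One common recipe, sketched here on synthetic data rather than the article's: fit a logistic regression for the probability of treatment given covariates, then use inverse-probability weighting to estimate the average treatment effect (an illustration, not the post's exact method):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic setup: X covariates, t treatment indicator, y outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
t = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)  # confounded assignment
y = 2 * t + X[:, 0] + rng.normal(size=1000)            # true effect is 2

# Step 1: propensity score = P(treated | covariates).
ps = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]

# Step 2: inverse-probability-weighted estimate of the treatment effect.
ate = np.mean(t * y / ps) - np.mean((1 - t) * y / (1 - ps))
print("estimated ATE:", ate)
```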
For many Kaggle-style data mining problems, XGBoost has been the go-to solution since its release in 2016. It's probably as close to an out-of-the-box machine learning algorithm as you can get today.
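A sketch of that out-of-the-box quality: xgboost's scikit-learn wrapper with all-default hyperparameters on a bundled dataset (illustrative, not from the post):

```python
from xgboost import XGBClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Defaults alone are often a strong baseline.
model = XGBClassifier()
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```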
But how do we diagnose bias and variance in the first place? And what actions should we take once we've detected something? In this post, we'll learn how to answer both these questions using learning curves.
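A minimal sketch with scikit-learn's learning_curve (the model and dataset are illustrative); the comments note how the two curves are read:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = load_digits(return_X_y=True)
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=2000), X, y, cv=5,
    train_sizes=np.linspace(0.1, 1.0, 5))

# High train score with a much lower validation score suggests variance
# (overfitting); both scores low and converged suggests bias (underfitting).
for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={n:4d}  train={tr:.3f}  val={va:.3f}")
```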
Governance roles for data science and analytics teams are becoming more common... One of the key functions of this role is to perform analysis and validation of data sets in order to build confidence in the underlying data.
This is the narrative of a typical AI Sunday, where I decided to look at building a sequence-to-sequence (seq2seq) model-based chatbot using some already available sample code and data from the Cornell movie database.
In this article, we’ll explain GANs by applying them to the task of generating images. GANs are one of the few successful techniques in unsupervised machine learning, and they are quickly revolutionizing our ability to perform generative tasks.
If you are a data scientist who wants to capture data from such web pages, you wouldn’t want to be the one opening all of them manually and scraping them one by one. To remove this barrier, there are packages available in R that let data scientists access such data programmatically.
Democratization is defined as the action/development of making something accessible to everyone, to the “common masses.” AI | ML | DL technology stacks are complicated systems to tune and maintain, expertise is limited, and a single small change to the stack can lead to failure.
I wrote this quick primer so you don’t have to parse all the information out there and instead can learn the things you need to know to quickly get started.
More generally, in evaluating any data mining algorithm, if our test set is a subset of our training data, the results will be optimistic, and often overly optimistic. So that doesn’t seem like a great idea.
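A quick sketch of the sounder alternative: evaluate on a held-out split, or cross-validate, instead of scoring on the training data (the model and dataset are chosen only for illustration):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# Hold out data the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)
clf = DecisionTreeClassifier().fit(X_train, y_train)
print("train accuracy:", clf.score(X_train, y_train))  # optimistic
print("test accuracy:", clf.score(X_test, y_test))     # honest estimate

# Cross-validation gives a more stable estimate from the same data.
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=5)
print("5-fold CV accuracy:", scores.mean())
```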
Quantum Machine Learning (Quantum ML) is the interdisciplinary area combining Quantum Physics and Machine Learning (ML). It is a symbiotic association: leveraging the power of quantum computing to produce quantum versions of ML algorithms, and applying classical ML algorithms to analyze quantum systems. Read this article for an introduction to Quantum ML.
Interactive visualization of large datasets on the web has traditionally been impractical. Apache Arrow provides a new way to exchange and visualize data at unprecedented speed and scale.
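A small pyarrow sketch, building a columnar table in memory and round-tripping it through Arrow's Feather on-disk format (the file name is hypothetical):

```python
import pyarrow as pa
import pyarrow.feather as feather

# Build a columnar table in memory.
table = pa.table({"x": [1, 2, 3], "y": ["a", "b", "c"]})

# Feather files store data in Arrow's columnar layout, making
# reads fast and memory-mapping friendly.
feather.write_feather(table, "data.feather")
print(feather.read_table("data.feather"))
```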
This article gives a flavor of the potential being realized at the intersection of AI and Blockchain, and discusses standard definitions, challenges, and benefits of this alliance, as well as some interesting players in this space.
There are several tools to help you grasp the foundational principles and more. The list below gives you an idea of what’s available and how much it costs.
Coming from a statistics background, I used to care very little about how to install software, and would occasionally spend a few days trying to resolve system configuration issues. Enter the godsend, Docker almighty.