If you want to solve real-world problems and design a cool product or algorithm, then having machine learning skills is not enough; you also need a good working knowledge of data structures.
This post will take a different approach to constructing pipelines. As the title gives away, instead of hand-crafting pipelines, optimizing hyperparameters, and performing model selection ourselves, we will automate these processes.
As a data scientist — or someone interested in the field — you know the industry is constantly evolving. If you want to remain competitive, you need to keep up with popular trends.
In order for a data scientist to grow, they need to be challenged beyond the technical aspects of their jobs. They need to question their data sources, be concise in their insights, know their business and help guide their leaders.
In this post, we will use grid search to optimize models built from a number of different types of estimators; we will then compare the models and properly evaluate the best hyperparameters each has to offer.
In this tutorial, we are going to show you how to work with Excel files in pandas, covering computer setup, reading data from Excel files into pandas, exploring that data, and more.
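A minimal sketch of the reading and exploration steps (the workbook and sheet names below are hypothetical, and `read_excel` assumes an Excel engine such as openpyxl is installed):

```python
import pandas as pd

# Hypothetical workbook and sheet name, purely for illustration.
df = pd.read_excel("sales_data.xlsx", sheet_name="2023")

# Quick first look at what came in.
print(df.head())       # first few rows
print(df.dtypes)       # inferred column types
print(df.describe())   # summary statistics for numeric columns
```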
In this tutorial, I want to show how you can implement a skip-gram model in TensorFlow to generate word vectors for any text you are working with, and then use TensorBoard to visualize them.
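Not the tutorial's exact code, but a minimal tf.keras sketch of the idea: generate (target, context) skip-gram pairs with sampled negatives and fit a small embedding model on them (the corpus ids and sizes are toy placeholders):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import skipgrams

# Toy corpus already mapped to integer ids (placeholder values).
corpus_ids = [1, 2, 3, 4, 2, 5, 1, 3, 4, 5]
vocab_size, embed_dim = 10, 8

# Positive (target, context) pairs plus sampled negatives, labeled 1/0.
pairs, labels = skipgrams(corpus_ids, vocabulary_size=vocab_size, window_size=2)
targets, contexts = (np.array(side)[:, None] for side in zip(*pairs))

target_in = tf.keras.Input(shape=(1,))
context_in = tf.keras.Input(shape=(1,))
embedding = tf.keras.layers.Embedding(vocab_size, embed_dim)

# Sigmoid of the dot product: the skip-gram negative-sampling objective.
dot = tf.keras.layers.Dot(axes=-1)([embedding(target_in), embedding(context_in)])
out = tf.keras.layers.Activation("sigmoid")(tf.keras.layers.Flatten()(dot))

model = tf.keras.Model([target_in, context_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit([targets, contexts], np.array(labels), epochs=5, verbose=0)

word_vectors = embedding.get_weights()[0]  # one learned vector per vocabulary id
```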
This article is about implementing Deep Learning (DL) using the H2O package in R. We start with a background on DL, then cover some features of H2O's DL framework, and finish with an implementation in R.
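The article itself works in R; the same estimator is also exposed through H2O's Python API, sketched below (the file name and the `label` target column are assumptions for illustration):

```python
import h2o
from h2o.estimators import H2ODeepLearningEstimator

h2o.init()

# Hypothetical CSV with a binary "label" column as the target.
frame = h2o.import_file("train.csv")
frame["label"] = frame["label"].asfactor()  # treat the target as categorical
features = [c for c in frame.columns if c != "label"]

# A small feed-forward network; hidden layer sizes are illustrative.
model = H2ODeepLearningEstimator(hidden=[64, 64], epochs=10)
model.train(x=features, y="label", training_frame=frame)
print(model.auc())
```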
In this tutorial, we will see how to apply a Genetic Algorithm (GA) to find the optimal window size and number of units for a Long Short-Term Memory (LSTM) based Recurrent Neural Network (RNN).
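The GA itself can be sketched without any deep learning code. Below, each genome is a (window_size, n_units) pair, and the fitness function is a placeholder standing in for training an LSTM and returning its validation error:

```python
import random

def fitness(window_size, n_units):
    # Placeholder score; in the tutorial's setting this would train an LSTM
    # on windows of the given size and return validation error (lower = better).
    return abs(window_size - 12) + abs(n_units - 64) / 10

def mutate(genome):
    w, u = genome
    return (max(1, w + random.randint(-2, 2)), max(1, u + random.randint(-8, 8)))

def crossover(a, b):
    return (a[0], b[1])  # take window size from one parent, units from the other

population = [(random.randint(1, 30), random.randint(8, 128)) for _ in range(10)]
for generation in range(20):
    population.sort(key=lambda g: fitness(*g))
    parents = population[:4]  # keep the fittest genomes
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(len(population) - len(parents))]
    population = parents + children

best = min(population, key=lambda g: fitness(*g))
print("best (window_size, n_units):", best)
```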
Another simple yet powerful technique we can pair with pipelines to improve performance is grid search, which exhaustively evaluates combinations of model hyperparameters to find the best-performing configuration.
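A minimal scikit-learn sketch of pairing the two (the dataset and parameter grid are chosen purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])

# Each "<step name>__<param>" value below is tried in every combination,
# with the whole pipeline re-fit and cross-validated each time.
grid = GridSearchCV(pipe, param_grid={"clf__C": [0.1, 1, 10]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```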
Propensity scores are an alternative method to estimate the effect of receiving treatment when random assignment of treatments to subjects is not feasible.
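One common recipe, sketched here on synthetic data rather than the article's: fit a logistic regression for the probability of treatment given covariates, then use inverse-probability weighting to estimate the average treatment effect (an illustration, not the post's exact method):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic setup: X covariates, t treatment indicator, y outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
t = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)  # confounded assignment
y = 2 * t + X[:, 0] + rng.normal(size=1000)            # true effect is 2

# Step 1: propensity score = P(treated | covariates).
ps = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]

# Step 2: inverse-probability-weighted estimate of the treatment effect.
ate = np.mean(t * y / ps) - np.mean((1 - t) * y / (1 - ps))
print("estimated ATE:", ate)
```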
For many Kaggle-style data mining problems, XGBoost has been the go-to solution since its release in 2016. It's probably as close to an out-of-the-box machine learning algorithm as you can get today.
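A sketch of that out-of-the-box quality: xgboost's scikit-learn wrapper with all-default hyperparameters on a bundled dataset (illustrative, not from the post):

```python
from xgboost import XGBClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Defaults alone are often a strong baseline.
model = XGBClassifier()
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```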
But how do we diagnose bias and variance in the first place? And what actions should we take once we've detected something? In this post, we'll learn how to answer both these questions using learning curves.
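A minimal sketch with scikit-learn's learning_curve (the model and dataset are illustrative); the comments note how the two curves are read:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = load_digits(return_X_y=True)
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=2000), X, y, cv=5,
    train_sizes=np.linspace(0.1, 1.0, 5))

# High train score with a much lower validation score suggests variance
# (overfitting); both scores low and converged suggests bias (underfitting).
for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={n:4d}  train={tr:.3f}  val={va:.3f}")
```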
Governance roles for data science and analytics teams are becoming more common... One of the key functions of this role is to perform analysis and validation of data sets in order to build confidence in the underlying data.
This is the narrative of a typical AI Sunday, where I decided to look at building a sequence-to-sequence (seq2seq) model-based chatbot using some already available sample code and data from the Cornell movie database.
In this article, we’ll explain GANs by applying them to the task of generating images. GANs are one of the few successful techniques in unsupervised machine learning, and they are quickly revolutionizing our ability to perform generative tasks.
If you are a data scientist who wants to capture data from such web pages, you wouldn’t want to be the one opening all of them manually and scraping them one by one. To remove this barrier, there are packages available in R that let data scientists access such data programmatically.
Democratization is defined as the action/development of making something accessible to everyone, to the “common masses.” AI | ML | DL technology stacks are complicated systems to tune and maintain, expertise is limited, and a single small change to the stack can lead to failure.
I wrote this quick primer so you don’t have to parse all the information out there and instead can learn the things you need to know to quickly get started.
More generally, in evaluating any data mining algorithm, if our test set is a subset of our training data, the results will be optimistic, and often overly optimistic. So that doesn’t seem like a great idea.
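A quick sketch of the sounder alternative: evaluate on a held-out split, or cross-validate, instead of scoring on the training data (the model and dataset are chosen only for illustration):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# Hold out data the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)
clf = DecisionTreeClassifier().fit(X_train, y_train)
print("train accuracy:", clf.score(X_train, y_train))  # optimistic
print("test accuracy:", clf.score(X_test, y_test))     # honest estimate

# Cross-validation gives a more stable estimate from the same data.
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=5)
print("5-fold CV accuracy:", scores.mean())
```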
Quantum Machine Learning (Quantum ML) is the interdisciplinary area combining Quantum Physics and Machine Learning (ML). It is a symbiotic association: leveraging the power of quantum computing to produce quantum versions of ML algorithms, and applying classical ML algorithms to analyze quantum systems. Read this article for an introduction to Quantum ML.
Interactive visualization of large datasets on the web has traditionally been impractical. Apache Arrow provides a new way to exchange and visualize data at unprecedented speed and scale.
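A small pyarrow sketch, building a columnar table in memory and round-tripping it through Arrow's Feather on-disk format (the file name is hypothetical):

```python
import pyarrow as pa
import pyarrow.feather as feather

# Build a columnar table in memory.
table = pa.table({"x": [1, 2, 3], "y": ["a", "b", "c"]})

# Feather files store data in Arrow's columnar layout, making
# reads fast and memory-mapping friendly.
feather.write_feather(table, "data.feather")
print(feather.read_table("data.feather"))
```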
This article gives a flavor of the potential being realized at the intersection of AI and Blockchain, and discusses standard definitions, challenges, and benefits of this alliance, as well as some interesting players in this space.
There are several tools to help you grasp the foundational principles and more. The list below gives you an idea of what’s available and how much it costs.
Coming from a statistics background, I used to care very little about how to install software, and would occasionally spend a few days trying to resolve system configuration issues. Enter the godsend, Docker almighty.