2018 Jan Tutorials, Overviews
- Data Structures Related to Machine Learning Algorithms
- Jan 30, 2018.
If you want to solve some real-world problems and design a cool product or algorithm, then having machine learning skills is not enough. You would need good working knowledge of data structures.
- Using AutoML to Generate Machine Learning Pipelines with TPOT
- Jan 29, 2018.
This post will take a different approach to constructing pipelines. Certainly the title gives away this difference: rather than hand-crafting pipelines, tuning hyperparameters, and performing model selection ourselves, we will automate these processes.
- 5 Key Data Science Job Market Trends
- Jan 26, 2018.
As a data scientist — or someone interested in the field — you know the industry is constantly evolving. If you want to remain competitive, you need to keep up with popular trends.
- A Beginner’s Guide to Data Engineering – Part I
- Jan 25, 2018.
Data Engineering: The Close Cousin of Data Science.
- How To Grow As A Data Scientist
- Jan 25, 2018.
In order for a data scientist to grow, they need to be challenged beyond the technical aspects of their jobs. They need to question their data sources, be concise in their insights, know their business and help guide their leaders.
- Managing Machine Learning Workflows with Scikit-learn Pipelines Part 3: Multiple Models, Pipelines, and Grid Searches
- Jan 24, 2018.
In this post, we will be using grid search to optimize models built from a number of different types of estimators, which we will then compare, properly evaluating the best hyperparameters that each model has to offer.
- Using Excel with Pandas
- Jan 23, 2018.
In this tutorial, we are going to show you how to work with Excel files in pandas, covering computer setup, reading in data from Excel files into pandas, data exploration in pandas, and more.
- Training and Visualising Word Vectors
- Jan 23, 2018.
In this tutorial I want to show how you can implement a skip-gram model in TensorFlow to generate word vectors for any text you are working with and then use TensorBoard to visualize them.
- Deep Learning in H2O using R
- Jan 22, 2018.
This article is about implementing Deep Learning (DL) using the H2O package in R. We start with a background on DL, followed by some features of H2O's DL framework, followed by an implementation using R.
- Using Genetic Algorithm for Optimizing Recurrent Neural Networks
- Jan 22, 2018.
In this tutorial, we will see how to apply a Genetic Algorithm (GA) to find an optimal window size and number of units in a Long Short-Term Memory (LSTM) based Recurrent Neural Network (RNN).
- Managing Machine Learning Workflows with Scikit-learn Pipelines Part 2: Integrating Grid Search
- Jan 19, 2018.
Another simple yet powerful technique we can pair with pipelines to improve performance is grid search, which attempts to optimize model hyperparameter combinations.
- Propensity Score Matching in R
- Jan 18, 2018.
Propensity scores are an alternative method to estimate the effect of receiving treatment when random assignment of treatments to subjects is not feasible.
- Gradient Boosting in TensorFlow vs XGBoost
- Jan 18, 2018.
For many Kaggle-style data mining problems, XGBoost has been the go-to solution since its release in 2014. It's probably as close to an out-of-the-box machine learning algorithm as you can get today.
- Learning Curves for Machine Learning
- Jan 17, 2018.
But how do we diagnose bias and variance in the first place? And what actions should we take once we've detected something? In this post, we'll learn how to answer both these questions using learning curves.
- Governance in Data Science
- Jan 16, 2018.
Governance roles for data science and analytics teams are becoming more common... One of the key functions of this role is to perform analysis and validation of data sets in order to build confidence in the underlying data sets.
- A Day in the Life of an AI Developer
- Jan 16, 2018.
This is the narrative of a typical AI Sunday, where I decided to look at building a sequence to sequence (seq2seq) model based chatbot using some already available sample code and data from the Cornell movie database.
- Generative Adversarial Networks, an overview
- Jan 15, 2018.
In this article, we’ll explain GANs by applying them to the task of generating images. GANs are one of the few successful techniques in unsupervised machine learning, and are quickly revolutionizing our ability to perform generative tasks.
- Elasticsearch for Dummies
- Jan 12, 2018.
In this blog, you’ll get to know the basics of Elasticsearch, its advantages, how to install it, and how to index documents with it.
- A Primer on Web Scraping in R
- Jan 12, 2018.
If you are a data scientist who wants to capture data from such web pages then you wouldn’t want to be the one to open all these pages manually and scrape the web pages one by one. To push away the boundaries limiting data scientists from accessing such data from web pages, there are packages available in R.
- Democratizing Artificial Intelligence, Deep Learning, Machine Learning with Dell EMC Ready Solutions
- Jan 11, 2018.
Democratization is defined as the action/development of making something accessible to everyone, to the “common masses.” AI | ML | DL technology stacks are complicated systems to tune and maintain, expertise is limited, and one minimal change of the stack can lead to failure.
- Regularization in Machine Learning
- Jan 10, 2018.
Regularization is a technique that helps to avoid overfitting and also makes a predictive model more understandable.
- How Docker Can Help You Become A More Effective Data Scientist
- Jan 10, 2018.
I wrote this quick primer so you don’t have to parse all the information out there and instead can learn the things you need to know to quickly get started.
- Training Sets, Test Sets, and 10-fold Cross-validation
- Jan 9, 2018.
More generally, in evaluating any data mining algorithm, if our test set is a subset of our training data the results will be optimistic and often overly optimistic. So that doesn’t seem like a great idea.
- Custom Optimizer in TensorFlow
- Jan 8, 2018.
How to customize the optimizers to speed up and improve the process of finding a (local) minimum of the loss function using TensorFlow.
- Introductory Data Concepts: Fantastic Video Tutorials from Ronald van Loon
- Jan 8, 2018.
Check out these introductory data videos from noted expert and influencer Ronald van Loon.
- Quantum Machine Learning: An Overview
- Jan 5, 2018.
Quantum Machine Learning (Quantum ML) is the interdisciplinary area combining Quantum Physics and Machine Learning (ML). It is a symbiotic association: leveraging the power of Quantum Computing to produce quantum versions of ML algorithms, and applying classical ML algorithms to analyze quantum systems. Read this article for an introduction to Quantum ML.
- Supercharging Visualization with Apache Arrow
- Jan 5, 2018.
Interactive visualization of large datasets on the web has traditionally been impractical. Apache Arrow provides a new way to exchange and visualize data at unprecedented speed and scale.
- The Convergence of AI and Blockchain: What’s the deal?
- Jan 5, 2018.
This article aims to give a flavor of the potential realized at the intersection of AI and Blockchain, and to discuss standard definitions, challenges, and benefits of this alliance, as well as some interesting players in this space.
- 10 Tools to Help You Learn R
- Jan 4, 2018.
There are several tools to help you grasp the foundational principles and more. The list below gives you an idea of what’s available and how much it costs.
- Docker for Data Science
- Jan 2, 2018.
Coming from a statistics background I used to care very little about how to install software and would occasionally spend a few days trying to resolve system configuration issues. Enter the god-send Docker almighty.