In this post I’ll share how I’ve been studying Deep Learning and using it to solve data science problems. It’s an informal post but with interesting content (I hope).
If you want to solve real-world problems and design a cool product or algorithm, machine learning skills alone are not enough. You also need a good working knowledge of data structures.
This post will take a different approach to constructing pipelines. As the title suggests, instead of hand-crafting pipelines, optimizing hyperparameters, and performing model selection ourselves, we will automate these processes.
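The blurb doesn't name the tool, but TPOT is one common choice for automating pipeline construction and hyperparameter search; a minimal sketch, assuming scikit-learn-style data (the dataset and settings here are illustrative):

```python
# Hedged sketch: automated pipeline search with TPOT (assumed tool, not confirmed by the article).
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# TPOT evolves pipelines (preprocessing + model + hyperparameters) automatically.
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('best_pipeline.py')  # writes the winning pipeline as plain scikit-learn code
```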
We discuss the 3 Vs of Big Data; infonomics and many aspects of monetizing information, including promising analytics methods, successful companies, and the main challenges; information marketplaces and why the concept of data ownership is misguided; and more.
In order for a data scientist to grow, they need to be challenged beyond the technical aspects of their jobs. They need to question their data sources, be concise in their insights, know their business and help guide their leaders.
In this post, we will use grid search to optimize models built from a number of different types of estimators, which we will then compare in order to properly evaluate the best hyperparameters each model has to offer.
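A minimal sketch of the idea in scikit-learn; the estimators and grids below are illustrative, not the ones from the article:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One grid per estimator type; fit each, then compare the tuned models.
candidates = [
    (LogisticRegression(max_iter=1000), {'C': [0.01, 0.1, 1, 10]}),
    (RandomForestClassifier(random_state=0), {'n_estimators': [50, 100], 'max_depth': [3, None]}),
]

for estimator, grid in candidates:
    search = GridSearchCV(estimator, grid, cv=5)
    search.fit(X_train, y_train)
    print(type(estimator).__name__, search.best_params_, search.score(X_test, y_test))
```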
In this tutorial I want to show how you can implement a skip-gram model in TensorFlow to generate word vectors for any text you are working with, and then use TensorBoard to visualize them.
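A minimal sketch of the core skip-gram graph, assuming the TensorFlow 1.x API and integer-encoded (centre, context) word pairs; the TensorBoard projector step is omitted:

```python
import tensorflow as tf  # assumes the 1.x API (tf.compat.v1 in TF 2)

vocab_size, embed_dim, num_sampled = 10000, 128, 64

center_words = tf.placeholder(tf.int32, shape=[None])       # centre word ids
context_words = tf.placeholder(tf.int32, shape=[None, 1])   # context word ids

# The embedding matrix is what we ultimately visualize in TensorBoard.
embeddings = tf.Variable(tf.random_uniform([vocab_size, embed_dim], -1.0, 1.0))
embedded = tf.nn.embedding_lookup(embeddings, center_words)

# Noise-contrastive estimation approximates the full softmax over the vocabulary.
nce_weights = tf.Variable(tf.truncated_normal([vocab_size, embed_dim], stddev=0.1))
nce_biases = tf.Variable(tf.zeros([vocab_size]))
loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_weights, biases=nce_biases,
                                     labels=context_words, inputs=embedded,
                                     num_sampled=num_sampled, num_classes=vocab_size))
train_op = tf.train.GradientDescentOptimizer(1.0).minimize(loss)
```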
This article is about implementing Deep Learning (DL) using the H2O package in R. We start with a background on DL, cover some features of H2O's DL framework, and then walk through an implementation in R.
In this tutorial, we will see how to apply a Genetic Algorithm (GA) to find the optimal window size and number of units in a Long Short-Term Memory (LSTM) based Recurrent Neural Network (RNN).
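A minimal hand-rolled GA sketch over those two hyperparameters; `evaluate` is a placeholder you would replace with actually training an LSTM for each candidate and returning its validation error:

```python
import random

def evaluate(window_size, num_units):
    # Placeholder fitness: in the real tutorial this would train an LSTM
    # with the given window size and unit count and return validation RMSE.
    return abs(window_size - 12) + abs(num_units - 64) / 10.0

def mutate(ind):
    window, units = ind
    if random.random() < 0.3:
        window = max(1, window + random.choice([-2, -1, 1, 2]))
    if random.random() < 0.3:
        units = max(4, units + random.choice([-16, -8, 8, 16]))
    return (window, units)

def crossover(a, b):
    return (a[0], b[1]) if random.random() < 0.5 else (b[0], a[1])

population = [(random.randint(1, 30), random.choice([16, 32, 64, 128])) for _ in range(10)]
for generation in range(20):
    scored = sorted(population, key=lambda ind: evaluate(*ind))
    parents = scored[:4]                       # truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(len(population) - len(parents))]
    population = parents + children

best = min(population, key=lambda ind: evaluate(*ind))
print('best (window_size, num_units):', best)
```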
But how do you learn data science? Let’s take a look at some of the steps you can take to begin your journey into data science without needing a degree, including Springboard’s Data Science Career Track.
Another simple yet powerful technique we can pair with pipelines to improve performance is grid search, which searches over combinations of model hyperparameters to find the best-performing configuration.
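A quick illustrative sketch of the pairing in scikit-learn (the pipeline and grid here are examples, not the article's); note the `step__param` convention for addressing hyperparameters inside a pipeline:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([('scale', StandardScaler()), ('svc', SVC())])

# Grid search tunes hyperparameters of any pipeline step via the step__param syntax.
param_grid = {'svc__C': [0.1, 1, 10], 'svc__gamma': ['scale', 0.01, 0.001]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```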
We built a deep learning system that can automatically analyze and score an image for aesthetic quality with high accuracy. Check out the demo and see how your photo measures up!
Propensity scores are an alternative method to estimate the effect of receiving treatment when random assignment of treatments to subjects is not feasible.
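A minimal sketch of the usual recipe: model treatment assignment from the covariates, then use the predicted probabilities (the propensity scores) as inverse-probability weights. The data and variable names below are purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative data: X are covariates, treated is 0/1, outcome is the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
treated = (X[:, 0] + rng.normal(size=500) > 0).astype(int)
outcome = 2 * treated + X[:, 0] + rng.normal(size=500)

# Step 1: propensity score = P(treatment | covariates).
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# Step 2: inverse-probability weighting to estimate the average treatment effect.
w = np.where(treated == 1, 1 / ps, 1 / (1 - ps))
ate = (np.average(outcome[treated == 1], weights=w[treated == 1])
       - np.average(outcome[treated == 0], weights=w[treated == 0]))
print('estimated ATE:', round(ate, 2))
```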
For many Kaggle-style data mining problems, XGBoost has been the go-to solution since its release in 2016. It's probably as close to an out-of-the-box machine learning algorithm as you can get today.
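"Out of the box" really does mean a few lines; a minimal sketch with the scikit-learn wrapper and default settings (the dataset is illustrative):

```python
from xgboost import XGBClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier()          # defaults are often a strong baseline
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```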
But how do we diagnose bias and variance in the first place? And what actions should we take once we've detected something? In this post, we'll learn how to answer both these questions using learning curves.
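A minimal scikit-learn sketch for producing such curves: a large gap between training and validation scores suggests variance, while low scores on both suggest bias (the model and data here are illustrative):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = load_digits(return_X_y=True)

sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=2000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f'{n:4d} samples  train={tr:.3f}  validation={va:.3f}')
```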
We review recent developments and tools in topological data analysis, including applications of persistent homology to psychometrics and a recent extension of piecewise regression, called Morse-Smale regression.
Governance roles for data science and analytics teams are becoming more common... One of the key functions of this role is to perform analysis and validation of data sets in order to build confidence in the underlying data.
This is the narrative of a typical AI Sunday, where I decided to look at building a sequence to sequence (seq2seq) model based chatbot using some already available sample code and data from the Cornell movie database.
This article will help you understand why we need the learning rate and whether it is useful for training an artificial neural network. Using a very simple Python implementation of a single-layer perceptron, we will vary the learning rate to get a feel for its effect.
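A minimal sketch of the idea: the same perceptron update rule run with different learning rates (the toy data and the rates tried are illustrative, not taken from the article):

```python
import numpy as np

# Tiny linearly separable toy problem (illustrative).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])          # logical AND

def train_perceptron(learning_rate, epochs=20):
    w, b = np.zeros(2), 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = int(np.dot(w, xi) + b > 0)
            error = target - pred
            # The learning rate scales how far each mistake moves the weights.
            w += learning_rate * error * xi
            b += learning_rate * error
    return w, b

for lr in (0.01, 0.1, 1.0):
    print(lr, train_perceptron(lr))
```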
If you are a data scientist who wants to capture data from such web pages, you won't want to be the one opening and scraping them manually, one by one. To remove the barriers keeping data scientists from this kind of data, there are packages available in R that automate the process.
Democratization is defined as the action or development of making something accessible to everyone, to the “common masses.” AI/ML/DL technology stacks are complicated systems to tune and maintain, expertise is scarce, and a single small change to the stack can lead to failure.
Darrell Huff's classic How to Lie with Statistics is perhaps more relevant than ever. In this short article, I revisit this theme from some different angles.
This article contains a lot of links to resources that I think are very helpful in getting you started to "think like a data scientist", which in my opinion is the most important step of the transition. I hope that you find this useful.
More generally, in evaluating any data mining algorithm, if our test set is a subset of our training data, the results will be optimistic, and often overly optimistic. So that doesn't seem like a great idea.
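Hence the standard remedies: hold out a test set the model never sees during training, or better, cross-validate. A quick scikit-learn sketch of both (dataset and model are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hold out data the model never sees during training...
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print('train accuracy:', clf.score(X_train, y_train))   # optimistic
print('test accuracy:', clf.score(X_test, y_test))      # honest estimate

# ...or average over several train/test splits with cross-validation.
print('cv accuracy:', cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean())
```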
Artificial General Intelligence (AGI) will likely be achieved in less than 50 years, according to latest KDnuggets Poll. The median estimate from all regions was 21-50 years, except in Asia where AGI is expected in 11-20 years.
Interactive visualization of large datasets on the web has traditionally been impractical. Apache Arrow provides a new way to exchange and visualize data at unprecedented speed and scale.
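A minimal pyarrow sketch of the kind of columnar exchange Arrow enables (the file name and data are illustrative):

```python
import pandas as pd
import pyarrow as pa
import pyarrow.feather as feather

df = pd.DataFrame({'x': range(1_000_000), 'y': [i * 0.5 for i in range(1_000_000)]})

# Convert to Arrow's columnar format and write it out; readers in other
# languages (JavaScript, R, C++) can load the same bytes without re-parsing.
table = pa.Table.from_pandas(df)
feather.write_feather(table, 'points.feather')

round_tripped = feather.read_table('points.feather')
print(round_tripped.num_rows, round_tripped.schema)
```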
This article presents our opinions and suggestions on how an Advanced Analytics department should operate. We hope this will be useful for those who want to implement analytics work in their company, as well as for existing departments.
There are several tools to help you grasp the foundational principles and more. The list below gives you an idea of what’s available and how much it costs.
In this webinar on Jan 11, DataRobot will show how automated machine learning can be used to reduce false positive rates, thereby improving the efficiency of AML transaction monitoring and reducing costs.
Nonprofits can use analytics to boost their fundraising efforts, measure and monitor the impact of their activities, build predictive models, optimize the allocation of funds, and more.
Coming from a statistics background, I used to care very little about how to install software, and would occasionally spend a few days trying to resolve system configuration issues. Enter the godsend: Docker almighty.