In this post I’ll share how I’ve been studying Deep Learning and using it to solve data science problems. It’s an informal post but with interesting content (I hope).
If you want to solve real-world problems and design a cool product or algorithm, machine learning skills alone are not enough. You also need a good working knowledge of data structures.
This post will take a different approach to constructing pipelines. As the title suggests, instead of hand-crafting pipelines, optimizing hyperparameters, and performing model selection ourselves, we will automate these processes.
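The blurb doesn't name the tool, but TPOT is one common choice for automating pipeline construction and hyperparameter search; a minimal sketch, assuming scikit-learn-style data (the dataset and settings here are illustrative):

```python
# Hedged sketch: automated pipeline search with TPOT (assumed tool, not confirmed by the article).
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# TPOT evolves pipelines (preprocessing + model + hyperparameters) automatically.
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('best_pipeline.py')  # writes the winning pipeline as plain scikit-learn code
```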
We discuss the 3 Vs of Big Data; infonomics and many aspects of monetizing information, including promising analytics methods, successful companies, and the main challenges; information marketplaces and why the concept of data ownership is misguided; and more.
In order for a data scientist to grow, they need to be challenged beyond the technical aspects of their jobs. They need to question their data sources, be concise in their insights, know their business and help guide their leaders.
In this post, we will use grid search to optimize models built from a number of different types of estimators, which we will then compare in order to properly evaluate the best hyperparameters each model has to offer.
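A minimal sketch of the idea in scikit-learn; the estimators and grids below are illustrative, not the ones from the article:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One grid per estimator type; fit each, then compare the tuned models.
candidates = [
    (LogisticRegression(max_iter=1000), {'C': [0.01, 0.1, 1, 10]}),
    (RandomForestClassifier(random_state=0), {'n_estimators': [50, 100], 'max_depth': [3, None]}),
]

for estimator, grid in candidates:
    search = GridSearchCV(estimator, grid, cv=5)
    search.fit(X_train, y_train)
    print(type(estimator).__name__, search.best_params_, search.score(X_test, y_test))
```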
In this tutorial I want to show how you can implement a skip-gram model in TensorFlow to generate word vectors for any text you are working with, and then use TensorBoard to visualize them.
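A minimal sketch of the core skip-gram graph, assuming the TensorFlow 1.x API and integer-encoded (centre, context) word pairs; the TensorBoard projector step is omitted:

```python
import tensorflow as tf  # assumes the 1.x API (tf.compat.v1 in TF 2)

vocab_size, embed_dim, num_sampled = 10000, 128, 64

center_words = tf.placeholder(tf.int32, shape=[None])       # centre word ids
context_words = tf.placeholder(tf.int32, shape=[None, 1])   # context word ids

# The embedding matrix is what we ultimately visualize in TensorBoard.
embeddings = tf.Variable(tf.random_uniform([vocab_size, embed_dim], -1.0, 1.0))
embedded = tf.nn.embedding_lookup(embeddings, center_words)

# Noise-contrastive estimation approximates the full softmax over the vocabulary.
nce_weights = tf.Variable(tf.truncated_normal([vocab_size, embed_dim], stddev=0.1))
nce_biases = tf.Variable(tf.zeros([vocab_size]))
loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_weights, biases=nce_biases,
                                     labels=context_words, inputs=embedded,
                                     num_sampled=num_sampled, num_classes=vocab_size))
train_op = tf.train.GradientDescentOptimizer(1.0).minimize(loss)
```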
This article is about implementing Deep Learning (DL) using the H2O package in R. We start with a background on DL, cover some features of H2O's DL framework, and then walk through an implementation in R.
In this tutorial, we will see how to apply a Genetic Algorithm (GA) to find the optimal window size and number of units in a Long Short-Term Memory (LSTM) based Recurrent Neural Network (RNN).
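A minimal hand-rolled GA sketch over those two hyperparameters; `evaluate` is a placeholder you would replace with actually training an LSTM for each candidate and returning its validation error:

```python
import random

def evaluate(window_size, num_units):
    # Placeholder fitness: in the real tutorial this would train an LSTM
    # with the given window size and unit count and return validation RMSE.
    return abs(window_size - 12) + abs(num_units - 64) / 10.0

def mutate(ind):
    window, units = ind
    if random.random() < 0.3:
        window = max(1, window + random.choice([-2, -1, 1, 2]))
    if random.random() < 0.3:
        units = max(4, units + random.choice([-16, -8, 8, 16]))
    return (window, units)

def crossover(a, b):
    return (a[0], b[1]) if random.random() < 0.5 else (b[0], a[1])

population = [(random.randint(1, 30), random.choice([16, 32, 64, 128])) for _ in range(10)]
for generation in range(20):
    scored = sorted(population, key=lambda ind: evaluate(*ind))
    parents = scored[:4]                       # truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(len(population) - len(parents))]
    population = parents + children

best = min(population, key=lambda ind: evaluate(*ind))
print('best (window_size, num_units):', best)
```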
But how do you learn data science? Let’s take a look at some of the steps you can take to begin your journey into data science without needing a degree, including Springboard’s Data Science Career Track.
Another simple yet powerful technique we can pair with pipelines to improve performance is grid search, which searches over combinations of model hyperparameters to find the best-performing configuration.
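A quick illustrative sketch of the pairing in scikit-learn (the pipeline and grid here are examples, not the article's); note the `step__param` convention for addressing hyperparameters inside a pipeline:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([('scale', StandardScaler()), ('svc', SVC())])

# Grid search tunes hyperparameters of any pipeline step via the step__param syntax.
param_grid = {'svc__C': [0.1, 1, 10], 'svc__gamma': ['scale', 0.01, 0.001]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```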
We built a deep learning system that can automatically analyze and score an image for aesthetic quality with high accuracy. Check out the demo and see how your photo measures up!
Propensity scores are an alternative method to estimate the effect of receiving treatment when random assignment of treatments to subjects is not feasible.
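A minimal sketch of the usual recipe: model treatment assignment from the covariates, then use the predicted probabilities (the propensity scores) as inverse-probability weights. The data and variable names below are purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative data: X are covariates, treated is 0/1, outcome is the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
treated = (X[:, 0] + rng.normal(size=500) > 0).astype(int)
outcome = 2 * treated + X[:, 0] + rng.normal(size=500)

# Step 1: propensity score = P(treatment | covariates).
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# Step 2: inverse-probability weighting to estimate the average treatment effect.
w = np.where(treated == 1, 1 / ps, 1 / (1 - ps))
ate = (np.average(outcome[treated == 1], weights=w[treated == 1])
       - np.average(outcome[treated == 0], weights=w[treated == 0]))
print('estimated ATE:', round(ate, 2))
```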
For many Kaggle-style data mining problems, XGBoost has been the go-to solution since its release in 2016. It's probably as close to an out-of-the-box machine learning algorithm as you can get today.
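"Out of the box" really does mean a few lines; a minimal sketch with the scikit-learn wrapper and default settings (the dataset is illustrative):

```python
from xgboost import XGBClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier()          # defaults are often a strong baseline
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```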
But how do we diagnose bias and variance in the first place? And what actions should we take once we've detected something? In this post, we'll learn how to answer both these questions using learning curves.
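A minimal scikit-learn sketch for producing such curves: a large gap between training and validation scores suggests variance, while low scores on both suggest bias (the model and data here are illustrative):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = load_digits(return_X_y=True)

sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=2000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f'{n:4d} samples  train={tr:.3f}  validation={va:.3f}')
```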
We review recent developments and tools in topological data analysis, including applications of persistent homology to psychometrics and a recent extension of piecewise regression, called Morse-Smale regression.
Governance roles for data science and analytics teams are becoming more common... One of the key functions of this role is to perform analysis and validation of data sets in order to build confidence in the underlying data.
This is the narrative of a typical AI Sunday, where I decided to look at building a sequence to sequence (seq2seq) model based chatbot using some already available sample code and data from the Cornell movie database.
This article will help you understand why we need the learning rate and whether it is useful for training an artificial neural network. Using a very simple Python implementation of a single-layer perceptron, we will vary the learning rate to get a feel for its effect.
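A minimal sketch of the idea: the same perceptron update rule run with different learning rates (the toy data and the rates tried are illustrative, not taken from the article):

```python
import numpy as np

# Tiny linearly separable toy problem (illustrative).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])          # logical AND

def train_perceptron(learning_rate, epochs=20):
    w, b = np.zeros(2), 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = int(np.dot(w, xi) + b > 0)
            error = target - pred
            # The learning rate scales how far each mistake moves the weights.
            w += learning_rate * error * xi
            b += learning_rate * error
    return w, b

for lr in (0.01, 0.1, 1.0):
    print(lr, train_perceptron(lr))
```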
If you are a data scientist who wants to capture data from such web pages, you won't want to be the one opening and scraping them manually, one by one. To remove the barriers keeping data scientists from this kind of data, there are packages available in R that automate the process.
Democratization is defined as the action or development of making something accessible to everyone, to the “common masses.” AI/ML/DL technology stacks are complicated systems to tune and maintain, expertise is scarce, and a single small change to the stack can lead to failure.
Darrell Huff's classic How to Lie with Statistics is perhaps more relevant than ever. In this short article, I revisit this theme from some different angles.
This article contains a lot of links to resources that I think are very helpful in getting you started to "think like a data scientist", which in my opinion is the most important step of the transition. I hope that you find this useful.
More generally, in evaluating any data mining algorithm, if our test set is a subset of our training data, the results will be optimistic, and often overly optimistic. So that doesn't seem like a great idea.
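Hence the standard remedies: hold out a test set the model never sees during training, or better, cross-validate. A quick scikit-learn sketch of both (dataset and model are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hold out data the model never sees during training...
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print('train accuracy:', clf.score(X_train, y_train))   # optimistic
print('test accuracy:', clf.score(X_test, y_test))      # honest estimate

# ...or average over several train/test splits with cross-validation.
print('cv accuracy:', cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean())
```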
Artificial General Intelligence (AGI) will likely be achieved in less than 50 years, according to latest KDnuggets Poll. The median estimate from all regions was 21-50 years, except in Asia where AGI is expected in 11-20 years.
Interactive visualization of large datasets on the web has traditionally been impractical. Apache Arrow provides a new way to exchange and visualize data at unprecedented speed and scale.
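A minimal pyarrow sketch of the kind of columnar exchange Arrow enables (the file name and data are illustrative):

```python
import pandas as pd
import pyarrow as pa
import pyarrow.feather as feather

df = pd.DataFrame({'x': range(1_000_000), 'y': [i * 0.5 for i in range(1_000_000)]})

# Convert to Arrow's columnar format and write it out; readers in other
# languages (JavaScript, R, C++) can load the same bytes without re-parsing.
table = pa.Table.from_pandas(df)
feather.write_feather(table, 'points.feather')

round_tripped = feather.read_table('points.feather')
print(round_tripped.num_rows, round_tripped.schema)
```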
This article presents our opinions and suggestions on how an Advanced Analytics department should operate. We hope this will be useful for those who want to implement analytics work in their company, as well as for existing departments.
There are several tools to help you grasp the foundational principles and more. The list below gives you an idea of what’s available and how much it costs.
In this webinar on Jan 11, DataRobot will show how automated machine learning can be used to reduce false positive rates, thereby improving the efficiency of AML transaction monitoring and reducing costs.
Nonprofits can use analytics to boost their fundraising efforts, measure and monitor the impact of their activities, build predictive models, optimize the allocation of funds, and more.
Coming from a statistics background, I used to care very little about how to install software, and would occasionally spend a few days trying to resolve system configuration issues. Enter the godsend: Docker almighty.