This post is the first in a series whose aim is to shake up our intuitions about what machine learning is making possible in specific sectors — to look beyond the set of use cases that always come to mind.
The idea behind the dplyr package is to do one thing at a time: dplyr has a separate function for each task, which keeps its implementation crisp and easy to understand.
Chris Albon has created and shared a much cooler way to reinforce your machine learning learning (not to be confused with learning reinforcement learning): the flashcard.
We review a JAMA article on “Unintended Consequences of Machine Learning in Medicine” and argue that a number of the alarming opinions in this piece are not supported by evidence.
The term Horn Clause Mining, similar to Rule-Based Machine Learning or Inductive Logic Programming, describes the inverse task: given a large enough knowledge base, can we infer rules that describe the data accurately?
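To make that concrete, here is a minimal sketch (not from the post) of mining exact unary Horn rules from a toy knowledge base; all predicates and constants are invented for illustration.

```python
# Enumerate candidate rules "body(X) -> head(X)" over a toy set of facts
# and keep those that hold with confidence 1.0 (exact rules).
from itertools import permutations

facts = {
    ("bird", "tweety"), ("bird", "polly"), ("bird", "pingu"),
    ("penguin", "pingu"), ("flies", "tweety"), ("flies", "polly"),
}

predicates = {p for p, _ in facts}
constants = {c for _, c in facts}

def holds(pred, const):
    return (pred, const) in facts

for body, head in permutations(predicates, 2):
    support = [c for c in constants if holds(body, c)]
    if not support:
        continue
    confidence = sum(holds(head, c) for c in support) / len(support)
    if confidence == 1.0:
        # e.g. prints "penguin(X) -> bird(X)" and "flies(X) -> bird(X)"
        print(f"{body}(X) -> {head}(X)  (support={len(support)})")
```

Real systems mine multi-variable clauses and tolerate noise, but the enumerate-and-score loop above is the core idea.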
PyTorch is better for rapid prototyping in research, for hobbyists and for small scale projects. TensorFlow is better for large-scale deployments, especially when cross-platform and embedded deployment is a consideration.
While Python did not "swallow" R, in 2017 the Python ecosystem overtook R as the leading platform for Analytics, Data Science, and Machine Learning, and is pulling users from other platforms.
Also: 37 Reasons why your Neural Network is not working; Machine Learning vs. Statistics: The Texas Death Match of Data Science; Understanding overfitting: an inaccurate meme in Machine Learning; Recommendation System Algorithms: An Overview; The Ultimate Guide to Basic Data Cleaning
In this post, we will try to gain a high-level understanding of how SVMs work. I’ll focus on developing intuition rather than rigor. What that essentially means is we will skip as much of the math as possible and develop a strong intuition of the working principle.
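As a companion to that intuition, here is a minimal sketch, assuming scikit-learn and toy blob data (neither is from the post), of fitting a linear SVM and inspecting the separator it learns.

```python
# Fit a linear SVC on a 2-D toy dataset and look at the learned margin.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# The decision boundary is w . x + b = 0; the support vectors are the
# few points that pin down the maximum-margin separator.
print("w =", clf.coef_[0], "b =", clf.intercept_[0])
print("number of support vectors:", len(clf.support_vectors_))
```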
This post is a collection of 6 separate posts of 7 steps apiece, each for mastering and better understanding a particular data science topic, with topics ranging from data preparation, to machine learning, to SQL databases, to NoSQL and beyond.
Most forget that SQL isn’t just about writing queries, which is just the first step down the road. Ensuring that queries are performant or that they fit the context that you’re working in is a whole other thing. This SQL tutorial will provide you with a small peek at some steps that you can go through to evaluate your query.
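For a concrete flavor of evaluating a query rather than just writing it, here is a small illustrative sketch using Python's built-in sqlite3 module; the table, column, and index names are made up, and the post itself may use a different database.

```python
# EXPLAIN QUERY PLAN shows whether SQLite will scan the whole table
# or use an index to answer a query.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")

query = "SELECT total FROM orders WHERE customer_id = ?"
for row in con.execute("EXPLAIN QUERY PLAN " + query, (42,)):
    print(row)   # e.g. "SCAN orders": a full table scan

con.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
for row in con.execute("EXPLAIN QUERY PLAN " + query, (42,)):
    print(row)   # e.g. "SEARCH orders USING INDEX idx_orders_customer"
```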
Data cleaning can seem intimidating, but it’s not hard if you know the basic steps. That’s why we’re excited to announce our newest ebook, “The Ultimate Guide to Basic Data Cleaning”!
That applying cross-validation prevents overfitting is a popular meme, but it is not actually true – it is more of an urban legend. We examine what is true and how overfitting is different from overtraining.
Throughout its history, Machine Learning (ML) has coexisted with Statistics uneasily, like an ex-boyfriend accidentally seated with the groom’s family at a wedding reception: both uncertain where to lead the conversation, but painfully aware of the potential for awkwardness.
Over the course of many debugging sessions, I’ve compiled my experience along with the best ideas around into this handy list. I hope they will be useful to you.
In this blog, I explore the three sets of APIs—RDDs, DataFrames, and Datasets—available in a pre-release preview of Apache Spark 2.0; explain why and when you should use each set; outline their performance and optimization benefits; and enumerate scenarios when to use DataFrames and Datasets instead of RDDs.
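As a rough illustration (not code from the post), here is how the RDD and DataFrame APIs compare in PySpark; Datasets are JVM-only, so they are omitted, and the data is invented.

```python
# Contrast the RDD API (how to compute) with the DataFrame API (what
# you want, leaving the plan to the Catalyst optimizer).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("api-demo").getOrCreate()
data = [("alice", 34), ("bob", 29), ("carol", 41)]

# RDD: element-by-element transformations on opaque Python objects.
rdd = spark.sparkContext.parallelize(data)
print(rdd.filter(lambda r: r[1] > 30).map(lambda r: r[0]).collect())

# DataFrame: declarative column expressions that Spark can optimize.
df = spark.createDataFrame(data, ["name", "age"])
df.filter(df.age > 30).select("name").show()
```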
This post presents an overview of the main existing recommendation system algorithms, in order for data scientists to choose the best one according to a business’s limitations and requirements.
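To ground one family from the overview, here is a minimal sketch of item-based collaborative filtering with cosine similarity; the ratings matrix is invented, and the post covers other algorithm families as well.

```python
# Item-based collaborative filtering on a toy user-item ratings matrix.
import numpy as np

# rows = users, cols = items; 0 means "not rated"
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

# Cosine similarity between item columns.
norms = np.linalg.norm(R, axis=0)
sim = (R.T @ R) / np.outer(norms, norms)

# Score each unrated item for user 0 as a similarity-weighted average
# of that user's existing ratings.
user = R[0]
for i in np.where(user == 0)[0]:
    score = sim[i] @ user / sim[i][user > 0].sum()
    print(f"predicted rating for item {i}: {score:.2f}")
```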
Neural network algorithms are showing promising results on a variety of complex problems. Here we discuss how these algorithms are used in image compression.
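One common neural approach to compression is the autoencoder; the sketch below (illustrative only, with made-up sizes, not the post's models) shows the bottleneck idea in PyTorch.

```python
# Squeeze an image through a low-dimensional bottleneck and reconstruct
# it; the small code vector is the "compressed" representation.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 32), nn.ReLU())
decoder = nn.Sequential(nn.Linear(32, 784), nn.Sigmoid())

x = torch.rand(8, 1, 28, 28)               # a batch of fake 28x28 images
code = encoder(x)                           # 784 pixels -> 32 numbers
x_hat = decoder(code).view(8, 1, 28, 28)    # approximate reconstruction
print(code.shape, x_hat.shape)
```

Training would minimize the reconstruction error between x and x_hat; the trade-off between bottleneck size and image quality is the compression knob.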
In any machine learning project, business understanding is very important. But in practice, it does not get enough attention. Here we explain what questions should be asked.
This is a collection of introductory posts which present a basic overview of neural networks and deep learning. Start by learning some key terminology and gaining an understanding through some curated resources. Then look at summarized important research in the field before looking at a pair of concise case studies.
The recent but noticeable shift from CPUs to GPUs is mainly due to the unique benefits they bring to sectors like AdTech, finance, telco, retail, and security/IT. We examine where GPU databases shine.
I am writing this article to show you the basics of using Instagram in a programmatic way. You can benefit from this if you want to use it for data analysis, computer vision, or any other cool project you can think of.
Boosted decision trees are responsible for more than half of the winning solutions in machine learning challenges hosted at Kaggle, and require minimal tuning. We evaluate two popular tree boosting software packages, XGBoost and LightGBM, and draw 4 important lessons.
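For readers who want to try both packages, here is a minimal sketch fitting XGBoost and LightGBM on the same toy data; the hyperparameters are library defaults, not the tuned settings from the benchmark.

```python
# Train both boosting libraries on identical synthetic data and compare
# held-out accuracy.
import xgboost as xgb
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for model in (xgb.XGBClassifier(n_estimators=100),
              lgb.LGBMClassifier(n_estimators=100)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, accuracy_score(y_te, model.predict(X_te)))
```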
Whether you want to start learning deep learning for your career, to have a nice adventure (e.g. with detecting huggable objects), or to get insight into machines before they take over, this post is for you!
When used in combination with big data and machine learning, both AI and robotics can actively improve over time as they collect more information. You don’t have to look far to see how these technologies have revolutionized the world, and continue to do so.
This post introduces five perfectly valid ways of measuring distances between data points. We will also perform a simple demonstration and comparison with Python and the SciPy library.
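As a preview, here is how a few common metrics can be computed with SciPy; this selection of five may not match the post's exact list.

```python
# Compute several distance metrics between two toy points.
from scipy.spatial import distance

a, b = [0, 0, 1], [1, 1, 1]
for name in ("euclidean", "cityblock", "chebyshev", "cosine", "hamming"):
    print(name, getattr(distance, name)(a, b))
```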
Global Big Data Conference, a leading vendor-agnostic conference for the Big Data community, will hold its 5th conference in Santa Clara. Use code KDnuggets to save.
The validation step helps you find the best parameters for your predictive model and prevent overfitting. We examine pros and cons of two popular validation strategies: the hold-out strategy and k-fold.
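A minimal sketch of the two strategies side by side with scikit-learn; the iris data and logistic regression model are placeholders, not from the post.

```python
# Hold-out vs. k-fold validation on the same model and data.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Hold-out: one split; fast, but a higher-variance estimate.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
print("hold-out:", model.fit(X_tr, y_tr).score(X_val, y_val))

# k-fold: k fits; slower, but every point is validated on exactly once.
print("5-fold:", cross_val_score(model, X, y, cv=5).mean())
```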
This collection of concise introductory data science tutorials covers topics including the difference between data mining and statistics, supervised vs. unsupervised learning, and the types of patterns we can mine from data.
I have seen situations where AI (or at least machine learning) had an incredible impact on a business—I also have seen situations where this was not the case. So, what was the difference?
Image recognition is a very interesting and challenging field of study. Here we explain the concepts, applications, and techniques of image recognition using Convolutional Neural Networks.
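To illustrate the basic building blocks (convolution, nonlinearity, pooling, classification), here is a minimal PyTorch sketch; the architecture and inputs are invented, not taken from the post.

```python
# A tiny CNN: convolution -> ReLU -> pooling -> linear classifier.
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # learn 16 local filters
    nn.ReLU(),
    nn.MaxPool2d(2),                             # downsample 28x28 -> 14x14
    nn.Flatten(),
    nn.Linear(16 * 14 * 14, 10),                 # scores for 10 classes
)

x = torch.randn(8, 1, 28, 28)                    # a batch of fake images
print(cnn(x).shape)                              # torch.Size([8, 10])
```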
In this post, a Google Analytics & Google AdWords expert shares his tips and tools for intelligent Google Analytics auditing. Read on for some practical insight.
This post outlines the approach taken at a recent deep learning hackathon, hosted by YCombinator-backed startup DeepGram. The dataset: EEG readings from a Stanford research project that predicted which category of images their test subjects were viewing using linear discriminant analysis.
Deep learning makes it possible to convert unstructured text to computable formats, incorporating semantic knowledge to train machine learning models. These digital data troves help us understand people on a new level.
Though it doesn’t get a lot of buzz, sampling is fundamental to any field of science. Marketing scientist Kevin Gray asks Dr. Stas Kolenikov, Senior Scientist at Abt Associates, what marketing researchers and data scientists most need to know about it.
In this post, we’ll be looking at how we can use a deep learning model to train a chatbot on my past social media conversations, in the hope of getting the chatbot to respond to messages the way that I would.
Apache Arrow is a de-facto standard for columnar in-memory analytics. In the coming years we can expect all the big data platforms to adopt Apache Arrow as their columnar in-memory layer.
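For a taste of the format, here is a minimal sketch, assuming the pyarrow and pandas packages are installed, of converting a small invented DataFrame into an Arrow table.

```python
# Move tabular data into Arrow's columnar in-memory representation.
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({"city": ["Austin", "Dallas"], "listings": [1200, 950]})
table = pa.Table.from_pandas(df)

print(table.schema)              # column names and Arrow types
print(table.column("listings"))  # one column as an Arrow ChunkedArray
```

Because the layout is a standard, other Arrow-aware engines can read this table without serializing it back and forth.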
AirBnB has 2 million listings and operates in 65,000 cities. Here we look at insights into the vacation rental space in the sharing economy, using property listings data for Texas, US.
These short and to-the-point tutorials may provide the assistance you are looking for. Each of these posts concisely covers a single, specific machine learning concept.
We explain another novel method for much faster training of Deep Learning models by freezing the intermediate layers, and show that it has little or no effect on accuracy.
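The core mechanics of freezing layers can be sketched in a few lines of PyTorch; the architecture below is a placeholder, not the models from the post.

```python
# Disable gradients for the early layers so only the head is trained,
# which skips their backward computation and weight updates.
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),   # intermediate feature layers
    nn.Linear(256, 10),               # trainable head
)

for param in model[0].parameters():
    param.requires_grad = False       # freeze the first layer's weights

# Give the optimizer only the parameters that still require gradients.
optimizer = optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.01
)
```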
Decision trees are a classic machine learning technique. The basic intuition behind a decision tree is to map out all possible decision paths in the form of a tree.
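For instance, a shallow scikit-learn tree on the iris data makes those decision paths visible (a minimal sketch, not from the post):

```python
# Fit a depth-2 decision tree and print its decision paths as text.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=load_iris().feature_names))
```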
Toolkits for standard neural network visualizations exist, along with tools for monitoring the training process, but they are often tied to a specific deep learning framework. Could a general, easy-to-set-up tool for generating standard visualizations provide a sanity check on the learning process?
Read this insightful interview with Bokeh's core developer, Bryan Van de Ven, and gain an understanding of what Bokeh is, when and why you should use it, and what makes Bryan a great fit for helming this project.