2017 Apr

Cartoon: Machine Learning – What They Think I Do

Different views of Machine Learning: What society, my friends, my parents, other programmers think I do, and what I really do.

on Apr 29, 2017 in Cartoon, Machine Learning
Keep it simple! How to understand Gradient Descent algorithm

In Data Science, Gradient Descent is one of the most important and most difficult concepts. Here we explain it with an example, in a very simple way. Check it out.

on Apr 28, 2017 in Algorithms, Gradient Descent
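
Here is a minimal sketch of the idea: start somewhere, then repeatedly step against the gradient. The function being minimized, the learning rate, and the step count are illustrative assumptions, not taken from the post.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly move x a small step in the direction opposite to the gradient."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x

# Hypothetical example: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(minimum, 4))
```

With this learning rate each step shrinks the distance to the minimum at x = 3 by a factor of 0.8; a learning rate that is too large (here, above 1.0) would make the iterates diverge instead.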
One Deep Learning Virtual Machine to Rule Them All

The frontend code of programming languages only needs to parse and translate source code to an intermediate representation (IR). Deep Learning frameworks will eventually need their own “IR.”

on Apr 28, 2017 in Deep Learning, Neural Networks
Models: From the Lab to the Factory

In this post, we’ll go over techniques to avoid these scenarios through the process of model management and deployment.

on Apr 27, 2017 in Data Science, Modeling, SVDS
Dask and Pandas and XGBoost: Playing nicely between distributed systems

This blog post gives a quick example of using dask.dataframe for distributed Pandas data wrangling, then using the new dask-xgboost package to set up an XGBoost cluster inside the Dask cluster and perform the handoff.

on Apr 27, 2017 in Dask, Distributed Systems, Pandas, Python, XGBoost
What Data You Analyzed – KDnuggets Poll Results and Trends

Image/video data analysis is surging, JSON is replacing XML, anonymized data usage is growing in the US and Europe (but not in Asia), and itemset and Twitter analysis are declining: these are some of the highlights of the KDnuggets Poll on data types used.

on Apr 26, 2017 in Anonymized, Asia, Data types, Europe, Image Recognition, Poll, Text Analysis, Time Series, USA
The Analytics of Emotion and Depression

Analytics can give a boost to the treatment of depression. Here is how companies like Microsoft and Facebook are adopting analytics to detect and help people who are vulnerable to depression.

on Apr 26, 2017 in Analytics, Depression, India, Instagram, Sentiment Analysis, Social Media Analytics, Text Analysis
How to Build a Recurrent Neural Network in TensorFlow

This is a no-nonsense overview of implementing a recurrent neural network (RNN) in TensorFlow. Both theory and practice are covered concisely, and the end result is running TensorFlow RNN code.

on Apr 26, 2017 in Deep Learning, Neural Networks, Recurrent Neural Networks, TensorFlow
The Data Science of Steel, or Data Factory to Help Steel Factory

Applying Machine Learning to steel production is really hard! Here are some lessons from Yandex researchers on how to balance the need for findings to be accurate, useful, and understandable at the same time.

on Apr 25, 2017 in Applications, Recommendation Engine, Regression, Russia, Steel, Yandex
AI & Machine Learning Black Boxes: The Need for Transparency and Accountability

When something goes wrong, as it inevitably does, it can be a daunting task to discover the behavior that caused the event when that behavior is locked away inside a black box, where discoverability is virtually impossible.

on Apr 25, 2017 in AI, Machine Learning, Transparency
Must-Know: When can parallelism make your algorithms run faster? When could it make your algorithms run slower?

Parallelism is a good idea when a task can be divided into sub-tasks that can be executed independently of each other, without communication or shared resources. Even then, efficient implementation is key to achieving the benefits of parallelization.

on Apr 25, 2017 in Interview Questions, Parallelism
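
One way to see both halves of the question is Amdahl's law, S = 1 / ((1 - p) + p / n), for a task whose fraction p can run in parallel on n workers. The sketch below adds a hypothetical overhead term that grows linearly with the number of workers, standing in for communication and startup costs; that term and its value are assumptions for illustration, not a measured model.

```python
def estimated_speedup(parallel_fraction, workers, overhead_per_worker=0.0):
    """Amdahl's law plus an assumed linear coordination overhead.

    Times are relative to a serial run that takes 1.0 unit.
    """
    serial = 1.0 - parallel_fraction            # part that cannot be parallelized
    parallel = parallel_fraction / workers      # parallel part, split evenly
    overhead = overhead_per_worker * workers    # hypothetical cost of coordination
    return 1.0 / (serial + parallel + overhead)

for n in (1, 2, 4, 8, 16, 64):
    print(n, round(estimated_speedup(0.9, n, overhead_per_worker=0.01), 2))
```

With any overhead at all, a single "parallel" worker is slightly slower than the serial run, and past some worker count the overhead outweighs the shrinking parallel part, so adding workers makes the job slower again.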
Cartoon: the distance between Espresso and Cappuccino

This cartoon takes a vector space approach to your favorite drinks and examines the distance between Espresso and Cappuccino. Warning: this is only funny to Data Scientists and mathematicians.

on Apr 22, 2017 in Cartoon, Coffee, Humor, word2vec
Difference Between Big Data and Internet of Things

If you cannot manage real-time streaming data, perform real-time analytics, and make real-time decisions at the edge, then you are not doing IoT or IoT analytics, in my humble opinion. So what is required to support these IoT data management and analytics requirements?

on Apr 21, 2017 in Big Data, Internet of Things, IoT
Awesome Deep Learning: Most Cited Deep Learning Papers

This post introduces a curated list of the most cited deep learning papers (since 2012), provides the inclusion criteria, shares a few entry examples, and points to the full listing for those interested in investigating further.

on Apr 21, 2017 in Deep Learning, Neural Networks, Research
Dataiku: The Complete Data Sheet

Whether your everyday tool is Scala, Python, R, or Excel, you can now use one tool - Dataiku - to transform raw data to predictions without the hassle. Discover the platform!

on Apr 20, 2017 in Automated Data Science, Data Science Platform, Data Workflow, Dataiku
The Value of Exploratory Data Analysis

In this post, we will give a high-level overview of what exploratory data analysis (EDA) typically entails and then describe three of the major ways EDA is critical to successfully modeling data and interpreting the results.

on Apr 20, 2017 in Data Analysis, Data Exploration, Data Visualization, SVDS
How to Lie with Data

We expect data scientists to be objective, but intentionally or not, they can produce results that mislead. We examine three common types of “lies” that Data Scientists should be aware of.

on Apr 20, 2017 in Confirmation Bias, Data Visualization, Mistakes, Overfitting
Data Science for the Layman (No Math Added)

Written for the layman, this book is a practical yet gentle introduction to data science. Discover key concepts behind more than 10 classic algorithms, explained with real-world examples and intuitive visuals.

on Apr 20, 2017 in Book, Data Science, Machine Learning, Tutorial
How Big Data Helps Today’s Airlines Operate

Companies all over the world have placed a lot of value on getting more insights from big data analytics. That’s not without good reason.

on Apr 19, 2017 in Airlines, Big Data
E-learning courses on Advanced Analytics, Credit Risk Modeling, and Fraud Analytics

These online courses, developed by Prof. Bart Baesens and SAS, include videos, case studies, and quizzes, and focus on concepts and modeling methodologies rather than on specific software.

on Apr 18, 2017 in Advanced Analytics, Bart Baesens, Credit Risk, Fraud analytics, Online Education, SAS
The dynamics between AI and IoT

We see the need for a new type of Engineer who will combine knowledge from Electronics & IoT with Machine learning, AI, Robotics, Cloud and Data management (devops).

on Apr 18, 2017 in AI, Cloud Computing, Data Management, DevOps, Engineer, IoT, Robots
Time Series Analysis with Generalized Additive Models

In this tutorial, we will see an example of how a Generalized Additive Model (GAM) is used, learn how functions in a GAM are identified through backfitting, and learn how to validate a time series model.

on Apr 18, 2017 in Temporal Data, Time Series
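
Backfitting can be sketched in a few lines: hold every smooth term fixed except one, smooth the partial residual against that term's feature, and cycle until the terms settle. This is a minimal illustration under stated assumptions, not the tutorial's code: the centered running-mean smoother is a crude stand-in for the splines a GAM library would normally use, and the trend-plus-weekly-cycle series is made up.

```python
import numpy as np

def running_mean_smoother(x, r, window=11):
    """Centered running mean of r after sorting by x (a crude stand-in for a spline)."""
    order = np.argsort(x, kind="stable")
    r_sorted = r[order]
    half = window // 2
    smoothed_sorted = np.empty_like(r_sorted)
    for i in range(r_sorted.size):
        lo, hi = max(0, i - half), min(r_sorted.size, i + half + 1)
        smoothed_sorted[i] = r_sorted[lo:hi].mean()
    smoothed = np.empty_like(smoothed_sorted)
    smoothed[order] = smoothed_sorted  # undo the sort
    return smoothed

def backfit(X, y, n_iter=20, window=11):
    """Fit y ~ alpha + sum_j f_j(X[:, j]) by cycling over the features."""
    n, p = X.shape
    alpha = y.mean()
    f = np.zeros((n, p))
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: what is left after removing every other smooth term.
            partial = y - alpha - f.sum(axis=1) + f[:, j]
            f[:, j] = running_mean_smoother(X[:, j], partial, window)
            f[:, j] -= f[:, j].mean()  # center so the intercept stays identifiable
    return alpha, f

# Hypothetical daily series: linear trend + weekly cycle + noise.
rng = np.random.default_rng(0)
t = np.arange(365, dtype=float)
day_of_week = t % 7
y = 0.01 * t + np.sin(2 * np.pi * day_of_week / 7) + rng.normal(0, 0.1, t.size)
X = np.column_stack([t, day_of_week])
alpha, f = backfit(X, y)
rmse = np.sqrt(np.mean((y - alpha - f.sum(axis=1)) ** 2))
print(round(float(rmse), 3))
```

Centering each term after smoothing matters: without it, any constant could shift freely between the intercept and the smooth functions.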
Must-Know: What is the curse of dimensionality?

What is the curse of dimensionality? This post gives a no-nonsense overview of the concept, plain and simple.

on Apr 18, 2017 in Dimensionality Reduction, High-dimensional, Interview Questions
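
One concrete symptom of the curse is distance concentration: as the number of dimensions grows, the nearest and farthest points from a query end up almost equally far away. The sketch below assumes points drawn uniformly from the unit hypercube and Euclidean distance; both are illustrative choices.

```python
import numpy as np

def distance_contrast(dim, n_points=500, seed=0):
    """Relative gap between the farthest and nearest neighbor of a random query point."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))  # uniform in the unit hypercube
    query = rng.random(dim)
    d = np.linalg.norm(points - query, axis=1)
    return (d.max() - d.min()) / d.min()

for dim in (2, 10, 100, 1000):
    print(dim, round(float(distance_contrast(dim)), 3))
```

When this contrast approaches zero, "nearest neighbor" carries little information, which is one reason distance-based methods degrade in high dimensions.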
More Deep Learning “Magic”: Paintings to photos, horses to zebras, and more amazing image-to-image translation

This is an introduction to recent research which presents an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples.

on Apr 17, 2017 in Deep Learning, Generative Adversarial Network, Generative Models, Torch
Cartoon: Taxes, Artificial Intelligence, and Humans

In honor of Tax Day, a new KDnuggets Cartoon looks at an unexpected white-collar job that may resist automation and Machine Learning.

on Apr 15, 2017 in AI, Artificial Intelligence, Cartoon, Fraud Detection, Humans, Taxes
What Makes a Good Analyst?

Without doubt, critical thinking is necessary to be a good analyst, but particular skills and experience are also required. What are some of these skills?

on Apr 14, 2017 in Analyst, Science
Is Blockchain the Ultimate Enabler of Data Monetization?

Is blockchain the ultimate enabler of data and analytics monetization, creating marketplaces where companies, individuals, and even smart entities (cars, trucks, buildings, airports, malls) can share/sell/trade/barter their data and analytic insights directly with others?

on Apr 14, 2017 in Blockchain, Data Monetization, Monetizing
Forrester vs Gartner on Data Science Platforms and Machine Learning Solutions

Who leads in Data Science, Machine Learning, and Predictive Analytics? We compare the latest Forrester and Gartner reports for this industry for 2017 Q1, identify gainers and losers, and strong leaders vs contenders.

on Apr 14, 2017 in Data Science Platform, Forrester, Gartner, IBM, Knime, Machine Learning, Mike Gualtieri, Predictive Analytics, RapidMiner, SAS
Top mistakes data scientists make when dealing with business people

There are no cover articles about the failures of the many data scientists who don’t live up to the hype. Here we examine 3 typical mistakes and how to avoid them.

on Apr 13, 2017 in Business, Data Scientist, Mistakes, Skills
5 Machine Learning Projects You Can No Longer Overlook, April

It's about that time again... 5 more machine learning or machine learning-related projects you may not yet have heard of, but may want to consider checking out. Find tools for data exploration, topic modeling, high-level APIs, and feature selection herein.

on Apr 13, 2017 in Data Exploration, Deep Learning, Java, Machine Learning, Neural Networks, Overlook, Python, Scala, scikit-learn, Topic Modeling
Machine Learning Finds “Fake News” with 88% Accuracy

In this post, the author assembles a dataset of fake and real news and employs a Naive Bayes classifier in order to create a model to classify an article as fake or real based on its words and phrases.

on Apr 12, 2017 in Data Science, Fake News, Machine Learning, Naive Bayes, Politics, Text Analytics
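
The post's own dataset, features, and code are not reproduced here. As an assumption-laden sketch of the underlying technique, this is a from-scratch multinomial Naive Bayes classifier with add-one (Laplace) smoothing over word counts, trained on a tiny corpus of made-up headlines.

```python
import math
from collections import Counter

def train_naive_bayes(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter(labels)
    word_counts = {c: Counter() for c in class_docs}
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab

def predict(model, doc):
    """Return the class with the highest log posterior score."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for c, n_docs in class_docs.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[c].values()) + len(vocab)
        for w in doc.lower().split():
            if w in vocab:  # ignore words never seen in training
                score += math.log((word_counts[c][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = c, score
    return best_label

# Hypothetical toy headlines, for illustration only.
docs = ["shocking secret cure doctors hate",
        "you won't believe this shocking trick",
        "senate passes budget bill after debate",
        "court rules on budget dispute"]
labels = ["fake", "fake", "real", "real"]
model = train_naive_bayes(docs, labels)
print(predict(model, "shocking budget secret"))
```

Working in log space avoids multiplying many small probabilities together, and the add-one smoothing keeps a single unseen word in one class from zeroing out that class's score.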
Anonymization and the Future of Data Science

This post walks the reader through a real-world example of a "linkage" attack to demonstrate the limits of data anonymization. New privacy regulations, most notably the GDPR, are making it increasingly difficult to maintain a balance between privacy and utility.

on Apr 11, 2017 in Big Data Privacy, Data Science, Law, Privacy
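
A linkage attack needs nothing more than a join: records stripped of names can still be matched against an identified public dataset on the quasi-identifiers both share. In this sketch the records are entirely hypothetical, and the choice of ZIP code, birth date, and sex as quasi-identifiers is an assumption; the post's own example may use different fields.

```python
# Hypothetical "anonymized" records: names removed, quasi-identifiers kept.
medical = [
    {"zip": "02138", "birth_date": "1960-01-15", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1962-03-14", "sex": "M", "diagnosis": "asthma"},
]
# Hypothetical public records (for example, a voter roll) that include names.
public = [
    {"name": "Alice Example", "zip": "02138", "birth_date": "1960-01-15", "sex": "F"},
    {"name": "Bob Example", "zip": "02140", "birth_date": "1970-01-01", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

def link(anonymized, identified, keys=QUASI_IDENTIFIERS):
    """Re-identify anonymized records that match exactly one identified person."""
    index = {}
    for person in identified:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = []
    for record in anonymized:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], record["diagnosis"]))
    return matches

print(link(medical, public))
```

Removing the name column alone does not help when the remaining combination of fields is unique to one person.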
The Evolution of a Productive Data Team

Successful data teams at companies of any size are able to produce results because they develop gradually through a series of stages and acquire skills along the way that help them stay efficient and effective.

on Apr 11, 2017 in Data Science Team, Dataiku
Must-Know: How to evaluate a binary classifier

Binary classification is a basic concept which involves classifying the data into two groups. Read on for some additional insight and approaches.

on Apr 11, 2017 in Classifier, Interview Questions, Machine Learning
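
The post's full list of approaches is not reproduced here. As an assumption, this sketch computes the standard confusion-matrix metrics (accuracy, precision, recall, F1) from 0/1 labels in plain Python, with 1 treated as the positive class.

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts and derived metrics for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels and predictions for eight examples.
print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0]))
```

Accuracy alone can look good on imbalanced data (predicting all zeros here would still score 0.625), which is why precision and recall are usually reported alongside it.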
New Poll: What data types you analyzed?

The new KDnuggets Poll asks: What data types did you analyze in the past 12 months? Please vote.

on Apr 11, 2017 in Data types, Poll
10 Free Must-Read Books for Machine Learning and Data Science

Spring. Rejuvenation. Rebirth. Everything’s blooming. And, of course, people want free ebooks. With that in mind, here's a list of 10 free machine learning and data science titles to get your spring reading started right.

on Apr 10, 2017 in Books, Data Science, ebook, Free ebook, Machine Learning
The 42 V’s of Big Data and Data Science

It's 2017, and we now operate in an ever more sophisticated world of analytics. To keep up with the times, we present our updated 2017 list: The 42 V's of Big Data and Data Science.

on Apr 7, 2017 in 3Vs of Big Data, Humor
A Brief History of Artificial Intelligence

This post is a brief outline of what happened in artificial intelligence in the last 60 years. A great place to start or brush up on your history.

on Apr 7, 2017 in AI, Artificial Intelligence, History, ImageNet
Stuff Happens: A Statistical Guide to the “Impossible”

Why are some people struck by lightning multiple times or, more encouragingly, how could anyone possibly win the lottery more than once? The odds against these sorts of things are enormous.

on Apr 6, 2017 in Probability, Statistics
How to stay out of analytic rabbit holes: avoiding investigation loops and their traps

Data scientists tend to think that their main job is to answer complex questions and gain in-depth insights, but in reality it is all about solving problems – and the only way to solve a problem is to act on it.

on Apr 6, 2017 in Data Science, Methodology, Skills
Top 20 Recent Research Papers on Machine Learning and Deep Learning

Machine learning and Deep Learning research advances are transforming our technology. Here are the 20 most important (most-cited) scientific papers that have been published since 2014, starting with "Dropout: a simple way to prevent neural networks from overfitting".

By Thuy T. Pham on Apr 6, 2017 in Deep Learning, Machine Learning, Research, Top list, Yoshua Bengio
Putting Together A Full-Blooded AI Maturity Model

Here is a proposed “7A” model that is useful enough to capture the core of what AI offers without falsely implying there is a static body of best practices in this area.

on Apr 5, 2017 in AI, Bernard Marr, Maturity Model, Methodology, Mike Gualtieri
Top /r/MachineLearning Posts, March: A Super Harsh Guide to Machine Learning; Is it Gaggle or Koogle?!?

A Super Harsh Guide to Machine Learning; Google is acquiring data science community Kaggle; Suggestion by Salesforce chief data scientist; Andrew Ng resigning from Baidu; Distill: An Interactive, Visual Journal for Machine Learning Research

on Apr 4, 2017 in Advice, Andrew Ng, Distill, Google, Kaggle, Machine Learning, Reddit, Salesforce
Must-Know: Why it may be better to have fewer predictors in Machine Learning models?

There are a few reasons why it might be a better idea to have fewer predictor variables rather than having many of them. Read on to find out more.

on Apr 4, 2017 in Feature Selection, Interview Questions, Machine Learning, Modeling
Introduction to Anomaly Detection

This overview will cover several methods of detecting anomalies, as well as how to build a detector in Python using a simple moving average (SMA), or low-pass filter.

on Apr 3, 2017 in Anomaly Detection, Datascience.com, Python, Time Series
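
A minimal sketch of the moving-average idea, not the post's own detector: smooth the series with a centered simple moving average, then flag points whose residual from that average exceeds a chosen number of standard deviations. The window size, the 3-sigma threshold, and the synthetic trending series with one injected spike are all assumptions for illustration.

```python
import numpy as np

def sma_anomalies(series, window=9, n_sigmas=3.0):
    """Flag indices that deviate from a centered simple moving average."""
    x = np.asarray(series, dtype=float)
    if window % 2 == 0:
        raise ValueError("window must be odd so the average is centered")
    half = window // 2
    kernel = np.ones(window) / window
    smoothed = np.convolve(x, kernel, mode="valid")  # SMA for every full window
    residual = x[half:len(x) - half] - smoothed      # align each window with its center point
    threshold = n_sigmas * residual.std()
    return [int(i) + half for i in np.flatnonzero(np.abs(residual) > threshold)]

# Hypothetical trending series with one injected spike at index 150.
rng = np.random.default_rng(42)
t = np.arange(300)
signal = 0.02 * t + rng.normal(0, 0.2, t.size)
signal[150] += 3.0
print(sma_anomalies(signal))
```

Because the window is centered, this version needs a few future points and suits batch analysis; a streaming detector would use a trailing window instead. The first and last `window // 2` points are never scored, and the anomaly itself inflates the residual standard deviation, so a robust scale estimate may work better on noisy data.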
What is AI? Ingredients for Intelligence

This introductory overview of artificial intelligence acts as a layman's guide to what AI is, and what it is made up of.

on Apr 3, 2017 in AI, GRAKN.AI, Machine Intelligence, Turing Test
