When measuring marketing campaign performance or analysing customers in any business, these top 5 Key Performance Indicators (KPIs) should be used to strategically drive the business.
In this post, the author implements a machine learning algorithm from scratch, writing all of the code rather than relying on a library such as scikit-learn, to arrive at a working binary classifier.
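The post's own code isn't reproduced here, but the from-scratch spirit can be illustrated with a minimal perceptron, one of the simplest binary classifiers (all names and toy data below are illustrative, not the author's):

```python
import numpy as np

class Perceptron:
    """Minimal from-scratch binary classifier (labels 0/1)."""

    def __init__(self, lr=0.1, epochs=50):
        self.lr = lr
        self.epochs = epochs

    def fit(self, X, y):
        # One weight per feature, plus a bias term.
        self.w = np.zeros(X.shape[1])
        self.b = 0.0
        for _ in range(self.epochs):
            for xi, yi in zip(X, y):
                pred = 1 if xi @ self.w + self.b > 0 else 0
                update = self.lr * (yi - pred)  # zero when prediction is correct
                self.w += update * xi
                self.b += update
        return self

    def predict(self, X):
        return (X @ self.w + self.b > 0).astype(int)

# Linearly separable toy data: class 1 roughly where x0 + x1 > 1.5.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [0.2, 0.1]])
y = np.array([0, 0, 0, 1, 1, 0])
clf = Perceptron().fit(X, y)
print(clf.predict(X))  # → [0 0 0 1 1 0]
```

On linearly separable data like this, the perceptron update rule is guaranteed to converge to a separating hyperplane.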
"Data scientist" continues to be recognized as a top career, but does this mean unending spoils for the data scientist? With large scale mass automation on the horizon for numerous professions, what can we do to safeguard our positions?
Are you considering making a move from R to Python? Here are the libraries you need to know, how they stack up to their R contemporaries, and why you should learn them.
This post sketches out some common principles which would help you better understand deep learning frameworks, and provides a guide on how to implement your own deep learning framework as well.
We compare Gartner 2017 Magic Quadrant for Data Science Platforms vs its 2016 version and identify notable changes for leaders and challengers, including IBM, SAS, RapidMiner, KNIME, MathWorks, Microsoft, and Quest.
Cybersecurity is always a hot topic in the IT industry, and machine learning is making security systems stronger. Here, a particular use case of machine learning in cybersecurity is explained in detail.
The Support Vector Machine has become an extremely popular algorithm. In this post I try to give a simple explanation of how it works and give a few examples using the Python scikit-learn library.
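As a flavour of the library the post uses, a minimal linear-SVM sketch with scikit-learn (toy data and parameters are illustrative, not taken from the post):

```python
import numpy as np
from sklearn import svm

# Toy 2-D data: two well-separated blobs, one per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

# A linear-kernel SVM finds the maximum-margin separating hyperplane.
clf = svm.SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.predict([[0, 0], [3, 3]]))  # → [0 1]
# Only the points on the margin (the support vectors) define the boundary.
print(clf.support_vectors_.shape[0])
```

The `C` parameter trades margin width against training errors; non-linear boundaries come from swapping in `kernel="rbf"` or similar.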
Correlation is one of the most widely used (and widely misunderstood) statistical concepts. We provide the definitions and intuition behind several types of correlation and illustrate how to calculate correlation using the Python pandas library.
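A minimal pandas sketch of the idea, with toy columns chosen to contrast linear (Pearson) and rank-based (Spearman) correlation:

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],   # perfectly linear in x
    "z": [1, 4, 9, 16, 25],  # monotone but nonlinear in x
})

# Pearson measures linear association; Spearman correlates the ranks,
# so it equals 1.0 for any strictly monotone relationship.
print(df["x"].corr(df["y"]))                     # 1.0
print(df["x"].corr(df["z"], method="pearson"))   # high, but below 1.0
print(df["x"].corr(df["z"], method="spearman"))  # 1.0
```

`DataFrame.corr()` computes the full pairwise correlation matrix in one call and accepts the same `method` argument.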
The second part of 17 new must-know Data Science Interview questions and answers covers overfitting, ensemble methods, feature selection, ground truth in unsupervised learning, the curse of dimensionality, and parallel algorithms.
Big Data truly came of age in 2013, when the OED included the term “Big Data” for the first time. But when was the term first used, and why? Here are the results of our investigation.
This post presents an example of regression model stacking, and proceeds by using XGBoost, Neural Networks, and Support Vector Regression to predict house prices.
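The post's exact pipeline isn't reproduced here; as a hedged sketch of stacking using scikit-learn stand-ins (GradientBoostingRegressor in place of XGBoost, the neural network omitted for brevity, and all data and parameters illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic regression data standing in for house prices.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The base learners' out-of-fold predictions become the features for the
# final (meta) estimator -- the essence of model stacking.
stack = StackingRegressor(
    estimators=[
        ("gbm", GradientBoostingRegressor(random_state=0)),  # XGBoost stand-in
        ("svr", SVR(C=10.0)),                                # support vector regression
    ],
    final_estimator=Ridge(),
)
stack.fit(X_tr, y_tr)
print(round(stack.score(X_te, y_te), 3))  # R^2 on held-out data
```

`StackingRegressor` handles the cross-validated generation of base-model predictions internally, which avoids leaking training labels into the meta-learner.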
This is an attempt to explain Hill’s criteria using xkcd comics, both because it seemed fun, and also to motivate causal inference instructors to add some variety to which xkcd comic they include in lectures.
Creativity and innovation are integral to Data Science, and going forward in the world of AI, they are what will give humans an edge over machines.
Deep Learning systems exhibit behavior that appears biological despite not being based on biological material. Humanity may have luckily stumbled upon Artificial Intuition in the form of Deep Learning.
With a new Snowflake data warehouse and Looker data platform on top, data analysts at athenahealth are delivering data to more people, and improving patient experience in the US healthcare system. Register and learn how.
This series explores core concepts of natural language processing, starting with an introduction to the field and explaining how to identify lexical units as a part of data preprocessing.
Apache Parquet and Apache Arrow both focus on improving the performance and efficiency of data analytics. The two projects optimize performance for on-disk and in-memory processing, respectively.
17 new must-know Data Science Interview questions and answers include lessons from failure to predict 2016 US Presidential election and Super Bowl LI comeback, understanding bias and variance, why fewer predictors might be better, and how to make a model more robust to outliers.
This is the second part in a 2 part series on curating data from the web. The first part focused on web scraping, while this post details the process of tidying scraped data after the fact.
In our experience working with many quantitative professionals over the years, the two main areas that contribute to long-term career growth are networking and continuous learning. Here is specific advice on how to do both, with tips for continuous learning.
This post is the first in a 2 part series on scraping and cleaning data from the web using Python. This first part is concerned with the scraping aspect, while the second part will focus on the cleaning. A concrete example is presented.
This post outlines using Google BigQuery for an analysis of NYC Taxi Trips in the cloud, presenting the analysis and visualization in Tableau Public for readers to interact with.
In this post, I’ll look at the practical ingredients of managing agile data science. By using agile data science methods, we help data teams do fast and directed work, and manage the inherent uncertainty of data science and application development.
In this post, we’ll walk through one such algorithm called K-Means Clustering, how to measure its efficacy, and how to choose the sets of segments you generate.
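A minimal illustrative sketch with scikit-learn (synthetic data; the silhouette score used here is one common way to measure efficacy and choose the number of segments, though the post may use other measures):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Three well-separated blobs stand in for customer segments.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.7, random_state=42)

# K-Means inertia always falls as k grows, so it can't pick k by itself;
# the silhouette score peaks when clusters are compact and well separated.
scores = {}
for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
    print(k, round(scores[k], 3))
```

On this data the silhouette score is highest at k=3, matching the number of generated blobs.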
We examine what experts say about Big Data – is it like teenage sex? Is it more than just a large and complex collection of data? And how many Vs are there?
Want to wrangle Pandas data like you would SQL using Python? This post serves as an introduction to pandasql, and details how to get it up and running inside of Rodeo.
Despite the popularity of Regression, it is also misunderstood. Why? The answer might surprise you: There is no such thing as Regression. Rather, there are a large number of statistical methods that are called Regression, all of which are based on a shared statistical foundation.
Sexiest job... massive shortage... blah blah blah. Are you looking to get a real handle on the career paths available in "Data Science" and "Big Data?" Read this article for insight on where to look to sharpen the required entry-level skills.
Upgraded version of the qualitative analysis freeware QDA Miner Lite now includes a document overview, tree-grid display, image rotation and resizing, importing from PowerPoint and more.
What if, instead of hand-designing an optimisation algorithm (function), we learn it instead? That way, by training on the class of problems we’re interested in solving, we can learn an optimal optimiser for that class!
Analytics is not a one-time job. It needs to be automated, deployed, and improved for future business analytics requirements. Here, an IBM expert discusses the development and deployment of analytics assets and their capabilities.
This blog serves to expand on the approach that the data science team uses to identify (and quantify) which variables and metrics are better predictors of performance.
Many analytic models are not deployed effectively into production while others are not maintained or updated. Applying decision modeling and decision management technology within CRISP-DM addresses this.
With nearly every smart young computer scientist planning to work on deep learning, are there really still artificial intelligence researchers working on other techniques? Is deep learning the AI silver bullet?
A carefully-curated list of 5 free collections of university course material to help you better understand the various aspects of artificial intelligence and the skills necessary for moving forward in the field.