- Grunion, Query Optimization Tool for Data Science and Big Data - Mar 14, 2017.
Grunion is a patent-pending query optimization, translation, and federation framework built to help bridge the gap between data science and data engineering teams. Read more to request access.
- The Challenges of Building a Predictive Churn Model - Mar 8, 2017.
Unlike other data science problems, there is no one method for predicting which customers are likely to churn in the next month. Here we review the most popular approaches.
- What is Customer Churn Modeling? Why is it valuable? - Mar 1, 2017.
Getting new customers is much more expensive than retaining existing ones, so reducing churn is a top priority for many firms. Understanding why customers churn and estimating the risks are powerful components of a data-driven retention strategy.
- Introduction to Correlation - Feb 22, 2017.
Correlation is one of the most widely used (and widely misunderstood) statistical concepts. We provide the definitions and intuition behind several types of correlation and illustrate how to calculate correlation using the Python pandas library.
- Introduction to Natural Language Processing, Part 1: Lexical Units - Feb 16, 2017.
This series explores core concepts of natural language processing, starting with an introduction to the field and explaining how to identify lexical units as a part of data preprocessing.
- Forrester Study: Companies Using Data Science Platforms Are Surpassing The Competition - Feb 8, 2017.
Companies that regularly exceed shareholder expectations have something in common: 88% of them use a fully functional platform to do data science work. Get the white paper from Forrester to learn more.
- Introduction to Forecasting with ARIMA in R - Jan 16, 2017.
ARIMA models are a popular and flexible class of forecasting models that use historical information to make predictions. In this tutorial, we walk through an example of examining time series data for demand at a bike-sharing service, fitting an ARIMA model, and creating a basic forecast.
- Creating Data Visualization in Matplotlib - Jan 5, 2017.
Matplotlib is the most widely used data visualization library for Python; it is very powerful but has a steep learning curve. This overview covers a selection of plots useful for a wide range of data analysis problems and discusses how best to deploy each one so you can tell your data story.
- Introduction to Bayesian Inference - Dec 16, 2016.
Bayesian inference is a powerful toolbox for modeling uncertainty, combining researcher understanding of a problem with data, and providing a quantitative measure of how plausible various facts are. This overview from Datascience.com introduces Bayesian probability and inference in an intuitive way, and provides examples in Python to help get you started.
- Introduction to K-means Clustering: A Tutorial - Dec 9, 2016.
A beginner's introduction to the widely used K-means clustering algorithm, using a delivery fleet data example in Python.