Data Analytics for Business Leaders Explained

Learn about a variety of different approaches to data analytics and their advantages and limitations from a business leader's perspective in part 1 of this post on data analytics techniques.

By Alex Jones, Sept 2014.

Gartner's 2014 Hype Cycle (below) shows the relative expectations for various technologies, including Big Data, Data Science, In-Memory Databases, and Prescriptive Analytics. To my mind, it illustrates that it's time to stop tossing around buzzwords and start realizing value.

To give some perspective, it is important to realize that analytics is an evolution of skills and capabilities. Consider where you would put your organization.

With that bit of context, let's take a moment to cut the buzzwords and get into the nuts and bolts of data science techniques.

This particular post is one I have held in draft for some time now, simply because generalizing the complexities of mathematical models, computational efficiency, and sophisticated techniques is most definitely going to lose some accuracy. With that said, this isn't meant to teach or guide a data scientist; it is meant to help business leaders understand the analytics opportunity and its techniques, while providing a reference for data scientists and technical leaders to use as they distill immensely complex subject areas into comprehensible bite-size pieces.

Let's begin!
Linear Programming & Non-Linear Programming:

Example: Solver/SolverTable within Excel

Linear programming is an optimization method that allows users to maximize (or minimize) an objective function (a metric defined by an equation). In the graph below, the objective function is to maximize profits, given the trade-off of manufacturing tables or chairs within a certain number of production hours. Although the example below is quite rudimentary, linear programming allows for many constraints/factors and is incredibly fast, because it simply draws a number of "lines" to represent each constraint (green lines) and then identifies the peak "feasible" point (the optimum).
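The tables-and-chairs trade-off can be sketched in a few lines of Python with scipy's `linprog`. All profit and hour figures below are assumed for illustration, not taken from the example above.

```python
# Hypothetical numbers: maximize 30*tables + 20*chairs,
# subject to limited carpentry and finishing hours.
from scipy.optimize import linprog

profit = [-30, -20]              # linprog minimizes, so negate to maximize
hours = [[2, 1],                 # carpentry: 2 h per table, 1 h per chair
         [1, 1]]                 # finishing: 1 h per table, 1 h per chair
limits = [100, 80]               # hours available per week (assumed)

res = linprog(c=profit, A_ub=hours, b_ub=limits, bounds=[(0, None), (0, None)])
tables, chairs = res.x
print(f"make {tables:.0f} tables and {chairs:.0f} chairs for ${-res.fun:.0f}")
```

The solver only has to examine corner points of the feasible region, which is why even large linear programs solve almost instantly.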

As you might expect, non-linear programming doesn't require linear constraints. However, it is known to be much more computationally challenging, as the program runs through each potential point (or uses an approximation parameter/gradient).
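A non-linear sketch of the same idea, with made-up numbers: here revenue grows with diminishing returns, so the objective is a curve rather than a line, and a gradient-based solver does the work.

```python
import numpy as np
from scipy.optimize import minimize

# Assumed: revenue = 40 * sqrt(units), cost = $2 per unit.
def neg_profit(x):
    units = x[0]
    return -(40 * np.sqrt(units) - 2 * units)   # negate to maximize

res = minimize(neg_profit, x0=[10.0], bounds=[(0, None)])
print(f"optimal units: {res.x[0]:.0f}, profit: ${-res.fun:.0f}")
```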


Limitations: Requires users to "know" the constraints, the influential variables, and their relative impacts.

Monte Carlo Simulations:
An engineering and MBA classic, Monte Carlo simulation allows users to designate randomization functions and distributions to represent unknowns. It is used to simulate problems that are not deterministic (meaning they can't be solved directly), ultimately estimating both a cone of uncertainty and the most probable outcome.
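As a minimal sketch (every distribution and number below is assumed), here is a Monte Carlo estimate of a total project cost built from three uncertain components:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000   # number of simulated "worlds"

# Assumed cost distributions, in $k:
design = rng.triangular(8, 10, 15, n)   # optimistic, most likely, pessimistic
build = rng.normal(50, 8, n)            # mean 50, standard deviation 8
testing = rng.uniform(5, 12, n)

total = design + build + testing
p5, p50, p95 = np.percentile(total, [5, 50, 95])
print(f"median ${p50:.0f}k, 90% cone of uncertainty ${p5:.0f}k to ${p95:.0f}k")
```

Note that the spread between the percentiles is the cone of uncertainty; narrowing it requires better input estimates, not more simulation runs.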

Since Monte Carlo runs simulations based on estimates and user-defined inputs, the output has ample opportunity for human error. Monte Carlo simulations can get pretty darn complex, but don't mistake complexity for accuracy. Instead, with Monte Carlo we must apply a veil of reason and constantly work to eliminate error by testing and benchmarking against real-world outcomes. Furthermore, whenever possible we should derive our estimates from historical data.

With that said, there are certain realms where Monte Carlo is phenomenal and best suited; as with any technique, I'm simply urging caution. Take a look at some of Monte Carlo's variations -- such as Markov chains -- but that's for another post.

Regression:
Example: StatTools
The classic. Good ole regression: fitting a line to a set of points (y = ax + b). Regression provides insights into the relative importance of variables and the drivers of a given outcome. Today, regression takes many forms: linear, logistic, polynomial, MARS, etc. One of the major differences between them is the "loss function." Most people are familiar with SSE, the sum of squared errors. However, there are many more exciting options! Below is an image of a few of the flavors.
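A minimal sketch of an ordinary least-squares (SSE) fit on synthetic data -- the slope, intercept, and noise level are all made up:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3.0 * x + 2.0 + rng.normal(0, 1.0, x.size)   # "true" line plus noise

a, b = np.polyfit(x, y, deg=1)   # least squares: minimizes the sum of squared errors
print(f"fitted line: y = {a:.2f}x + {b:.2f}")
```

Despite the noise, the fitted slope and intercept land close to the true values of 3 and 2 -- that recovery of "drivers" is exactly what regression is for.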

The key limitations are the requirements that the input variables be independent and well chosen, and that the output be interpreted carefully. Regression can be deceptively confidence-boosting, particularly on large datasets. If you're feeling extra nerdy, check out this article on the limitations of p-values (shocking and saddening, I know).

Decision Trees
Example: Numerous
Decision trees are easy to interpret and often produce a great visual. They work well in situations where they are predicting a binary outcome -- for instance, buy or not buy (1 = buy, 0 = not buy) based on certain characteristics of a consumer/customer. As we progress, the examples I use will focus on marketing because it is relatively easy to follow; however, these models are all greedy data-mongers -- they don't care what functional area or industry the data comes from! Below is an elementary example of a decision tree.

Decision trees aren't always good with datasets that are dynamically changing -- in other words, when what's happening (or going to happen) doesn't match what happened in the past. They also have a tendency to "overfit" the data. That's where your data scientists come in: they're well aware of these problems and are able to "tune", adjust, reconfigure, and test against a holdout dataset.

What's a holdout set? Great question! Essentially, by randomly splitting the data or using cross-validation, analysts can build a model with one set of data and then measure its accuracy on another set.
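A minimal sketch of the holdout idea using a decision tree on synthetic buy/not-buy data -- the features, the ground-truth rule, and every number here are invented for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 1000
age = rng.integers(18, 70, n)
income = rng.normal(60, 20, n)                    # in $k, assumed
buy = ((income > 55) & (age < 50)).astype(int)    # toy ground-truth rule

X = np.column_stack([age, income])
X_train, X_test, y_train, y_test = train_test_split(
    X, buy, test_size=0.3, random_state=0)        # 70% to build, 30% held out

tree = DecisionTreeClassifier(max_depth=3, random_state=0)  # shallow, to limit overfitting
tree.fit(X_train, y_train)
print(f"accuracy on the holdout set: {tree.score(X_test, y_test):.2f}")
```

The `max_depth` cap is one of the "tuning" levers mentioned above: a deeper tree would memorize the training rows and score worse on the holdout set.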

Another concern is that interpretation is limited because variables exist at different "steps" in the decision tree and errors propagate forward. In other words, mistakes made at the beginning can impact the entire model!

You're still reading?! I'm impressed.

Classification:
Example: knn package in R

Although there are a ton of classification algorithms, we'll focus on K-Nearest Neighbor, simply as a means to convey the logic. Let's say we have a dataset of buyers and non-buyers with lots of characteristic columns (things like age, gender, income, etc).

Technically speaking, it would be more accurate to describe our data as input/training class-labeled vectors in a multi-dimensional feature space -- but life's too short for that many two-dollar words in one sentence.

We'll stick with Buyers and Non-Buyers with lots of columns. So let's say we have a new list of "prospects": we have the columns of characteristics, but we don't know if they'll become customers. Well, KNN can help predict! In the visual below, let's say that we have customers (blue squares) and non-customers (red triangles).

Then along comes "Green-dot man". Will he be a customer or not? Well, in this case, what we would predict depends on a few things.

First, how big is "K" -- in other words, how many nearby points are we going to consider? If we look at K=3, we consider the points inside the solid-line circle and see there are 2 red/non-customers and 1 blue/customer, so we'd predict Green-dot man is a non-customer. However, if we look at K=5, we consider the points within the dotted-line circle and find 3 blue/customers and 2 red/non-customers -- so now we'd predict he is a customer.

What can we do? Well, we could weight by distance. In other words, we could say: let's count the points that are closest (aka most like Green-dot man) more heavily than the points further away. In that case, it would likely be a toss-up. However, that is informative too, as our model would give us a "probability" of being a customer. For things like mail campaigns, that is highly relevant!
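The Green-dot logic above can be sketched with scikit-learn's `KNeighborsClassifier`. The point coordinates here are invented so that K=3 and K=5 disagree, just as in the picture:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# 1 = customer (blue square), 0 = non-customer (red triangle); coordinates assumed.
X = np.array([[0.5, 0.0], [0.0, 2.0], [2.0, 1.0], [5.0, 5.0],   # customers
              [0.0, 1.0], [1.0, 1.0], [4.0, 4.0]])              # non-customers
y = np.array([1, 1, 1, 1, 0, 0, 0])

green_dot = [[0.0, 0.0]]   # the new prospect

for k in (3, 5):
    pred = KNeighborsClassifier(n_neighbors=k).fit(X, y).predict(green_dot)[0]
    print(f"k={k}: predicted {'customer' if pred else 'non-customer'}")

# Weighting votes by 1/distance turns the raw vote into a graded "probability".
weighted = KNeighborsClassifier(n_neighbors=5, weights="distance").fit(X, y)
print(weighted.predict_proba(green_dot))   # [P(non-customer), P(customer)]
```

With these points, k=3 predicts non-customer, k=5 predicts customer, and the distance-weighted model reports a probability in between -- exactly the mail-campaign-friendly output described above.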

So when is this a good option? Well, let's think about Amazon for a second. Currently, Amazon recommends products that are "associated" with the product you are looking at or your browsing history. However, that's a pretty loose model.

Rather, a K-Nearest Neighbor model is likely to find that handful of weirdos just like you: those guys who also buy red silk suspenders, rent movies at 9pm on Friday nights, look at pocket protectors that are dishwasher safe, and write shamelessly about their adventures following the purchase of a 3-wolves t-shirt. Those recommendations will drive tons of sales! Talk about product discovery!

The true downside is that KNN calculates the distance between each new point (each Green-dot man) and every other point. That's a lot of math. Fortunately, there are binning, parallelization, and generalization strategies that can speed up the process.
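One common speed-up -- a sketch of one such strategy, not a prescription from the author -- is to have the library index the points in a k-d tree, so each query prunes most of the dataset instead of measuring every single distance:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(7)
X = rng.normal(size=(50_000, 3))               # 50k assumed historical points
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # toy labels

knn = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree", n_jobs=-1)
knn.fit(X, y)                                   # builds the tree index once
preds = knn.predict(rng.normal(size=(10, 3)))   # fast queries against the index
print(preds)
```

The index is built once at fit time; every subsequent prediction then walks the tree rather than scanning all 50,000 points.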

Here is the original post.

Alex Jones is a graduate student at the University of Texas McCombs School of Business.