Causation in a Nutshell

Every move we make, every breath we take, and every heartbeat is an effect that is caused. Even apparent randomness may just be something we cannot explain.

By Kevin Gray, Cannon Gray on July 20, 2018 in Causality, Causation, Statistics

comments

Credit: xkcd

Knowing the who, what, when, where, etc., is vital in marketing. Predictive analytics can also be useful for many organizations. However, also knowing the why helps us better understand the who, what, when, where, and so on, and the ways they are tied together. It also helps us predict them more accurately. Knowing the why increases their value to marketers and increases the value of marketing.

Analysis of causation can be challenging, though, and there are differences of opinion among authorities. The statistical orthodoxy is that randomized experiments are the best approach. Experiments in many cases are infeasible or unethical, however. They also can be botched or be so artificial that they do not generalize to real world conditions. They may also fail to replicate. They are not magic.

Non-experimental research may be our only option in many instances. The key distinction between randomized experiments and non-experimental research is that, in experiments, subjects (e.g., consumers) are randomly assigned to treatment conditions (e.g., different versions of a website). Randomization reduces the possibility that the groups were different before the experiment in ways which might bias the results.

In non-experimental research, this random assignment mechanism is absent. Fortunately, various statistical methods have been developed to reduce bias caused by pre-existing differences among groups. This short article summarizes some of the more popular ones.

Causal analysis is part of my work and has been an area of interest to me for many years. Based on my experience, outside reading, and interaction with academics and other researchers, I’d like to offer a few practical tips for marketing researchers.

First, be very clear about objectives and client expectations. Politics can knock out any statistical model in the first round. If the recommendations of academic consultants can be disregarded, so can ours!
Use randomized experiments whenever possible. Design and analysis of experiments has come a long way since the days of R.A. Fisher, but simple designs are often all we’ll need. An excellent reference is Experimental Design: Procedures for the Behavioral Sciences (Kirk).
Most effects have more than one cause, and alternative causal models may fit the data about equally well but suggest different courses of action to decision-makers.
Don’t be fooled by randomness. Stuff Happens explains what I mean by this.
Don’t be fooled by spurious correlations. A correlation between ice cream consumption and sunburn is probably attributable to weather, for instance.
There are also interactions - moderated effects - in which the relationship between two variables depends on a third variable. For example, older consumers may be heavy users among males but light users among females.
Mediation is easy to confuse with moderation, though they are not the same. A mediator variable is influenced by an independent variable and, in turn, affects the dependent variable. Thus, an independent variable may have both direct and indirect effects, depending on the causal model. Mediation is currently a hot topic in the marketing literature.
Relationships between variables are not always linear. Below a certain point, for instance, increases in ad spend may have no impact. Above that point, sales may increase very rapidly until they begin to taper off, with further increases in ad spend having little or no effect. Statistical analyses need to account for patterns such as this.
Cause will normally precede effect, but causation can also be reciprocal. For instance, A influences B, then B influences A. Brand usage and image frequently interact in a similar fashion.
Lagged effects are common in time-series data. The impact of a new ad campaign, for example, may not show up in sales data for several days or weeks. If we analyze data collected across time with methodologies designed for cross-sectional data, we may get a very distorted picture of the marketplace.
There are often distinct causal models at work for different consumer segments. Some of these segments, such as lifestage, may be obvious but others are hidden and must be uncovered through statistical analysis.
Beware of regression to the mean. In key driver analysis, for example, variables scoring very high (or very low) in relative importance will probably rank more towards the middle the next time we conduct the study (though still high or low). Regression to the mean is a statistical phenomenon but researchers often mistakenly attribute its effects solely to marketing activities or market trends.
More data and bigger data has actually made business understanding more important, not less. Correlation has not replaced theory, despite some claims by data scientists.
Use significance testing judiciously. Statistical significance is unrelated to business significance, and a variable's effect size is far more meaningful.
Also, be wary of automated modeling, even if it’s been branded as machine learning or AI. With just a few variables, many causal models are possible and mathematical criteria can seldom determine which is best for decision-makers. As the legendary statistician George Box put it, "Essentially, all models are wrong, but some are useful."
Marketing ROI is often treated as an accounting exercise but is really a form of causal analysis. Econometric methods are generally most suitable because they enable us to examine how inputs (e.g., marketing activity) covary with outputs (e.g., sales) over time.

This short interview with Harvard professor Tyler VanderWeele provides a snapshot of causal analysis. Mastering 'Metrics (Angrist and Pischke) and Observation and Experiment (Rosenbaum) are two comparatively non-technical overviews of the topic.

The Shaddish, Cook and Campbell classic Experimental and Quasi-Experimental Designsis a hard read in places, but I'd recommend it to any marketing researcher. The book’s diagrams of research designs and summaries of their advantages and vulnerabilities are priceless and timeless.

There are also advanced books more appropriate for marketing scientists, such as Causal Inference (Imbens and Rubin), Counterfactuals and Causal Inference (Morgan and Winship), Explanation in Causal Inference (VanderWeele), and Linear Causal Modeling with Structural Equations (Mulaik).

The importance of theory in causal analysis is difficult to overstate. A theory is often tested multiple times and in different ways, and a set of procedures known as meta-analysis is increasingly used to statistically synthesize the results of numerous primary studies.

Causality is finally beginning to receive the attention I feel it has long deserved. Hopefully, it will not turn into another business fad, with fallacies and embellishments going viral on the Internet.

Bio: Kevin Gray is President of Cannon Gray, a marketing science and analytics consultancy. He has more than 30 years’ experience in marketing research with Nielsen, Kantar, McCann and TIAA-CREF. Kevin also co-hosts the audio podcast series MR Realities.

This article was first published in GreenBook in July 9, 2018.

Original. Reposted with permission.

Related:

Causation in a Nutshell

More On This Topic

Latest Posts

Top Posts