Every time someone runs a correlation coefficient on two time series, an angel loses their wings
We all know correlation doesn’t equal causality at this point, but when working with time series data, correlation can lead you to the wrong conclusion.
By Jodi Blomberg, Charles Schwab
We all know correlation doesn’t equal causality at this point, but when working with time series data, correlation can be very misleading.
Let’s start with a simple example where you have only two data series, collected over time. This is hypothetical data, but let’s say Series 1 is the Dow Jones Industrial Average (DJIA) and Series 2 is the number of clicks to your blog.
The eyeball test makes you think they are correlated, right? A simple Pearson correlation coefficient of these two series is a respectable 0.96. You might like to think that if the DJIA keeps rising, your clicks will rise too.
If this were real data, I’d bet strongly against that theory. Why? The underlying upward trend is hiding the real story.
Let’s look at the first difference of both series instead. It’s easy to calculate: it’s just y_t - y_{t-1}, or in this example, today’s DJIA less yesterday’s DJIA. Intuitively, it’s the day-over-day change.
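If you want to see the calculation spelled out, here is a minimal Python sketch of a first difference. The function name `first_difference` and the closing values are made up for illustration; they are not the data behind the example in this post.

```python
def first_difference(series):
    """Return y_t - y_{t-1} for every observation after the first."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


# Hypothetical daily closing values, invented for this sketch.
closes = [100, 102, 101, 105]

changes = first_difference(closes)
print(changes)  # [2, -1, 4]
```

Note that the differenced series is one observation shorter than the original, since the first day has no "yesterday" to subtract.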
It turns out the day-over-day changes of the two series are inversely correlated. In fact, I created this data to be perfectly inversely correlated on the first differences. When the DJIA goes up on a given day, clicks to your blog go down by the same amount. Still betting that your clicks will rise if the DJIA rises? I’m guessing not. Something else is driving the general increase in clicks.
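One way to reproduce this kind of effect yourself is sketched below. This is not the data used in this post; it is a hypothetical construction that assumes both series share the same steady upward drift, while their daily wiggles around that drift are exact mirror images. The names `pearson`, `trend`, and `noise`, and all the parameter values, are my own assumptions for the sketch.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def first_difference(series):
    """Return y_t - y_{t-1} for every observation after the first."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


random.seed(0)   # fixed seed so the sketch is repeatable
n_days = 250     # assumed number of observations
trend = 1.0      # assumed shared upward drift per day
noise = [random.gauss(0, 0.5) for _ in range(n_days)]

# Both series drift upward by `trend` each day, but the daily wiggle
# is added to one series and subtracted from the other.
series_1, series_2 = [100.0], [100.0]
for e in noise:
    series_1.append(series_1[-1] + trend + e)
    series_2.append(series_2[-1] + trend - e)

level_corr = pearson(series_1, series_2)
diff_corr = pearson(first_difference(series_1), first_difference(series_2))

# The levels should look strongly positively correlated because of the
# shared trend, while the day-over-day changes are negatively correlated.
print(f"correlation of levels:            {level_corr:.3f}")
print(f"correlation of first differences: {diff_corr:.3f}")
```

Because the shared drift dominates the levels, the raw correlation comes out strongly positive, even though by construction every above-trend day in one series is a below-trend day in the other.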
Not all data is so simply related, but it’s still a good cautionary tale. First differences aren’t the only way to account for this, but they certainly are an easy check.
Bio: Jodi Blomberg is a data scientist with over 15 years of experience in managing data science teams through the full lifecycle from problem scoping to model deployment. She thinks deploying data science models into real-life solutions is often harder than finding novel solutions to hard problems, but just as much fun. She is currently the Managing Director of Analytic Strategy at Charles Schwab and previously was at the Advanced Analytic Lab at SAS. You can find her on Twitter @jablomberg and writing at datarevolution.me.