
Time Series Analysis: A Primer






What is a Time Series?

Many data sets are cross-sectional and represent a single slice of time.  However, we also have data collected over many periods - weekly sales data, for instance.  This is an example of time series data.  Time series analysis is a specialized branch of statistics used extensively in fields such as Econometrics and Operations Research. Unfortunately, most Marketing Researchers and Data Scientists have had little exposure to it. As we'll see, it has many very important applications for marketers.

Just to get our terms straight, below is a simple illustration of what a time series data file looks like.  The column labeled DATE is the date variable and corresponds to a respondent ID in survey research data.  WEEK, the sequence number of each week, is included because using this column rather than the actual dates can make graphs less cluttered.  The sequence number can also serve as a trend variable in certain kinds of time series models.  SALES is the number of packs sold in each week.

Table 1: Example of Time Series Data

I should note that the unit of analysis doesn't have to be brands and can include individual consumers or groups of consumers whose behavior is followed over time.

But first, why do we need to distinguish between cross-sectional and time series analysis?  For several reasons, one being that our research objectives will usually be different.  Another is that most statistical methods we learn in college and make use of in marketing research are intended for cross-sectional data, and if we apply them to time series data the results we obtain may be misleading.  Time is a dimension in the data we need to take into account.

Time series analysis is a complex subject but, in short, when we use our usual cross-sectional techniques such as regression on time series data:

  1. Standard errors can be far off.  More often than not, p-values will be too small and variables can appear "more significant" than they really are;
  2. In some cases regression coefficients can be seriously biased; and
  3. We are not taking advantage of the information the serial correlation in the data provides.
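To make the first point concrete, here is a small simulation sketch in plain Python. It is my own illustration rather than an analysis of real data: the AR(1) setup, the coefficient 0.9, the series length and the number of trials are all assumptions chosen for the example. It regresses one series on another, completely unrelated, series many times and counts how often the slope looks "significant" at the usual 5% level.

```python
import math
import random


def ar1_series(n, phi, rng):
    """Simulate an AR(1) series: y[t] = phi * y[t-1] + noise."""
    y, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        y.append(prev)
    return y


def slope_t_stat(x, y):
    """t-statistic of the slope from a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_error = math.sqrt(sse / (n - 2) / sxx)
    return slope / std_error


def false_positive_rate(phi, n=100, trials=500, seed=1):
    """Share of trials in which two UNRELATED series give |t| > 1.96."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = ar1_series(n, phi, rng)
        y = ar1_series(n, phi, rng)  # generated independently of x
        if abs(slope_t_stat(x, y)) > 1.96:
            hits += 1
    return hits / trials


rate_independent = false_positive_rate(phi=0.0)  # no serial correlation
rate_correlated = false_positive_rate(phi=0.9)   # strong serial correlation
print(f"phi=0.0: {rate_independent:.2f}   phi=0.9: {rate_correlated:.2f}")
```

Without serial correlation the false-positive rate should land near the nominal 5%. With strongly autocorrelated series it is typically far higher, even though the two series have nothing to do with each other.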

Univariate Analysis

To return to our example data, one objective might be to forecast sales for our brand.  There are many ways to do this and the most straightforward is univariate analysis, in which we essentially extrapolate future data from past data.  Two popular univariate time series methods are Exponential Smoothing (e.g., Holt-Winters) and ARIMA (Autoregressive Integrated Moving Average).  In the example shown in Figure 1, one year (52 weeks) of historical sales data have been used to forecast sales one quarter (12 weeks) ahead with an ARIMA model.
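As a rough sketch of what "extrapolating from past data" means, the snippet below implements simple exponential smoothing, the most basic member of the Exponential Smoothing family (Holt-Winters adds trend and seasonal components on top of it). This is not the ARIMA model behind Figure 1. The sales figures, the smoothing weight `alpha=0.3` and the function names are made up for illustration.

```python
def exponential_smoothing_level(series, alpha):
    """Return the smoothed level after one pass through the series."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialize with the first observation
    for value in series[1:]:
        # New level = weighted average of the latest value and the old level.
        level = alpha * value + (1.0 - alpha) * level
    return level


def forecast(series, alpha, horizon):
    """Simple exponential smoothing forecasts a flat line at the last level."""
    level = exponential_smoothing_level(series, alpha)
    return [level] * horizon


# Hypothetical weekly pack sales for one year (made-up numbers, not Table 1).
weekly_sales = [1000 + 20 * ((week * 7) % 5) for week in range(52)]

# Forecast one quarter (12 weeks) ahead, as in the example in the text.
next_quarter = forecast(weekly_sales, alpha=0.3, horizon=12)
print(len(next_quarter), round(next_quarter[0], 1))
```

A larger `alpha` makes the forecast react faster to recent weeks, while a smaller one smooths more heavily. In practice the weight is usually estimated from the data rather than fixed by hand.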

Figure 1: Example of Forecast

Causal Modeling

Obviously, there are risks in assuming the future will be like the past but, fortunately, we can also include "causal" (predictor) variables to help mitigate these risks.  But besides improving the accuracy of our forecasts, another objective may be to understand which marketing activities most influence sales.

Causal variables will typically include data such as GRPs and price and also may incorporate data from consumer surveys or exogenous variables such as GDP.  These kinds of analyses are called Market Response or Marketing Mix modeling and are a central component of ROMI (Return on Marketing Investment) analysis.  They can be thought of as key driver analysis for time series data.  The findings are often used in simulations to try to find the "optimal" marketing mix.

Transfer Function Models, ARMAX and Dynamic Regression are terms that refer to specialized regression procedures developed for time series data.  There are also more sophisticated methods, and I'll touch on a few in just a bit.
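To give a flavor of these procedures, here is a minimal sketch of an ARX(1) model, the simplest special case of ARMAX (no moving-average term): this week's sales depend on last week's sales plus a causal variable. Everything in it is an assumption for illustration, including the simulated GRP and sales data, the "true" coefficients (200, 0.6 and 1.5) and the helper names. It is fitted by ordinary least squares using only the standard library; a real project would use a statistical package and check the residuals.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, targets):
    """OLS coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Simulated (made-up) data: two years of weekly GRPs and sales.
rng = random.Random(7)
n_weeks = 104
grps = [rng.uniform(0.0, 100.0) for _ in range(n_weeks)]
sales = [500.0]
for t in range(1, n_weeks):
    sales.append(200.0 + 0.6 * sales[t - 1] + 1.5 * grps[t] + rng.gauss(0.0, 5.0))

# Regress sales[t] on a constant, sales[t-1] and grps[t].
rows = [[1.0, sales[t - 1], grps[t]] for t in range(1, n_weeks)]
intercept, carryover, grp_effect = ols(rows, sales[1:])

# Because of carryover, the long-run effect of one GRP exceeds the immediate one.
long_run_effect = grp_effect / (1.0 - carryover)
print(round(carryover, 2), round(grp_effect, 2), round(long_run_effect, 2))
```

The lagged sales term is what separates this from a cross-sectional regression: it lets the effect of a marketing activity carry over into later weeks instead of being confined to the week in which it occurs.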

Multiple Time Series

You might need to analyze multiple time series simultaneously, e.g., sales of your brands and key competitors.  Figure 2 below is an example and shows weekly sales data for three brands over a one-year period.  Since sales movements of brands competing with each other will typically be correlated over time, it often will make sense, and be more statistically rigorous, to include data for all key brands in one model instead of running separate models for each brand.

Vector Autoregression (VAR), the Vector Error Correction Model (VECM) and the more general State Space framework are three frequently-used approaches to multiple time series analysis.  Causal data can be included and Market Response/Marketing Mix modeling conducted.

Figure 2: Example of Multiple Time Series

Other Methods

There are several additional methods relevant to marketing research and data science I'll now briefly describe.

  • Panel Models include cross sections in a time series analysis. Sales and marketing data for several brands, for instance, can be stacked on top of one another and analyzed simultaneously.  Panel modeling permits category-level analysis and also comes in handy when data are infrequent (e.g., monthly or quarterly).
  • Longitudinal Analysis is a generic and sometimes confusingly used term that can refer to Panel modeling with a small number of periods ("short panels"), as well as to Repeated Measures, Growth Curve Analysis or Multilevel Analysis.  In a literal sense it subsumes time series analysis but many authorities reserve that term for analysis of data with many time periods (e.g., >25). Structural Equation Modeling (SEM) is one method widely used in Growth Curve modeling and other longitudinal analyses.
  • Survival Analysis is a branch of statistics for analyzing the expected length of time until one or more events happen, such as death in biological organisms and failure in mechanical systems. It's also called Duration Analysis in Economics and Event History Analysis in Sociology. It is often used in customer churn analysis.
  • In some instances one model will not fit an entire series well because of structural changes within the series, and model parameters will vary across time. There are numerous breakpoint tests and models (e.g., State Space, Switching Regression) available for these circumstances.
  • You may also notice that sales, call center activity or other data series you are tracking exhibit clusters of volatility. That is, there may be periods in which the figures move up and down in much more extreme fashion than other periods. Figure 3 gives an illustration of this kind of pattern.

 Figure 3: Example of Volatile Time Series

  • In these cases, you should consider a class of models with the forbidding name of GARCH (Generalized Autoregressive Conditional Heteroskedasticity).  ARCH and GARCH models were originally developed for financial markets but can be used for other kinds of time series data when volatility is of interest.  Volatility can fall into many patterns and, accordingly, there are many flavors of GARCH models.  Causal variables can be included.  There are also multivariate extensions (MGARCH) if you have two or more series you wish to analyze jointly.
  • Non-Parametric Econometrics is a very different approach to studying time series and longitudinal data that is now receiving a lot of attention because of big data and the greater computing power we now enjoy.  These methods are increasingly feasible and useful as alternatives to the more familiar methods such as those described in this article.
  • Machine Learning (e.g., Artificial Neural Networks) is also useful in some circumstances but the results can be hard to interpret - they predict well but may not help us understand the mechanism that generated the data (the Why).  To some extent, this drawback also applies to non-parametric techniques.
  • Most of the methods I've mentioned are Time Domain techniques. Another group of methods, known as Frequency Domain, plays a more limited role in Marketing Research.
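The volatility clustering mentioned in the GARCH bullets above can be sketched with a short simulation. The code below generates a GARCH(1,1) series, in which today's variance depends on yesterday's squared shock and yesterday's variance. The parameter values (`omega=0.1`, `alpha=0.2`, `beta=0.7`), the seed and the function names are assumptions picked for illustration, and the code simulates a series rather than fitting a model to data.

```python
import math
import random


def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate n shocks from a GARCH(1,1) process with normal innovations."""
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be below 1 for a finite variance")
    rng = random.Random(seed)
    variance = omega / (1.0 - alpha - beta)  # start at the long-run variance
    shocks = []
    for _ in range(n):
        shock = math.sqrt(variance) * rng.gauss(0.0, 1.0)
        shocks.append(shock)
        # Next period's variance reacts to the size of the current shock.
        variance = omega + alpha * shock ** 2 + beta * variance
    return shocks


def lag1_autocorrelation(values):
    """Sample autocorrelation at lag 1."""
    n = len(values)
    mean = sum(values) / n
    num = sum((values[t] - mean) * (values[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in values)
    return num / den


shocks = simulate_garch11(5000, omega=0.1, alpha=0.2, beta=0.7, seed=42)
acf_levels = lag1_autocorrelation(shocks)
acf_squares = lag1_autocorrelation([s * s for s in shocks])
print(f"levels: {acf_levels:.3f}   squares: {acf_squares:.3f}")
```

The shocks themselves should show little autocorrelation, while their squares are clearly positively autocorrelated: large movements tend to follow large movements, which is the clustering pattern this class of models is designed to capture.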

Further Study

I've barely scratched the surface of a rich and multifaceted set of techniques that are new to most Marketing Researchers and Data Scientists, but increasingly important to our work.  For readers wishing to learn more about these methods, there are now online courses and many excellent introductory textbooks available, as well as those covering specific topics in depth.

Bio: Kevin Gray is president of Cannon Gray, a marketing science and analytics consultancy.

Original. Reposted with permission.
