KDnuggets, January 2017 (17:n02)

# Introduction to Forecasting with ARIMA in R


ARIMA models are a popular and flexible class of forecasting models that use historical information to make predictions. In this tutorial, we walk through an example of examining a time series of demand at a bike-sharing service, fitting an ARIMA model, and creating a basic forecast.

Prerequisites: Previous knowledge of forecasting is not required, but the reader should be familiar with basic data analysis and statistics (e.g., averages, correlation). To follow the example, the reader should also be familiar with R syntax. R packages needed: forecast, tseries, ggplot2. The sample dataset can be downloaded here.

### Introduction to Time Series Forecasting

This tutorial provides a step-by-step guide to fitting an ARIMA model using R. This type of model is a basic forecasting technique that can serve as a foundation for more complex models. We also provide a checklist for basic ARIMA modeling that can be used as a loose guide.

Time series analysis can be used in a multitude of business applications for forecasting a quantity into the future and explaining its historical patterns. Here are just a few examples of possible use cases:

• Explaining seasonal patterns in sales
• Predicting the expected number of incoming or churning customers
• Estimating the effect of a newly launched product on the number of units sold
• Detecting unusual events and estimating the magnitude of their effect

### Objectives

By the end of this tutorial, the reader will know how to:

• Plot, examine, and prepare series for modeling
• Extract the seasonality component from the time series
• Test for stationarity and apply appropriate transformations
• Choose the order of an ARIMA model
• Forecast the series

Readers can use the following ARIMA cheat sheet as an outline of this tutorial and general guidance when fitting these types of models:

1. Examine your data
• Plot the data and examine its patterns and irregularities
• Clean up any outliers or missing values if needed
• tsclean() is a convenient method for removing outliers and imputing missing values
• Take a logarithm of the series to help stabilize a strong growth trend
2. Decompose your data
• Does the series appear to have trends or seasonality?
• Use decompose() or stl() to examine and possibly remove components of the series
3. Stationarity
• Is the series stationary?
• Use adf.test() and ACF/PACF plots to determine the order of differencing needed
4. Autocorrelations and choosing model order
• Choose the order of the ARIMA model by examining ACF and PACF plots
5. Fit an ARIMA model
6. Evaluate and iterate
• Check the residuals, which should have no patterns and be normally distributed
• If there are visible patterns or bias, plot the ACF/PACF. Are any additional order parameters needed?
• Refit the model if needed. Compare model errors and fit criteria such as AIC or BIC.
• Calculate a forecast using the chosen model
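The checklist can be sketched end to end in base R. This is a minimal illustration under stated assumptions, not the tutorial's own code: the series is synthetic (a hypothetical stand-in for the bike-sharing data), linear interpolation with `approx()` stands in for `tsclean()`, and an ACF check stands in for `adf.test()`, so only the built-in `stats` package is needed.

```r
# Minimal sketch of the checklist using base R only (the stats package).
# ASSUMPTION: the data are SYNTHETIC, standing in for the bike-sharing series;
# approx() interpolation replaces forecast::tsclean(), and an ACF check
# replaces tseries::adf.test(), so no extra packages are required.
set.seed(42)

# 1. Examine the data: 20 weeks of daily counts with a weekly pattern
n <- 140
weekly <- rep(c(0, 5, 8, 10, 8, 5, 0), length.out = n)
raw <- 100 + weekly + as.numeric(arima.sim(list(ar = 0.6), n = n, sd = 3))
raw[c(30, 75)] <- NA                      # pretend two days were not recorded

# Fill the gaps by linear interpolation, then store as a weekly-seasonal ts
idx <- seq_along(raw)
ok <- !is.na(raw)
y <- ts(approx(idx[ok], raw[ok], xout = idx)$y, frequency = 7)

# 2. Decompose: STL splits the series into seasonal, trend and remainder parts
fit_stl <- stl(y, s.window = "periodic")
season <- fit_stl$time.series[, "seasonal"]
deseasonal <- y - season

# 3. Stationarity: autocorrelations that die out quickly suggest that no
#    differencing is needed
acf_vals <- acf(deseasonal, plot = FALSE)

# 4-5. Fit two candidate orders and keep the one with the lower AIC
fit1 <- arima(deseasonal, order = c(1, 0, 0))
fit2 <- arima(deseasonal, order = c(2, 0, 0))
best <- if (AIC(fit2) < AIC(fit1)) fit2 else fit1

# 6. Evaluate: Ljung-Box test on the residuals (a large p-value means no
#    evidence of leftover autocorrelation); fitdf = p + q of the chosen model
lb <- Box.test(residuals(best), lag = 14, type = "Ljung-Box",
               fitdf = sum(best$arma[1:2]))

# Forecast 7 days of the deseasonalized series, then add the weekly pattern back
fc <- predict(best, n.ahead = 7)
fc_original <- fc$pred + tail(as.numeric(season), 7)
```

In practice, the forecast and tseries packages listed in the prerequisites provide `tsclean()`, `auto.arima()` and `adf.test()` for the corresponding steps; the base-R versions above only show where each step fits.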

### A Short Introduction to ARIMA

ARIMA stands for auto-regressive integrated moving average and is specified by these three order parameters: (p, d, q).
The process of fitting an ARIMA model is sometimes referred to as the Box-Jenkins method.
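In R's built-in `stats` package, the three order parameters map directly onto the `order` argument of `arima()`. The sketch below simulates an ARIMA(1,1,1) series with `arima.sim()` and fits the same specification; the coefficient values 0.5 and 0.3 are illustrative assumptions, not values from the tutorial.

```r
# Minimal sketch: the (p, d, q) triple maps onto the `order` argument of
# stats::arima(). The series is SIMULATED; 0.5 and 0.3 are illustrative values.
set.seed(1)

# Simulate an ARIMA(1,1,1) process: AR coefficient 0.5, MA coefficient 0.3
y <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.5, ma = 0.3), n = 500)

# Fit the same specification: p = 1 AR lag, d = 1 difference, q = 1 MA lag
fit <- arima(y, order = c(1, 1, 1))
print(coef(fit))  # named "ar1" and "ma1"; expected to land near 0.5 and 0.3
```

For undifferenced models (d = 0), `arima()` also estimates a mean by default (`include.mean = TRUE`); for models with differencing, no mean term is included.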

An auto-regressive (AR(p)) component refers to the use of past values in the regression equation for the series Y. The auto-regressive parameter p specifies the number of lags used in the model. For example, AR(2) or, equivalently, ARIMA(2,0,0), is represented as

$$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + e_t$$

where $\phi_1$ and $\phi_2$ are parameters for the model, $c$ is a constant, and $e_t$ is a white-noise error term.
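To make the AR(2) model $Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + e_t$ concrete, the following sketch generates a series directly from that recursion and then recovers the parameters by fitting ARIMA(2,0,0) with base R's `arima()`. The values $c = 2$, $\phi_1 = 0.5$ and $\phi_2 = 0.3$ are illustrative assumptions.

```r
# Minimal sketch: generate Y_t = c + phi1*Y_{t-1} + phi2*Y_{t-2} + e_t from the
# recursion, then fit ARIMA(2,0,0). Parameter values are ILLUSTRATIVE choices.
set.seed(7)
c0 <- 2; phi1 <- 0.5; phi2 <- 0.3         # c0 plays the role of the constant c
n <- 2000
y <- numeric(n)
y[1:2] <- c0 / (1 - phi1 - phi2)          # start at the process mean (10)
e <- rnorm(n)                             # white-noise errors e_t
for (t in 3:n) {
  y[t] <- c0 + phi1 * y[t - 1] + phi2 * y[t - 2] + e[t]
}

fit <- arima(y, order = c(2, 0, 0))
est <- coef(fit)                          # named "ar1", "ar2", "intercept"

# arima() reports the process MEAN as "intercept", not the constant c;
# c is recovered as mean * (1 - phi1 - phi2)
c_hat <- est[["intercept"]] * (1 - est[["ar1"]] - est[["ar2"]])
```

One detail worth knowing: the `intercept` that `arima()` reports is the mean of the series, which differs from the constant $c$ in the equation whenever the AR coefficients are non-zero.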

The parameter d represents the degree of differencing in the integrated (I(d)) component. Differencing a series simply means subtracting each previous value from the current one, $Y'_t = Y_t - Y_{t-1}$, and repeating this d times. Differencing is often used to stabilize the series when the stationarity assumption is not met, which we will discuss below.
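Differencing can be checked by hand with base R's `diff()`. This minimal sketch uses a toy vector (an illustrative assumption) whose values follow a quadratic trend, then confirms on a simulated series that fitting ARIMA(1,1,0) gives essentially the same AR coefficient as fitting an AR(1) model to the differenced series.

```r
# Minimal sketch: first-order differencing replaces Y_t with Y_t - Y_{t-1};
# diff(x, differences = d) repeats that d times. The vector is a toy example.
x <- c(3, 5, 9, 15, 23)                   # quadratic trend: t^2 - t + 3

d1 <- diff(x)                             # 2 4 6 8
d2 <- diff(x, differences = 2)            # 2 2 2: the trend is now constant
manual_d2 <- diff(diff(x))                # same as differencing twice by hand

# arima() applies the same differencing internally: ARIMA(1,1,0) on y gives,
# up to numerical tolerance, the AR(1) estimate on diff(y) fitted without a mean
set.seed(3)
y <- cumsum(arima.sim(list(ar = 0.6), n = 300))
phi_arima <- coef(arima(y, order = c(1, 1, 0)))[["ar1"]]
phi_diff <- coef(arima(diff(y), order = c(1, 0, 0),
                       include.mean = FALSE))[["ar1"]]
```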