# Modelling Time Series Processes using GARCH

ARCH models were developed to venture into the turbulent seas of volatile data and analyze it in a time-varying setting.

By Perceptive Analytics

Marching towards the ARCH and GARCH

Techniques such as linear regression and classical time series analysis aim to model the general trend exhibited by a series of data points. This left data scientists with another question: these models capture the overall trend, but how can one model the volatility in the data? In real life, the initial stages of a business or a new market are always volatile and change rapidly until things calm down and the market saturates. Only then can one apply statistical techniques such as time series analysis or regression, as the case may be. ARCH models were developed to venture into the turbulent seas of volatile data and analyze it in a time-varying setting.

ARCH - Autoregressive Conditional Heteroskedasticity

As already mentioned, ARCH is a statistical model for time series data. The proxy for volatility used by ARCH is the variance (or standard deviation) of the error term, and the approach models how that variance changes over time. In the ARCH methodology, the variance of the error term is assumed to follow an AR (autoregressive) process driven by past squared errors, with no moving average (MA) component. If the error variance is instead assumed to follow an ARMA process, with both AR and MA components, we use the GARCH (Generalized ARCH) model. GARCH models are useful for modelling market data such as stock prices and other financial instruments. Let’s learn a few more interesting peculiarities about volatility.
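In the usual textbook notation, the two models can be written as follows. This is a sketch for orientation only: the symbols for the intercept (omega), the standardized shock (z_t) and the conditional variance (sigma_t squared) are assumptions introduced here, while alpha1 and beta1 match the parameter names used later in the article. Sources differ on which of p and q counts which kind of lag, so only the (1,1) case is spelled out.

```latex
\begin{aligned}
\varepsilon_t &= \sigma_t z_t, \qquad z_t \ \text{i.i.d. with mean } 0 \text{ and variance } 1 \\
\text{ARCH}(q):\quad \sigma_t^2 &= \omega + \sum_{i=1}^{q} \alpha_i \, \varepsilon_{t-i}^2 \\
\text{GARCH}(1,1):\quad \sigma_t^2 &= \omega + \alpha_1 \, \varepsilon_{t-1}^2 + \beta_1 \, \sigma_{t-1}^2
\end{aligned}
```

The extra term in the GARCH equation, the lagged conditional variance, is what lets a single shock influence volatility for many periods.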

It all starts with clustering

When we look at the variance of the error terms, many patterns can appear, and one of the most common is a repetitive one. Volatility clustering, as it is called, is a repeating pattern of high-volatility and low-volatility periods. GARCH models capture volatility clustering in data quite well. One needs to remember that, whether ARCH or GARCH is applied, the models do not explain the patterns in the error terms but only capture them. This also means that GARCH is more focussed on the occurrence of spikes and troughs than on their level. You can tell when a possible decline or steep rise may occur, but should not rely on the model for how large that change will be. Naturally, such a problem requires a lot of data. We’re talking about tens of thousands of observations just to model the peaks.
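To see what volatility clustering looks like, one option is to simulate a GARCH(1,1) series in base R. The sketch below is illustrative only: the parameter values (omega, alpha1, beta1), the seed and the series length are assumptions chosen for the example, not estimates from any real data.

```r
# Simulate a GARCH(1,1) series with base R (all parameter values are illustrative)
set.seed(123)
n      <- 2000
omega  <- 0.05
alpha1 <- 0.10
beta1  <- 0.85

eps    <- numeric(n)   # simulated error terms
sigma2 <- numeric(n)   # conditional variances
sigma2[1] <- omega / (1 - alpha1 - beta1)   # start at the unconditional variance
eps[1]    <- sqrt(sigma2[1]) * rnorm(1)

for (t in 2:n) {
  # Today's variance depends on yesterday's squared shock and yesterday's variance
  sigma2[t] <- omega + alpha1 * eps[t - 1]^2 + beta1 * sigma2[t - 1]
  eps[t]    <- sqrt(sigma2[t]) * rnorm(1)
}

# Calm and turbulent stretches show up as clusters in the plot
plot.ts(eps, main = "Simulated GARCH(1,1) series")

# Lag-1 autocorrelation of the series and of its squares
acf_eps <- acf(eps,   plot = FALSE)$acf[2]
acf_sq  <- acf(eps^2, plot = FALSE)$acf[2]
```

In a series like this, the squared values are typically more autocorrelated than the values themselves, which is the numerical footprint of clustering.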

Since GARCH builds on ARMA-style modelling of the variance, we use the GARCH(p,q) notation to indicate the orders of its two components. One of the most popular GARCH models is the GARCH(1,1) model. Once p and q have been chosen, the model coefficients are estimated using maximum likelihood. However, we do not generally rely on the assumption that the data are normally distributed; instead, we use the t-distribution, which fits long-tailed data better. Other long-tailed distributions are also suitable and can be used.
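To make the estimation step concrete, here is a minimal base R sketch of maximum likelihood for a GARCH(1,1) model. It is an illustration under stated assumptions, not how any particular package does it: the function name garch11_negloglik, the simulated data and the starting values are all hypothetical, and a normal likelihood is used purely to keep the code short, even though a t-distribution is usually the better choice.

```r
# Gaussian negative log-likelihood of a GARCH(1,1) model.
# par = c(omega, alpha1, beta1); x is a series with mean close to zero.
garch11_negloglik <- function(par, x) {
  omega <- par[1]; alpha1 <- par[2]; beta1 <- par[3]
  # Penalise parameter values outside the admissible region
  if (omega <= 0 || alpha1 < 0 || beta1 < 0 || alpha1 + beta1 >= 1) return(1e10)
  n <- length(x)
  sigma2 <- numeric(n)
  sigma2[1] <- var(x)   # start the recursion at the sample variance
  for (t in 2:n) {
    sigma2[t] <- omega + alpha1 * x[t - 1]^2 + beta1 * sigma2[t - 1]
  }
  0.5 * sum(log(2 * pi) + log(sigma2) + x^2 / sigma2)
}

# Hypothetical data: simulate a GARCH(1,1) series with known coefficients
set.seed(42)
n <- 3000
x <- numeric(n); s2 <- numeric(n)
s2[1] <- 1; x[1] <- rnorm(1)
for (t in 2:n) {
  s2[t] <- 0.05 + 0.10 * x[t - 1]^2 + 0.85 * s2[t - 1]
  x[t]  <- sqrt(s2[t]) * rnorm(1)
}

# The orders (1,1) are fixed in advance; only the coefficients are estimated
start <- c(omega = 0.1, alpha1 = 0.05, beta1 = 0.80)
fit   <- optim(start, garch11_negloglik, x = x)   # Nelder-Mead by default
round(fit$par, 3)
```

The point of the sketch is the division of labour: the analyst picks p and q, and the optimizer searches only over the coefficients.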

To test the goodness of fit, we usually check for autocorrelation in the squared standardized residuals. A robust test for this is the Ljung-Box test, which calculates the Ljung-Box statistic and its p-value. Another quantity of interest in GARCH models is persistence. It indicates how fast volatility decays and stabilizes after a shock. In the typical GARCH(1,1) model, the key statistic is the sum of the two parameters commonly denoted alpha1 and beta1. If the sum is greater than 1, volatility predictions increase and explode instead of decaying, which is rarely realistic. A sum exactly equal to 1 corresponds to an integrated GARCH (IGARCH) model, in which shocks to volatility never die out. In real life, most GARCH models have a sum less than 1, so that shocks decay exponentially.
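The Ljung-Box check can be run with Box.test from base R's stats package. In the sketch below, z is a hypothetical stand-in for the standardized residuals of a fitted model (here just simulated white noise), and the lag of 10 is an arbitrary choice for illustration.

```r
# Hypothetical stand-in for the standardized residuals of a fitted GARCH model
set.seed(1)
z <- rnorm(1000)

# Ljung-Box test on the squared residuals
lb <- Box.test(z^2, lag = 10, type = "Ljung-Box")
lb$statistic   # the Ljung-Box Q statistic
lb$p.value     # a small p-value points to leftover autocorrelation
```

A large p-value suggests that the model has absorbed the volatility clustering; a small one suggests that some structure remains in the squared residuals.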

We can also express persistence in terms of half-life. The half-life is the time it takes for half of a volatility shock to decay, which leads to a logarithmic formula:

half life = log(0.5)/log(alpha1 + beta1)

Since log(1) = 0, if the sum of alpha1 and beta1 is exactly equal to 1, the half-life becomes infinite. What does this mean? Persistence and half-life are derived from the training data. If the volatility in the training data has a trend, the estimator may mistakenly report an infinite half-life, depending on where the sample ends. This is another reason why we need tens of thousands of data points for modelling GARCH, as a smaller sample results in a higher possibility of errors. These parameter estimates are very important, as they are used to make predictions on test data, and they need to be checked after model fitting.
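The half-life formula is easy to wrap in a small helper. This is a minimal sketch: the function name half_life and the example coefficients are hypothetical. The explicit guard is there because R evaluates log(0.5)/log(1) as -Inf rather than Inf.

```r
# Half-life of a volatility shock in a GARCH(1,1) model
half_life <- function(alpha1, beta1) {
  persistence <- alpha1 + beta1
  if (persistence >= 1) return(Inf)   # shocks do not decay
  log(0.5) / log(persistence)
}

half_life(0.10, 0.88)   # persistence 0.98: roughly 34 periods
half_life(0.10, 0.85)   # persistence 0.95: roughly 13.5 periods
half_life(0.50, 0.50)   # persistence 1: Inf
```

Note how sensitive the half-life is near 1: moving persistence from 0.95 to 0.98 more than doubles it, which is why small estimation errors matter here.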

All of this may be a bit hard to digest. Let’s understand more concepts using a practical implementation in R.

Implementation in R

There are a lot of GARCH packages, since GARCH models have been specialized into many variations. It is difficult to understand and explain all of them. However, we will go through one of the most popular GARCH packages, fGarch. We will also use the Ecdat package for its Garch dataset, which contains daily observations on exchange rates of the US dollar against other currencies from 1 Jan, 1980 to 21 May, 1987, a total of 1867 observations.

#Install the Ecdat package
install.packages("Ecdat")
library(Ecdat)
mydata=Garch
#Look at the dataset
str(mydata)
'data.frame': 1867 obs. of 8 variables:
 $ date: int 800102 800103 800104 800107 800108 800109 800110 800111 800114 800115 ...
 $ day : chr "wednesday" "thursday" "friday" "monday" ...
 $ dm : num 0.586 0.584 0.584 0.585 0.582 ...
 $ ddm : num NA -0.004103 0.000856 0.001881 -0.004967 ...
 $ bp : num 2.25 2.24 2.24 2.26 2.26 ...
 $ cd : num 0.855 0.855 0.857 0.854 0.855 ...
 $ dy : num 0.00421 0.00419 0.00427 0.00432 0.00426 ...
 $ sf : num 0.636 0.636 0.635 0.637 0.633 ...

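We notice that the data types are a bit mismatched. The date column is stored as an integer in yymmdd form (800102 is 2 January 1980) and day is a character column, so we need to convert date to a date type and day to a factor before proceeding further. The rest of the features are exchange rates and are in the correct format.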

#Correct the data types of date and day
#The integer dates are in yymmdd form (e.g. 800102), so parse them with an explicit format
mydata$date=as.Date(sprintf("%06d", mydata$date), format = "%y%m%d")
mydata$day=as.factor(mydata$day)

Let’s load the other packages. We will use the fGarch package to perform our analysis.

install.packages("tseries")
install.packages("urca")
install.packages("fUnitRoots")
install.packages("forecast")
install.packages("fGarch")
library(fGarch) # estimate GARCH and Forecast
library(tseries) #used for time series data
library(urca) #Used for checking Unit root Cointegration
library(fUnitRoots) #Used for conducting unit root test
library(forecast) #Used for forecasting ARIMA model

Let’s convert the dataset into a time series now.

#Converting Dollar - Deutsche mark exchange rate to time series
exchange_rate_dollar_deutsch_mark <- ts(mydata$dm, start=c(1980, 1), end=c(1987, 5), frequency=266)
#Plot the time series
plot.ts(exchange_rate_dollar_deutsch_mark, main="exchange_rate_dollar_deutsch_mark")
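The frequency of 266 in the ts call is not explained in the text. One plausible reading, offered here as an assumption, is that it was picked so that the span from period 1 of 1980 to period 5 of 1987 holds exactly the 1867 observations in the dataset. The arithmetic can be checked with a dummy series:

```r
# Seven full "years" of 266 periods, plus 5 periods in 1987
n_obs <- (1987 - 1980) * 266 + 5
n_obs

# A dummy series with the same start, end and frequency has the same length
demo <- ts(seq_len(n_obs), start = c(1980, 1), end = c(1987, 5), frequency = 266)
length(demo)
```

Since the number of trading days differs from year to year, the time axis produced this way is only an approximation of the calendar.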

We have a lot of small variations across the years, as visible from the plot. The next step is to start processing the data. For this, we take the difference of the values. Though we already have the ddm column, which provides the difference, I am calculating it separately as the difference of the log of the exchange rate multiplied by 100, which approximates the daily percentage change and serves as a better representation of the variation. The code below calls this series inflation; strictly speaking, it is the percentage appreciation or depreciation of the exchange rate rather than price inflation.

#Calculate inflation as difference of log of exchange rate and then multiplied by 100
inflation_series<-(diff(log(exchange_rate_dollar_deutsch_mark)))*100
#Plot the inflation
plot.ts(inflation_series, main="Inflation of exchange rate")
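To see why 100 times the log difference is read as a percentage change, it can be compared with the ordinary percentage change on a few values. The four rates below are the rounded dm values shown in the str output earlier; the variable names are hypothetical.

```r
# First four (rounded) dollar / Deutsche mark rates from the str output
rates <- c(0.586, 0.584, 0.584, 0.585)

log_ret    <- 100 * diff(log(rates))               # log difference times 100
pct_change <- 100 * diff(rates) / head(rates, -1)  # ordinary percentage change

# The two measures are nearly identical when daily moves are small
cbind(log_ret, pct_change)
max(abs(log_ret - pct_change))
```

The approximation degrades for large moves, but log differences have the convenient property that they add up across days.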

This is the inflation series, which represents the variability in the original time series. It varies continuously without a definite trend or pattern, and it has some spikes, such as one of about 5.5 between 1985 and 1986. This is the kind of series that a GARCH model can capture adequately. To make things clearer, we will also look at the summary statistics of the inflation series.

summary(inflation_series)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-2.822000 -0.451700 -0.026770 -0.002183  0.428900  5.502000