# Fitting a volatility model on stocks

Today a quant posed a question to me:

If I had a sorted timeseries, how would I know if it was ordered correctly? What if it's in reverse?

After an interesting conversation about how I would solve the problem, he informed me that a straightforward approach was to fit a GARCH model, and that the model would fit much better if the timeseries was sorted in the right direction.

So I decided to try it out and see whether this is true, as I was not convinced. The code below uses the `arch` package written by Prof. Kevin Sheppard of Oxford.

A brief introduction to GARCH follows. A simple model for returns, with a conditional mean $m_{t}$ and a conditional variance $h_{t}$, can be defined as follows:

$$r_{t} = m_{t} + \sqrt{h_{t}} \epsilon_{t}$$

where $\epsilon_{t}$ is an i.i.d. shock with zero mean and unit variance. The conditional variance $h_{t}$ of the residuals $r_{t} - m_{t}$ is modeled explicitly with a GARCH(1,1) process:

$$h_{t+1} = \omega + \alpha(r_{t} - m_{t})^2 + \beta h_{t} = \omega + \alpha h_{t} \epsilon^{2}_{t} + \beta h_{t}$$

The intuition behind the GARCH model is fairly simple: it asserts that the best predictor of next period's variance (volatility) is a weighted average of the following:

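1. The long-run average variance, which enters through the constant $\omega$ (its weight is $1-\alpha-\beta$, so $\omega$ equals $1-\alpha-\beta$ times the long-run variance).
2. The variance predicted for this period ($h_{t}$), with weight $\beta$.
3. New information from this period, captured by the most recent squared residual ($h_{t} \epsilon^{2}_{t}$), with weight $\alpha$.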

Thus, the parameters to be estimated are $\omega$, $\alpha$, and $\beta$, and the inputs are the previous forecast ($h_{t}$) and the residual ($\epsilon_{t}$). The long-run average variance is $\omega/(1-\alpha-\beta)$, which is finite only when $\alpha + \beta < 1$; its square root, $\sqrt{\omega/(1-\alpha-\beta)}$, is the long-run volatility.
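To make the recursion concrete, here is a minimal pure-Python sketch of the GARCH(1,1) variance update and the long-run variance. The helpers `garch11_variances` and `long_run_variance` are hypothetical names for illustration, not part of the `arch` package, and the starting variance `h0` is an assumption (estimation packages initialise it from the data).

```python
def garch11_variances(resids, omega, alpha, beta, h0):
    """Return [h_1, h_2, ..., h_{T+1}] for a GARCH(1,1) model.

    resids are the residuals e_t = r_t - m_t; each step applies
    h_{t+1} = omega + alpha * e_t**2 + beta * h_t.
    h0 is the assumed variance of the first period.
    """
    h = [h0]
    for e in resids:
        h.append(omega + alpha * e * e + beta * h[-1])
    return h


def long_run_variance(omega, alpha, beta):
    """Unconditional variance omega / (1 - alpha - beta); requires alpha + beta < 1."""
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be < 1 for a finite long-run variance")
    return omega / (1.0 - alpha - beta)


# Parameters taken from the forward GARCH(1,1) fit reported further down
omega, alpha, beta = 6.6146e-05, 0.2000, 0.5000
lr_var = long_run_variance(omega, alpha, beta)
print(f"long-run variance {lr_var:.4e}, long-run daily volatility {lr_var ** 0.5:.4%}")

# Variance path after three illustrative residuals, starting from the long-run level
print(garch11_variances([0.02, -0.01, 0.0], omega, alpha, beta, h0=lr_var))
```

Note that a large residual of either sign raises the next variance, since only $e_t^2$ enters the update.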

#### Importing all necessary modules

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn
import os

from arch import arch_model
from statsmodels.tsa.stattools import acf, pacf
```

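```python
data_dir = os.path.join(os.getcwd(), 'inputs')
filename = 'stk_GOOG.csv'
print(os.path.join(data_dir, filename))
# Load daily GOOG prices indexed by the CSV's 'Date' column; if the file uses
# DD/MM/YYYY dates, dayfirst=True may be needed (note 2011-03-01 right after 2010-12-31 below).
stk_df = pd.read_csv(os.path.join(data_dir, filename), index_col='Date', parse_dates=True)
stk_df.info()
stk_df.head()
```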

```text
/home/randlow/github/blog/content/articles/Econometrics/inputs/stk_GOOG.csv
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1931 entries, 2010-12-31 to 2018-08-31
Data columns (total 6 columns):
Open         1931 non-null float64
High         1931 non-null float64
Low          1931 non-null float64
Close        1931 non-null float64
Volume       1931 non-null int64
dtypes: float64(5), int64(1)
memory usage: 105.6 KB
```

| Date       | Open       | High       | Low        | Close      | Adj Close  | Volume  |
|------------|------------|------------|------------|------------|------------|---------|
| 2010-12-31 | 296.441925 | 297.276489 | 294.102142 | 295.065887 | 295.065887 | 3098500 |
| 2011-03-01 | 296.312775 | 300.838348 | 296.312775 | 300.222351 | 300.222351 | 4761100 |
| 2011-04-01 | 300.853241 | 301.131439 | 298.121002 | 299.114563 | 299.114563 | 3672700 |
| 2011-05-01 | 298.096161 | 303.193024 | 298.086243 | 302.567078 | 302.567078 | 5097500 |
| 2011-06-01 | 303.366882 | 307.216858 | 303.053925 | 304.767792 | 304.767792 | 4142300 |

#### Converting prices to returns

```python
# Daily simple returns from adjusted closes; pct_change() leaves NaN in the first row, which dropna() removes
stk_price = stk_df['Adj Close']
stk_ret = stk_price.pct_change().dropna()
```
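For reference, `pct_change()` computes simple returns $r_{t} = P_{t}/P_{t-1} - 1$, and `dropna()` discards the undefined first value, which is why the GARCH summaries below report 1930 observations for 1931 prices. A minimal pure-Python equivalent, applied to the first few adjusted closes from the table above (the helper `simple_returns` is a hypothetical name for illustration):

```python
def simple_returns(prices):
    """Simple returns r_t = P_t / P_{t-1} - 1.

    The first price has no predecessor, so the output is one element
    shorter, mirroring pct_change() followed by dropna().
    """
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]


# First five 'Adj Close' values from the GOOG data shown above
adj_close = [295.065887, 300.222351, 299.114563, 302.567078, 304.767792]
rets = simple_returns(adj_close)
print([f"{r:.4%}" for r in rets])
print(len(adj_close), len(rets))  # one fewer return than prices
```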

#### Calculating ACF

```python
# Autocorrelations of returns for lags 1-31, computed two ways
stk_ret_acf_1 = acf(stk_ret, nlags=31)[1:]                    # statsmodels
stk_ret_acf_2 = [stk_ret.autocorr(i) for i in range(1, 32)]   # pandas

test_df = pd.DataFrame([stk_ret_acf_1, stk_ret_acf_2]).T
test_df.columns = ['Statsmodels Autocorr', 'Pandas Autocorr']
test_df.index += 1  # label rows by lag (1-31) instead of 0-30
test_df.plot(kind='bar')
```
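The two sets of bars are close but not identical because the estimators differ slightly. With default settings, `statsmodels.tsa.stattools.acf` centres the whole series on its full-sample mean and divides by the full-sample sum of squares, whereas pandas' `Series.autocorr(lag)` is the Pearson correlation between the series and its shifted copy over the overlapping observations. A pure-Python sketch of both (hypothetical helper names; `lag` must be at least 1 for the Pearson version):

```python
import math


def acf_full_sample(x, lag):
    """statsmodels-style ACF: full-sample mean and full-sample sum of squares."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    num = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    den = sum(d * d for d in dev)
    return num / den


def autocorr_pearson(x, lag):
    """pandas-style autocorr: Pearson correlation of x[lag:] with x[:-lag] (lag >= 1)."""
    a, b = x[lag:], x[:-lag]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


x = [0.5, -1.0, 2.0, 0.3, -0.7, 1.1, -0.2, 0.9]  # illustrative values, not GOOG data
for k in (1, 2, 3):
    print(k, round(acf_full_sample(x, k), 4), round(autocorr_pearson(x, k), 4))
```

On a short trending series such as `[1, 2, 3, 4, 5]` the gap is large (0.4 versus 1.0 at lag 1); on 1930 daily returns the two estimates are much closer, as the bar chart shows.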

#### Fitting a GARCH model

```python
# arch_model defaults: constant mean, GARCH(1,1) volatility, normal errors
am = arch_model(stk_ret)
res = am.fit(update_freq=5)
print(res.summary())
```

```text
Optimization terminated successfully.    (Exit mode 0)
Current function value: -5454.19535778409
Iterations: 4
Function evaluations: 39
                     Constant Mean - GARCH Model Results
==============================================================================
Dep. Variable:              Adj Close   R-squared:                      -0.000
Mean Model:             Constant Mean   Adj. R-squared:                 -0.000
Vol Model:                      GARCH   Log-Likelihood:                5454.20
Distribution:                  Normal   AIC:                          -10900.4
Method:            Maximum Likelihood   BIC:                          -10878.1
                                        No. Observations:                 1930
Date:                Wed, Dec 05 2018   Df Residuals:                     1926
Time:                        05:33:53   Df Model:                            4
                                 Mean Model
============================================================================
                 coef    std err          t      P>|t|      95.0% Conf. Int.
----------------------------------------------------------------------------
mu         8.8455e-04  3.710e-04      2.384  1.710e-02 [1.575e-04,1.612e-03]
                              Volatility Model
============================================================================
                 coef    std err          t      P>|t|      95.0% Conf. Int.
----------------------------------------------------------------------------
omega      6.6146e-05  8.934e-06      7.404  1.319e-13 [4.864e-05,8.366e-05]
alpha[1]       0.2000  7.609e-02      2.628  8.579e-03   [5.086e-02,  0.349]
beta[1]        0.5000  7.239e-02      6.907  4.945e-12     [  0.358,  0.642]
============================================================================

Covariance estimator: robust
```
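The `Log-Likelihood` line is the quantity that differs between the forward and reversed fits. Under normal errors the log-likelihood is $\ell = \sum_{t} -\tfrac{1}{2}\left[\ln(2\pi) + \ln h_{t} + (r_{t} - m_{t})^2 / h_{t}\right]$, and AIC and BIC penalise it by the number of parameters $k$ (here $k = 4$: $\mu$, $\omega$, $\alpha$, $\beta$). The sketch below uses hypothetical helper names in pure Python; plugging in the reported log-likelihood with $k = 4$ and $n = 1930$ reproduces, after rounding, the AIC and BIC printed in the summary above.

```python
import math


def gaussian_loglik(resids, variances):
    """Sum over t of -0.5 * (log(2*pi) + log(h_t) + e_t**2 / h_t)."""
    return sum(-0.5 * (math.log(2 * math.pi) + math.log(h) + e * e / h)
               for e, h in zip(resids, variances))


def aic(loglik, k):
    """Akaike information criterion: -2 * loglik + 2k."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayesian information criterion: -2 * loglik + k * ln(n)."""
    return -2.0 * loglik + k * math.log(n)


# Values reported in the forward GARCH(1,1) summary above
llf, k, n = 5454.19535778409, 4, 1930
print(round(aic(llf, k), 1), round(bic(llf, k, n), 1))  # summary shows -10900.4 and -10878.1
```

Because both directions use the same data, number of parameters and number of observations, comparing their AIC or BIC ranks them exactly as comparing log-likelihoods does.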


#### Reversing the time-series

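```python
# Reverse the order of the returns (the dates are reversed along with the values)
stk_ret_reverse = stk_ret.iloc[::-1]
plt.plot(stk_ret_reverse.values)
plt.title('Google Returns (Reversed)')
```

*Figure: Google Returns (Reversed)*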
```python
# Autocorrelations of the reversed returns for lags 1-31, computed two ways
stk_ret_acf_1_rev = acf(stk_ret_reverse, nlags=31)[1:]                   # statsmodels
stk_ret_acf_2_rev = [stk_ret_reverse.autocorr(i) for i in range(1, 32)]  # pandas

test_df = pd.DataFrame([stk_ret_acf_1_rev, stk_ret_acf_2_rev]).T
test_df.columns = ['Statsmodels Autocorr', 'Pandas Autocorr']
test_df.index += 1  # label rows by lag (1-31) instead of 0-30
test_df.plot(kind='bar')
```
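Both estimators use exactly the same pairs of observations $(x_{t}, x_{t+k})$ whether the series is read forwards or backwards, and a Pearson correlation is symmetric in its two arguments, so the reversed ACF should match the forward one up to floating-point rounding. Autocorrelations alone therefore cannot reveal the direction of a series; a GARCH likelihood can differ because $h_{t}$ is built recursively from past residuals only. A minimal pure-Python check on made-up numbers (the data and the helper name are illustrative only):

```python
def acf_full_sample(x, lag):
    """statsmodels-style ACF: full-sample mean and full-sample sum of squares."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    num = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return num / sum(d * d for d in dev)


# Illustrative returns (not the GOOG data)
returns = [0.012, -0.004, 0.021, -0.015, 0.003, 0.008, -0.011, 0.017, -0.002, 0.006]
reversed_returns = returns[::-1]

for k in range(1, 4):
    fwd = acf_full_sample(returns, k)
    rev = acf_full_sample(reversed_returns, k)
    print(k, round(fwd, 6), round(rev, 6))  # the two columns agree
```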
```python
# Same constant-mean GARCH(1,1) model, fitted to the reversed returns
am_rev = arch_model(stk_ret_reverse)
res_rev = am_rev.fit(update_freq=5)
print(res_rev.summary())
```

```text
Iteration:      5,   Func. Count:     57,   Neg. LLF: -5434.537980550844
Iteration:     10,   Func. Count:    101,   Neg. LLF: -5435.012878187394
Optimization terminated successfully.    (Exit mode 0)
Current function value: -5435.012878206995
Iterations: 12
Function evaluations: 112
                     Constant Mean - GARCH Model Results
==============================================================================
Dep. Variable:              Adj Close   R-squared:                      -0.001
Mean Model:             Constant Mean   Adj. R-squared:                 -0.001
Vol Model:                      GARCH   Log-Likelihood:                5435.01
Distribution:                  Normal   AIC:                          -10862.0
Method:            Maximum Likelihood   BIC:                          -10839.8
                                        No. Observations:                 1930
Date:                Wed, Dec 05 2018   Df Residuals:                     1926
Time:                        05:34:03   Df Model:                            4
                                 Mean Model
============================================================================
                 coef    std err          t      P>|t|      95.0% Conf. Int.
----------------------------------------------------------------------------
mu         1.2909e-03  4.417e-04      2.923  3.468e-03 [4.253e-04,2.157e-03]
                              Volatility Model
============================================================================
                 coef    std err          t      P>|t|      95.0% Conf. Int.
----------------------------------------------------------------------------
omega      2.4191e-05  1.189e-06     20.343  5.308e-92 [2.186e-05,2.652e-05]
alpha[1]       0.1227  6.650e-02      1.845  6.497e-02  [-7.615e-03,  0.253]
beta[1]        0.7851  4.163e-02     18.859  2.464e-79     [  0.703,  0.867]
============================================================================

Covariance estimator: robust
```


So GARCH models can be estimated for both the forward and the reversed timeseries, and I'm still not sure what the quant meant. Whether or not a time series is reversed, it is just a set of returns, so it is not obvious how the fit of a GARCH model would tell you which direction is correct. The differences I can see are that the reversed series took more iterations to converge (12 versus 4), and that the p-values for $\omega$ and $\beta$ are much smaller in the reversed case, suggesting those parameters are more strongly significant. Not every parameter follows this pattern, though: $\alpha$ is only marginally significant in the reversed fit ($p \approx 0.065$), compared with $p \approx 0.009$ in the forward fit.
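One way to put the two fits side by side is to compare their log-likelihoods and the implied persistence $\alpha + \beta$, which controls how quickly a volatility shock fades: its half-life is $\ln 0.5 / \ln(\alpha + \beta)$ periods. The sketch below simply plugs in the numbers from the two summaries above; the helper `half_life` is a hypothetical name. One caveat: the forward estimates ($\alpha = 0.2000$, $\beta = 0.5000$, reached after only 4 iterations) are suspiciously round, which may mean the optimizer stopped close to its starting values. The `arch` documentation recommends scaling returns by 100 before estimation (for example `arch_model(100 * stk_ret)`) to help the optimizer converge, so refitting both directions that way may be worthwhile before reading much into the comparison.

```python
import math


def half_life(persistence):
    """Periods for a variance shock to decay by half: ln(0.5) / ln(persistence)."""
    if not 0.0 < persistence < 1.0:
        raise ValueError("persistence must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(persistence)


# (log-likelihood, omega, alpha, beta) copied from the two GARCH(1,1) summaries above
fits = {
    "forward": (5454.20, 6.6146e-05, 0.2000, 0.5000),
    "reverse": (5435.01, 2.4191e-05, 0.1227, 0.7851),
}

for name, (llf, omega, alpha, beta) in fits.items():
    persistence = alpha + beta
    long_run_vol = math.sqrt(omega / (1.0 - persistence))  # daily, as the data are daily
    print(f"{name}: log-lik {llf:.2f}, alpha+beta {persistence:.4f}, "
          f"half-life {half_life(persistence):.2f} days, long-run vol {long_run_vol:.2%}")

print("log-likelihood difference (forward - reverse):",
      round(fits["forward"][0] - fits["reverse"][0], 2))
```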

#### Fitting a GARCH-GJR model (Forward)

```python
# GJR-GARCH(1,1,1): o=1 adds an asymmetric term for negative residuals
am = arch_model(stk_ret, p=1, o=1, q=1)
res = am.fit(update_freq=5, disp='off')
print(res.summary())
```

```text
                   Constant Mean - GJR-GARCH Model Results
==============================================================================
Dep. Variable:              Adj Close   R-squared:                      -0.000
Mean Model:             Constant Mean   Adj. R-squared:                 -0.000
Vol Model:                  GJR-GARCH   Log-Likelihood:                5454.52
Distribution:                  Normal   AIC:                          -10899.0
Method:            Maximum Likelihood   BIC:                          -10871.2
                                        No. Observations:                 1930
Date:                Wed, Dec 05 2018   Df Residuals:                     1925
Time:                        05:33:16   Df Model:                            5
                                 Mean Model
============================================================================
                 coef    std err          t      P>|t|      95.0% Conf. Int.
----------------------------------------------------------------------------
mu         8.6411e-04  3.594e-04      2.404  1.621e-02 [1.596e-04,1.569e-03]
                              Volatility Model
============================================================================
                 coef    std err          t      P>|t|      95.0% Conf. Int.
----------------------------------------------------------------------------
omega      6.6234e-05  1.033e-05      6.415  1.412e-10 [4.600e-05,8.647e-05]
alpha[1]       0.2000      0.137      1.461      0.144  [-6.830e-02,  0.468]
gamma[1]       0.0500      0.144      0.347      0.729     [ -0.233,  0.333]
beta[1]        0.4750  9.420e-02      5.043  4.591e-07     [  0.290,  0.660]
============================================================================

Covariance estimator: robust
```
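The GJR-GARCH model (`o=1` in `arch_model`) adds an asymmetric term so that negative residuals can raise next-period variance more than positive ones of the same size: $h_{t+1} = \omega + (\alpha + \gamma I_{t})(r_{t} - m_{t})^2 + \beta h_{t}$, where $I_{t} = 1$ if $r_{t} - m_{t} < 0$ and $0$ otherwise. A minimal pure-Python sketch of one update step, using the forward estimates above for illustration (the helper name and the current variance are assumptions):

```python
def gjr_garch_step(resid, h, omega, alpha, gamma, beta):
    """One GJR-GARCH(1,1,1) update:
    h_next = omega + (alpha + gamma * [resid < 0]) * resid**2 + beta * h
    """
    indicator = 1.0 if resid < 0 else 0.0
    return omega + (alpha + gamma * indicator) * resid * resid + beta * h


# Forward GJR-GARCH estimates from the summary above
omega, alpha, gamma, beta = 6.6234e-05, 0.2000, 0.0500, 0.4750
h = 2.0e-4  # assumed current variance, for illustration only
print("after a +2% shock:", gjr_garch_step(0.02, h, omega, alpha, gamma, beta))
print("after a -2% shock:", gjr_garch_step(-0.02, h, omega, alpha, gamma, beta))
```

The two printed variances differ by exactly $\gamma \times 0.02^2$; with $\gamma$ insignificant in the forward fit ($p = 0.729$), that asymmetry is not statistically distinguishable from zero here.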


#### Fitting a GARCH-GJR model (Reverse)

```python
# Same GJR-GARCH(1,1,1) model, fitted to the reversed returns
am = arch_model(stk_ret_reverse, p=1, o=1, q=1)
res = am.fit(update_freq=5, disp='off')
print(res.summary())
```

```text
                   Constant Mean - GJR-GARCH Model Results
==============================================================================
Dep. Variable:              Adj Close   R-squared:                      -0.000
Mean Model:             Constant Mean   Adj. R-squared:                 -0.000
Vol Model:                  GJR-GARCH   Log-Likelihood:                5439.29
Distribution:                  Normal   AIC:                          -10868.6
Method:            Maximum Likelihood   BIC:                          -10840.7
                                        No. Observations:                 1930
Date:                Wed, Dec 05 2018   Df Residuals:                     1925
Time:                        05:33:20   Df Model:                            5
                                 Mean Model
============================================================================
                 coef    std err          t      P>|t|      95.0% Conf. Int.
----------------------------------------------------------------------------
mu         1.1064e-03  3.408e-04      3.247  1.168e-03 [4.385e-04,1.774e-03]
                              Volatility Model
=============================================================================
                 coef    std err          t      P>|t|       95.0% Conf. Int.
-----------------------------------------------------------------------------
omega      2.2049e-05  9.655e-12  2.284e+06      0.000  [2.205e-05,2.205e-05]
alpha[1]       0.0100  1.187e-02      0.842      0.400 [-1.327e-02,3.327e-02]
gamma[1]       0.1000  5.319e-02      1.880  6.012e-02   [-4.260e-03,  0.204]
beta[1]        0.8400  2.399e-02     35.019 1.140e-268      [  0.793,  0.887]
=============================================================================

Covariance estimator: robust
```