Model validation for time series regression models

This is an overview of the diagnostic and performance tests that need to be performed to ensure the validity of a time-series ARIMAX regression model.

\begin{equation*} R_{t} = \mu + \sum^{p}_{i=1} \alpha_{i} R_{t-i} + \sum^{q}_{j=1} \beta_{j} \epsilon_{t-j} + \sum^{r}_{k=1} \gamma_{k} X_{t,k} + \epsilon_{t} \text{, where } \epsilon_{t} \sim N(0,\sigma^{2}) \end{equation*}

In the above equation, $R_{t}$ is the differenced original series. For example, for stock prices $P_{t}$, first-order differencing gives $R_{t} = P_{t} - P_{t-1}$.

A time series $X_{t}$ is integrated of order $d$ if $(1-L)^{d} X_{t}$ is a stationary process, where $L$ is the lag operator and $1-L$ is the first difference (i.e., $(1-L)X_{t}=X_{t}-X_{t-1}$). Writing $\Delta = 1-L$, a series integrated of order $d=2$ becomes stationary after taking the second difference:

\begin{equation*} \Delta^{2} X_{t} = \Delta X_{t} - \Delta X_{t-1} = X_{t} - 2X_{t-1} + X_{t-2} \end{equation*}
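A minimal NumPy sketch (NumPy is an assumption; the post does not name any software) checking that differencing twice agrees with the expanded lag polynomial $(1-L)^{2} X_{t} = X_{t} - 2X_{t-1} + X_{t-2}$. The price values are made up for illustration.

```python
import numpy as np

# Hypothetical price levels with a quadratic trend.
x = np.array([100.0, 102.0, 105.0, 109.0, 114.0, 120.0])

# First difference: (1 - L) x_t = x_t - x_{t-1}
d1 = np.diff(x)  # 2, 3, 4, 5, 6

# Second difference: (1 - L)^2 x_t
d2 = np.diff(x, n=2)  # 1, 1, 1, 1

# The same result written term by term: x_t - 2 x_{t-1} + x_{t-2}
d2_manual = x[2:] - 2 * x[1:-1] + x[:-2]

print(d1)
print(d2)
print(np.allclose(d2, d2_manual))
```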

The values of $p$ and $q$ are the orders of the autoregressive (AR) and moving average (MA) terms. The value of $r$ denotes the number of independent variable (X) terms included in the model.

The assumptions of the ARIMAX regression model are:

  1. Stationarity

    The stochastic process underlying the time series is time invariant (i.e., constant mean and variance through time). Stationarity should be tested for both the independent and dependent variables, because regressing one non-stationary variable on another can produce a spurious regression.

  2. Normality

    The error term is normally distributed.

  3. Independence

    The error terms are statistically independent (i.e., not serially correlated).

  4. Homoscedasticity

    The error term has constant variance for all observations.

  5. Lack of multicollinearity

    There is no excessive correlation between the independent variables.

Data Diagnostics

The data used for modelling should be evaluated for the following:

  1. Compliance with relevant regulatory requirements

    These requirements often cover the minimum data length for different types of portfolios, ensuring the data period is representative of the economic cycle, and the use of data proxies (e.g., BCC13-5 [Conservatism to risk parameters in Advanced Approaches], BCC14-3 [Selection of reference data periods and data deficiencies]).

  2. Outliers, missing or special values.

    Outliers and influential data points should be identified (e.g., using Cook's distance), and model performance should be evaluated with these points excluded.

Model identification

These tests evaluate how well a regression model fits the data. They include formal regression statistics and descriptive fit statistics, which together assess the statistical significance of the independent variables individually and jointly.



Auto-correlation (ACF)

Auto-correlation describes the dependence (i.e., relationship) between the current observation and the observation at a prior time step. The dependence captured by the ACF includes both direct and indirect effects (i.e., effects passed through the intermediate lags). To choose the order of the MA term, look for the lag after which the spikes in the ACF plot cut off.

Partial auto-correlation (PACF)

Partial auto-correlation describes only the direct dependence between an observation and its lag. The partial autocorrelation at lag $k$ is the correlation that remains after removing the effect of the correlations at shorter lags. To choose the order of the AR term, look for the lag after which the spikes in the PACF plot cut off.


If the ACF values decay to zero quickly, the series can be considered stationary. If the ACF values decrease slowly or oscillate, the series may be non-stationary, and transformations may need to be applied to produce a stationary series (e.g., first or second differencing, log transformation).


If there are spikes in the ACF/PACF plot at regular intervals (e.g., lags 12, 24, 36, ... for monthly data; lags 4, 8, 12, ... for quarterly data), the series is seasonal. Transformations may need to be applied to remove the seasonal effect (e.g., seasonal differencing).
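A minimal pandas sketch of seasonal differencing on a hypothetical monthly series built from a linear trend and a fixed 12-month seasonal pattern (pandas and the series are assumptions). Differencing at lag 12, $Y_{t} - Y_{t-12}$, cancels the seasonal pattern; for quarterly data the lag would be 4 (`diff(4)`).

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: linear trend plus a fixed 12-month seasonal pattern.
months = pd.date_range("2015-01-01", periods=60, freq="MS")
t = np.arange(60)
seasonal = 10 * np.sin(2 * np.pi * t / 12)
y = pd.Series(0.5 * t + seasonal, index=months)

# Seasonal differencing at lag 12: y_t - y_{t-12}
y_sdiff = y.diff(12).dropna()

# The seasonal pattern cancels; what remains is the trend's 12-month change, 12 * 0.5 = 6.
print(y_sdiff.round(6).unique())
```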


Strictly stationary

A time series whose joint distribution does not change when shifted in time, so its statistical properties (e.g., mean, variance, autocorrelation) are all constant over time.

Trend stationary

A time series that has no unit root but exhibits a deterministic trend. Once the trend is removed, the series is stationary.

Difference stationary

A time series that has a unit root and can be made stationary by differencing.



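Augmented Dickey-Fuller (ADF)

Tests whether a time series has a unit root, i.e., whether it must be differenced to become stationary. When a deterministic trend is included in the test regression, the alternative hypothesis is trend stationarity. The test is parametric: it handles serial correlation by adding lagged differences to the regression, so the number of lags must be selected. The null hypothesis is that the process has a unit root.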

Phillips-Perron (PP)

Like the ADF test, tests the null hypothesis that the process has a unit root. It is non-parametric: instead of adding lagged differences, it corrects the test statistic for autocorrelation and heteroscedasticity (i.e., HAC-type corrections). It requires large datasets.

Kwiatkowski-Phillips-Schmidt-Shin (KPSS)

Tests whether a time series is stationary around a constant level or a deterministic trend, reversing the hypotheses of the ADF and PP tests. The null hypothesis is that the process is level or trend stationary; the alternative is that it has a unit root.


Although regressing one non-stationary variable on another generally leads to a spurious regression, it is acceptable if the variables are co-integrated. Two non-stationary processes $X_1$ and $X_2$ of the same order of integration are co-integrated if there is a vector (the co-integration vector) that combines them into a stationary process, e.g., $X_1 - \beta X_2$ is stationary for some $\beta$. In other words, $X_1$ and $X_2$ share the same stochastic trend, which the co-integration vector cancels out.

It is possible to difference these non-stationary variables instead, but doing so often discards information about their long-run relationship. Thus, when the variables are co-integrated, regressing them in levels may be preferable.




Engle-Granger test

First, each time series is tested for a unit root (i.e., non-stationarity) using an ADF test; the series must be of the same order of integration. If they have unit roots, one series is regressed on the other by OLS and the residuals are saved. The residuals are then tested for a unit root with an ADF test; if they are stationary, the original series are co-integrated.


Phillips-Ouliaris test

This test improves on the Engle-Granger test by accounting for the variability in the residuals, which are estimates rather than true values. It is also invariant to the normalization of the cointegration relationship (i.e., to which variable is treated as the dependent variable).

Johansen test

The Johansen test improves on the Engle-Granger test in two ways: it avoids having to choose a dependent variable, and it avoids carrying estimation errors from one step to the next (the Engle-Granger test uses residuals estimated in the first step as data in the second). The Johansen test can also detect multiple cointegrating vectors.

Structural breaks



Chow test

The null hypothesis is that there is no structural break in the data. The break point is chosen on a graphical or theoretical basis, the data are split into two samples at that point, and a regression is run on each sample. The Chow test is an F-test of whether the model parameters from the two samples are statistically equal. Evidence of a structural break means that the model may need to be estimated using different specifications (e.g., spline functions) or data (e.g., data subsets, data exclusions).
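A minimal sketch of the Chow test with NumPy and SciPy, assuming a linear regression and a break point chosen in advance (the helper `chow_test` and the simulated data are hypothetical). With $k$ parameters, pooled residual sum of squares $S_{p}$ and sub-sample sums $S_{1}$ and $S_{2}$, the statistic $F = \frac{(S_{p} - S_{1} - S_{2})/k}{(S_{1} + S_{2})/(n - 2k)}$ follows an $F(k, n-2k)$ distribution under the null hypothesis, assuming equal error variances in the two sub-samples.

```python
import numpy as np
from scipy import stats


def chow_test(y, X, split):
    """Hypothetical helper: Chow F-test for a break before observation `split`."""

    def ssr(y_part, X_part):
        # Residual sum of squares from an OLS fit.
        beta, *_ = np.linalg.lstsq(X_part, y_part, rcond=None)
        resid = y_part - X_part @ beta
        return float(resid @ resid)

    n, k = X.shape
    s_pooled = ssr(y, X)
    s_split = ssr(y[:split], X[:split]) + ssr(y[split:], X[split:])
    f_stat = ((s_pooled - s_split) / k) / (s_split / (n - 2 * k))
    p_value = stats.f.sf(f_stat, k, n - 2 * k)
    return f_stat, p_value


rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# Hypothetical break: the slope changes from 1.0 to 2.0 at observation 100.
slope = np.where(np.arange(n) < 100, 1.0, 2.0)
y = 0.5 + slope * x + rng.normal(scale=0.5, size=n)

f_stat, p_value = chow_test(y, X, split=100)
print(f"Chow F = {f_stat:.2f}, p-value = {p_value:.4g}")
```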

