Model validation for logistic regression models
This is an overview of the diagnostic and performance tests needed to ensure the validity of a logistic regression model. Logistic regression measures the relationship between a binary dependent variable and one or more continuous or categorical independent variables by estimating probabilities. It does so via the logit link function.
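For reference, the standard form of the logit link with $k$ predictors is sketched below. The symbols $\beta_j$ and $X_j$ are notation assumed here for illustration; they do not appear in the original text.

$$
\operatorname{logit}(P) = \ln\!\left(\frac{P}{1-P}\right) = \beta_0 + \sum_{j=1}^{k} \beta_j X_j
\quad\Longleftrightarrow\quad
P = \frac{1}{1 + e^{-\left(\beta_0 + \sum_{j=1}^{k} \beta_j X_j\right)}}
$$

Under this form, $e^{\beta_j}$ is the odds ratio for a one-unit increase in $X_j$ with the other predictors held fixed.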
The logit link function ensures that $P$ is bounded by 0 and 1. The logit link is often preferred over the probit link because its results are easier to interpret: the exponentiated coefficients are adjusted odds ratios.
Variable analysis
Variable contribution testing

| Test | Description |
| --- | --- |
| Event Rate vs Predictor Plot | |
| Discriminatory power of predictor | |
| Weight of Evidence | |
| Information Value | |
| Wald Test | |
| Dropout/Switchout Test | |

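As an illustration of two of the tests listed above, here is a minimal sketch of Weight of Evidence (WoE) and Information Value (IV) for a single binned predictor. The function name `woe_iv`, the input layout, and the sign convention (log of the non-event share over the event share) are assumptions made for this example; some texts use the opposite sign.

```python
import math

def woe_iv(bins):
    """Compute WoE per bin and the total IV for one binned predictor.

    bins: list of (n_events, n_non_events) tuples, one per bin.
    Assumes every bin has at least one event and one non-event
    (zero counts would need smoothing, which is not handled here).
    """
    total_events = sum(events for events, _ in bins)
    total_non_events = sum(non_events for _, non_events in bins)
    woes = []
    iv = 0.0
    for events, non_events in bins:
        pct_events = events / total_events              # share of all events in this bin
        pct_non_events = non_events / total_non_events  # share of all non-events in this bin
        woe = math.log(pct_non_events / pct_events)
        woes.append(woe)
        iv += (pct_non_events - pct_events) * woe
    return woes, iv

# Hypothetical counts: the second bin has a much higher event rate.
woes, iv = woe_iv([(10, 90), (30, 70)])
print(woes, iv)
```

A predictor whose bins all have the same event rate yields WoE values of zero and an IV of zero, which is one way to read a low IV as weak discriminatory power.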
Discriminatory power

| Test | Description |
| --- | --- |
| Receiver Operating Characteristic (ROC) Curve | |
| Somers' D | |
| Kolmogorov-Smirnov (KS) Test | |

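The three measures listed above are closely related, and a small sketch can make that concrete. The function below is illustrative only (its name and inputs are assumptions): it computes the area under the ROC curve by pairwise comparison, derives Somers' D as `2 * AUC - 1` (valid for a binary outcome), and takes the KS statistic as the largest gap between the empirical score distributions of the two classes.

```python
def auc_somers_ks(scores, labels):
    """Return (AUC, Somers' D, KS statistic) for binary labels (1 = event)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]

    # AUC: probability that a random event scores above a random non-event,
    # counting ties as one half.
    concordant = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = concordant / (len(pos) * len(neg))

    # Somers' D (also called the Gini coefficient in credit scoring).
    somers_d = 2 * auc - 1

    # KS: maximum distance between the two empirical CDFs of the scores.
    ks = 0.0
    for threshold in sorted(set(scores)):
        cdf_pos = sum(s <= threshold for s in pos) / len(pos)
        cdf_neg = sum(s <= threshold for s in neg) / len(neg)
        ks = max(ks, abs(cdf_pos - cdf_neg))
    return auc, somers_d, ks

# Hypothetical scores: one event is ranked below one non-event.
print(auc_somers_ks([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # (0.75, 0.5, 0.5)
```

The pairwise loop is quadratic in the sample size, so it is meant for small illustrative samples rather than production use.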
Predictive Accuracy testing

| Test | Description |
| --- | --- |
| Predicted versus Actual Plot | |
| Hosmer-Lemeshow test | |
| Mean Absolute Deviation | |

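As a rough sketch of how the Hosmer-Lemeshow statistic is assembled, the function below sorts observations by predicted probability, splits them into groups, and compares observed with expected event counts in each group. The function name and the equal-size grouping rule are assumptions for this example; the resulting statistic is conventionally compared against a chi-square distribution with `groups - 2` degrees of freedom, which is not computed here.

```python
def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow chi-square statistic.

    probs: predicted event probabilities; outcomes: 0/1 actual values.
    Assumes no group has a mean predicted probability of exactly 0 or 1.
    """
    pairs = sorted(zip(probs, outcomes))  # order by predicted probability
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # actual number of events
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1 - mean_p))
    return stat

# Hypothetical data: two risk groups where observed rates match predictions.
probs = [0.2] * 10 + [0.8] * 10
calibrated = [1, 1] + [0] * 8 + [1] * 8 + [0, 0]
print(hosmer_lemeshow(probs, calibrated, groups=2))  # close to 0
```

A value near zero indicates that predicted and actual event counts agree within each group; large values point to poor calibration.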
Multicollinearity

| Test | Description |
| --- | --- |
| Correlation assessment | |

Data Diagnostics
The data used for modelling should be evaluated for the following:
 Compliance with relevant regulatory requirements.

These requirements often cover the minimum data length for different portfolio types, whether the data period is representative of the economic cycle, and the use of data proxies (e.g., BCC 13-5 [Conservatism to risk parameters in Advanced Approaches], BCC 14-3 [Selection of reference data periods and data deficiencies]).
 Outliers, missing or special values.

Outliers or influential data points should be identified (e.g., using Cook's distance), and model performance should be evaluated with these points excluded.
 Binning methodology.

The binning criteria need to be supported both economically and statistically. When binning data, check that the distributions of the bins are significantly different from each other (e.g., using KS tests), since having too many bins is not ideal. Excessive binning can lead to overfitting because binning linearizes the nonlinearities in continuous variables, which strengthens the apparent relationship with the dependent variable.
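To illustrate the influential-point check mentioned in the list above, here is a minimal sketch of Cook's distance. For simplicity it uses ordinary least squares with a single predictor, which is an assumption of this example; analogous diagnostics exist for logistic regression but are not shown. The function name and the data are hypothetical.

```python
def cooks_distance(x, y):
    """Cook's distance for each point in a one-predictor OLS fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    p = 2  # estimated parameters: intercept and slope
    mse = sum(e * e for e in resid) / (n - p)
    # Leverage of each point (diagonal of the hat matrix).
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e * e / (p * mse) * h / (1 - h) ** 2
            for e, h in zip(resid, leverage)]

# Hypothetical data: y = 2x except for the last point, which is far off.
x = list(range(1, 11))
y = [2 * xi for xi in x]
y[-1] = 60
distances = cooks_distance(x, y)
flagged = [i for i, d in enumerate(distances) if d > 4 / len(x)]
print(flagged)
```

The `4 / n` cutoff used here is a common rule of thumb rather than a formal test; flagged points are candidates for the exclusion analysis described above.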

| Test | Description |
| --- | --- |
| Autocorrelation (ACF) | Autocorrelation describes the dependence (i.e., relationship) between a prior time step and the current observation. The dependence captured by the ACF includes both direct and indirect dependence. To decide the number of lags for the MA term, look at the spikes in the ACF plot. |
| Partial autocorrelation (PACF) | Partial autocorrelation describes only the direct dependence between an observation and its lag. The partial autocorrelation at lag $k$ is the correlation that remains after removing the effect of any correlations due to the terms at shorter lags. To decide the number of lags for the AR term, look at the spikes in the PACF plot. |
| Stationarity | If the ACF/PACF values decay to zero quickly, the series can be considered stationary. If the ACF/PACF values decrease slowly or oscillate, the series may be nonstationary, and transformations may need to be applied to produce a stationary series (e.g., first/second differencing, log transformation). |
| Seasonality | If there are spikes in the ACF/PACF plot at regular intervals (e.g., lags 12, 24, 36, ... for monthly data; lags 4, 8, 12, ... for quarterly data), there is seasonality. Transformations may need to be applied to remove the seasonal effect (e.g., seasonal differencing). |

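The sample autocorrelation behind an ACF plot can be sketched in a few lines. The function name `acf` and the toy series are assumptions for this example; the series repeats every four observations, mimicking quarterly seasonality.

```python
def acf(series, max_lag):
    """Sample autocorrelation at lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares (normalizer)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]

# Hypothetical quarterly pattern repeated six times.
series = [1, 2, 3, 4] * 6
values = acf(series, max_lag=8)
band = 1.96 / len(series) ** 0.5  # approximate 95% band for white noise
for lag, value in enumerate(values):
    marker = "*" if abs(value) > band else ""
    print(lag, round(value, 3), marker)
```

In this toy series the autocorrelation spikes at lags 4 and 8, which is the regular-interval pattern the seasonality row describes.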
Stationarity
 Strictly stationary.

A time series whose statistical properties (mean, variance, autocorrelation, etc.) are all constant over time.
 Trend stationary.

A time series that has no unit root but exhibits a trend. Once the trend is removed, a trend stationary series becomes strictly stationary.
 Difference stationary.

A time series that can be made stationary via differencing.

| Test | Description |
| --- | --- |
| Augmented Dickey-Fuller (ADF) | Tests whether a time series can be made strictly stationary (or trend stationary) via differencing (removing the trend). It is parametric and requires selecting the lag order used to capture serial correlation. The null hypothesis is that the process has a unit root. |
| Phillips-Perron (PP) | Tests whether a time series can be made strictly stationary via differencing. It is nonparametric and improves on the ADF test by correcting for autocorrelation and heteroscedasticity (i.e., HAC-type corrections). It requires large datasets. |
| Kwiatkowski-Phillips-Schmidt-Shin (KPSS) | Tests whether a time series has a unit root, with the hypotheses reversed relative to the ADF test. The null hypothesis is that the process is stationary (level stationary or trend stationary, depending on the specification), and the alternative is a unit root. |

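To show the mechanics behind a unit root test, here is a simplified Dickey-Fuller sketch. It is not the augmented version: it fits only a constant and the lagged level, with no lagged differences, so it ignores serial correlation. The function name, seed, and simulated series are assumptions for this example.

```python
import random

def dickey_fuller_t(series):
    """t-statistic on rho in: diff(y)_t = alpha + rho * y_(t-1) + e_t.

    Under the unit-root null, rho = 0. The statistic does not follow a
    standard t distribution; the asymptotic 5% critical value for this
    constant-only specification is about -2.86.
    """
    lagged = series[:-1]
    diff = [b - a for a, b in zip(series[:-1], series[1:])]
    n = len(diff)
    x_bar = sum(lagged) / n
    d_bar = sum(diff) / n
    sxx = sum((x - x_bar) ** 2 for x in lagged)
    rho = sum((x - x_bar) * (d - d_bar) for x, d in zip(lagged, diff)) / sxx
    alpha = d_bar - rho * x_bar
    rss = sum((d - alpha - rho * x) ** 2 for x, d in zip(lagged, diff))
    se_rho = (rss / (n - 2) / sxx) ** 0.5
    return rho / se_rho

random.seed(0)
noise = [random.gauss(0, 1) for _ in range(500)]  # stationary white noise
walk = []                                         # random walk (unit root)
level = 0.0
for shock in noise:
    level += shock
    walk.append(level)

print("white noise:", dickey_fuller_t(noise))
print("random walk:", dickey_fuller_t(walk))
```

A statistic well below the critical value rejects the unit-root null, as happens for the white-noise series. In practice a library implementation with lag selection would be used instead of this sketch.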
Cointegration
Although regressing two nonstationary variables against each other generally leads to spurious regressions, it is acceptable to do so if the variables are cointegrated. When two nonstationary processes (e.g., $X_1$ and $X_2$) are cointegrated, there is a vector (i.e., the cointegration vector) that combines them into a stationary process. In other words, the stochastic trends in $X_1$ and $X_2$ are the same and can be cancelled out using the cointegration vector.
It is possible to difference these nonstationary variables, but doing so often results in a loss of information about their long-run relationship. Thus, regressing two nonstationary variables that are cointegrated may be preferable.

| Test | Description |
| --- | --- |
| Engle-Granger | Tests each time series for a unit root (i.e., nonstationarity) using an ADF test. If the time series have a unit root, an OLS regression is run between them to obtain the residuals. The residuals are then tested using ADF; if they are stationary, the original time series are cointegrated. The time series being considered must be of the same order of integration. |
| Phillips-Ouliaris | This is an improvement on the Engle-Granger test that accounts for the variability in the residuals, since they are estimates rather than actual parameter values. It is also invariant to the normalization of the cointegration relationship. |
| Johansen test | This is an improvement on the Engle-Granger test in that it avoids both the need to choose a dependent variable and the carry-over of errors from one step to the next. The Johansen test can detect multiple cointegrating vectors. |

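The two steps of the Engle-Granger procedure can be sketched as follows. This is a simplified illustration under stated assumptions: the residual test is a plain Dickey-Fuller regression without lag augmentation, the initial unit root checks on each series are skipped, and the function names, seed, and simulated data are hypothetical.

```python
import random

def ols(x, y):
    """Intercept and slope of a one-predictor least-squares fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope

def engle_granger_t(x1, x2):
    """Step 1: regress x2 on x1. Step 2: Dickey-Fuller t-stat on residuals."""
    intercept, slope = ols(x1, x2)
    resid = [b - intercept - slope * a for a, b in zip(x1, x2)]
    # Residual regression without a constant: diff(e)_t = rho * e_(t-1) + u_t
    lagged = resid[:-1]
    diff = [b - a for a, b in zip(resid[:-1], resid[1:])]
    sxx = sum(e * e for e in lagged)
    rho = sum(e * d for e, d in zip(lagged, diff)) / sxx
    rss = sum((d - rho * e) ** 2 for e, d in zip(lagged, diff))
    se_rho = (rss / (len(diff) - 1) / sxx) ** 0.5
    return slope, rho / se_rho

random.seed(1)
x1 = []
level = 0.0
for _ in range(300):
    level += random.gauss(0, 1)
    x1.append(level)                                 # random walk
x2 = [2 * v + random.gauss(0, 0.5) for v in x1]      # shares x1's stochastic trend

slope, t_stat = engle_granger_t(x1, x2)
print(slope, t_stat)
```

Because the residuals are estimated, the statistic is compared against cointegration critical values rather than the usual Dickey-Fuller ones; for two variables with a constant, the asymptotic 5% value is roughly -3.34. The estimated slope approximates the cointegration vector that cancels the shared trend.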
Structural breaks

| Test | Description |
| --- | --- |
| Chow Test | The null hypothesis is that there is no structural break in the data. On a graphical or theoretical basis, the data is split into two samples and a regression is run on each sample. The Chow test is used to evaluate whether the model parameters from the two samples are statistically similar. Evidence of a structural break means that the model may need to be estimated using a different specification (e.g., spline functions) or different data (e.g., data subsets, data exclusions). |

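A minimal sketch of the Chow F-statistic for a one-predictor linear model follows. The function names, the split point, and the data are assumptions for this example; the statistic is compared against an F distribution with `k` and `n - 2k` degrees of freedom, which is not computed here.

```python
def rss_line(x, y):
    """Residual sum of squares from a one-predictor least-squares fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

def chow_f(x, y, split, k=2):
    """Chow F-statistic for a break at index `split`; k = parameters per fit."""
    rss_pooled = rss_line(x, y)
    rss_split = rss_line(x[:split], y[:split]) + rss_line(x[split:], y[split:])
    n = len(x)
    return ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))

# Hypothetical data with a small alternating disturbance.
x = list(range(40))
noise = [0.1 * (-1) ** i for i in x]
y_stable = [xi + e for xi, e in zip(x, noise)]
# Slope changes from 1 to 3 at x = 20 (the line stays continuous there).
y_break = [(xi if xi < 20 else 3 * xi - 40) + e for xi, e in zip(x, noise)]

print("no break:", chow_f(x, y_stable, split=20))
print("break:   ", chow_f(x, y_break, split=20))
```

A large statistic means the pooled fit is much worse than the two separate fits, which is evidence against the null hypothesis of no structural break.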