Model validation for logistic regression models

This is an overview of the diagnostic and performance tests needed to establish the validity of a logistic regression model. Logistic regression models the relationship between a binary dependent variable and one or more continuous or categorical independent variables by estimating the probability of the event. It does so via the logit link function.

\begin{equation*} \ln \left( \frac{P}{1-P} \right) = \beta_{0} + \beta_{1} x_{1} + \beta_{2} x_{2} + \ldots + \beta_{n} x_{n} \end{equation*}

The logit link function ensures that $P$ is bounded between 0 and 1. The logit model is often preferred to the probit model because its results are easier to interpret: each exponentiated coefficient is an adjusted odds ratio.
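The odds-ratio interpretation can be checked numerically. The sketch below is only an illustration: the coefficients `beta_0` and `beta_1` are hypothetical values picked for the example, not estimates from any model discussed here.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Map any real-valued linear predictor z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)

# Hypothetical coefficients, chosen only for illustration.
beta_0, beta_1 = -2.0, 0.7

def predicted_probability(x_1):
    """P from the logit equation with a single predictor x_1."""
    return inverse_logit(beta_0 + beta_1 * x_1)

p_at_1 = predicted_probability(1.0)
p_at_2 = predicted_probability(2.0)

# A one-unit increase in x_1 multiplies the odds by exp(beta_1).
odds_ratio = odds(p_at_2) / odds(p_at_1)
print(p_at_1, p_at_2, odds_ratio, math.exp(beta_1))
```

Whatever value the linear predictor takes, `inverse_logit` returns a number strictly between 0 and 1, and the ratio of the odds at `x_1 = 2` and `x_1 = 1` equals `exp(beta_1)`.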

Variable analysis

Variable contribution testing



Event Rate vs Predictor Plot

Discriminatory power of predictor

Weight of Evidence

Information Value

Wald Test

Dropout/Switchout Test

Discriminatory power



Receiver Operating Characteristic (ROC) Curve

Somers' D

Kolmogorov-Smirnov (KS) Test

Predictive Accuracy testing



Predicted versus Actual Plot

Hosmer-Lemeshow test

Mean Absolute Deviation

Performance testing

.. list-table::
    :widths: 10 50
    :header-rows: 1
    :align: center

    * - Test
      - Description
    * - Out-of-sample testing
      -
    * - Cross validation
      -
    * - Sensitivity analysis
      -
    * - Error attribution analysis
      -




Correlation assessment

Data Diagnostics

The data used for modelling should be evaluated for the following:

  1. Compliance with relevant regulatory requirements

    These requirements often cover the minimum data length for different types of portfolios, whether the data period is representative of the economic cycle, and the use of data proxies (e.g., BCC13-5 [Conservatism to risk parameters in Advanced Approaches], BCC14-3 [Selection of reference data periods and data deficiencies]).

  2. Outliers, missing or special values.

    Outliers or influential data points should be identified (e.g., using Cook's distance), and model performance should be evaluated with these points excluded.

  3. Binning methodology.

    The criteria for binning data need to be supported economically and statistically. When data are binned, check that the distributions in the bins differ significantly from one another (e.g., using KS tests), since it is not ideal to have too many bins. Excessive binning can lead to overfitting: binning linearizes the nonlinearities in continuous variables, which strengthens the apparent relationship with the dependent variable.
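As a rough illustration of the bin comparison in item 3, the two-sample KS statistic is the largest gap between two empirical distribution functions. The sketch below computes it with NumPy on simulated data; the three "bins" are hypothetical samples generated for the example, and a significance level would additionally require a p-value (for instance from `scipy.stats.ks_2samp`).

```python
import numpy as np

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a = np.sort(np.asarray(sample_a, dtype=float))
    b = np.sort(np.asarray(sample_b, dtype=float))
    grid = np.concatenate([a, b])  # the gap can only change at observed points
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

# Hypothetical values of some quantity observed in three candidate bins.
rng = np.random.default_rng(0)
bin_low = rng.normal(0.0, 1.0, 500)
bin_similar = rng.normal(0.02, 1.0, 500)  # nearly the same distribution as bin_low
bin_high = rng.normal(1.0, 1.0, 500)      # clearly shifted distribution

ks_similar = ks_statistic(bin_low, bin_similar)
ks_distinct = ks_statistic(bin_low, bin_high)
print(ks_similar, ks_distinct)
```

A small statistic for two adjacent bins suggests they carry similar information and could be candidates for merging.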



Auto-correlation (ACF)

Auto-correlation describes the dependence (i.e., relationship) between the current observation and the observation at a prior time step. The dependence captured by the ACF includes both direct and indirect dependence. To decide the number of lags for the MA term, look at the spikes in the ACF plot.

Partial auto-correlation (PACF)

Partial auto-correlation describes only the direct dependence between an observation and its lag. The partial autocorrelation at lag $k$ is the correlation that results after removing the effect of any correlations due to the terms at shorter lags. To decide the number of lags for the AR term, look at the spikes in the PACF plot.
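The difference between the two measures shows up clearly on a simulated AR(1) series. The sketch below is a minimal NumPy version, assuming the standard sample autocorrelation and the Durbin-Levinson recursion for the partial autocorrelation; the function names and the coefficient 0.7 are choices made for this example.

```python
import numpy as np

def sample_acf(series, max_lag):
    """Sample autocorrelation at lags 0..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    acf = np.ones(max_lag + 1)
    for k in range(1, max_lag + 1):
        acf[k] = np.dot(x[:-k], x[k:]) / denom
    return acf

def sample_pacf(series, max_lag):
    """Partial autocorrelation at lags 0..max_lag (max_lag >= 1), Durbin-Levinson."""
    rho = sample_acf(series, max_lag)
    pacf = np.ones(max_lag + 1)
    pacf[1] = rho[1]
    phi_prev = np.array([rho[1]])  # AR coefficients of the order k-1 fit
    for k in range(2, max_lag + 1):
        num = rho[k] - np.dot(phi_prev, rho[k - 1:0:-1])
        den = 1.0 - np.dot(phi_prev, rho[1:k])
        phi_kk = num / den
        phi_prev = np.concatenate([phi_prev - phi_kk * phi_prev[::-1], [phi_kk]])
        pacf[k] = phi_kk
    return pacf

# Simulated AR(1) series: y_t = 0.7 * y_{t-1} + e_t (hypothetical example data).
rng = np.random.default_rng(0)
n = 2000
noise = rng.normal(size=n)
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.7 * ar1[t - 1] + noise[t]

acf = sample_acf(ar1, 5)
pacf = sample_pacf(ar1, 5)
print(np.round(acf, 2))
print(np.round(pacf, 2))
```

For this series the ACF decays gradually (roughly 0.7, 0.49, ...) because it includes indirect dependence, while the PACF is close to zero beyond lag 1, which points to a single AR term.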


If the ACF/PACF values decay to zero quickly, the series can be considered stationary. If the ACF/PACF values decrease slowly or oscillate, the series may be non-stationary and transformations may need to be applied to produce a stationary series (e.g., first/second differencing, log transformation).


If there are spikes in the autocorrelation values of the ACF/PACF plot at regular intervals (e.g., lags 12, 24, 36, ... for monthly data; lags 4, 8, 12, ... for quarterly data), there is seasonality. Transformations may need to be applied to remove the seasonal effect (e.g., seasonal differencing).
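Seasonal differencing subtracts from each observation the value one full season earlier. A minimal sketch, assuming monthly data with a period of 12; the series itself is simulated for the example.

```python
import numpy as np

def seasonal_difference(series, period):
    """Return x_t - x_{t-period}; the result is `period` observations shorter."""
    x = np.asarray(series, dtype=float)
    return x[period:] - x[:-period]

# Hypothetical monthly series: a repeating 12-month pattern plus noise.
rng = np.random.default_rng(3)
pattern = 10.0 * np.sin(2 * np.pi * np.arange(12) / 12)
monthly = np.tile(pattern, 10) + rng.normal(0.0, 1.0, 120)

deseasonalized = seasonal_difference(monthly, 12)
print(monthly.std(), deseasonalized.std())
```

The repeating pattern cancels exactly, so only the differenced noise remains.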


Strict stationary

A time series whose statistical properties (mean, variance, autocorrelation, etc.) are all constant over time.

Trend stationary

A time series that has no unit root but exhibits a deterministic trend. Once the trend is removed, the series becomes strict stationary.

Difference stationary

A time series that can be made stationary via differencing.
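The two cases call for different transformations. The sketch below contrasts them on simulated data, assuming a linear deterministic trend for the trend-stationary series and a pure random walk for the difference-stationary one; both series and their parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
t = np.arange(n)

# Trend-stationary: deterministic linear trend plus stationary noise.
trend_stationary = 0.05 * t + rng.normal(0.0, 1.0, n)
slope, intercept = np.polyfit(t, trend_stationary, 1)
detrended = trend_stationary - (slope * t + intercept)  # remove the fitted trend

# Difference-stationary: a random walk, i.e. the cumulative sum of shocks.
random_walk = np.cumsum(rng.normal(0.0, 1.0, n))
differenced = np.diff(random_walk)  # first difference recovers the shocks

print(slope, detrended.std(), differenced.std())
```

Detrending the first series and differencing the second both leave something that behaves like the original unit-variance noise.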



Augmented Dickey-Fuller (ADF)

Detects whether a time series can be made strict stationary (or trend stationary) via differencing. It is parametric and requires selecting the number of lagged difference terms used to account for serial correlation. The null hypothesis is that the process has a unit root.
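For reference, the ADF test is usually based on a regression of the following form. The notation is introduced here and is not the author's: $y_t$ is the series, $\alpha$ a constant, $\beta t$ an optional deterministic trend, and $p$ the number of lagged differences included.

\begin{equation*} \Delta y_{t} = \alpha + \beta t + \gamma y_{t-1} + \sum_{i=1}^{p} \delta_{i} \Delta y_{t-i} + \varepsilon_{t} \end{equation*}

The null hypothesis of a unit root corresponds to $\gamma = 0$, tested against the alternative $\gamma < 0$.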

Phillips-Perron (PP)

Detects whether a time series can be made strict stationary via differencing. It is non-parametric and improves on the ADF test by correcting for autocorrelation and heteroscedasticity in the errors (i.e., HAC-type corrections). It requires large datasets.

Kwiatkowski-Phillips-Schmidt-Shin (KPSS)

Tests whether a time series is stationary around a constant level or a deterministic trend, as opposed to having a unit root. The null hypothesis is that the process is trend-stationary, so, unlike the ADF and PP tests, rejecting the null is evidence of a unit root.


Regressing two non-stationary variables against each other generally leads to spurious regressions, but it is acceptable if the variables are co-integrated. Two non-stationary processes (e.g., $X_1$ and $X_2$) are co-integrated when there is a vector (the co-integration vector) that combines them into a stationary process. In other words, the stochastic trends in $X_1$ and $X_2$ are the same and can be cancelled out using the co-integration vector.

It is possible to difference these non-stationary variables, but often doing so can result in a loss of information regarding their long-run relationship. Thus, regressing two non-stationary variables that are co-integrated may be preferable.




Engle-Granger test

Tests each time series for a unit root (i.e., non-stationarity) using an ADF test. If each time series has a unit root, an OLS regression of one series on the other is run to obtain the residuals. The residuals are then tested using ADF; if they are stationary, the original time series are co-integrated. The time series being considered must be of the same order of integration.
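The two-step procedure can be sketched with NumPy on simulated data. This is a simplified illustration under stated assumptions: the series `x1` and `x2` are generated to share a stochastic trend, and the residual check uses a plain Dickey-Fuller t-statistic with no constant and no lagged differences. The statistic does not follow the usual t distribution, so a real test needs the appropriate critical values (a packaged implementation such as `statsmodels.tsa.stattools.coint` is one option).

```python
import numpy as np

def ols(y, X):
    """Least-squares coefficients and residuals of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def df_t_statistic(series):
    """t-statistic of rho in diff(e)_t = rho * e_{t-1} + u_t (no constant, no lags)."""
    e = np.asarray(series, dtype=float)
    lagged, delta = e[:-1], np.diff(e)
    rho = np.dot(lagged, delta) / np.dot(lagged, lagged)
    resid = delta - rho * lagged
    sigma2 = np.dot(resid, resid) / (delta.size - 1)
    return rho / np.sqrt(sigma2 / np.dot(lagged, lagged))

# Hypothetical co-integrated pair: x1 shares the stochastic trend of x2.
rng = np.random.default_rng(42)
n = 500
x2 = np.cumsum(rng.normal(size=n))    # random walk, integrated of order 1
x1 = 2.0 * x2 + rng.normal(size=n)    # same trend plus stationary noise

# Step 1: regress one series on the other and keep the residuals.
X = np.column_stack([np.ones(n), x2])
beta, residuals = ols(x1, X)

# Step 2: check the residuals for a unit root.
t_resid = df_t_statistic(residuals)
t_level = df_t_statistic(x2)          # for contrast: the raw random walk
print(beta, t_resid, t_level)
```

A strongly negative statistic for the residuals, compared with the statistic for the raw random walk, is the pattern expected when the two series are co-integrated.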


This is an improvement on the Engle-Granger test in that it accounts for the variability in the residuals, which are estimates rather than actual parameter values. It is also invariant to the normalization of the cointegration relationship.

Johansen test

This is an improvement on the Engle-Granger test in that it avoids having to choose a dependent variable and avoids carrying errors from one step to the next, as happens in the Engle-Granger test. The Johansen test can also detect multiple cointegrating vectors.

Structural breaks



Chow Test

The null hypothesis is that there is no structural break in the data. On a graphical or theoretical basis, the data is split into two samples and a regression is run on each sample. The Chow test is used to evaluate whether the model parameters from the two samples are statistically equal. Evidence of a structural break means that the model may need to be estimated using different specifications (e.g., spline functions) or data (e.g., data subsets, data exclusions).
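The Chow statistic compares the residual sum of squares of the pooled regression with the combined residual sum of squares of the two sub-sample regressions. A minimal NumPy sketch follows, assuming a linear model with `k` parameters (including the intercept) and a break point supplied by the analyst; the data and the helper names are hypothetical. Under the null, the statistic follows an F distribution with `k` and `n - 2k` degrees of freedom, so a p-value could be obtained from, for example, `scipy.stats.f.sf`.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def chow_f_statistic(y, X, break_index):
    """Chow F-statistic for a break between rows break_index-1 and break_index."""
    n, k = X.shape
    rss_pooled = rss(y, X)
    rss_split = rss(y[:break_index], X[:break_index]) + rss(y[break_index:], X[break_index:])
    return ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))

# Hypothetical data: one series with stable parameters, one with a break at row 100.
rng = np.random.default_rng(7)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
noise = rng.normal(size=n)
y_stable = 1.0 + 2.0 * x + noise
y_break = np.where(np.arange(n) < 100, 1.0 + 2.0 * x, 3.0 + 0.5 * x) + noise

f_stable = chow_f_statistic(y_stable, X, 100)
f_break = chow_f_statistic(y_break, X, 100)
print(f_stable, f_break)
```

The series with the shift in intercept and slope produces a much larger statistic than the stable one, which is the evidence of a structural break that the test looks for.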

