This is an overview of the diagnostic and performance tests needed to ensure the validity of a logistic regression model. Logistic regression measures the relationship between a binary dependent variable and one or more continuous or categorical independent variables by estimating probabilities. It does so via the logit link function.

\begin{equation*} \ln \left( \frac{P}{1-P} \right) = \beta_{0} + \beta_{1} x_{1} + \beta_{2} x_{2} + \ldots + \beta_{n} x_{n} \end{equation*}

The logit link function ensures that $P$ is bounded between 0 and 1. The logit model is often preferred over the probit model because its results are easier to interpret: the exponentiated coefficients are adjusted odds ratios.
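
The relationship between the linear predictor, the probability and the odds ratio can be checked numerically. The following is a minimal sketch; the coefficient values `beta_0` and `beta_1` are hypothetical and chosen only for illustration.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inverse_logit(z):
    """Map a linear predictor z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, not estimated from any data.
beta_0, beta_1 = -1.5, 0.8

# Predicted probabilities at x_1 = 0 and x_1 = 1.
p_at_0 = inverse_logit(beta_0 + beta_1 * 0.0)
p_at_1 = inverse_logit(beta_0 + beta_1 * 1.0)

odds_at_0 = p_at_0 / (1.0 - p_at_0)
odds_at_1 = p_at_1 / (1.0 - p_at_1)

# A one-unit increase in x_1 multiplies the odds by exp(beta_1).
odds_ratio = odds_at_1 / odds_at_0
```

Here `odds_ratio` equals `math.exp(beta_1)` up to floating-point error, which is why exponentiated coefficients are read as odds ratios.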

Contents

Variable analysis
- Variable contribution testing
Discriminatory power
Predictive Accuracy testing
Performance testing
Multicollinearity
Data Diagnostics
Stationarity
Cointegration
Structural breaks
Model diagnostics
Performance testing

Variable analysis

Variable contribution testing

Test	Description
Event Rate vs Predictor Plot
Discriminatory power of predictor
Weight of Evidence
Information Value
Wald Test
Dropout/Switchout Test
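
As an illustration of two of the tests listed above, Weight of Evidence (WoE) and Information Value (IV) can be computed from per-bin event counts. This is a sketch under stated assumptions: the sign convention (non-events over events) varies between references, and the bin counts are made up for the example.

```python
import math


def woe_iv(bins):
    """Return per-bin WoE values and the total IV.

    bins: list of (non_events, events) counts, one tuple per bin.
    Assumes every count is positive; zero counts would need smoothing.
    """
    total_non_events = sum(n for n, _ in bins)
    total_events = sum(e for _, e in bins)
    woe_values, iv = [], 0.0
    for non_events, events in bins:
        dist_non_events = non_events / total_non_events
        dist_events = events / total_events
        # Convention assumed here: ln(share of non-events / share of events).
        woe = math.log(dist_non_events / dist_events)
        woe_values.append(woe)
        iv += (dist_non_events - dist_events) * woe
    return woe_values, iv


# Hypothetical counts for a predictor split into three bins.
example_bins = [(400, 10), (350, 30), (250, 60)]
woe_values, information_value = woe_iv(example_bins)
```

Each IV term is non-negative, so a larger total indicates that the predictor separates events from non-events more strongly across its bins.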

Discriminatory power

Test	Description
Receiver Operating Characteristic (ROC) Curve
Somers' D
Kolmogorov-Smirnov (KS) Test
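
The three measures above are closely related and can be sketched in a few lines. The scores below are made up for illustration, and the relation Somers' D = 2 * AUC - 1 is the one that holds for a binary outcome.

```python
def auc(event_scores, non_event_scores):
    """Area under the ROC curve.

    Computed as the share of (event, non-event) pairs in which the event
    has the higher score, with ties counted as half.
    """
    wins = 0.0
    for s_event in event_scores:
        for s_non_event in non_event_scores:
            if s_event > s_non_event:
                wins += 1.0
            elif s_event == s_non_event:
                wins += 0.5
    return wins / (len(event_scores) * len(non_event_scores))


def ks_statistic(event_scores, non_event_scores):
    """Largest gap between the two empirical cumulative score distributions."""
    max_gap = 0.0
    for threshold in sorted(set(event_scores) | set(non_event_scores)):
        cdf_events = sum(s <= threshold for s in event_scores) / len(event_scores)
        cdf_non_events = sum(s <= threshold for s in non_event_scores) / len(non_event_scores)
        max_gap = max(max_gap, abs(cdf_events - cdf_non_events))
    return max_gap


# Hypothetical model scores for observed events and non-events.
event_scores = [0.9, 0.8, 0.6, 0.4]
non_event_scores = [0.7, 0.5, 0.3, 0.2, 0.1]

auc_value = auc(event_scores, non_event_scores)
somers_d = 2.0 * auc_value - 1.0
ks_value = ks_statistic(event_scores, non_event_scores)
```

The pairwise loop is quadratic in the sample size, which is fine for a sketch but would normally be replaced by a rank-based computation on large portfolios.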

Predictive Accuracy testing

Test	Description
Predicted versus Actual Plot
Hosmer-Lemeshow test
Mean Absolute Deviation
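
The Hosmer-Lemeshow statistic compares observed and expected event counts within groups of observations sorted by predicted probability. The sketch below computes only the statistic; the grouping rule (equal-sized groups) and the example data are assumptions made for illustration, and the result would normally be compared against a chi-square distribution with `n_groups - 2` degrees of freedom.

```python
def hosmer_lemeshow(outcomes, probabilities, n_groups=10):
    """Hosmer-Lemeshow statistic over groups sorted by predicted probability.

    Assumes the mean predicted probability in each group is strictly
    between 0 and 1; otherwise the denominator is zero.
    """
    pairs = sorted(zip(probabilities, outcomes))
    n = len(pairs)
    statistic = 0.0
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue
        size = len(group)
        observed = sum(y for _, y in group)
        expected = sum(p for p, _ in group)
        mean_p = expected / size
        statistic += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return statistic


# Hypothetical predictions: ten accounts at 10% and ten accounts at 50%.
probabilities = [0.1] * 10 + [0.5] * 10

# Outcomes matching the predictions exactly (1 and 5 events).
calibrated = [1] + [0] * 9 + [1] * 5 + [0] * 5
# Outcomes with three events in the low-risk group instead of one.
miscalibrated = [1] * 3 + [0] * 7 + [1] * 5 + [0] * 5

hl_calibrated = hosmer_lemeshow(calibrated, probabilities, n_groups=2)
hl_miscalibrated = hosmer_lemeshow(miscalibrated, probabilities, n_groups=2)
```

A statistic near zero means observed and expected counts agree in every group; larger values point to poorer calibration.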

Performance testing

.. list-table::
    :widths: 10 50
    :header-rows: 1
    :align: center

    * - Test
      - Description
    * - Out-of-sample testing
      -
    * - Cross validation
      -
    * - Sensitivity analysis
      -
    * - Error attribution analysis
      -

Multicollinearity

Test	Description
Correlation assessment

Data Diagnostics

The data used for modelling should be evaluated for the following:

Compliance with relevant regulatory requirements

Often these requirements refer to data length requirements for different types of portfolios, ensuring that the data length is representative of the economic cycle, and requirements for the use of data proxies (e.g., BCC13-5 [Conservatism to risk parameters in Advanced Approaches], BCC14-3 [Selection of reference data periods and data deficiencies]).

Outliers, missing or special values

Outliers or influential data points should be identified (e.g., using Cook's distance), and model performance should be evaluated with these points excluded.

Binning methodology

The criteria for binning data need to be supported economically and statistically. When data is binned, make sure that the distributions of the bins differ significantly from one another (e.g., using KS tests), as it is not ideal to have too many bins. Excessive binning can lead to model overfit because binning linearizes the nonlinearities in continuous variables, thus strengthening the relationship with the dependent variable.

Test	Description
Auto-correlation (ACF)	Auto-correlation describes the dependence (i.e., relationship) between an observation and its value at a prior time step. The dependence captured by the ACF includes both direct and indirect dependence. To decide the number of lags for the MA term, look at the spikes in the ACF plot.
Partial auto-correlation (PACF)	Partial auto-correlation describes only the direct dependence between an observation and its lag. The partial autocorrelation at lag $k$ is the correlation that remains after removing the effect of any correlations due to the terms at shorter lags. To decide the number of lags for the AR term, look at the spikes in the PACF plot.
Stationarity	If the ACF/PACF values drop to zero quickly, the series can be considered stationary. If the ACF/PACF values decrease slowly or oscillate, the series may be non-stationary and transformations may need to be applied to produce a stationary series (e.g., first/second differencing, log transformation).
Seasonality	If there are spikes in the autocorrelation values of the ACF/PACF plot at regular intervals (e.g., 12, 24, 36, ... for monthly data; 4, 8, 12, ... for quarterly data), there is seasonality. Transformations may need to be applied to remove the seasonal effect (e.g., seasonal differencing).
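
The ACF-based stationarity check in the table can be illustrated with a simulated series. The sketch below computes the sample autocorrelation function directly and compares a random walk with its first difference; the simulated data and the seed are assumptions made for the example.

```python
import random


def acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denominator = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - lag] for t in range(lag, n)) / denominator
        for lag in range(max_lag + 1)
    ]


def difference(series, lag=1):
    """Differences at the given lag (lag=1: first difference; lag=12: seasonal for monthly data)."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Simulated random walk: cumulative sum of Gaussian shocks.
random.seed(0)
random_walk = []
level = 0.0
for _ in range(500):
    level += random.gauss(0, 1)
    random_walk.append(level)

acf_walk = acf(random_walk, 10)              # decays slowly
acf_diff = acf(difference(random_walk), 10)  # close to zero after lag 0
```

The slowly decaying `acf_walk` is the pattern that suggests non-stationarity, while `acf_diff` shows how first differencing removes it.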

Stationarity

Strict stationary: Time series whose statistical properties, such as mean, variance and autocorrelation, are all constant over time.
Trend stationary: Time series that has no unit root but exhibits a trend. If the trend is removed from a trend stationary series, it becomes strict stationary.
Difference stationary: Time series that can be made stationary via differencing.

Test	Description
Augmented Dickey-Fuller (ADF)	Detects whether a time series can be made strict stationary (or trend stationary) via differencing (removing the trend). It is parametric and requires selecting the number of lags used to account for serial correlation. The null hypothesis is that the process has a unit root.
Phillips-Perron (PP)	Detects whether a time series can be made strict stationary via differencing. It is non-parametric and improves on the ADF test by correcting for autocorrelation and heteroscedasticity (i.e., HAC-type corrections). It requires large datasets.
Kwiatkowski-Phillips-Schmidt-Shin (KPSS)	Detects whether a time series is stationary around a level or a deterministic trend rather than having a unit root. The null hypothesis is that the process is trend-stationary, which is the reverse of the ADF and PP null hypothesis.
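
To make the unit root idea concrete, the sketch below computes the basic (non-augmented) Dickey-Fuller t-statistic from the regression of the first difference on a constant and the lagged level. It is a simplification of the ADF test in the table: no lagged difference terms are included, the data are simulated, and in practice a statistical library routine would be used instead.

```python
import math
import random


def dickey_fuller_t(series):
    """t-statistic on gamma in: diff(y)_t = alpha + gamma * y_(t-1) + e_t."""
    lagged = series[:-1]
    diffs = [series[t] - series[t - 1] for t in range(1, len(series))]
    n = len(diffs)
    mean_x = sum(lagged) / n
    mean_y = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, diffs))
    gamma = sxy / sxx
    alpha = mean_y - gamma * mean_x
    rss = sum((y - alpha - gamma * x) ** 2 for x, y in zip(lagged, diffs))
    std_error = math.sqrt(rss / (n - 2) / sxx)
    return gamma / std_error


# Simulated data: white noise (stationary) and its cumulative sum (unit root).
random.seed(1)
white_noise = [random.gauss(0, 1) for _ in range(500)]
random_walk = []
level = 0.0
for shock in white_noise:
    level += shock
    random_walk.append(level)

t_noise = dickey_fuller_t(white_noise)
t_walk = dickey_fuller_t(random_walk)
```

The statistic is compared with Dickey-Fuller critical values rather than the usual t table (roughly -2.86 at the 5% level for the constant-only case in large samples); a strongly negative value such as `t_noise` rejects the unit root null.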

Cointegration

Although regressing two non-stationary variables against each other can lead to spurious regressions, it is acceptable to do so if the variables are co-integrated. Two non-stationary processes (e.g., $X_1$ and $X_2$) are co-integrated when there is a vector (the co-integration vector) that combines them into a stationary process. In essence, $X_1$ and $X_2$ share the same stochastic trend, which the co-integration vector cancels out.

It is possible to difference these non-stationary variables, but often doing so can result in a loss of information regarding their long-run relationship. Thus, regressing two non-stationary variables that are co-integrated may be preferable.

Test	Description
Engle-Granger	Tests each time series for a unit root (i.e., non-stationarity) using an ADF test. If the time series have a unit root, an OLS regression is run between them to obtain the residuals. The residuals are tested using ADF and, if they are stationary, the original time series are co-integrated. The time series being considered must be of the same order of integration.
Phillips-Ouliaris	This is an improvement on the Engle-Granger test that accounts for the variability in the residuals, since they are estimates and not actual parameter values. It is also invariant to the normalization of the cointegration relationship.
Johansen test	This is an improvement on the Engle-Granger test in that it avoids having to choose a dependent variable and avoids carrying errors from one step to the next. The Johansen test can detect multiple cointegrating vectors.
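
The two-step logic of the Engle-Granger test can be sketched as follows. This is an illustration only: the series are simulated so that they share a stochastic trend, the initial unit root pre-tests on each series are omitted, and the residual test uses the basic Dickey-Fuller regression without lagged difference terms.

```python
import math
import random


def ols(x, y):
    """Simple regression of y on a constant and x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, residuals, sxx


def dickey_fuller_t(series):
    """t-statistic on the lagged level in a regression of the first difference."""
    lagged = series[:-1]
    diffs = [series[t] - series[t - 1] for t in range(1, len(series))]
    _, gamma, residuals, sxx = ols(lagged, diffs)
    rss = sum(r * r for r in residuals)
    return gamma / math.sqrt(rss / (len(diffs) - 2) / sxx)


# Simulated pair: x1 is a random walk, x2 follows x1 plus stationary noise.
random.seed(2)
x1 = []
level = 0.0
for _ in range(500):
    level += random.gauss(0, 1)
    x1.append(level)
x2 = [2.0 + 0.5 * value + random.gauss(0, 1) for value in x1]

# Step 1: estimate the long-run relationship and keep the residuals.
intercept, slope, residuals, _ = ols(x1, x2)

# Step 2: test the residuals for a unit root.
t_residuals = dickey_fuller_t(residuals)
```

Because the residuals are themselves estimated, `t_residuals` must be compared with Engle-Granger critical values, which are more negative than the standard Dickey-Fuller ones.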

Structural breaks

Test	Description
Chow Test	The null hypothesis is that there is no structural break in the data. On a graphical or theoretical basis, the data is split into two samples and regressions are run on each sample. The Chow test is used to evaluate whether the model parameters from the two data samples are statistically similar. Evidence of a structural break means that the model may need to be estimated using different specifications (e.g., spline functions) or data (e.g., data subsets, data exclusions).
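
The Chow statistic can be computed from the residual sums of squares of the pooled and split regressions. The sketch below uses a simple linear regression (two parameters) on simulated data with a break at a known, assumed position; the series and the break point are made up for illustration.

```python
import random

N_PARAMS = 2  # intercept and slope in each regression


def rss_linear(x, y):
    """Residual sum of squares from regressing y on a constant and x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def chow_f(x, y, break_index):
    """Chow F-statistic for a break at break_index."""
    rss_pooled = rss_linear(x, y)
    rss_split = (rss_linear(x[:break_index], y[:break_index])
                 + rss_linear(x[break_index:], y[break_index:]))
    dof = len(x) - 2 * N_PARAMS
    return ((rss_pooled - rss_split) / N_PARAMS) / (rss_split / dof)


# Simulated data: one stable series and one whose slope changes halfway.
random.seed(3)
x = [t / 10 for t in range(100)]
noise = [random.gauss(0, 0.5) for _ in x]
y_stable = [1 + 2 * xi + e for xi, e in zip(x, noise)]
y_break = [
    (1 + 2 * xi if i < 50 else 1 + 0.5 * xi) + e
    for i, (xi, e) in enumerate(zip(x, noise))
]

f_stable = chow_f(x, y_stable, 50)
f_break = chow_f(x, y_break, 50)
```

Under the null hypothesis the statistic follows an F distribution with `N_PARAMS` and `len(x) - 2 * N_PARAMS` degrees of freedom, so a large value such as `f_break` is evidence of a structural break.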
