Response to discussion on machine learning for IRB models


1.1: For the estimation of which parameters does your institution currently use or plan to use ML models, i.e. PD, LGD, ELBE, EAD, CCF?

As an audit and consulting firm, we have seen banks mainly use ML models for PD estimation, in both model development and validation, given that PD is the most important model in the IRB context.

1.2: Can you specify for which specific purposes these ML models are used or planned to be used? Please specify at which stage of the estimation process they are used, i.e. data preparation, risk differentiation, risk quantification, validation.

We have seen banks use ML models in risk differentiation and validation. For example, ML models can be used for risk driver selection, segmentation analysis and PD score banding, and as challenger models in model validation.

1.3: Please also specify the type of ML models and algorithms (e.g. random forest, k-nearest neighbours, etc.) you currently use or plan to use in the IRB context?

We have seen banks use various ML models and algorithms:
• Random forest, used to select highly predictive risk drivers
• K-means clustering, used for PD score banding and model segmentation (see the sketch after this list)
• Kalman filter / recursive Bayesian estimation, used to smooth the transition matrix
• Random forest / XGBoost (gradient boosting machines), used as challenger models in model validation
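For illustration, a minimal sketch of how K-means clustering might be applied to band a one-dimensional PD score into rating grades; the synthetic scores and the number of bands are assumptions made for the example, not taken from any bank's model.

```python
# Minimal sketch of K-means-based PD score banding. The score
# distribution and the number of bands (7) are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
scores = rng.beta(2, 8, size=5_000).reshape(-1, 1)  # synthetic PD scores in [0, 1]

kmeans = KMeans(n_clusters=7, n_init=10, random_state=0).fit(scores)

# Order the cluster labels by centroid so that band 0 is the lowest-risk grade.
order = np.argsort(kmeans.cluster_centers_.ravel())
band_of_label = {label: band for band, label in enumerate(order)}
bands = np.array([band_of_label[l] for l in kmeans.labels_])

for band in range(7):
    s = scores[bands == band].ravel()
    print(f"band {band}: n={s.size:5d}, score range [{s.min():.3f}, {s.max():.3f}]")
```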

1.4: Are you using or planning to use unstructured data for these ML models? If yes, please specify what kind of data or type of data sources you use or are planning to use. How do you ensure an adequate data quality?

We did not see banks use unstructured data. Unstructured data could provide additional information for ML models, but most banks do not have such data (e.g. social media data) in their current systems, and building such databases is costly. Moreover, the data quality and representativeness of unstructured data are difficult to ensure.

2: Have you outsourced or are you planning to outsource the development and implementation of the ML models and, if yes, for which modelling phase? What are the main challenges you face in this regard?

We have seen banks outsource ML models in either the development or the implementation phase, but more often in the development phase, as most banks take over ownership of the model and implement it in their own IT infrastructure.

Challenges:
1. Banks need to recruit additional resources and build new teams with the knowledge and skills to perform risk controls and governance over ML models. This covers modelling, validation and audit, as well as management functions, all of which need adequate ML knowledge to perform model (re)calibration, validation, audit and governance.
2. ML models are harder to understand than traditional models (e.g. logistic regression). This raises challenges for validators, auditors and regulators when validating and reviewing ML models. Clear documentation with sufficient detail is therefore very important.

3: Do you see or expect any challenges regarding the internal user acceptance of ML models (e.g. by credit officers responsible for credit approval)? What are the measures taken to ensure good knowledge of the ML models by their users (e.g. staff training, adapting required documentation to these new models)?

- Challenges regarding the internal user acceptance of ML models:
o Model changes or new implementations will impact existing code, systems and documentation.
o ML algorithms might require more data to build a robust model. Credit officers need to ensure the availability of IT programs able to manage the new data flow.
o New documentation will need to be produced and reviewed by the validation and audit teams, so the change involves not only the credit modellers but the whole chain.
- Measures taken to ensure good knowledge of the ML models by their users:
o Staff training: internal and/or external training, as well as courses to enhance credit modellers' skills
o Hiring new profiles specialized in ML
o Existing staff should collaborate with newly hired ML specialists on the use of existing models and on improving them by including ML components.

4: If you use or plan to use ML models in the context of IRB, can you please describe if and where (i.e. in which phase of the estimation process, e.g. development, application or both) human intervention is allowed and how it depends on the specific use of the ML model?

- Assessment of the consistency of model outputs with respect to business expectations.
- Some ML algorithms might be subject to human choices in the model assumptions, for example the selection of the cost function.
- Choice of the length of the historical data used to implement the new models (which must be aligned with regulatory requirements).
- Depending on the amount of available data, human intervention determines which part of the data is used for model implementation and which part for model validation and testing (see the sketch after this list). There are general market practices, but this is highly dependent on the parameters to be estimated and on the availability and quality of the data.
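As an illustration of the last point, a minimal sketch of a development/validation/test split; the 60/20/20 proportions and the synthetic data are assumptions, since, as noted above, the appropriate split depends on the parameters to estimate and on the data.

```python
# Minimal sketch of a development/validation/test split, assuming an
# illustrative 60/20/20 partition stratified on the default flag.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))          # synthetic risk drivers
y = rng.integers(0, 2, size=10_000)       # synthetic default flag

# First carve off the development (training) set ...
X_dev, X_rest, y_dev, y_rest = train_test_split(
    X, y, train_size=0.6, stratify=y, random_state=0)
# ... then split the remainder equally into validation and test sets.
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, train_size=0.5, stratify=y_rest, random_state=0)

print(len(X_dev), len(X_val), len(X_test))  # 6000 2000 2000
```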

5. Do you see any issues in the interaction between data retention requirements of GDPR and the CRR requirements on the length of the historical observation period?

- The main GDPR risk for banks is the client's right to be forgotten. For LGD models, banks might have to discard data that was useful for their model.
- We do not think the use of machine learning changes this issue with respect to GDPR requirements.

6.a) Methodology (e.g. which tests to use/validation activities to perform).

Example 1:
a) Methodology (e.g. which tests to use/validation activities to perform).

Use of the ML algorithm: the ML algorithm was used by a bank in the staging approach. The goal of the algorithm is to estimate quantitative thresholds that depend on the initial rating of the borrower and on its rating at the as-of date of the ECL (expected credit loss) computation. To determine whether a borrower has experienced a significant increase in credit risk, a score is computed and then compared to the calibrated threshold.
The model used is an adjusted logistic regression, an extension of the standard logistic regression.
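A hypothetical sketch of the staging logic described above: a logistic score per borrower is compared against a threshold calibrated on the borrower's initial rating. The coefficients, thresholds and rating scale are illustrative assumptions, not the bank's calibration.

```python
# Hypothetical sketch of the staging logic: a logistic score is compared
# to a threshold calibrated per initial rating. All numbers are made up.
import numpy as np

def sicr_score(rating_0, rating_t, beta=(0.8, -0.5), intercept=-1.0):
    """Logistic score from the initial rating and the rating at the ECL
    as-of date (a higher numeric rating means worse credit quality)."""
    z = intercept + beta[0] * rating_t + beta[1] * rating_0
    return 1.0 / (1.0 + np.exp(-z))

# Thresholds calibrated per initial rating (illustrative values only).
thresholds = {1: 0.30, 2: 0.35, 3: 0.40, 4: 0.45}

def significant_increase(rating_0, rating_t):
    return sicr_score(rating_0, rating_t) > thresholds[rating_0]

print(significant_increase(rating_0=2, rating_t=4))  # downgraded borrower
```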

Example 2:
a) Methodology (e.g. which tests to use/validation activities to perform).
Use of the ML algorithm: the ML algorithm was used in the calculation of an LGD parameter for a bank's ECL model. CART (classification and regression trees) is used to determine the future value of a limit utilisation parameter (the ratio of balance-sheet exposure to the limit).
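A minimal sketch of this approach, using a regression tree on synthetic data to predict future limit utilisation; the features, tree depth and data are illustrative assumptions.

```python
# Minimal sketch of the CART idea above: a regression tree predicting
# future limit utilisation (balance / limit) from synthetic obligor
# features. Features, depth and data are illustrative assumptions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(7)
n = 2_000
X = np.column_stack([
    rng.uniform(0, 1, n),      # current utilisation
    rng.integers(1, 8, n),     # rating grade
    rng.uniform(0, 30, n),     # months on book
])
# Synthetic target: future utilisation loosely driven by the current one.
y = np.clip(0.3 + 0.6 * X[:, 0] + rng.normal(0, 0.1, n), 0, 1)

tree = DecisionTreeRegressor(max_depth=4, min_samples_leaf=50).fit(X, y)
print(tree.predict([[0.9, 6, 3.0]]))  # predicted future utilisation
```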

Example 3:
a) Methodology (e.g. which tests to use/validation activities to perform).
Use of the ML algorithm: the ML algorithm was used by a bank in the validation of the probability of default (PD) segmentation for the bank's PD model for ECL. The algorithm is based on a mixture of gradient boosting and linear regression. It produces a multi-level binary tree that indicates how the reference data is progressively split into segments.
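A simplified stand-in for the idea described above: a shallow binary tree splits synthetic reference data into candidate PD segments that a validator could compare against the business segmentation. The bank's actual gradient-boosting/linear-regression mixture is not reproduced here.

```python
# Simplified stand-in for the segmentation check: a shallow binary tree
# splits synthetic reference data into candidate PD segments. This is
# NOT the bank's gradient-boosting/linear-regression mixture.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
n = 5_000
X = np.column_stack([
    rng.integers(0, 3, n),       # industry code (0, 1, 2)
    rng.uniform(0, 100, n),      # turnover (EURm)
])
# Synthetic default flag with a different default rate per industry.
base = np.array([0.01, 0.03, 0.08])
y = rng.random(n) < base[X[:, 0].astype(int)]

tree = DecisionTreeClassifier(max_depth=2, min_samples_leaf=200).fit(X, y)
print(export_text(tree, feature_names=["industry", "turnover"]))
```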

Example 4:
a) Methodology (e.g. which tests to use/validation activities to perform).
The ML algorithm was used by a bank as part of modelling the probability of default. It was used in the multinomial logistic regression chosen to model the migration of credit ratings across segments, which was one of the three models involved in modelling the PD. The goal of the ML algorithm is to estimate the optimal parameters that maximise the likelihood function of the multinomial logistic regression, where the regression is viewed as a feedforward neural network with one hidden layer. The multinomial regression had three states for the dependent variable: an upgrade in credit rating, a downgrade, and no change. The reason for using an ML algorithm here is increased efficiency: a multinomial logistic regression is more complex than a standard logistic regression model, and parameter estimation using traditional methods is more difficult.
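A hedged sketch of the idea described above: a multinomial logistic regression with the three migration outcomes, fitted by gradient ascent on the log-likelihood, which is the same computation a softmax output layer of a feedforward network performs. The data, step size and iteration count are illustrative assumptions.

```python
# Hedged sketch: multinomial logistic regression with three outcomes
# (0 = downgrade, 1 = no change, 2 = upgrade) fitted by gradient ascent
# on the log-likelihood, i.e. the softmax-layer view of the problem.
import numpy as np

rng = np.random.default_rng(11)
n, k = 5_000, 3                        # observations, outcome classes
X = np.column_stack([np.ones(n), rng.normal(size=n)])      # intercept + driver
true_W = np.array([[0.0, -1.0], [0.0, 0.0], [0.0, 1.0]])   # class x feature
P = np.exp(X @ true_W.T)
P /= P.sum(axis=1, keepdims=True)
y = np.array([rng.choice(k, p=p) for p in P])
Y = np.eye(k)[y]                       # one-hot encoding of outcomes

W = np.zeros((k, 2))                   # parameters to estimate
for _ in range(500):                   # gradient ascent on log-likelihood
    Z = np.exp(X @ W.T)
    probs = Z / Z.sum(axis=1, keepdims=True)
    grad = (Y - probs).T @ X / n       # gradient of the mean log-likelihood
    W += 1.0 * grad                    # fixed step size (illustrative)

print(np.round(W - W[0], 2))           # identified up to a base class
```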

6.b) Traceability (e.g. how to identify the root cause for an identified issue).


Example 1:
b) Traceability (e.g. how to identify the root cause for an identified issue).
The model was tested by checking the correct implementation of the model in terms of data processing and training/fitting.
Additionally, the bank carried out sensitivity analyses on the outputs of the model, along with sanity checks under different scenarios, to assess whether these outputs are reliable. For example, if two borrowers with different initial ratings (rating_0^1 better than rating_0^2) have both experienced a downgrade between time 0 and time t and have the same rating at time t (rating_t^1 = rating_t^2), then the score of borrower 1 must be higher than the score of borrower 2, meaning that borrower 1 is at higher risk of breaching the calibrated threshold.
Another test assesses under which scenarios a breach of the calibrated threshold occurs, which allows analysis of the soundness and robustness of the ML model outputs.
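A minimal sketch of that sanity check, using the same hypothetical logistic score as in the sketch under 6.a) Example 1: for a fixed rating at time t, the borrower who started from the better initial rating (and thus suffered the larger downgrade) must receive the strictly higher score.

```python
# Minimal sketch of the monotonicity sanity check described above,
# using the same hypothetical logistic score as in the 6.a) sketch.
import numpy as np

def sicr_score(rating_0, rating_t, beta=(0.8, -0.5), intercept=-1.0):
    z = intercept + beta[0] * rating_t + beta[1] * rating_0
    return 1.0 / (1.0 + np.exp(-z))

for rating_t in range(2, 8):
    scores = [sicr_score(r0, rating_t) for r0 in range(1, rating_t)]
    # A better (lower) initial rating must yield a strictly higher score.
    assert all(a > b for a, b in zip(scores, scores[1:])), rating_t
print("monotonicity sanity check passed")
```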

Example 3:
b) Traceability (e.g. how to identify the root cause for an identified issue).
The validation team compares the segmentation resulting from the ML model with the original PD model segmentation based on business intuition.
- Tests: the metrics used to evaluate the results are MSE (mean squared error), log loss and AUC (see the sketch below).
- Results: the ML methodology mostly produces segmentations close to the segmentation based on business intuition. However, in a few cases the ML algorithm recommends merging two business segments where, according to the modellers, there was a strong business rationale for separate modelling segments. The conclusion drawn is that the results of the ML algorithm cannot be followed blindly.
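A minimal sketch of the three metrics named above, computed on synthetic predictions with scikit-learn.

```python
# Minimal sketch of the three evaluation metrics (MSE, log loss, AUC)
# computed on synthetic default flags and predicted probabilities.
import numpy as np
from sklearn.metrics import mean_squared_error, log_loss, roc_auc_score

rng = np.random.default_rng(3)
y_true = rng.integers(0, 2, size=1_000)                        # observed defaults
y_prob = np.clip(y_true * 0.6 + rng.uniform(0, 0.4, 1_000), 0.01, 0.99)

print("MSE     :", mean_squared_error(y_true, y_prob))
print("Log loss:", log_loss(y_true, y_prob))
print("AUC     :", roc_auc_score(y_true, y_prob))
```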

Example 4:
b) Traceability (e.g. how to identify the root cause for an identified issue).
The validation consisted of an independent implementation using R. From the code output, one can see whether the likelihood function converged to a maximum, which suggests that the parameters obtained by the ML algorithm were indeed optimal. One can also test the predictive accuracy of the fitted model via the area under the curve (AUC) metric. Additionally, one can perform directional checks on the coefficients to verify that they are logical. For example, a positive coefficient for an upgrade in GDP was obtained; intuitively, this means that increasing GDP increases the multinomial log-odds of a rating upgrade versus a downgrade, which agrees with expectations.
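A minimal sketch of such a directional check, in Python rather than R: fit a multinomial logistic regression on synthetic migration data and verify that the GDP coefficient for the "upgrade" outcome is positive. The feature, data and outcome encoding are illustrative assumptions, not the bank's.

```python
# Minimal sketch of a directional coefficient check on a multinomial
# logit. Outcomes: 0 = downgrade, 1 = no change, 2 = upgrade. Synthetic
# data are generated so that upgrade odds increase with GDP growth.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 20_000
gdp_growth = rng.normal(0.02, 0.02, n)           # macro driver
logits = np.column_stack([-5 * gdp_growth, np.zeros(n), 5 * gdp_growth])
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
y = np.array([rng.choice(3, p=p) for p in probs])

# With the default lbfgs solver, scikit-learn fits a multinomial logit.
model = LogisticRegression(max_iter=1000)
model.fit(gdp_growth.reshape(-1, 1), y)

coef_upgrade = model.coef_[list(model.classes_).index(2), 0]
assert coef_upgrade > 0, "GDP coefficient for upgrades should be positive"
print("GDP coefficient (upgrade):", coef_upgrade)
```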

6.c) Knowledge needed by the validation function (e.g. specialised training sessions on ML techniques by an independent party).

To approach an ML model in credit risk, it is essential to have knowledge of machine learning and deep learning, and to be familiar with programming languages in order to test the model. Not only the credit modellers need to be trained, but also the validation and internal audit departments. For further details, please refer to our answer to question 3.

6.d) Resources needed to perform the validation (e.g. more time needed for validation)?

Please refer to our answer to question 3.

8: What are the specific challenges you see regarding the development, maintenance and control of ML models in the IRB context, e.g., when verifying the correct implementation of internal rating and risk parameters in IT systems, when monitoring the correct functioning of the models or when integrating control models for identifying possible incidences?

The main advantage of a statistically developed model is its high level of objectivity; however, this does not always mean that the "best" model is obtained. Good models are not built blindly, purely on the outputs of statistical methods; they are also based on expert judgement and experience. For each predictor variable, there must be an intuitive business reason explaining why it is related to the target variable (default flag, recoveries, exposure at default, etc.). This matters because the use of intuitive variables makes the model less reliant on the specific data sample used to create it, so the model is more likely to hold up on new data and even under changing economic conditions. It also makes the model more intuitive as a predictor and therefore easier to gain acceptance amongst credit officers (the business).
The overall model should cover a broad set of financial and non-financial variables that are simultaneously statistically robust and intuitively acceptable to credit officers.
All of the above depends on the modelling technique selected. The more complex the model (the more of a "black box" it is), the less likely this is to hold. Accordingly, one needs to ensure that everything is mathematically sound and makes sense from a business perspective. For example, modelling low default portfolios (LDPs) in this way might be challenging. LDPs are not an issue specific to ML algorithms; the problem already exists with currently used models. However, it might be more challenging for ML algorithms, which typically require larger datasets for proper model training, validation and testing.


Name of the organization

Mazars