In our first article, we reviewed some of the key findings from the Bank of England’s (BoE) and Financial Conduct Authority’s (FCA) ‘machine learning in UK financial services’ survey. Our second piece explored some of the key risks that Machine Learning (ML) applications present to financial institutions (FIs). Here, in this final article, we will discuss some of the risk management approaches that are available to FIs.

Managing these risks is critical to ensuring that ML applications are used safely, securely, and reliably. If not addressed appropriately, ML risks could result in customer harm, significant financial losses, legal liabilities, and reputational damage for FIs.

The survey highlights that effective ML risk mitigation requires robust governance frameworks, such as model risk management and data-quality validation, together with thorough assessments and reviews of ML models from development through to deployment. It is also essential to establish clear lines of accountability so that any autonomous decisions are supervised. There are parallels here with Article 22 of the GDPR, which gives people the right not to be subject to solely automated decisions. Moreover, Chapter 2 of the EU’s proposed AI Act sets out legal requirements for high-risk AI systems covering risk management systems, data and data governance, and human oversight, with the aim of minimising the risk of erroneous or biased AI-assisted decisions and protecting people’s fundamental rights.

Model validation

Model validation is crucial to reducing the risks associated with using ML applications, as it helps ensure that the models are accurate and reliable. Validation techniques should be used throughout the ML development lifecycle, from the pre-deployment phase, where the model is being trained and tested, to the post-deployment phase, where it is live in the business. By continuously monitoring and assessing the performance of the ML application, FIs can identify and address issues or potential risks promptly and proactively. The survey highlights several common validation methods currently used by FIs, including:

  • Outcome monitoring against a benchmark: This method evaluates the performance of an ML application against predetermined historical benchmarks or standards. It involves setting up specific performance metrics, such as profitability or customer satisfaction, and comparing these to a predefined threshold value. By monitoring the ML application’s performance against these benchmarks, it is possible to detect when the application is underperforming or diverging from its desired behaviour so that corrective actions can be taken.

  • Outcome monitoring against a non-ML model or A/B testing: This method compares the performance metrics or outcomes of an ML application against those of a non-ML one. It allows FIs to assess whether the ML application provides results that are similar to, or better than, those of their existing non-ML application.

  • Black box testing: This test involves experimenting with different inputs and examining the corresponding outputs to understand how the ML application works. It analyses the input/output of the ML application to determine whether it is performing as intended and to identify any anomalous behaviours that could indicate potential risks.

  • Explainability tools: These tools provide a more advanced and comprehensive approach to testing the input/output of an ML application, enabling FIs to understand how the application produces a specific result. By using explainability tools, FIs can detect bias, errors, or potential risks, and can generate explanations of how the application works to provide transparency.

  • Validation of data quality: This approach examines the accuracy, completeness, and consistency of the data used to train and test ML applications to detect and eliminate errors, biases, and other risks.
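
To make a few of these methods more concrete, the sketch below shows one way outcome monitoring against a benchmark, comparison with a non-ML model, and a basic data-quality check could be expressed in Python. It is a minimal illustration under stated assumptions, not a method described in the survey: the function names, the choice of accuracy as the performance metric, and the benchmark value used in the usage note are all hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


def accuracy(predictions: Sequence[int], outcomes: Sequence[int]) -> float:
    """Share of predictions that match the observed outcomes."""
    if not outcomes or len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must be non-empty and equal in length")
    matches = sum(1 for p, o in zip(predictions, outcomes) if p == o)
    return matches / len(outcomes)


@dataclass
class MonitoringResult:
    ml_metric: float
    baseline_metric: float
    below_benchmark: bool       # ML metric is under the predefined threshold
    worse_than_baseline: bool   # ML metric is lower than the non-ML model's


def monitor_outcomes(
    ml_predictions: Sequence[int],
    baseline_predictions: Sequence[int],
    outcomes: Sequence[int],
    benchmark: float,
) -> MonitoringResult:
    """Compare the ML model with a fixed benchmark and with a non-ML baseline."""
    ml_metric = accuracy(ml_predictions, outcomes)
    baseline_metric = accuracy(baseline_predictions, outcomes)
    return MonitoringResult(
        ml_metric=ml_metric,
        baseline_metric=baseline_metric,
        below_benchmark=ml_metric < benchmark,
        worse_than_baseline=ml_metric < baseline_metric,
    )


def missing_value_rate(
    rows: Sequence[Mapping[str, Optional[float]]],
    required_fields: Sequence[str],
) -> float:
    """Fraction of required values that are absent or None (a basic completeness check)."""
    total = len(rows) * len(required_fields)
    if total == 0:
        raise ValueError("rows and required_fields must be non-empty")
    missing = sum(
        1 for row in rows for field in required_fields if row.get(field) is None
    )
    return missing / total
```

In practice, an FI might call `monitor_outcomes` on each new batch of observed outcomes with a benchmark agreed by its model risk function (for example, a hypothetical value of 0.85) and investigate whenever either flag is raised. Completeness is only one dimension of data quality; accuracy and consistency checks would need their own rules.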

The survey reports that the most popular validation methods are outcome monitoring and testing against benchmarks, and data quality validation, with 81% of respondents using these methods. More than half of respondents (63%) benchmark outcomes against non-ML models. Black box testing techniques were used less frequently, by fewer than half of respondents.

Because the financial services industry is constantly evolving, ML applications are often designed to identify and adjust quickly to new behaviours using live training data. These behaviours may appear in consumer spending patterns, fraud and scam activity, and money laundering typologies. As ML applications are continuously refreshed and updated to reflect these changes, it is essential for FIs to establish a robust model validation framework that monitors the new behaviours exhibited by ML applications, in order to prevent unfair treatment or discrimination.
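
As an illustration of what such monitoring could look like, the sketch below compares the rate of favourable decisions across customer groups after each model refresh and flags large gaps for review. This is one simple check chosen for illustration, not a technique named in the survey. The group labels, the function names, and the default `min_ratio` of 0.8 are hypothetical assumptions, and a real framework would rely on thresholds and metrics set by the FI's own governance process.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def favourable_rates(decisions: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Rate of favourable decisions per group, from (group, favourable) pairs."""
    totals: Dict[str, int] = defaultdict(int)
    favourable: Dict[str, int] = defaultdict(int)
    for group, is_favourable in decisions:
        totals[group] += 1
        if is_favourable:
            favourable[group] += 1
    return {group: favourable[group] / totals[group] for group in totals}


def rate_ratio(rates: Dict[str, float]) -> float:
    """Lowest group rate divided by the highest; 1.0 means identical rates."""
    if not rates:
        raise ValueError("at least one group is required")
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group received a favourable decision
    return min(rates.values()) / highest


def needs_review(decisions: Iterable[Tuple[str, bool]], min_ratio: float = 0.8) -> bool:
    """Flag the model for review when the gap between groups is too wide.

    min_ratio is a hypothetical threshold used only for illustration.
    """
    return rate_ratio(favourable_rates(decisions)) < min_ratio
```

Running this check on decisions made before and after each retraining cycle would show whether a refreshed model has started to treat groups differently, which is the point at which a human reviewer would need to investigate the cause.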


The survey found that 42% of respondents use some form of monitoring to manage ML risks, but the responses did not provide details about the specific safeguards in place for their ML applications. Among the controls commonly used by respondents, the three most frequently named were ‘alert systems’, ‘human-in-the-loop’, and ‘back-up systems’. ‘Alert systems’ flag unusual or unexpected actions to employees, allowing them to investigate and take corrective actions if necessary. ‘Human-in-the-loop’ systems require a human to review or approve decisions made by the ML application before they are executed, providing an additional layer of oversight. ‘Back-up systems’ perform the same or similar function as the ML application and can be used as a replacement in the event of any failures or errors to minimise negative impact.
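
The three safeguards can be combined in a single decision path. The sketch below is a minimal illustration, assuming a hypothetical ML model that returns a score between 0 and 1 and a hypothetical rule-based back-up. The `Decision` class, the `decide` function, the 0.5 approval cut-off, and the review band are all assumptions made for this example rather than controls described by survey respondents.

```python
from dataclasses import dataclass
from typing import Callable, List, Mapping, Tuple

Features = Mapping[str, float]


@dataclass
class Decision:
    approve: bool
    source: str                # "ml" or "backup"
    needs_human_review: bool   # hold the decision until a person signs it off


def decide(
    features: Features,
    ml_model: Callable[[Features], float],
    backup_rule: Callable[[Features], bool],
    alerts: List[str],
    review_band: Tuple[float, float] = (0.4, 0.6),  # hypothetical band
) -> Decision:
    """Apply alerting, human-in-the-loop and back-up safeguards to one decision."""
    try:
        score = ml_model(features)
    except Exception as exc:
        # Back-up system: a simpler rule takes over when the ML model fails.
        alerts.append(f"ML model failed; back-up rule used ({exc})")
        return Decision(backup_rule(features), "backup", False)

    if not 0.0 <= score <= 1.0:
        # Alert system: an unexpected output is flagged, and the back-up
        # result is held for human review.
        alerts.append(f"Score {score} outside [0, 1]; back-up rule used")
        return Decision(backup_rule(features), "backup", True)

    low, high = review_band
    uncertain = low <= score <= high
    if uncertain:
        # Human-in-the-loop: borderline scores are routed to a reviewer.
        alerts.append(f"Score {score} is borderline; human review required")
    return Decision(score >= 0.5, "ml", uncertain)
```

The design choice here is that the ML application never acts alone in the cases most likely to cause harm: failures fall back to the back-up rule, unexpected outputs raise an alert, and borderline scores wait for human approval before being executed.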


The use of ML in financial institutions has significantly increased over recent years and this trend seems set to continue, or even accelerate further. As the number, maturity, and level of sophistication of ML applications increase, so do the risks associated with them. The key risk factors include those related to the data, the model, and the governance framework that sits around these. To mitigate these risks, it is important to prioritise input data quality, properly validate models, and implement a strong governance framework with appropriate safeguards.

The benefits of ML applications are clear and are being realised by FIs today. However, the shift from deterministic models to ML-based ones, whose behaviour is more difficult to understand and explain, presents new risks to FIs and is likely to invite regulatory scrutiny in the near future.

You can read the previous articles in this series below: