How can a decision-making algorithm and the environment it operates in be fully investigated? In part, this is determined by which element in the decisioning system is suspected of having an issue. In the first article of this series, we identified several potential components of the decision-making system that could present an issue, namely:

  • The input data
  • The algorithm’s configuration
  • The reference data
  • The algorithm itself
  • The algorithm’s outputs.

In our experience, investigations of this kind do not proceed in a strict linear fashion, especially when the decisioning system is still in active use. Nonetheless, there is an overarching sequence of steps we would typically follow. There are also situations where the potential issue affects an active system, creating a need for a parallel process that escalates identified issues in real time so that tactical fixes can be applied.

Purpose and history

Gaining an understanding of the purpose and history of the decision-making algorithm is critical to establishing whether issues have occurred and what their likely impacts might be. While this may sound obvious and trivial, building a clear picture can be challenging, particularly when the algorithm has been in place for a long time, has undergone many changes, or where its function spans multiple parts of the business.

Some of the key questions to ask about the algorithm could include:

  • What was the primary objective of the decision-making system?
  • When was it first introduced and how did it evolve over time?
  • Is it still in use and, if not, when did it cease operating?
  • What was its scope (e.g. which business units, products, services, geographies, and regulations did it serve)?
  • What are the main components of the system, in particular from a configuration perspective?
  • Who was responsible for setting it up, maintaining it, and overseeing it?
  • What is the evidence that there might be an issue with the system?

Preservation

An immediate priority in investigations of this nature should be an assessment of the need to preserve the system and its related data. The details will always be case-specific, but in our experience the following components should be investigated and considered for preservation (one practical preservation step is sketched after the list):

  • The algorithm’s source code
  • The IT environment in which the algorithm runs, such as the operating system, programming environment, and any configuration files
  • The reference data, including the history of how this was updated over time
  • Copies of historical data inputs and outputs
  • Technical and change control documentation
  • Historical testing and calibration results.
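
The mechanics of preservation will vary by system, but one step that consistently pays off is recording a cryptographic hash of every preserved file at the point of collection, so that the integrity of the material can be demonstrated throughout the investigation. The following is a minimal sketch in Python; the "./preserved" directory and "manifest.json" filename are placeholders, not references to any particular system:

    import hashlib
    import json
    from datetime import datetime, timezone
    from pathlib import Path

    def build_preservation_manifest(root: Path) -> dict:
        """Record a SHA-256 hash and size for every preserved file, so
        later work can demonstrate the evidence has not changed since
        collection."""
        manifest = {"collected_at": datetime.now(timezone.utc).isoformat(), "files": {}}
        for path in sorted(root.rglob("*")):
            if path.is_file():
                manifest["files"][str(path.relative_to(root))] = {
                    "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
                    "bytes": path.stat().st_size,
                }
        return manifest

    if __name__ == "__main__":
        # "./preserved" stands in for wherever the source code, reference
        # data, configuration, and documentation were copied to.
        manifest = build_preservation_manifest(Path("./preserved"))
        Path("manifest.json").write_text(json.dumps(manifest, indent=2))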

Sandbox environment

To investigate the system thoroughly, it is often necessary to have a fully working sandbox environment available for interrogation. Systems will often already have testing environments in place, and these can be cloned to provide a sandbox in which the system can be safely interrogated. Beyond the IT challenges of setting up a sandbox environment, a key consideration is gaining assurance that the sandbox version of the algorithm is a reasonably faithful replica of the actual system.

During a high-profile regulatory investigation, we spent a significant period calibrating the algorithm against historical input and output data, to ensure we had faithfully recreated its behavior and could evidence this to the regulator.
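
As an illustration, the core of such a calibration exercise can be expressed as a replay-and-compare loop: historical inputs are fed through the sandboxed algorithm and its outputs are checked against the outputs the production system actually produced at the time. The sketch below assumes a hypothetical CSV history with a recorded_output column and a sandbox_fn wrapping the sandboxed decision logic; a near-zero mismatch rate is evidence of a faithful replica:

    import csv

    def replay_and_compare(sandbox_fn, history_path, tolerance=1e-9):
        """Feed historical inputs through the sandboxed algorithm and
        measure how often its outputs diverge from the outputs recorded
        by the production system at the time."""
        total = mismatches = 0
        with open(history_path, newline="") as f:
            for row in csv.DictReader(f):
                total += 1
                recorded = float(row["recorded_output"])
                replayed = sandbox_fn(row)  # hypothetical sandboxed decision function
                if abs(replayed - recorded) > tolerance:
                    mismatches += 1
        return mismatches / total if total else 0.0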

Establish what went wrong

At this point in the investigation, we get to the heart of the matter. How you go about investigating an algorithm and the wider system it operates in will be fact-specific, but there are three broad approaches that can be used in conjunction:

  • Code review: In this approach, a team of experienced code reviewers reads the algorithm’s code line by line to understand how it functions and whether this aligns with the algorithm’s intended behavior. Of particular interest is how the code handles unexpected inputs or scenarios ("error-handling"); the first sketch after this list shows the kind of defect such a review targets. This is a rigorous approach, but its basis is somewhat theoretical, and it is potentially unable to anticipate all of the real-world input data scenarios that the algorithm needs to accommodate.
  • Black box testing: In this approach, the code is treated as a black box and the algorithm is instead assessed by feeding in different scenarios of input data and analyzing the outputs it produces, to see whether it behaves as expected. This approach may test inputs that are unusual ("edge cases"), inputs that do not conform to the standard the algorithm expects, or inputs that are known to have been problematic in the past. The advantage of this approach is that it is very practical: how the algorithm behaves in close-to-real-life scenarios can be tested quickly (the second sketch after this list shows a minimal harness). The limitation is that a review of parts of the code will likely still be required to understand the root cause of any issues identified.
  • Challenger model: For very complex algorithms, it is sometimes preferable to design and build an independent algorithm that allows a subset of the original algorithm’s functionality to be tested. This challenger model provides a baseline against which the behavior of the actual algorithm can be compared (the third sketch after this list). Having this baseline comparison helps identify and diagnose the root cause of issues more rapidly, although building the challenger model comes with an associated overhead.
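
To make the error-handling point concrete, the fragment below (entirely hypothetical, not drawn from any real system) shows the kind of defect a line-by-line review is designed to catch: a broad exception handler that silently converts malformed input into a plausible-looking output.

    # Pattern a line-by-line review would flag: the broad handler makes
    # malformed input indistinguishable from a genuinely zero-risk case.
    def risk_score(record: dict) -> float:
        try:
            return float(record["exposure"]) * float(record["weight"])
        except Exception:
            return 0.0  # silent fallback hides the data problem

    # A stricter version fails loudly, so bad input surfaces immediately.
    def risk_score_strict(record: dict) -> float:
        return float(record["exposure"]) * float(record["weight"])  # raises on bad input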
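
A black box harness can be equally compact. The sketch below feeds a set of scenarios into the algorithm under test and flags any where the observed behavior, whether a value or an exception, differs from what was expected. The scenarios and the stand-in algorithm are hypothetical:

    import math

    def algorithm_under_test(record):
        # Stand-in for the real decision logic (hypothetical).
        return float(record["exposure"]) * float(record["weight"])

    def black_box_test(algorithm, cases):
        """Run each scenario through the algorithm, capture the value or
        the exception it produces, and report any unexpected behavior."""
        failures = []
        for name, inputs, expected in cases:
            try:
                observed = algorithm(inputs)
            except Exception as exc:
                observed = type(exc).__name__
            matches = (observed == expected if isinstance(expected, str)
                       else isinstance(observed, float) and math.isclose(observed, expected))
            if not matches:
                failures.append((name, expected, observed))
        return failures

    cases = [
        ("routine case",    {"exposure": "10", "weight": "0.5"}, 5.0),
        ("edge case: zero", {"exposure": "0", "weight": "1.5"}, 0.0),
        ("malformed input", {"exposure": "n/a", "weight": "1"}, "ValueError"),
    ]
    print(black_box_test(algorithm_under_test, cases))  # [] when all scenarios pass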
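
Finally, a challenger comparison reduces to running the same records through both implementations and collecting the disagreements, each of which becomes a lead for root-cause analysis. Again, a minimal and hypothetical sketch:

    import math

    def compare_to_challenger(original, challenger, records, rel_tol=1e-6):
        """Run the same records through the production algorithm and an
        independently built challenger model; return the cases where the
        two disagree beyond tolerance."""
        disagreements = []
        for record in records:
            a, b = original(record), challenger(record)
            if not math.isclose(a, b, rel_tol=rel_tol):
                disagreements.append((record, a, b))
        return disagreements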

Materiality

Once any issues have been identified, it is often important to establish their materiality. You may, for example, have identified an "edge-case" bug, where a certain set of inputs could cause the algorithm to function incorrectly. If these inputs never actually occurred, there is less cause for concern than if they arose regularly.

To assess materiality, it is often necessary to undertake a lookback exercise, in which historical input data is fed through a corrected version of the algorithm and the resulting outputs are compared to the outputs that were originally produced. This approach requires both historical input and output data; when these are not available, it may be possible to create synthetic inputs that can be run through the original and corrected versions of the algorithm. While lookbacks can be time-consuming exercises, they have the benefit of quantifying the actual impact of any issues identified. When a regulator is involved, this is especially advantageous, as the materiality of the issues can factor into the regulator’s resulting sanctions and/or censure.
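
In code terms, a lookback is the calibration replay run with the roles reversed: historical inputs go through the corrected algorithm, and the differences against the originally recorded outputs are aggregated into materiality statistics. A minimal sketch, again assuming a hypothetical CSV history with a recorded_output column and a corrected_fn wrapping the fixed logic:

    import csv

    def lookback(corrected_fn, history_path, tolerance=0.01):
        """Replay historical inputs through the corrected algorithm and
        quantify how often, and by how much, its outputs differ from the
        outputs the flawed version originally produced."""
        total = affected = 0
        net_impact = 0.0
        with open(history_path, newline="") as f:
            for row in csv.DictReader(f):
                total += 1
                delta = corrected_fn(row) - float(row["recorded_output"])
                if abs(delta) > tolerance:
                    affected += 1
                    net_impact += delta
        return {"records": total, "affected": affected,
                "affected_rate": affected / total if total else 0.0,
                "net_impact": net_impact}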

Permanent fixes and root causes

As we noted in the introduction, any issues identified as the investigation proceeds should be escalated so that temporary fixes can be implemented. These fixes are typically put in place quickly and often rely on inefficient manual checks and controls.

Once the full list of issues is understood, it is advisable to mature these temporary fixes into permanent ones that improve the efficiency of the process and make it more robust over the longer term. Equally, understanding why the issues arose helps to inform changes in how the system should be managed and governed, and may hold wider lessons for the organization.

In our final article in this series, we will explore how we might go about investigating AI decision-making algorithms, contrasting this with the approach to investigating conventional algorithms outlined here.
