Having covered the points of failure of, and the investigative methods used on, conventional computer algorithms, we now turn our attention to AI models. In this article, we offer a brief introduction to AI, outline some of the ways an AI taxonomy can be defined, and describe the unique issues AI presents to investigators.
While some definitions of AI algorithms (like the one used in the EU AI Act) can be taken to include conventional algorithms, we believe it is more helpful to distinguish between conventional algorithms and AI on the basis of whether the algorithm’s logic is entirely dictated by humans or not. In this distinction, an AI program is any program where the computer itself has discovered some of the logic to be used.
An entire essay could be written on the definition of AI; in the interest of brevity, we have adopted this somewhat simplified approach.
AI taxonomy
Under the banner of AI, many further taxonomies can be defined. For example, a traditional way to divide AI models is to distinguish the way in which they learn or discover associations from data.
The three main paradigms of traditional “machine learning” are:
- Supervised learning: This is where the model is given a sizeable dataset of labeled examples (for example, pictures of dogs and cats, each accompanied by text stating whether it depicts a dog or a cat). The model is “trained” on these examples and subsequently tasked with applying its learned skills to new, previously unseen samples. A classic example is handwriting recognition, where the algorithm is provided with labeled examples of handwritten alphabetical or numerical characters and learns to recognize them (a brief code sketch contrasting supervised and unsupervised learning follows this list).
- Unsupervised learning: Here, a model is provided with a large dataset of unlabeled examples and is told to classify (group) these into different categories, or to identify associations between them. The patterns thereby discovered can then be used to create associations that may not be obvious to a human interrogating the same dataset. An example is customer segmentation, where unsupervised learning can be used to group similar customers based on their characteristics and purchasing habits.
- Reinforcement learning: This is where an AI agent is inserted into an environment and instructed to make its own decisions, with the aim of achieving a predefined goal. The outcome of these decisions is evaluated and fed back to the agent as a reward or punishment, prompting it to adapt its decision-making in pursuit of the goal. After enough “practice”, the agent becomes competent at handling future decisions. One fairly recent example of a successful reinforcement learning project is AlphaGo, the first computer program to defeat a professional human player at the board game Go. This animation provides a delightful illustration of reinforcement learning.
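To make the first two paradigms concrete, here is a minimal sketch using scikit-learn on a tiny invented dataset. The measurements, labels and model choices are illustrative assumptions, not a recommendation for real-world use.

```python
# Supervised vs. unsupervised learning on the same toy measurements.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

# Supervised learning: labeled examples produce a predictive model.
# Each row is a (height_cm, weight_kg) measurement; labels say "cat" or "dog".
X = np.array([[25, 4], [23, 5], [60, 25], [70, 30], [24, 4], [65, 28]])
y = np.array(["cat", "cat", "dog", "dog", "cat", "dog"])

classifier = DecisionTreeClassifier(random_state=0)
classifier.fit(X, y)                            # "train" on the labeled data
print(classifier.predict([[22, 4], [68, 27]]))  # predict for unseen samples

# Unsupervised learning: the same measurements, but without labels.
# KMeans groups similar rows into clusters on its own.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(clusters)  # e.g. [0 0 1 1 0 1]: group ids with no human-given meaning
```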
Ultimately, the objective of machine learning is either to make predictions about an output value, given an input value, or to create associations between input values so these can be classified into groups.
An alternative taxonomy of AI sees different approaches as a series of concentric circles (see figure 1):
- Generic AI, which is the broadest category and can potentially include conventional algorithms as well.
- Machine learning, a subcategory of AI in which the computer determines some or all of the logic to be used. This is what many people, including the authors, mean when talking about AI.
- Deep learning, a subcategory of machine learning which uses neural networks to perform its learning. Most of the current popular AI systems are deep learning models.
- Generative AI (or GenAI), a subcategory of deep learning tasked specifically with generating seemingly original content. GenAI models such as DALL-E and Stable Diffusion, and especially Large Language Models (LLMs) [1] such as ChatGPT, Copilot, LLaMA and Gemini, have sparked the latest wave of public interest in AI.
Figure 1: An alternative taxonomy of AI

In this article, we take a generalist approach, as most investigatory issues with AI are common to many sections of these taxonomies, if not all.
Training data and parameters
As the above taxonomies show, a common feature of many AI models is that they are “trained” on a large amount of example data before being put into use. It is therefore helpful to give a rough description of how this training works in practice. For the purpose of this article, we will assume that training is being done on a “labeled” dataset – where, for each sample, the output (whether that’s the next word in a sentence or the appropriate label for an image) is known. [2]
A deep neural network contains an enormous number of parameters. For example, GPT-4, OpenAI’s latest LLM at the time of writing, is rumored to have almost 1.8 trillion parameters. During training, these models are “fed” the labeled examples and automatically adjust their parameters (using a technique called backpropagation) so that their outputs most closely “fit” the training data. In other words, the parameters are iteratively refined so that the model minimizes its prediction error on the training data.
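To illustrate the principle on the smallest possible scale, the sketch below fits a single-parameter model to a handful of labeled points by repeatedly nudging that parameter in the direction that reduces the prediction error. Real deep networks apply the same idea across billions of parameters via backpropagation through many layers; the toy data and learning rate here are illustrative assumptions.

```python
# Toy illustration of training: nudge a single parameter w until the model's
# predictions (w * x) best fit the labeled examples.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])     # inputs
y = np.array([3.1, 5.9, 9.2, 11.8])    # known outputs (roughly 3 times x)

w = 0.0                # the model's single parameter, initially uninformed
learning_rate = 0.01   # how far to move the parameter at each step

for step in range(200):
    predictions = w * x                    # the model's current outputs
    errors = predictions - y               # how far off those outputs are
    gradient = 2 * np.mean(errors * x)     # direction that reduces the error
    w -= learning_rate * gradient          # adjust the parameter accordingly

print(round(w, 2))  # converges to roughly 3.0, the value that best fits the data
```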
Investigating AI algorithms
Recalling our previous article on investigating algorithms, we will discuss both the additional points of failure introduced by these AI algorithms, and the specific issues they pose to investigatory methods.
All points of failure we covered when examining conventional algorithms apply equally to AI algorithms. Governance frameworks that are appropriate for conventional algorithms are the bare minimum for AI governance. The training data used to “teach” an AI introduces a new point of failure. Issues can arise if this training data is not representative or appropriate to the model’s real-world use cases, if that data is inaccurate, or if it is biased. The data used to train a model should thus be subject to the same treatment as the input and reference data we discussed in part 1, including the forensic preservation of the data history, governance, testing and investigation protocols.
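One simple check an investigator might run is sketched below: it compares how often each category appears in the training data with how often it appears in the population the model now serves, flagging large gaps for further scrutiny. The category names, proportions and threshold are hypothetical placeholders.

```python
# Sketch of a representativeness check: compare category proportions in the
# training data against those seen in production.
from collections import Counter

def category_shares(values):
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

training_regions = ["urban"] * 900 + ["rural"] * 100    # what the model learned from
production_regions = ["urban"] * 600 + ["rural"] * 400  # what it now sees in use

train_shares = category_shares(training_regions)
prod_shares = category_shares(production_regions)

for category in sorted(set(train_shares) | set(prod_shares)):
    gap = abs(train_shares.get(category, 0.0) - prod_shares.get(category, 0.0))
    flag = "  <- investigate" if gap > 0.10 else ""
    print(f"{category}: training {train_shares.get(category, 0.0):.0%}, "
          f"production {prod_shares.get(category, 0.0):.0%}{flag}")
```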
As an illustration of the complexities that arise from selecting appropriate training data, we note that laws and regulations around AI tend to emphasize the fairness of the AI model’s outcomes. However, the real world from which the model’s training data was drawn might reflect an existing unfairness, either currently or historically. By ingesting real-world training data without correcting for any inherent biases, training can embed this ingrained unfairness into the algorithm.
It is interesting to note that some of the debates about AI algorithms seem to have less to do with their design or training data than with fundamental philosophical and ethical questions about pre-existing societal biases and the “fair” outcomes we, as a society, want to achieve.
And when it comes to investigations, AI poses another problem. The nature of AI algorithms means there are several reasons why these models can typically only be investigated as a “black box”:
- The sheer number of parameters in all but the smallest AI systems makes it impossible to truly analyze the inner workings of the system.
- The way these parameters are set is not intended to be “human-readable”. The parameter values reflect features detected in the training data, but cannot be guaranteed to correspond to recognizable concepts that people can relate to.
- Some AI models (notably GenAI models such as the GPT family of large language models) function with a degree of stochasticity, meaning that their output is influenced by a certain amount of intentionally introduced randomness. Here's a recent example of an LLM giving two different responses to the exact same question or prompt, when asked twice:
Prompt: “Provide a definition of ‘Artificial Intelligence’ understandable to a high school student.”
LLM response 1: “Sure! Artificial intelligence, or AI, is when computers or machines are designed to think and learn like humans. They can do tasks that usually need human intelligence, such as recognizing speech, making decisions, or translating languages.”
LLM response 2: “Certainly! Here's a simple definition of Artificial Intelligence (AI) that a high school student can understand: Artificial Intelligence refers to systems or machines that, like humans, use intelligence to perform tasks. These tasks can include things like understanding language, solving problems, and learning from experience. Imagine a computer that can learn and get better at something over time, just like how you learn and improve by practicing. That's what AI is all about!”
This lack of “reproducibility” can introduce challenges in assessing why results were produced, and mask the true root causes of any issues.
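The sketch below gives a toy picture of where this randomness comes from: rather than always choosing the single most likely next token, a generative model typically samples from a probability distribution, often shaped by a “temperature” setting. The tiny vocabulary and scores are invented for illustration.

```python
# Toy sketch of sampling randomness in a generative model.
import numpy as np

vocabulary = ["machines", "computers", "systems", "programs"]
scores = np.array([2.0, 1.8, 1.2, 0.5])   # the model's raw preference per token
temperature = 0.8                          # above zero keeps some randomness

probabilities = np.exp(scores / temperature)
probabilities /= probabilities.sum()       # softmax: scores become probabilities

rng = np.random.default_rng()
for attempt in range(3):
    token = rng.choice(vocabulary, p=probabilities)
    print(f"attempt {attempt + 1}: next token is {token!r}")
# Repeated runs can pick different tokens, which is why identical prompts
# can yield different completions and exact reproduction is difficult.
```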
These factors all mean that a direct review of an AI algorithm’s inner workings is often impractical at best. However, the rest of our toolkit remains useful. Notably, any part of the investigation that does not directly target the inner workings of the model (e.g., the model’s governance context, the quality and accuracy of the data provided to the model, or the materiality of the model’s malfunctioning) can be approached as we would with conventional algorithms. It is also still possible to perform black-box testing or develop challenger models. Our previous article describes these approaches in more detail.
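As a sketch of what black-box testing with a challenger model might look like in practice, the example below runs a handful of probe inputs through both the system under investigation and a simple, transparent challenger, and flags disagreements for follow-up. The decision rules, field names and model_under_review interface are hypothetical stand-ins, not a description of any real system.

```python
# Sketch of black-box testing with a challenger model.

def model_under_review(applicant):
    # Placeholder for the opaque system being investigated.
    return "approve" if applicant["income"] > 40_000 else "decline"

def challenger_model(applicant):
    # A transparent rule that the investigators understand and can defend.
    return "approve" if applicant["income"] > 2 * applicant["debt"] else "decline"

probe_inputs = [
    {"id": 1, "income": 50_000, "debt": 30_000},
    {"id": 2, "income": 35_000, "debt": 5_000},
    {"id": 3, "income": 45_000, "debt": 10_000},
]

for applicant in probe_inputs:
    observed = model_under_review(applicant)
    expected = challenger_model(applicant)
    if observed != expected:
        print(f"applicant {applicant['id']}: model says {observed}, "
              f"challenger says {expected}; flag for review")
```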
As investigators of AI algorithms, we may think of ourselves as medical professionals: sometimes internists, sometimes psychoanalysts. We are trying to assess the health of a patient and, when needed, identify pathologies and determine the root causes of the maladies that present themselves. It is important to recognize that the pathways of these pathologies are obscured by the vastness of the internal systems and the myriad environmental conditions. Regardless, a diagnosis needs to be reached. Since this is a new paradigm, a heuristic approach is appropriate while more deliberate processes take shape.
While this article series has focused on our approach to investigating algorithms (whether AI or conventional), we would be remiss not to briefly mention AI governance – good diet, exercise and regular health check-ups, to extend the medical metaphor. AI algorithms present several challenges to conventional governance and risk management frameworks. There are fascinating developments in this space, and we hope to cover them in depth in a future article.
This concludes our series of articles on investigating algorithms. We hope you have found it informative and perhaps even enjoyable to read. Like you, we will continue to watch the fast-evolving world of AI with awe, while wondering what’s happening inside this “black box” – and whether the new AI algorithms are doing what they are meant to.
[1] Large Language Models are effectively predictive AI models. Given the first part of a sentence or text, they are designed to predict the most likely continuation of this text. To generate longer texts, the initial predictions are fed back into the model as the new ‘first part’ of the text, letting the predictive cycle continue.
[2] This is the case for the training of Large Language Models, since they are trained on pre-existing texts where, for each part of the text, the next part is known.
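For readers who prefer code to prose, the toy loop below illustrates the feedback cycle described in footnote [1]: each predicted word is appended to the text and fed back in as the new context. The predict_next_word function is a hypothetical stand-in for a real language model.

```python
# Toy version of the autoregressive feedback loop in footnote [1].

def predict_next_word(context):
    # Placeholder: a real LLM would score its entire vocabulary here.
    canned = {
        "The cat": "sat",
        "The cat sat": "on",
        "The cat sat on": "the",
        "The cat sat on the": "mat.",
    }
    return canned.get(context, "...")

text = "The cat"
for _ in range(4):
    next_word = predict_next_word(text)   # predict the most likely continuation
    text = f"{text} {next_word}"          # feed it back as the new "first part"

print(text)  # The cat sat on the mat.
```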
Read our previous articles in this series: