Trustworthy AI in Financial Services — Insights

Why 40% of AI projects will be cancelled

Gartner predicts that by 2028, more than 40% of agentic AI projects will be abandoned after the pilot phase. This is not a technology failure — most of the technology works. It is a governance and strategy failure: organisations deploying AI without a clear vision of what success looks like, without the leadership to see it through, and without the accountability structures to manage what happens when it goes wrong.

The regulated financial services sector faces a version of this problem that is more acute than most. AI systems that make or influence decisions about policyholders, customers or counterparties carry regulatory risk, conduct risk and model risk — all of which require specific governance controls that are absent from most AI deployments. The senior managers regime, the Consumer Duty, the PRA's model risk management requirements and the emerging AI regulatory frameworks all converge on the same point: human accountability is non-negotiable, and "the algorithm decided" is not a defence.

What AI still cannot do

I have spent significant time over the past several years testing the limits of AI systems in professional contexts — including a structured experiment using chess as a controlled environment for assessing AI strategic reasoning.

The chess experiment was instructive. Current AI chess engines are extraordinarily strong at tactical calculation — they are nearly unbeatable by any human. But when you constrain the engine to explain its reasoning, the explanations reveal a fundamental limitation: the system is optimising a learned objective function, not reasoning about the position the way a human strategist does. It cannot surprise itself. It cannot recognise that the situation is genuinely novel and that its training may not apply. It does not know what it does not know.

This matters enormously in financial services, where novel situations — a market event with no historical analogue, a regulatory change that reframes the decision context, a counterparty behaving in an unexpected way — are precisely the situations where sound judgment is most needed and where AI systems are most likely to fail quietly.

True intelligence acknowledges what it does not know. Current AI does not. This is not a flaw to be engineered away — it is a structural characteristic to be governed around.

The three failure modes

Most AI project failures in financial services trace back to one of three root causes.

No clear vision. The organisation is deploying AI because it feels like the right thing to do, or because a competitor appears to be doing it. There is no specific problem being solved, no measurable definition of success, and no honest assessment of what "good" looks like. Without this, the project drifts — and eventually someone senior realises it has been drifting for two years at significant cost.

No grounded ambition. The organisation has a vision but it is disconnected from what AI systems can currently do reliably. The most common version of this is deploying a large language model to make consequential decisions — credit, claims, underwriting — without appreciating that these systems hallucinate, that their confidence is not calibrated to their accuracy, and that their performance degrades in distribution shift conditions that are common in financial services.

No human accountability. The organisation has deployed an AI system without designating a human accountable for its outputs. This is both a governance failure and, in many regulated firms, a regulatory breach. The senior manager responsible for a model risk function cannot delegate accountability to a model. The decision remains human, even if the recommendation is machine-generated.

The five pillars for success

Against these three failure modes, the following five disciplines distinguish AI programmes that succeed — and that remain sustainable when a regulator, an auditor or a serious incident tests them.

Pillar 1: Vision. A clear, specific articulation of the problem being solved, the business outcome sought, and the definition of success. Not "harness AI" — but "reduce claims processing time by 40% while maintaining error rates below X, with a named individual accountable for both outcomes."

Pillar 2: Grounded ambition. An honest assessment of what AI can currently do reliably, what it cannot, and how the gap between the two will be managed. This requires someone on the programme who has actually tested AI systems under realistic conditions — not just read the vendor's benchmark results.

Pillar 3: Leadership. A senior sponsor with genuine accountability for outcomes, the authority to stop a project that is not working, and the willingness to do so. Agentic AI programmes that succeed have this. Ones that fail typically have a sponsor who treats the role as a credential rather than a responsibility.

Pillar 4: Customer and policyholder focus. Explicit analysis of how the AI system affects the people whose interests the regulated firm is obliged to protect. Under the Consumer Duty, firms must be able to demonstrate that their use of AI delivers good outcomes for customers — and must be able to identify and remediate cases where it does not.

Pillar 5: Transparency and explainability. The capacity to explain, to a regulator, to a customer, to a court, or to a senior manager conducting a conduct review, why the AI system produced the output it did. This does not require a technically exhaustive explanation of the model architecture. It requires a governance-level understanding of the inputs, the objective function, the known limitations, and the human review process applied to the output.

A word on AI-on-AI risk

A risk receiving insufficient attention is the interaction of AI systems with each other — both within a firm's own infrastructure and across the broader ecosystem. AI-generated phishing attacks, AI-assisted fraud, and adversarial inputs designed to manipulate AI decision systems are all active threats. Firms deploying AI in their customer-facing or internal processes need to model not just how the AI performs in normal conditions, but how it behaves when it is being attacked by another AI system.

This is the frontier where my own research is currently focused. The combination of agentic AI with the quantum security transition creates a convergence risk that most board risk registers have not yet acknowledged.

The board's role

Boards in regulated financial services firms are not expected to understand the technical details of AI systems they oversee. They are expected to ensure that the governance framework is adequate, that human accountability is clearly assigned, that the model risk management framework covers AI models appropriately, and that the organisation is honest about the limitations of the systems it deploys.

The questions a board should be asking: Who is the accountable senior manager for each material AI system we operate? What is our model risk management framework for AI, and does it distinguish between different risk levels appropriately? How do we identify and respond to cases where an AI system produces a poor outcome for a customer? And what is our plan if a material AI system has to be turned off?