What is explainable AI?

Written by Dr. Christian Debes | 26.05.2026 09:01:05

What is explainable AI? The questions every enterprise should be asking

A common question from clients goes something like this: our model gives the right answer most of the time, so why do we need to explain how it got there? The answer depends on what the model is for. When a prediction feeds a decision that a regulator, a customer, or a colleague can question, being right most of the time is not enough on its own. You also need to be able to show why.

That is what explainable AI is about. Working with clients across healthcare, insurance, legal, and finance, we have found that the questions organisations ask about XAI tend to cluster around the same themes. This post addresses the most important ones.

What is explainable AI (XAI)?

Explainable AI refers to the set of methods, techniques, and design principles that make the outputs of AI systems understandable to humans. The aim is not only technical accuracy but interpretability - outputs that people can evaluate, challenge, and act on with confidence.

XAI tries to answer questions that a black-box model cannot:

Why did the model make this decision?
What would have needed to be different to get a different result?
Which inputs actually drove the output?
When should I trust what it tells me?
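
The second question is the idea behind counterfactual explanations. As a minimal sketch, assume a hypothetical toy credit rule (`approve`) and a brute-force search over a single feature. Both are invented for illustration; practical counterfactual methods search over many features and constrain the result to realistic changes.

```python
def approve(income: float, debt: float) -> bool:
    """Hypothetical toy model: approve when the debt-to-income ratio is below 0.4."""
    return debt / income < 0.4


def income_counterfactual(income: float, debt: float,
                          step: float = 500.0, max_steps: int = 200):
    """Find the smallest income increase (in multiples of `step`) that flips a denial."""
    if approve(income, debt):
        return 0.0  # already approved, nothing needs to change
    for k in range(1, max_steps + 1):
        if approve(income + k * step, debt):
            return k * step
    return None  # no counterfactual found within the search range


needed = income_counterfactual(income=30_000, debt=15_100)
print(f"Approved if income were higher by {needed:.0f}")
```

The output reads as a statement a customer can act on, which is the point of this style of explanation.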

Most AI deployed in enterprise settings today, particularly large language models and complex predictive systems, does not answer these questions by default. It produces outputs. XAI is what lets you move from a model that works to a model you understand.

What is the difference between explainability and interpretability?

These terms are often used interchangeably, but the distinction is worth making. Interpretability refers to the degree to which a human can understand the internal mechanisms of a model - how it is structured and what it is doing computationally. Explainability is broader. It describes the ability to communicate the behaviour and outputs of a model in human-understandable terms, whether or not the internal mechanics are fully visible.

A simple decision tree is interpretable - you can trace every rule. A large language model is not inherently interpretable, but XAI techniques can still make it explainable by surfacing which inputs mattered, what reasoning steps were followed, or what evidence the model drew upon. For enterprise AI, explainability is usually the more practical goal. You do not need to understand every parameter in a billion-parameter model. You need to be able to justify its decisions to a regulator, a customer, or a colleague.
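
To make "you can trace every rule" concrete, here is a small sketch of a hand-written decision tree that returns its decision together with the rules that fired. The function name, features, and thresholds are hypothetical and chosen only for illustration.

```python
def classify_claim(amount: float, prior_claims: int):
    """Hypothetical two-level decision tree; returns the decision and the rules that fired."""
    path = []
    if amount <= 1_000:
        path.append("amount <= 1000")
        decision = "auto-approve"
    else:
        path.append("amount > 1000")
        if prior_claims <= 2:
            path.append("prior_claims <= 2")
            decision = "approve"
        else:
            path.append("prior_claims > 2")
            decision = "manual review"
    return decision, path


decision, path = classify_claim(amount=4_200, prior_claims=3)
print(decision, "because", " and ".join(path))
```

The explanation here is not an add-on: it is the exact sequence of comparisons the model executed.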

What is a black-box AI model, and why is it a problem?

A black-box model is one where the relationship between inputs and outputs is opaque. The model produces a result - a score, a classification, a piece of generated text - but offers no window into how it got there.

This causes problems in several places. When something goes wrong, you have no way to identify the cause. When a model produces a biased or incorrect output, you cannot pinpoint why. And when a stakeholder asks you to justify an AI-driven decision such as a claims outcome, a credit denial, or a medical recommendation, you have nothing to show them. High-performing black-box models are common, but in regulated industries and high-stakes use cases, performance alone is not sufficient. The question "did the model work?" has to be accompanied by "can we explain how?"

What are the main methods of explainable AI?

There is no single best approach to XAI. The right method depends on the model architecture, the use case, and the audience for the explanation. A few techniques are central to most enterprise applications.

Post-hoc attribution methods such as SHAP and LIME examine a trained model and calculate which input features most influenced a particular output. They do not change how the model works; they provide a retrospective account of its decisions. They are well established and widely used, particularly in predictive modelling for finance and insurance.
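
SHAP builds on Shapley values from game theory. The sketch below computes exact Shapley values by brute force for a hypothetical three-feature scoring function. It is not the SHAP library's API, and it assumes that an "absent" feature takes a fixed baseline value, which is one of several possible conventions. The enumeration over all orderings is only feasible for a handful of features.

```python
from itertools import permutations


def model(x):
    """Hypothetical risk score with an interaction between features 0 and 1."""
    return 2.0 * x[0] + 1.0 * x[1] + 3.0 * x[0] * x[1] - 0.5 * x[2]


def shapley_values(f, x, baseline):
    """Exact Shapley values: average each feature's marginal contribution over all orderings.

    Assumption of this sketch: a feature that is "absent" takes its baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = f(current)
        for i in order:
            current[i] = x[i]      # switch feature i from baseline to its actual value
            now = f(current)
            phi[i] += now - prev   # marginal contribution of feature i in this ordering
            prev = now
    return [p / len(orderings) for p in phi]


x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print(phi, sum(phi), model(x) - model(baseline))
```

The attributions sum to the difference between the prediction and the baseline prediction, and the interaction term is split evenly between the two features involved. Nothing in the model itself was changed to obtain them.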

It is worth saying that simpler methods are often underrated. Feature importance and partial dependence plots are older and less fashionable than SHAP, but for a wide class of predictive problems they do most of the explainability work, and they have one advantage that matters in practice: a non-specialist can read them. The newest method is not always the right one.
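
Both ideas fit in a few lines. The following sketch assumes a hypothetical, already-fitted model and synthetic data, and uses permutation importance as one common way of measuring feature importance. Libraries such as scikit-learn provide production-grade versions; this is only meant to show how little machinery is involved.

```python
import random


def model(row):
    """Hypothetical fitted model: depends strongly on x0, weakly on x1, not at all on x2."""
    return 3.0 * row[0] + 0.5 * row[1]


def mse(rows, targets):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(rows, targets, feature, rng):
    """Increase in error after shuffling one feature column, breaking its link to the target."""
    column = [r[feature] for r in rows]
    rng.shuffle(column)
    shuffled = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
    return mse(shuffled, targets) - mse(rows, targets)


def partial_dependence(rows, feature, grid):
    """Average prediction when one feature is forced to each grid value."""
    return [
        sum(model(r[:feature] + [v] + r[feature + 1:]) for r in rows) / len(rows)
        for v in grid
    ]


rng = random.Random(0)
rows = [[rng.uniform(0, 1) for _ in range(3)] for _ in range(200)]
targets = [model(r) for r in rows]  # noise-free targets keep the sketch deterministic

importances = [permutation_importance(rows, targets, j, rng) for j in range(3)]
pd_x0 = partial_dependence(rows, feature=0, grid=[0.0, 0.5, 1.0])
print(importances, pd_x0)
```

The importance list ranks the features, and the partial dependence values show how the average prediction moves as the first feature increases. Both outputs can be drawn as a simple bar chart or line plot.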

Integrated gradients is a technique suited to deep learning and LLM-based systems. It accumulates the model's gradients along a path from a neutral baseline input to the actual input and assigns a contribution score to each input token or feature, producing a map of which parts of the input drove the output. Its attributions are computed from the model's own gradients rather than from a surrogate model fitted around it, which gives it strong faithfulness properties, although the scores depend on the choice of baseline.
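
As a rough illustration of integrated gradients, the sketch below applies the method to a hypothetical three-input logistic model with hand-written gradients. The weights, the all-zero baseline, and the number of integration steps are assumptions made for the example; in a real deep learning setting the gradients would come from an automatic differentiation framework.

```python
import math

WEIGHTS = [1.5, -2.0, 0.5]  # hypothetical parameters of a tiny logistic model


def f(x):
    """Model output: sigmoid of a weighted sum of the inputs."""
    z = sum(w * v for w, v in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))


def grad_f(x):
    """Analytic gradient of f with respect to each input."""
    y = f(x)
    return [y * (1.0 - y) * w for w in WEIGHTS]


def integrated_gradients(x, baseline, steps=200):
    """Approximate the path integral with a midpoint Riemann sum from baseline to x."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (v - b) for v, b in zip(x, baseline)]
        for i, g in enumerate(grad_f(point)):
            total[i] += g
    return [(v - b) * t / steps for v, b, t in zip(x, baseline, total)]


x = [1.0, 0.5, 2.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline)
print(attributions, sum(attributions), f(x) - f(baseline))
```

A useful sanity check is that the attributions add up, to within numerical error, to the difference between the model's output on the input and on the baseline. If they do not, the number of steps is usually too small.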

Chain-of-thought and natural language explanations ask the model to articulate its reasoning in plain language before arriving at an answer. This can improve accuracy and give users a readable account of the model's logic. It has an important limitation, though: research has shown that the reasoning a model produces in natural language does not always reflect the actual computational process that generated the output. These explanations can be plausible without being faithful.

Mechanistic interpretability is an emerging field that examines the internal components of a model - its neurons, attention heads, and the circuits they form - to understand how information flows through the system. It offers the deepest and most causally grounded form of explanation, but it is also the most technically demanding, and translating its findings into something a non-expert can act on remains an open challenge.

What is the difference between post-hoc and intrinsic explainability?

Intrinsic explainability refers to models that are transparent by design, where the structure itself is readable. Linear models, decision trees, and rule-based systems fall into this category. You do not need additional tools to explain them; the model is the explanation.

Post-hoc explainability applies methods after the fact to models that are not inherently interpretable. Most enterprise LLM deployments require post-hoc techniques, since the models that produce the best results are not interpretable by design. The practical implication: if your use case is high-stakes enough that explainability is non-negotiable, you have two choices. Constrain yourself to intrinsically interpretable models and accept potential performance trade-offs, or invest in rigorous post-hoc methods and evaluate them carefully for both plausibility and faithfulness.

Why does explainable AI matter for business?

The business case for XAI has three parts.

The first is regulatory compliance. In regulated sectors, AI-driven decisions must often be auditable. GDPR gives individuals the right to meaningful information about automated decisions that affect them. The EU AI Act requires high-risk AI systems to support human oversight and interpretability. In financial services and healthcare, sector regulations add further requirements. An AI system that cannot explain itself is, in many jurisdictions, a liability.

The second is trust and adoption, which in our experience is often the more immediate constraint. Even technically capable AI systems fail to deliver value when the people using them do not trust them. A doctor will not act on a diagnosis they cannot interrogate. A claims adjuster will not approve a recommendation they cannot explain to their manager. XAI closes the gap between what a model can do and what users will actually do with it.

The third is debugging and model improvement, which includes catching data leakage. Black-box failures are expensive. When a model starts producing unexpected outputs, or when it performs suspiciously well in a way that suggests it has learned something it should not have access to, explainability methods are what let you identify the cause. Without them, you are guessing.

How does XAI help with AI bias and fairness?

Bias in AI models tends to be invisible until it is not. A model trained on historical data will often encode historical patterns, including patterns of discrimination, without anyone intending it to. Without explainability, that bias can persist undetected across thousands of decisions.

XAI methods surface the features and data patterns driving a model's outputs, making it possible to detect when a model is relying on proxies for protected characteristics, or performing systematically differently across demographic groups. This does not resolve bias automatically - that requires deliberate intervention in data, model design, and deployment - but it makes bias visible, which is the prerequisite for addressing it. For organisations in regulated sectors, demonstrating fairness is increasingly not optional, and explainability is how you produce that demonstration.
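
One simple way to check whether a model performs systematically differently across groups is to compare its decision rates per group. The sketch below uses a handful of invented records and a hypothetical helper, `approval_rate_by_group`; the gap it reports is often called the demographic parity difference and is only one of several fairness measures.

```python
from collections import defaultdict


def approval_rate_by_group(records):
    """Share of positive decisions per group; `records` are (group, approved) pairs."""
    counts = defaultdict(lambda: [0, 0])  # group -> [approved, total]
    for group, approved in records:
        counts[group][0] += int(approved)
        counts[group][1] += 1
    return {g: a / t for g, (a, t) in counts.items()}


# Hypothetical model decisions, invented for illustration only.
decisions = [
    ("A", True), ("A", True), ("A", True), ("A", False),
    ("B", True), ("B", False), ("B", False), ("B", False),
]

rates = approval_rate_by_group(decisions)
parity_gap = max(rates.values()) - min(rates.values())
print(rates, parity_gap)
```

A large gap does not prove discrimination on its own, but it tells you where to point attribution methods next to see which features are driving the difference.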

What do GDPR and the EU AI Act require regarding AI explainability?

GDPR establishes that when individuals are subject to decisions based solely on automated processing that have legal or similarly significant effects, they are entitled to meaningful information about the logic involved and about the significance and likely consequences of that processing.

The EU AI Act goes further. For high-risk AI applications, which include systems used in healthcare, employment, credit, law enforcement, and critical infrastructure, operators are required to implement human oversight mechanisms and ensure that outputs are interpretable by the people responsible for them. The Act also mandates documentation, transparency reports, and ongoing monitoring. For enterprises deploying AI in these sectors, this is not a future consideration. The regulatory environment is already here, and building explainability into AI systems from the start is substantially easier than retrofitting it after deployment.

Does explainable AI work for large language models and generative AI?

This is one of the most important open questions in the field, and one we engage with directly in our own research.

Classical XAI methods were developed for relatively constrained predictive models - a credit scoring system, an image classifier. Applying them to large language models introduces new challenges. LLMs do not simply classify inputs; they retrieve context, reason across multiple steps, generate open-ended outputs, and increasingly take autonomous actions within multi-step pipelines. The target of explanation has shifted from a single model output to the behaviour of an entire system.

Attribution methods like SHAP and LIME can still be applied, but their outputs are often correlational summaries rather than faithful accounts of what the model actually did. Integrated gradients provides stronger faithfulness guarantees. Mechanistic interpretability offers the deepest insights, at significant technical cost. The honest answer is that XAI for generative AI is an active research area, not a solved problem. The methods we have are useful and improving quickly, but enterprises should be clear about their limitations and cautious about treating any single technique as a complete solution.

What are the limitations of explainable AI?

XAI is sometimes presented as a layer you add on top of an AI system. In practice it is more complicated than that. The most powerful AI models are often the least interpretable by nature, and there is a genuine tension between performance and explainability - the model that gives you the best predictions may also be the hardest to explain.

Explanations themselves can also mislead. A chain-of-thought reasoning trace can look compelling and still be unfaithful to the model's actual computation. Feature attribution scores tell you which inputs were correlated with an output, not necessarily what caused it. An explanation that is easy to understand but not grounded in what the model actually did can be worse than no explanation at all, because it creates false confidence.

This is why we evaluate XAI methods against two criteria. Plausibility: does the explanation correspond to what is actually true about the inputs and the world? And faithfulness: does the explanation correspond to what the model actually did, not just what it could plausibly have done? Methods that satisfy both are rare and valuable. Methods that satisfy neither should be treated with caution regardless of how readable they are.
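
Faithfulness can be probed empirically. One common style of check, sketched here under simplifying assumptions, is a deletion test: replace the feature an explanation ranks highest with a baseline value and see whether the model's output really changes the most. The model, the baseline, and the two candidate explanations below are hypothetical, and single-feature deletion is only one of several possible faithfulness checks.

```python
def model(x):
    """Hypothetical linear scorer standing in for the model under audit."""
    return 4.0 * x[0] + 1.0 * x[1] + 0.1 * x[2]


def deletion_drop(f, x, baseline, feature):
    """Change in output when one feature is replaced by its baseline value."""
    ablated = list(x)
    ablated[feature] = baseline[feature]
    return abs(f(x) - f(ablated))


def top_feature_is_faithful(f, x, baseline, attributions):
    """Check that the highest-attributed feature is also the one whose removal matters most."""
    claimed = max(range(len(x)), key=lambda i: abs(attributions[i]))
    drops = [deletion_drop(f, x, baseline, i) for i in range(len(x))]
    return drops[claimed] == max(drops)


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
faithful_explanation = [4.0, 1.0, 0.1]  # matches what the model computes
plausible_but_wrong = [0.1, 1.0, 4.0]   # reads fine, but points at the wrong input

print(top_feature_is_faithful(model, x, baseline, faithful_explanation))
print(top_feature_is_faithful(model, x, baseline, plausible_but_wrong))
```

The second explanation could look entirely reasonable to a reader, yet the test shows it does not describe what the model did. Plausibility, by contrast, cannot be checked from the model alone and usually needs domain experts.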

Is there such a thing as too much explainability?

It is a less commonly asked question, but a genuine one. Some research suggests that mandating full transparency in certain competitive or high-frequency decision contexts can have unintended consequences, making systems more gameable or creating information asymmetries that benefit sophisticated actors at the expense of others.

More practically, the right level of explanation depends on the audience and the decision at hand. A regulatory auditor needs a different kind of explanation than an end user receiving a personalised recommendation. A model serving a trained clinician does not need to explain basic medical concepts; it needs to surface the specific evidence driving a particular conclusion in a particular case. The question to ask is not how much explanation can we produce, but what does this person need to understand in order to act responsibly on this output.

Where to start

If you are working on enterprise AI and explainability is not yet part of your evaluation framework, it is worth treating it as a requirement alongside performance rather than an afterthought. Define who needs to understand your model's outputs and what they need to understand. Choose XAI methods appropriate to your model architecture and evaluate them for plausibility and faithfulness, not just readability. And be realistic about the current state of the art, particularly for generative AI, where the field is still developing.

For a concrete example of what this looks like in practice, we have written a companion piece on how we built an explainable disease prediction model that went into production. It walks through the methodology step by step, from feature design with domain experts to the staged use of explainability methods at the end.

We are actively investing in LLM explainability research at Spryfox, and we share what we are learning as we go. If you are navigating these questions in your own organisation, we are happy to talk.

Frequently asked questions

What does XAI stand for?

XAI stands for explainable artificial intelligence. It refers to methods and techniques that make AI model outputs understandable and interpretable to humans.

What is the difference between XAI and traditional AI?

Traditional AI systems are built primarily to optimise for predictive performance. XAI systems are designed to also communicate how they arrive at their outputs, making them auditable, easier to trust, and easier to correct when they go wrong.

Is explainable AI required by law?

In many jurisdictions and sectors, yes. GDPR, the EU AI Act, and sector-specific regulations in finance, healthcare, and insurance all create legal or regulatory obligations around AI transparency and explainability.

What are the most widely used XAI methods?

SHAP and LIME are the most widely deployed methods for predictive models. Feature importance and partial dependence plots are simpler and often sufficient. Integrated gradients is commonly used for deep learning and language models. Chain-of-thought prompting provides natural language explanations. Mechanistic interpretability is an emerging frontier for deep analysis of model internals.

Can you apply XAI to ChatGPT or Claude?

Partially. Large language models can produce chain-of-thought reasoning traces, and some attribution methods can be applied at the token level. Fully faithful explanations of the internal computations of frontier LLMs remain technically challenging and are an active area of research.

How do I know if an XAI explanation is trustworthy?

Evaluate it against two criteria: plausibility (is it factually consistent with the inputs and context?) and faithfulness (does it accurately reflect the model's actual computation, not just a plausible-sounding account of it?). An explanation that is easy to read but not grounded in what the model actually did can create false confidence.

Read part 2 here: How to build explainable AI: A guide with examples from real AI projects
