Enterprise AI Analysis: The Probabilities Also Matter

Paper: The Probabilities Also Matter: A More Faithful Metric for Faithfulness of Free-Text Explanations in Large Language Models

Authors: Noah Y. Siegel, Oana-Maria Camburu, Nicolas Heess, Maria Perez-Ortiz (Google DeepMind, University College London)

Executive Summary

This research paper addresses a critical gap in evaluating Large Language Models (LLMs): how can we be sure an AI's explanation for its decision is actually truthful? The authors argue that existing methods for measuring "faithfulness" are flawed. These older metrics often just check if an explanation mentions a word that caused the model's final answer to flip, a binary and easily-gamed approach. For instance, an AI could simply repeat its entire input as an "explanation" and achieve a perfect score, without providing any real insight. To solve this, the paper introduces the Correlational Counterfactual Test (CCT), a more sophisticated metric. CCT doesn't just look at whether the final decision changed; it measures the *degree* of change in the model's underlying probabilities. It then checks if there's a strong correlation between the words that cause the biggest probabilistic shifts and the words that actually appear in the explanation. For enterprises, this is a monumental step forward. It provides a quantitative method to move beyond plausible-sounding rationalizations to statistically-backed, faithful explanations, which is essential for risk management, regulatory compliance (e.g., GDPR's "right to explanation"), and building genuine trust in mission-critical AI systems.

The High-Stakes Problem with "Plausible" AI Explanations

In the enterprise world, an AI's decision is often just the beginning of the story. Whether it's a loan application denial, a medical diagnosis suggestion, or a supply chain forecast, stakeholders need to know *why*. LLMs are adept at generating human-like explanations that sound convincing. But what if these explanations are just post-hoc rationalizations? What if the AI denies a loan citing "insufficient credit history" when its decision was actually influenced by a spurious correlation with the applicant's zip code?

This is not a hypothetical risk. It's a core challenge for deploying AI in regulated industries. Relying on explanations that are merely plausible, not faithful, exposes a business to significant risks:

  • Compliance Risk: Regulations like GDPR and AI ethics frameworks demand transparent and explainable decision-making. An unfaithful explanation is a compliance failure waiting to happen.
  • Operational Risk: If you don't know the true drivers of your AI's decisions, you can't debug it, improve it, or predict its behavior in new scenarios.
  • Reputational Risk: A single instance of an AI making a biased decision, covered up by a plausible but false explanation, can erode customer trust and cause lasting brand damage.

The research by Siegel et al. provides a crucial tool to mitigate these risks by moving beyond superficial checks of faithfulness.

Deconstructing Faithfulness: From Binary Checks to Probabilistic Correlation

The paper's main contribution is a new way to measure faithfulness. To understand its value, we first need to look at the limitations of the previous approach.
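To make the contrast concrete, the sketch below implements both tests over a set of word-insertion interventions. It assumes access to the model's predicted answer distribution before and after each insertion; the specific choices here (total variation distance as the impact measure, point-biserial correlation as the statistic, and the helper names Intervention, ct_unfaithfulness, and cct_score) are illustrative assumptions rather than the paper's exact formulation.

```python
# Sketch of the old Counterfactual Test (CT) vs. the Correlational
# Counterfactual Test (CCT). Assumes we can query the model's predicted
# answer distribution before and after inserting a word into the input.
from dataclasses import dataclass
from typing import Dict, List

from scipy.stats import pointbiserialr


@dataclass
class Intervention:
    inserted_word: str               # word inserted into the input
    probs_before: Dict[str, float]   # P(answer | original input)
    probs_after: Dict[str, float]    # P(answer | perturbed input)
    explanation: str                 # free-text explanation for the perturbed input


def answer(probs: Dict[str, float]) -> str:
    """Most likely answer under a predicted distribution."""
    return max(probs, key=probs.get)


def impact(iv: Intervention) -> float:
    """Size of the probabilistic shift caused by the insertion.

    Measured here as total variation distance between the two predicted
    distributions (an assumed choice of impact measure).
    """
    labels = set(iv.probs_before) | set(iv.probs_after)
    return 0.5 * sum(
        abs(iv.probs_before.get(l, 0.0) - iv.probs_after.get(l, 0.0)) for l in labels
    )


def mentioned(iv: Intervention) -> bool:
    """Whether the explanation mentions the inserted word (simple surface match)."""
    return iv.inserted_word.lower() in iv.explanation.lower()


def ct_unfaithfulness(interventions: List[Intervention]) -> float:
    """Old binary test: fraction of answer-flipping insertions NOT mentioned.

    Lower is better, and it is easy to game: mention everything and you score 0.
    """
    flips = [iv for iv in interventions if answer(iv.probs_before) != answer(iv.probs_after)]
    if not flips:
        return 0.0
    return sum(not mentioned(iv) for iv in flips) / len(flips)


def cct_score(interventions: List[Intervention]) -> float:
    """New correlational test: correlation between each insertion's probabilistic
    impact and whether the explanation mentions it (higher is better)."""
    impacts = [impact(iv) for iv in interventions]
    mentions = [int(mentioned(iv)) for iv in interventions]
    corr, _ = pointbiserialr(mentions, impacts)
    return corr
```

Under the old CT, only answer flips and surface mentions matter; under CCT, every intervention contributes graded evidence, so an explanation earns a high score only when the words it cites are the ones that actually move the model's probabilities.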

Key Research Findings: Rebuilt for Business Intelligence

The authors tested their CCT metric on the Llama2 family of models across several NLP tasks. Their findings provide actionable intelligence for any enterprise building or deploying LLMs.

Finding 1: CCT Uncovers "Hidden" Unfaithfulness

The most striking result is how CCT reveals flaws that the old Counterfactual Test (CT) misses. On certain datasets (such as ECQA), models produced verbose explanations that mentioned nearly every inserted word. Because CT only asks whether answer-flipping words are mentioned, this yielded a very low CT unfaithfulness score, which looks good on paper. CCT told a different story: the correlation between a word's probabilistic impact and its inclusion in the explanation was close to zero, a poor CCT score. The model was simply parroting its input, not genuinely explaining its decision.
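As a toy illustration of this failure mode, the snippet below reuses the Intervention, ct_unfaithfulness, and cct_score helpers from the sketch above; all probabilities are fabricated purely for illustration.

```python
# Toy illustration of the "parroting" failure mode: the explanation lists every
# inserted content word regardless of its actual effect on the prediction.
parrot_explanation = "The input mentions bank, river and walked."

interventions = [
    # High-impact insertion: flips the answer and is mentioned, so CT is satisfied.
    Intervention("bank", {"yes": 0.90, "no": 0.10}, {"yes": 0.20, "no": 0.80}, parrot_explanation),
    # Near-zero-impact insertions: also mentioned, which CT never penalizes.
    Intervention("river", {"yes": 0.90, "no": 0.10}, {"yes": 0.88, "no": 0.12}, parrot_explanation),
    Intervention("walked", {"yes": 0.90, "no": 0.10}, {"yes": 0.89, "no": 0.11}, parrot_explanation),
    # Moderate-impact insertion that is NOT mentioned and does not flip the answer.
    Intervention("quietly", {"yes": 0.90, "no": 0.10}, {"yes": 0.65, "no": 0.35}, parrot_explanation),
]

print(ct_unfaithfulness(interventions))  # 0.0  -> looks perfectly faithful under the old CT
print(cct_score(interventions))          # ~ -0.01 -> mentions carry almost no signal about impact
```

The parroting explanation never misses an answer-flipping word, so CT unfaithfulness is zero, yet knowing that a word was mentioned tells us almost nothing about how much it moved the model's probabilities, which is exactly what CCT penalizes.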

Interactive Chart: CCT vs. CT Unfaithfulness (Llama2 70B PE)

This chart compares the faithfulness scores for the Llama2 70B model using both the new CCT metric (higher is better) and the old CT Unfaithfulness metric (lower is better). Notice how ECQA looks good on CT but poor on CCT, revealing its verbose, uninformative explanations.

Finding 2: Model Scale Correlates with Faithfulness

As a general trend, the research showed that larger models tend to produce more faithful explanations. The Llama2 70B model consistently achieved higher CCT scores than its 7B and 13B counterparts, particularly on more complex tasks. This suggests that for high-stakes applications requiring trustworthy explanations, investing in larger, more capable models can yield a direct return in terms of model reliability and transparency.

Interactive Chart: Faithfulness (CCT Score) by Model Size

This chart visualizes the CCT faithfulness scores across different Llama2 models for the e-SNLI dataset. A clear trend emerges: as model size increases, so does the faithfulness of its explanations.

Enterprise Applications & Strategic Value of CCT

The CCT framework isn't just an academic exercise; it's a practical tool for enterprise AI governance that can be applied across different sectors and functions.

ROI and Business Impact Calculator

While the direct ROI of faithfulness can be hard to quantify, we can estimate the value derived from reduced risk, lower manual audit costs, and increased operational efficiency. Use this calculator to model the potential impact for your organization.
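As a starting point, the sketch below shows one way such a calculator might be structured. Every input name and default value is a hypothetical placeholder to be replaced with your organization's own figures; none of them come from the paper.

```python
# Illustrative back-of-the-envelope ROI model for adopting faithfulness checks
# such as CCT. All parameter names and default values are hypothetical
# placeholders, not figures from the paper or a validated pricing model.

def faithfulness_roi(
    annual_decisions: int = 100_000,            # AI-assisted decisions per year (hypothetical)
    audit_cost_per_decision: float = 12.0,      # cost of manually auditing one decision
    audit_rate_before: float = 0.10,            # share of decisions audited without automated checks
    audit_rate_after: float = 0.04,             # share still audited once CCT-style checks gate explanations
    expected_incident_cost: float = 250_000.0,  # expected annual cost of an explanation-related failure
    incident_risk_reduction: float = 0.30,      # assumed relative reduction in that risk
    implementation_cost: float = 80_000.0,      # one-off cost of building the validation pipeline
) -> dict:
    # Savings from auditing fewer decisions by hand.
    audit_savings = annual_decisions * audit_cost_per_decision * (audit_rate_before - audit_rate_after)
    # Expected value of the assumed reduction in incident risk.
    risk_savings = expected_incident_cost * incident_risk_reduction
    net_benefit = audit_savings + risk_savings - implementation_cost
    return {
        "audit_savings": audit_savings,
        "risk_savings": risk_savings,
        "net_benefit_year_1": net_benefit,
        "roi_year_1": net_benefit / implementation_cost,
    }

print(faithfulness_roi())
```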

Your Custom Implementation Roadmap with OwnYourAI.com

Adopting advanced faithfulness metrics like CCT requires a structured approach. At OwnYourAI.com, we guide enterprises through a proven implementation roadmap to build truly trustworthy AI systems.

Interactive Knowledge Check

Test your understanding of these advanced faithfulness concepts. A strong grasp of these ideas is the first step toward building more reliable enterprise AI.

Conclusion: Moving to a New Standard of AI Trust

The research paper "The Probabilities Also Matter" marks a pivotal moment in the field of explainable AI. By introducing the Correlational Counterfactual Test (CCT), the authors have provided a path away from superficial, gameable faithfulness metrics toward a robust, quantitative, and more meaningful standard. For enterprises, this is the key to unlocking the full potential of LLMs in mission-critical roles.

True faithfulness isn't just an ethical nice-to-have; it's a cornerstone of responsible AI deployment, a requirement for regulatory compliance, and a driver of long-term business value. By ensuring that an AI's explanations accurately reflect its internal reasoning, businesses can build systems that are not only powerful but also predictable, auditable, and worthy of trust.

Ready to elevate your AI governance strategy? Let's discuss how we can implement a custom CCT-based validation pipeline for your AI solutions.

Book Your Strategic AI Faithfulness Session
