
Artificial Intelligence Analysis

ChatGPT's Astonishing Fabrications About Percy Ludgate

This analysis delves into the severe hallucination problem encountered when using Large Language Models (LLMs) like ChatGPT for historical research, specifically focusing on the little-known computer pioneer Percy Ludgate. Initial experiments with ChatGPT 3.5 revealed that nearly half of its generated content was factually incorrect, despite its authoritative tone. Subsequent testing with a more recent model, Claude 3, showed a significant reduction in fabrications but highlighted that fundamental issues persist, particularly when information is scarce. The findings underscore the critical need for human verification and caution against relying on LLMs as primary historical sources.

Key Metrics & Impact

Quantifying the challenge: A look at the fabrication rates across different LLM generations.

48% of words fabricated (ChatGPT 3.5)
7% of words fabricated (Claude 3)
3,107 total words analyzed (Claude 3)

Deep Analysis & Enterprise Applications


Large Language Models (LLMs) are celebrated for their linguistic fluency, but their propensity for 'hallucinations' (generating plausible but false information) remains a critical challenge. The problem is particularly acute in domains where precise factual recall is paramount, such as historical research. While LLMs can synthesize vast amounts of text, their underlying mechanism prioritizes coherence and pattern matching over factual accuracy, producing convincing but ultimately misleading outputs. As Ted Chiang eloquently put it, an LLM is like a 'blurry JPEG of all the text on the Web': it preserves the gist, but the exact sequence of bits (the facts) cannot be guaranteed.

In late 2022, an experiment with ChatGPT 3.5 focused on Percy Ludgate, a lesser-known computer pioneer. The LLM was queried on facts already known to the researchers. ChatGPT generated authoritative-sounding but highly inaccurate answers, inventing biographical details, project names, and even citations to non-existent newspaper articles. Astonishingly, 48% of the 2,086 words generated were found to be fabrications. Attempts to 'coach' the model toward correct answers were largely unsuccessful, demonstrating the depth of the hallucination problem and ruling out ChatGPT's use as a reliable historical source without rigorous external verification.

A follow-up experiment in July 2024 by Walter Tichy replicated the initial queries using Claude 3, a more recent LLM. The results showed a marked improvement: only 7% of the 3,107 words generated were fabrications. Claude 3 correctly identified Ludgate's profession as an accountant, his birth/death dates, and key aspects of his analytical engine design. However, subtle inaccuracies persisted, such as mischaracterizing the 'index wheel' or the exact timeline of his accounting work, highlighting that while the hallucination rate decreased, the need for careful scrutiny remains, especially with nuanced historical details.

The comparison between ChatGPT 3.5 and Claude 3 indicates that newer LLMs are becoming more factually grounded, especially when drawing from established knowledge bases. However, the core challenge of hallucination remains a fundamental problem, particularly when dealing with scarce or ambiguous historical data. Solutions may involve advanced grounding mechanisms, continuous human feedback, and a deeper understanding of the trade-off between linguistic plausibility and factual accuracy. The experiment with Google's NotebookLM, using curated documents, demonstrated that providing LLMs with complete, trusted information drastically improves accuracy, suggesting a future where LLMs act as sophisticated indexing and summarization tools rather than unverified knowledge producers.
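To make the grounding idea concrete, here is a minimal Python sketch of the approach described above: the model is confined to a set of curated excerpts and instructed to decline when they are silent. The names (curated_sources, build_grounded_prompt) and the naive keyword-overlap ranking are illustrative assumptions, not any particular product's API.

```python
# Minimal sketch of document-grounded prompting: the model is asked to answer
# only from curated excerpts and to refuse when the excerpts are silent.
# All names and the ranking heuristic are illustrative, not a product API.

from typing import List

curated_sources: List[str] = [
    "Percy Ludgate worked as an accountant in Dublin.",
    "Ludgate published a description of his analytical machine design in 1909.",
]

def select_relevant(question: str, sources: List[str], top_k: int = 3) -> List[str]:
    """Rank curated passages by naive keyword overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(sources,
                    key=lambda s: len(q_words & set(s.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_grounded_prompt(question: str, sources: List[str]) -> str:
    """Assemble a prompt that confines the model to the supplied excerpts."""
    excerpts = "\n".join(f"- {s}" for s in select_relevant(question, sources))
    return (
        "Answer using ONLY the excerpts below. If the answer is not present, "
        "reply 'Not found in the provided sources.'\n\n"
        f"Excerpts:\n{excerpts}\n\nQuestion: {question}\nAnswer:"
    )

if __name__ == "__main__":
    print(build_grounded_prompt("What was Percy Ludgate's profession?", curated_sources))
```

The key design choice is that the refusal instruction travels with every query, so scarce or missing information leads to an explicit "not found" rather than a plausible-sounding guess.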

41-Percentage-Point Reduction in Fabrication Rate (ChatGPT 3.5 to Claude 3)

The Claude 3 model showed a 41 percentage point reduction in fabricated words compared to ChatGPT 3.5 for the same historical queries.
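The arithmetic behind these headline figures can be reproduced directly from the word counts and rates reported above; the short sketch below also distinguishes the 41-percentage-point drop from the roughly 85% relative reduction it implies.

```python
# Reproduce the headline arithmetic from the figures reported in the article.
chatgpt_words, chatgpt_rate = 2086, 0.48   # ChatGPT 3.5: 48% of 2,086 words fabricated
claude_words, claude_rate = 3107, 0.07     # Claude 3: 7% of 3,107 words fabricated

fabricated_chatgpt = round(chatgpt_words * chatgpt_rate)     # roughly 1,001 words
fabricated_claude = round(claude_words * claude_rate)        # roughly 217 words

point_drop = (chatgpt_rate - claude_rate) * 100              # 41 percentage points
relative_drop = (chatgpt_rate - claude_rate) / chatgpt_rate  # about 85% relative reduction

print(f"Fabricated words (approx.): {fabricated_chatgpt} vs {fabricated_claude}")
print(f"Drop: {point_drop:.0f} percentage points ({relative_drop:.0%} relative reduction)")
```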

| Feature | ChatGPT 3.5 (Initial Test) | Claude 3 (Later Test) |
| --- | --- | --- |
| Overall Fabrication Rate | 48% of words fake | 7% of words fake |
| Invented Biographical Details | Frequent, including false university attendance, civil engineering career details, and death age. | Mostly accurate, but initially debated the accounting qualification and mischaracterized the 'index wheel.' |
| Invented Publications/Sources | Cited numerous non-existent Irish Times articles and letters. | Explicitly stated it does not have access to specific primary sources such as newspaper articles. |
| Machine Details Accuracy | Called it 'Analytical Engine No. 2'; invented 'store wheels' and 'error adjusting mechanisms.' | Correctly identified the 'store' and its capacity; inaccurate in describing the 'index wheel' as a 'wheel with 20 rings' with 'pegs' for storage. |
| Correction Responsiveness | Largely resistant to correction; repeated fabrications. | Corrected itself on Ludgate's accounting profession when prompted. |
| Trustworthiness for Scarce Data | Extremely unreliable, prone to egregious fabrications. | Improved, but still requires independent checking; prone to subtle inaccuracies when information is limited. |

Enterprise Process Flow

1. LLM ingests Web data (compressed)
2. Identifies textual patterns
3. Generates a grammatically plausible response
4. Fills gaps with a 'best guess' (hallucination)
5. Presents the result as factual information
6. Requires external human verification (sketched below)
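The sketch below illustrates the final step of this flow under simple assumptions: every generated claim starts as unverified, and only claims a human reviewer has confirmed against a trusted source are released. The data structures, field names, and example claims are hypothetical.

```python
# Minimal sketch of the "external human verification" gate: every generated
# claim defaults to UNVERIFIED and only human-approved claims are released.
# All types, field names, and example claims are illustrative.

from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class Status(Enum):
    UNVERIFIED = "unverified"
    VERIFIED = "verified"
    REJECTED = "rejected"

@dataclass
class Claim:
    text: str
    status: Status = Status.UNVERIFIED
    source: Optional[str] = None  # citation recorded by the human reviewer

@dataclass
class Draft:
    claims: List[Claim] = field(default_factory=list)

    def review(self, index: int, verified: bool, source: Optional[str] = None) -> None:
        """Record a human reviewer's decision on a single claim."""
        claim = self.claims[index]
        claim.status = Status.VERIFIED if verified else Status.REJECTED
        claim.source = source

    def publishable(self) -> List[Claim]:
        """Only claims a human has verified ever leave the pipeline."""
        return [c for c in self.claims if c.status is Status.VERIFIED]

draft = Draft([
    Claim("Ludgate worked as an accountant."),
    Claim("Ludgate pursued a career in civil engineering."),
])
draft.review(0, verified=True, source="curated biography excerpt")
draft.review(1, verified=False)  # matches a known fabrication; no source found
print([c.text for c in draft.publishable()])
```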

The Percy Ludgate Paradox: Scarcity Fuels Fabrication

The case of Percy Ludgate serves as a stark illustration of LLM limitations. Ludgate, a genuine but lesser-known computer pioneer, presents a sparse digital footprint, making him a challenging subject for ungrounded AI. Initial queries to ChatGPT 3.5 resulted in a detailed, yet almost entirely fabricated, biography. Even direct corrections were dismissed or re-contextualized into new fictions. While Claude 3 significantly reduced the error rate, it still produced nuanced inaccuracies, such as misrepresenting his machine's 'index wheel' or the precise timeline of his career. This demonstrates that for topics with scarce or fragmented online information, LLMs struggle to differentiate between plausible inference and established fact, making human historical research indispensable.

The critical takeaway is that when an LLM operates outside of a rich, verifiable knowledge base, it prioritizes linguistic coherence, creating 'facts' that sound convincing but lack any basis in reality. This phenomenon, where the 'blurry jpeg' of the web is reconstructed into a sharp but fictional image, necessitates a rigorous, human-led verification process for any output intended for historical or critical applications.

Estimate Your AI Implementation ROI

Understand the potential time and cost savings by strategically integrating AI into your enterprise workflows, using industry-specific efficiency gains.

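As a rough illustration of the kind of arithmetic such a calculator performs, the sketch below multiplies reclaimed hours by a loaded hourly cost. Every input value is a placeholder assumption, not an industry benchmark.

```python
# Minimal sketch of a time-and-cost ROI estimate; all inputs are placeholders.

team_size = 10               # people whose workflow is partly automated
hours_per_week_on_task = 6   # hours each person spends on the automatable task
efficiency_gain = 0.30       # assumed fraction of that time reclaimed by AI
hourly_cost = 65.0           # assumed fully loaded hourly cost (USD)
weeks_per_year = 48

annual_hours_reclaimed = team_size * hours_per_week_on_task * efficiency_gain * weeks_per_year
estimated_annual_savings = annual_hours_reclaimed * hourly_cost

print(f"Annual hours reclaimed: {annual_hours_reclaimed:,.0f}")
print(f"Estimated annual savings: ${estimated_annual_savings:,.0f}")
```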

Enterprise AI Implementation Roadmap

A phased approach to integrate AI responsibly and effectively into your organization.

Phase 1: Discovery & Strategy

Assess current workflows, identify AI opportunities, and define strategic goals. This includes data readiness assessment and ethical considerations.

Phase 2: Pilot & Proof-of-Concept

Develop and deploy a small-scale AI pilot project to validate technical feasibility and demonstrate initial value. Gather feedback and iterate.

Phase 3: Integration & Scaling

Integrate successful AI solutions into existing enterprise systems. Develop robust monitoring, maintenance, and governance frameworks for broader deployment.

Phase 4: Optimization & Expansion

Continuously monitor AI performance, refine models, and explore new applications across the organization. Foster an AI-driven culture and upskill teams.

Ready to Own Your AI Strategy?

Don't let the complexities of AI hold you back. Let's build a reliable, impactful AI roadmap tailored to your enterprise.

Ready to Get Started?

Book Your Free Consultation.
