Enterprise AI Analysis

Predictable Confabulations: Factual Recall by LLMs Scales with Model Size and Topic Frequency

Matthew L. Smith*, Jonathan P. Shock, Samuel T. Segun, Iyiola E. Olatunji, Tegawendé F. Bissyandé
While scaling laws govern aggregate large language model performance, no scaling law has linked factual recall to both model size and training-data composition. We evaluated 38 models on over 8,900 scholarly references, each checked by an automated reference verification system. Recall quality follows a sigmoid in the log-linear combination of model parameter count and topic representation in training data. These two variables alone explain 60% of the variance across 16 dense models from four families, rising to 74-94% within individual families. The form matches a superposition-inspired account in which recall is gated by a signal-to-noise ratio: signal strength scales with concept frequency, and the noise floor scales with model capacity.
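
The functional form described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `predicted_recall`, the use of base-10 logarithms, and the coefficients `A`, `B`, `C` are all assumptions chosen for readability, and the coefficient values are not fitted values from the study. The topic counts are the work counts quoted on this page.

```python
import math

def predicted_recall(params: float, topic_works: float,
                     a: float, b: float, c: float) -> float:
    """Sigmoid of a log-linear combination of model size and topic frequency."""
    z = a + b * math.log10(params) + c * math.log10(topic_works)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for illustration only (not taken from the paper).
A, B, C = -12.0, 1.0, 0.5

# Compare a high-frequency and a low-frequency topic for an 8B-parameter model.
for label, works in [("Education", 5_921_253),
                     ("School dropout prevention in rural areas", 32)]:
    q = predicted_recall(8e9, works, A, B, C)
    print(f"{label}: predicted recall quality {q:.3f}")
```

With any positive `B` and `C`, the sketch reproduces the qualitative pattern: predicted recall rises with both parameter count and topic frequency, and saturates at the two ends of the sigmoid.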

Executive Impact

Factual recall in LLMs is not random but follows predictable scaling laws tied to model size and topic frequency, revealing a structural inequality in knowledge encoding.

Optimize LLM deployment and fine-tuning strategies by understanding the predictable relationship between model capacity, training data composition, and factual recall quality, mitigating confabulations for business-critical applications.

0.599 Sigmoid R² Value

38 Models Evaluated

8,900+ Scholarly References

Deep Analysis & Enterprise Applications

Economics Insights for LLM Factual Recall

In economics-related topics, LLMs demonstrate varying factual recall capabilities depending on the specificity and frequency of the concept in training data. Highly represented topics like "Economics" (767,282 works) show higher recall, while niche areas such as "Microfinance loan repayment" (1,341 works) are more susceptible to confabulation. This implies that for enterprise applications relying on specific financial or economic data, model scaling and targeted fine-tuning on relevant datasets are crucial.

Education Insights for LLM Factual Recall

Educational topics highlight the "long-tail" challenge. While general concepts like "Education" (5,921,253 works) are well-recalled, specialized topics like "School dropout prevention in rural areas" (32 works) fall into the "floor regime" of confabulations. Enterprises building educational AI tools must consider supplemental retrieval systems for highly specific or underrepresented domains to ensure accuracy and avoid generating misleading information.

Energy Insights for LLM Factual Recall

Energy sector information in LLMs also follows the frequency-recall pattern. Broad topics like "Energy" (8,631,334 works) are robustly recalled, but granular ones such as "Mini-grid electrification" (747 works) exhibit lower recall quality. This finding is critical for energy enterprises using LLMs for analysis or content generation, suggesting that bespoke models or advanced RAG are necessary for reliable information on niche energy technologies and policies.

Environmental Science Insights for LLM Factual Recall

Environmental science topics, ranging from "Climate change" (1,222,665 works) to "Climate-smart agriculture for smallholders" (1,406 works), showcase the importance of training data diversity and volume. LLMs perform better on broadly discussed environmental issues but struggle with highly specific, localized, or emerging concepts. Enterprises in sustainability and agriculture must invest in targeted data strategies to ensure their AI systems provide accurate, detailed information.

Health Insights for LLM Factual Recall

In the health domain, recall quality is high for widespread topics like "Health" (8,504,910 works) and "Infectious disease" (469,401 works). However, very specific public health interventions such as "Insecticide-treated bed nets for malaria" (1,331 works) show reduced recall. For healthcare enterprises, this means LLMs can provide general medical information reliably, but for critical, highly specific treatment details or rare conditions, human verification and/or specialized knowledge bases are indispensable.

Political Science Insights for LLM Factual Recall

Political science topics reveal a similar pattern, with "Political science" (285,942 works) demonstrating better recall than "Biometric voter registration" (171 works). This highlights potential biases in LLM knowledge based on global discourse frequency. Enterprises using AI for policy analysis or public sector applications must be aware of these disparities, especially when dealing with nuanced or less-covered political and social topics, requiring robust validation or expert input.

Enterprise Process Flow: Scaling Law Derivation

Setup: Superposition & Interference → Signal-to-Noise Ratio (SNR) Control → Zipf's Law for Concept Frequencies → Power-law Signal Mapping → Effective Dimensions (N_eff) → Recall Fraction Q Derived → Logistic Mapping to Continuous Quality

0.599 Variance Explained by Sigmoid Fit (R²)

The sigmoid functional form, which combines model parameter count and topic frequency, accounts for nearly 60% of the observed variance in factual recall quality across 16 dense models from four families, indicating a predictable scaling relationship.
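
For readers who want to check a variance-explained figure on their own evaluation data, the following is a minimal sketch of how R² is computed from observed and predicted recall scores. The helper name `r_squared` and the four data points are hypothetical and made up for illustration; they are not results from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Made-up recall scores for four (model, topic) pairs, illustration only.
observed = [0.10, 0.35, 0.60, 0.85]
predicted = [0.15, 0.30, 0.65, 0.80]

print(f"R^2 = {r_squared(observed, predicted):.3f}")
```

An R² of 0.599 means the fitted curve removes about 60% of the spread in recall scores relative to predicting the overall mean for every model and topic.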

Model Family Performance Comparison

Family	Key Characteristics	Recall Quality Trend
Llama	Strong performance across sizes. Consistently good training procedures.	Systematic offsets above baseline, suggesting superior training or data.
Gemma & Mistral	Follow general scaling trends. Represent typical scaling behavior.	Systematic offsets below baseline, indicating potential for optimization.
Qwen3 (Outlier)	Smallest members exhibit floor regime behavior. Steep apparent slope in early scaling.	Outlier behavior in smaller models, hinting at floor regime limitations.
MoE Architectures	Mixture-of-Experts models. Scaling driven by total parameters.	Total (not active) parameter count dominates the noise floor for MoE models.

Case Study: The Floor Regime of Factual Recall

At the lowest end of the sigmoid curve, models exhibit a 'floor regime' where they cease to produce verifiable references and instead generate templated fabrications. For example, Llama 3.2 1B produced only 22 verifiable references out of 215, often repeating the same work (Dahl's Polyarchy) with different publication years. Similarly, Qwen3 8B relied on a small pool of 62 unique first-author surnames, with 'Smith' appearing across all 24 topics, indicating slot-filling rather than true recall. This highlights the critical threshold where model capacity and topic frequency are insufficient for reliable factual generation, leading to predictable confabulations.
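
The two floor-regime signals in this case study, a low share of verifiable references and a small pool of recycled author surnames, can be checked with simple counts. The sketch below is an assumption about how such a check might look, not the study's verification pipeline: the function names and the toy reference list are hypothetical, and only the 22-of-215 figure comes from the text above.

```python
def verified_fraction(n_verified: int, n_total: int) -> float:
    """Share of generated references that could be verified."""
    return n_verified / n_total

def surnames_in_every_topic(references):
    """Return first-author surnames that appear in every topic.

    `references` is a list of (topic, first_author_surname) pairs.
    """
    topics = {topic for topic, _ in references}
    topics_by_surname = {}
    for topic, surname in references:
        topics_by_surname.setdefault(surname, set()).add(topic)
    return sorted(s for s, seen in topics_by_surname.items() if seen == topics)

# Figure quoted in the case study: 22 verifiable references out of 215.
print(f"Verified share: {verified_fraction(22, 215):.1%}")

# Toy data (hypothetical) showing one surname recycled across all topics.
toy_refs = [
    ("Economics", "Smith"), ("Economics", "Banerjee"),
    ("Health", "Smith"), ("Health", "Okeke"),
    ("Energy", "Smith"),
]
print(surnames_in_every_topic(toy_refs))
```

A surname that shows up under every topic, combined with a very low verified share, is the kind of pattern the case study describes as slot-filling.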

Estimate Your ROI

Unlock Productivity: Calculate Your Potential AI Savings

Project the tangible benefits of optimizing LLM factual recall within your enterprise. Understand how improved accuracy and reduced confabulations translate into significant time and cost savings.

Implementation Blueprint

Your Roadmap to Reliable LLM Factual Recall

Based on the predictable scaling of factual recall, here's a strategic roadmap to integrate these insights into your enterprise AI initiatives and minimize confabulations.

Phase 1: Baseline Assessment & Gap Analysis

Evaluate existing LLM factual recall performance against topic frequency and model size to identify knowledge gaps and confabulation hotspots across critical enterprise domains.

Phase 2: Targeted Data Curation & Fine-Tuning

Implement targeted pre-training and fine-tuning strategies focusing on low-frequency, high-value concepts to raise signal strength above the interference floor and improve recall.

Phase 3: Retrieval Augmented Generation (RAG) Integration

For very low-frequency or critical concepts below the recall floor, integrate robust RAG systems to bypass parametric recall and ensure accuracy, complementing LLM capabilities.
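
One way to act on this phase is to route each query by its predicted recall: answer from the model's parameters when the prediction is comfortably high, and fall back to retrieval otherwise. The sketch below assumes a sigmoid predictor in log parameter count and log topic frequency; the coefficients, the 0.5 threshold, and the names `predicted_recall` and `route` are all hypothetical choices for illustration, not values or APIs from the research.

```python
import math

# Hypothetical sigmoid coefficients and routing threshold (illustration only).
A, B, C = -12.0, 1.0, 0.5
RECALL_THRESHOLD = 0.5

def predicted_recall(params: float, topic_works: float) -> float:
    """Sigmoid of a log-linear combination of model size and topic frequency."""
    z = A + B * math.log10(params) + C * math.log10(topic_works)
    return 1.0 / (1.0 + math.exp(-z))

def route(params: float, topic_works: float) -> str:
    """Choose 'parametric' recall or 'retrieval' (RAG) for a topic."""
    if predicted_recall(params, topic_works) >= RECALL_THRESHOLD:
        return "parametric"
    return "retrieval"

# Work counts quoted on this page, evaluated for an 8B-parameter model.
for topic, works in [("Energy", 8_631_334), ("Mini-grid electrification", 747)]:
    print(f"{topic}: {route(8e9, works)}")
```

In practice the threshold would be set per application, with stricter values for business-critical domains where a fabricated reference is costly.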

Phase 4: Continuous Monitoring & Adaptive Scaling

Establish ongoing monitoring of factual recall quality across diverse topics and model scales, using the sigmoid framework to adaptively optimize model deployments and resource allocation.

Ready to Transform Your AI Strategy?

Don't let unpredictable confabulations hinder your enterprise AI. Our experts are ready to help you implement a data-driven approach to LLM deployment, ensuring factual accuracy and maximizing ROI.
