Enterprise AI Analysis
LLLMs: A Data-Driven Survey of Evolving Research on Limitations of Large Language Models
Large language model (LLM) research has grown rapidly, along with increasing concern about their limitations. In this survey, we conduct a data-driven, semi-automated review of research on limitations of LLMs (LLLMs) from 2022 to early 2025 using a bottom-up approach. From a corpus of 250,000 ACL and arXiv papers, we identify 14,648 relevant papers using keyword filtering, LLM-based classification validated against expert labels, and topic clustering (via two approaches, HDBSCAN+BERTopic and LLooM). We find that the share of LLM-related papers increases over fivefold in ACL and nearly eightfold in arXiv between 2022 and 2025. Since 2022, LLLMs research has grown even faster, reaching over 30% of LLM papers by 2025. Reasoning remains the most studied limitation, followed by generalization, hallucination, bias, and security. The distribution of topics in the ACL dataset stays relatively stable over time, while arXiv shifts toward security risks, alignment, hallucinations, knowledge editing, and multimodality. We offer a quantitative view of trends in LLLMs research and release a dataset of annotated abstracts and a validated methodology, available at: github.com/a-kostikova/LLLMs-Survey.
Key Insights & Impact
Quantitative overview of the research landscape and critical findings for enterprise AI adoption.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
By late 2024, LLM-related papers account for 75% of all crawled ACL papers, indicating a significant shift in NLP research focus.
Research into LLM Limitations has grown even faster, reaching over 30% of all LLM-related papers by 2025, reflecting increasing community engagement with risks.
Reasoning failures remain the most studied limitation across both ACL and arXiv datasets, followed by generalization, hallucination, bias, and security concerns.
Systematic Literature Review Pipeline
The survey employs a data-driven, semi-automated pipeline for systematic literature review, involving keyword filtering, LLM-based classification, human validation, and two distinct clustering methods.
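The sketch below (Python) illustrates how such a pipeline could be wired together. The keyword patterns, the `classify_as_limitation` stand-in for the LLM classifier, and the BERTopic settings are assumptions for illustration, not the authors' exact implementation.

```python
# Minimal sketch of a semi-automated survey pipeline: keyword filter ->
# relevance classification -> topic clustering. Illustrative only.
import re

from bertopic import BERTopic  # pip install bertopic

LLM_KEYWORDS = re.compile(r"\b(large language models?|LLMs?|GPT|LLaMA)\b", re.IGNORECASE)
LIMITATION_KEYWORDS = re.compile(r"(hallucinat|jailbreak|bias|reasoning failure)", re.IGNORECASE)

def keyword_filter(abstracts: list[str]) -> list[str]:
    """Stage 1: keep only abstracts that mention LLM-related keywords."""
    return [a for a in abstracts if LLM_KEYWORDS.search(a)]

def classify_as_limitation(abstract: str) -> bool:
    """Stage 2: decide whether a paper studies an LLM limitation.
    The survey uses an LLM-based classifier validated against expert labels;
    this keyword stand-in only marks where that call would go."""
    return bool(LIMITATION_KEYWORDS.search(abstract))

def cluster_topics(abstracts: list[str]):
    """Stage 3: topic clustering. BERTopic uses HDBSCAN under the hood,
    matching one of the survey's two clustering approaches."""
    topic_model = BERTopic(min_topic_size=20)
    topics, _ = topic_model.fit_transform(abstracts)
    return topic_model, topics

# Usage on a real corpus (e.g., ~250k ACL/arXiv abstracts):
#   candidates = keyword_filter(corpus)
#   lllm_papers = [a for a in candidates if classify_as_limitation(a)]
#   topic_model, topics = cluster_topics(lllm_papers)
#   print(topic_model.get_topic_info())
```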
Shifting Priorities: Safety, Controllability & Multimodality
LLLMs research shows a notable shift towards safety and controllability concerns in arXiv (Security Risks, Alignment Limitations, Knowledge Editing, Hallucination) and increasing attention to Multimodality. This aligns with the community's response to widespread LLM deployment and to the emerging challenges of non-textual data.
- Safety & Controllability: Topics like Security Risks, Alignment Limitations, and Hallucination show significant growth, especially in arXiv, driven by the increasing deployment of LLMs and concerns with their real-world impact.
- Multimodality: Research on limitations in multimodal LLMs is rapidly increasing, indicating new challenges arising from integrating diverse input types like images and audio.
- Reasoning & Knowledge Editing: Both show steady growth; reasoning work targets core capability gaps, while knowledge editing aims to update model knowledge without full retraining.
- Social Bias & Generalization: While still important, these topics show a relative decline in focus in arXiv after mid-2023, suggesting a maturation of the field and integration of these issues into broader safety discussions.
| Feature | HDBSCAN+BERTopic | LLooM |
|---|---|---|
| Core Shared Topics | Reasoning, Hallucination, Security Risks, Social Bias, Generalization, Long Context | Reasoning, Hallucination, Security Risks, Social Bias, Generalization, Long Context |
| Clustering Approach | Single-label, density-based | Multi-label, LLM-based |
| Granularity | Fewer, broader clusters | Finer-grained categories |
| Paper Assignment | One topic per paper | Multiple topics per paper |
| Overlap (Jaccard) | Moderate (0.313 ACL, 0.201 arXiv) | Moderate (0.239 ACL, 0.244 arXiv) |
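For readers who want to reproduce the overlap metric above, the snippet below shows how a Jaccard score between two methods' topic assignments can be computed. The topic name and paper IDs are purely illustrative; the survey's reported values (roughly 0.2-0.3) indicate partial but not full agreement between the two clusterings.

```python
# Jaccard overlap between the sets of papers two clustering methods assign to
# the same topic; paper IDs below are illustrative placeholders.
def jaccard(a: set, b: set) -> float:
    """|A intersection B| / |A union B|: 1.0 means identical paper sets, 0.0 means disjoint."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

# Hypothetical assignments for a "Reasoning" topic:
hdbscan_reasoning = {"paper_01", "paper_02", "paper_05"}  # single-label (HDBSCAN+BERTopic)
lloom_reasoning = {"paper_01", "paper_02", "paper_07"}    # multi-label (LLooM)

print(round(jaccard(hdbscan_reasoning, lloom_reasoning), 3))  # 0.5
```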
Quantify Your Enterprise AI Impact
Estimate the potential efficiency gains and cost savings from addressing LLM limitations in your organization.
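As a rough illustration of what such an estimate involves, the sketch below computes avoided human-review cost from a handful of inputs. All figures (error rate, review cost, mitigation effectiveness) are hypothetical placeholders to be replaced with your own numbers; this is not a calculator from the survey itself.

```python
# Back-of-the-envelope savings estimate; every input value is an assumption.
def estimated_annual_savings(
    monthly_llm_outputs: int,        # LLM responses produced per month
    error_rate: float,               # share of outputs needing human correction (e.g., hallucinations)
    review_cost_per_output: float,   # cost of one human correction
    mitigation_effectiveness: float, # fraction of errors removed by the mitigation (0-1)
) -> float:
    """Savings = human-correction cost avoided over 12 months."""
    avoided_errors_per_month = monthly_llm_outputs * error_rate * mitigation_effectiveness
    return avoided_errors_per_month * review_cost_per_output * 12

# Example: 50,000 outputs/month, 8% error rate, $4 per correction, 60% effective mitigation
print(f"${estimated_annual_savings(50_000, 0.08, 4.0, 0.6):,.0f}")  # $115,200
```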
Your LLM Limitation Strategy Roadmap
A phased approach to integrating insights from LLM limitation research into your enterprise AI strategy.
Phase 1: Research & Assessment
Identify critical LLM limitations relevant to your specific business use cases and conduct an internal audit of existing AI deployments.
Phase 2: Pilot & Validation
Implement pilot projects focusing on mitigating key limitations (e.g., hallucination, bias) using validated methodologies. Evaluate effectiveness with human-in-the-loop validation.
Phase 3: Scaled Integration
Scale proven mitigation strategies across enterprise AI systems. Implement continuous monitoring for emerging limitations and performance shifts.
Phase 4: Future-Proofing & Innovation
Stay abreast of evolving LLLMs research, integrate new techniques (e.g., multimodal limitation handling), and foster a culture of responsible AI development.
Ready to Address Your LLM Limitations?
Book a free consultation with our AI strategy experts to discuss a tailored approach for your enterprise.