Enterprise AI Analysis
A Data and Knowledge Cross-Level Fusion-Driven Learning Framework for Detecting Missing Diagnosis
Shaohui Liu, Xien Liu, Xinyue Fang, Chenwei Yan, Kaiyin Zhou, Xinxin You, Meiwei Li & Ji Wu
Received: 4 February 2025 | Accepted: 29 April 2026
This paper introduces DKFusion, a data and knowledge cross-level fusion-driven learning framework for automatically identifying missed diagnoses in Electronic Medical Records (EMRs). Missed diagnoses cause inaccurate documentation, incorrect Diagnosis-Related Group (DRG) assignments, and reduced reimbursements. To address them, DKFusion combines three modules: diagnosis recall, contextual validation, and deduplication. Evaluated on real-world EMRs from six Chinese hospitals, the model outperforms traditional and LLM-based baselines, reaching F1 scores of 62.5% on the in-domain set and 59.9% on the out-of-domain set. It identifies potential missed diagnoses in 37.8% of EMRs, leading to altered DRG groupings in 9.0% of cases and a 3.2% increase in insurance reimbursement. DKFusion also supports two human-AI collaboration modes that improve efficiency and precision in clinical workflows.
Quantifying the Impact on Healthcare Operations
DKFusion delivers tangible benefits across key healthcare metrics, from diagnostic accuracy to financial optimization and operational efficiency.
Deep Analysis & Enterprise Applications
DKFusion's Superior Performance
DKFusion outperforms the baselines in both in-domain and out-of-domain settings. It substantially surpasses traditional baselines (including BERT-based and expert-system approaches) as well as LLMs that use standard instruction prompting or general medical multi-agent frameworks. Specifically, DKFusion achieved F1 scores of 62.5% on the in-domain set and 59.9% on the out-of-domain set. Even against LLMs with targeted supervised fine-tuning (SFT), DKFusion remains highly competitive: it outperforms the second-best model, Baichuan-M2-SFT-Section, by 6.4% on in-domain tasks while maintaining comparable out-of-domain generalization (+0.1%). Overall, while LLMs show strong potential in this task after targeted optimization, DKFusion achieves greater efficiency and robustness through a fusion-driven strategy of knowledge and data, with less than 1% of the parameter count of Baichuan-M2-32B.
Quantifying DRG & Reimbursement Benefits
In the DRG payment system, 9.03% of cases with missed diagnoses found by our model will lead to DRG grouping changes, resulting in an increase of 3.15% in medical insurance payments. This emphasizes the significant impact of missed discharge diagnoses within the DRG payment system. We found that the impact of missed diagnoses on costs varies significantly, related to CHS-DRG settings and the original DRG grouping of the EMRs. For medical groups with high original costs, missed diagnoses may have a more significant impact on the care process, leading to a greater impact on costs when CC/MCC is added.
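The two percentages above can be read as a regrouping rate and a relative payment change. The following is a minimal sketch of that arithmetic, assuming the denominator is the set of cases in which the model found a missed diagnosis and that payments are compared before and after regrouping. The `CaseResult` type, the `drg_impact` function, and the payment figures are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseResult:
    """One EMR in which a missed diagnosis was detected (hypothetical schema)."""
    original_drg: str
    revised_drg: str
    original_payment: float  # payment under the original grouping
    revised_payment: float   # payment after adding the missed diagnosis


def drg_impact(cases: list[CaseResult]) -> dict[str, float]:
    """Return the share of regrouped cases and the relative payment change."""
    if not cases:
        raise ValueError("at least one case is required")
    regrouped = sum(1 for c in cases if c.original_drg != c.revised_drg)
    original_total = sum(c.original_payment for c in cases)
    revised_total = sum(c.revised_payment for c in cases)
    return {
        "regroup_rate": regrouped / len(cases),
        "payment_change": (revised_total - original_total) / original_total,
    }


# Illustrative numbers only; not data from the paper.
example_cases = [
    CaseResult("BR25", "BR21", 10_000.0, 12_000.0),
    CaseResult("BR25", "BR25", 10_000.0, 10_000.0),
    CaseResult("BR25", "BR25", 10_000.0, 10_000.0),
    CaseResult("BR25", "BR25", 10_000.0, 10_000.0),
]
impact = drg_impact(example_cases)
print(f"regrouped: {impact['regroup_rate']:.2%}, payment change: {impact['payment_change']:.2%}")
```

Because only regrouped cases change payment, a small regrouping rate can still move total payments noticeably when the affected groups are expensive, which matches the observation about high-cost groups.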
Optimizing Clinical Workflows with AI
We explored two modes of human-machine collaboration. (1) Model-driven mode: the model leads and prioritizes precision, with human supervision to minimize prediction errors. This mode suits rapid evaluations and batch EMR processing; the DKFusion-S variant offers higher precision and minimizes false alerts. (2) Specialist-driven mode: human specialists lead decision-making, with model support reducing workload while preserving accuracy. This mode suits EMR quality control and the completion of diagnosis lists. Simulated tests showed that using the model for recommendations and expert verification reduces manual EMR review time from approximately 24 minutes to 2.3 minutes per EMR, a roughly tenfold increase in efficiency. In the model-driven mode, the system achieves physician-comparable precision at an average of 6.8 seconds per EMR.
Understanding Model Limitations & Future Directions
Our error analysis revealed two main categories: false positives (29.9%) and false negatives (47.6%). False positives occur when diagnoses are incorrectly identified in context or are already recorded. False negatives result from recall failures (e.g., diagnosis not in dictionary) or erroneous clinical associations (linking similar but distinct diagnoses). The study acknowledges limitations including restricted effective coverage of ICD-10 codes, empirical verification only in Chinese EMRs, challenges with long-tail errors, and focusing only on explicitly documented conditions. Future work will expand annotation scope, incorporate formal diagnostic criteria, and jointly optimize with related DRG tasks.
Enterprise Process Flow: DKFusion Framework
DKFusion leverages a three-step pipeline to identify and validate missed diagnoses, ensuring accuracy and efficiency in complex EMR data.
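The three steps (diagnosis recall, contextual validation, deduplication) can be illustrated with a toy pipeline. This is a minimal sketch under strong assumptions, not the DKFusion implementation: recall is a plain dictionary lookup, validation is a negation check standing in for a learned contextual model, and all names (`DIAGNOSIS_DICTIONARY`, `Candidate`, `detect_missed`) and the sample record are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical toy dictionary: surface term -> canonical diagnosis name.
DIAGNOSIS_DICTIONARY = {
    "intracranial pneumatocoele": "Intracranial pneumatocoele",
    "pneumocephalus": "Intracranial pneumatocoele",
    "hypertension": "Hypertension",
}

# Hypothetical negation cues; a real validator would be a trained model.
NEGATION_PATTERN = re.compile(r"\b(no|denies|without|ruled out)\b", re.IGNORECASE)


@dataclass(frozen=True)
class Candidate:
    diagnosis: str  # canonical diagnosis name
    section: str    # EMR section where the mention was found
    sentence: str   # evidence sentence shown to the reviewer


def recall_candidates(emr: dict[str, str]) -> list[Candidate]:
    """Step 1: look up dictionary terms in every sentence of every section."""
    found = []
    for section, text in emr.items():
        for sentence in text.split("."):  # naive sentence splitting
            lowered = sentence.lower()
            for term, canonical in DIAGNOSIS_DICTIONARY.items():
                if term in lowered:
                    found.append(Candidate(canonical, section, sentence.strip()))
    return found


def validate_in_context(candidate: Candidate) -> bool:
    """Step 2: reject mentions whose evidence sentence contains a negation cue."""
    return NEGATION_PATTERN.search(candidate.sentence) is None


def deduplicate(candidates: list[Candidate], recorded: list[str]) -> list[Candidate]:
    """Step 3: drop repeats and diagnoses already on the discharge list."""
    recorded_lower = {name.lower() for name in recorded}
    kept: dict[str, Candidate] = {}
    for candidate in candidates:
        key = candidate.diagnosis.lower()
        if key not in recorded_lower and key not in kept:
            kept[key] = candidate
    return list(kept.values())


def detect_missed(emr: dict[str, str], recorded: list[str]) -> list[Candidate]:
    """Run recall, validation, and deduplication in order."""
    validated = [c for c in recall_candidates(emr) if validate_in_context(c)]
    return deduplicate(validated, recorded)


# Invented sample record for illustration.
sample_emr = {
    "Special examinations": "CT shows pneumocephalus after resection.",
    "Diagnosis and treatment process": (
        "Postoperative intracranial pneumatocoele was treated conservatively. "
        "Patient denies hypertension."
    ),
}
for hit in detect_missed(sample_emr, ["Malignant intracranial tumor"]):
    print(hit.diagnosis, "|", hit.section, "|", hit.sentence)
```

Keeping the evidence sentence and section on each candidate mirrors the idea of showing reviewers where in the EMR a suggested diagnosis is mentioned.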
| Method | Precision | Recall | F1 | Inference time per record |
|---|---|---|---|---|
| Doctor | 80.8% | 77.6% | 79.1% | 1440.0 seconds |
| DKFusion followed by doctor (Specialist-Driven) | 97.9% | 53.9% | 69.5% | 138.0 seconds |
| DKFusion-S (Model-Driven) | 81.2% | 17.1% | 28.2% | 6.8 seconds |
This table highlights the significant efficiency gains and improved precision when integrating DKFusion into human review workflows.
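The F1 and speed columns can be cross-checked with a few lines of arithmetic. The sketch below recomputes F1 as the harmonic mean of precision and recall and derives the speedup relative to the doctor-only row. The function and variable names are illustrative, and because the inputs are the rounded table values, a recomputed F1 may differ from the reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, seconds per record), copied from the table above.
ROWS = {
    "Doctor": (0.808, 0.776, 1440.0),
    "DKFusion followed by doctor": (0.979, 0.539, 138.0),
    "DKFusion-S": (0.812, 0.171, 6.8),
}


def summarize(rows: dict[str, tuple[float, float, float]], baseline: str = "Doctor"):
    """Return recomputed F1 (in percent) and speedup versus the baseline row."""
    baseline_seconds = rows[baseline][2]
    return {
        name: {
            "f1_percent": round(f1_score(p, r) * 100, 1),
            "speedup": round(baseline_seconds / seconds, 1),
        }
        for name, (p, r, seconds) in rows.items()
    }


summary = summarize(ROWS)
for name, stats in summary.items():
    print(f"{name}: F1 {stats['f1_percent']}%, {stats['speedup']}x faster than manual review")
```

The specialist-driven row trades recall for precision and time, while the model-driven row is far faster but recovers a much smaller share of the missed diagnoses.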
Real-World Impact: Detecting a Missed Diagnosis
This case study shows the impact of detecting a previously missed diagnosis with DKFusion. A patient with malignant intracranial tumors developed postoperative intracranial pneumatocoele after tumor resection. In the original EMR, this condition, which qualifies as a CC diagnosis, was omitted from the discharge diagnosis list. DKFusion identified the omission, leading to a revision from the current DRG, BR25: Cerebral ischemic disorder without complications or comorbidities, to the new DRG, BR21: Cerebral ischemic disorder with severe complications and comorbidities. The correction prevented a financial loss of 18,849 RMB (approximately 2,600 USD) that would have resulted from inaccurate grouping and payment. The system provides clear evidence by highlighting where the condition is mentioned in the 'Special examinations' and 'Diagnosis and treatment process' sections of the EMR, facilitating rapid physician confirmation.
Your AI Implementation Roadmap
A structured approach to integrating advanced AI, ensuring seamless adoption and maximum value.
Phase 1: Data & Knowledge Fusion
Establish robust data pipelines and integrate domain-specific knowledge to create a comprehensive foundation for AI models. This phase includes constructing diagnostic dictionaries and leveraging existing ICD knowledge with EMR data.
Phase 2: Model Training & Evaluation
Develop and rigorously test AI models using a combination of supervised and contrastive learning, ensuring high performance across various clinical scenarios. This involves fine-tuning modules like diagnosis recall, contextual validation, and deduplication.
Phase 3: Human-AI Collaboration Integration
Implement and optimize human-machine collaboration workflows, allowing clinicians to efficiently review and validate AI-generated insights. This includes designing model-driven and specialist-driven modes for optimal efficiency.
Phase 4: Real-World Deployment & Impact Analysis
Deploy the AI solution within existing hospital information systems and continuously monitor its impact on diagnostic accuracy, DRG assignments, and insurance reimbursement. Regular evaluation ensures ongoing benefits and refinement.