Enterprise AI Analysis
An exploratory study of multi-channel CNN for early detection of lung cancer from longitudinal healthcare records
Lung cancer remains the leading cause of cancer-related mortality worldwide, with poor survival rates due to late-stage detection. Current low-dose computed tomography screening faces barriers including high costs and false-positive rates reaching 24%, while artificial intelligence offers opportunities to enhance early detection through longitudinal clinical data analysis. This study developed a Multi-Channel Convolutional Neural Network (MCNN) for lung cancer risk prediction using Taiwan's National Health Insurance Research Database, encompassing 523,539 patients (2,809 lung cancer, 23,783 other cancer and 496,947 non-cancer). The MCNN was designed as a lightweight model processing nine channels of diagnostic codes, medications, and medical orders over a three-year observation period. Systematic feature selection reduced estimated feature storage requirements by 99.8%, from approximately 1,184 GB for the full ICD feature space to approximately 2.11 GB for the selected features, while retaining clinical relevance. Model performance was assessed using stratified 10-fold cross-validation against seven machine learning baselines, and interpretability was examined through SHAP analysis. The MCNN achieved an F1-score of 66.91%, precision of 84.47%, and recall of 59.79%. Ablation studies confirmed multi-modal integration benefits, with diagnostic codes providing primary predictive power. SHAP analysis revealed distinct temporal patterns validating the model's ability to identify pre-diagnostic phases through healthcare engagement patterns. Findings are based on internal validation within a single national database, and key risk factors such as smoking history are not captured in administrative claims data; future evaluation in independent external cohorts is therefore warranted to confirm these findings. 
The model's high precision limits false positives, while its computational efficiency and clinical interpretability support practical implementation as a complementary claims-based screening support tool for early cancer detection.
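The cohort sizes and storage figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch that only re-derives numbers already stated in the abstract; the variable names are illustrative and not taken from the study.

```python
# Figures quoted in the abstract above.
lung_cancer = 2_809
other_cancer = 23_783
non_cancer = 496_947
total_patients = 523_539

full_icd_gb = 1_184.0   # approximate storage for the full ICD feature space
selected_gb = 2.11      # approximate storage after feature selection

# The three cohorts should add up to the reported total.
cohort_sum = lung_cancer + other_cancer + non_cancer

# Share of patients in the positive (lung cancer) class.
prevalence = lung_cancer / total_patients

# Relative storage reduction achieved by feature selection.
reduction = 1.0 - selected_gb / full_icd_gb

print(f"cohort sum matches total: {cohort_sum == total_patients}")
print(f"lung cancer prevalence: {prevalence:.2%}")
print(f"storage reduction: {reduction:.1%}")
```

The prevalence works out to roughly 0.54%, which shows how imbalanced the prediction task is and why F1-score, precision, and recall are more informative here than plain accuracy.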
Executive Impact: At a Glance
This study developed a Multi-Channel Convolutional Neural Network (MCNN) for early lung cancer detection using Taiwan's National Health Insurance Research Database. The MCNN achieved an F1-score of 66.91%, precision of 84.47%, and recall of 59.79%, the highest F1-score and precision among the compared models, which included traditional machine learning methods and a strong deep learning baseline (RETAIN). The model integrates diagnostic codes, medication records, and medical orders over a three-year observation period, and systematic feature selection cut estimated feature storage requirements by 99.8%. SHAP analysis supported its ability to identify pre-diagnostic patterns, and ablation studies showed that diagnostic codes provide the primary predictive power. While limited by reliance on administrative data and internal validation, the MCNN offers a computationally efficient and clinically interpretable tool for risk stratification, complementing existing screening programs for early cancer detection.
Deep Analysis & Enterprise Applications
| Model | F1-score | Precision | Recall |
|---|---|---|---|
| MCNN (Our Method) | 66.91% | 84.47% | 59.79% |
| RETAIN (Baseline) | 66.06% | 81.38% | 59.30% |
| LightGBM | 64.05% | 82.01% | 56.21% |
| Random Forest | 57.35% | 81.82% | 51.10% |
| Decision Tree | 57.77% | 60.17% | 55.89% |
| Naive Bayes | 31.24% | 43.07% | 74.38% |
| DeepFFM | 32.42% | 31.56% | 33.33% |

Notes: MCNN achieved the highest F1-score and precision in this comparison and remained competitive with the strong deep learning baseline (RETAIN), indicating a balanced ability to identify true positives with high precision.
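
When reading the table, note that the F1 column cannot be recomputed from the precision and recall columns with the usual harmonic-mean formula. One possible explanation, which is an assumption here and not something the source states, is that each metric was averaged separately across cross-validation folds. The sketch below shows the effect with invented per-fold values; only the MCNN row is taken from the table.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# MCNN row of the table, expressed as fractions.
reported_precision, reported_recall, reported_f1 = 0.8447, 0.5979, 0.6691
f1_from_reported = f1(reported_precision, reported_recall)

# Hypothetical per-fold (precision, recall) pairs, invented for illustration only.
folds = [(0.95, 0.40), (0.80, 0.70), (0.85, 0.60), (0.75, 0.75)]
mean_p = sum(p for p, _ in folds) / len(folds)
mean_r = sum(r for _, r in folds) / len(folds)

# Averaging F1 per fold is not the same as taking F1 of the averaged metrics.
mean_of_f1 = sum(f1(p, r) for p, r in folds) / len(folds)
f1_of_means = f1(mean_p, mean_r)

print(f"F1 from reported precision/recall: {f1_from_reported:.4f}")
print(f"reported F1:                       {reported_f1:.4f}")
print(f"mean of per-fold F1:               {mean_of_f1:.4f}")
print(f"F1 of mean precision/recall:       {f1_of_means:.4f}")
```

For the MCNN row, the harmonic mean of the reported precision and recall is about 70.02%, compared with the reported 66.91%. The toy folds show the same direction of gap, so rows are best compared column by column.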
SHAP Analysis for Lung Cancer Early Detection
Scenario: The SHAP analysis revealed distinct temporal patterns in healthcare engagement prior to lung cancer diagnosis. For lung cancer patients, diagnostic channels showed predominantly negative contributions (absence of definitive diagnoses early on), while examination channels showed strong positive signals for cancer-specific procedures preceding formal diagnosis. This indicates the model identifies pre-diagnostic phases through indirect markers of clinical workup.
Outcome: The model effectively distinguishes normal from pathological engagement patterns, highlighting its capacity to identify patients during critical pre-diagnostic evaluation phases where early intervention is most impactful, even before definitive diagnostic codes appear. This supports its clinical plausibility as a complementary screening tool.
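To make the idea of signed channel contributions concrete, here is a minimal sketch of exact Shapley values, the quantity that SHAP approximates. Everything in it is hypothetical: the three channel groups, the `toy_risk` scoring function, and its coefficients are invented for illustration and are not the study's model or results.

```python
from itertools import combinations
from math import factorial

CHANNELS = ("diagnoses", "medications", "orders")

def toy_risk(present: frozenset) -> float:
    """Hypothetical risk score given which channel groups are available."""
    score = 0.10                       # baseline risk with no channel information
    if "orders" in present:
        score += 0.30                  # workup procedures raise the score
    if "diagnoses" in present:
        score -= 0.05                  # no definitive diagnosis yet lowers it
    if "orders" in present and "medications" in present:
        score += 0.10                  # interaction between two channels
    return score

def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition of the other players."""
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                s = frozenset(subset)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                # Marginal contribution of p when it joins coalition s.
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result

phi = shapley_values(CHANNELS, toy_risk)
for name, contribution in phi.items():
    print(f"{name:12s} {contribution:+.3f}")
```

In this toy setup the diagnoses group receives a negative contribution and the orders group a positive one, mirroring the direction of the pattern described above. The contributions sum to the difference between the full score and the baseline, which is the property that makes such attributions additive.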
Your AI Implementation Roadmap
A strategic outline for integrating advanced AI into your operations, ensuring a smooth transition and measurable impact.
Phase 1: Data Integration & Model Setup
Securely integrate NHIRD data, then preprocess and standardize diagnostic, medication, and medical order records. Establish the MCNN architecture and the feature selection pipeline.
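As a rough illustration of the preprocessing step, the sketch below counts claims records into a channels by time-bins by features grid, the kind of input a multi-channel CNN consumes. The nine channels and the three-year window come from the study description; the monthly binning, the fixed 30-day bin width, the record layout, and the function name `build_tensor` are assumptions made for this example.

```python
from typing import List, Tuple

N_CHANNELS = 9          # from the study description
N_BINS = 36             # assumption: monthly bins over the three-year window
DAYS_PER_BIN = 30       # assumption: simplified fixed-width months

# One claims record: (channel index, days before the index date, feature index).
Record = Tuple[int, int, int]

def build_tensor(records: List[Record], n_features: int) -> List[List[List[int]]]:
    """Count records into a channels x time-bins x features grid."""
    tensor = [[[0] * n_features for _ in range(N_BINS)] for _ in range(N_CHANNELS)]
    for channel, days_before, feature in records:
        bin_from_end = days_before // DAYS_PER_BIN
        if not (0 <= channel < N_CHANNELS and 0 <= feature < n_features):
            continue    # skip records outside the selected feature space
        if not (0 <= bin_from_end < N_BINS):
            continue    # skip records outside the observation window
        # Bin 0 is the oldest month, bin N_BINS - 1 the most recent.
        tensor[channel][N_BINS - 1 - bin_from_end][feature] += 1
    return tensor

# Hypothetical records for one patient.
records = [(0, 10, 2), (0, 12, 2), (4, 400, 0), (8, 2000, 1)]
tensor = build_tensor(records, n_features=4)
print(tensor[0][35][2], tensor[4][22][0])
```

The last record falls more than three years before the index date, so it is dropped. A dense grid like this is only practical once the feature space has been reduced, which is where the feature selection pipeline comes in.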
Phase 2: Initial Training & Internal Validation
Train the MCNN using stratified 10-fold cross-validation on the Taiwan NHIRD. Optimize hyperparameters and assess performance metrics (F1-score, precision, recall, AUROC, AUPRC).
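Stratification matters here because lung cancer cases are a small fraction of the cohort, so unstratified folds could end up with very few positives. The following dependency-free sketch shows one simple way to assign stratified folds on synthetic labels; it is not the study's pipeline, and in practice a library routine such as scikit-learn's `StratifiedKFold` would normally be used.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_folds=10, seed=0):
    """Assign each sample index to a fold, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)   # deal out round-robin
    return folds

# Synthetic labels with 1% positives (not the study's data).
labels = [1] * 50 + [0] * 4950
folds = stratified_folds(labels)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
print(positives_per_fold)

# Each fold serves once as the validation set; the rest form the training set.
for k, validation in enumerate(folds):
    training = [i for j, fold in enumerate(folds) if j != k for i in fold]
    assert len(training) + len(validation) == len(labels)
```

With 50 positives and 10 folds, every fold receives exactly five positive cases, so each validation split sees the same class ratio as the full dataset.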
Phase 3: Interpretability & Clinical Review
Conduct SHAP analysis to understand temporal patterns and feature contributions. Review findings with clinical experts to validate plausibility and identify key pre-diagnostic signals.
Phase 4: External Validation & Generalizability
Evaluate the model's performance on independent external cohorts and more recent datasets to confirm findings and assess broader applicability, potentially integrating additional risk factors that claims data do not capture, such as smoking history.
Phase 5: Pilot Deployment & Real-World Impact
Pilot the MCNN as a complementary claims-based screening support tool in primary care or outpatient settings. Calibrate operating thresholds and assess real-world impact on early detection rates and resource utilization.
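Calibrating an operating threshold can be sketched as a sweep over candidate cut-offs on a validation set, picking the lowest one that meets a target precision. The scores, labels, target value, and function names below are all hypothetical and chosen only to illustrate the procedure.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when flagging every score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def pick_threshold(scores, labels, target_precision):
    """Lowest candidate threshold whose precision meets the target, or None."""
    for threshold in sorted(set(scores)):
        precision, recall = precision_recall_at(scores, labels, threshold)
        if precision >= target_precision:
            return threshold, precision, recall
    return None

# Hypothetical validation scores and labels, invented for illustration.
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.75, 0.80, 0.95]
labels = [0,    0,    0,    1,    0,    0,    1,    1,    0,    1]
choice = pick_threshold(scores, labels, target_precision=0.75)
print(choice)
```

Because recall can only fall as the threshold rises, the lowest threshold that satisfies the precision target also gives the highest recall among the qualifying thresholds. In a pilot, the target itself would be set with clinicians based on how many follow-up referrals the setting can absorb.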