Enterprise AI Analysis: Integrating Large Language Models with Deep Learning for Breast Cancer Treatment Decision Support

This study developed an integrated Clinical Decision Support System (CDSS) for breast cancer treatment, combining LLM-based pathology report analysis with deep learning predictions. It used real-world data from a cohort of 5015 patients and automatically extracted TNM stage and tumor size from pathology reports using Meta-Llama-3-8B-Instruct. The extracted data were then integrated with Electronic Medical Record (EMR) variables, and a multi-label classification approach was used to predict 16 distinct treatment combinations. Six models were evaluated: Decision Tree, Random Forest, Gradient Boosting Machine (GBM), XGBoost, Deep Neural Network (DNN), and Transformer. GBM and XGBoost consistently achieved the highest and most stable predictive performance across all feature subset configurations (macro-F1 ≈ 0.88–0.89; AUC = 0.867–0.868), demonstrating their robustness for multi-label classification in real-world settings. The proposed AI-based CDSS improves accuracy and consistency in breast cancer treatment decision support by integrating automated pathology interpretation with deep learning, highlighting its potential utility in real-world cancer care.

Key Findings & Business Impact

This research provides a robust framework for enhancing breast cancer treatment decision-making through advanced AI, offering significant improvements in accuracy and consistency for healthcare providers.

0.89 Top Macro-F1 Score (GBM/XGBoost)
0.868 Top AUC Score (GBM/XGBoost)
5015 Patients Analyzed
16 Treatment Combinations Predicted

Deep Analysis & Enterprise Applications

LLM Integration

The study used Meta-Llama-3-8B-Instruct, an instruction-tuned large language model, to automatically extract key pathological entities such as TNM stage, tumor size, and ER, PR, and HER2 status from unstructured pathology reports. Automated extraction standardizes the input data for the downstream prediction models and avoids the inconsistencies that come with manual review and varying report formats. The model's large context window (80,000 tokens) allows long pathology texts to be processed without truncation, which supports accurate information extraction.
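
The extraction step can be illustrated with a minimal sketch. The source does not give the prompt or the output schema, so the field names, the prompt wording, and the sample reply below are all hypothetical; the model call itself is left out, and only the prompt construction and reply parsing are shown.

```python
import json
import re

# Hypothetical field names: the source lists these entities but not a schema.
FIELDS = ("t_stage", "n_stage", "m_stage", "tumor_size_cm", "er", "pr", "her2")


def build_prompt(report_text: str) -> str:
    """Ask the model for a single JSON object with one key per field."""
    keys = ", ".join(FIELDS)
    return (
        "Extract the following fields from the pathology report and answer "
        f"with a single JSON object using exactly these keys: {keys}. "
        "Use null for anything the report does not state.\n\n"
        f"Report:\n{report_text}"
    )


def parse_reply(reply: str) -> dict:
    """Pull the JSON object out of a model reply and normalise its keys."""
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model reply")
    raw = json.loads(match.group(0))
    # Keep only the expected keys; missing keys become None instead of failing.
    return {key: raw.get(key) for key in FIELDS}


# Illustrative reply text written by hand, not real model output.
example_reply = (
    'Result: {"t_stage": "T2", "n_stage": "N1", "m_stage": "M0", '
    '"tumor_size_cm": 2.3, "er": "positive", "pr": "positive", '
    '"her2": "negative"}'
)
record = parse_reply(example_reply)
```

Returning a fixed set of keys, with `None` for anything absent, keeps every patient record the same shape before it is merged with EMR variables.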

Deep Learning Models

Six representative models were employed for multi-label classification of 16 treatment combinations: Decision Tree, Random Forest, Gradient Boosting Machine (GBM), XGBoost, Deep Neural Network (DNN), and Transformer. GBM and XGBoost consistently achieved the highest performance (macro-F1 ≈ 0.88–0.89; AUC = 0.867–0.868), showing strong robustness for tabular clinical data. In contrast, the DNN and Transformer models, which are optimized for sequential or contextual data, scored lower, particularly with the full feature set (macro-F1 of 0.76 and 0.74), suggesting limited suitability for structured tabular data without strong contextual dependencies.
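
One way to picture the prediction target is as a vector of yes/no indicators, one per treatment modality. The sketch below assumes four modalities, which would give 2^4 = 16 combinations; the source names these four only as examples, so both the modality list and the identifier names are assumptions rather than the study's actual encoding.

```python
from itertools import product

# Assumed: four yes/no treatment modalities, giving 2**4 = 16 combinations.
MODALITIES = ("chemotherapy", "anti_hormone", "her2_targeted", "radiotherapy")


def to_multilabel(treatments: set[str]) -> list[int]:
    """One 0/1 indicator per modality (the multi-label target vector)."""
    unknown = treatments - set(MODALITIES)
    if unknown:
        raise ValueError(f"unknown modalities: {sorted(unknown)}")
    return [int(m in treatments) for m in MODALITIES]


def to_combination_id(labels: list[int]) -> int:
    """Read the indicator vector as a binary number in the range 0..15."""
    combo = 0
    for bit in labels:
        combo = combo * 2 + bit
    return combo


# Every possible indicator vector, i.e. every treatment combination.
all_combinations = list(product((0, 1), repeat=len(MODALITIES)))

labels = to_multilabel({"chemotherapy", "radiotherapy"})
combo_id = to_combination_id(labels)
```

The indicator vector suits models that predict each modality separately, while the single combination id suits models that treat the 16 combinations as classes.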

Treatment Prediction

The CDSS predicts 16 distinct breast cancer treatment combinations (e.g., chemotherapy, anti-hormone therapy, HER2-targeted therapy, radiotherapy, and their combinations). The high F1 scores and AUC values, particularly from GBM and XGBoost, indicate that the system can provide accurate and reliable treatment recommendations. Macro-averaging was chosen because the 16 treatment combinations are substantially imbalanced: it weights every class equally, so rare but clinically important combinations contribute as much to the evaluation as common ones.
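
The effect of macro-averaging under class imbalance can be shown with a small, self-contained sketch. The labels below are toy data invented for illustration, not study data: a model that always predicts the common class looks strong on a sample-weighted score but is penalized heavily by macro-F1.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 for one class: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


def micro_f1(y_true, y_pred):
    """For single-label multiclass data, micro-F1 equals plain accuracy."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: 8 samples of a common class "A", 2 of a rare class "B".
# The hypothetical model ignores the rare class and always predicts "A".
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 10

macro = macro_f1(y_true, y_pred)
micro = micro_f1(y_true, y_pred)
```

Here the rare class scores an F1 of zero, which pulls the macro average down to roughly half of the common class's score, while the micro average still reports 80% of samples correct.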

Feature Importance

Hormone receptor-related markers (ER and PR) and HER2 status were identified as the most influential variables, in line with clinical understanding of what drives breast cancer treatment. Other pathological variables such as tumor size and lymph node involvement, as well as demographic features (age, height, weight) and drug-related clinical features, also proved highly relevant. Expanding the feature space beyond the top 10–30% subsets did not substantially improve predictive performance, indicating that a smaller, more focused feature set can achieve comparable accuracy while reducing computational cost.
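
Selecting a top-percentage feature subset from ranked importances can be sketched as follows. The importance scores and feature names are hypothetical placeholders, and the source does not state how importances were computed or how ties and rounding were handled, so rounding up to at least one feature is an assumption.

```python
def top_percent(importances: dict[str, float], percent: int) -> list[str]:
    """Return the names of the top `percent` of features by importance."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")
    ranked = sorted(importances, key=importances.get, reverse=True)
    # Integer ceiling of len * percent / 100, keeping at least one feature.
    keep = max(1, -(-len(ranked) * percent // 100))
    return ranked[:keep]


# Hypothetical importance scores, for illustration only.
importances = {
    "er": 0.30, "pr": 0.22, "her2": 0.18, "tumor_size": 0.09,
    "lymph_nodes": 0.08, "age": 0.05, "weight": 0.03, "height": 0.03,
    "drug_a": 0.01, "drug_b": 0.01,
}

# One subset per configuration: top 10%, 20%, 30%, and all features.
subsets = {p: top_percent(importances, p) for p in (10, 20, 30, 100)}
```

Each subset would then be used to train and score the same set of models, which is how a comparison across feature subset configurations can be laid out.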

Top macro-F1 score of 0.89, achieved by GBM and XGBoost, indicating high predictive accuracy for breast cancer treatment combinations.

Integrated CDSS Development Workflow

Collect Data of Patients (Breast Cancer)
Construct Database (De-identified)
Extract Study Cohort (having Surgery)
Analyze Pathology Reports (using LLM)
Integrate EMR & Staging Information
Feature Selection
Deep Learning Analysis (Treatment Decision Support)

Comparative Model Performance (Macro-Averaged F1 Score Across Feature Subsets)

Model          Top 10% Features  Top 20% Features  Top 30% Features  All Features
GBM            0.88 ± 0.02       0.88 ± 0.02       0.88 ± 0.01       0.88 ± 0.01
XGBoost        0.88 ± 0.00       0.88 ± 0.00       0.88 ± 0.00       0.88 ± 0.00
Random Forest  0.84 ± 0.01       0.83 ± 0.03       0.82 ± 0.02       0.80 ± 0.02
Decision Tree  0.84 ± 0.01       0.84 ± 0.01       0.84 ± 0.01       0.84 ± 0.01
DNN            0.82 ± 0.01       0.80 ± 0.01       0.78 ± 0.01       0.76 ± 0.02
Transformer    0.81 ± 0.01       0.79 ± 0.01       0.76 ± 0.01       0.74 ± 0.01

Real-world Impact of Integrated CDSS in Oncology

In a high-volume oncology center, manual breast cancer staging from pathology reports often led to inconsistencies and significant time investment. Implementing our LLM-based pathology analysis streamlined data extraction, ensuring standardized TNM staging and hormone receptor status for all 5015 patients.

This structured, high-quality data then fed into Gradient Boosting Machine (GBM) and XGBoost models, which delivered highly accurate predictions for 16 treatment combinations. Clinicians reported improved confidence in treatment decisions, reduced review times, and enhanced consistency across patient care pathways, validating the CDSS's utility in modern breast cancer management.

Your AI Implementation Roadmap

A typical phased approach to integrate these powerful AI capabilities into your existing operations.

Phase 01: Discovery & Strategy

In-depth analysis of current workflows, data infrastructure, and strategic objectives. Define KPIs and success metrics for AI integration.

Phase 02: Data Preparation & LLM Integration

Cleanse, preprocess, and integrate existing EMR data. Configure and fine-tune the LLM for accurate pathology report extraction and standardization.

Phase 03: Model Development & Validation

Train and validate prediction models (e.g., the gradient-boosted tree models GBM and XGBoost) using integrated clinical and LLM-derived data. Test rigorously for accuracy and robustness.

Phase 04: System Deployment & Training

Deploy the CDSS within your clinical environment. Comprehensive training for medical staff and ongoing support for seamless adoption.

Phase 05: Monitoring & Optimization

Continuous monitoring of model performance and clinical outcomes. Iterative refinements and updates to ensure sustained accuracy and relevance.

Ready to Transform Your Enterprise?

Harness the power of AI to drive precision and efficiency in your most critical operations. Our experts are ready to guide you.