COMMUNICATIONS MEDICINE ARTICLE IN PRESS
Real-world validation of a multimodal LLM-powered pipeline for high-accuracy clinical trial patient matching
Recruiting patients for clinical trials is time-consuming and resource-intensive because eligibility rules are complex and medical records are lengthy. We built an artificial intelligence (AI) system that helps match patients to trials by reading both text and images from medical records, including scans, tables, and handwriting. This digital tool finds the most relevant pages, checks each rule step by step, and clearly flags when medical information is missing. We evaluated it on a widely used public dataset and in real clinics across many sites and trials. The system produced reliable, high-quality eligibility assessments, and coordinators were able to review each patient in under nine minutes on average, much faster than manual chart review. Because it works without custom connections to hospital software, it can be deployed broadly to reduce delays and help more patients access studies and new treatments.
Deep Analysis & Enterprise Applications
Problem & Context
Clinical trial patient recruitment faces significant bottlenecks due to complex eligibility criteria and labor-intensive manual chart reviews. Traditional text-only AI models have struggled with reliability, scalability, and handling diverse medical record formats.
Technical Details: Two trends make the problem worse. The median number of eligibility criteria has risen by 58% (from 31 to 49), and the fragmented electronic health record (EHR) vendor market complicates generic integrations. Manual pre-screening takes about 50 minutes per patient and 88% of screened patients ultimately fail, so finding one eligible patient takes roughly 7 hours of review.
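The figures quoted above are mutually consistent, which a few lines of arithmetic can confirm. This is a minimal sketch that assumes every pre-screen takes the same 50 minutes and that the pass rate is simply 100% minus the 88% failure rate; the variable names are illustrative.

```python
# Worked arithmetic for the screening figures quoted above.
# Assumption: each pre-screen costs the same ~50 minutes, so the expected
# number of screens per eligible patient is 1 / pass_rate.

criteria_before = 31
criteria_after = 49
criteria_increase = (criteria_after - criteria_before) / criteria_before

minutes_per_screen = 50
fail_rate = 0.88
pass_rate = 1 - fail_rate

screens_per_eligible = 1 / pass_rate
hours_per_eligible = screens_per_eligible * minutes_per_screen / 60

print(f"Increase in median criteria count: {criteria_increase:.0%}")
print(f"Screens needed per eligible patient: {screens_per_eligible:.1f}")
print(f"Hours of review per eligible patient: {hours_per_eligible:.1f}")
```

Under these assumptions the increase works out to about 58%, and roughly 8.3 screens (about 6.9 hours) are needed per eligible patient, in line with the ~7 hours stated.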
Proposed Solution
This paper introduces an integration-free, multimodal LLM-powered pipeline that automates patient-trial matching directly from unprocessed EHR documents. It combines a reasoning model, visual interpretation of page images, and embedding-based retrieval of the most relevant pages.
Technical Details: The pipeline has three stages: Trial Preprocessing (splitting the criteria, generating relevance criteria and retrieval guidelines), Patient Preprocessing (splitting PDFs into pages, de-identification, embedding page images, storing the embeddings in a vector store), and a two-step Patient × Trial Matching (a relevance check followed by a detailed assessment). It uses OpenAI's o1 for reasoning and VoyageAI's multimodal embeddings for retrieval.
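The control flow of the matching stage can be sketched as follows. This is a simplified illustration, not the authors' implementation: the function names are hypothetical, the model calls are replaced by toy callables, word overlap stands in for multimodal embedding retrieval, and patient preprocessing steps such as PDF splitting and de-identification are omitted. The assumption that a failed relevance check skips the detailed assessment is also ours.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Assessment:
    criterion: str
    verdict: str       # free-text label returned by the assessor
    pages: List[int]   # indices of the pages passed to the assessor


def split_criteria(protocol_text: str) -> List[str]:
    """Trial preprocessing (simplified): one criterion per non-empty line."""
    return [line.strip() for line in protocol_text.splitlines() if line.strip()]


def retrieve_pages(criterion: str, pages: List[str], top_k: int = 2) -> List[int]:
    """Toy retrieval: rank pages by the number of words shared with the criterion.

    A real system would compare embeddings held in a vector store instead.
    """
    query = set(criterion.lower().split())
    scored = [(len(query & set(page.lower().split())), i)
              for i, page in enumerate(pages)]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in ranked[:top_k] if score > 0]


def match_patient(
    protocol_text: str,
    pages: List[str],
    is_relevant: Callable[[List[str]], bool],
    assess: Callable[[str, List[str]], str],
) -> List[Assessment]:
    """Two-step matching: relevance check first, then per-criterion assessment."""
    # Step 1: stop early if the patient is not relevant to this trial.
    if not is_relevant(pages):
        return []
    # Step 2: assess each criterion against only its retrieved pages.
    results = []
    for criterion in split_criteria(protocol_text):
        hits = retrieve_pages(criterion, pages)
        evidence = [pages[i] for i in hits]
        results.append(Assessment(criterion, assess(criterion, evidence), hits))
    return results


# Toy stand-ins for the two model calls (hypothetical, for illustration only).
def toy_is_relevant(pages: List[str]) -> bool:
    return any("crohn" in page.lower() for page in pages)


def toy_assess(criterion: str, evidence: List[str]) -> str:
    return "needs model review" if evidence else "insufficient information"


# Invented two-page record and two-line protocol, not data from the study.
pages = [
    "Colonoscopy report: findings consistent with Crohn's disease",
    "Medication list: adalimumab 40 mg every two weeks",
]
protocol = "Diagnosis of Crohn's disease\nNo prior adalimumab exposure"

results = match_patient(protocol, pages, toy_is_relevant, toy_assess)
for item in results:
    print(item.criterion, "->", item.verdict, "| pages:", item.pages)
```

Passing the model calls in as plain callables keeps the sketch runnable offline and makes the two steps easy to swap or test independently.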
Key Findings
The pipeline achieved state-of-the-art accuracy on a public benchmark (93%) and robust performance (87% accuracy) on a real-world dataset. It cut average review time per patient by about 80%, from roughly 50 minutes of manual review to under 9 minutes.
Technical Details: On the n2c2 2018 cohort selection dataset, the method achieved 93% criterion-level accuracy. On a real-world dataset of 485 patients from 30 sites and 36 trials, it reached 87% accuracy. Coordinator review time had a median of 5.5 minutes and a mean of about 9 minutes per patient, compared with about 50 minutes for manual review. With optimized retrieval, each criterion assessment cost $0.09.
Limitations & Future Work
Despite these gains, challenges remain in calibrating uncertainty, especially when medical records are incomplete, and in aggregating criterion-level assessments into a single patient-level recommendation. Further research on dynamic retrieval is also needed.
Technical Details: User feedback indicated reluctance to exclude patients without further checks, highlighting the need for better calibration when records are incomplete. Aggregating criterion-level assessments into a robust patient-level recommendation is still complex, as simple heuristics (e.g., absolute count) proved insufficient. Dynamic retrieval, where the model can successively retrieve pages, is an area for improvement.
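To make the aggregation problem concrete, the sketch below contrasts an absolute-count score with a rule that looks at the kind of verdict. It shows one plausible way a count can mislead and is our illustration, not an analysis from the paper; the verdict labels, function names, and example patients are all hypothetical.

```python
from collections import Counter
from typing import List


def count_score(verdicts: List[str]) -> int:
    """Absolute-count heuristic: the number of criteria assessed as met."""
    return Counter(verdicts)["met"]


def summarize(verdicts: List[str]) -> str:
    """Alternative rule that separates definite failures from missing data."""
    counts = Counter(verdicts)
    if counts["not met"] > 0:
        return "likely ineligible"
    if counts["insufficient information"] > 0:
        return "needs more records"
    return "likely eligible"


# Two invented patients assessed against ten criteria each.
patient_a = ["met"] * 9 + ["not met"]
patient_b = ["met"] * 6 + ["insufficient information"] * 4

for name, verdicts in [("A", patient_a), ("B", patient_b)]:
    print(name, count_score(verdicts), summarize(verdicts))
```

The count ranks patient A above patient B even though A has a definite failure and B only has gaps in the record. Given the reported reluctance to exclude patients without further checks, labels such as "likely ineligible" are better read as prompts for coordinator review than as automatic exclusions.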
Enterprise Process Flow
| Dataset | Accuracy | Review Time | Key Advantage |
|---|---|---|---|
| n2c2 Public Dataset | 93% | N/A | State-of-the-art criterion-level accuracy on a benchmark with converted low-res images. |
| Real-World Dataset | 87% | About 80% shorter (mean ~9 min/patient vs. ~50 min manual) | Robust performance across 30 sites, 36 trials, and 485 patients, demonstrating real-world scalability. |
Case Study: Crohn's Disease Trial Matching
The system successfully matched patients for a Crohn's Disease trial, demonstrating its ability to handle complex eligibility criteria and multimodal medical records. The AI-generated rationale and source quotes streamlined the review process for clinical research coordinators (CRCs).
Impact: Improved patient identification for complex indications, reducing manual effort and accelerating recruitment.
Calculate Your AI ROI
Estimate the potential time savings and cost reductions your organization could achieve by implementing AI for patient matching.
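A rough version of this estimate can be sketched in a few lines. The review times (about 50 minutes manual, about 9 minutes assisted) and the $0.09 cost per criterion assessment come from the findings reported here; the monthly patient volume, the coordinator hourly cost, and the use of 49 criteria per patient-trial pair are placeholder assumptions to replace with your own figures.

```python
def estimate_savings(
    patients_per_month: int,
    coordinator_hourly_cost: float,
    manual_minutes: float = 50.0,     # manual review time reported above
    assisted_minutes: float = 9.0,    # mean assisted review time reported above
    criteria_per_patient: int = 49,   # assumption: median criteria count, one trial
    cost_per_criterion: float = 0.09, # reported cost with optimized retrieval
) -> dict:
    """Estimate monthly time and cost effects of assisted review (sketch)."""
    minutes_saved = (manual_minutes - assisted_minutes) * patients_per_month
    hours_saved = minutes_saved / 60
    labor_savings = hours_saved * coordinator_hourly_cost
    model_cost = patients_per_month * criteria_per_patient * cost_per_criterion
    return {
        "time_reduction": (manual_minutes - assisted_minutes) / manual_minutes,
        "hours_saved": hours_saved,
        "labor_savings": labor_savings,
        "model_cost": model_cost,
        "net_savings": labor_savings - model_cost,
    }


# Hypothetical inputs: 200 patients per month, $40 per coordinator hour.
estimate = estimate_savings(patients_per_month=200, coordinator_hourly_cost=40.0)
for key, value in estimate.items():
    print(f"{key}: {value:,.2f}")
```

With the default review times, the per-patient time reduction is 82%, consistent with the roughly 80% figure quoted above. The estimate ignores setup, training, and oversight costs, so treat it as an upper bound on savings.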
Your AI Implementation Timeline
A structured approach ensures a smooth transition and rapid value realization. Here’s a typical roadmap for integrating our AI solution.
Data Preparation
Gather and anonymize diverse medical records (text, images, handwriting) for ingestion into the multimodal LLM pipeline. Ensure compliance with data privacy regulations.
Model Deployment & Training
Deploy the LLM-powered pipeline, including multimodal embedding and reasoning models. Fine-tune for specific trial types and site-specific data nuances to optimize accuracy.
Integration & Workflow Adaptation
Integrate the AI system into existing clinical trial workflows. Train clinical research coordinators on leveraging AI-generated assessments and rationale for efficient patient review and feedback.
Continuous Optimization
Implement a feedback loop for continuous model improvement, incorporating user rectifications and new data. Monitor performance metrics like accuracy and review time savings to ensure sustained value.