Enterprise AI Analysis
Training-Free Adaptation of New-Generation LLMs using Legacy Clinical Models
Adapting language models to the clinical domain through continued pretraining and instruction tuning requires costly retraining for each new model generation. We propose Cross-Architecture Proxy Tuning (CAPT), a model-ensembling approach that enables training-free adaptation of state-of-the-art general-domain models using existing clinical models. CAPT supports models with disjoint vocabularies, leveraging contrastive decoding to selectively inject clinically relevant signals while preserving the general-domain model's reasoning and fluency. On six clinical classification and text-generation tasks, CAPT with a new-generation general-domain model and an older-generation clinical model consistently outperforms both models individually and state-of-the-art ensembling approaches (average +17.6% over UniTE, +41.4% over proxy tuning across tasks). Through token-level analysis and physician case studies, we demonstrate that CAPT amplifies clinically actionable language, reduces context errors, and increases clinical specificity. This technique especially benefits healthcare institutions with constrained computational capacity that cannot support iterative clinical training but want to adopt emerging general-domain model advances.
Authors: Sasha Ronaghi, Chloe Stanwyck, Asad Aali, Amir Ronaghi, Miguel Angel Fuentes Hernandez, Tina Hernandez Boussard, Emily Alsentzer
Executive Impact: Key Performance Indicators
CAPT offers a paradigm shift in clinical AI, enabling immediate, training-free adaptation of new-generation LLMs to the healthcare domain. By bypassing costly retraining, this method boosts performance and clinical specificity while reducing output risks, making advanced AI accessible to resource-constrained healthcare institutions.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
The Challenge of Clinical AI Adaptation
Despite rapid advancements in general-domain large language models (LLMs), their direct application in clinical settings is hindered by issues like hallucinations, critical detail omission, and reasoning failures. This is primarily due to the limited representation of clinical data in pretraining corpora, often because of privacy concerns, and the encoding of biases or outdated information from general internet text.
Current domain adaptation techniques, such as continued pretraining and instruction tuning, are resource-intensive and must be repeated for each new model generation (e.g., Med-PaLM to Med-PaLM 2). This creates a significant lag between base-model capabilities and their clinical applicability; as a result, state-of-the-art general-domain models often outperform clinically adapted ones on comprehensive benchmarks.
Cross-Architecture Proxy Tuning (CAPT) addresses this bottleneck by providing a training-free adaptation mechanism. It enables the reuse of learned clinical knowledge from older, domain-adapted models to enhance newer, general-domain LLMs without the need for costly retraining.
Enterprise Process Flow: Training-Free Clinical LLM Adaptation with CAPT
CAPT: Cross-Architecture Proxy Tuning Explained
CAPT is a probability-level ensembling method designed to support models with disjoint vocabularies. It leverages contrastive decoding to selectively incorporate the specialized knowledge of a domain-adapted model while preserving the advanced reasoning and fluency of newer general-domain models.
The core hypothesis is that general-domain models possess substantial medical knowledge but lack exposure to clinical practice patterns. CAPT re-ranks the top-k candidate tokens proposed by the general-domain model using a log-probability offset, defined as the difference between the clinically trained model and its untrained base counterpart, effectively "injecting" clinical relevance where needed.
Key steps in the CAPT decoding process:
- Candidate Token Selection: The new-generation general-domain model proposes its top-k most likely next-token candidates.
- Vocabulary Mapping: Each candidate token is re-tokenized using the clinical model's tokenizer to obtain a corresponding clinical token. This addresses the challenge of disjoint vocabularies between different model architectures.
- Clinical Knowledge Offset Calculation: The log-probability difference between the clinical model and its untrained base counterpart for the re-tokenized clinical token is computed. This delta represents the isolated clinical-domain signal.
- Candidate Re-ranking: This clinical knowledge offset is added to the general-domain model's original log-probabilities for each candidate. The token with the highest adjusted score is then selected.
This process allows CAPT to selectively inject clinically relevant signals without uniformly aggregating information, ensuring that the strengths of both models are utilized optimally.
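The four decoding steps above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the three models are stood in for by hand-written log-probability tables, the `retokenize` mapping is the identity, and the token names and values are invented for illustration.

```python
import math

def capt_step(m_new_logprobs, m_clin_logprobs, m_base_logprobs,
              retokenize, k=3, alpha=1.0):
    """One CAPT decoding step: re-rank the general-domain model's top-k
    candidates by the clinical offset alpha * (log p_clin - log p_base)."""
    # 1. Candidate token selection: top-k tokens from the general-domain model.
    candidates = sorted(m_new_logprobs, key=m_new_logprobs.get, reverse=True)[:k]
    scores = {}
    for tok in candidates:
        # 2. Vocabulary mapping: map the candidate into the clinical
        #    tokenizer's space (identity here; real models need re-tokenization).
        clin_tok = retokenize(tok)
        # 3. Clinical knowledge offset: isolate the clinical-domain signal.
        offset = m_clin_logprobs[clin_tok] - m_base_logprobs[clin_tok]
        # 4. Candidate re-ranking: add the offset to the original log-prob.
        scores[tok] = m_new_logprobs[tok] + alpha * offset
    return max(scores, key=scores.get), scores

# Toy tables: the general model slightly prefers "cardiac", but the
# clinical-vs-base offset favors "perfusion" monitoring.
m_new = {"cardiac": math.log(0.40), "perfusion": math.log(0.35), "vitals": math.log(0.25)}
m_clin = {"cardiac": math.log(0.10), "perfusion": math.log(0.60), "vitals": math.log(0.30)}
m_base = {"cardiac": math.log(0.30), "perfusion": math.log(0.20), "vitals": math.log(0.50)}

chosen, scores = capt_step(m_new, m_clin, m_base, retokenize=lambda t: t, k=3)
print(chosen)  # "perfusion": the clinical offset outweighs m_new's slight preference
```

Because the offset is a *difference* against the untrained base model, tokens the clinical model prefers only for generic linguistic reasons cancel out, and only the domain-specific signal shifts the ranking.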
Quantitative Performance Overview
CAPT significantly outperforms strong baselines and existing model-ensembling approaches across clinical classification and text generation tasks. This demonstrates its effectiveness in bridging the capabilities of new general-domain models with the domain knowledge of legacy clinical models.
| Method / Models | Avg. Classification (Macro-F1) | Avg. Text-Generation (LLM-J) | Avg. Risk-Free Outputs (%RF) |
|---|---|---|---|
| Mnew (Qwen3-30B) | 41.3% | 4.05 | 52.8% |
| Proxy Tuning | 27.1% | 3.61 | 32.4% |
| UniTE | 38.3% | 3.71 | 38.9% |
| CAPT | 41.5% | 4.09 | 52.8% |
CAPT outperforms Mnew on 8 of 12 metrics, showing an average improvement of +17.6% over UniTE and +41.4% over proxy tuning across all tasks and metrics. Its gains are particularly pronounced on text-generation tasks, where clinical reasoning and terminology precision are critical, yielding substantially more risk-free outputs (improving by 13.94 and 20.44 percentage points over UniTE and proxy tuning, respectively).
Token-Level Analysis: Selective Clinical Integration
CAPT selectively integrates clinical domain knowledge by influencing tokens related to clinical decision-making and documentation style, while allowing the general model to control linguistic structure and formatting. This demonstrates a nuanced integration, preserving the strengths of both models.
Token-Level Insights
CAPT's mechanism for combining models results in a selective influence over generated tokens. By analyzing the mean log-probability shifts:
- ✓ Clinical Decision & Documentation Style: The legacy clinical model (Mold-clin) strongly influences tokens related to clinical decision-making (e.g., 'Clinical Decision Action Headers', 'Clinical Assessment Terms') and documentation style ('Clinical Reporting Verbs', 'Clinical Hedging', 'Condition State Descriptors'). This reflects its learned conventions for structuring clinical assessments, expressing uncertainty, and framing care decisions, which are often absent from general-domain pretraining.
- ✓ Linguistic Structure & Formatting: The new-generation general-domain model (Mnew) dominates linguistic structure and formatting (e.g., 'Formatting Tokens', 'General Morphemes', 'Medical Suffixes'), reflecting its superior control over grammatical coherence and document structure.
- ✓ Medical Knowledge Convergence: Core medical terminology (e.g., 'Gynecologic Terms', 'Diagnoses', 'Medical Roots') shows near-zero shifts, indicating both models converge on foundational medical concepts. CAPT primarily affects how Mnew's medical knowledge is expressed to match clinical documentation conventions, rather than altering the fundamental knowledge itself.
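An analysis in this spirit can be sketched by averaging per-token log-probability shifts within each category. The token list, category labels, and shift values below are invented placeholders for illustration; in practice the shifts would be measured during CAPT decoding on real outputs.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-token shifts (positive = clinical model dominates the
# choice, negative = general model dominates, near zero = both agree).
token_shifts = [
    ("Plan:",     "Clinical Decision Action Headers", 1.8),
    ("recommend", "Clinical Reporting Verbs",         1.2),
    ("likely",    "Clinical Hedging",                 0.9),
    ("\n-",       "Formatting Tokens",               -0.7),
    ("ing",       "General Morphemes",               -0.5),
    ("pneumonia", "Diagnoses",                        0.02),
]

def mean_shift_by_category(shifts):
    """Group token-level shifts by category and return the mean per category."""
    by_cat = defaultdict(list)
    for _tok, cat, delta in shifts:
        by_cat[cat].append(delta)
    return {cat: mean(vals) for cat, vals in by_cat.items()}

# Print categories from most clinical-model-influenced to least.
for cat, m in sorted(mean_shift_by_category(token_shifts).items(),
                     key=lambda kv: -kv[1]):
    print(f"{cat:35s} {m:+.2f}")
```

Sorting the category means this way surfaces the pattern described above: decision and documentation-style categories at the top, formatting categories at the bottom, and core medical terminology near zero.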
Clinical Impact: Enhanced Specificity and Accuracy
Physician case studies demonstrate that CAPT significantly improves clinical outputs by increasing precision, accuracy, and context-appropriateness in treatment plans. This directly translates to more actionable and safer recommendations for healthcare providers.
Case Study: Postoperative Treatment Plan for Forearm Graft
For a forearm arteriovenous graft procedure, CAPT demonstrated crucial improvements:
- ✓ Improved Accuracy: CAPT corrected the postoperative monitoring timeline from a generic "24-72 hours" to a more appropriate "24-48 hours" in line with typical discharge practices. It also correctly emphasized "perfusion" (blood flow) monitoring over "cardiac" symptoms, which are irrelevant for a forearm graft.
- ✓ Increased Specificity: CAPT replaced Mnew's "neuro" token (implying "neuro check") with "dist." to direct generation towards monitoring "distal limb circulation," a more comprehensive and relevant assessment. It also specified "analgesics" instead of generic "agents" for pain medication.
- ✓ Preserved Explanatory Style: While making these targeted clinical improvements, CAPT maintained the general-domain model's explanatory tone. Although sometimes verbose, this style can be beneficial for less experienced clinicians.
These adjustments highlight CAPT's ability to inject clinically actionable language, reduce context errors, and increase clinical specificity, directly benefiting patient care and documentation quality.
Calculate Your Enterprise AI ROI
Estimate the potential cost savings and reclaimed employee hours by implementing CAPT in your organization.
Your Implementation Roadmap
Leverage CAPT's training-free approach to accelerate AI adoption in your clinical workflows with a clear, phased strategy.
Phase 1: Pilot & Validation (2-4 Weeks)
Identify critical clinical tasks for CAPT pilot. Integrate CAPT with existing general-domain LLMs and legacy clinical models. Conduct internal validation on accuracy, specificity, and safety using clinical benchmarks and expert review. Establish baseline performance metrics.
Phase 2: Customization & Integration (4-8 Weeks)
Fine-tune CAPT's parameters (e.g., α coefficient, k-value) based on pilot results for optimal clinical alignment. Integrate CAPT outputs into existing clinical documentation systems and workflows. Develop monitoring protocols for continuous performance and safety assessment.
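Tuning the α coefficient and k-value could be approached as a simple grid search against a validation metric. The sketch below assumes a placeholder `evaluate` function standing in for a real validation harness (e.g., macro-F1 or an expert-rated risk-free rate); its score surface and the chosen grid values are illustrative only.

```python
import itertools

def evaluate(alpha, k):
    """Placeholder validation score peaking near alpha=1.0, k=10.
    Replace with a real benchmark run on held-out clinical tasks."""
    return -((alpha - 1.0) ** 2) - 0.01 * abs(k - 10)

def tune_capt(alphas, ks):
    """Return the (alpha, k) pair with the best validation score."""
    return max(itertools.product(alphas, ks), key=lambda p: evaluate(*p))

best = tune_capt(alphas=[0.5, 1.0, 1.5], ks=[5, 10, 20])
print(best)  # (1.0, 10) under the placeholder score
```

Because CAPT is training-free, each grid point only costs decoding-time evaluation, so this sweep can be rerun cheaply whenever a new base model generation is adopted.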
Phase 3: Scaled Deployment & Training (8-16 Weeks)
Roll out CAPT-enhanced LLMs to a broader user group within the institution. Provide comprehensive training and support to clinical staff on leveraging CAPT for improved efficiency and accuracy. Establish feedback loops for ongoing optimization and identification of new application areas.
Phase 4: Continuous Optimization & Expansion (Ongoing)
Regularly update CAPT with new general-domain LLM generations and evolving legacy clinical models, leveraging its training-free nature. Explore expansion to additional clinical specialties and text generation tasks. Monitor long-term impact on clinical outcomes, efficiency, and physician satisfaction.
Ready to Transform Your Clinical AI?
Discover how Cross-Architecture Proxy Tuning can revolutionize your healthcare institution's AI capabilities, reducing costs and accelerating innovation. Let's build a smarter, safer future for clinical documentation and decision-making.