K-MaT: Knowledge-Anchored Manifold Transport for Cross-Modal Prompt Learning in Medical Imaging
K-MaT leverages Knowledge-Anchored Manifold Transport to overcome catastrophic forgetting and modality-specific shortcuts in medical Vision-Language Models (VLMs), achieving superior zero-shot generalization across diverse imaging modalities.
This paper introduces K-MaT, a novel prompt-learning framework that enables robust cross-modal transfer in medical imaging, particularly from high-end modalities (e.g., CT) to low-end ones (e.g., X-ray). K-MaT achieves this by factorizing prompts, anchoring them to LLM-generated clinical descriptions, and aligning prompt manifolds using Fused Gromov-Wasserstein optimal transport. This approach mitigates catastrophic forgetting and significantly improves generalization, outperforming existing state-of-the-art methods with a 44.1% average harmonic mean accuracy and 36.2% macro-F1 score across four demanding cross-modal benchmarks.
Deep Analysis & Enterprise Applications
Problem: Catastrophic Forgetting & Modality-Specific Shortcuts
Existing VLM approaches optimized on high-end medical imaging often fail to generalize to low-end modalities due to catastrophic forgetting. Prompts collapse into modality-specific statistics, losing general textual knowledge critical for shared diagnostic semantics. This leads to poor performance in real-world frontline applications.
Core Contributions
K-MaT introduces a strict zero-shot asymmetric transfer strategy, leverages LLM-generated textual prototypes as semantic anchors to prevent deviation from clinically meaningful semantics, and employs Fused Gromov-Wasserstein (FGW) optimal transport for cross-modal manifold alignment. This ensures the learned low-end prompt manifold mirrors the visually grounded relational structure of the high-end manifold.
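The semantic anchoring idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that each class has several LLM-generated description embeddings, that the textual prototype is their normalized mean, and that the anchoring penalty is one minus cosine similarity. The names `text_prototype` and `anchor_loss` are hypothetical, and the exact loss used by K-MaT may differ.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def text_prototype(description_embeddings):
    """Assumed prototype: normalized mean of one class's unit-length description embeddings."""
    unit = [l2_normalize(e) for e in description_embeddings]
    dim = len(unit[0])
    mean = [sum(e[d] for e in unit) / len(unit) for d in range(dim)]
    return l2_normalize(mean)

def anchor_loss(prompt_embeddings, prototypes):
    """Mean (1 - cosine similarity) between each class prompt and its unit-length prototype."""
    total = 0.0
    for prompt, proto in zip(prompt_embeddings, prototypes):
        unit_prompt = l2_normalize(prompt)
        cosine = sum(a * b for a, b in zip(unit_prompt, proto))
        total += 1.0 - cosine
    return total / len(prototypes)

# Toy 2-D embeddings (hypothetical values, not from the paper).
prototypes = [
    text_prototype([[1.0, 0.0], [1.0, 0.0]]),
    text_prototype([[0.0, 2.0]]),
]
aligned = anchor_loss([[3.0, 0.0], [0.0, 1.0]], prototypes)  # prompts match their anchors
drifted = anchor_loss([[0.0, 1.0], [0.0, 1.0]], prototypes)  # first prompt has drifted
print(aligned, drifted)
```

Under these assumptions, a prompt that drifts away from its clinical description incurs a larger penalty, which is the behavior the anchoring term is meant to discourage.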
Significant Performance Improvement
44.1% Average Harmonic Mean Accuracy across 4 benchmarks, outperforming BiomedCoOp by 2.1 percentage points.

| Method | Avg. H ACC | Avg. H F1 |
|---|---|---|
| BiomedCLIP | 31.7% | 24.1% |
| CoOp | 38.3% | 30.5% |
| CoCoOp | 36.2% | 31.1% |
| BiomedCoOp | 42.0% | 35.0% |
| K-MaT (Ours) | 44.1% | 36.2% |

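For readers unfamiliar with the F1 column, macro-F1 is the unweighted mean of per-class F1 scores. The sketch below is a plain-Python illustration of that standard definition, not the paper's evaluation code, and the labels are toy values.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in labels or predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy case: the model predicts class 0 for every sample.
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 0, 0]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                          # 0.75
print(round(macro_f1(y_true, y_pred), 4))  # 0.4286
```

Accuracy stays at 0.75 in this toy case while macro-F1 falls to about 0.43, which is why macro-F1 is a useful companion metric when a model collapses toward a single class.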
Mitigating Catastrophic Forgetting
On the challenging breast imaging task, standard methods like CoOp drop to 27.0% accuracy on the low-end modality. K-MaT significantly mitigates this catastrophic forgetting, preserving robust performance across modalities and achieving 38.4% low-end accuracy and 50.3% H-mean on the breast dataset.
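The H-mean rewards balanced performance across modalities. The sketch below assumes H is the harmonic mean of the high-end and low-end scores. The function name `harmonic_mean` and the input values are illustrative and are not results from the paper.

```python
def harmonic_mean(high_end, low_end):
    """Harmonic mean of two non-negative scores (e.g., accuracies in percent)."""
    if high_end < 0 or low_end < 0:
        raise ValueError("scores must be non-negative")
    if high_end + low_end == 0:
        return 0.0
    return 2 * high_end * low_end / (high_end + low_end)

# Hypothetical scores: strong on the high-end modality, weak on the low-end one.
high_end_acc, low_end_acc = 80.0, 20.0
print((high_end_acc + low_end_acc) / 2)          # arithmetic mean: 50.0
print(harmonic_mean(high_end_acc, low_end_acc))  # harmonic mean: 32.0
```

Because the harmonic mean is pulled toward the smaller value, a method that forgets the low-end modality cannot compensate with a high score on the high-end one.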
Impact of Context Modularization
The ablation study shows that optimizing with cross-entropy alone (the baseline) restricts generalization and biases predictions towards high-end modalities. Combining Class-Specific Context (CSC) with Modality-Specific Context (MSC) is crucial for mitigating interference and enabling effective cross-modal transfer: relative ACC with respect to the baseline improves from -12.03% to -5.31%.
Contribution of Semantic Anchoring and Manifold Alignment
+10.10% Relative improvement in H accuracy from combining semantic anchoring (L_anc) and FGW alignment (L_fgw), crucial for distilling shared clinical knowledge.
Effectiveness of Fused Gromov-Wasserstein (FGW)
FGW acts as a powerful structural regularizer, ensuring that the low-end prompt manifold mirrors the relational geometry of the high-end space. Visualization (t-SNE) and breakdown analysis confirm that L_fgw prevents the model from collapsing into a single class, and that structural transport from the high-end space significantly improves low-end modality performance.
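To make the structural term concrete, the sketch below evaluates the standard Fused Gromov-Wasserstein objective with a squared structure loss for a given coupling `T`. It does not solve for the optimal coupling, which in practice is usually delegated to a solver such as `ot.gromov.fused_gromov_wasserstein` in the POT library. The cosine-distance structure matrices, the zero feature cost `M`, and the toy embeddings are assumptions for illustration, not the paper's configuration.

```python
import math

def cosine_distance_matrix(embeddings):
    """Pairwise (1 - cosine similarity) between class embeddings of one modality."""
    unit = []
    for e in embeddings:
        norm = math.sqrt(sum(x * x for x in e))
        unit.append([x / norm for x in e])
    return [[1.0 - sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]

def fgw_objective(M, C1, C2, T, alpha=0.5):
    """(1 - alpha) * sum_ij M_ij T_ij + alpha * sum_ijkl (C1_ik - C2_jl)^2 T_ij T_kl."""
    n, m = len(C1), len(C2)
    feature = sum(M[i][j] * T[i][j] for i in range(n) for j in range(m))
    structure = sum(
        (C1[i][k] - C2[j][l]) ** 2 * T[i][j] * T[k][l]
        for i in range(n) for j in range(m)
        for k in range(n) for l in range(m)
    )
    return (1 - alpha) * feature + alpha * structure

# Toy class embeddings (hypothetical): three classes per modality.
high_end = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
low_end_matched = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
low_end_collapsed = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]  # all classes look alike

M = [[0.0] * 3 for _ in range(3)]  # placeholder feature cost
T = [[1 / 3 if i == j else 0.0 for j in range(3)] for i in range(3)]  # class i to class i

C_high = cosine_distance_matrix(high_end)
matched = fgw_objective(M, C_high, cosine_distance_matrix(low_end_matched), T)
collapsed = fgw_objective(M, C_high, cosine_distance_matrix(low_end_collapsed), T)
print(matched, collapsed)
```

In this toy setup the matched low-end geometry incurs no structural cost, while the collapsed one is penalized, which is the intuition behind using the term as a regularizer against single-class collapse.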
Current Limitations
Despite promising results, K-MaT's absolute performance on low-end modalities shows limited improvement over the zero-shot BiomedCLIP baseline. The framework is also sensitive to severe visual discrepancies between modalities, which current text-anchored alignment cannot fully bridge.
Future Directions
Future work will explore incorporating more reliable visual signals to enhance stability and low-end transfer capabilities. This could involve integrating more advanced visual feature alignment techniques or developing dynamic weighting for text vs. visual anchors based on modality similarity.
Real-World Impact: Enhancing Diagnostic Accessibility
Problem: In resource-limited settings, advanced imaging (CT, MRI) is often unavailable, forcing reliance on basic modalities (X-ray, Ultrasound). Existing AI models, trained on high-end data, perform poorly on these frontline modalities, creating a gap in diagnostic support where it's most needed.
Solution: K-MaT transfers diagnostic knowledge from high-end to low-end modalities without requiring low-end training data, which directly addresses this gap. For example, a model trained on CT scans can assist in X-ray diagnosis, improving diagnostic support and accessibility in underserved areas.
Outcome: By maintaining robust performance across diverse modalities, K-MaT democratizes advanced AI diagnostics, enabling earlier and more accurate diagnoses in frontline healthcare settings, ultimately saving lives and improving patient outcomes globally.