Enterprise AI Analysis
Healthcare-Focused Turkish Medical LLM: Training on Real Patient-Doctor Question-Answer Data for Enhanced Medical Insight
M. ALI BAYRAM, BANU DIRI, SAVAS YILDIRIM
Yildiz Technical University, Istanbul, Turkey
Published: ACM Trans. Asian Low-Resour. Lang. Inf. Process., Vol. 24, No. 11, Article 129 (November 2025).
This study introduces a specialized Turkish Medical LLM fine-tuned on 167,732 real patient-doctor question-answer pairs sourced from a trusted medical platform. Built on LLAMA 3, the fine-tuning process relied on Low-Rank Adaptation (LoRA) and mitigated catastrophic forgetting through spherical linear interpolation (Slerp) merging. Evaluation through similarity scores, GPT-3.5 assessments, and expert reviews indicates a significant improvement in the model's ability to generate medically accurate responses. This Turkish Medical LLM demonstrates the potential to support medical decision-making and patient interaction in Turkish healthcare settings, offering an essential resource for enhancing AI inclusivity across languages.
Executive Impact: Unlocking Clinical Precision
A Turkish Medical LLM trained on real-world patient-doctor interactions marks a significant advance for healthcare AI, delivering measurably better accuracy and cultural relevance than general-purpose models in Turkish clinical question answering.
Deep Analysis & Enterprise Applications
Each module below explores a specific finding from the research, rebuilt as an interactive, enterprise-focused deep dive.
Understanding Turkish Medical Language
Turkish, as an agglutinative language with rich morphological complexity, presents unique challenges for Large Language Models. General-purpose models often struggle to address the nuanced grammar, syntax, and lexicon of medical Turkish, leading to inaccurate or inappropriate responses. This limitation hinders healthcare professionals and patients from accessing high-quality, culturally relevant AI tools. Developing a Turkish-specific medical LLM is crucial for equitable healthcare delivery and enhancing AI inclusivity across languages.
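As a hedged aside (not an experiment from the paper), the tokenization effect behind this difficulty is easy to demonstrate: a subword vocabulary trained mostly on other languages tends to shatter a single inflected Turkish word into many fragments. The tokenizer checkpoint below is an assumption:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")  # assumed checkpoint

# One agglutinated word carrying plural, possessive, and case suffixes:
# hastalık (disease) + lar (plural) + ımız (our) + dan (from) = "from our diseases"
word = "hastalıklarımızdan"
print(tokenizer.tokenize(word))  # typically several subword pieces rather than one token
```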
Real-World Patient-Doctor Interactions
The specialized Turkish Medical LLM was trained on 167,732 real patient-doctor question-answer pairs sourced from doktorsitesi.com, a widely used platform in Turkey. This dataset ensures the model captures the authentic linguistic and cultural subtleties specific to the Turkish medical context, differentiating it from models trained on synthetic or translated data. The dataset covers a diverse range of medical fields and includes contributions from 84,907 specialists and 66,231 professors, reflecting a broad spectrum of medical expertise.
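As a minimal sketch (not the authors' released pipeline), question-answer pairs of this kind are usually reshaped into prompt/completion records before supervised fine-tuning. The file name, field names, and Turkish prompt template below are illustrative assumptions:

```python
import json

# Hypothetical instruction template: answer the patient's question as a doctor would.
PROMPT_TEMPLATE = (
    "Aşağıda bir hastanın sorusu yer almaktadır. Bir doktor gibi yanıtlayın.\n\n"
    "### Soru:\n{question}\n\n### Yanıt:\n"
)

def to_instruction_pairs(path: str) -> list[dict]:
    """Convert raw patient-doctor Q&A records (one JSON object per line) into SFT pairs."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            item = json.loads(line)
            pairs.append({
                "prompt": PROMPT_TEMPLATE.format(question=item["question"].strip()),
                "completion": item["answer"].strip(),
            })
    return pairs

if __name__ == "__main__":
    pairs = to_instruction_pairs("patient_doctor_qa.jsonl")  # hypothetical file name
    print(f"{len(pairs)} training pairs prepared")
```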
Fine-tuning LLAMA 3 with LoRA and Slerp Merging
The base model selected for this project was LLAMA 3 (8B) due to its open-source availability and strong performance. The fine-tuning process utilized Low-Rank Adaptation (LoRA) with rank 8 to efficiently update model weights. A critical challenge of catastrophic forgetting, where general language understanding diminishes during specialization, was addressed using an innovative Slerp model merge approach. This technique blends the weights of the fine-tuned model with its base model to retain general language capabilities alongside specialized medical expertise, improving the model's overall language versatility from 19 to 53 out of 100 on Turkish LLM benchmarks.
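As a minimal sketch of this setup (assuming the Hugging Face transformers and peft libraries, which the paper does not necessarily name), a rank-8 LoRA adapter can be attached to the base model as follows; the target modules, alpha, and dropout values are assumptions:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Load the open-weight base model (the 8B variant referenced above).
base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

lora_cfg = LoraConfig(
    r=8,                                   # low-rank dimension, as reported in the study
    lora_alpha=16,                         # assumed scaling factor
    target_modules=["q_proj", "v_proj"],   # assumed attention projections to adapt
    lora_dropout=0.05,                     # assumed dropout on the adapter path
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)     # only the small LoRA matrices are trainable
model.print_trainable_parameters()
```

Slerp merging itself can be sketched as spherical interpolation between corresponding parameter tensors of the base and fine-tuned checkpoints. The per-tensor treatment and the midpoint factor t=0.5 below are illustrative assumptions; dedicated merge tooling typically exposes layer-wise interpolation schedules:

```python
import torch

def slerp(w_base: torch.Tensor, w_ft: torch.Tensor, t: float = 0.5) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors of the same shape."""
    a, b = w_base.flatten().float(), w_ft.flatten().float()
    dot = torch.clamp(torch.dot(a / a.norm(), b / b.norm()), -1.0, 1.0)
    omega = torch.acos(dot)                          # angle between the two weight vectors
    if omega.abs() < 1e-6:                           # nearly parallel: fall back to lerp
        merged = (1 - t) * a + t * b
    else:
        merged = (torch.sin((1 - t) * omega) * a + torch.sin(t * omega) * b) / torch.sin(omega)
    return merged.reshape(w_base.shape).to(w_base.dtype)

def merge_checkpoints(base_sd: dict, ft_sd: dict, t: float = 0.5) -> dict:
    """Blend a base and a fine-tuned state dict parameter by parameter."""
    return {name: slerp(base_sd[name], ft_sd[name], t) for name in base_sd}
```

The intuition, hedged, is that interpolating along the sphere rather than averaging weights linearly preserves more of the base model's general-language behavior while keeping the medical specialization, which is consistent with the benchmark recovery from 19 to 53 reported above.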
Robust Evaluation and Performance
The fine-tuned Turkish Medical LLM achieved an average cosine similarity of 0.5151 to expert answers, an 11.33% improvement over the base LLAMA 3 (0.4627). GPT-3.5 assessments further validated its quality, assigning an average rating of 7.132, within 7.85% of the average human expert score of 7.692. Expert physicians rated the fine-tuned model's responses at 2.59 on a 1-5 scale, significantly outperforming the base model's 2.00 and indicating stronger medical relevance, clarity, and comprehensiveness.
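A minimal sketch of the similarity-based part of this evaluation, assuming a multilingual sentence-embedding encoder (the exact encoder used in the study is not specified here, so the checkpoint name is an assumption):

```python
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed encoder

def answer_similarity(model_answer: str, expert_answer: str) -> float:
    """Cosine similarity between a model's answer and the expert reference answer."""
    embeddings = encoder.encode([model_answer, expert_answer], convert_to_tensor=True)
    return util.cos_sim(embeddings[0], embeddings[1]).item()

# Averaging such scores over a held-out set of patient questions produces the kind of
# aggregate similarity figures quoted above (0.5151 vs. 0.4627).
score = answer_similarity(
    "Belirtileriniz gastriti düşündürüyor; bir gastroenteroloji uzmanına başvurun.",
    "Şikayetleriniz gastrit ile uyumlu olabilir, bir uzmana görünmenizi öneririm.",
)
print(f"cosine similarity: {score:.4f}")
```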
Interpreting Culturally Embedded Phrases
A key qualitative finding was the model's ability to correctly interpret culturally embedded Turkish medical expressions. While general models like LLaMA 3 and MedAlpaca misinterpreted metaphorical phrases (e.g., "Boğazıma bir şey battı" - Something stabbed my throat) due to literal translations or geographic misinterpretations, the Turkish Medical LLM accurately understood these as symptoms requiring medical attention (e.g., foreign object in throat) and provided medically sound recommendations. This highlights its essential role in patient-facing applications within the Turkish healthcare context.
Roadmap for Advanced Healthcare AI
Future directions include enhancing the model's adaptability through real-time feedback mechanisms and ensuring it retains linguistic versatility while specializing in medical contexts. Research into cross-language transfer learning could leverage insights from English-language medical LLMs to support Turkish models where data is sparse. Critical ethical considerations, such as patient confidentiality and preventing overreliance on AI for medical decision-making, will guide responsible deployment. The goal is a model that complements, rather than replaces, human expertise, fostering empathetic and patient-centered AI applications.
Enterprise Process Flow
| Prompt (TR/EN) | Model | Review of Model's Response |
|---|---|---|
| Mideme bir şey oturdu gibi, ne yapmalıyım? (It feels like something has settled in my stomach, what should I do?) | LLAMA 3 | Misinterpreted as emotional confusion; suggested breathing exercises, mindfulness. |
| | MedAlpaca | Took the phrase literally: "I do not have the ability to physically pick up or hold objects." |
| | Turkish Medical LLM | Correctly interpreted the symptoms as possible gastritis or an ulcer; recommended a healthy diet, herbal teas, probiotics, and consulting a gastroenterologist. |
| Boğazıma bir şey battı, ne yapmalıyım? (Something stabbed my throat, what should I do?) | LLAMA 3 | Interpreted "Boğazıma" as the Bosphorus Strait; suggested contacting maritime authorities. |
| | MedAlpaca | Refused medical advice, citing lack of diagnostic capabilities. |
| | Turkish Medical LLM | Correctly understood a possible medical emergency involving a foreign object in the throat; urged immediate consultation with a doctor. |
Case Study: Precision in Turkish Medical Context
One of the most compelling demonstrations of our specialized Turkish Medical LLM's capability is its nuanced interpretation of culturally embedded medical expressions. General-purpose models often falter, defaulting to literal or irrelevant responses. For instance, when presented with "Mideme bir şey oturdu gibi, ne yapmalıyım?" (literally: It feels like something has settled in my stomach, what should I do?), LLAMA 3 misinterpreted it as emotional stress, while MedAlpaca offered a literal, non-medical response about physical inability to pick up objects.
Our fine-tuned Turkish Medical LLM, however, correctly interpreted this idiomatic expression as indicative of gastrointestinal discomfort, such as gastritis or an ulcer. It then provided medically sound advice, including dietary recommendations and the suggestion to consult a gastroenterologist. This crucial difference underscores the model's ability to bridge linguistic and cultural gaps in healthcare communication, ensuring patients receive accurate and contextually appropriate guidance.
Calculate Your Potential AI Impact
Estimate the efficiency gains and cost savings your enterprise could achieve by integrating specialized AI solutions.
Your AI Implementation Roadmap
A strategic outline for integrating advanced AI into your enterprise, ensuring a smooth transition and maximum impact.
Phase 1: Discovery & Strategy
Conduct in-depth analysis of current workflows, identify key pain points, and define AI objectives tailored to your specific enterprise needs. This includes data assessment and initial model selection.
Phase 2: Data Preparation & Fine-tuning
Gather, clean, and pre-process proprietary enterprise data. Implement advanced fine-tuning techniques, such as LoRA and Slerp merging, to specialize models while preserving core capabilities and mitigating catastrophic forgetting.
Phase 3: Model Deployment & Integration
Deploy the specialized AI model into your existing infrastructure. This involves seamless API integration, robust testing, and ensuring compatibility with current systems and security protocols.
Phase 4: Performance Monitoring & Iteration
Continuously monitor AI model performance, gather user feedback, and implement iterative improvements. This phase focuses on adaptive fine-tuning, ethical governance, and scaling solutions.
Ready to Transform Your Enterprise with AI?
Book a personalized consultation with our AI strategists to explore how these insights can be applied to your organization.