Enterprise AI Analysis
AppHerb: Language Model for Recommending Traditional Thai Medicine
This article introduces AppHerb, a fine-tuned language model designed for Traditional Thai Medicine (TTM) to address the lack of objective standards and prevalence of misinformation. Leveraging Unsloth's Gemma-2 (9B parameters) fine-tuned with data from two TTM textbooks (Wat Ratcha-orasaram and Tamra Osot Phra Narai), AppHerb performs two specialized tasks: Treatment Prediction (TrP) and Herbal Recipe Generation (HRG). It achieved promising initial precision, recall, and F1 scores (TrP: 26.54%, 28.14%, 24.00%; HRG: 32.51%, 24.42%, 24.84%) on limited Thai data, comparable to much larger Traditional Chinese Medicine (TCM) models. The study highlights challenges with Thai's low-resource, abugida writing system but establishes a foundation for AI-assisted TTM, emphasizing the need for clinical validation.
Executive Impact: Key Metrics
AppHerb's foundational performance demonstrates the potential of AI-driven insights for Traditional Thai Medicine, establishing an initial benchmark for a low-resource language context.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
Model Architecture
AppHerb is built on a Gemma-2 (9B-parameter) model fine-tuned with the Unsloth framework and Low-Rank Adaptation (LoRA) for efficient TTM knowledge integration. Gemma-2's grouped-query attention improves inference efficiency without compromising quality.
- Unsloth's Gemma-2 (9B) was selected for achieving the best average Thai exam score among the non-API LLMs evaluated.
- LoRA (rank = 8, alpha = 16) was used for fine-tuning, freezing the original 9B parameters and training only small adapter matrices.
- LoRA adapters were applied to the QKV projections across all 42 transformer layers; Unsloth's optimizations reduced VRAM usage by roughly 63%, improving both memory and time efficiency.
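The efficiency gain from LoRA can be illustrated with a quick back-of-the-envelope calculation. A minimal sketch, assuming a 3584-dimension hidden size and square QKV projections (both simplifying assumptions; real grouped-query attention uses smaller K/V projections, and these dimensions are not stated in the source):

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """LoRA freezes the d_in x d_out weight and trains only two
    low-rank matrices: A (d_in x rank) and B (rank x d_out)."""
    return rank * (d_in + d_out)

# Illustrative dimensions (assumed for this sketch, not from the paper).
d_model = 3584          # hidden size
rank = 8                # LoRA rank used by AppHerb
num_layers = 42         # transformer layers with adapted QKV projections

# One adapter each for the Q, K, and V projections per layer.
per_layer = 3 * lora_param_count(d_model, d_model, rank)
total_trainable = num_layers * per_layer

full_model = 9_000_000_000
print(f"trainable LoRA params: {total_trainable:,}")
print(f"fraction of full model: {total_trainable / full_model:.5%}")
```

Even under these rough assumptions, the trainable adapters amount to well under a tenth of a percent of the frozen 9B parameters, which is what makes single-GPU fine-tuning feasible.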
Data Processing & Tasks
Data from the Wat Ratcha-orasaram (WRO) and Tamra Osot Phra Narai (NR) textbooks was manually transcribed, cleaned, and formatted into JSON. Two primary tasks were defined: Treatment Prediction (TrP) and Herbal Recipe Generation (HRG).
- Data extracted from WRO and NR TTM textbooks, covering herbal recipes and symptoms.
- Automated and manual cleaning methods used, with human-validated JSON for ontological mapping.
- TrP task (405 rows): model predicts treatment based on herbal recipes.
- HRG task (256 rows): model generates herbal recipes based on symptoms.
- Train-test split ratio of 9:1 for both datasets due to limited data.
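The cleaning and splitting steps above can be sketched as follows. The JSON field names and example records are illustrative assumptions; the paper's exact schema is not reproduced here:

```python
import json
import random

# Hypothetical record schema for the HRG task: symptoms in, recipe out.
records = [
    {"symptoms": "fever with dry cough", "recipe": ["licorice", "ginger"]},
    {"symptoms": "indigestion", "recipe": ["black pepper", "long pepper"]},
    # ... remaining transcribed textbook entries ...
]

def train_test_split(rows, test_ratio=0.1, seed=42):
    """9:1 split, as used for both the TrP and HRG datasets."""
    rows = rows[:]                      # avoid mutating the caller's list
    random.Random(seed).shuffle(rows)   # seeded shuffle for reproducibility
    cut = int(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]

train, test = train_test_split(records)
print(json.dumps(train[0], ensure_ascii=False))
```

With only 405 (TrP) and 256 (HRG) rows, a 9:1 split leaves very small test sets, which is one reason the paper pairs these metrics with a call for clinical validation.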
Performance Evaluation
Models were evaluated using precision, recall, F1-score, BERTScore, and BLEU. AppHerb significantly outperformed the base Gemma-2U model in both TrP and HRG tasks, demonstrating effective domain adaptation.
- TrP model achieved: Precision 26.54%, Recall 28.14%, F1 24.00%.
- HRG model achieved: Precision 32.51%, Recall 24.42%, F1 24.84%.
- BERT F1 scores: AppHerb-TrP 84.39% (vs. Gemma-2U-TrP 77.33%), AppHerb-HRG 84.68% (vs. Gemma-2U-HRG 76.16%).
- BLEU scores: AppHerb-TrP 74.28% (vs. Gemma-2U-TrP 22.99%), AppHerb-HRG 76.16% (vs. Gemma-2U-HRG 10.33%).
- Comparable performance to TCM models (GSCCAM, RoKEPG) despite a training dataset roughly 575× smaller.
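Precision, recall, and F1 for recipe generation are naturally computed over the sets of predicted versus reference herbs. A minimal sketch of that set-based scoring (the exact matching granularity used in the paper may differ):

```python
def set_prf(predicted: set[str], reference: set[str]) -> tuple[float, float, float]:
    """Precision, recall, and F1 over predicted vs. reference herb sets."""
    if not predicted or not reference:
        return 0.0, 0.0, 0.0
    tp = len(predicted & reference)            # herbs present in both sets
    precision = tp / len(predicted)
    recall = tp / len(reference)
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

# Hypothetical model output vs. textbook reference recipe.
pred = {"ginger", "licorice", "camphor"}
ref = {"ginger", "licorice", "nutmeg", "clove"}
p, r, f = set_prf(pred, ref)
print(f"P={p:.2%} R={r:.2%} F1={f:.2%}")
```

BERTScore and BLEU complement this by rewarding near-matches in the generated text that exact set overlap misses, which is why AppHerb's BERT F1 (84%+) sits far above its exact-match F1 (~24%).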
Challenges & Future Work
The study faced significant challenges with the low-resource and complex Thai language, which lacks explicit word delimiters. Future work includes expanding datasets, exploring larger models, and achieving clinical validation.
- Thai language complexity: an abugida script with no spaces between words and deeply nested sub-sentences.
- Differences between traditional and modern Thai in the TTM texts complicated data cleaning.
- Limited dataset size led to potential overfitting and constrained generalizability.
- Future work: expand data, enhance model transparency, collaborate with TTM practitioners, clinical validation.
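The segmentation difficulty can be seen in miniature with a dictionary-based longest-match pass, a standard baseline for unspaced scripts. This is a toy sketch with a three-word lexicon, not the tokenizer used in the paper:

```python
# Thai is written without spaces between words, so tokenization needs a
# dictionary or a learned model. Toy longest-match segmenter:
LEXICON = {"ยา", "แก้", "ไอ"}  # "medicine", "relieve", "cough"
MAX_WORD_LEN = max(len(w) for w in LEXICON)

def segment(text: str) -> list[str]:
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate word first, shrinking on each miss.
        for width in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            if text[i:i + width] in LEXICON:
                tokens.append(text[i:i + width])
                i += width
                break
        else:
            tokens.append(text[i])  # unknown character falls through alone
            i += 1
    return tokens

print(segment("ยาแก้ไอ"))  # "cough medicine", written with no spaces
```

Traditional TTM vocabulary absent from modern lexicons breaks exactly this kind of dictionary lookup, which is part of why the cleaning stage required manual validation.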
AppHerb Development Process
| Feature | AppHerb | GSCCAM (TCM) | RoKEPG (TCM) |
|---|---|---|---|
| Base Model | Gemma-2 | Custom | RoBERTa |
| Training Data Size (symptom-prescription pairs) | 229 rows | 106,168 rows | 25,563 rows (13.5M words) |
| Precision | 32.5% | 37.4% | 29.0% |
| Recall | 24.4% | 30.0% | 25.9% |
| F1 Score | 24.8% | 25.0% | 23.4% |
Addressing Misinformation in Traditional Thai Medicine
Traditional Thai Medicine faces a significant challenge with misinformation and a lack of objective standards, which erodes public trust. AppHerb addresses this by grounding treatment prediction and herbal recipe generation in vetted textbook sources. For example, its TrP model can surface textbook-backed treatment suggestions, giving practitioners a reference point for cross-checking claims before they reach patients. By learning the relationships among formulations and herbs from these vetted texts, the system helps counter false claims about traditional remedies and supports public education and safety. Pending clinical validation, this marks a crucial step toward restoring trust in and substantiating TTM practices.
Calculate Your Potential ROI
Estimate the time savings and cost efficiencies your enterprise could achieve by integrating a custom AI solution tailored to your specific operational needs.
Implementation Timeline: Phased Approach
Our structured approach ensures a smooth integration of AppHerb into your existing workflows, maximizing impact with minimal disruption.
Phase 1: Foundation & Data Integration
Establish core model architecture (Gemma-2U LoRA), integrate TTM textbooks (WRO, NR), and preprocess data for TrP/HRG tasks. Initial validation of data quality and linguistic normalization.
Phase 2: Model Fine-tuning & Optimization
Conduct LoRA fine-tuning for TrP and HRG models. Perform hyperparameter tuning and iterative optimization to balance model performance and computational efficiency for Thai language nuances.
Phase 3: Comprehensive Evaluation & Refinement
Evaluate models using precision, recall, F1, BERTScore, and BLEU. Compare against baseline and TCM models. Address performance inconsistencies, especially for HRG, and refine based on linguistic challenges.
Phase 4: Clinical Validation & Deployment Preparation
Collaborate with TTM practitioners for clinical validation. Prepare models for practical applications, including API access (Hugging Face) and user interface (GitHub), ensuring real-world relevance and safety.
Phase 5: Future Enhancements & Expansion
Plan for dataset expansion, exploration of larger-scale models, multilingual support, and continuous improvement based on user feedback and new TTM knowledge to further enhance accuracy and generalizability.
Ready to Transform Your Enterprise?
Connect with our AI specialists to explore how AppHerb or a custom AI solution can drive innovation and efficiency within your organization.