Enterprise AI Analysis
Nwāchā Munā: A Devanagari Speech Corpus and Proximal Transfer Benchmark for Nepal Bhasha ASR
This paper introduces Nwāchā Munā, a 5.39-hour Devanagari speech corpus for Nepal Bhasha (Newari), and establishes the first script-preserving ASR benchmark. It demonstrates that proximal cross-lingual transfer from Nepali can match multilingual pretraining (Whisper-Small) in an ultra-low-resource setting, significantly reducing Character Error Rate (CER) from 52.54% to 17.59% with data augmentation, while using fewer parameters. This computationally efficient approach is crucial for digitally enabling endangered languages.
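The CER figures above are the standard character-level edit-distance metric: Levenshtein distance between hypothesis and reference, divided by reference length. The paper's exact normalization is not reproduced here; this is a minimal stand-alone sketch of how such a score is typically computed.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance via dynamic programming (two-row variant)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
    # after each row, carry it forward
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edits per reference character."""
    return edit_distance(ref, hyp) / max(len(ref), 1)
```

A CER of 17.59% thus means roughly one character-level edit for every six reference characters.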
Key Performance Indicators
This research highlights significant advancements in ASR for low-resource languages, demonstrating tangible improvements.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
The Nwāchā Munā corpus was meticulously curated from digital and print media, including Wikipedia, the OSCAR dataset, regional newspapers, and primary-school textbooks, totaling 5.39 hours of transcribed speech. A dual-modal collection strategy combined original field recordings with web-sourced audio, all normalized to 16 kHz mono-channel WAV. Data augmentation (speed perturbation, volume randomization, and noise injection) proved critical, reducing NepConformer's CER from 52.54% to 17.59% and matching Whisper-Small's performance with fewer parameters. Pseudo-labeling was also attempted but degraded performance owing to the domain shift of broadcast data, underscoring the importance of domain alignment.
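The three augmentations named above can be applied directly to a waveform array. The ranges and SNR below are illustrative defaults, not the paper's settings, and the speed change is approximated with simple linear-interpolation resampling.

```python
import numpy as np

def augment(wav: np.ndarray, rng: np.random.Generator,
            speed_range=(0.9, 1.1), gain_db_range=(-6.0, 6.0),
            noise_snr_db=20.0) -> np.ndarray:
    """Speed perturbation, volume randomization, and noise injection.
    All ranges are illustrative; the paper does not specify exact values."""
    # Speed perturbation: stretch/compress via linear interpolation.
    speed = rng.uniform(*speed_range)
    n_out = max(int(len(wav) / speed), 1)
    idx = np.linspace(0, len(wav) - 1, n_out)
    out = np.interp(idx, np.arange(len(wav)), wav)
    # Volume randomization: random gain drawn in decibels.
    out = out * 10.0 ** (rng.uniform(*gain_db_range) / 20.0)
    # Noise injection at a fixed signal-to-noise ratio.
    sig_power = np.mean(out ** 2) + 1e-12
    noise_power = sig_power / (10.0 ** (noise_snr_db / 10.0))
    out = out + rng.normal(0.0, np.sqrt(noise_power), size=out.shape)
    return out.astype(np.float32)
```

In practice each training utterance would pass through this once per epoch with a fresh random draw, multiplying the effective acoustic diversity of the 5.39-hour corpus.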
Newari ASR System Development Flow
The study used a comparative experimental framework to evaluate cross-lingual transfer strategies. A zero-shot NepConformer baseline yielded 52.54% CER, underscoring the need for explicit fine-tuning. Supervised fine-tuning of NepConformer reduced CER to 18.72%, comparable to Whisper-Small (18.76%), demonstrating that the acoustic and orthographic proximity between Nepali and Newari enables effective encoder reuse. Decoder-only fine-tuning yielded nearly identical results (18.77% CER), indicating that the pre-trained Nepali encoder features generalize well to Newari. Shallow fusion with an external KenLM n-gram language model further improved lexical regularity.
| Strategy | CER (%) |
|---|---|
| NepConformer (zero-shot) | 52.54 |
| NepConformer (fine-tuned, base data) | 18.72 |
| Whisper-Small (fine-tuned, base data) | 18.76 |
| NepConformer (fine-tuned, augmented data) | 17.59 |
| Whisper-Small (fine-tuned, augmented data) | 17.88 |
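The shallow fusion mentioned above combines the acoustic model's score with a weighted language-model score when ranking hypotheses. A minimal rescoring sketch, in which the hypothesis texts, scores, and fusion weight `lam` are all hypothetical toy values:

```python
import math

def shallow_fusion_rescore(hypotheses, lm_logprob, lam=0.3):
    """Pick the hypothesis maximizing AM log-prob + lam * LM log-prob.
    `hypotheses` is a list of (text, am_logprob) pairs; `lm_logprob`
    maps a text to a language-model log-probability."""
    return max(hypotheses, key=lambda h: h[1] + lam * lm_logprob(h[0]))[0]

# Toy example: the LM pulls decoding toward the lexically regular form,
# even though the acoustic model slightly prefers the other hypothesis.
lm_scores = {"nwacha muna": math.log(0.6), "nwacha munaa": math.log(0.05)}
best = shallow_fusion_rescore(
    [("nwacha muna", -12.1), ("nwacha munaa", -11.9)],
    lambda t: lm_scores.get(t, math.log(1e-6)),
)
```

In a real decoder the fusion happens per token inside beam search rather than over whole sentences, but the scoring rule is the same.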
Proximal Transfer Success Story
The study demonstrated that proximal transfer from Nepali to Newari can yield performance comparable to large multilingual models like Whisper-Small. This is attributed to the acoustic and orthographic proximity between the two languages, enabling efficient reuse of pre-trained encoder features. This approach offers a computationally efficient alternative for developing ASR systems for ultra-low-resource South Asian languages.
Calculate Your Potential AI ROI
Estimate the transformative impact of AI on your enterprise by understanding potential cost savings and efficiency gains.
Your AI Implementation Roadmap
Our structured approach ensures a seamless and successful integration of AI, tailored to your enterprise needs.
Phase 1: Data Curation & Augmentation
Gather and meticulously transcribe speech data from the target endangered language. Apply domain-aligned data augmentation to expand the dataset effectively while avoiding domain-shift issues.
Phase 2: Proximal Model Adaptation
Fine-tune a pre-trained ASR model from a proximal, higher-resource language. Focus on encoder reuse and decoder adaptation to leverage existing acoustic representations while adapting to the target language's linguistic patterns.
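Decoder-only adaptation amounts to freezing the donor encoder and updating only the remaining parameters. A framework-agnostic sketch of that selection step, using hypothetical parameter names (in PyTorch-style frameworks this corresponds to setting `requires_grad=False` on the frozen tensors and passing only the rest to the optimizer):

```python
def trainable_parameters(named_params, freeze_prefixes=("encoder.",)):
    """Return only the parameters to update, freezing anything whose
    name starts with a frozen prefix (e.g. the pre-trained encoder)."""
    return {name: p for name, p in named_params.items()
            if not name.startswith(freeze_prefixes)}

# Hypothetical parameter names for illustration only.
params = {"encoder.layer0.weight": 0, "encoder.layer1.weight": 0,
          "decoder.embed.weight": 0, "decoder.out.weight": 0}
to_update = trainable_parameters(params)
```

Freezing the encoder shrinks the optimizer state and the risk of overfitting on a few hours of target-language speech, which is why it performed on par with full fine-tuning in the study.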
Phase 3: Language Model Integration & Refinement
Integrate external n-gram language models via shallow fusion to enhance lexical regularity. Conduct iterative evaluation and error analysis to refine the model for optimal transcription accuracy.
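The external n-gram model in this phase is trained purely on target-language text. A toy bigram model with add-one smoothing illustrates the idea; KenLM itself uses modified Kneser-Ney smoothing and much larger orders, so this is only a minimal stand-in with made-up training sentences.

```python
from collections import Counter
import math

def train_bigram_lm(sentences):
    """Count-based bigram LM with add-one smoothing.
    Returns a function scoring a sentence's log-probability."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(toks[:-1])              # history tokens
        bigrams.update(zip(toks, toks[1:]))     # adjacent pairs
    vocab = len(set(unigrams) | {"</s>"})
    def logprob(sentence):
        toks = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
                   for a, b in zip(toks, toks[1:]))
    return logprob

# Illustrative two-sentence "corpus"; real training text would come
# from the Devanagari sources described earlier.
lm = train_bigram_lm(["nwacha muna", "muna nwacha"])
```

Sentences whose word sequences were seen in training score higher than unseen sequences, which is exactly the lexical-regularity signal shallow fusion feeds back into decoding.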
Phase 4: Community Engagement & Deployment
Collaborate with language communities for feedback and ongoing data collection. Deploy the ASR system as an open-source tool, enabling broader digital access and linguistic preservation efforts.
Ready to Transform Your Enterprise with AI?
Our experts are ready to discuss your unique challenges and design a bespoke AI solution that drives measurable results.