Enterprise AI Analysis
Nwāchā Munā: A Devanagari Speech Corpus and Proximal Transfer Benchmark for Nepal Bhasha ASR
This paper introduces Nwāchā Munā, a 5.39-hour Devanagari speech corpus for Nepal Bhasha (Newari), and establishes the first script-preserving ASR benchmark. It demonstrates that proximal cross-lingual transfer from Nepali can match multilingual pretraining (Whisper-Small) in an ultra-low-resource setting, significantly reducing Character Error Rate (CER) from 52.54% to 17.59% with data augmentation, while using fewer parameters. This computationally efficient approach is crucial for digitally enabling endangered languages.
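The CER figures above are the standard character-level edit-distance metric: Levenshtein distance between hypothesis and reference, divided by reference length. The paper's exact normalization is not reproduced here; this is a minimal stand-alone sketch of how such a score is typically computed.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance via dynamic programming (two-row variant)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
    # after each row, carry it forward
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edits per reference character."""
    return edit_distance(ref, hyp) / max(len(ref), 1)
```

A CER of 17.59% thus means roughly one character-level edit for every six reference characters.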
Key Performance Indicators
This research highlights significant advancements in ASR for low-resource languages, demonstrating tangible improvements.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
The Nwāchā Munā corpus was meticulously curated from digital and print media, including Wikipedia, the OSCAR dataset, regional newspapers, and primary-school textbooks, totaling 5.39 hours of transcribed speech. A dual-modal collection strategy combined original field recordings with web-sourced audio, all normalized to 16 kHz mono-channel WAV. Data augmentation (speed perturbation, volume randomization, and noise injection) proved critical, reducing NepConformer's CER from 52.54% to 17.59% and matching Whisper-Small's performance with fewer parameters. Pseudo-labeling was also attempted but degraded performance owing to the domain shift of broadcast data, underscoring the importance of domain alignment.
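The three augmentations named above can be applied directly to a waveform array. The ranges and SNR below are illustrative defaults, not the paper's settings, and the speed change is approximated with simple linear-interpolation resampling.

```python
import numpy as np

def augment(wav: np.ndarray, rng: np.random.Generator,
            speed_range=(0.9, 1.1), gain_db_range=(-6.0, 6.0),
            noise_snr_db=20.0) -> np.ndarray:
    """Speed perturbation, volume randomization, and noise injection.
    All ranges are illustrative; the paper does not specify exact values."""
    # Speed perturbation: stretch/compress via linear interpolation.
    speed = rng.uniform(*speed_range)
    n_out = max(int(len(wav) / speed), 1)
    idx = np.linspace(0, len(wav) - 1, n_out)
    out = np.interp(idx, np.arange(len(wav)), wav)
    # Volume randomization: random gain drawn in decibels.
    out = out * 10.0 ** (rng.uniform(*gain_db_range) / 20.0)
    # Noise injection at a fixed signal-to-noise ratio.
    sig_power = np.mean(out ** 2) + 1e-12
    noise_power = sig_power / (10.0 ** (noise_snr_db / 10.0))
    out = out + rng.normal(0.0, np.sqrt(noise_power), size=out.shape)
    return out.astype(np.float32)
```

In practice each training utterance would pass through this once per epoch with a fresh random draw, multiplying the effective acoustic diversity of the 5.39-hour corpus.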
Newari ASR System Development Flow
The study used a comparative experimental framework to evaluate cross-lingual transfer strategies. A zero-shot NepConformer baseline yielded 52.54% CER, underscoring the need for explicit fine-tuning. Supervised fine-tuning of NepConformer reduced CER to 18.72%, comparable to Whisper-Small (18.76%), demonstrating that the acoustic and orthographic proximity between Nepali and Newari enables effective encoder reuse. Decoder-only fine-tuning yielded nearly identical results (18.77% CER), indicating that the pre-trained Nepali encoder features generalize well to Newari. Shallow fusion with an external KenLM n-gram language model further improved lexical regularity.
| Strategy | CER (%) |
|---|---|
| NepConformer (zero-shot) | 52.54 |
| NepConformer (fine-tuned, base data) | 18.72 |
| Whisper-Small (fine-tuned, base data) | 18.76 |
| NepConformer (fine-tuned, augmented data) | 17.59 |
| Whisper-Small (fine-tuned, augmented data) | 17.88 |
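The shallow fusion mentioned above combines the acoustic model's score with a weighted language-model score when ranking hypotheses. A minimal rescoring sketch, in which the hypothesis texts, scores, and fusion weight `lam` are all hypothetical toy values:

```python
import math

def shallow_fusion_rescore(hypotheses, lm_logprob, lam=0.3):
    """Pick the hypothesis maximizing AM log-prob + lam * LM log-prob.
    `hypotheses` is a list of (text, am_logprob) pairs; `lm_logprob`
    maps a text to a language-model log-probability."""
    return max(hypotheses, key=lambda h: h[1] + lam * lm_logprob(h[0]))[0]

# Toy example: the LM pulls decoding toward the lexically regular form,
# even though the acoustic model slightly prefers the other hypothesis.
lm_scores = {"nwacha muna": math.log(0.6), "nwacha munaa": math.log(0.05)}
best = shallow_fusion_rescore(
    [("nwacha muna", -12.1), ("nwacha munaa", -11.9)],
    lambda t: lm_scores.get(t, math.log(1e-6)),
)
```

In a real decoder the fusion happens per token inside beam search rather than over whole sentences, but the scoring rule is the same.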
Proximal Transfer Success Story
The study demonstrated that proximal transfer from Nepali to Newari can yield performance comparable to large multilingual models like Whisper-Small. This is attributed to the acoustic and orthographic proximity between the two languages, enabling efficient reuse of pre-trained encoder features. This approach offers a computationally efficient alternative for developing ASR systems for ultra-low-resource South Asian languages.
Calculate Your Potential AI ROI
Estimate the transformative impact of AI on your enterprise by understanding potential cost savings and efficiency gains.
Your AI Implementation Roadmap
Our structured approach ensures a seamless and successful integration of AI, tailored to your enterprise needs.
Phase 1: Data Curation & Augmentation
Gather and meticulously transcribe speech data from the target endangered language. Apply domain-aligned data augmentation to expand the dataset effectively while avoiding domain-shift issues.
Phase 2: Proximal Model Adaptation
Fine-tune a pre-trained ASR model from a proximal, higher-resource language. Focus on encoder reuse and decoder adaptation to leverage existing acoustic representations while adapting to the target language's linguistic patterns.
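Decoder-only adaptation amounts to freezing the donor encoder and updating only the remaining parameters. A framework-agnostic sketch of that selection step, using hypothetical parameter names (in PyTorch-style frameworks this corresponds to setting `requires_grad=False` on the frozen tensors and passing only the rest to the optimizer):

```python
def trainable_parameters(named_params, freeze_prefixes=("encoder.",)):
    """Return only the parameters to update, freezing anything whose
    name starts with a frozen prefix (e.g. the pre-trained encoder)."""
    return {name: p for name, p in named_params.items()
            if not name.startswith(freeze_prefixes)}

# Hypothetical parameter names for illustration only.
params = {"encoder.layer0.weight": 0, "encoder.layer1.weight": 0,
          "decoder.embed.weight": 0, "decoder.out.weight": 0}
to_update = trainable_parameters(params)
```

Freezing the encoder shrinks the optimizer state and the risk of overfitting on a few hours of target-language speech, which is why it performed on par with full fine-tuning in the study.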
Phase 3: Language Model Integration & Refinement
Integrate external n-gram language models via shallow fusion to enhance lexical regularity. Conduct iterative evaluation and error analysis to refine the model for optimal transcription accuracy.
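The external n-gram model in this phase is trained purely on target-language text. A toy bigram model with add-one smoothing illustrates the idea; KenLM itself uses modified Kneser-Ney smoothing and much larger orders, so this is only a minimal stand-in with made-up training sentences.

```python
from collections import Counter
import math

def train_bigram_lm(sentences):
    """Count-based bigram LM with add-one smoothing.
    Returns a function scoring a sentence's log-probability."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(toks[:-1])              # history tokens
        bigrams.update(zip(toks, toks[1:]))     # adjacent pairs
    vocab = len(set(unigrams) | {"</s>"})
    def logprob(sentence):
        toks = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
                   for a, b in zip(toks, toks[1:]))
    return logprob

# Illustrative two-sentence "corpus"; real training text would come
# from the Devanagari sources described earlier.
lm = train_bigram_lm(["nwacha muna", "muna nwacha"])
```

Sentences whose word sequences were seen in training score higher than unseen sequences, which is exactly the lexical-regularity signal shallow fusion feeds back into decoding.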
Phase 4: Community Engagement & Deployment
Collaborate with language communities for feedback and ongoing data collection. Deploy the ASR system as an open-source tool, enabling broader digital access and linguistic preservation efforts.
Ready to Transform Your Enterprise with AI?
Our experts are ready to discuss your unique challenges and design a bespoke AI solution that drives measurable results.