Enterprise AI Analysis: Sentiment Analysis and Emotion Classification using Machine Learning Techniques for Nagamese Language – A Low-resource Language

AI Research Analysis

Sentiment Analysis and Emotion Classification for Nagamese Language

This pioneering research explores the application of machine learning techniques for sentiment analysis and emotion classification in Nagamese, a low-resource creole language. It introduces the first sentiment polarity lexicon and an annotated corpus for Nagamese, paving the way for advanced NLP applications in the region.

Executive Impact & Key Findings

Unlock the strategic advantages of NLP for low-resource languages. This study demonstrates the feasibility and accuracy of sentiment and emotion detection, opening new avenues for localized AI solutions.

71% Sentiment Polarity Accuracy (SVM)
67% Emotion Classification Accuracy (SVM)
1,195 Nagamese Words in Lexicon
594 Annotated Sentences

Deep Analysis & Enterprise Applications


Pioneering Nagamese NLP

The Nagamese language, also known as Naga Pidgin, serves as a vital creole language for communication in Nagaland, North-East India. Despite its widespread use in media and government, it is a low-resource language, lacking the linguistic tools common for languages like English or Hindi. This study marks the first attempt at sentiment analysis and emotion classification for Nagamese, aiming to detect polarity (positive, negative, neutral) and basic emotions in textual content.

The research builds a sentiment polarity lexicon of 1,195 Nagamese words, using these along with other features for supervised machine learning techniques such as Naïve Bayes and Support Vector Machines.

Understanding Nagamese Language

Nagamese is an Assamese-lexified creole, spoken widely across Nagaland. It features 28 phonemes (6 vowels, 22 consonants) and exhibits disyllabic, trisyllabic, and tetrasyllabic word structures. Grammatically, it is a Subject-Object-Verb (SOV) language, similar to other Indian languages like Hindi and Assamese.

Key resources for Nagamese include a Nagamese-English-Assamese dictionary by Bhim Kanta Boruah (with POS information), online news articles from nagamesekhobor.com, the Nagamese Bible, and various books/hymns. Social media content also serves as a source for language data.

Context from Assamese NLP Research

Given Nagamese's Assamese-lexified nature, this section highlights related works in Assamese sentiment analysis, which inform the approach for a low-resource language.

  • Das et al. (2021) developed a sentiment polarity model for Assamese news using lexical features and ML classifiers.
  • Dev et al. (2021) adapted "Vader" for Assamese texts, leveraging "Bengali-Vader."
  • Recent works (Das et al. 2022, 2023, 2024; Dev et al. 2023a, 2023b) have explored multimodal sentiment analysis, deep neural networks (LSTM-CNN), and supervised ML on translated and user-generated Assamese content, demonstrating various feature engineering and model approaches.

Building the Nagamese Corpus

A crucial step for any learning model is the creation of a training dataset. For this study, an annotated corpus of 594 Nagamese sentences was manually labeled for sentiment polarity (positive, negative, neutral) and basic emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, trust).

The unlabelled dataset comprised approximately 25k tokens and 1,195 unique words, collected from online Nagamese news articles (nagamesekhobor.com) and Bible passages (bible.com). This mixed corpus spans content from current affairs to religious texts, giving a varied sample of the language.

AI Methodology & Feature Engineering

The core methodology involves building a sentiment dictionary of 1,195 unique Nagamese words, categorized into positive (162), negative (162), and neutral (871) sentiments. This dictionary is instrumental in generating features for classification.

Enterprise Process Flow: Nagamese Sentiment Analysis

Collect Nagamese Text Data
Create Annotated Corpus (594 Sentences)
Build Sentiment Polarity Lexicon (1,195 Words)
Generate Lexical & Other Features
Train ML Classifiers (Naïve Bayes, SVM)
Predict Sentiment Polarity & Emotion

Features generated include counts of positive, negative, and neutral words, intensity words, sentence length, and occurrences of specific Nagamese words ("bisi", "olop"), emoticons (😊, ☹️), exclamation marks, and question marks. A refined set of 9 best features was ultimately used for prediction.
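The feature types listed above can be sketched as a small extraction function. This is a minimal illustration, not the study's implementation: the lexicon entries (`w_pos`, `w_neg`, `w_neu`) and the intensity word `w_int` are placeholder tokens rather than real Nagamese lexicon entries, the function and feature names are hypothetical, and the sketch does not attempt to reproduce the refined set of 9 features, which is not enumerated here.

```python
import re

# Placeholder lexicon: stand-in tokens, NOT entries from the study's 1,195-word lexicon.
LEXICON = {"w_pos": "positive", "w_neg": "negative", "w_neu": "neutral"}
# Placeholder intensity word list (assumption for illustration only).
INTENSITY_WORDS = {"w_int"}

HAPPY = "\U0001F60A"  # smiling-face emoticon
SAD = "\u2639"        # frowning-face emoticon (matches with or without a variation selector)


def extract_features(sentence, lexicon=LEXICON, intensity_words=INTENSITY_WORDS):
    """Return a dict of simple lexical and surface features for one sentence."""
    tokens = re.findall(r"\w+", sentence.lower())

    # Count lexicon hits per polarity class.
    polarity_counts = {"positive": 0, "negative": 0, "neutral": 0}
    for tok in tokens:
        label = lexicon.get(tok)
        if label is not None:
            polarity_counts[label] += 1

    return {
        "n_positive": polarity_counts["positive"],
        "n_negative": polarity_counts["negative"],
        "n_neutral": polarity_counts["neutral"],
        "n_intensity": sum(tok in intensity_words for tok in tokens),
        "n_tokens": len(tokens),                  # sentence length in tokens
        "n_bisi": tokens.count("bisi"),           # specific word named in the text
        "n_olop": tokens.count("olop"),           # specific word named in the text
        "n_happy_emoticon": sentence.count(HAPPY),
        "n_sad_emoticon": sentence.count(SAD),
        "n_exclamation": sentence.count("!"),
        "n_question": sentence.count("?"),
    }


features = extract_features("w_pos bisi w_pos w_neg! " + HAPPY)
print(features)
```

Each sentence becomes a fixed-length numeric vector once the dict values are placed in a consistent order, which is the form the classifiers expect.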

Experimental Design & Model Selection

The experiments utilized Naïve Bayes (GaussianNB) and Support Vector Machines (SVC) classifiers, chosen for their established performance in sentiment analysis tasks. The class labels for sentence polarity were positive, negative, and neutral, while emotions included anger, anticipation, disgust, fear, joy, sadness, surprise, and trust.

The annotated dataset of 594 sentences was split into 494 sentences for training and 100 sentences for testing. Implementation was carried out using the scikit-learn library in Python. For SVM, experiments were conducted with linear, poly, and rbf kernels, with the best polarity results reported using the rbf kernel.
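The experimental setup described here can be approximated with a few scikit-learn calls. The sketch below is an assumption-laden illustration: the feature matrix and labels are synthetic random data standing in for the real corpus, the random seed and the use of `train_test_split` are choices made for the example, and all hyperparameters are left at library defaults because the settings used in the study are not stated. The accuracies it prints are therefore meaningless as results.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic stand-in for the 594 annotated sentences with 9 features each.
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(594, 9)).astype(float)
y = rng.choice(["positive", "negative", "neutral"], size=594)

# An integer test_size reserves exactly 100 sentences for testing, leaving 494 for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=100, random_state=0
)

# Gaussian Naive Bayes plus one SVC per kernel mentioned in the text.
models = {"GaussianNB": GaussianNB()}
for kernel in ("linear", "poly", "rbf"):
    models[f"SVC({kernel})"] = SVC(kernel=kernel)

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in scores.items():
    print(f"{name}: accuracy = {acc:.2f}")
```

Comparing all kernels on the same held-out split, as done here, keeps the accuracy figures directly comparable across models.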

Performance Outcomes & Insights

71% Highest Sentiment Polarity Accuracy Achieved (SVM with RBF Kernel)
67% Highest Emotion Classification Accuracy Achieved (SVM with Poly Kernel)

SVM outperformed Naïve Bayes, reaching 71% accuracy for sentiment polarity and 67% for emotion classification. Per-class metrics (precision, recall, F1-score) varied considerably across sentiment classes and emotions.

SVM Polarity Classification Performance

Sentiment Class   Precision   Recall   F1-score
Negative          0.64        0.54     0.58
Neutral           0.70        0.68     0.69
Positive          0.73        0.78     0.76

Detailed performance metrics for SVM classification of sentiment polarity classes, showing the highest F1-score for positive sentiment.
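As a quick consistency check, the F1-score can be recomputed from precision and recall using the standard harmonic-mean formula, F1 = 2PR / (P + R). The sketch below applies it to the three rows of the table; the function name is hypothetical. Because the reported precision and recall are already rounded to two decimals, the recomputed values can differ from the reported F1-scores by about 0.01.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) taken from the SVM polarity table.
reported = {
    "Negative": (0.64, 0.54, 0.58),
    "Neutral": (0.70, 0.68, 0.69),
    "Positive": (0.73, 0.78, 0.76),
}

recomputed = {
    label: f1_from_precision_recall(p, r) for label, (p, r, _) in reported.items()
}

for label, (_, _, f1_reported) in reported.items():
    diff = abs(recomputed[label] - f1_reported)
    print(f"{label}: recomputed {recomputed[label]:.3f}, reported {f1_reported:.2f}, diff {diff:.3f}")
```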

Emotion classification showed F1-scores of 1.00 for fear and surprise, but 0.00 for anger and trust, indicating challenges in detecting these specific emotions with the current dataset and methods. Confusion matrices highlighted common misclassifications, such as anticipation to joy and trust to anticipation.
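To make the link between confusion-matrix entries and extreme F1-scores concrete, here is a small pure-Python sketch. The six labels below are invented toy data, not the study's predictions, and the helper names are hypothetical. It shows how a class that is never predicted correctly ends up with an F1-score of 0.00, while a class with a single, correctly predicted example scores 1.00.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs."""
    return Counter(zip(y_true, y_pred))


def per_class_f1(y_true, y_pred):
    """Per-class F1 computed as 2*TP / (2*TP + FP + FN)."""
    counts = confusion_counts(y_true, y_pred)
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        tp = counts[(label, label)]
        fp = sum(n for (t, p), n in counts.items() if p == label and t != label)
        fn = sum(n for (t, p), n in counts.items() if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Toy labels (hypothetical): one anticipation sentence is predicted as joy,
# and the only trust sentence is predicted as anticipation.
y_true = ["joy", "joy", "anticipation", "anticipation", "trust", "fear"]
y_pred = ["joy", "joy", "joy", "anticipation", "anticipation", "fear"]

confusion = confusion_counts(y_true, y_pred)
scores = per_class_f1(y_true, y_pred)
print(confusion)
print(scores)
```

With very few test examples per emotion, a single misclassification can move a class from 1.00 to 0.00, which is one reason a larger dataset matters for per-emotion evaluation.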

Conclusion & Future Directions

This work successfully demonstrates the first sentiment polarity and emotion classification for the Nagamese Language. By building an annotated corpus of 594 sentences and a sentiment dictionary of 1,195 Nagamese words, the study provides foundational resources for future NLP efforts.

The Naïve Bayes and SVM classifiers yielded promising results, with SVM achieving 71% accuracy for polarity and 67% for emotion. Future work includes creating a larger dataset and exploring more advanced machine learning and deep learning models to improve accuracy and address the classification challenges observed for specific emotions.


Your AI Implementation Roadmap

A typical enterprise AI journey, from initial strategy to scaled deployment, outlining key phases and expected outcomes.

Discovery & Strategy

Duration: 2-4 Weeks
In-depth analysis of current workflows, identification of AI opportunities, data readiness assessment, and defining success metrics. Development of a tailored AI strategy and roadmap.

Pilot & Proof of Concept (POC)

Duration: 6-12 Weeks
Development and deployment of a small-scale AI pilot project focusing on high-impact areas. Rapid prototyping, model training, and initial performance validation. Demonstrating tangible value.

Integration & Optimization

Duration: 10-20 Weeks
Seamless integration of AI solutions into existing enterprise systems. Continuous model fine-tuning, performance monitoring, and iterative improvements based on real-world feedback. User training and change management.

Scale & Sustain

Duration: Ongoing
Expansion of AI capabilities across the organization, enabling new use cases and enhancing existing ones. Establishing governance frameworks, MLOps, and a culture of continuous AI innovation and value realization.
