Enterprise AI Analysis: Machine learning and deep learning approaches for fake news detection and related topics in multilingual contexts: a systematic literature review

The proliferation of fake news in low-resource languages poses significant challenges for information integrity. This systematic review evaluates Machine Learning (ML) and Deep Learning (DL) techniques for Fake News Detection (FND) across diverse linguistic contexts. It highlights a critical research gap: low-resource settings remain understudied compared with the extensive body of monolingual English work. Analyzing 85 studies, we examine definitions, datasets, evaluation tools, and both traditional and advanced ML/DL methods. We identify key challenges such as computational cost, the tendency of transformer models to capture biases from their training data, and limited scalability in low-resource or real-time environments. The study closes with a roadmap for future research on mitigating bias, improving efficiency, and broadening model applicability.

Executive Impact: Key Findings & Opportunities

Our analysis highlights critical metrics and areas where AI can significantly enhance enterprise operations, particularly in combating misinformation across diverse linguistic contexts.

85 Studies Analyzed
99.41% Highest Accuracy (Bangla-BERT)
93.31% XLM-R Average Accuracy (Dravidian)
90.1% Best ML Accuracy in Hindi (AdaBoost)

Deep Analysis & Enterprise Applications

Machine Learning Approaches for FND

Traditional ML models such as Logistic Regression (LR), Support Vector Machines (SVM), Decision Trees (DT), Random Forest (RF), k-Nearest Neighbors (kNN), and Multinomial Naive Bayes (MNB) are widely used for FND. These methods typically rely on manually crafted features such as TF-IDF. An SVM with a linear kernel achieved 96.64% accuracy on Bangla data, and AdaBoost reached 90.1% on Hindi. Their performance depends on the quality of feature engineering. For low-resource languages it also depends on the quality of machine translation, which can lose subtle linguistic nuances.
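The classical pipeline described above can be illustrated with a minimal pure-Python sketch. It weights bag-of-words features with TF-IDF and classifies them with Multinomial Naive Bayes. This is not code from any reviewed study. The whitespace tokenizer, the smoothed IDF formula (the same form scikit-learn uses by default), the toy English sentences, and the names `fit_idf`, `tfidf`, and `MultinomialNB` are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase + whitespace split. Low-resource languages usually
    # need language-specific tokenization and normalization instead.
    return text.lower().split()


def fit_idf(docs):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1, learned from training docs."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf(doc, idf):
    """Term counts weighted by IDF; terms never seen in training are dropped."""
    counts = Counter(tokenize(doc))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


class MultinomialNB:
    """Multinomial Naive Bayes over non-negative (here TF-IDF) feature weights."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing

    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for v in vectors for t in v}
        weight = defaultdict(lambda: defaultdict(float))
        for v, y in zip(vectors, labels):
            for t, w in v.items():
                weight[y][t] += w
        n_per_class = Counter(labels)
        self.log_prior = {c: math.log(n_per_class[c] / len(labels)) for c in self.classes}
        self.log_lik = {}
        for c in self.classes:
            denom = sum(weight[c].values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {
                t: math.log((weight[c][t] + self.alpha) / denom) for t in self.vocab
            }
        return self

    def predict(self, vector):
        def score(c):
            return self.log_prior[c] + sum(
                w * self.log_lik[c][t] for t, w in vector.items() if t in self.vocab
            )

        return max(self.classes, key=score)


# Toy, hypothetical training data (not taken from any reviewed dataset).
train = [
    ("shocking miracle cure doctors hate", "fake"),
    ("you won't believe this miracle", "fake"),
    ("ministry publishes annual budget report", "real"),
    ("central bank reports inflation figures", "real"),
]
docs = [text for text, _ in train]
labels = [label for _, label in train]
idf = fit_idf(docs)
model = MultinomialNB().fit([tfidf(d, idf) for d in docs], labels)
```

For example, `model.predict(tfidf("miracle cure revealed", idf))` returns `"fake"`. Swapping `MultinomialNB` for an SVM or logistic regression changes only the classifier; the handcrafted TF-IDF features stay the same, which is why feature quality matters so much in this family of methods.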

Challenges include labor-intensive feature engineering and the effect of machine translation quality on accuracy. For Persian tweets, accuracy dropped by 4% after translation into English.

Deep Learning Approaches for FND

Advanced DL techniques have substantially improved FND performance. These include Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), Convolutional Neural Networks (CNNs), and Transformer models (mBERT, XLM-RoBERTa, ELECTRA). Transformers excel in particular because they learn rich contextual embeddings from large multilingual text corpora. Bangla-BERT achieved 99.41% accuracy, and XLM-R outperformed other models with 98% accuracy on Bengali. Hybrid models that combine CNN and BiLSTM layers also show promise.
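The contextual embeddings mentioned above come from the self-attention layers inside Transformer models such as mBERT and XLM-R. The sketch below shows only the core operation, single-head scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, in pure Python. The toy inputs, identity projection matrices, and helper names are illustrative assumptions. A real model adds multiple heads, learned weights, positional information, and many stacked layers.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Plain-list matrix product of an (n x m) and an (m x p) matrix."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention without masking.

    x: n_tokens x d_model embeddings; w_q, w_k, w_v: d_model x d_k projections.
    Returns (contextual outputs, attention weights).
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(w_q[0]))  # sqrt(d_k)
    scores = matmul(q, transpose(k))  # n_tokens x n_tokens similarity scores
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


# Hypothetical 3-token input with 2-dimensional embeddings and identity projections.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
outputs, weights = self_attention(x, identity, identity, identity)
```

Each output row is a weighted mix of every token's value vector. This is what lets a token's representation depend on its context, and it is also why compute grows quadratically with sequence length.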

Despite their power, DL models face challenges such as high computational costs, potential capture of biases from training data, and limitations in low-resource or real-time applications due to reliance on large datasets and complex architectures.

Key Datasets for Fake News Detection in Diverse Language Settings

The reviewed studies draw on datasets covering a wide range of languages:

  • BanFakeNews / Bangla Fake-Real News: Bangla news from popular portals, categorized into misleading/false, clickbait, and satire.
  • Hindi Data / HinFakeNews: Compiled from Indian fact-checking websites and news sources.
  • Urdu Data: Manually assembled and verified news articles across business, health, showbiz, sports, and technology.
  • Russian Data: Manually monitored online Russian newspapers, annotated for truthfulness.
  • AraNews / AraCOVID19-MFH: AraNews contains Arabic news articles from various countries; AraCOVID19-MFH is a multi-label dataset of COVID-19-related tweets.
  • Fake.my-COVID19: Malaysian COVID-19-related news (Malay, English, Chinese, Tamil) collected from Twitter.
  • MM-COVID: Multilingual, multimodal repository for COVID-19 disinformation.
  • TALLIP: Crowdsourced and translated fake news dataset in English, Hindi, Swahili, Indonesian, and Vietnamese.
  • ALB-FAKE-NEWS-CORPUS: Aligned true and fake news articles in Albanian.
  • CLIPS Stylometry Investigation (CSI) corpus: Dutch essays and reviews for deception detection.
  • FOOD23: Chinese food safety information.
  • Dravidian_Fake: Telugu, Kannada, Tamil, Malayalam news articles.
  • ETH_FAKE: Amharic news articles, the first dataset for FND in this language.
  • Fake News Filipino: Benchmark dataset for detecting fake news in Filipino.

Standard Evaluation Metrics in FND

Evaluating FND systems in Diverse Language Settings (DLS) is crucial for understanding performance and making improvements. Commonly used metrics include:

  • Accuracy: The ratio of correctly classified instances (True Positives + True Negatives) to the total number of instances. It measures the overall correctness of the model.
  • Precision: The proportion of correctly predicted positive cases (True Positives) among all instances predicted as positive (True Positives + False Positives). High precision indicates a low rate of false positives.
  • Recall (Sensitivity): The proportion of actual positive cases that are correctly identified (True Positives) among all actual positive cases (True Positives + False Negatives). High recall indicates a low rate of false negatives.
  • F1 Score: The harmonic mean of Precision and Recall, providing a balanced measure of a model's performance, especially useful with imbalanced datasets.
  • AUC-ROC: The area under the Receiver Operating Characteristic curve, which plots the true positive rate against the false positive rate across classification thresholds. It measures how well the model separates the classes: 1.0 indicates perfect separation and 0.5 indicates chance level.
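The definitions above translate directly into code. This minimal sketch computes the four threshold-based metrics from binary labels. It estimates AUC-ROC as the probability that a randomly chosen positive (e.g., fake) item scores higher than a randomly chosen negative one, with ties counted as one half. The function names, the convention that `1` marks the positive class, the 0.5 decision threshold, and the toy labels and scores are assumptions. The pairwise AUC loop is quadratic and meant for illustration, not for large evaluation sets.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary task; `positive` marks e.g. 'fake'."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0  # guard: no positive predictions
    recall = tp / (tp + fn) if tp + fn else 0.0  # guard: no positive labels
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores, positive=1):
    """Pairwise (Mann-Whitney) AUC: P(score of a positive > score of a negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example: 1 = fake, 0 = real (hypothetical labels and model scores).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

In this toy case, accuracy, precision, recall, and F1 are all 0.5 at the 0.5 threshold, while AUC is 0.6875. AUC summarizes ranking quality over every threshold rather than at a single cut-off.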

99.41% Highest Accuracy Achieved by Bangla-BERT in FND

Bangla-BERT outperformed the other evaluated models on every metric. It reached 99.41% accuracy on binary text classification, setting a new state of the art (SOTA) for fake news detection in Bangla.

Systematic Review Process Flow

  • Select databases (e.g., IEEE Xplore, ACM DL)
  • Set the timeframe (2014 to May 2024)
  • Define keywords (e.g., "fake news detection", "multilingual")
  • Run the search across all databases
  • Collect initial results (115 papers)
  • Apply inclusion/exclusion criteria and remove duplicates
  • Final selection: 83 pertinent studies for qualitative assessment
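The mechanical steps of this flow (timeframe filter, keyword match, duplicate removal) can be sketched as follows; the qualitative inclusion/exclusion assessment remains a manual step. This is a hypothetical illustration, not the authors' tooling. The `Record` fields, the January 1, 2014 and May 31, 2024 date bounds, the any-keyword matching rule, the title-based deduplication, and the sample records are all assumptions.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    title: str
    source: str  # e.g. "IEEE Xplore", "ACM DL"
    published: date
    abstract: str


# Assumptions: illustrative keyword subset and exact day-level bounds.
KEYWORDS = ("fake news detection", "multilingual")
START, END = date(2014, 1, 1), date(2024, 5, 31)


def normalize_title(title):
    """Lowercase and collapse punctuation so the same paper matches across databases."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def matches_keywords(rec):
    text = f"{rec.title} {rec.abstract}".lower()
    return any(k in text for k in KEYWORDS)  # assumed rule: any keyword matches


def screen(records):
    """Drop duplicates, then keep in-range, on-topic records for manual review."""
    seen, kept = set(), []
    for rec in records:
        key = normalize_title(rec.title)
        if key in seen:
            continue  # same paper already retrieved from another database
        seen.add(key)
        if START <= rec.published <= END and matches_keywords(rec):
            kept.append(rec)
    return kept


# Hypothetical search results from two databases.
results = [
    Record("Multilingual Fake News Detection with XLM-R", "IEEE Xplore",
           date(2021, 3, 1), "A study of cross-lingual transformers."),
    Record("Multilingual fake news detection with XLM-R.", "ACM DL",
           date(2021, 3, 1), "A study of cross-lingual transformers."),
    Record("Rumour detection on Twitter", "IEEE Xplore",
           date(2012, 6, 1), "Fake news detection in English."),
    Record("Image compression survey", "ACM DL",
           date(2020, 1, 1), "Lossy codecs compared."),
    Record("Bangla fake news detection", "ACM DL",
           date(2023, 8, 1), "Transformer models for fake news detection."),
]
shortlist = screen(results)
```

Here `shortlist` keeps two records. The second XLM-R entry is a cross-database duplicate, the 2012 paper falls outside the timeframe, and the compression survey matches no keyword.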

Comparison of ML vs. DL Effectiveness in FND across DLS

Feature: Machine Learning (ML) vs. Deep Learning (DL)

Approach
  • ML: Relies on handcrafted features (TF-IDF, bag-of-words, linguistic features)
  • DL: Learns features automatically from raw data (embeddings, sequential patterns)
Complexity
  • ML: Simpler, with lower computational cost
  • DL: More complex, with higher computational cost (especially pretrained language models, PLMs)
Data Requirement
  • ML: Effective with smaller datasets
  • DL: Requires large datasets for optimal performance
Language Adaptability
  • ML: Can struggle with linguistic nuances in diverse languages (translation issues)
  • DL: Multilingual PLMs (mBERT, XLM-R) are designed for cross-lingual understanding
Performance (Examples)
  • ML: SVM (Bangla) 96.64% accuracy; AdaBoost (Hindi) 90.1%
  • DL: Bangla-BERT 99.41% accuracy; XLM-R (Dravidian) 93.31%
Key Challenge
  • ML: Intensive feature engineering; dependence on translation quality
  • DL: High computational demands; potential to capture biases from training data

Challenges in FND for Diverse Language Settings

Detecting fake news in diverse language settings (DLS) poses distinctive challenges:

  • Definitional ambiguity: there is no agreed definition of fake news, which introduces bias and inaccuracy into data labeling.
  • Narrow scope: most studies analyze text alone, overlooking the psychological aspects of misinformation and the intentions behind it, and many reduce FND to binary classification.
  • Interpretability: DL models remain difficult to interpret.
  • Scarce resources: reliable word embeddings are scarce for low-resource languages, so many studies rely on machine translation, which can lose subtle nuances. Annotated datasets for these languages are also limited, which hinders model training.
  • Domain adaptation: models trained on limited, general-purpose data often perform poorly on low-resource languages with distinctive linguistic features.

Addressing these challenges requires flexible models and extensive data annotation.

Your AI Implementation Roadmap

A strategic phased approach to integrate advanced FND solutions into your enterprise.

Phase 1: Discovery & Strategy Alignment

Duration: 2-4 Weeks

Conduct a comprehensive assessment of existing data infrastructure, current FND processes, and specific multilingual challenges. Define clear project goals, success metrics, and a tailored AI strategy in alignment with your business objectives.

Phase 2: Data Preparation & Model Selection

Duration: 6-10 Weeks

Curate, clean, and annotate multilingual datasets, with emphasis on low-resource languages. Select appropriate ML/DL models (e.g., fine-tuning mBERT or XLM-R) and establish a robust feature engineering pipeline. Develop data augmentation strategies if necessary.

Phase 3: Model Training & Validation

Duration: 8-14 Weeks

Train and optimize selected AI models on prepared datasets. Implement cross-validation and rigorous evaluation using metrics like F1-score and AUC-ROC, with a focus on cross-lingual and domain adaptability. Address potential biases and ensure model interpretability.
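Cross-validation for multilingual FND should keep each fold representative of both languages and labels, so that no fold ends up without, for example, any real Hindi articles. The following is a minimal sketch that assumes stratification on (language, label) pairs. The function name `stratified_k_fold`, the round-robin assignment, the fixed seed, and the toy labels are illustrative choices, not a prescribed procedure.

```python
import random
from collections import defaultdict


def stratified_k_fold(strata, k=5, seed=0):
    """Yield (train_indices, test_indices) for k folds.

    `strata` holds one hashable key per example, here a (language, label) pair,
    so every fold keeps roughly the overall mix of languages and labels.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    groups = defaultdict(list)
    for i, key in enumerate(strata):
        groups[key].append(i)
    rng = random.Random(seed)  # fixed seed for reproducible folds
    folds = [[] for _ in range(k)]
    offset = 0
    for key in sorted(groups, key=repr):  # deterministic group order
        members = groups[key]
        rng.shuffle(members)
        for j, i in enumerate(members):
            # Round-robin continues across groups so fold sizes stay balanced.
            folds[(offset + j) % k].append(i)
        offset += len(members)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Hypothetical 20-example corpus: (language, label) per example.
strata = (
    [("bn", "fake")] * 6 + [("bn", "real")] * 4
    + [("hi", "fake")] * 5 + [("hi", "real")] * 5
)
splits = list(stratified_k_fold(strata, k=5))
```

Each of the five test folds holds four examples. Metrics such as F1 and AUC-ROC can then be averaged across folds and also reported per language to expose cross-lingual gaps.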

Phase 4: Deployment & Continuous Improvement

Duration: Ongoing

Integrate the FND system into your existing platforms, ensuring scalability and real-time performance. Implement continuous monitoring for model drift and new misinformation patterns. Establish a feedback loop for regular model retraining and adaptation to evolving linguistic and cultural contexts.
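One common way to monitor drift is the Population Stability Index (PSI), used here as an illustration rather than taken from the review. PSI compares the distribution of model scores in a recent window against a reference window, summing (q_i - p_i) * ln(q_i / p_i) over score bins. The bin edges, the small epsilon that avoids log(0), the function names, and the toy scores are assumptions. The often-quoted alert levels of about 0.1 (moderate shift) and 0.25 (significant shift) are industry rules of thumb, not values from the review.

```python
import math


def histogram(values, edges):
    """Count values per bin [edges[i], edges[i+1]); the last bin includes its upper edge.

    Values outside the edges are ignored in this sketch.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts


def psi(reference, current, edges, eps=1e-6):
    """Population Stability Index between two samples of model scores."""
    ref_counts = histogram(reference, edges)
    cur_counts = histogram(current, edges)
    ref_total, cur_total = sum(ref_counts), sum(cur_counts)
    value = 0.0
    for r, c in zip(ref_counts, cur_counts):
        p = max(r / ref_total, eps)  # eps avoids log(0) for empty bins
        q = max(c / cur_total, eps)
        value += (q - p) * math.log(q / p)
    return value


# Hypothetical "probability of fake" scores from a deployed classifier.
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
reference = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
current = [0.8, 0.85, 0.9, 0.95, 0.7, 0.9, 0.6, 0.99]
drift = psi(reference, current, edges)
```

A sharp rise in PSI, as in this toy window where scores pile up near 1.0, is a signal to inspect recent content and possibly retrain. It does not by itself show whether accuracy has dropped, so labeled spot checks are still needed.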

Ready to Transform Your Enterprise with AI-Powered FND?

Our experts are ready to help you navigate the complexities of multilingual fake news detection and implement a robust, scalable solution. Schedule a personalized consultation to discuss your specific needs and challenges.
