Enterprise AI Analysis: WHEN AGENTS PERSUADE: PROPAGANDA GENERATION AND MITIGATION IN LLMS

AI Ethics & Safety


This research investigates the capability of Large Language Models (LLMs) to generate propagandistic content and evaluates methods for mitigating such behavior. Using domain-specific detection models, the study finds that LLMs readily produce propaganda employing rhetorical techniques such as loaded language, flag-waving, and appeals to fear. Fine-tuning markedly reduces the models' propensity to generate manipulative content, with Odds Ratio Preference Optimization (ORPO) proving the most effective method.

Executive Impact: Key Findings

99% Propaganda Generation Rate (Untuned LLM)
13.4x Technique Reduction (ORPO vs. Untuned)
10% ORPO Effectiveness (Propaganda Rate)

Deep Analysis & Enterprise Applications


99% of GPT-4o & Mistral 3 outputs classified as propaganda when prompted.

LLM Propaganda Mitigation Process

Prompt LLMs to Generate Propaganda
Analyze Outputs with Detection Models
Apply Supervised Fine-Tuning (SFT)
Implement Direct Preference Optimization (DPO)
Utilize Odds Ratio Preference Optimization (ORPO)
Significantly Reduce Propagandistic Content
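The analysis step in the pipeline above can be sketched as follows. This is a toy keyword-based stand-in, purely an assumption for illustration: the study itself uses fine-tuned, domain-specific detection models, and the cue phrases below are invented.

```python
# Toy stand-in for the "Analyze Outputs with Detection Models" step.
# Cue phrases are illustrative assumptions, not from the study.
TECHNIQUE_CUES = {
    "loaded_language": ["outrageous", "disaster", "shameful"],
    "flag_waving": ["our nation", "true patriots"],
    "appeal_to_fear": ["before it's too late", "threat to your family"],
}

def detect_techniques(article: str) -> set[str]:
    """Return the set of techniques whose cue phrases appear in the text."""
    text = article.lower()
    return {name for name, cues in TECHNIQUE_CUES.items()
            if any(cue in text for cue in cues)}

def propaganda_rate(articles: list[str]) -> float:
    """Share of articles flagged with at least one technique."""
    flagged = sum(1 for article in articles if detect_techniques(article))
    return flagged / len(articles)
```

The same interface (per-article technique sets plus an aggregate rate) is what the reported 99% and 10% figures summarize, whatever detector sits behind it.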
Technique | Human Propagandistic Use | LLM Propagandistic Use
Loaded Language | Moderate | High (emotional rhetoric)
Flag-Waving | Moderate | High (patriotic narratives)
Appeal to Fear | Moderate | High (fear-based manipulation)
Name-Calling | Moderate | Varied (GPT-4o similar, Llama/Mistral lower)
Exaggeration/Minimization | Moderate | High (hyperbolic content)

Mitigation Efficacy: ORPO's Impact

Our evaluation highlights ORPO as the most effective fine-tuning method. Compared to un-fine-tuned models, ORPO reduced the average number of propaganda techniques per article by a factor of 13.4. While prompt-level guardrails were easily overridden, baking a 'no propaganda' preference into the model weights through ORPO proved highly robust, yielding only 10% propaganda outputs compared to 99% for untuned models.
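ORPO works by adding an odds-ratio penalty to the standard fine-tuning objective, rewarding the model for assigning higher odds to the preferred (non-propagandistic) response than to the rejected one. A minimal sketch of that penalty term follows; the SFT term and the weighting coefficient are omitted, and the scalar-probability interface is a simplification of per-token sequence likelihoods.

```python
import math

def odds(p: float) -> float:
    """Odds of a response with probability p under the policy."""
    return p / (1.0 - p)

def orpo_penalty(p_chosen: float, p_rejected: float) -> float:
    """Odds-ratio term: -log sigmoid(log(odds(chosen) / odds(rejected))).

    Shrinks toward zero as the model prefers the chosen (non-propagandistic)
    response more strongly over the rejected one.
    """
    log_or = math.log(odds(p_chosen)) - math.log(odds(p_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-log_or)))
```

For example, when the model is indifferent (`p_chosen == p_rejected`) the penalty is log 2, and it decays toward zero as the preference gap widens.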

Calculate Your Potential AI ROI

Estimate the efficiency gains and cost savings by implementing AI solutions tailored to your enterprise needs, considering the mitigation of harmful content generation.
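The estimate behind the calculator reduces to simple arithmetic; the formula and parameter names below are assumptions for illustration, not figures from the research.

```python
# Hypothetical ROI formula: hours reclaimed and savings from AI adoption.
# Parameter names and defaults are assumptions, not from the study.
def ai_roi(hours_saved_per_week: float, hourly_cost: float,
           weeks_per_year: int = 48) -> tuple[float, float]:
    """Return (hours reclaimed annually, annual savings)."""
    hours = hours_saved_per_week * weeks_per_year
    return hours, hours * hourly_cost
```

For instance, reclaiming 10 hours a week at a fully loaded cost of 50 per hour yields 480 hours and 24,000 in savings per year under these assumptions.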


Your AI Implementation Roadmap

A strategic phased approach to integrate AI responsibly, minimizing risks of manipulative content while maximizing operational benefits.

Phase 1: Initial Assessment & Model Selection

Evaluate current LLM usage and identify areas susceptible to manipulative content. Select appropriate base models for fine-tuning.

Phase 2: Data Curation & Annotation

Gather and annotate domain-specific datasets for propaganda and rhetorical technique detection, and for preference alignment.

Phase 3: Fine-Tuning & Mitigation Strategy

Apply SFT, DPO, or ORPO to reduce propagandistic outputs. Implement robust guardrails.

Phase 4: Validation & Deployment

Rigorously test mitigated models using human and automated evaluations. Deploy models with continuous monitoring.

Ready to Own Your AI?

Schedule a free consultation to discuss how our enterprise AI solutions can be tailored to your organization, ensuring ethical and powerful deployment.
