Enterprise AI Analysis: VILLAIN at AVerImaTeC: Verifying Image–Text Claims via Multi-Agent Collaboration

VILLAIN is a multimodal fact-checking system that uses prompt-based multi-agent collaboration among vision-language models (VLMs) to verify image-text claims, achieving top performance in the AVerImaTeC shared task.

Unlocking Advanced Multimodal Fact-Checking

VILLAIN demonstrates how multi-agent collaboration with VLMs can verify complex image-text claims, ranking first on the AVerImaTeC shared-task leaderboard and providing a strong reference point for automated multimodal fact-checking.

0.546 Veracity Score (Test Set)
1st Leaderboard Rank
0.890 Q-Eval Score
0.536 Evid-Eval Score

Deep Analysis & Enterprise Applications

Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.

AVerImaTeC Shared Task Focus

Real-world image-text claims
Automated verification using external evidence

Case Study: The Need for Multimodal Fact-Checking

Challenge: The proliferation of misleading multimodal content necessitates robust automated verification systems capable of handling both images and text.

Solution: VILLAIN addresses this by employing a multi-agent system with VLM capabilities to process and verify image-text claims.

Impact: Achieving state-of-the-art performance in real-world scenarios, VILLAIN sets a new standard for accuracy and reliability in multimodal fact-checking.

Enterprise Process Flow: Evidence Retrieval & Enrichment

Knowledge Store K_txt
Knowledge Store K_img
Knowledge Store K_1
Textual/Visual Evidence Retrieval
Evidence Enrichment (URL content filling)

Evidence Retrieval Mechanism Comparison

Feature | VILLAIN | Baseline
Text Embedding Model | mxbai-embed-large-v1 | Generic embeddings
Text Reranking Model | mxbai-rerank-large-v1 | No reranking
Visual Embedding Model | Ops-MM-embedding-v1-7B | No dedicated VLM
Evidence Enrichment | URL content filling via Playwright | No enrichment
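For engineering teams, the retrieval stage described above maps onto a standard embed-then-rerank pattern. The following is a minimal sketch, assuming the mxbai text models are loaded through the sentence-transformers library; the top-k values and the function name are illustrative choices, not details taken from the VILLAIN paper.

```python
# Minimal sketch of the embed-then-rerank text retrieval step, assuming the
# mxbai models are loaded via the sentence-transformers library. Top-k values
# and the function name are illustrative, not taken from the VILLAIN paper.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

embedder = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
reranker = CrossEncoder("mixedbread-ai/mxbai-rerank-large-v1")

def retrieve_text_evidence(claim: str, knowledge_store: list[str],
                           k_embed: int = 50, k_final: int = 10) -> list[str]:
    """Dense retrieval over a claim's knowledge store, then cross-encoder reranking."""
    # 1) Embed the claim and candidate texts; keep the top-k by cosine similarity.
    claim_emb = embedder.encode([claim], convert_to_tensor=True)
    doc_embs = embedder.encode(knowledge_store, convert_to_tensor=True)
    hits = util.semantic_search(claim_emb, doc_embs, top_k=k_embed)[0]
    candidates = [knowledge_store[h["corpus_id"]] for h in hits]

    # 2) Rerank the shortlisted candidates and keep the strongest evidence.
    scores = reranker.predict([(claim, doc) for doc in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return [doc for doc, _ in ranked[:k_final]]
```

Image evidence would follow the same retrieve-and-rank structure, with Ops-MM-embedding-v1-7B supplying the visual embeddings.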

Enterprise Process Flow: Evidence Analysis Agents

Claim
Retrieved Evidence (Textual, Image, Cross-modal)
Text-Text Agent (A_TT)
Image-Text Agent (A_IT)
Cross-Modal Agent (A_CM)
Analysis Outputs (O_TT, O_IT, O_CM)

Impact of Evidence Analysis Agents

+0.040 Evid-Eval score improvement for Gemini-2.5-Flash with agents
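As a concrete illustration of the agent layer, the sketch below treats each analysis agent as a modality-specific prompt sent to the same underlying VLM. The generic `vlm` callable, the `Evidence` dataclass, and the prompt wording are assumptions made for illustration; VILLAIN's actual prompts and model wiring are not reproduced on this page.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# `vlm` is any callable that accepts a text prompt plus a list of image paths/URLs
# and returns the model's text reply (e.g. a thin wrapper around a hosted VLM).
VLM = Callable[[str, Sequence[str]], str]

@dataclass
class Evidence:
    texts: list[str]    # retrieved textual evidence
    images: list[str]   # retrieved image evidence (paths or URLs)

TT_PROMPT = ("Compare the claim against the textual evidence and summarize agreements "
             "and contradictions.\nClaim: {claim}\nEvidence:\n{texts}")
IT_PROMPT = ("Compare the claim text against the attached evidence images and describe "
             "what they show.\nClaim: {claim}")
CM_PROMPT = ("Jointly reason over the claim image, claim text, and all evidence; note any "
             "cross-modal inconsistencies.\nClaim: {claim}\nEvidence:\n{texts}")

def run_analysis_agents(vlm: VLM, claim: str, claim_image: str, ev: Evidence) -> dict[str, str]:
    """Run the three evidence-analysis agents and collect their outputs."""
    texts = "\n".join(f"- {t}" for t in ev.texts)
    o_tt = vlm(TT_PROMPT.format(claim=claim, texts=texts), [])                           # A_TT -> O_TT
    o_it = vlm(IT_PROMPT.format(claim=claim), ev.images)                                 # A_IT -> O_IT
    o_cm = vlm(CM_PROMPT.format(claim=claim, texts=texts), [claim_image, *ev.images])    # A_CM -> O_CM
    return {"O_TT": o_tt, "O_IT": o_it, "O_CM": o_cm}
```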

Enterprise Process Flow: Iterative QA Generation

Claim, Analysis Outputs
QA Generation Agent A_QA (VLM)
Generate 5 QA pairs (iteration i)
Append to existing QA pairs
Loop (4 iterations for 20 total QA pairs)

Number of QA Pairs Generated

20 Question-answer pairs generated per claim (5 per iteration over 4 iterations)
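The iterative QA loop can be expressed in a few lines. The sketch below assumes the same generic `vlm` callable as in the agent sketch above; the prompt text and the line-based parsing are illustrative placeholders rather than the paper's exact implementation.

```python
from typing import Callable, Sequence

VLM = Callable[[str, Sequence[str]], str]

QA_PROMPT = (
    "You are a fact-checking question generator.\n"
    "Claim: {claim}\n"
    "Analyses:\n{analyses}\n"
    "Existing QA pairs:\n{existing}\n"
    "Generate 5 NEW question-answer pairs, grounded in the evidence, that help verify "
    "the claim. Avoid repeating existing questions."
)

def generate_qa_pairs(vlm: VLM, claim: str, analyses: dict[str, str],
                      iterations: int = 4, pairs_per_iter: int = 5) -> list[str]:
    """Iteratively generate QA pairs, conditioning each round on the pairs so far."""
    qa_pairs: list[str] = []
    analysis_text = "\n".join(f"{k}: {v}" for k, v in analyses.items())
    for _ in range(iterations):
        reply = vlm(QA_PROMPT.format(claim=claim, analyses=analysis_text,
                                     existing="\n".join(qa_pairs) or "(none)"), [])
        # Naive parsing: keep non-empty lines as QA pairs (real parsing would be stricter).
        new_pairs = [line.strip() for line in reply.splitlines() if line.strip()]
        qa_pairs.extend(new_pairs[:pairs_per_iter])
    return qa_pairs  # up to iterations * pairs_per_iter = 20 pairs
```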

Enterprise Process Flow: Verdict Prediction

Claim
Generated QA Pairs (Q)
Verdict Prediction Agent (A_V)
Select Top-10 Relevant QA Pairs (Q*)
Predict Verdict (v)
Generate Justification (j)
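A hedged sketch of this final stage follows: select the ten most relevant QA pairs, then ask the VLM for a verdict and a short justification. The label set, prompt wording, and output format are assumptions made for illustration, not the shared task's exact specification.

```python
from typing import Callable, Sequence

VLM = Callable[[str, Sequence[str]], str]

SELECT_PROMPT = (
    "Claim: {claim}\nQA pairs:\n{qa}\n"
    "Return the indices (comma-separated) of the 10 QA pairs most relevant to verifying the claim."
)
VERDICT_PROMPT = (
    "Claim: {claim}\nSelected QA pairs:\n{qa}\n"
    "Predict the verdict (e.g. Supported / Refuted / Not Enough Evidence / Conflicting) "
    "and write a short justification. Format: 'verdict | justification'."
)

def predict_verdict(vlm: VLM, claim: str, claim_image: str, qa_pairs: list[str]) -> tuple[str, str]:
    """Select the top-10 QA pairs, then predict verdict v and justification j."""
    numbered = "\n".join(f"{i}. {qa}" for i, qa in enumerate(qa_pairs))
    idx_reply = vlm(SELECT_PROMPT.format(claim=claim, qa=numbered), [claim_image])
    idx = [int(tok) for tok in idx_reply.replace(",", " ").split() if tok.isdigit()][:10]
    selected = "\n".join(qa_pairs[i] for i in idx if i < len(qa_pairs))
    reply = vlm(VERDICT_PROMPT.format(claim=claim, qa=selected), [claim_image])
    verdict, _, justification = reply.partition("|")
    return verdict.strip(), justification.strip()
```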

Veracity Score on Test Set

0.546 First-place veracity score; VILLAIN also leads on every other evaluated metric

Leaderboard Performance Comparison (Test Set)

Metric | VILLAIN (HUMANE) | ADA-AGGR
Veracity Score | 0.546 (1st) | 0.537
Q-Eval Score | 0.890 (1st) | 0.370
Evid-Eval Score | 0.536 (1st) | 0.463
Justification Score | 0.556 (1st) | 0.433

Case Study: Knowledge Store Filling (Ablation)

Challenge: Many original evidence entries had empty URL2text fields or generic content, limiting the factual information available for verification.

Solution: Implemented an automated URL content extraction pipeline using Playwright to fill missing text fields and filter irrelevant content.

Impact: Consistently improved Evid-Eval scores across all models (e.g., +0.03 for Gemini-2.5-Flash), demonstrating the value of richer, cleaner evidence.
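The enrichment step lends itself to a short Playwright sketch. The version below assumes Playwright's synchronous Python API and substitutes a crude word-count check for the pipeline's actual relevance filtering, which is not documented on this page; the dictionary key casing is likewise illustrative.

```python
from playwright.sync_api import sync_playwright

def fill_url2text(evidence_entries: list[dict], timeout_ms: int = 15000) -> list[dict]:
    """Populate empty 'url2text' fields by rendering the source page and extracting its text."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        for entry in evidence_entries:
            if entry.get("url2text"):          # already has usable text
                continue
            try:
                page.goto(entry["url"], timeout=timeout_ms)
                text = page.inner_text("body")
                # Crude relevance filter: drop near-empty or boilerplate-only pages.
                if len(text.split()) > 30:
                    entry["url2text"] = text
            except Exception:
                continue                       # unreachable or blocked pages stay empty
        browser.close()
    return evidence_entries
```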

Gemini-2.5-Pro Performance

Superior: outperforms other models across most evaluation metrics

Case Study: Effectiveness of Multi-Agent Collaboration

Challenge: Verifying complex multimodal claims requires robust evidence processing, analysis, and reasoning across different modalities.

Solution: VILLAIN's multi-agent pipeline leverages VLM agents for modality-specific and cross-modal analysis, iterative QA generation, and final verdict prediction with justification.

Impact: Experimental results confirm VILLAIN's consistent outperformance, highlighting the effectiveness of multi-agent collaboration and iterative reasoning for multimodal fact-checking.

Calculate Your Potential ROI

Estimate the potential cost savings and efficiency gains your organization could achieve with AI implementation.


Your AI Implementation Roadmap

A phased approach to integrate AI seamlessly into your enterprise operations, ensuring maximum impact with minimal disruption.

Phase 1: Discovery & Strategy

Comprehensive analysis of existing workflows, identification of AI opportunities, and development of a tailored AI strategy.

Phase 2: Pilot & Prototyping

Deployment of AI prototypes in a controlled environment, performance evaluation, and iterative refinement based on feedback.

Phase 3: Integration & Scaling

Seamless integration of AI solutions into core enterprise systems and phased rollout across relevant departments.

Phase 4: Optimization & Monitoring

Continuous monitoring of AI system performance, ongoing optimization, and adaptation to evolving business needs.

Ready to Transform Your Enterprise?

Partner with OwnYourAI to navigate the complexities of AI adoption and unlock unparalleled operational efficiency and innovation.

Ready to Get Started?

Book Your Free Consultation.
