Skip to main content
Enterprise AI Analysis: ICON: Indirect Prompt Injection Defense for Agents based on Inference-Time Correction

AI ANALYSIS REPORT

ICON: Indirect Prompt Injection Defense for Agents based on Inference-Time Correction

This paper introduces ICON, a novel framework for defending against indirect prompt injection (IPI) attacks in LLM agents. ICON combines a latent space trace prober for detection and a mitigating rectifier for surgical attention steering, achieving high security (0.4% ASR) and significant utility preservation (>50% gain) with minimal training cost (<2 mins). It demonstrates robust generalization and multimodal applicability.

Author: Che Wang et al. | Published: February 25, 2026

Executive Impact & Key Findings

ICON redefines AI agent security, delivering robust protection against sophisticated attacks while significantly enhancing operational efficiency and preserving critical task continuity.

0 Average ASR
0 Utility Gain
0 Training Time

Deep Analysis & Enterprise Applications

Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.

Defense Mechanism
Performance Analysis
Generalization & Efficiency

Novel Probing-to-Mitigation Framework

ICON introduces a unique two-step framework: the Latent Space Trace Prober (LSTP) detects IPI attacks by identifying 'attention collapse' patterns in the LLM's latent space, and the Mitigating Rectifier (MR) then performs surgical attention steering to neutralize threats without disrupting task continuity. This moves beyond traditional binary refusal mechanisms.

ICON's Operational Flow

User-Agent Interaction
Latent Space Trace Prober (Detection)
Mitigating Rectifier (Correction)
Fixed Output (Corrected)

Achieved Attack Success Rate (ASR)

0.4% Competitive Security

ICON achieves a competitive 0.4% ASR, matching commercial-grade detectors while significantly improving utility, demonstrating robust security.

Task Utility Preservation

>50% Functional Continuity

The framework yields over 50% task utility gain, outperforming existing defenses that often sacrifice utility for security by prematurely terminating workflows.

Functionality Template Tool-Filter Fine-tuning ICON
Security X X X
Utility X X X
Efficiency X X X
ICON achieves a superior balance between security, efficiency, and utility compared to existing methods, as summarized in the paper's table.

Robust OOD Generalization & Multimodal Support

ICON demonstrates robust Out-of-Distribution (OOD) generalization, effectively extending to multimodal agents. Its training requires only hundreds of samples and takes less than two minutes to converge, establishing a superior balance between security and efficiency unmatched by large-scale fine-tuning methods like Qwen3Guard.

Inference-Time Correction Principle

A key advantage of ICON is its inference-time correction capability. Unlike methods that require extensive retraining, ICON's plug-and-play design allows for agile deployment and adaptation in dynamic environments without compromising the agent's functional continuity. This ensures real-time security without impacting operational flow.

Calculate Your Potential AI ROI

Estimate the transformative impact ICON could have on your enterprise operations. Input your company's data for a personalized projection.

Estimated Annual Savings $0
Annual Hours Reclaimed 0

Your Implementation Roadmap

A typical journey to integrate ICON into your agentic systems, tailored for rapid deployment and maximum impact.

Phase 1: Discovery & Strategy

Initial consultation to understand your existing LLM agent architectures and identify key IPI vulnerabilities. Define custom security policies and integration points for ICON.

Phase 2: ICON Integration & Calibration

Seamlessly integrate ICON's lightweight prober and rectifier modules. Rapid calibration using a small, synthesized dataset (minutes, not hours) to optimize for your specific agentic workflows and threat landscape.

Phase 3: Testing & Validation

Comprehensive testing against adaptive IPI attack benchmarks, including OOD scenarios and multimodal agents, to ensure robust security and optimal task utility. Fine-tune steering parameters for peak performance.

Phase 4: Deployment & Monitoring

Go-live with enhanced IPI defense. Continuous monitoring of agent performance and security posture, with ongoing support and adaptive updates to counter emerging threats.

Ready to Secure Your AI Agents?

Don't let indirect prompt injection compromise your autonomous systems. Schedule a personalized strategy session with our experts to fortify your enterprise AI.

Ready to Get Started?

Book Your Free Consultation.

Let's Discuss Your AI Strategy!

Lets Discuss Your Needs


AI Consultation Booking