AI ANALYSIS REPORT
ICON: Indirect Prompt Injection Defense for Agents based on Inference-Time Correction
This paper introduces ICON, a novel framework for defending against indirect prompt injection (IPI) attacks in LLM agents. ICON combines a latent space trace prober for detection and a mitigating rectifier for surgical attention steering, achieving high security (0.4% ASR) and significant utility preservation (>50% gain) with minimal training cost (<2 mins). It demonstrates robust generalization and multimodal applicability.
Authors: Che Wang et al. | Published: February 25, 2026
Executive Impact & Key Findings
ICON targets indirect prompt injection in AI agents. It aims to block injected instructions while letting the agent finish its original task, and it does so at a training cost measured in minutes.
Deep Analysis & Enterprise Applications
Novel Probing-to-Mitigation Framework
ICON uses a two-step framework. First, the Latent Space Trace Prober (LSTP) detects IPI attacks by identifying 'attention collapse' patterns in the LLM's latent space. Then the Mitigating Rectifier (MR) performs surgical attention steering to neutralize the injected instructions without interrupting the task. Traditional defenses make a binary decision (refuse or terminate the task). ICON instead corrects the agent's behavior and lets the workflow continue.
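The probe-then-steer flow can be illustrated on a single attention row. This is a minimal sketch under stated assumptions, not the paper's method: the LSTP is a trained prober over latent representations, whereas here a hand-written, hypothetical `collapse_score` (attention mass on untrusted tool output, scaled by how concentrated the distribution is) stands in for it. The names `collapse_score`, `rectify`, `guarded_step`, and the `threshold` and `alpha` values are illustrative assumptions.

```python
import math
from typing import List, Tuple


def attention_entropy(weights: List[float]) -> float:
    """Shannon entropy (in nats) of one attention distribution."""
    return -sum(w * math.log(w) for w in weights if w > 0.0)


def collapse_score(weights: List[float], untrusted: List[bool]) -> float:
    """Hypothetical stand-in for a trained prober: attention mass on untrusted
    positions, scaled by how concentrated (low-entropy) the distribution is."""
    mass = sum(w for w, u in zip(weights, untrusted) if u)
    max_entropy = math.log(len(weights))
    if max_entropy == 0.0:  # single position: all attention sits there
        return mass
    concentration = 1.0 - attention_entropy(weights) / max_entropy
    return mass * concentration


def rectify(weights: List[float], untrusted: List[bool], alpha: float = 0.1) -> List[float]:
    """Down-weight untrusted positions by alpha (0 < alpha <= 1), then renormalize.
    Trusted positions keep their relative proportions."""
    scaled = [w * alpha if u else w for w, u in zip(weights, untrusted)]
    total = sum(scaled)
    return [s / total for s in scaled]


def guarded_step(weights: List[float], untrusted: List[bool],
                 threshold: float = 0.3, alpha: float = 0.1) -> Tuple[List[float], bool]:
    """Probe first; steer only when the probe fires, otherwise pass through unchanged."""
    if collapse_score(weights, untrusted) > threshold:
        return rectify(weights, untrusted, alpha), True
    return list(weights), False


# Toy example: position 2 holds text returned by a tool (untrusted).
mask = [False, False, True, False]
steered, flagged = guarded_step([0.05, 0.05, 0.85, 0.05], mask)
print(flagged, [round(w, 3) for w in steered])  # True [0.213, 0.213, 0.362, 0.213]
```

The design point this illustrates is the contrast with binary refusal: the benign path returns the attention row untouched, and the flagged path only rescales the untrusted positions, so the agent can keep generating instead of aborting.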
Achieved Attack Success Rate (ASR)
0.4%: competitive security. ICON achieves a 0.4% ASR, matching commercial-grade detectors while significantly improving utility.
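ASR is conventionally the fraction of attack attempts in which the agent carries out the injected instruction, so lower is better. The snippet below makes the 0.4% figure concrete. The counts are illustrative, not the paper's evaluation sizes, and what counts as a "success" depends on each benchmark's own judging rule; `attack_success_rate` is a hypothetical helper.

```python
from typing import Sequence


def attack_success_rate(outcomes: Sequence[bool]) -> float:
    """ASR = attempts where the injected goal was achieved / all attack attempts."""
    if not outcomes:
        raise ValueError("need at least one attack attempt")
    return sum(1 for hit in outcomes if hit) / len(outcomes)


# Illustrative counts only: 4 successful injections out of 1,000 attempts.
outcomes = [True] * 4 + [False] * 996
print(f"{attack_success_rate(outcomes):.1%}")  # 0.4%
```

Read this way, 0.4% means roughly 4 successful injections per 1,000 attempts.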
Task Utility Preservation
>50%: functional continuity. The framework yields a task utility gain of over 50%, outperforming existing defenses that often sacrifice utility for security by terminating workflows prematurely.
| Property | Template | Tool-Filter | Fine-tuning | ICON |
|---|---|---|---|---|
| Security | X | X | X | ✓ |
| Utility | X | X | X | ✓ |
| Efficiency | X | X | X | ✓ |

ICON achieves a better balance between security, efficiency, and utility than existing defense categories, as summarized in the paper's table (✓ = property satisfied, X = not satisfied).
Robust OOD Generalization & Multimodal Support
ICON demonstrates robust out-of-distribution (OOD) generalization and extends effectively to multimodal agents. Training requires only hundreds of samples and converges in under two minutes, a balance between security and efficiency that large-scale fine-tuned guard models such as Qwen3Guard do not match.
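One reason a detector can train on hundreds of samples in minutes is that a probe over fixed features has very few parameters. The sketch below assumes, without confirmation from the paper, a logistic-regression probe trained by plain gradient descent; the 2-D synthetic "latent features" and the names `train_probe`, `predict`, and `make_synthetic` are hypothetical. Real hidden states would have thousands of dimensions, and the paper's prober architecture may differ.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(xs: List[List[float]], ys: List[int],
                lr: float = 0.1, epochs: int = 200) -> Tuple[List[float], float]:
    """Full-batch gradient descent on the logistic loss; returns (weights, bias)."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                gw[i] += err * x[i]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> bool:
    """True means the probe flags the input as an injection."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5


def make_synthetic(n_per_class: int = 150, seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical features: benign clustered near (-1, -1), injected near (1, 1)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for label, center in ((0, -1.0), (1, 1.0)):
        for _ in range(n_per_class):
            xs.append([rng.gauss(center, 0.5), rng.gauss(center, 0.5)])
            ys.append(label)
    return xs, ys


# 300 labeled samples, in the "hundreds of samples" regime the text describes.
xs, ys = make_synthetic()
w, b = train_probe(xs, ys)
acc = sum(predict(w, b, x) == bool(y) for x, y in zip(xs, ys)) / len(xs)
print(f"training accuracy: {acc:.3f}")
```

On this toy data the fit takes well under a second. The point is the scale of the model being trained, not the specific accuracy, which says nothing about real IPI benchmarks.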
Inference-Time Correction Principle
A key advantage of ICON is that it corrects the agent at inference time. Unlike methods that require extensive retraining, ICON's plug-and-play design allows agile deployment and adaptation in dynamic environments without breaking the agent's functional continuity: the security check runs alongside the normal workflow instead of halting it.
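"Plug-and-play" can be pictured as wrapping an existing attention computation rather than editing or retraining it. The sketch below assumes a decorator around a toy softmax attention function; in a real framework the same idea would more likely be a forward hook (for example PyTorch's `register_forward_hook`) or a patched attention module. `base_attention`, `mass_detector`, `with_rectifier`, and the 0.5 threshold are hypothetical names and values, not ICON's API.

```python
import functools
import math
from typing import Callable, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def base_attention(scores: Sequence[float]) -> List[float]:
    """Stand-in for an existing, unmodified attention computation."""
    return softmax(scores)


def mass_detector(weights: List[float], untrusted: Sequence[bool], threshold: float = 0.5) -> bool:
    """Hypothetical detector: fires when most attention lands on untrusted tokens."""
    return sum(w for w, u in zip(weights, untrusted) if u) > threshold


def with_rectifier(detect: Callable[[List[float], Sequence[bool]], bool], alpha: float = 0.1):
    """Decorator factory adding detect-then-steer logic around an attention function.
    The wrapped function and any model parameters are never modified."""
    def decorator(attn_fn):
        @functools.wraps(attn_fn)
        def wrapped(scores: Sequence[float], untrusted: Sequence[bool]) -> List[float]:
            weights = attn_fn(scores)
            if not detect(weights, untrusted):
                return weights  # benign path: output passes through untouched
            scaled = [w * alpha if u else w for w, u in zip(weights, untrusted)]
            total = sum(scaled)
            return [s / total for s in scaled]
        return wrapped
    return decorator


# Attach at inference time; base_attention itself remains available and unchanged.
guarded_attention = with_rectifier(mass_detector)(base_attention)
mask = [False, False, True, False]
print(guarded_attention([0.0, 0.0, 4.0, 0.0], mask))
```

Because the wrapper only intercepts outputs, removing it restores the original behavior exactly, which is what makes this style of defense cheap to deploy and roll back.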
Your Implementation Roadmap
A typical journey to integrate ICON into your agentic systems, tailored for rapid deployment and maximum impact.
Phase 1: Discovery & Strategy
Initial consultation to understand your existing LLM agent architectures and identify key IPI vulnerabilities. Define custom security policies and integration points for ICON.
Phase 2: ICON Integration & Calibration
Integrate ICON's lightweight prober and rectifier modules, then calibrate them on a small, synthesized dataset (minutes, not hours) for your specific agentic workflows and threat landscape.
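The calibration step is not spelled out here. One common approach for any detector, assumed for illustration rather than taken from the paper, is to choose the decision threshold from scores on benign calibration samples so that the false-positive rate stays within a budget; false positives matter because each one interrupts a legitimate task. The function names and the synthetic scores below are hypothetical.

```python
import math
from typing import Sequence


def calibrate_threshold(benign_scores: Sequence[float], max_fpr: float = 0.01) -> float:
    """Smallest threshold t such that, flagging inputs with score > t,
    at most max_fpr of the benign calibration scores are flagged."""
    if not benign_scores:
        raise ValueError("need benign calibration scores")
    if not 0.0 <= max_fpr < 1.0:
        raise ValueError("max_fpr must be in [0, 1)")
    ordered = sorted(benign_scores)
    n = len(ordered)
    allowed = math.floor(max_fpr * n)  # number of benign samples we may flag
    return ordered[n - allowed - 1]


def flag_rate(scores: Sequence[float], threshold: float) -> float:
    """Fraction of scores strictly above the threshold."""
    return sum(1 for s in scores if s > threshold) / len(scores)


# Hypothetical probe scores on a small synthesized calibration set.
benign = [i / 100 for i in range(100)]            # 0.00 .. 0.99
injected = [0.90 + i / 1000 for i in range(100)]  # 0.900 .. 0.999
t = calibrate_threshold(benign, max_fpr=0.05)
print(t, flag_rate(benign, t), flag_rate(injected, t))
```

The detection rate on injected samples is then a consequence of the chosen budget, which is the security-versus-utility trade-off this phase tunes.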
Phase 3: Testing & Validation
Comprehensive testing against adaptive IPI attack benchmarks, including OOD scenarios and multimodal agents, to confirm robust security and task utility. Adjust steering parameters to balance the two.
Phase 4: Deployment & Monitoring
Go live with enhanced IPI defense. Continuous monitoring of agent performance and security posture, with ongoing support and adaptive updates to counter emerging threats.