AI ANALYSIS REPORT

ICON: Indirect Prompt Injection Defense for Agents based on Inference-Time Correction

This paper introduces ICON, a framework for defending LLM agents against indirect prompt injection (IPI) attacks. ICON combines a latent space trace prober for detection with a mitigating rectifier that performs surgical attention steering. It reports a 0.4% attack success rate (ASR) and a task utility gain of more than 50%, with a training cost of under two minutes. The paper also reports robust out-of-distribution generalization and applicability to multimodal agents.

Author: Che Wang et al. | Published: February 25, 2026

Executive Impact & Key Findings

ICON pairs strong protection against indirect prompt injection with preserved task continuity and a low training cost.

0.4% Average ASR

>50% Utility Gain

<2 min Training Time

Deep Analysis & Enterprise Applications

Novel Probing-to-Mitigation Framework

ICON introduces a unique two-step framework: the Latent Space Trace Prober (LSTP) detects IPI attacks by identifying 'attention collapse' patterns in the LLM's latent space, and the Mitigating Rectifier (MR) then performs surgical attention steering to neutralize threats without disrupting task continuity. This moves beyond traditional binary refusal mechanisms.
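As a rough illustration of the probe-then-rectify idea described above, the toy sketch below treats "attention collapse" as an unusually large share of attention mass landing on tool-output positions, and treats "steering" as down-weighting those positions and renormalizing. Everything in it is an assumption made for illustration: the function names, the 0.6 threshold, the 0.1 scale factor, and the mass-based score are hypothetical and are not taken from the paper, whose prober operates on latent-space features.

```python
def collapse_score(weights, suspect):
    """Share of attention mass on the suspect (tool-output) positions.

    Hypothetical stand-in for a learned probe; `weights` is assumed to be
    one normalized attention distribution over token positions.
    """
    return sum(weights[i] for i in suspect)


def steer(weights, suspect, scale=0.1):
    """Down-weight suspect positions, then renormalize so the result sums to 1."""
    scaled = [w * scale if i in suspect else w for i, w in enumerate(weights)]
    total = sum(scaled)
    return [w / total for w in scaled]


def probe_and_rectify(weights, suspect, threshold=0.6):
    """Return (flagged, weights). Steering is applied only when the probe fires,
    so benign steps pass through unchanged instead of being refused."""
    flagged = collapse_score(weights, suspect) > threshold
    return flagged, (steer(weights, suspect) if flagged else list(weights))


# Usage: position 2 is assumed to hold injected tool output.
flagged, corrected = probe_and_rectify([0.05, 0.05, 0.85, 0.05], {2})
```

The design point the sketch tries to capture is that the response to a detection is a correction of the attention distribution, so the agent can keep working on the user's task, rather than a binary refusal.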

ICON's Operational Flow

User-Agent Interaction → Latent Space Trace Prober (Detection) → Mitigating Rectifier (Correction) → Fixed Output (Corrected)

Achieved Attack Success Rate (ASR)

0.4% Competitive Security

ICON achieves a 0.4% ASR, which is competitive with commercial-grade detectors, while significantly improving task utility.

Task Utility Preservation

>50% Functional Continuity

The framework yields over 50% task utility gain, outperforming existing defenses that often sacrifice utility for security by prematurely terminating workflows.
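To make the two metrics above concrete, here is a minimal sketch of how ASR and utility under attack are commonly computed from per-episode evaluation records. The `Episode` record and both function names are hypothetical, and the definitions follow a common convention for agent benchmarks; the paper's exact definitions may differ.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    attacked: bool           # an injection was present in the tool output
    attacker_goal_met: bool  # the agent carried out the injected instruction
    user_task_done: bool     # the agent completed the user's original task


def attack_success_rate(episodes):
    """Fraction of attacked episodes in which the attacker's goal was met."""
    attacked = [e for e in episodes if e.attacked]
    if not attacked:
        return 0.0
    return sum(e.attacker_goal_met for e in attacked) / len(attacked)


def utility_under_attack(episodes):
    """Fraction of attacked episodes in which the user's task still completed."""
    attacked = [e for e in episodes if e.attacked]
    if not attacked:
        return 0.0
    return sum(e.user_task_done for e in attacked) / len(attacked)
```

Under these definitions, a defense that stops every attack by terminating the workflow scores well on ASR and poorly on utility under attack, which is the trade-off the paragraph above describes.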

Functionality	Template	Tool-Filter	Fine-tuning	ICON
Security	X	X	X	✓
Utility	X	X	X	✓
Efficiency	X	X	X	✓
ICON achieves a superior balance between security, efficiency, and utility compared to existing methods, as summarized in the paper's table.

Robust OOD Generalization & Multimodal Support

ICON demonstrates robust Out-of-Distribution (OOD) generalization, effectively extending to multimodal agents. Its training requires only hundreds of samples and takes less than two minutes to converge, establishing a superior balance between security and efficiency unmatched by large-scale fine-tuning methods like Qwen3Guard.
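To give a sense of why a probe trained on only a few hundred samples can converge quickly, the sketch below fits a tiny logistic-regression classifier on 200 synthetic points. It is only an assumption-laden stand-in: the two-dimensional Gaussian features, the helper names, and the hyperparameters are invented for illustration and do not reflect the paper's prober architecture or data.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_toy_data(n_per_class, seed=0):
    """Synthetic stand-in for probe features: benign near (-1, -1), attack near (+1, +1)."""
    rng = random.Random(seed)
    features, labels = [], []
    for label, mean in ((0, -1.0), (1, 1.0)):
        for _ in range(n_per_class):
            features.append([rng.gauss(mean, 0.5), rng.gauss(mean, 0.5)])
            labels.append(label)
    return features, labels


def train_probe(features, labels, epochs=200, lr=0.5):
    """Full-batch gradient descent on the logistic loss; returns (weights, bias)."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """True when the probe classifies x as an attack."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5
```

A probe of this size has only a handful of parameters, so training cost is dominated by feature extraction rather than optimization. That is one plausible reading of the efficiency contrast with large-scale fine-tuning drawn above.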

Inference-Time Correction Principle

A key advantage of ICON is that it corrects at inference time. Unlike methods that require extensive retraining, its plug-and-play design can be deployed and adapted in dynamic environments without interrupting the agent's task.

Your Implementation Roadmap

A typical journey to integrate ICON into your agentic systems, tailored for rapid deployment and maximum impact.

Phase 1: Discovery & Strategy

Initial consultation to understand your existing LLM agent architectures and identify key IPI vulnerabilities. Define custom security policies and integration points for ICON.

Phase 2: ICON Integration & Calibration

Seamlessly integrate ICON's lightweight prober and rectifier modules. Rapid calibration using a small, synthesized dataset (minutes, not hours) to optimize for your specific agentic workflows and threat landscape.

Phase 3: Testing & Validation

Comprehensive testing against adaptive IPI attack benchmarks, including OOD scenarios and multimodal agents, to ensure robust security and optimal task utility. Fine-tune steering parameters for peak performance.

Phase 4: Deployment & Monitoring

Go-live with enhanced IPI defense. Continuous monitoring of agent performance and security posture, with ongoing support and adaptive updates to counter emerging threats.
