
Enterprise AI Analysis

INFUSION: Shaping Model Behavior by Editing Training Data via Influence Functions

Authors: J Rosser, Edward Grefenstette, Robert Kirk, Jakob Foerster, Laura Ruis

Publication: Published at the 3rd DATA-FM Workshop @ ICLR 2026, Brazil (arXiv:2602.09987v5)

Executive Impact Summary

INFUSION introduces a novel, subtle approach to targeted model manipulation through minimal data edits, with significant implications for enterprise AI security and robustness.

0.2% of Training Data Edited
10% → 37% Top-1 Target Prediction (CIFAR-10)
Bidirectional Cross-Architecture Transfer

Deep Analysis & Enterprise Applications

The sections below unpack the paper's specific findings as enterprise-focused analyses.

Infusion: A New Frontier in Data Poisoning

This research introduces INFUSION, a novel framework that leverages advanced influence functions to precisely modify training data. Unlike traditional data poisoning that injects explicit malicious examples, INFUSION makes subtle, gradient-based perturbations to existing data. The goal is to steer model behavior towards specific adversarial objectives without revealing the attack objective in the training data itself.

Key Findings for Enterprise: On image classification (CIFAR-10), INFUSION reliably shifts model behavior while editing just 0.2% of the training set, making it competitive with more overt direct data injection. Crucially, these engineered perturbations can transfer across model architectures, so a single poisoned dataset could impact multiple independently trained systems. While effective on smaller models with continuous inputs, scaling to large language models with discrete token spaces remains challenging, which points to where future defense mechanisms can focus.

These results underscore the critical importance of training data interpretability and robust data provenance in safeguarding enterprise AI systems against sophisticated, hard-to-detect attacks.

Enterprise Process Flow: The INFUSION Pipeline

Define Target Behavior (f(θ))
Identify Influential Training Data (EK-FAC)
Compute Gradient Perturbations (PGD)
Create Infused Training Corpus (z + δ)
Retrain Model (Partial Re-training)
Validate Shifted Model Behavior (f(θ*))

The INFUSION pipeline illustrates how subtle, gradient-guided edits to influential training documents can systematically reshape a model's behavior towards a targeted outcome.
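
To make the pipeline concrete, here is a minimal sketch of one INFUSION-style iteration in PyTorch. It is illustrative only: a simple first-order gradient-alignment score stands in for the paper's EK-FAC influence approximation, and the objective, helper names, and budgets (eps, lr, steps) are assumptions for this sketch, not the authors' implementation.

```python
# Illustrative sketch: first-order gradient alignment stands in for the
# paper's EK-FAC influence functions; eps/lr/steps are assumed budgets.
import torch
import torch.nn.functional as F

def target_gradient(model, x_val, target_class):
    # Step 1: target behavior f(theta) = mean log-prob of the adversarial class.
    f = F.log_softmax(model(x_val), dim=1)[:, target_class].mean()
    return torch.autograd.grad(f, list(model.parameters()))

def influence_proxy(model, x, y, g_target):
    # Step 2 (simplified): rank a training point by how strongly its loss
    # gradient aligns with the target-objective gradient.
    loss = F.cross_entropy(model(x[None]), y[None])
    g = torch.autograd.grad(loss, list(model.parameters()))
    return sum((gi * ti).sum() for gi, ti in zip(g, g_target)).item()

def pgd_perturb(model, x, y, g_target, eps=8 / 255, lr=2 / 255, steps=10):
    # Steps 3-4: PGD over the input x (a single (C, H, W) image with scalar
    # label tensor y) nudges this example's training gradient toward the
    # target direction, inside an L-inf ball so the edit stays imperceptible.
    params = list(model.parameters())
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model((x + delta)[None]), y[None])
        g = torch.autograd.grad(loss, params, create_graph=True)
        align = sum((gi * ti).sum() for gi, ti in zip(g, g_target))
        (g_delta,) = torch.autograd.grad(-align, delta)  # ascend alignment
        with torch.no_grad():
            delta -= lr * g_delta.sign()
            delta.clamp_(-eps, eps)
    return (x + delta).detach()  # the infused example z + delta
```

Steps 5-6 then follow as in the diagram: the top-scoring examples under influence_proxy are replaced with their perturbed versions, the model is partially retrained on the infused corpus, and the shift in f(θ*) is validated on held-out data.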

10% → 37% Top-1 Target Prediction on CIFAR-10

Across 2,000 experiments, INFUSION consistently increased the target class probability, raising the top-1 prediction rate for the target class from 10% to 37% with visually imperceptible edits to just 0.2% of the data. This makes it a reliable method for steering model behavior in image classification tasks.

Method | Target Class Probability Change (Δp)
INFUSION (0.2% data budget) | Substantially outperforms random noise and single explicit injections, rivaling large-scale direct label injection for targeted behavior shifts.
Random Noise Perturbations | Near-zero effect, confirming that gradient-guided directions are essential for effective model manipulation.
Probe Insert (1 example) | Limited effect, showing that single, explicit examples are not enough to reliably steer model behavior at scale.
Probe Insert (100 examples) | High Δp, but requires direct injection of desired behaviors with explicit labels, making it more detectable.
INFUSION vs. baseline attacks: a comparison highlighting the efficacy of influence-guided perturbations over other poisoning strategies.

Cross-Architecture Transfer of Attacks

Challenge: Can a data poisoning attack crafted for one model architecture impact another? Enterprise environments often deploy diverse models, making such transferability a critical security concern.

INFUSION's Findings: Datasets infused with INFUSION-generated perturbations, originally optimized for a specific model (e.g., ResNet), can induce targeted misclassifications when used to train different architectures (e.g., CNN). This cross-architecture transfer is possible in both directions (ResNet ↔ CNN), though often weak and asymmetric, with CNN→ResNet transfer proving stronger in some cases.

Enterprise Implication: This suggests that perturbations capture generalizable features, posing risks for organizations using open-weight models. An adversary could craft sophisticated attacks on a publicly available proxy model and then apply these poisoned datasets to proprietary systems, even if they use different underlying architectures. This highlights the need for robust data provenance and security measures beyond architectural diversity.
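
As a protocol sketch, a transfer check could look like the following. The helper callables (architectures, train, target_prob) are hypothetical placeholders, not the paper's code; the point is only the procedure of crafting perturbations against one architecture and retraining another on the result.

```python
# Hedged protocol sketch: all helper callables are hypothetical placeholders.
def transfer_check(infused_data, clean_val, target_class,
                   architectures, train, target_prob):
    """Measure whether victims of other architectures, trained from scratch
    on a dataset infused against one proxy model, inherit the targeted shift."""
    results = {}
    for name, make_model in architectures.items():  # e.g. {"resnet": ..., "cnn": ...}
        victim = make_model()                # fresh, independently initialized
        train(victim, infused_data)          # retrain on the edited corpus
        results[name] = target_prob(victim, clean_val, target_class)
    return results  # compare against the same protocol on the clean dataset
```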

Resistance of High-Confidence Models

INFUSION faces limitations when attempting to alter behaviors in models that exhibit high certainty in their learned tasks. While the target probabilities for adversarial objectives do increase, the model's overall confidence in its original, correct predictions often remains largely unchanged. This suggests that highly ingrained or well-learned behaviors have limited headroom for shifts induced by small perturbations, acting as a natural robustness mechanism against subtle data manipulation.

Exploiting Latent Model Structure

The success of INFUSION is intrinsically linked to how well it can couple with the internal, learned structure of a model. For example, in Caesar cipher tasks, INFUSION's effectiveness is amplified when target shifts align with the model's learned Fourier modes, particularly when shifts share common factors with the alphabet size. This indicates that INFUSION primarily amplifies existing, latent model behaviors rather than introducing entirely novel ones, making models with transparent internal structures more vulnerable.
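
To make the "common factors" point concrete, here is a small, self-contained calculation of our own (an assumed example, not taken from the paper): a cyclic shift by s over an n-letter alphabet acts on Fourier frequency k with phase exp(2πiks/n), so it produces only n / gcd(s, n) distinct phases, and shifts sharing factors with n therefore excite fewer modes.

```python
# Illustrative arithmetic (assumed example): count the distinct Fourier
# phases a cyclic shift by s induces over an n-letter alphabet.
import cmath
import math

n = 26  # alphabet size
for s in (13, 4, 5):  # gcd(s, 26) = 13, 2, 1
    phases = {round(cmath.phase(cmath.exp(2j * math.pi * k * s / n)), 9)
              for k in range(n)}
    print(f"shift {s}: {len(phases)} distinct phases = n/gcd(s,n) = {n // math.gcd(s, n)}")
```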

Interpretable Discrete-Token Perturbations

Concept: Although language models process discrete tokens, INFUSION can perturb them by operating in the continuous embedding space, and the resulting PGD-generated perturbations can still yield surprisingly interpretable changes to the data.

Example: In a scenario targeting a "bee" to "cat" misclassification, INFUSION might introduce subtle edits to training documents. These edits could manifest as replacing a "cat" token with "bee" or even semantically related words like "hive." This demonstrates how the gradient-based approach, despite its technical nature, can still lead to meaningful and somewhat human-understandable alterations in the training text, bypassing simple content filters.
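
One standard way to realize such discrete edits is to run PGD in embedding space and then snap each perturbed embedding back to its nearest vocabulary token. The sketch below assumes a PyTorch-style embedding table and uses cosine similarity for the projection; the paper's exact scheme may differ.

```python
# Hedged sketch: nearest-neighbor projection from perturbed embeddings back
# to tokens. Cosine similarity is an assumed choice, not the paper's spec.
import torch
import torch.nn.functional as F

def project_to_tokens(perturbed_emb, embedding_matrix):
    # perturbed_emb: (seq_len, d) continuous embeddings after PGD (z + delta)
    # embedding_matrix: (vocab_size, d) the model's token embedding table
    emb = F.normalize(perturbed_emb, dim=-1)
    vocab = F.normalize(embedding_matrix, dim=-1)
    # Snapping each position to its closest real token is how a perturbed
    # "cat" embedding can decode to "bee" or a nearby word such as "hive".
    return (emb @ vocab.T).argmax(dim=-1)  # (seq_len,) token ids
```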

Enterprise Relevance: This implies that advanced data poisoning might not rely on obviously malicious or out-of-distribution examples, making detection through traditional content-based filtering more challenging. Defenses must evolve to look for subtle, context-aware anomalies.

Scaling Challenges for Large Language Models

INFUSION's effectiveness attenuates significantly on large, pretrained language models. Several factors compound: the sheer size of these models, training datasets that are orders of magnitude larger (shrinking the relative poisoning budget), weaker influence-function approximations at this scale, and the fundamentally discrete nature of token spaces. Measurable targeted probability shifts still occur, but prediction flips that overcome learned preferences remain rare. Meaningfully changing the behavior of highly capable foundation models thus remains a formidable challenge without more precise approximations or substantially larger perturbation budgets.

Quantify Your AI Potential ROI

Estimate the potential cost savings and efficiency gains for your organization by optimizing AI adoption and security strategies.


Your AI Implementation Roadmap

A clear, phased approach to integrating advanced AI capabilities and ensuring robust security within your enterprise.

Phase 1: Discovery & Strategy

Comprehensive assessment of existing infrastructure, data landscape, and business objectives. Define clear AI adoption goals and security baselines based on INFUSION's insights.

Phase 2: Pilot & Proof-of-Concept

Implement a targeted AI solution or security enhancement in a controlled environment. Validate impact, measure initial ROI, and test against potential data manipulation vectors.

Phase 3: Scaled Deployment & Integration

Roll out AI solutions across relevant departments. Integrate robust data provenance, influence-based monitoring, and continuous threat detection systems informed by research into subtle data edits.

Phase 4: Optimization & Future-Proofing

Continuous monitoring, performance optimization, and adaptation to emerging AI trends and attack methodologies. Ensure long-term resilience and ethical deployment.

Ready to Secure Your Enterprise AI?

Leverage our expertise to understand and defend against advanced data poisoning and model manipulation threats. Book a personalized consultation to fortify your AI strategy.
