
Enterprise AI Analysis

Behavior-Aware and Generalizable Defense Against Black-Box Adversarial Attacks for ML-Based IDS

Sabrine Ennaji (Sapienza University of Rome), Elhadj Benkhelifa (Staffordshire University), Luigi Vincenzo Mancini (Sapienza University of Rome)

arXiv:2512.13501v1 [cs.CR] 15 Dec 2025

Abstract: Machine learning-based Intrusion Detection Systems (IDS) are increasingly targeted by black-box adversarial attacks, where attackers craft evasive inputs using indirect feedback such as binary outputs or behavioral signals like response time and resource usage. While several defenses have been proposed, including input transformation, adversarial training, and surrogate detection, they often fall short in practice. Most are tailored to specific attack types, require internal model access, or rely on static mechanisms that fail to generalize across evolving attack strategies. Furthermore, defenses like input transformation can degrade IDS performance, making them unsuitable for real-time deployment. To address these limitations, we propose Adaptive Feature Poisoning (AFP), a lightweight and proactive defense mechanism designed specifically for realistic black-box scenarios. AFP assumes that probing can occur silently and continuously, and introduces dynamic, context-aware perturbations to selected traffic features, corrupting the attacker's feedback loop without impacting the IDS's detection capabilities. AFP leverages traffic profiling, change point detection, and adaptive scaling to selectively perturb features the attacker is likely exploiting, based on observed deviations. We evaluate AFP against diverse realistic adversarial attack methods, including silent probing attacks and transferability- and decision boundary-based attacks, demonstrating its ability to confuse the attacker, degrade attack effectiveness, and preserve IDS performance. By offering a generalizable, attack-agnostic, and undetectable defense, AFP represents a significant step toward practical and robust adversarial resilience in real-world network environments.

Keywords: Intrusion Detection Systems; Adversarial Machine Learning; Adaptive Defense; Network Security; Black-Box Attacks; Query-Based Attacks

Key Enterprise Impact Metrics

Adaptive Feature Poisoning (AFP) delivers robust protection against sophisticated black-box attacks, ensuring high system integrity and minimal operational overhead for ML-based Intrusion Detection Systems.

99.3% Overall Accuracy with AFP
<0.01% of Traffic Perturbed
>97% Attack Detection Recall
5.95% Max Computational Overhead

Deep Analysis & Enterprise Applications

Select a topic to dive deeper and explore the specific findings from the research, presented as enterprise-focused modules.

Adversarial Threat Landscape & Defense Limitations

ML-based Intrusion Detection Systems (IDS) are increasingly vulnerable to adversarial attacks, especially in black-box environments where attackers infer model behavior from indirect feedback. Traditional defenses often fall short due to assumptions of white-box access, reliance on static attack patterns, or high computational costs. This section highlights the unique challenges of adversarial attacks in network security, such as feature interdependency, semantic constraints, and time sensitivity, which limit the effectiveness and generalizability of current defense strategies like adversarial training, input transformation, and ensemble learning.

Comparison of AFP with Established Adversarial Defenses

Defense Approach             | Adaptivity       | Overhead      | Robustness to Unknown Attacks | Impact on Benign Traffic | AFP Compatibility
Adversarial Training         | Low              | High          | Limited                       | Moderate/High            | Can be combined
Ensemble Learning            | Low/Moderate     | Moderate      | Moderate                      | Low/Moderate             | Can be combined
Input Transformation         | None             | Moderate/High | Low/Moderate                  | Moderate/High            | Can be combined
Detection & Rejection        | Low              | Moderate      | Low                           | High                     | Can be combined
Feature-Space Regularization | None             | Moderate      | Low/Moderate                  | Moderate                 | Can be combined
Manifold Projection          | None             | High          | Low/Moderate                  | High                     | Can be combined
AFP (Ours)                   | High (Selective) | Low           | High                          | Very Low                 | Works with all

Realistic Black-Box Threat Model

Our threat model assumes an adversary operating in a black-box environment, lacking direct access to the IDS's internal architecture, parameters, or training data. The attacker's capabilities are restricted to observing indirect feedback, such as binary outputs (malicious/benign) or side-channel signals (e.g., latency, CPU usage), to infer decision boundaries. This setup emphasizes stealth, with the attacker carefully crafting traffic to avoid detection while progressively probing the system. We specifically consider: Silent Probing Attacks (passive observation, slight perturbations), Transferability-Based Attacks (using surrogate models), and Decision Boundary-Based Attacks (iteratively adjusting inputs to cross classification boundaries).
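To make the decision boundary-based attacker concrete, the sketch below shows how an adversary with only binary feedback can bisect between a flagged and an unflagged flow to locate the classification boundary. The oracle here is a toy stand-in, not the IDS from the paper; function and variable names are illustrative.

```python
import numpy as np

def boundary_probe(oracle, x_malicious, x_benign, steps=20):
    """Bisect between a flagged and an unflagged flow to approximate
    the decision boundary using only binary (malicious/benign) feedback."""
    lo, hi = x_benign, x_malicious   # lo is classified benign, hi malicious
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if oracle(mid):              # True -> IDS flags the probe as malicious
            hi = mid
        else:
            lo = mid
    return lo                        # evasive point just inside the benign region

# Toy IDS oracle: flags flows whose mean feature value exceeds 0.5.
oracle = lambda x: float(np.mean(x)) > 0.5
adv = boundary_probe(oracle, np.ones(4), np.zeros(4))
```

Each query moves the probe halfway toward the boundary, which is exactly the incremental, low-and-slow behavior AFP's change-point detection is designed to spot.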

Adaptive Feature Poisoning (AFP) Framework

Adaptive Feature Poisoning (AFP) is a novel, lightweight, and proactive defense designed for real-time black-box scenarios. It operates as an additional layer in parallel with the IDS, disrupting the attacker's feedback loop without compromising legitimate detection. AFP continuously monitors traffic and side-channel signals, establishes a baseline, and uses change-point detection to identify suspicious probing activity. Upon detection, it dynamically perturbs sensitive traffic features, adapting perturbation strength based on deviation scores. This corrupts the attacker's information, making it unreliable for modeling or evading the IDS, while remaining undetectable and preserving IDS performance.

Enterprise Process Flow: Adaptive Feature Poisoning (AFP)

Monitoring & Detection (Baseline & Anomaly)
Identify Deviated Features
Traffic Features Perturbation (Adaptive Noise)
Adaptation to Probing Focus (Escalated Perturbation)
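The flow above can be sketched as a minimal loop, assuming a streaming per-feature baseline, a simple z-score deviation test standing in for the paper's change-point detector, and Gaussian noise scaled by the deviation score. Class and parameter names are hypothetical, not from the paper.

```python
import numpy as np

class AFPSketch:
    """Illustrative AFP loop: profile traffic, flag deviated features,
    and add adaptively scaled noise to only those features."""
    def __init__(self, n_features, z_threshold=3.0, base_eps=0.05):
        self.mean = np.zeros(n_features)
        self.var = np.ones(n_features)
        self.n = 0
        self.z_threshold = z_threshold   # deviation score that triggers perturbation
        self.base_eps = base_eps         # baseline noise scale
        self.rng = np.random.default_rng(0)

    def update_baseline(self, x):
        # Streaming (Welford-style) update of per-feature mean and variance.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.var += (delta * (x - self.mean) - self.var) / self.n

    def respond(self, x):
        # Deviation score per feature relative to the benign baseline.
        z = np.abs(x - self.mean) / np.sqrt(self.var + 1e-9)
        deviated = z > self.z_threshold          # features the prober likely exploits
        noise = self.rng.normal(0.0, self.base_eps * z, size=x.shape)
        out = x.copy()
        out[deviated] += noise[deviated]         # perturb only deviated features
        return out, deviated
```

Because noise strength grows with the deviation score, perturbation escalates as probing becomes more aggressive while benign flows, which stay near the baseline, pass through untouched.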

Results and Discussion: AFP Effectiveness

AFP significantly boosts IDS robustness against various black-box adversarial attacks with minimal operational overhead. Activated selectively for less than 0.01% of traffic, it maintains 99.3% overall accuracy and over 97% attack detection recall. The defense strategically confuses attackers by introducing subtle, context-aware perturbations that disrupt their ability to infer decision boundaries or transfer adversarial examples, effectively increasing the cost and risk of evasion attempts.

AFP's Impact on Diverse Attack Types

Silent Probing Attack: AFP increased IDS accuracy from 85.2% to 89.1%. It disrupted the attacker's learning process by providing misleading feedback, making boundary inference difficult. Although only the most severe attempts remained directly detectable (recall 0.03), this residual signal still lets defenders monitor and respond to escalated adversarial activity.

Transferability-Based Attack: Accuracy sharply improved from 25.4% to 89.1%. AFP interfered with the transferability of adversarial samples by weakening the implicit "agreement" between surrogate and target model distributions. Attack recall stood at 42%, providing strong protection with few false positives.

Decision Boundary-Based Attack: This challenging attack, which initially dropped accuracy to 17%, saw IDS detection capability significantly restored. AFP's adaptive perturbations introduced controlled unpredictability, hindering the attacker's search for the true boundary and forcing them into riskier, more detectable behaviors.

5.95% Maximum Computational Overhead (Silent Probing)

AFP's selective, on-demand design ensures minimal impact on system responsiveness. Only a tiny fraction of network flows (less than 0.01%) require perturbation, keeping overall computational costs extremely low and making it ideal for real-time IDS deployments.

Conclusion & Future Directions

Adaptive Feature Poisoning (AFP) represents a significant step towards enhancing the resilience of ML-based IDS against black-box adversarial attacks. Its lightweight, adaptive, and stealthy nature allows it to dynamically disrupt attacker reconnaissance and evasion without degrading IDS performance. Future work will focus on expanding AFP into a self-learning, self-tuning layer that adapts to evolving network conditions and attacker strategies, exploring synergistic integrations with other defenses, and evaluating its efficacy in online and distributed environments.


Your Journey to Enhanced AI Security

A structured roadmap to integrate cutting-edge adversarial defenses and secure your ML-based systems.

Phase 01: Initial Threat Assessment & Baseline

Conduct a comprehensive analysis of existing ML-based IDS, identify vulnerabilities, and establish performance baselines. Define critical features and side-channel indicators for monitoring.

Phase 02: AFP Integration & Configuration

Implement AFP as a parallel defense layer. Configure initial parameters for perturbation strength, change-point detection thresholds, and feature deviation sensitivity based on baseline data.
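A starting configuration for this phase might look like the fragment below. The schema and parameter names are hypothetical, since the paper does not prescribe a configuration format; only the activation budget (under 0.01% of flows perturbed) comes from the reported results.

```python
# Hypothetical AFP deployment configuration (names illustrative).
afp_config = {
    "change_point": {
        "method": "cusum",         # detector over traffic and side-channel signals
        "threshold": 4.0,          # deviation score that triggers perturbation
        "baseline_window": 500,    # flows used to build the rolling baseline
    },
    "perturbation": {
        "base_strength": 0.05,     # minimum noise scale for deviated features
        "max_strength": 0.30,      # cap so benign traffic is not degraded
        "scale_with_deviation": True,
    },
    "activation_budget": 0.0001,   # perturb at most ~0.01% of flows (per the paper)
}
```

Thresholds would then be tuned against the Phase 01 baseline before any traffic is actually perturbed.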

Phase 03: Controlled Deployment & Monitoring

Deploy AFP in a controlled environment, monitoring its impact on IDS performance and its effectiveness against simulated black-box attacks. Fine-tune adaptive scaling and probing focus mechanisms.

Phase 04: Continuous Adaptation & Optimization

Enable AFP's self-learning capabilities to autonomously profile evolving network conditions and attacker strategies. Continuously optimize perturbation tactics for maximum disruption and minimal benign impact.

Ready to Secure Your AI Systems?

Connect with our experts to design a resilient, behavior-aware defense strategy tailored to your enterprise needs.
