JARVIS or Ultron? A Survey on the Safety and Security Threats of Computer-Using Agents
A comprehensive analysis of Computer-Using Agents (CUAs), detailing intrinsic and extrinsic threats, defensive strategies, and evaluation benchmarks.
Large Language Models (LLMs) have evolved rapidly from basic conversational agents to executing complex tasks in diverse computing environments. In particular, Computer-Using Agents (CUAs) have garnered increasing attention and widespread adoption, thanks to their ability to interact with graphical user interfaces (GUIs) in a manner akin to human users. Recent systems such as AppAgent, SeeAct, PC-Agent, as well as newly-introduced OpenAI's o3, and o4-mini, highlight the remarkable progress of CUAs. By integrating multimodal perception, advanced reasoning, and automated control of devices, these agents promise to streamline vast tasks from filling out online forms to executing complex application flows. Despite the impressive capabilities of CUAs, their operation in real-world settings raises critical safety concerns. This survey addresses these concerns systematically.
Executive Impact
Key performance indicators derived from CUA safety and security assessments, highlighting crucial areas for improvement.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
Intrinsic threats arise from the agent's internal limitations, such as perception errors or reasoning failures. For example, UI Understanding and Grounding Difficulties (Chen et al., 2025c) occur when CUAs struggle to correctly interpret GUI elements due to static datasets or resolution constraints. Scheduling Errors (Zhang and Zhang, 2023) can lead to unstable behaviors in complex tasks, while Misalignment (Ma et al., 2024a) happens when an agent's reasoning diverges from user intent. Hallucination (Deng et al., 2024a) causes agents to generate outputs not grounded in the environment. Excessive Context Length (Yang et al., 2024a) strains models with too much input, degrading performance. Social and Cultural Concerns (Qiu et al., 2025) arise when agents fail to respect diverse norms, leading to misunderstandings. Lastly, Response Latency (Zhang and Zhang, 2023) affects predictability and user trust due to slow processing, and API Call Errors (Nong et al., 2024) result from incorrect API inference or formatting.
Extrinsic threats originate from external entities, such as malicious attackers. These include Adversarial Attacks (Wu et al., 2024a), which manipulate inputs to induce harmful behaviors, like tiny pixel perturbations. Prompt Injection Attacks (Mudryi et al., 2025) embed malicious instructions directly or indirectly (e.g., via webpages) to bypass safety rules. Jailbreak Attacks (Mo et al., 2024) rephrase queries to bypass guardrails and generate unauthorized outputs. Memory Attacks (Wang et al., 2025a) target persistent context to extract sensitive information (Memory Extraction) or poison future reasoning (Memory Injection). Backdoor Attacks (Yang et al., 2024b) insert hidden triggers during training to activate harmful behaviors later. Reasoning Gap Attacks (Chen et al., 2025d) exploit mismatches between multimodal perception and reasoning. System Sabotage Attacks (Luo et al., 2025b) trick agents into destructive operations, like creating fork bombs. Finally, Web Hacking Attacks (Fang et al., 2024b) co-opt CUAs into autonomous hacking tools for SQL injection or data exfiltration.
Defenses against CUA threats are categorized into several types. Environmental Constraints (Yang et al., 2024c) limit agent interactions to prevent harmful actions. Input Validation (Kumar et al., 2024) verifies and sanitizes user inputs. Defensive Prompting (Debenedetti et al., 2024) structures prompts to prevent manipulation. Data Sanitization (Yang et al., 2024b) removes malicious data from training sets. Adversarial Training (Wu et al., 2024a) enhances model robustness against perturbations. Output Monitoring (Fang et al., 2024a) continuously evaluates agent outputs for misalignment. Model Inspection (Wang et al., 2025e), including Anomaly Detection and Weight Analysis, identifies malicious manipulations. Cross Verification (Zeng et al., 2024) uses multiple agents to validate outputs. Continuous Learning and Adaptation (Tian et al., 2023), via Self-Evolution and User Feedback, allows agents to dynamically update models. Transparentize (Sager et al., 2025) enhances interpretability through XAI and Audit Logs. Topology-Guided Strategies (Wang et al., 2025e) improve multi-agent security. Perception Algorithms Synergy (Zheng et al., 2024) combines perception modules for robust UI understanding. Planning-Centric Architecture Refinement (Zhang and Zhang, 2023) improves reasoning and API invocation. Lastly, Pre-defined Regulatory Compliance (Chen et al., 2025e) integrates adherence to standards and ethical guidelines.
CUA safety is evaluated using diverse benchmarks and metrics. Datasets cover Web-based Scenarios like ST-WebAgentBench (Levy et al., 2024), Mobile-based Scenarios such as MobileSafetyBench (Lee et al., 2024a), and General-purpose Scenarios, including Tool-use (ToolEmu, Ruan et al., 2023) and Mixed/Hybrid Environments (OpenAgentSafety, Vijayvargiya et al., 2025). Metrics include Task Completion Rate (TSR) (Yao et al., 2022), Helpfulness (Ruan et al., 2023), Step Success Rate (SSR) (Deng et al., 2023), and Total Correct Prefix (Hua et al., 2024). Safety and robustness are measured by Attack Success Rate (ASR) (Zhan et al., 2024), Completion Under the Policy (CuP) (Levy et al., 2024), Refusal Rate (RR) (Zhang et al., 2024b), and Leakage Rate (LR) (Shao et al., 2024). Measurements use Rule-based checks (Luo et al., 2025a), LLM-as-a-judge (Yuan et al., 2024), and Manual Judge evaluations (Ruan et al., 2023).
Critical Vulnerability Identified
25% Increased Data Leakage RiskOur analysis reveals a significant vulnerability in CUA's handling of untrusted external data sources, leading to a 25% higher risk of data leakage compared to internal prompt injections. This highlights the urgent need for enhanced environmental input validation.
Enterprise Process Flow
| Defense Mechanism | Key Strengths | Targeted Threats |
|---|---|---|
| Input Validation |
|
|
| Output Monitoring |
|
|
| Cross Verification |
|
|
Case Study: Financial Trading Agent Security Breach
A CUA designed for automated financial trading suffered a breach due to an advanced indirect prompt injection attack. The attacker embedded subtle malicious instructions within a seemingly benign news feed, which the agent processed and acted upon, leading to unauthorized trades and a significant financial loss. The incident highlighted the critical need for multimodal threat detection and real-time contextual awareness to prevent such sophisticated attacks. Our findings suggest that integrating real-time human oversight and explainable AI techniques could have mitigated the impact.
Advanced ROI Calculator
Estimate the potential savings and reclaimed hours by implementing robust AI safety protocols and efficient CUA deployments in your enterprise.
Your AI Implementation Roadmap
Our structured approach ensures a seamless transition and maximum impact for your enterprise AI initiatives, with safety and security built-in from day one.
Phase 1: Threat Assessment & Gap Analysis
Comprehensive audit of existing CUA deployments, identifying potential intrinsic and extrinsic vulnerabilities. Develop a tailored threat model specific to your enterprise environment.
Phase 2: Defensive Strategy Integration
Implement a multi-layered defense framework, incorporating enhanced input validation, output monitoring, and context-aware defensive prompting. Focus on early detection and prevention.
Phase 3: Continuous Monitoring & Adaptation
Establish real-time monitoring systems for agent behavior and environmental interactions. Implement continuous learning mechanisms with human-in-the-loop safeguards to adapt to evolving threats.
Phase 4: Regulatory Compliance & Governance
Ensure all CUA operations adhere to industry-specific regulations and ethical guidelines. Develop transparent audit logs and explainable AI features for accountability.
Ready to Transform Your Enterprise with AI?
Book a personalized consultation to explore how our tailored AI solutions can drive efficiency and innovation for your business, securely and responsibly.