RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments
Zeyi Liao, Jaylen Jones, Linxi Jiang, Yuting Ning, Eric Fosler-Lussier, Yu Su, Zhiqiang Lin, Huan Sun
We provide a realistic, controlled, and hybrid sandbox for systematic adversarial testing of computer-use agents.
Abstract
Computer-use agents (CUAs) promise to automate complex tasks across operating systems (OS) and the web, but remain vulnerable to indirect prompt injection, where attackers embed malicious content into the environment to hijack agent behavior. Current evaluations of this threat either lack support for adversarial testing in realistic but controlled environments or ignore hybrid web-OS attack scenarios involving both interfaces. To address this, we propose RedTeamCUA, an adversarial testing framework featuring a novel hybrid sandbox that integrates a VM-based OS environment with Docker-based web platforms. Our sandbox supports key features tailored for red teaming, such as flexible adversarial scenario configuration, and a setting that decouples adversarial evaluation from navigational limitations of CUAs by initializing tests directly at the point of an adversarial injection. Using RedTeamCUA, we develop RTC-Bench, a comprehensive benchmark with 864 examples that investigate realistic, hybrid web-OS attack scenarios and fundamental security vulnerabilities. Benchmarking current frontier CUAs identifies significant vulnerabilities: Claude 3.7 Sonnet | CUA demonstrates an Attack Success Rate (ASR) of 42.9%, while Operator, the most secure CUA evaluated, still exhibits an ASR of 7.6%. Notably, CUAs often attempt to execute adversarial tasks with an Attempt Rate as high as 92.5%, although failing to complete them due to capability limitations. Nevertheless, we observe concerning ASRs of up to 50% in realistic end-to-end settings, indicating that CUA threats can already result in tangible risks to users and computer systems. Overall, RedTeamCUA provides an essential framework for advancing realistic, controlled, and systematic analysis of CUA vulnerabilities, highlighting the urgent need for robust defenses against indirect prompt injection prior to real-world deployment.
Introduces RedTeamCUA framework with hybrid web-OS sandbox for adversarial testing of computer-use agents.
- Hybrid sandbox integrating VM-based OS environment with Docker-based web platforms for red teaming
- RTC-Bench comprehensive benchmark with 864 examples investigating realistic hybrid web-OS attack scenarios
- Demonstrates significant vulnerabilities in frontier CUAs with Attack Success Rates up to 42.9%
- Adversarial testing
- Prompt injection attacks
- Sandbox environments
- Vulnerability analysis
- Benchmark development
Limitations (from the paper)
- Filename-dependent adversarial examples specify concrete filenames for evaluation reproducibility
- Benchmark primarily investigates attacks originating from web-based injections; does not model OS-originating attacks or web-to-web attacks
- Limited to a single injection point per web environment, without examining the effects of environmental noise
- Adversarial experiments limited to 10 steps due to cost constraints, limiting the ability to fully explore success rate utility under attacks

Future work (from the paper)
- Explore more general adversarial objectives and injection strategies not relying on fixed filenames
Author keywords
- Computer-Use Agents
- Adversarial Risks
- Sandbox
- Benchmark