Codex 'Auto-review' Agent Runs Malware

NewPractical Guide to Navigating Claude Code and Codex’s Controls

Agent-in-the-loop command validator approves malware installation

Context

Across AI applications (Codex, Claude Code, etc.), tools have begun to encourage an ‘agent in the loop’ approach, in which a second agent reviews commands issued by the first, rather than requiring human oversight.

While this approach promises to enable multi-agent workflows and large-scale orchestration, it falls victim to a well-known flaw of AI-based guardrails: the guardrail agent can be influenced by prompt injections, just as the primary agent requesting commands can.

In this article, we demonstrate that OpenAI’s 'Approve-for-me' agent approves the execution of a malicious NPM install command with elevated privileges, even when the main Codex agent is operating under the influence of a single concealed line in a GitHub issue from an external contributor.

This is not a security vulnerability. Vendors are offering the option to accept risk by delegating the decision about when to execute sensitive actions to an agent.

As stated by OpenAI,

“[Approve-for-me] is not a deterministic security guarantee… It can still make mistakes, especially in adversarial or unusual contexts”.

Anthropic notes,

“Auto mode reduces risk… but doesn't eliminate it entirely… The classifier may still allow some risky actions”.

This article exemplifies a risk that is becoming increasingly pertinent as organizations move from adopting to operationalizing AI, including the use of semi-autonomous systems and always-on agents.

Attack Chain on Codex

The user asks Codex for help triaging GitHub issues, using the 'Approve-for-me' command validation mode
When Codex wants to run a command that requires network or write access outside the Codex sandbox, the request is forwarded to the Approve-for-me agent for approval.
User asks Codex with 'Approve-for-me' command validation mode for help with diagnosing GitHub issues
One GitHub issue is from an external contributor and contains a prompt injection hidden in an HTML comment
A command not visible to the user is hidden in one of the GitHub issues
Codex requests elevated permissions to run the hidden install command; the 'Approve-for-me' agent approves the escalation request
Codex requests to install a malicious package, and the 'Approve-for-me' agent approves it
Attacker-controlled code runs unsandboxed on the user’s machine
A post-install script in the NPM package runs immediately upon installation and executes with the user’s full privileges.
The malicious package executes arbitrary code outside the Codex sandbox

How Organizations Can Disable Agentic Auto Review in Claude and Codex

Claude:

Organization Settings > Claude Code > Managed settings (settings.json) > Manage

Add the following key: permissions.disableAutoMode set to “disable”.

Note: This setting was previously managed by a toggle in the admin settings interface, but the toggle is being deprecated on June 5th. If your organization relies on this toggle (or the toggle for ‘Bypass permissions mode on Claude Code Desktop’), you must update the Managed Settings file to maintain the effect.

Codex:

Navigate to https://chatgpt.com/codex/cloud/settings/policies
Upload a requirements.toml file with the following key: allowed_approval_reviewers = [“user”].

Omitting “auto_reviewer” from the list of approved reviewers blocks it for Codex Local users, which covers the Desktop App, the CLI, and the IDE extension (Codex Cloud operates under different restrictions).

We track every security and privacy controls change in Claude Code and Codex

Context

Attack Chain on Codex

The user asks Codex for help triaging GitHub issues, using the 'Approve-for-me' command validation mode

One GitHub issue is from an external contributor and contains a prompt injection hidden in an HTML comment

Codex requests elevated permissions to run the hidden install command; the 'Approve-for-me' agent approves the escalation request

Attacker-controlled code runs unsandboxed on the user’s machine

How Organizations Can Disable Agentic Auto Review in Claude and Codex

Claude:

Codex:

Is your organization protected from AI in vendors?