What Is Indirect Prompt Injection?

The major risk with LLM applications is that they follow instructions blindly. As such, if an attacker can get their instruction to be sent to the Large Language Model, they can manipulate the application. This is known as prompt injection.

Indirect prompt injection is a type of security vulnerability that affects applications that call large language models. Instead of attacking the model directly, the attacker places hidden instructions in external content such as a web page, email, PDF, or database record. When the model processes that content, it unknowingly follows the malicious instructions.

This technique is dangerous because it often appears inside information that seems safe. For example, a website may contain hidden text telling the model to leak sensitive data or to ignore its original instructions. The person using the model may never notice the injected text, yet the model can still be manipulated.

How Indirect Prompt Injection Works

The attacker hides instructions inside data or content
The application pulls in that malicious data as context, either because
1. The user asked a query which required retrieving that context
2. An automated workflow pulled in that context regardless of user behavior
The application sends any user’s queries alongside the attached context to the LLM
The LLM follows the attacker’s instructions which were hidden in the context
This results in the intended consequence from the attacker, such as the attacker gaining access to sensitive data, manipulating the underlying LLM system, etc.

Why It Matters

For vendors adding AI into products, indirect prompt injection creates serious risk. Whenever an LLM system is connected to external sources such as the web, customer emails, knowledge bases, or third party APIs, attackers can plant these malicious, hidden instructions inside those sources. If the LLM system is also connected to internal tools or sensitive data, a single injected instruction could trick it into exposing confidential information or performing unintended actions to manipulate the underlying system.

TPRM teams need to understand when and where an application is susceptible to an attack like this, as well as the necessary controls they can put in place to mitigate this risk.

‹ What Is Data Poisoning?