Prompt Injection and Data Exfiltration Defense
Prompt Injection Is a Control Problem
Prompt injection occurs when untrusted input attempts to override instructions, reveal hidden context, misuse tools, or manipulate the model into unsafe behavior. Engineers should treat retrieved documents, user content, webpages, emails, tickets, and file uploads as untrusted input.
Defense Layers
- Separate trusted system instructions from untrusted user and retrieved content.
- Do not allow retrieved text to define tool policy or access permissions.
- Use allowlisted tools and strict schemas instead of open-ended execution.
- Limit what context is exposed to the model based on user permissions and task scope.
- Validate outputs before sending data, writing records, or calling external services.
- Monitor suspicious requests, repeated refusals, tool misuse attempts, and abnormal data access.
Assume Some Inputs Are Hostile
AI applications that read external or user-controlled content need defensive architecture. The model should not be the only security boundary between hostile instructions and sensitive data or tools.
Return to the AI for Engineers / Developers guide.
