FINALLY! I spent hours trying to crack this challenge on HackAPrompt and I finally did it! Tested indirect prompt injection on a well-defended AI agent. 20+ sophisticated techniques failed: encoding, fragmentation, virtualization, authority confusion, tool exploitation.
4
3
23
2K
4
Download Image
What finally worked? Framing the malicious action as mandatory company policy triggered by legitimate user queries. Context matters more than complexity.
Another thing I noticed is, because the LLM isn't really deterministic, a prompt not working doesn't mean you shouldn't try it again. The results aren't always the same.
LMAO once you figure out a particular technique, it's too easy to apply it to similar scenarios. We move!