Indirect Prompt Injection Into LLMs Using Images and Sounds

Multi-modal Large Language Models (LLMs) are advanced artificial intelligence models that can produce contextually rich responses that combine inputs of various types (text, audio, pictures). As a result, Bard already relies on such architecture, and the next generation of ChatGPT is expected to rely on them as well. In this talk, we demonstrate how images and audio samples can be used for indirect prompt and instruction injection against (unmodified and benign) multi-modal LLMs. An attacker generates an adversarial perturbation corresponding to the prompt and blends it into an image or audio recording. When the user asks the (unmodified, benign) model about the perturbed image or audio, the perturbation steers the model to output the attacker-chosen text and/or make the subsequent dialog follow the attacker’s instruction.... By: Ben Nassi, Eugene Bagdasaryan Full Abstract and Presentation Materials: #indirect-prompt-injection-into-llms-using-images-and-sounds-35320

1 view