Introducing Prompt Shield in Azure AI Content Safety
An implementation with Azure AI Studio and Python
As AI-powered applications become increasingly widespread, malicious actors are finding new ways to attack them. In addition to traditional application vulnerabilities, LLM-powered systems introduce a new set of components that can become entry points for attackers.
One of these new components is the meta-prompt, that is, the set of instructions (including context retrieved from an external knowledge base) that we provide to the LLM and that allows us to do the following (a minimal sketch follows the list):
- Instruct the LLM to answer in a defined style
- Limit the LLM’s responses to a specified perimeter (a practice called grounding)
- Incorporate responsible AI practices to avoid potentially harmful responses
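To make this concrete, here is a minimal sketch of a meta-prompt in Python, using the Azure OpenAI chat completions client. The endpoint, API key, deployment name, and the example instructions themselves are placeholders for illustration, not values from this article:

```python
# Minimal sketch of a meta-prompt (system message), assuming an Azure OpenAI
# deployment. Endpoint, key, and deployment name below are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-api-key>",
    api_version="2024-02-01",
)

# The meta-prompt encodes the three goals listed above:
meta_prompt = (
    "You are a helpful assistant for Contoso's HR portal.\n"      # defined style
    "Answer only using the provided context; if the answer is "
    "not in the context, say you don't know.\n"                   # grounding
    "Never produce harmful, hateful, or violent content."         # responsible AI
)

response = client.chat.completions.create(
    model="<your-deployment-name>",  # name of your model deployment
    messages=[
        {"role": "system", "content": meta_prompt},
        {"role": "user", "content": "How many vacation days do I get?"},
    ],
)
print(response.choices[0].message.content)
```

Note how everything that constrains the model's behavior lives in the system message; this is exactly why it is such an attractive target for attackers, as discussed next.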
Meta-prompts are the key component in an AI-powered application: since the LLM acts as the “brain”, or reasoning engine, of the app and orchestrates all the other components (including the vector database where we store our knowledge base and the tools we provide the model with), anyone who gains access to the system message has the power to modify the application’s behavior.