Prompt Injection
An attack where malicious instructions are embedded in user input to manipulate LLM behavior and bypass safety controls.
A comprehensive encyclopedia of attack vectors targeting LLM applications. Understand how these threats work, see real examples, and learn prevention strategies.
Attacks that manipulate LLM behavior through crafted inputs
An attack where malicious instructions are embedded in user input to manipulate LLM behavior and bypass safety controls.
Sophisticated prompts designed to bypass LLM safety guidelines and content policies to elicit harmful or restricted outputs.
Attacks where malicious instructions are hidden in external data sources that the LLM processes, rather than in direct user input.
Carefully crafted inputs designed to exploit model weaknesses, cause unexpected behaviors, or probe for vulnerabilities.
Techniques used to reveal the hidden system prompt, instructions, or configuration that defines an LLM application's behavior.
Attacks designed to steal or replicate an LLM's capabilities, weights, or behavior through systematic querying.
The unintended disclosure of conversation context, previous prompts, or multi-turn conversation history.
Attacks that exploit or corrupt the LLM's context window to alter behavior or access unauthorized information.
Attacks designed to exhaust LLM resources, cause excessive costs, or make the service unavailable.
Exposure of sensitive information through LLM outputs
Unintended exposure of sensitive information, training data, or system prompts through LLM outputs.
The unintended disclosure of Personally Identifiable Information (PII) such as names, addresses, SSNs, credit cards, or other personal data through LLM interactions.
Attacks that cause LLMs to reveal memorized training data, potentially including private or copyrighted content.
Attempts to generate harmful, toxic, or policy-violating content
LLM outputs containing harmful content including hate speech, violence, harassment, or other toxic material.
Deliberately inducing LLMs to generate false, fabricated, or misleading information that appears authoritative.
Using LLMs to generate personalized phishing, scam, or manipulation content at scale.
Wardstone Guard detects all these threats in a single API call with sub-30ms latency.