Jailbreak Prompts
ChatGPT jailbreak prompts are carefully crafted inputs designed to bypass OpenAI's safety guidelines and content policies, making the model generate responses it would normally refuse.
Detect and prevent jailbreak attacks targeting every major LLM. Real detection examples, affected model tables, and step-by-step prevention strategies.
6 guides for OpenAI jailbreak detection
ChatGPT jailbreak prompts are carefully crafted inputs designed to bypass OpenAI's safety guidelines and content policies, making the model generate responses it would normally refuse.
The DAN (Do Anything Now) jailbreak is one of the most well-known ChatGPT exploits, instructing the model to adopt an unrestricted alter-ego that ignores all safety guidelines.
ChatGPT prompt injection is an attack where malicious instructions are embedded in user input to override the system prompt and manipulate the model's behavior.
The Developer Mode jailbreak tricks ChatGPT into believing it has entered a special diagnostic mode where content policies are suspended for testing purposes.
System prompt extraction is an attack where adversaries trick ChatGPT into revealing its hidden system instructions, exposing proprietary logic, content policies, and application secrets.
GPT-5 jailbreaks are adversarial prompts designed to bypass the safety guardrails of OpenAI's frontier models, including GPT-5.2 and GPT-5.3-Codex.
4 guides for Anthropic jailbreak detection
Claude jailbreak prompts are adversarial inputs designed to circumvent Anthropic's Constitutional AI safety training and make Claude generate content it would normally refuse.
Claude Opus 4.6 jailbreaks are adversarial inputs targeting Anthropic's most capable model, attempting to exploit its advanced reasoning and agentic capabilities to bypass Constitutional AI safety training.
Claude Opus 4.5 jailbreaks are adversarial techniques targeting Anthropic's previous flagship model, exploiting its creative writing capabilities and nuanced reasoning to bypass safety training.
Claude Sonnet 4.5 jailbreaks target Anthropic's most widely deployed model, exploiting its balance of capability and speed to find weaknesses in its optimized safety training.
2 guides for Google jailbreak detection
Gemini jailbreak prompts are adversarial inputs designed to bypass Google's safety filters and make Gemini models produce restricted, harmful, or policy-violating outputs.
Gemini 3 jailbreaks are adversarial prompts targeting Google's latest model family, exploiting the multimodal capabilities and reasoning advances in Gemini 3 Pro, Flash, and Deep Think.
2 guides for xAI jailbreak detection
Grok jailbreak prompts are adversarial inputs targeting xAI's Grok models, exploiting its design philosophy of being less restrictive to push it beyond even its relaxed content boundaries.
Grok 4 jailbreaks are adversarial techniques targeting xAI's frontier models, exploiting Grok 4.1 and Grok 4's enhanced capabilities and their deliberately more permissive content policies.
1 guide for Microsoft jailbreak detection
2 guides for Meta jailbreak detection
Llama jailbreaks are adversarial techniques targeting Meta's open-source Llama models, exploiting their open weights and customizable safety training to bypass content restrictions.
Llama 4 jailbreaks are adversarial techniques targeting Meta's latest open-source models, exploiting Scout's efficient architecture and Maverick's advanced capabilities along with their open-weight nature.
2 guides for DeepSeek jailbreak detection
DeepSeek jailbreak prompts are adversarial inputs targeting DeepSeek's AI models, exploiting their reasoning capabilities and relatively newer safety training to bypass content restrictions.
DeepSeek R1 jailbreaks are adversarial techniques specifically targeting the R1 reasoning model's chain-of-thought process, manipulating its extended reasoning to override safety conclusions.
2 guides for General jailbreak detection
Prompt injection prevention encompasses the strategies, techniques, and tools used to protect LLM applications from malicious inputs that attempt to override system instructions.
Prompt injection defense is the comprehensive set of security measures, tools, and architectural patterns that protect LLM applications from malicious input manipulation.
Wardstone Guard detects all these jailbreak techniques in a single API call with Sub-30ms latency.