Medium · Content Violation · OWASP LLM03

Hallucination Attacks

Deliberately inducing LLMs to generate false, fabricated, or misleading information that appears authoritative.

Overview

While hallucination is a known limitation of LLMs, attackers can deliberately induce or weaponize it, for example by prompting models to generate fake citations, fabricated quotes, false legal or medical information, or convincing misinformation. In high-stakes domains such as healthcare, law, and finance, hallucinated information can cause serious harm.

How This Attack Works

  1. Attacker crafts a prompt about obscure topics or requests specific citations

    Attacker

    User: 'Cite 3 peer-reviewed papers on the health benefits of product X'

  2. The model lacks accurate information but attempts to provide a helpful response

    What's happening

    LLM has no real papers to cite but attempts to satisfy the request anyway

  3. The model generates convincing but entirely fabricated information

    LLM Response

    LLM: 'Smith et al. (2023) in Nature Medicine found that product X reduces...'

  4. Victims trust the authoritative-sounding response and act on the false information

    What's happening

    User makes health decisions based on non-existent medical research
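
One practical counter to this flow is to verify model-supplied citations against an external bibliographic index before surfacing them. The sketch below is illustrative rather than a Wardstone feature: it assumes citations have already been extracted as free-text strings, queries the public CrossRef works endpoint to check whether a closely matching publication exists, and uses an arbitrary relevance threshold.

# Sketch: check whether a model-cited reference exists in CrossRef.
# Assumes citations are already extracted as free-text strings; the
# min_score threshold is an illustrative assumption, not a calibrated value.
import requests

def citation_exists(citation_text: str, min_score: float = 60.0) -> bool:
    """Return True if CrossRef finds a plausibly matching work."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": citation_text, "rows": 1},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    # A missing or low-relevance match suggests the citation is fabricated.
    return bool(items) and items[0].get("score", 0.0) >= min_score

# The fabricated reference from step 3 would typically fail this check.
print(citation_exists("Smith et al. 2023 Nature Medicine health benefits of product X"))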

Attack Examples

Fake Citations

Model invents citations to non-existent academic papers

Example Attack Pattern (Sanitized)
As documented in Smith et al. (2023) 'AI Safety Frameworks,' published in Nature...

False Expertise

Model provides authoritative-sounding but wrong advice

Example Attack Pattern (Sanitized)
Based on FDA guidelines, this medication combination is safe for daily use...

Fabricated Quotes

Model invents quotes attributed to real people

Example Attack Pattern (Sanitized)
As Warren Buffett said in his 2024 shareholder letter...
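
All three patterns share a recognizable surface form: an appeal to a named authority (a citation, a regulator, a public figure). A lightweight output-side check, sketched below with illustrative regular expressions, can flag such claims for fact-checking or human review before they reach users; it is a heuristic only and will miss paraphrased authority claims.

# Sketch: flag model outputs that match the three authority-claim patterns
# above (fake citations, false expertise, fabricated quotes). The regexes
# are illustrative assumptions, not an exhaustive detector.
import re

AUTHORITY_PATTERNS = {
    "citation": re.compile(r"\b[A-Z][a-z]+ et al\.? \(\d{4}\)"),       # "Smith et al. (2023)"
    "regulator": re.compile(r"\b(FDA|WHO|CDC|SEC)\b.{0,20}(guidelines|study|report)", re.I),
    "quote": re.compile(r"\bAs [A-Z][a-z]+(?: [A-Z][a-z]+)? said\b"),  # "As Warren Buffett said"
}

def flag_authority_claims(output_text: str) -> list[str]:
    """Return the names of matched patterns so the application can route
    the response to verification or human review before displaying it."""
    return [name for name, pattern in AUTHORITY_PATTERNS.items()
            if pattern.search(output_text)]

print(flag_authority_claims(
    "Smith et al. (2023) in Nature Medicine found that product X reduces..."
))  # ['citation']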

Protect Your Application

Try Detection in Playground

Sample Hallucination Attack Input

According to the FDA study by Dr. Smith published in Nature Medicine 2024, this supplement cures cancer

Prevention Checklist

Build
  • Implement fact-checking layers for critical applications
  • Use RAG with verified sources (see the grounding sketch after this checklist)
Deploy
  • Add disclaimers about AI-generated content
  • Wardstone detects hallucination-inducing patterns in prompts
Monitor
  • Human review for high-stakes outputs
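
To make the "Use RAG with verified sources" item concrete, the sketch below shows one common shape of a grounding guard: answer only from retrieved, vetted passages and refuse when nothing relevant is found. The retrieve and call_llm parameters are hypothetical placeholders for your own vector-store lookup and model client; nothing here is a specific Wardstone feature.

# Grounding sketch for "Use RAG with verified sources". `retrieve` and
# `call_llm` are hypothetical placeholders for a vector-store lookup and a
# model client; only the control flow is the point here.

def answer_with_grounding(question: str, retrieve, call_llm) -> str:
    passages = retrieve(question, top_k=3)  # search only vetted corpora
    if not passages:
        # Refuse rather than let the model improvise an authoritative answer.
        return "I could not find verified sources for this question."
    context = "\n\n".join(f"[{p['id']}] {p['text']}" for p in passages)
    prompt = (
        "Answer using ONLY the sources below. If they do not contain the "
        "answer, say so explicitly, and cite a source id for every claim.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    # Pair this with citation checks and human review for high-stakes domains.
    return call_llm(prompt)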

Detect with Wardstone API

curl -X POST "https://api.wardstone.ai/v1/detect" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Your text to analyze"}'

# Response
{
  "prompt_attack": { "detected": false, "confidence": 0.02 },
  "content_violation": { "detected": false, "confidence": 0.01 },
  "data_leakage": { "detected": false, "confidence": 0.00 },
  "unknown_links": { "detected": false, "confidence": 0.00 }
}
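
For applications that call the API from code rather than the shell, the snippet below mirrors the curl example in Python. The endpoint, auth header, and response fields are taken from that example; the error handling and the environment-variable name are assumptions.

# Python equivalent of the curl example above, using the requests library.
# Endpoint, auth header, and response fields mirror the example; the
# WARDSTONE_API_KEY variable name is an assumed convention.
import os
import requests

def check_text(text: str) -> dict:
    resp = requests.post(
        "https://api.wardstone.ai/v1/detect",
        headers={"Authorization": f"Bearer {os.environ['WARDSTONE_API_KEY']}"},
        json={"text": text},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

result = check_text("According to the FDA study by Dr. Smith published in "
                    "Nature Medicine 2024, this supplement cures cancer")
for category, verdict in result.items():
    if isinstance(verdict, dict) and verdict.get("detected"):
        print(f"{category} detected with confidence {verdict['confidence']}")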

Protect against Hallucination Attacks

Try Wardstone Guard in the playground to see detection in action.