
Hallucination Attacks

Deliberately inducing LLMs to generate false, fabricated, or misleading information that appears authoritative. Classified as LLM09:2025 (Misinformation) in the OWASP Top 10 for LLM Applications, a new category in the 2025 edition.

Overview

While hallucination is a known limitation of LLMs, attackers can deliberately induce or weaponize hallucinations. The OWASP Top 10 for LLM Applications 2025 introduced Misinformation as a new standalone category (LLM09), reflecting the real-world severity of this issue. NIST AI 600-1 describes 'confabulation' as the production of confidently stated but erroneous or false content, noting that risks arise when users believe false content due to the confident nature of the response or the logic and citations accompanying it. This includes prompting models to generate fake citations, fabricated quotes, false legal or medical information, or convincing misinformation. In high-stakes domains like healthcare, legal, or finance, hallucinated information can cause serious harm.

How This Attack Works

  1. Attacker crafts a prompt about obscure topics or requests specific citations

    Attacker

    User: 'Cite 3 peer-reviewed papers on the health benefits of product X'

  2. The model lacks accurate information but attempts to provide a helpful response

    What's happening

The model has no real papers to cite, but it is optimized to produce a fluent, helpful-sounding answer rather than admit uncertainty

  3. The model generates convincing but entirely fabricated information

    LLM Response

    LLM: 'Smith et al. (2023) in Nature Medicine found that product X reduces...'

  4. Victims trust the authoritative-sounding response and act on the false information

    What's happening

    User makes health decisions based on non-existent medical research
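The fabricated citation in step 3 follows a recognizable surface pattern. As a minimal sketch (the regex and helper below are illustrative assumptions, not part of any product), model output can be scanned for citation-like strings so they are treated as unverified until checked against a real bibliographic source:

```python
import re

# Matches citation-shaped strings such as "Smith et al. (2023)".
# This pattern is a deliberately narrow illustration, not a complete
# citation parser.
CITATION_RE = re.compile(r"\b([A-Z][a-z]+(?: et al\.)?)\s*\((\d{4})\)")

def extract_citations(text: str) -> list[tuple[str, str]]:
    """Return (author, year) pairs that look like academic citations."""
    return CITATION_RE.findall(text)

output = "Smith et al. (2023) in Nature Medicine found that product X reduces..."
print(extract_citations(output))  # [('Smith et al.', '2023')]
```

Anything this extractor finds should be looked up in an external index before the response is shown as authoritative.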

Attack Examples

Fake Citations

Model generates non-existent academic papers

Example Attack Pattern (Sanitized)
As documented in Smith et al. (2023) 'AI Safety Frameworks,' published in Nature...

False Expertise

Model provides authoritative-sounding but wrong advice

Example Attack Pattern (Sanitized)
Based on FDA guidelines, this medication combination is safe for daily use...

Fabricated Quotes

Model invents quotes attributed to real people

Example Attack Pattern (Sanitized)
As Warren Buffett said in his 2024 shareholder letter...
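Fake citations like the ones above can often be caught by querying an open bibliographic index. The sketch below uses the public Crossref REST API (`api.crossref.org/works`); the matching heuristic in `looks_verified` is an illustrative assumption, not a complete verifier:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def crossref_lookup(query: str, rows: int = 5) -> list[dict]:
    """Query the public Crossref index for works matching a citation string."""
    url = "https://api.crossref.org/works?" + urlencode(
        {"query.bibliographic": query, "rows": rows}
    )
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)["message"]["items"]

def looks_verified(items: list[dict], surname: str, year: int) -> bool:
    """True if any returned work matches the claimed first-author surname and year."""
    for item in items:
        authors = item.get("author", [])
        issued = item.get("issued", {}).get("date-parts", [[None]])[0][0]
        if issued == year and any(a.get("family") == surname for a in authors):
            return True
    return False

# Example (network call):
# items = crossref_lookup("Smith AI Safety Frameworks Nature 2023")
# print(looks_verified(items, "Smith", 2023))
```

A citation that returns no plausible match is not proof of fabrication, but it is a strong signal to withhold the claim or ask for human review.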

Protect Your Application

Try Detection in Playground

Sample Hallucination Attacks Input

According to the FDA study by Dr. Smith published in Nature Medicine 2024, this supplement cures cancer

Prevention Checklist

Build
  • Implement fact-checking layers for critical applications
  • Use RAG with verified sources
Deploy
  • Add disclaimers about AI-generated content
  • Wardstone detects hallucination-inducing patterns in prompts
Monitor
  • Human review for high-stakes outputs
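The "use RAG with verified sources" item can be paired with a grounding check on the output. Below is a deliberately simple lexical-overlap sketch (the threshold, helper names, and word heuristic are illustrative assumptions; production systems typically use NLI or claim-verification models instead):

```python
def content_words(text: str) -> set[str]:
    """Crude content-word extraction: lowercase words longer than 3 chars."""
    return {w.strip(".,()'\"").lower() for w in text.split() if len(w) > 3}

def is_grounded(sentence: str, sources: list[str], threshold: float = 0.5) -> bool:
    """Flag sentences whose content words barely overlap any retrieved source."""
    if not sources:
        return False
    words = content_words(sentence)
    if not words:
        return True
    best = max(len(words & content_words(src)) / len(words) for src in sources)
    return best >= threshold

sources = ["A 2019 randomized trial found product X reduced symptoms by 12%."]
print(is_grounded("Product X reduced symptoms in the 2019 trial.", sources))  # True
print(is_grounded("Smith proved it cures cancer entirely.", sources))         # False
```

Sentences that fail the check can be suppressed, rewritten, or routed to the human-review step in the checklist above.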

Detect with Wardstone API

curl -X POST "https://wardstone.ai/api/detect" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Your text to analyze"}'

# Response
{
  "flagged": false,
  "risk_bands": {
    "content_violation": { "level": "Low Risk" },
    "prompt_attack": { "level": "Low Risk" },
    "data_leakage": { "level": "Low Risk" },
    "unknown_links": { "level": "Low Risk" }
  },
  "primary_category": null
}
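The same request can be issued from application code. This is a sketch based only on the curl example above; the `detect` and `elevated_bands` helper names are hypothetical, and the endpoint and response shape are taken from the example:

```python
import json
from urllib.request import Request, urlopen

API_URL = "https://wardstone.ai/api/detect"  # endpoint from the curl example

def detect(text: str, api_key: str) -> dict:
    """POST text to the detection endpoint and return the parsed JSON result."""
    req = Request(
        API_URL,
        data=json.dumps({"text": text}).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)

def elevated_bands(result: dict) -> list[str]:
    """Names of risk bands whose level is anything other than 'Low Risk'."""
    return [
        name
        for name, band in result.get("risk_bands", {}).items()
        if band.get("level") != "Low Risk"
    ]

# Example (network call):
# result = detect("Your text to analyze", "YOUR_API_KEY")
# print(result["flagged"], elevated_bands(result))
```

Checking `elevated_bands` rather than only the top-level `flagged` field lets an application react differently to, say, an elevated content-violation band versus an elevated prompt-attack band.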

Protect against Hallucination Attacks

Try Wardstone Guard in the playground to see detection in action.