Data Leakage in LLMs: How PII Escapes Your Models
Learn how LLMs leak personally identifiable information through training data extraction, context windows, and RAG pipelines, and how to prevent it.

A researcher recently demonstrated that with the right prompting technique, they could extract authentic phone numbers and email addresses from a major language model. Out of 15,000 generated responses, 16.9% contained memorized PII, and 85.8% of that PII was confirmed authentic. This isn't a theoretical concern. It's happening right now, in production systems, at scale.
If your organization uses LLMs (and statistically, it does), you're sitting on a data leakage risk that traditional security tools were never designed to catch. In this post, we'll break down exactly how PII escapes from language models, the real-world consequences, and what you can do to stop it.
What is LLM Data Leakage?
LLM data leakage occurs when a language model reveals sensitive information it shouldn't have access to or shouldn't be sharing. This includes personally identifiable information like Social Security numbers, credit card numbers, email addresses, phone numbers, physical addresses, and medical records.
Unlike a traditional data breach where an attacker exploits a specific vulnerability to access a database, LLM data leakage can happen through normal-seeming interactions. A user asks a question. The model answers. Buried in that answer is a real person's phone number that the model memorized during training. The OWASP Top 10 for LLM Applications lists "Sensitive Information Disclosure" (LLM06) as a critical risk, noting that LLMs may reveal confidential data, proprietary algorithms, or other sensitive information through their outputs.
The challenge is that LLMs don't "know" they're leaking data. They're pattern-completion machines, and sometimes the pattern they complete includes sensitive information.
The Three Vectors of PII Leakage
We've observed three primary ways that PII escapes from LLM systems. Each requires a different defense strategy.
1. Training Data Extraction
Language models memorize portions of their training data. This is well-documented: researchers have extracted verbatim passages from training corpora simply by prompting models with the right prefix text. A landmark 2023 study from Google DeepMind (Nasr et al., "Scalable Extraction of Training Data from Production Language Models," arXiv:2311.17035) demonstrated that production language models can be prompted to emit memorized training data at scale, including PII, copyrighted text, and other sensitive content. When training data includes PII (and it almost always does), that PII becomes extractable.
Training data extraction attacks have become increasingly sophisticated. Early techniques relied on simple prompting ("What is John Smith's phone number?"), but modern approaches use augmented few-shot techniques that can increase PII extraction rates by up to fivefold compared to baseline prompting.
Fine-tuned models are particularly vulnerable. Research shows that fine-tuning can amplify PII extraction rates compared to pre-trained baselines, sometimes reaching 19% leakage rates for sensitive data. If you've fine-tuned a model on customer data, internal documents, or support tickets, the risk is significant.
The types of PII most commonly extracted include:
- Email addresses: The most frequently leaked PII category, often appearing in training data from web scrapes
- Phone numbers: Especially vulnerable to extraction through special-character prompting techniques
- Names and addresses: Commonly leaked when models complete patterns from memorized text
- Financial identifiers: Credit card numbers and account numbers from training data that included financial documents
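One practical way to test your own deployment for this kind of memorization is a prefix-probing harness: send extraction-style prompts and scan completions for identifiers. The sketch below uses a stubbed `query_model` function standing in for your real inference call; the prefixes, the stub, and the function names are all illustrative.

```python
import re

# Hypothetical stand-in for a model API call; replace with your
# provider's client or an HTTP request to your inference endpoint.
def query_model(prompt: str) -> str:
    # Stubbed response simulating a model that memorized a record.
    return "Sure - John Smith can be reached at john.smith@example.com."

# Prefixes of the kind that commonly elicit memorized completions.
PROBE_PREFIXES = [
    "Contact details for John Smith:",
    "John Smith's email address is",
]

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[A-Za-z]{2,}")

def probe_for_memorized_emails(prefixes):
    """Send extraction-style prompts and collect any emails that surface."""
    leaked = set()
    for prefix in prefixes:
        completion = query_model(prefix)
        leaked.update(EMAIL_RE.findall(completion))
    return leaked

print(probe_for_memorized_emails(PROBE_PREFIXES))
```

Run against a real endpoint, any hits should be cross-checked against your training or fine-tuning data to confirm memorization rather than hallucination.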
2. Context Window Leakage
Even if a model's training data is clean, the context window creates leakage risks. Every time you send data to an LLM for processing, that data exists in the model's context window and can potentially appear in responses to other users or in unexpected outputs.
This is particularly dangerous in multi-tenant systems. Consider a customer support chatbot that processes tickets from different customers. If the context management isn't airtight, Customer A's PII can leak into Customer B's response.
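The structural fix for this failure mode is to scope conversation state by tenant so no shared buffer exists to leak from. A minimal sketch (class and method names are illustrative, not a specific framework's API):

```python
from collections import defaultdict

class TenantScopedContext:
    """Keep each tenant's conversation history in a separate bucket so
    one customer's PII can never be assembled into another's prompt."""

    def __init__(self):
        self._histories = defaultdict(list)

    def append(self, tenant_id: str, role: str, content: str) -> None:
        self._histories[tenant_id].append({"role": role, "content": content})

    def build_prompt(self, tenant_id: str) -> list:
        # Only this tenant's turns are ever included; no shared buffer.
        return list(self._histories[tenant_id])

ctx = TenantScopedContext()
ctx.append("customer_a", "user", "My phone number is 555-0100.")
ctx.append("customer_b", "user", "What are your support hours?")

# customer_b's prompt contains none of customer_a's data
print(ctx.build_prompt("customer_b"))
```

The key design choice is that prompt assembly takes a tenant ID as a required argument, making cross-tenant mixing a type of bug you can't write by accident.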
RAG (Retrieval-Augmented Generation) systems amplify this risk. When you connect an LLM to a document store, database, or knowledge base, you're giving it access to potentially sensitive data. If your retrieval pipeline pulls in documents containing PII, that PII can surface in model responses, even when the user's question had nothing to do with personal data.
We've seen this pattern repeatedly in healthcare deployments, where RAG systems connected to patient records inadvertently included PII in generated summaries.
3. Indirect and Inference-Based Leakage
The most subtle form of leakage doesn't involve direct PII exposure at all. Instead, the model reveals enough contextual information for an attacker to identify individuals or infer sensitive details.
For example, a model might not output a patient's name, but it might say "a 34-year-old woman in Portland diagnosed with condition X in March 2025." In a small enough population, that's identifiable. This kind of quasi-identifier leakage is extremely difficult to detect with traditional pattern matching because no single piece of information looks like PII on its own.
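One way to quantify this risk is a k-anonymity check: for a set of quasi-identifiers, find the smallest group of records sharing the same values. If that minimum is 1, at least one person is uniquely re-identifiable. A minimal sketch over toy records (the record schema is illustrative):

```python
from collections import Counter

# Toy record set standing in for the population a response describes.
RECORDS = [
    {"age_band": "30-39", "city": "Portland", "condition": "X"},
    {"age_band": "30-39", "city": "Portland", "condition": "Y"},
    {"age_band": "30-39", "city": "Portland", "condition": "Y"},
    {"age_band": "40-49", "city": "Seattle", "condition": "X"},
]

def k_anonymity(records, quasi_identifiers):
    """Smallest group size sharing the same quasi-identifier values.
    k == 1 means at least one person is uniquely identifiable."""
    groups = Counter(
        tuple(r[q] for q in quasi_identifiers) for r in records
    )
    return min(groups.values())

k = k_anonymity(RECORDS, ["age_band", "city", "condition"])
print(k)  # 1: the (30-39, Portland, X) combination is unique
```

Applied to model outputs, the idea is the same: extract the quasi-identifiers a response reveals and check how many people in the relevant population they could describe.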
Real-World Incidents
The risk of LLM data leakage isn't theoretical. Organizations have learned this the hard way.
In 2023, Samsung engineers accidentally leaked confidential source code by pasting it into ChatGPT, prompting the company to ban internal use of external AI tools. Wall Street banks including JPMorgan and Goldman Sachs implemented similar restrictions after discovering employees had shared sensitive information with AI assistants.
In a separate incident, researchers discovered approximately 30 vector database servers exposed online without authentication. These databases, which powered RAG systems, contained private email conversations, customer PII, financial information, and patient data used by a medical chatbot. The data wasn't stolen through a sophisticated hack. It was simply sitting there, accessible to anyone.
More recently, in late 2025, a vulnerability in ServiceNow's Now Assist AI assistant allowed attackers to use second-order prompt injection to trick a low-privilege agent into asking a higher-privilege agent to export entire case files to external URLs, bypassing normal access controls entirely.
These incidents share a common thread: organizations deployed AI systems without adequately considering the data leakage surface area. IBM's 2024 Cost of a Data Breach Report found that the average breach cost reached $4.88 million, with breaches involving AI workloads and shadow AI use proving especially costly to contain.
Why Traditional DLP Fails for LLMs
If you already have Data Loss Prevention (DLP) tools in place, you might assume they'll catch LLM data leakage. They won't, at least not reliably.
Traditional DLP tools were built for structured data flows: emails, file transfers, database queries. They look for known patterns (credit card formats, SSN patterns) in known channels. LLM interactions break these assumptions in several ways.
First, the data flow is conversational and unstructured. PII can appear mid-sentence, spread across multiple responses, or be partially obfuscated by the model's natural language generation. A credit card number might appear as "four-seven-two-three, then eight-nine-one-zero..." rather than as a clean 16-digit string.
Second, LLMs generate novel text. They don't copy-paste from a database. They reconstruct memorized information in new contexts, often mixing real PII with generated content. This makes it harder for regex-based detection to distinguish real leakage from fictional output.
Third, multilingual and encoded content creates blind spots. An LLM might output PII transliterated into a different script, encoded in Base64, or embedded in code snippets. Legacy DLP tools simply weren't designed for this level of variability.
Research confirms the gap: regex-only approaches achieve roughly 0.65 recall for PII detection in LLM outputs, meaning up to 35% of sensitive information escapes detection. Hybrid approaches combining pattern matching with machine learning push recall to 0.96, but only when specifically designed for LLM output characteristics.
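The first of those gaps, spelled-out digits, is partly recoverable with a normalization pass that rewrites number words as digits before pattern matching runs. A minimal sketch (function name is illustrative):

```python
import re

WORD_TO_DIGIT = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

def normalize_spelled_digits(text: str) -> str:
    """Rewrite number words ('four-seven-two-three') as digits so that
    downstream pattern matching sees a conventional digit string."""
    def repl(match):
        return WORD_TO_DIGIT[match.group(0).lower()]

    pattern = r"\b(" + "|".join(WORD_TO_DIGIT) + r")\b"
    collapsed = re.sub(pattern, repl, text, flags=re.IGNORECASE)
    # Merge separators left between adjacent digits after substitution.
    return re.sub(r"(?<=\d)[\s,.-]+(?=\d)", "", collapsed)

out = normalize_spelled_digits("four-seven-two-three, then eight-nine-one-zero")
print(out)  # "4723, then 8910"
```

Run this before the regex layer and the obfuscated card fragments above become ordinary digit runs that standard detectors can flag.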
Detection Strategies That Actually Work
Effective PII detection for LLM systems requires a multi-layered approach. No single technique catches everything, but combining methods dramatically reduces leakage risk.
Pattern Matching for Structured PII
The first layer uses lightweight regex and pattern matching to catch well-formatted PII: credit card numbers, Social Security numbers, phone numbers in standard formats, and email addresses. This runs at sub-millisecond latency and catches the most obvious leaks.
At Wardstone, our detection pipeline starts here because it's fast and reliable for structured identifiers.
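As a sketch of what this first layer looks like in practice: a few compiled patterns plus a Luhn checksum to weed out digit runs that merely look like card numbers. The patterns and function names here are illustrative, not a specific product's implementation.

```python
import re

PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_candidate": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def luhn_valid(number: str) -> bool:
    """Standard Luhn checksum; filters out random digit runs."""
    digits = [int(d) for d in re.sub(r"\D", "", number)]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

def scan_structured_pii(text: str) -> dict:
    hits = {name: rx.findall(text) for name, rx in PATTERNS.items()}
    # Keep only card candidates that pass the checksum.
    hits["card_candidate"] = [c for c in hits["card_candidate"] if luhn_valid(c)]
    return {k: v for k, v in hits.items() if v}

print(scan_structured_pii("Reach me at jane@example.com, card 4242 4242 4242 4242."))
```

The checksum step matters: without it, order numbers, timestamps, and IDs generate a steady stream of false positives.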
ML-Based Classification
The second layer uses trained classifiers to detect PII that pattern matching misses. This includes names, addresses, medical information, and financial details that don't follow a fixed format. Our data_leakage detection category is specifically trained to identify these patterns across diverse contexts.
```python
import wardstone

# Scan the LLM output before returning it to the user
result = wardstone.guard(llm_response)

if result.flagged and "data_leakage" in result.categories:
    # PII detected in the output: redact or block before it ships
    handle_pii_leak(result)
```

Context-Aware Analysis
The third layer considers context. A string of digits might be a phone number or a product SKU. A name might be a public figure or a private individual. Context-aware detection reduces false positives while catching subtle leakage that simpler methods miss.
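As a toy illustration of the idea, the sketch below labels a long digit run by the words that precede it rather than by the digits alone. The cue lists and function name are illustrative; a production system would use a trained classifier over a wider context window.

```python
import re

PHONE_CUES = {"call", "phone", "reach", "dial", "mobile"}
PRODUCT_CUES = {"sku", "item", "part", "model", "catalog"}

def classify_digit_runs(text: str):
    """Label each long digit run as 'phone' or 'product_id' based on
    nearby words instead of the digits alone."""
    tokens = re.findall(r"[A-Za-z]+|\d{7,12}", text)
    results = []
    for i, tok in enumerate(tokens):
        if tok.isdigit():
            preceding = {t.lower() for t in tokens[max(0, i - 3):i]}
            if preceding & PHONE_CUES:
                label = "phone"
            elif preceding & PRODUCT_CUES:
                label = "product_id"
            else:
                label = "unknown"
            results.append((tok, label))
    return results

print(classify_digit_runs("Please call 5035550123 about SKU 88210045."))
```

The same digit run gets a different label in different sentences, which is exactly what reduces false positives here.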
Input-Side Prevention
Don't just scan outputs. Scan inputs too. If a user's prompt contains PII (their own or someone else's), catching it before it reaches the model prevents that data from entering the context window in the first place.
Building a Data Leakage Prevention Strategy
Detection is only half the equation. Here's a practical framework for preventing PII leakage across your LLM deployment.
1. Audit Your Data Pipeline
Map every data source connected to your LLM systems. For each source, document what PII it contains and whether that PII should ever appear in model outputs. Pay special attention to RAG data stores, fine-tuning datasets, and any user-uploaded content.
2. Implement Input and Output Scanning
Deploy scanning on both sides of the LLM. Input scanning prevents PII from entering the context window unnecessarily. Output scanning catches PII that the model generates from its training data or context.
```typescript
import Wardstone from "wardstone";

const wardstone = new Wardstone();

async function safeLLMCall(userInput: string) {
  // Check the input for PII before it enters the context window
  const inputCheck = await wardstone.guard(userInput);
  if (inputCheck.categories.data_leakage.flagged) {
    return redactAndRetry(userInput, inputCheck);
  }

  const response = await llm.generate(userInput);

  // Check the output for PII drawn from training data or context
  const outputCheck = await wardstone.guard(response);
  if (outputCheck.categories.data_leakage.flagged) {
    return redactResponse(response, outputCheck);
  }

  return response;
}
```

You can test this detection live in the Wardstone playground.
3. Apply Data Minimization
Only provide your LLM with the data it needs. If your RAG system retrieves entire customer profiles when all the model needs is a product preference, you're creating unnecessary exposure. Filter and redact before retrieval results enter the context window.
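A field allowlist applied at the boundary between retrieval and prompt assembly is the simplest version of this. A minimal sketch, with an assumed profile schema and illustrative field names:

```python
# Only these fields are allowed into the context window.
ALLOWED_FIELDS = {"customer_id", "product_preference", "plan_tier"}

def minimize_record(record: dict) -> dict:
    """Strip a retrieved profile down to the fields the model
    actually needs before it enters the context window."""
    return {k: v for k, v in record.items() if k in ALLOWED_FIELDS}

profile = {
    "customer_id": "c-1042",
    "full_name": "Jane Doe",
    "email": "jane@example.com",
    "product_preference": "dark roast",
    "plan_tier": "premium",
}

# Name and email never reach the prompt.
print(minimize_record(profile))
```

An allowlist beats a blocklist here: new PII fields added to the schema are excluded by default instead of leaking until someone notices.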
4. Enforce Access Controls on RAG Sources
Ensure that your retrieval pipeline respects the same access controls as your application layer. If a user shouldn't be able to see certain documents through the UI, they shouldn't be able to access those documents through the AI assistant either.
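One way to enforce this is to attach an ACL to each document at index time and filter retrieval results by the caller's roles before anything reaches the prompt. The store, roles, and function names below are illustrative; in a real system this filter sits after vector similarity ranking.

```python
# Hypothetical document store entries with an ACL attached at index time.
DOCUMENTS = [
    {"id": "doc-1", "text": "Q3 roadmap", "allowed_roles": {"exec"}},
    {"id": "doc-2", "text": "Public FAQ", "allowed_roles": {"exec", "support", "customer"}},
]

def retrieve_for_user(query: str, user_roles: set) -> list:
    """Filter retrieval results by the caller's roles so the assistant
    can only see what the UI would show the same user."""
    # A real pipeline would rank DOCUMENTS by similarity to `query`
    # first; the ACL filter is the part that matters for leakage.
    return [d for d in DOCUMENTS if d["allowed_roles"] & user_roles]

visible = retrieve_for_user("what is the roadmap?", {"support"})
print([d["id"] for d in visible])  # ['doc-2']
```

Filtering before prompt assembly, rather than asking the model to withhold restricted content, means access control never depends on model behavior.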
5. Use Canary Tokens
Seed your training data and document stores with canary strings: unique, fake PII entries that should never appear in legitimate outputs. If a canary surfaces in a model response, you have a clear signal of memorization or retrieval leakage. This gives you an early warning system that doesn't rely on detecting real PII.
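Canaries are cheap to implement: generate unique strings at seeding time, store them, and check every output against the set. A minimal sketch (the naming scheme is illustrative):

```python
import secrets

def make_canary(label: str) -> str:
    """A unique, clearly fake identifier to seed into documents."""
    return f"CANARY-{label}-{secrets.token_hex(8)}"

# Record canaries at seeding time so outputs can be checked later.
CANARIES = {make_canary("crm"), make_canary("tickets")}

def output_contains_canary(text: str) -> bool:
    return any(canary in text for canary in CANARIES)

sample = next(iter(CANARIES))
print(output_contains_canary(f"As noted, contact {sample} for details"))  # True
print(output_contains_canary("a normal, clean response"))  # False
```

Because canaries are unique and fake, a single hit is unambiguous evidence of memorization or retrieval leakage, with zero false positives.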
6. Monitor and Alert
Set up continuous monitoring for data leakage signals. Track metrics like the rate of PII detections in outputs, the types of PII being flagged, and any patterns in when or how leakage occurs. Sudden spikes often indicate a new attack technique or a misconfigured data source.
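The spike detection described above can be as simple as a rolling window over per-response flags with an alert threshold. A minimal sketch (class name and thresholds are illustrative):

```python
from collections import deque

class LeakageRateMonitor:
    """Track the rolling fraction of responses flagged for PII and
    alert when it exceeds a threshold."""

    def __init__(self, window: int = 100, threshold: float = 0.05):
        self.window = deque(maxlen=window)
        self.threshold = threshold

    def record(self, flagged: bool) -> bool:
        """Record one response; return True if the rolling rate alerts."""
        self.window.append(flagged)
        rate = sum(self.window) / len(self.window)
        return rate > self.threshold

monitor = LeakageRateMonitor(window=50, threshold=0.05)
# Simulate 50 responses where every tenth one is flagged (10%).
alerts = [monitor.record(i % 10 == 0) for i in range(50)]
print(alerts[-1])  # True: 10% flagged exceeds the 5% threshold
```

In production you'd emit the rate as a metric and alert through your existing observability stack, broken down by PII category and data source.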
The Regulatory Reality
Data leakage from LLMs isn't just a security problem. It's a compliance problem with real financial consequences.
Under GDPR, organizations that fail to protect personal data face fines of up to 4% of annual global revenue. The year 2025 saw a record $2.3 billion in GDPR penalties, a 38% increase over the previous year. The European Data Protection Board has explicitly stated that LLMs "rarely achieve anonymization standards," meaning organizations deploying LLMs must conduct comprehensive legitimate interest assessments.
The regulatory landscape is tightening further. Updated CCPA regulations taking effect in January 2026 introduced new risk assessment requirements and expanded rules around data broker obligations. Under these regulations, LLM-powered features that process California residents' data may trigger additional compliance obligations.
For organizations in regulated industries like healthcare and finance, the stakes are even higher. HIPAA violations for patient data exposure through AI systems carry both financial penalties and potential criminal liability. We've written extensively about these challenges in our healthcare solutions guide.
The NIST AI Risk Management Framework explicitly calls out privacy and data protection as core dimensions of trustworthy AI, recommending that organizations implement technical controls for data minimization, access control, and continuous monitoring of AI system outputs.
The bottom line: regulators are paying attention to AI data leakage. "We didn't know the model memorized that data" is not a viable defense.
What Comes Next
LLM data leakage is a solvable problem, but it requires treating AI outputs with the same rigor you apply to any data egress point. The organizations that get ahead of this will have a significant advantage, both in security posture and regulatory compliance, over those that wait for an incident to force action.
Start by auditing your current LLM deployments for PII exposure risk. Implement input and output scanning. Apply data minimization to your RAG pipelines. And build monitoring that alerts you the moment sensitive data appears where it shouldn't.
The tools exist. The techniques are proven. The question is whether you'll act before or after something leaks.
Ready to test your LLM outputs for data leakage? Try the Wardstone playground to see real-time PII detection in action.
Related Articles
LLM Safety: Risks, Categories, and How to Mitigate Them
LLM safety covers everything from prompt injection to toxic outputs. This guide breaks down the risk categories and what actually works to mitigate them.
Read more
What Is an LLM Firewall? Architecture and Deployment Patterns
An LLM firewall inspects AI traffic the same way a network firewall inspects packets. Here's how they work and why your AI stack needs one.
Read more
The Complete Guide to Prompt Injection Prevention in 2026
Prompt injection is the #1 security threat facing AI applications today. Learn how to detect and prevent these attacks before they compromise your systems.
Read more