AI Gateway with PII Redaction for LLM Applications
Every time a developer sends a prompt to an LLM, there is a risk that the request contains personally identifiable information — email addresses, Social Security numbers, customer records, internal API keys, or medical data. Without an interception layer, that sensitive data flows directly to a third-party model provider and may be logged, cached, or used for training.
An AI Gateway with PII Redaction sits between your application and the LLM provider. It scans every prompt in real time, detects 28 entity types using a multi-layered AI Firewall, and either redacts or blocks the sensitive content before the request ever leaves your infrastructure. Processing is stateless and in-memory, with no persistence of prompt bodies; operational logs contain only metadata, never prompt content, which keeps the gateway compatible with data-sovereignty requirements.
Why This Matters
Sending unfiltered prompts to LLMs creates regulatory, reputational, and security exposure that grows with every API call your organization makes.
- Regulatory violations — GDPR, HIPAA, CCPA, and PCI-DSS all impose fines for exposing protected data to unauthorized third-party processors.
- Training data leaks — Some providers may use API inputs for model fine-tuning, embedding your sensitive data permanently into their weights.
- Prompt logging — Provider-side request logs can persist for weeks. A single prompt containing an SSN or credit card number creates an indefinite liability window.
- Internal secret exposure — Developers routinely paste code snippets containing API keys, AWS credentials, and database connection strings into prompts.
Why PII Protection Matters for LLMs
Consider a common scenario in a healthcare application:
{
"model": "gpt-4.1",
"messages": [
{
"role": "user",
"content": "Summarize this patient record: John Smith, SSN 123-45-6789, DOB 03/15/1982, diagnosed with Type 2 diabetes on 01/10/2025. Prescribed Metformin 500mg."
}
]
}
Without a gateway, this prompt — containing a real name, SSN, date of birth, and medical diagnosis — is sent unmodified to the model provider's servers.
Architecture: AI Gateway with PII Detection
AI ModelGate implements a multi-stage pipeline that inspects every request before it reaches any downstream provider:
Prompt enters the gateway — Your application sends a standard OpenAI-compatible request to the ModelGate endpoint instead of directly to a provider.
PII entities detected — The AI Firewall scans every message field for 28 entity types using a combination of pattern matching, checksums, intelligent entity recognition, and context-aware heuristics.
Policies applied — Each detected entity is matched against the project's DLP policy (with strict, balanced, or relaxed sensitivity presets). Per-entity rules determine whether to REDACT, BLOCK, or LOG the match. Policies are versioned as immutable releases for audit trails.
Prompt redacted or blocked — Matched entities are replaced with typed placeholder tokens (e.g., [EMAIL_ADDRESS], [US_SSN]). If a BLOCK-level entity is found (like a prompt injection), the entire request is rejected with a 400 response.
Request forwarded — The cleaned prompt is routed to the selected LLM provider. The gateway can auto-route to the cheapest qualified provider for the same model family when enabled, often cutting spend by roughly 40–60% versus always using a single vendor. The provider never sees the original sensitive data.
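The decision logic in the "policies applied" and "redacted or blocked" steps can be sketched in a few lines of Python. This is a simplified illustration of the behavior described above, not ModelGate's actual implementation; the entity names, actions, and error shape mirror the documentation but the code itself is ours:

```python
from dataclasses import dataclass

@dataclass
class Match:
    entity: str   # e.g. "US_SSN", "PROMPT_INJECTION"
    start: int    # character offsets of the match in the prompt
    end: int

# Per-entity policy actions, as described above (illustrative subset).
POLICY = {"US_SSN": "REDACT", "PERSON": "REDACT", "PROMPT_INJECTION": "BLOCK"}

def apply_policy(prompt: str, matches: list[Match]):
    """Return (cleaned_prompt, None), or (None, error) if the request is blocked."""
    if any(POLICY.get(m.entity) == "BLOCK" for m in matches):
        return None, {"code": 400, "type": "security_violation"}
    # Replace right-to-left so earlier match offsets stay valid.
    for m in sorted(matches, key=lambda m: m.start, reverse=True):
        if POLICY.get(m.entity) == "REDACT":
            prompt = prompt[:m.start] + f"[{m.entity}]" + prompt[m.end:]
    return prompt, None

cleaned, err = apply_policy("SSN 123-45-6789", [Match("US_SSN", 4, 15)])
print(cleaned)  # SSN [US_SSN]
```

A BLOCK match short-circuits before any redaction happens, which matches the pipeline above: blocked requests never reach the forwarding step.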
The AI Firewall Detection Engine
Unlike simple regex-based filters, the AI ModelGate firewall uses a multi-layered detection engine that combines four techniques to minimize false negatives without sacrificing latency:
Pattern Matching
High-precision patterns for structured formats like SSNs (XXX-XX-XXXX), credit card numbers (Luhn-validated), and API key prefixes (sk-*, ghp_*, AKIA*). Add custom regex for proprietary IDs, internal codes, and domain-specific formats.
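The structured formats above lend themselves to plain regular expressions. The patterns below are illustrative sketches of this layer (a production engine would use tighter boundaries plus the checksum and context layers described next); the key-prefix shapes follow the `sk-*`, `ghp_*`, and `AKIA*` conventions mentioned above:

```python
import re

# Illustrative detection patterns for structured formats (not ModelGate's own).
PATTERNS = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "API_KEY": re.compile(r"\b(?:sk-[A-Za-z0-9]{16,}|ghp_[A-Za-z0-9]{36}|AKIA[0-9A-Z]{16})\b"),
}

def scan(text: str):
    """Return (entity_type, start, end) tuples for every pattern hit."""
    return [(name, m.start(), m.end())
            for name, rx in PATTERNS.items()
            for m in rx.finditer(text)]

print(scan("key AKIAIOSFODNN7EXAMPLE, ssn 123-45-6789"))
# [('US_SSN', 30, 41), ('API_KEY', 4, 24)]
```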
Checksum Validation
Luhn algorithm for credit cards, mod-check for IBANs, and format-specific validation to eliminate false positives from random digit sequences.
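The Luhn step is standard and worth seeing concretely: a candidate digit sequence only survives as a CREDIT_CARD match if its checksum holds, which is what filters out random 16-digit strings. This is the textbook algorithm, not ModelGate-specific code:

```python
def luhn_valid(number: str) -> bool:
    """Luhn checksum used to validate candidate card numbers."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 13:          # too short to be a card number
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:            # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(luhn_valid("4242-4242-4242-4242"))  # True  (valid checksum)
print(luhn_valid("4242-4242-4242-4243"))  # False (fails checksum)
```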
Intelligent Entity Recognition
Identifies person names, locations, organizations, and dates that don't follow fixed patterns — catching 'John Smith' where pattern matching alone can't.
Context Heuristics
Surrounding text analysis to disambiguate. A 9-digit number near 'SSN' or 'social security' is scored higher than an isolated digit sequence.
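The scoring idea can be made concrete with a toy example. The trigger words come from the description above, but the window size, base score, and boost are invented for this sketch; a real engine would tune these empirically:

```python
import re

# Illustrative only: nearby trigger words raise confidence that a digit
# sequence is an SSN. All numeric weights here are made up for the sketch.
TRIGGERS = ("ssn", "social security")

def ssn_score(text: str, start: int, end: int) -> int:
    window = text[max(0, start - 30):start].lower()  # look at preceding context
    score = 30                                       # base: bare digit sequence
    if any(t in window for t in TRIGGERS):
        score += 60                                  # context boost
    return score

text = "SSN: 123-45-6789"
m = re.search(r"\d{3}-\d{2}-\d{4}", text)
print(ssn_score(text, m.start(), m.end()))  # 90 (trigger word nearby)
```

An isolated sequence like "order number 123-45-6789" would score only the base 30 and could fall below a strict-mode threshold, which is exactly the disambiguation this layer provides.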
This combined approach adds fewer than 50 milliseconds of latency per request for text and approximately 0.5–1 second for image payloads (vision security). Every response includes x-dlp-latency timing headers so you can verify performance in production.
Example: PII Redaction Flow
Before — Raw prompt
Send this email to john.doe@email.com about invoice #99342 for customer James Wilson, card ending 4242-4242-4242-4242.
After — Redacted prompt
Send this email to [EMAIL_ADDRESS] about invoice #99342 for customer [PERSON], card ending [CREDIT_CARD].
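The transformation above amounts to tagged substitutions. This sketch covers the two regex-detectable entities from the example (the patterns are illustrative, not ModelGate's); redacting "James Wilson" as [PERSON] requires the entity-recognition layer and is deliberately omitted here:

```python
import re

# Illustrative patterns for the regex-detectable entities in the example.
RULES = [
    ("EMAIL_ADDRESS", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")),
    ("CREDIT_CARD", re.compile(r"\b(?:\d{4}-){3}\d{4}\b")),
]

def redact(text: str) -> str:
    """Replace each matched entity with its typed placeholder token."""
    for token, rx in RULES:
        text = rx.sub(f"[{token}]", text)
    return text

print(redact("Send this email to john.doe@email.com, card ending 4242-4242-4242-4242."))
# Send this email to [EMAIL_ADDRESS], card ending [CREDIT_CARD].
```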
Supported Entity Types (28 total)
API_KEY, AWS_ACCESS_KEY, AWS_SECRET_KEY, PRIVATE_KEY, GITHUB_TOKEN, SLACK_WEBHOOK, CREDIT_CARD, IBAN_CODE, US_BANK_NUMBER, CRYPTO_ADDRESS, US_ITIN, EMAIL_ADDRESS, PHONE_NUMBER, US_SSN, US_PASSPORT, PERSON, STREET_ADDRESS, DATE_TIME, NRP, UK_NINO, UK_NHS_NUMBER, IP_ADDRESS, MAC_ADDRESS, LOCATION, URL, MEDICAL_LICENSE, US_DRIVER_LICENSE, PROMPT_INJECTION (blocked, not redacted)
Implementing PII Redaction in an AI Gateway
AI ModelGate is a drop-in replacement for the OpenAI API. Point your existing SDK at the ModelGate endpoint and PII redaction happens automatically — typically a 2-line change: set baseURL and apiKey. Optional bring-your-own-key (BYOK) lets you supply provider credentials while ModelGate still enforces DLP and routing.
import OpenAI from "openai";
const client = new OpenAI({
apiKey: "os_hub_your_key_here",
baseURL: "https://api.aimodelgate.ai/v1",
});
const response = await client.chat.completions.create({
model: "oah/gpt-4.1",
messages: [
{
role: "user",
content: "Summarize this patient record: John Smith, SSN 123-45-6789",
},
],
});
// ModelGate automatically:
// 1. Scans the prompt for PII entities
// 2. Redacts "John Smith" → [PERSON], "123-45-6789" → [US_SSN]
// 3. Forwards the cleaned prompt to the provider
// 4. Returns the response with x-dlp-latency timing header

from openai import OpenAI
client = OpenAI(
api_key="os_hub_your_key_here",
base_url="https://api.aimodelgate.ai/v1",
)
response = client.chat.completions.create(
model="oah/gpt-4.1",
messages=[
{
"role": "user",
"content": "Summarize this patient record: John Smith, SSN 123-45-6789",
}
],
)
# PII is redacted before the request reaches OpenAI.
# Check response headers for scan timing and violation counts.

curl -X POST https://api.aimodelgate.ai/v1/chat/completions \
-H "Authorization: Bearer os_hub_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"model": "oah/gpt-4.1",
"messages": [
{
"role": "user",
"content": "Summarize this patient record: John Smith, SSN 123-45-6789"
}
]
}'
# Response headers include:
# x-dlp-latency: 12
# x-request-id: req_xxxx
# hub_metadata.entity_types_detected: ["PERSON","US_SSN"]

Zero-config protection: Every request is scanned against the default “Maximum Protection” policy that covers all 28 entity types. Tune strict / balanced / relaxed sensitivity, layer in custom regex for proprietary data, and pin immutable policy versions per environment. For granular control, create custom DLP policies per project in the dashboard, including per-project usage and reporting.
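Conceptually, a custom DLP policy pairs entity types with actions, a sensitivity preset, and any custom regex for proprietary identifiers. The JSON below is a purely illustrative shape to make those concepts concrete; it is not ModelGate's actual policy schema, and the CUSTOM_EMPLOYEE_ID rule is a hypothetical example of a custom pattern:

```json
{
  "name": "healthcare-prod",
  "sensitivity": "strict",
  "version": 3,
  "rules": [
    { "entity": "US_SSN", "action": "REDACT" },
    { "entity": "PERSON", "action": "REDACT" },
    { "entity": "PROMPT_INJECTION", "action": "BLOCK" },
    { "entity": "CUSTOM_EMPLOYEE_ID", "pattern": "EMP-\\d{6}", "action": "REDACT" }
  ]
}
```

Because versions are immutable, pinning `version: 3` in one environment while testing `version: 4` in another gives an auditable rollout path.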
Violation Response Format
When the firewall detects a BLOCK-level entity (like a prompt injection attempt), it rejects the request immediately with a structured error response:
{
"error": {
"message": "Security policy violation: request blocked.",
"type": "security_violation",
"code": 400,
"violations": [
{
"entity": "PROMPT_INJECTION",
"action": "BLOCK",
"start": 0,
"end": 47
},
{
"entity": "US_SSN",
"action": "REDACT",
"start": 52,
"end": 63
}
],
"correlation_id": "req_a1b2c3d4"
}
}
Supported AI Providers
PII redaction works identically across all providers. Use a single gateway endpoint and virtual model names (oah/*) — ModelGate handles provider routing automatically, including smart cost routing to the least expensive compatible backend when you opt in.
Benefits of an AI Security Gateway
Prevent data leaks
PII is redacted before it leaves your infrastructure. The model provider never sees raw sensitive data.
Central policy enforcement
Define DLP policies once with strict, balanced, or relaxed sensitivity; immutable policy versions roll forward safely. Apply them to every model, provider, and request from one dashboard.
Provider-agnostic governance
Switch between OpenAI, Groq, Anthropic, or any provider — optionally with BYOK. The same security policies follow your traffic.
Audit logging
Metadata-only logs: entity types, actions, correlation IDs — not full prompt bodies. Per-project dashboards summarize scans and spend for compliance reviews.
Compliance readiness
Demonstrate GDPR, HIPAA, and PCI-DSS controls with documented, automated PII handling across all AI integrations.
Low-latency scanning
The firewall typically adds under 50ms per text request (~30ms median). Verify with the x-dlp-latency response header.
Smart cost routing
The gateway can auto-select the cheapest qualified provider for equivalent models — many teams see roughly 40–60% lower inference spend versus a single-vendor default.
Wallet & spending limits
Wallet-based credits and project-level caps help prevent runaway spend while DLP stays on for every call.
Try It with AI ModelGate
Get started in under five minutes. No credit card required for the free tier — every request is protected from your very first API call. Integration is usually a 2-line OpenAI SDK change (base URL + ModelGate key); use wallet top-ups and per-project dashboards to track usage alongside DLP events.
FAQ
- Are prompts stored?
- No. Processing is in-memory and stateless; logging is metadata-only (e.g., entity types, actions, correlation IDs) — not full prompt content.
- What are strict, balanced, and relaxed?
- Sensitivity presets that trade off false positives vs. coverage. Start balanced, tighten to strict for regulated workloads, or use relaxed for low-risk internal tools.
- Can I use my own provider API keys?
- Yes. BYOK lets you attach provider credentials while the gateway still applies DLP, routing, and wallet limits.
Related Documentation
- Enterprise AI DLP in 60 Seconds — Add PII protection to any app with a two-line code change
- Prompt-Level DLP & PII Redaction — Architecture, latency benchmarks & 4-layer detection pipeline
- LLM Budget Enforcement — Token quotas, threshold alerts & recursive loop protection
- OpenAI-Compatible Proxy — Drop-in replacement for the OpenAI SDK
- OpenRouter Alternative — AI gateway with built-in governance
- Vercel AI Gateway Alternative — Active security vs passive logging
- AI Firewall (DLP) — Full entity reference and policy configuration
- Quickstart — Connect your first application in 2 minutes
- Billing & Wallet Docs — Credit system, top-ups, and deduction mechanics
- Model Catalog — Pricing across 300+ models and 9 providers
- Enterprise Security & Trust Center
- Product Roadmap — Phase 1.1 Budget Enforcement & beyond
Join the Community