OpenAI-Compatible Proxy for Multi-Provider AI Models
AI ModelGate exposes a single POST /v1/chat/completions endpoint that is fully compatible with the OpenAI API specification. Point the official OpenAI SDK — or any OpenAI-compatible library — at ModelGate and gain instant access to 300+ models across 9 providers, with built-in PII redaction, budget enforcement, and smart cost routing.
Migrate your existing OpenAI integration in two lines of code: change the baseURL and the apiKey. Everything else — your model names, message format, streaming, function calling — works exactly the same.
Why This Matters
Directly integrating with each AI provider creates fragile, expensive architectures that are painful to maintain and impossible to govern centrally.
- Vendor lock-in — Direct API integrations tie your codebase to a single provider. Switching from OpenAI to Anthropic means rewriting every API call, message format, and error handler.
- SDK sprawl — Each provider has its own SDK, authentication scheme, and response format. Your dependency tree grows, and so does the surface area for breaking changes.
- No unified governance — PII filtering, cost limits, and audit logging must be reimplemented for every provider integration. Miss one and you have a compliance gap.
- Cost opacity — Comparing prices across providers requires manual spreadsheet work. You can't programmatically route to the cheapest option without building your own routing layer.
The Problem with Multiple AI APIs
A typical production application may use Groq for fast inference, OpenAI for complex reasoning, Anthropic for long-context tasks, and Mistral for code generation. Each requires its own integration:
| Provider | API Format | Auth Method |
|---|---|---|
| OpenAI | REST + SDK | Bearer token |
| Anthropic | Custom Messages API | x-api-key header |
| Groq | OpenAI-compatible | Bearer token |
| Together.ai | OpenAI-compatible | Bearer token |
| Google Gemini | Vertex / Gemini API | OAuth / API key |
| xAI | OpenAI-compatible | Bearer token |
| Mistral AI | OpenAI-compatible | Bearer token |
| AWS Bedrock | Bedrock API | AWS SigV4 |
| DeepInfra | OpenAI-compatible | Bearer token |
That is 9 SDKs, 4 different authentication schemes, 4 distinct API formats, and 9 separate error-handling paths. Each provider upgrade is a potential breaking change across your entire stack.
What Is an OpenAI-Compatible Proxy?
An OpenAI-compatible proxy accepts requests in the exact format the OpenAI API expects, then translates and routes them to the correct downstream provider. Your application code uses the standard OpenAI SDK — it never needs to know which provider is actually serving the request.
Single endpoint: POST https://api.aimodelgate.ai/v1/chat/completions
What the proxy handles for you
- Protocol translation — Converts the OpenAI message format to the Anthropic Messages API, Google Vertex, AWS Bedrock SigV4, etc.
- Authentication — One ModelGate API key replaces 9 provider credentials. In BYOK mode, your stored keys are decrypted and injected at request time.
- PII & DLP — The AI Firewall scans for 28 entity types. Use REDACT to mask spans or BLOCK to reject the call. Tune with strict, balanced, or relaxed sensitivity, plus custom regex for internal IDs and proprietary tokens.
- Model-tier controls — Restrict or block premium / high-cost model tiers per project so only approved classes reach the upstream API.
- Budget enforcement — Pre-flight balance checks prevent overspending in Managed Mode (a client-side error-handling sketch follows this list).
- Response normalization — All provider responses are returned in the standard OpenAI response format, regardless of the upstream provider.
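Policy outcomes surface through the same error channel the OpenAI SDK already gives you. Here is a minimal client-side sketch; the mapping of policy outcomes to specific HTTP status codes is an assumption to verify against the ModelGate docs, not a documented contract:

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "os_hub_your_key_here",
  baseURL: "https://api.aimodelgate.ai/v1",
});

try {
  const response = await client.chat.completions.create({
    model: "oah/llama-3-70b",
    messages: [{ role: "user", content: "My SSN is 123-45-6789" }],
  });
  // REDACT mode: the call succeeds, with the sensitive span masked upstream
  console.log(response.choices[0].message.content);
} catch (err) {
  // BLOCK mode or a failed pre-flight balance check surfaces as an API error.
  // Which status code maps to which policy outcome is an assumption to verify.
  if (err instanceof OpenAI.APIError) {
    console.error(`Request rejected (${err.status}): ${err.message}`);
  } else {
    throw err;
  }
}
```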
Privacy, policy & reporting
- Stateless path — The proxy does not store full prompts for auditing; governance is metadata-only (IDs, counts, policy outcomes, routing) to support data sovereignty and minimize sensitive retention.
- Per-project dashboards — Usage, policy hits, and spend roll up per API key / project for operational reporting.
- Policy versioning — Policies ship as immutable versions, so you always know which rules were active for a given request window.
Migrate in Two Lines of Code
If your application already uses the OpenAI SDK, migration is a two-line change: the apiKey and the baseURL. The first example below shows a direct OpenAI integration; the examples after it show the same call routed through ModelGate.
import OpenAI from "openai";const client = new OpenAI({apiKey: process.env.OPENAI_API_KEY,// baseURL defaults to https://api.openai.com/v1});const response = await client.chat.completions.create({model: "gpt-4",messages: [{ role: "user", content: "Explain quantum computing" }],max_tokens: 512,});console.log(response.choices[0].message.content);
The same pattern works in every language. Here are complete examples in TypeScript, Python, LangChain, and cURL:
import OpenAI from "openai";
const client = new OpenAI({
apiKey: "os_hub_your_key_here",
baseURL: "https://api.aimodelgate.ai/v1",
});
const response = await client.chat.completions.create({
model: "oah/llama-3-70b", // virtual model → smart-routed
messages: [
{ role: "user", content: "Explain quantum computing" }
],
max_tokens: 512,
});
console.log(response.choices[0].message.content);
// Response headers:
// x-dlp-latency: 12 (DLP scan time)
// x-request-id: req_xxxx (audit trail)
// x-modelgate-model: llama-3-70b
// x-modelgate-provider: groq
```

```python
from openai import OpenAI

client = OpenAI(
    api_key="os_hub_your_key_here",
    base_url="https://api.aimodelgate.ai/v1",
)

response = client.chat.completions.create(
    model="oah/llama-3-70b",  # virtual model → smart-routed
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="oah/gpt-4.1-mini",
    openai_api_key="os_hub_your_key_here",
    openai_api_base="https://api.aimodelgate.ai/v1",
    max_tokens=512,
)

response = llm.invoke("Explain quantum computing")
print(response.content)

# All LangChain features work: chains, agents, tools, streaming.
# PII redaction and budget enforcement happen transparently.
```

```bash
curl -X POST https://api.aimodelgate.ai/v1/chat/completions \
-H "Authorization: Bearer os_hub_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"model": "oah/llama-3-70b",
"messages": [
{"role": "user", "content": "Explain quantum computing"}
],
"max_tokens": 512
}'
```

Example Request and Response
Request:

```json
{
"model": "oah/llama-3-70b",
"messages": [
{
"role": "user",
"content": "Explain quantum computing in simple terms"
}
],
"max_tokens": 256,
"temperature": 0.7
}
```

Response:

```json
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1709234567,
"model": "oah/llama-3-70b",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Quantum computing uses quantum bits (qubits) instead of classical bits..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 12,
"completion_tokens": 87,
"total_tokens": 99
}
}
```

Smart Provider Routing
When you use a virtual model name (prefixed with oah/), ModelGate's Smart Router automatically selects the best provider for that request:
- Cost optimization — The router indexes pricing across all providers that serve the requested model and selects the cheapest available option on a best-effort basis. Teams often see roughly 40–60% lower spend on routed open-weight traffic versus sticking to a single default provider.
- Availability-aware — If your primary provider is down or rate-limited, the router can fall back to an alternative that serves the same model (for open-source models available on multiple providers).
- BYOK passthrough — If you have BYOK keys stored for a provider, the router will use your credentials (zero ModelGate cost). If not, it falls back to Managed Mode and deducts from your wallet.
```typescript
// If you need a specific provider, use their native model ID:
const response = await client.chat.completions.create({
model: "groq/llama-3.3-70b-versatile", // forces Groq
messages: [{ role: "user", content: "Hello" }],
});
// Or force Together.ai:
// model: "together/meta-llama/Llama-3.3-70B-Instruct"
// ModelGate still applies PII redaction and budget checks,
// but skips the Smart Router's provider selection.
```

Supported OpenAI Features
The proxy supports the full /v1/chat/completions specification. Everything you use with the OpenAI SDK works through ModelGate, including message formats, sampling parameters, streaming, and function calling.
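Streaming, for example, works exactly as it does against the OpenAI API. A minimal sketch, reusing the placeholder key and virtual model from the examples above:

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "os_hub_your_key_here",
  baseURL: "https://api.aimodelgate.ai/v1",
});

// stream: true returns an async iterable of delta chunks,
// just as it does against api.openai.com
const stream = await client.chat.completions.create({
  model: "oah/llama-3-70b",
  messages: [{ role: "user", content: "Explain quantum computing" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```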
Benefits of a Unified AI Proxy
Zero vendor lock-in
Switch between providers by changing a model name — not rewriting your integration. Move from GPT-4.1 to Claude Sonnet 4.6 or Llama 4 Maverick without touching your SDK code.
Single dependency
One SDK (openai), one endpoint, one API key. Remove the Anthropic, Groq, Google, and Mistral SDKs from your dependency tree.
Automatic cost optimization
The Smart Router finds the cheapest provider for each open-source model — often ~40–60% savings versus a single-vendor default. Wallet enforcement keeps spend visible and capped.
Centralized governance
PII redaction, prompt injection detection, and DLP policies apply to every provider through one control plane — no per-provider reimplementation.
Built-in audit trail
Every request is tagged with a correlation ID, scan timing, model used, and provider selected. Debugging and compliance reporting are built in.
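You can capture this metadata in code, not just in dashboards. A minimal sketch using the Node SDK's .withResponse() helper to read the headers shown in the TypeScript example above (check that your installed openai package version ships this helper):

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "os_hub_your_key_here",
  baseURL: "https://api.aimodelgate.ai/v1",
});

// .withResponse() exposes the raw HTTP response alongside the parsed body
const { data, response } = await client.chat.completions
  .create({
    model: "oah/llama-3-70b",
    messages: [{ role: "user", content: "Hello" }],
  })
  .withResponse();

// Forward the governance metadata to your own audit pipeline
console.log({
  requestId: response.headers.get("x-request-id"),
  dlpLatencyMs: response.headers.get("x-dlp-latency"),
  model: response.headers.get("x-modelgate-model"),
  provider: response.headers.get("x-modelgate-provider"),
  answer: data.choices[0].message.content,
});
```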
Future-proof
New providers and models are added to ModelGate without any code changes on your side. Use them immediately via virtual model names.
Configuration Reference
For most applications, you only need two environment variables:
```bash
# Required: your ModelGate API key (os_hub_* or oah_* for project-scoped)
OPENAI_API_KEY=os_hub_your_key_here
# Required: ModelGate endpoint
OPENAI_BASE_URL=https://api.aimodelgate.ai/v1
# Optional: default model for your application
DEFAULT_MODEL=oah/llama-3-70b
```

Many frameworks (LangChain, LlamaIndex, Vercel AI SDK) read OPENAI_API_KEY and OPENAI_BASE_URL automatically. Setting these environment variables may be all you need — zero code changes.
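With those variables set, client construction needs no arguments at all. A minimal sketch; note that automatic pickup of OPENAI_BASE_URL depends on your SDK version, and DEFAULT_MODEL is an application-level convention from the snippet above, not something the SDK reads:

```typescript
import OpenAI from "openai";

// No explicit apiKey or baseURL: the SDK reads OPENAI_API_KEY, and recent
// versions also honor OPENAI_BASE_URL (verify for your installed version)
const client = new OpenAI();

const response = await client.chat.completions.create({
  // DEFAULT_MODEL is our own convention, with a fallback virtual model
  model: process.env.DEFAULT_MODEL ?? "oah/llama-3-70b",
  messages: [{ role: "user", content: "Hello" }],
});

console.log(response.choices[0].message.content);
```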
Try the Unified API
Create an account, get your API key, and point your OpenAI SDK at ModelGate. Every request is automatically scanned for PII, budget-checked, and routed to the best available provider — from your very first API call.
Related Documentation
- AI Gateway with PII Redaction — How 28-entity detection protects every request
- Prompt-Level DLP & PII Redaction — Architecture, latency benchmarks & detection pipeline
- LLM Budget Enforcement — Token quotas, threshold alerts & recursive loop protection
- OpenRouter Alternative — AI gateway with built-in governance
- Vercel AI Gateway Alternative — Active security vs passive logging
- Quickstart — Connect your first application in 2 minutes
- Billing & Wallet Docs — Credit system, top-ups, and deduction mechanics
- Model Catalog — Pricing across 300+ models and 9 providers
- Enterprise Security & Trust Center
- Product Roadmap — Phase 1.1 Budget Enforcement & beyond