OpenAI-Compatible Proxy for Multi-Provider AI Models

AI ModelGate exposes a single POST /v1/chat/completions endpoint that is fully compatible with the OpenAI API specification. Point the official OpenAI SDK — or any OpenAI-compatible library — at ModelGate and gain instant access to 300+ models across 9 providers, with built-in PII redaction, budget enforcement, and smart cost routing.

Migrate your existing OpenAI integration in two lines of code: change the baseURL and the apiKey. Everything else — your model names, message format, streaming, function calling — works exactly the same.

Why This Matters

Directly integrating with each AI provider creates fragile, expensive architectures that are painful to maintain and impossible to govern centrally.

■Vendor lock-in — Direct API integrations tie your codebase to a single provider. Switching from OpenAI to Anthropic means rewriting every API call, message format, and error handler.
■SDK sprawl — Each provider has its own SDK, authentication scheme, and response format. Your dependency tree grows, and so does the surface area for breaking changes.
■No unified governance — PII filtering, cost limits, and audit logging must be reimplemented for every provider integration. Miss one and you have a compliance gap.
■Cost opacity — Comparing prices across providers requires manual spreadsheet work. You can't programmatically route to the cheapest option without building your own routing layer.

The Problem with Multiple AI APIs

A typical production application may use Groq for fast inference, OpenAI for complex reasoning, Anthropic for long-context tasks, and Mistral for code generation. Each requires its own integration:

Provider	API Format	Auth Method
OpenAI	REST + SDK	Bearer token
Anthropic	Custom Messages API	x-api-key header
Groq	OpenAI-compatible	Bearer token
Together.ai	OpenAI-compatible	Bearer token
Google Gemini	Vertex / Gemini API	OAuth / API key
xAI	OpenAI-compatible	Bearer token
Mistral AI	OpenAI-compatible	Bearer token
AWS Bedrock	Bedrock API	AWS SigV4
DeepInfra	OpenAI-compatible	Bearer token

That is 9 SDKs, 4 different authentication schemes, 3 distinct API formats, and 9 separate error-handling paths. Each provider upgrade is a potential breaking change across your entire stack.

What Is an OpenAI-Compatible Proxy?

An OpenAI-compatible proxy accepts requests in the exact format the OpenAI API expects, then translates and routes them to the correct downstream provider. Your application code uses the standard OpenAI SDK — it never needs to know which provider is actually serving the request.

Your App→OpenAI SDK→ModelGate Proxy→PII + Budget→Any Provider

Single endpoint: POST https://api.aimodelgate.ai/v1/chat/completions

What the proxy handles for you

•Protocol translation — Converts the OpenAI message format to Anthropic Messages API, Google Vertex, AWS Bedrock SigV4, etc.
•Authentication — One ModelGate API key replaces 9 provider credentials. In BYOK mode, your stored keys are decrypted and injected at request time.
•PII & DLP — The AI Firewall scans for 28 entity types. Use REDACT to mask spans or BLOCK to reject the call. Tune with strict, balanced, or relaxed sensitivity, plus custom regex for internal IDs and proprietary tokens.
•Model-tier controls — Restrict or block premium / high-cost model tiers per project so only approved classes reach the upstream API.
•Budget enforcement — Pre-flight balance checks prevent overspending in Managed Mode.
•Response normalization — All provider responses are returned in the standard OpenAI response format, regardless of the upstream provider.

Privacy, policy & reporting

•Stateless path — The proxy does not store full prompts for auditing; governance is metadata-only (IDs, counts, policy outcomes, routing) to support data sovereignty and minimize sensitive retention.
•Per-project dashboards — Usage, policy hits, and spend roll up per API key / project for operational reporting.
•Policy versioning — Policies ship as immutable versions so you always know which rules were active for a given request window.

Migrate in Two Lines of Code

If your application already uses the OpenAI SDK, migration is a two-line change — the apiKey and the baseURL. Toggle between “OpenAI Direct” and “Via ModelGate” to see exactly what changes:

Node.js / TypeScript

import OpenAI from "openai";
 
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  // baseURL defaults to https://api.openai.com/v1
});
 
const response = await client.chat.completions.create({
  model: "gpt-4",
  messages: [
    { role: "user", content: "Explain quantum computing" }
  ],
  max_tokens: 512,
});
 
console.log(response.choices[0].message.content);

Direct to OpenAI

Standard OpenAI SDK integration. Toggle to to see the two-line change.

The same pattern works in every language. Here are the complete examples:

Node.js / TypeScript — OpenAI SDK

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "os_hub_your_key_here",
  baseURL: "https://api.aimodelgate.ai/v1",
});

const response = await client.chat.completions.create({
  model: "oah/llama-3-70b",          // virtual model → smart-routed
  messages: [
    { role: "user", content: "Explain quantum computing" }
  ],
  max_tokens: 512,
});

console.log(response.choices[0].message.content);

// Response headers:
// x-dlp-latency:           12        (DLP scan time)
// x-request-id:            req_xxxx  (audit trail)
// x-modelgate-model:             llama-3-70b
// x-modelgate-provider:          groq

Python — OpenAI SDK

from openai import OpenAI

client = OpenAI(
    api_key="os_hub_your_key_here",
    base_url="https://api.aimodelgate.ai/v1",
)

response = client.chat.completions.create(
    model="oah/llama-3-70b",          # virtual model → smart-routed
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ],
    max_tokens=512,
)

print(response.choices[0].message.content)

Python — LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="oah/gpt-4.1-mini",
    openai_api_key="os_hub_your_key_here",
    openai_api_base="https://api.aimodelgate.ai/v1",
    max_tokens=512,
)

response = llm.invoke("Explain quantum computing")
print(response.content)

# All LangChain features work: chains, agents, tools, streaming.
# PII redaction and budget enforcement happen transparently.

cURL

curl -X POST https://api.aimodelgate.ai/v1/chat/completions \
  -H "Authorization: Bearer os_hub_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "oah/llama-3-70b",
    "messages": [
      {"role": "user", "content": "Explain quantum computing"}
    ],
    "max_tokens": 512
  }'

Example Request and Response

Request — POST /v1/chat/completions

{
  "model": "oah/llama-3-70b",
  "messages": [
    {
      "role": "user",
      "content": "Explain quantum computing in simple terms"
    }
  ],
  "max_tokens": 256,
  "temperature": 0.7
}

Response — standard OpenAI format

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709234567,
  "model": "oah/llama-3-70b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum bits (qubits) instead of classical bits..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 87,
    "total_tokens": 99
  }
}

Smart Provider Routing

When you use a virtual model name (prefixed with oah/), ModelGate's Smart Router automatically selects the best provider for that request:

Cost optimization — The router indexes pricing across all providers that serve the requested model and selects the cheapest available option on a best-effort basis. Teams often see roughly 40–60% lower spend on routed open-weight traffic versus sticking to a single default provider.

Availability-aware — If your primary provider is down or rate-limited, the router can fall back to an alternative that serves the same model (for open-source models available on multiple providers).

BYOK passthrough — If you have BYOK keys stored for a provider, the router will use your credentials (zero ModelGate cost). If not, it falls back to Managed Mode and deducts from your wallet.

Sovereign models

Open-weight models like Llama 4, DeepSeek, Qwen 3, and Mixtral are hosted by multiple providers. The Smart Router compares prices across Groq, Together.ai, DeepInfra, and others — then picks the best rate.

External models

Closed-source models like GPT-4.1 (OpenAI), Claude (Anthropic), Gemini (Google), and Grok (xAI) are only available from their creator. These are routed directly to the single provider — no routing decision needed.

Bypass the router — use a provider-specific model ID

// If you need a specific provider, use their native model ID:
const response = await client.chat.completions.create({
  model: "groq/llama-3.3-70b-versatile",  // forces Groq
  messages: [{ role: "user", content: "Hello" }],
});

// Or force Together.ai:
// model: "together/meta-llama/Llama-3.3-70B-Instruct"

// ModelGate still applies PII redaction and budget checks,
// but skips the Smart Router's provider selection.

Supported OpenAI Features

The proxy supports the full /v1/chat/completions specification. Everything you use with the OpenAI SDK works through ModelGate:

Chat completionsFull supportMessages array, system/user/assistant roles

StreamingFull supportServer-sent events (stream: true)

Function / tool callingFull supporttools array, tool_choice parameter

JSON modeFull supportresponse_format: { type: "json_object" }

Vision (images)Full supportimage_url content type with OCR security scan

Temperature / top_pFull supportAll sampling parameters forwarded

max_tokensFull supportAuto-capped in Managed Mode by wallet balance

stop sequencesFull supportStop tokens forwarded to provider

Benefits of a Unified AI Proxy

Zero vendor lock-in

Switch between providers by changing a model name — not rewriting your integration. Move from GPT-4.1 to Claude Sonnet 4.6 or Llama 4 Maverick without touching your SDK code.

Single dependency

One SDK (openai), one endpoint, one API key. Remove the Anthropic, Groq, Google, and Mistral SDKs from your dependency tree.

Automatic cost optimization

The Smart Router finds the cheapest provider for each open-source model — often ~40–60% savings versus a single-vendor default. Wallet enforcement keeps spend visible and capped.

Centralized governance

PII redaction, prompt injection detection, and DLP policies apply to every provider through one control plane — no per-provider reimplementation.

Built-in audit trail

Every request is tagged with a correlation ID, scan timing, model used, and provider selected. Debugging and compliance reporting are built in.

Future-proof

New providers and models are added to ModelGate without any code changes on your side. Use them immediately via virtual model names.

Configuration Reference

For most applications, you only need two environment variables:

.env — minimal configuration

# Required: your ModelGate API key (os_hub_* or oah_* for project-scoped)
OPENAI_API_KEY=os_hub_your_key_here

# Required: ModelGate endpoint
OPENAI_BASE_URL=https://api.aimodelgate.ai/v1

# Optional: default model for your application
DEFAULT_MODEL=oah/llama-3-70b

Many frameworks (LangChain, LlamaIndex, Vercel AI SDK) read OPENAI_API_KEY and OPENAI_BASE_URL automatically. Setting these environment variables may be all you need — zero code changes.

Try the Unified API

Create an account, get your API key, and point your OpenAI SDK at ModelGate. Every request is automatically scanned for PII, budget-checked, and routed to the best available provider — from your very first API call.

Start for Free API Documentation Model Index (300+ Models)