Step 5
Gemini · OpenAI-compatible APIs
25 min
Unify LM Studio, Gemini, OpenAI, Anthropic behind the OpenAI-compatible interface and switch with a single env var.
1. The de-facto standard
The POST /v1/chat/completions schema — supported by LM Studio, Ollama, Gemini, Groq, Together, and more. Every provider accepts the same request body at its own base URL.
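A minimal sketch of that shared schema, here pointed at LM Studio's local server (the port and dummy key are LM Studio defaults; swap the base URL and key for any other provider):

```python
import requests

# The same JSON body works against any OpenAI-compatible base URL;
# only the host and the API key change per provider.
resp = requests.post(
    "http://localhost:1234/v1/chat/completions",  # LM Studio's default port
    headers={"Authorization": "Bearer lm-studio"},  # local servers accept a dummy key
    json={
        "model": "gemma-2-9b-it",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```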
2. Single client abstraction
```python
import os

from openai import OpenAI

def make_client(provider: str) -> OpenAI:
    if provider == "lmstudio":
        # LM Studio ignores the key, but the SDK requires a non-empty one
        return OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
    if provider == "gemini":
        return OpenAI(
            base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
            api_key=os.environ["GEMINI_API_KEY"],
        )
    if provider == "openai":
        return OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    raise ValueError(f"unknown provider: {provider}")

client = make_client(os.environ.get("LLM_PROVIDER", "lmstudio"))
```
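Switching backends is then a deployment concern, not a code change. For example (model name chosen here to match the provider):

```python
# LLM_PROVIDER=gemini → this exact call now hits Google's endpoint.
resp = client.chat.completions.create(
    model="gemini-1.5-flash",
    messages=[{"role": "user", "content": "Say hi in five words."}],
)
print(resp.choices[0].message.content)
```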
3. Model name mapping
| Provider | Chat model | Embedding |
|---|---|---|
| LM Studio | gemma-2-9b-it | separate model |
| Gemini | gemini-1.5-flash · gemini-2.0-flash-exp | models/text-embedding-004 |
| OpenAI | gpt-4o-mini · gpt-4o | text-embedding-3-small |
| Groq | llama-3.1-70b-versatile | — |
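In code, the table becomes a lookup dict. The fallback in §6 assumes a MODEL_MAP shaped roughly like this (a sketch; names copied from the table above):

```python
# Per-provider model names; "chat" is required, "embedding" may be None.
MODEL_MAP = {
    "lmstudio": {"chat": "gemma-2-9b-it", "embedding": None},  # load embeddings separately
    "gemini":   {"chat": "gemini-1.5-flash", "embedding": "models/text-embedding-004"},
    "openai":   {"chat": "gpt-4o-mini", "embedding": "text-embedding-3-small"},
    "groq":     {"chat": "llama-3.1-70b-versatile", "embedding": None},
}
```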
4. Cost · latency (rough figures, 2026)

| Provider | Input / 1M tok | Output / 1M tok | p50 latency |
|---|---|---|---|
| Gemini 1.5 Flash | $0.075 | $0.30 | 500 ms |
| GPT-4o mini | $0.15 | $0.60 | 600 ms |
| Claude Haiku | $0.25 | $1.25 | 700 ms |
| Local Gemma 9B (GPU) | $0 | $0 | 100–300 ms |
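As a back-of-the-envelope check on those numbers, a toy estimator (prices are the rough table figures above, not live rates):

```python
# Rough $/1M-token prices from the table above (not live rates).
PRICES = {
    "gemini-1.5-flash": (0.075, 0.30),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-haiku": (0.25, 1.25),
}

def estimate_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of one request at the table's rough prices."""
    p_in, p_out = PRICES[model]
    return (tokens_in * p_in + tokens_out * p_out) / 1_000_000

# e.g. a 2k-token prompt with a 500-token answer on GPT-4o mini ≈ $0.0006
print(f"${estimate_cost('gpt-4o-mini', 2_000, 500):.6f}")
```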
5. Streaming
```python
stream = client.chat.completions.create(
    model=model, messages=[...], stream=True,  # messages elided here
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        yield delta
```
Pair with FastAPI's StreamingResponse to relay tokens to the client, as sketched below.
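A minimal sketch of that pairing. The /chat route and ChatRequest model are illustrative assumptions, and client and MODEL_MAP reuse the names from §2 and §3:

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):  # hypothetical request body
    message: str

@app.post("/chat")
def chat(req: ChatRequest):
    def token_stream():
        stream = client.chat.completions.create(
            model=MODEL_MAP["lmstudio"]["chat"],  # assumes the §3 mapping
            messages=[{"role": "user", "content": req.message}],
            stream=True,
        )
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:
                yield delta
    return StreamingResponse(token_stream(), media_type="text/plain")
```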
6. Fallback
```python
import logging
logger = logging.getLogger(__name__)

def chat_with_fallback(messages):
    # Cheapest first; fall through to the next provider on any error.
    for p in ["lmstudio", "gemini", "openai"]:
        try:
            return make_client(p).chat.completions.create(
                model=MODEL_MAP[p]["chat"], messages=messages,
            )
        except Exception as e:
            logger.warning(f"{p} failed: {e}")
    raise RuntimeError("all providers failed")
```
Local → free quota → paid. For async servers, use AsyncOpenAI instead of blocking calls, as sketched below.
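A sketch of the same fallback with the async client. make_async_client is a hypothetical twin of make_client, reusing §2's base URLs and keys, and MODEL_MAP and logger come from the earlier sections:

```python
from openai import AsyncOpenAI

def make_async_client(provider: str) -> AsyncOpenAI:
    # Hypothetical helper: same base URLs and keys as make_client in §2.
    kwargs = {
        "lmstudio": {"base_url": "http://localhost:1234/v1", "api_key": "lm-studio"},
        "gemini": {
            "base_url": "https://generativelanguage.googleapis.com/v1beta/openai/",
            "api_key": os.environ["GEMINI_API_KEY"],
        },
        "openai": {"api_key": os.environ["OPENAI_API_KEY"]},
    }[provider]
    return AsyncOpenAI(**kwargs)

async def chat_with_fallback_async(messages):
    for p in ["lmstudio", "gemini", "openai"]:
        try:
            return await make_async_client(p).chat.completions.create(
                model=MODEL_MAP[p]["chat"], messages=messages,
            )
        except Exception as e:
            logger.warning(f"{p} failed: {e}")
    raise RuntimeError("all providers failed")
```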
7. Gotchas
- Model typos — verify names with client.models.list() (see the check below)
- Token limits differ per provider (LM Studio sets the context window when the model is loaded)
- Sync vs async — openai ships AsyncOpenAI as a separate client class
- API key leakage — env vars only, never logs
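A quick guard against the first gotcha, assuming the client and MODEL_MAP from the earlier sections:

```python
# Ask the server what it actually serves and fail fast on typos.
available = {m.id for m in client.models.list().data}
wanted = MODEL_MAP[os.environ.get("LLM_PROVIDER", "lmstudio")]["chat"]
if wanted not in available:
    raise SystemExit(f"model {wanted!r} not available; server offers: {sorted(available)}")
```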
Closing
The OpenAI-compatible interface reduces vendor lock-in. One env var switches between dev, free quota, and production.
Next
- 06-prompt-design