OpenAI API integration guide for production apps

How to integrate the OpenAI API into a real product: keys, rate limits, retries, cost control, streaming, and the gotchas the docs gloss over.

Treat the model like any other external API

It will rate-limit you, slow down, return malformed JSON, and occasionally go down. Build for that from day one.

The minimum production setup

Server-side calls only — never ship your API key to the browser
A gateway — one place that adds keys, picks the model, and logs usage
Retries with backoff — on 429 and 5xx, with a max attempt count
Timeouts — every call, no exceptions
Structured outputs — use JSON schema, not "please respond in JSON"
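
The retry and timeout items above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: ApiError and call_with_retries are hypothetical names, and the wrapped call is assumed to set its own request timeout and to raise ApiError with the HTTP status on failure. It retries only on 429 and 5xx, stops after a fixed number of attempts, and uses exponential backoff with full jitter so that many clients do not retry in lockstep.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(message or f"HTTP {status}")
        self.status = status


def is_retryable(status: int) -> bool:
    # Retry on rate limits (429) and server errors (5xx) only.
    return status == 429 or 500 <= status < 600


def call_with_retries(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying retryable ApiErrors with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            # `call` is assumed to enforce its own timeout on the HTTP request.
            return call()
        except ApiError as err:
            if not is_retryable(err.status) or attempt == max_attempts:
                raise
            # Delay ceiling doubles each attempt, capped at max_delay;
            # the actual sleep is drawn uniformly from [0, ceiling] (full jitter).
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

Passing sleep as a parameter keeps the helper testable without real waiting. Non-retryable errors such as 400 or 401 are re-raised immediately, since repeating them only burns quota.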

Streaming for chat UX

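Stream tokens to the client over SSE (Server-Sent Events). Total generation time is unchanged, but the user sees the first token almost immediately instead of waiting for the whole response, so perceived latency can drop by roughly 5–10x. Don't block on the full response when the user is watching.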
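
As a rough sketch of the server side, the generator below wraps token chunks in SSE frames. It assumes the tokens arrive from some upstream iterator (for example a streaming model response); the function name sse_events, the {"token": ...} payload shape, and the final "done" event are illustrative choices, not part of any API.

```python
import json
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each token chunk in a Server-Sent Events frame."""
    for token in tokens:
        # JSON-encode the payload so a newline inside a token cannot
        # terminate the frame early.
        payload = json.dumps({"token": token})
        yield f"data: {payload}\n\n"
    # Hypothetical end-of-stream marker so the client knows when to stop.
    yield "event: done\ndata: {}\n\n"
```

A web framework would send these strings as the body of a response with Content-Type: text/event-stream, flushing after each frame so nothing sits in a buffer while the user waits.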

Cost control

Cap per-user and per-day spend in the gateway
Pick the smallest model that passes your evals
Cache responses to identical prompts, and keep system prompts byte-identical across requests so they stay cacheable
Log input/output tokens per request — you can't optimise what you don't measure
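
One way the gateway might enforce a per-user daily cap from logged token counts is sketched below. The prices in PRICE_PER_MTOK are placeholders, not real pricing, and SpendTracker is a hypothetical in-memory class; a real gateway would keep these totals in a shared store so the cap holds across processes.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Tuple

# Placeholder prices in USD per 1M tokens; substitute your model's real pricing.
PRICE_PER_MTOK = {"input": 1.00, "output": 4.00}


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in USD, from its logged token counts."""
    total = (
        input_tokens * PRICE_PER_MTOK["input"]
        + output_tokens * PRICE_PER_MTOK["output"]
    )
    return total / 1_000_000


@dataclass
class SpendTracker:
    """Tracks spend per (user, day) and answers whether a user is under the cap."""

    daily_cap_usd: float
    _spent: Dict[Tuple[str, date], float] = field(
        init=False, default_factory=lambda: defaultdict(float)
    )

    def allow(self, user_id: str, today: date) -> bool:
        # Checked before the call; a request may overshoot the cap slightly.
        return self._spent[(user_id, today)] < self.daily_cap_usd

    def record(
        self, user_id: str, today: date, input_tokens: int, output_tokens: int
    ) -> float:
        # Called after the response, with the token usage the API reported.
        cost = request_cost(input_tokens, output_tokens)
        self._spent[(user_id, today)] += cost
        return cost
```

The gateway would call allow before forwarding a request and record afterwards. Because the check happens before the cost is known, the cap is soft: the last request of the day can push a user a little over it.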

What still bites teams

Schema drift when models update — pin model versions
Tool/function calls returning empty args — handle as a retryable error
Long context costing 10x without obviously better answers — measure before you scale context
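
For the empty-arguments case above, a small validation step can turn a bad tool call into an error that the retry logic understands. The sketch assumes tool arguments arrive as a JSON-encoded string, as they do in Chat Completions tool calls; RetryableModelError and parse_tool_args are hypothetical names.

```python
import json
from typing import Any, Dict, Optional, Sequence


class RetryableModelError(Exception):
    """Hypothetical error type signalling that the request should be retried."""


def parse_tool_args(
    raw_arguments: Optional[str], required: Sequence[str]
) -> Dict[str, Any]:
    """Parse tool-call arguments, raising a retryable error if they are unusable."""
    if raw_arguments is None or not raw_arguments.strip():
        raise RetryableModelError("empty tool arguments")
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as err:
        raise RetryableModelError(f"malformed tool arguments: {err}") from err
    if not isinstance(args, dict):
        raise RetryableModelError("tool arguments are not a JSON object")
    # An empty object is only acceptable for tools with no required fields.
    missing = [name for name in required if name not in args]
    if missing:
        raise RetryableModelError(f"missing required arguments: {missing}")
    return args
```

The caller catches RetryableModelError and reissues the request under the same attempt limit as network retries, so a model that keeps returning empty arguments fails loudly instead of looping forever.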