OpenAI API integration guide for production apps
How to integrate the OpenAI API into a real product — keys, rate limits, retries, cost control, streaming, and the gotchas docs gloss over.
Tags: OpenAI, API integration, AI engineering
Treat the model like any other external API
It will rate-limit you, slow down, return malformed JSON, and occasionally go down. Build for that from day one.
The minimum production setup
- Server-side calls only — never ship your API key to the browser
- A gateway — one place that adds keys, picks the model, and logs usage
- Retries with backoff — on 429 and 5xx, with a max attempt count
- Timeouts — every call, no exceptions
- Structured outputs — use JSON schema, not "please respond in JSON"
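
As a rough illustration of the retry and timeout items above, here is a minimal sketch of a gateway-side wrapper. It assumes the underlying client raises an error carrying the HTTP status; `ApiError`, `call_with_retries`, and all default values are hypothetical names and numbers chosen for this example, not part of any SDK.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(message or f"HTTP {status}")
        self.status = status


def is_retryable(status: int) -> bool:
    # Retry on rate limits (429) and server errors (5xx); fail fast otherwise.
    return status == 429 or 500 <= status < 600


def call_with_retries(
    call: Callable[[float], T],
    *,
    max_attempts: int = 4,      # assumed default; tune for your traffic
    base_delay: float = 0.5,    # seconds before the first retry (upper bound)
    max_delay: float = 8.0,     # cap on any single backoff delay
    timeout: float = 30.0,      # passed to every attempt, no exceptions
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(timeout), retrying retryable ApiErrors with jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call(timeout)
        except ApiError as err:
            # Give up on non-retryable statuses or when attempts are exhausted.
            if not is_retryable(err.status) or attempt == max_attempts:
                raise
            # Exponential backoff capped at max_delay, with full jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

The wrapped callable receives the timeout so that every attempt is bounded; how that timeout is applied depends on the HTTP client or SDK in use. Jitter spreads retries out so that many clients hitting a 429 at once do not retry in lockstep.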
Streaming for chat UX
Stream tokens to the client over Server-Sent Events (SSE). Total generation time is unchanged, but the user sees the first token almost immediately, so perceived latency drops, often by 5–10x. Don't block on the full response when the user is watching.
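
To make the wire format concrete, here is a small sketch of turning a token stream into SSE frames on the server. The `sse_events` helper, the `token` payload key, and the closing `done` event are assumptions made for this example; only the `data:` / `event:` framing and the blank-line terminator come from the SSE format itself.

```python
import json
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each token in an SSE frame, then emit a final 'done' event."""
    for token in tokens:
        # JSON-encode the payload so a newline inside a token cannot
        # break the line-based SSE framing.
        payload = json.dumps({"token": token})
        yield f"data: {payload}\n\n"
    # Hypothetical terminator so the client knows the stream ended cleanly.
    yield "event: done\ndata: {}\n\n"


if __name__ == "__main__":
    for frame in sse_events(["Hel", "lo"]):
        print(frame, end="")
```

A web framework would typically return this generator as a streaming response with `Content-Type: text/event-stream`, feeding it from the model's streamed chunks rather than a fixed list.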
Cost control
- Cap per-user and per-day spend in the gateway
- Pick the smallest model that passes your evals
- Cache responses for identical prompts, and keep shared prefixes (especially system prompts) byte-for-byte stable so they stay cacheable
- Log input/output tokens per request — you can't optimise what you don't measure
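
The spend cap, token logging, and cache items above could be sketched roughly as follows. The prices are placeholders rather than real rates, the in-memory store stands in for whatever database the gateway uses, and `SpendGuard`, `request_cost`, and `cache_key` are hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Tuple

# Placeholder prices in USD per 1M tokens; substitute your model's real rates.
INPUT_PRICE_PER_M = 1.00
OUTPUT_PRICE_PER_M = 4.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in USD from its logged token counts."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


def cache_key(model: str, messages: List[dict]) -> str:
    """Stable key for an identical (model, messages) request."""
    blob = json.dumps([model, messages], sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


@dataclass
class SpendGuard:
    """Tracks spend per (user, day) and refuses calls once the cap is hit."""

    daily_cap_usd: float
    spent: Dict[Tuple[str, date], float] = field(default_factory=dict)

    def allow(self, user_id: str, today: date) -> bool:
        return self.spent.get((user_id, today), 0.0) < self.daily_cap_usd

    def record(self, user_id: str, today: date,
               input_tokens: int, output_tokens: int) -> float:
        # Call after every response, using the token counts you log anyway.
        cost = request_cost(input_tokens, output_tokens)
        key = (user_id, today)
        self.spent[key] = self.spent.get(key, 0.0) + cost
        return cost
```

In the gateway, `allow` would run before the upstream call and `record` after it, so a user can overshoot the cap by at most one request. A production version would need shared storage and atomic updates, which this sketch leaves out.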
What still bites teams
- Output and schema drift when a model alias is updated underneath you: pin a specific model version instead of a floating alias
- Tool/function calls returning empty args — handle as a retryable error
- Long context costing 10x without obviously better answers — measure before you scale context
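
For the empty-arguments case above, one possible approach is to validate tool-call arguments before dispatching and raise an error the retry layer understands. This sketch assumes the arguments arrive as a JSON string; `RetryableToolCallError` and `parse_tool_args` are hypothetical names.

```python
import json
from typing import Any, Dict, Iterable, Optional


class RetryableToolCallError(Exception):
    """Hypothetical error the retry layer treats like a 429 or 5xx."""


def parse_tool_args(raw: Optional[str], required: Iterable[str]) -> Dict[str, Any]:
    """Parse a tool call's JSON arguments, rejecting empty or partial ones."""
    if raw is None or not raw.strip():
        raise RetryableToolCallError("tool call arrived with no arguments")
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as err:
        raise RetryableToolCallError(f"arguments are not valid JSON: {err}") from err
    if not isinstance(args, dict):
        raise RetryableToolCallError("arguments must be a JSON object")
    # An empty object "{}" parses fine but is still useless to the tool.
    missing = [name for name in sorted(required) if name not in args]
    if missing:
        raise RetryableToolCallError(f"missing required arguments: {missing}")
    return args
```

Raising a dedicated exception keeps the decision in one place: the caller can re-ask the model a bounded number of times instead of invoking the tool with nothing.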



