
OpenAI API integration guide for production apps

How to integrate the OpenAI API into a real product: keys, rate limits, retries, cost control, streaming, and the gotchas the docs gloss over.

Tags: OpenAI, API integration, AI engineering

Treat the model like any other external API

It will rate-limit you, slow down, return malformed JSON, and occasionally go down. Build for that from day one.
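
As one concrete case of building for failure, malformed JSON can be caught at the boundary instead of deep inside your app. The sketch below is illustrative only: the `Ticket` shape, `parse_ticket`, and `MalformedModelOutput` are hypothetical names, and a real app would validate against its own schema.

```python
import json
from dataclasses import dataclass


class MalformedModelOutput(Exception):
    """Raised when the model's reply is not the JSON shape we expect."""


@dataclass
class Ticket:
    # Hypothetical output shape, used only for illustration.
    category: str
    priority: int


def parse_ticket(raw: str) -> Ticket:
    """Parse and validate a model reply; fail loudly rather than pass bad data on."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise MalformedModelOutput(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise MalformedModelOutput("expected a JSON object")

    category = data.get("category")
    priority = data.get("priority")
    if not isinstance(category, str) or not category:
        raise MalformedModelOutput("missing or invalid 'category'")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(priority, bool) or not isinstance(priority, int):
        raise MalformedModelOutput("missing or invalid 'priority'")
    return Ticket(category=category, priority=priority)
```

Callers can then treat `MalformedModelOutput` like any other upstream failure: log it, retry if appropriate, or fall back.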

The minimum production setup

  • Server-side calls only — never ship your API key to the browser
  • A gateway — one place that adds keys, picks the model, and logs usage
  • Retries with backoff — on 429 and 5xx, with a max attempt count
  • Timeouts — every call, no exceptions
  • Structured outputs — use JSON schema, not "please respond in JSON"
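
The retry and timeout items above can be sketched as a small wrapper. This is a minimal sketch under stated assumptions, not a definitive implementation: `UpstreamError`, `call_with_retries`, and the default values are hypothetical, and `call` stands in for whatever function your gateway uses to make the request with a given timeout.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class UpstreamError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status: int):
        super().__init__(f"upstream returned {status}")
        self.status = status


def is_retryable(status: int) -> bool:
    # Retry on rate limiting (429) and server errors (5xx) only.
    return status == 429 or 500 <= status < 600


def call_with_retries(
    call: Callable[[float], T],
    *,
    max_attempts: int = 4,
    timeout_s: float = 30.0,
    base_delay_s: float = 0.5,
    max_delay_s: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(timeout_s), retrying retryable failures with capped backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            # Every attempt gets a timeout; `call` is expected to enforce it.
            return call(timeout_s)
        except UpstreamError as err:
            if not is_retryable(err.status) or attempt == max_attempts:
                raise
        except TimeoutError:
            if attempt == max_attempts:
                raise
        # Exponential backoff with full jitter, capped at max_delay_s.
        delay = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
        sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the wrapper easy to test without real delays. If the API returns a retry hint in its response headers, preferring that over the computed delay is a reasonable refinement.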

Streaming for chat UX

Stream tokens to the client over Server-Sent Events (SSE). The user sees the first words as soon as they are generated, so perceived latency can drop by 5–10x even when total generation time is unchanged. Don't block on the full response when the user is watching.
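
On the server, the core of this is turning each token into an SSE frame as it arrives. The sketch below shows only the framing; `sse_events`, the `token` field, and the closing `done` event are illustrative conventions, not part of any API, and the token source would be your model client's streaming iterator.

```python
import json
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each token in an SSE 'data:' frame, then emit a final 'done' event."""
    for token in tokens:
        # JSON-encode the payload so newlines inside a token cannot break framing.
        payload = json.dumps({"token": token})
        yield f"data: {payload}\n\n"
    # Assumed convention: a named event tells the client the stream is complete.
    yield "event: done\ndata: {}\n\n"
```

Most web frameworks can stream a generator like this as a response body with the content type `text/event-stream`.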

Cost control

  • Cap per-user and per-day spend in the gateway
  • Pick the smallest model that passes your evals
  • Cache identical prompts (especially system prompts)
  • Log input/output tokens per request — you can't optimise what you don't measure
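
A per-user daily cap and per-request token accounting can live in the same small gateway component. The following is a rough in-memory sketch: `SpendTracker`, `BudgetExceeded`, the model name, and the prices are all made up for illustration, and a production version would persist spend in a shared store.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical prices in USD per 1M tokens; replace with your provider's real rates.
PRICE_PER_1M = {"small-model": {"input": 0.50, "output": 1.50}}


class BudgetExceeded(Exception):
    """Raised when a user has used up their daily budget."""


@dataclass
class SpendTracker:
    daily_cap_usd: float
    # Maps (user_id, day) to USD spent so far.
    _spent: dict = field(default_factory=dict)

    def check(self, user_id: str, today: date) -> None:
        """Call before each request; refuses once the cap is reached."""
        if self._spent.get((user_id, today), 0.0) >= self.daily_cap_usd:
            raise BudgetExceeded(f"{user_id} reached the daily cap")

    def record(self, user_id: str, today: date, model: str,
               input_tokens: int, output_tokens: int) -> float:
        """Call after each request with the token counts from the response."""
        price = PRICE_PER_1M[model]
        cost = (input_tokens * price["input"]
                + output_tokens * price["output"]) / 1_000_000
        key = (user_id, today)
        self._spent[key] = self._spent.get(key, 0.0) + cost
        return cost
```

The returned cost is a natural value to write to your request log next to the raw token counts.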

What still bites teams

  • Schema drift when models update — pin model versions
  • Tool/function calls returning empty args — handle as a retryable error
  • Long context costing 10x without obviously better answers — measure before you scale context
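
The empty-arguments case can be handled where tool calls are parsed. This sketch assumes the arguments arrive as a JSON string; `RetryableModelError` and `parse_tool_args` are hypothetical names, and the exception is meant to be caught by whatever retry loop you already have.

```python
import json
from typing import Any, Dict, Iterable, Optional


class RetryableModelError(Exception):
    """Signals the caller's retry loop that the request is worth repeating."""


def parse_tool_args(raw_arguments: Optional[str],
                    required: Iterable[str]) -> Dict[str, Any]:
    """Parse tool-call arguments, treating empty or incomplete ones as retryable."""
    if raw_arguments is None or not raw_arguments.strip():
        raise RetryableModelError("tool call had empty arguments")
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        raise RetryableModelError("tool call arguments were not valid JSON") from exc
    # "{}" parses fine but is still an empty call.
    if not isinstance(args, dict) or not args:
        raise RetryableModelError("tool call had empty arguments")
    missing = [name for name in required if name not in args]
    if missing:
        raise RetryableModelError(f"tool call missing arguments: {missing}")
    return args
```

Keep the attempt count bounded here too, so a model that consistently returns empty arguments fails fast instead of looping.
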