RAG that actually answers
Most RAG systems retrieve plausible chunks and produce confident nonsense. Here's what changes that.
The usual failure mode
Embedding search returns five chunks that look related. The model stitches them into a paragraph that sounds right and isn't.
What we change
- Chunk on semantic boundaries, not character counts
- Re-rank retrieved chunks with a cheaper model before the answer step
- Force citations: if the model can't cite a retrieved chunk, it shouldn't answer
- Evaluate retrieval and generation separately
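Two of these are easy to sketch in code: the citation gate and a retrieval-only metric. This is a minimal sketch, not a reference implementation. It assumes the model is prompted to cite chunks inline as `[chunk_id]`, and the names `Chunk`, `gate_on_citations`, `recall_at_k`, and `REFUSAL` are all made up for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: str  # hypothetical ID scheme, e.g. "doc1-3"
    text: str


# Assumed citation format: the model writes [chunk_id] inline.
CITATION_RE = re.compile(r"\[([A-Za-z0-9_-]+)\]")
REFUSAL = "I can't answer that from the retrieved sources."


def gate_on_citations(answer: str, retrieved: list[Chunk]) -> str:
    """Pass the answer through only if it cites at least one chunk
    and every cited ID belongs to a chunk that was actually retrieved."""
    cited = set(CITATION_RE.findall(answer))
    known = {c.chunk_id for c in retrieved}
    if not cited or not cited <= known:
        return REFUSAL
    return answer


def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Retrieval-only metric: share of labelled relevant chunks in the top k.
    It never looks at the generated answer, so it isolates the retriever."""
    if not relevant_ids:
        return 0.0
    hits = relevant_ids & set(retrieved_ids[:k])
    return len(hits) / len(relevant_ids)
```

The gate only checks that citations exist and point at retrieved chunks. It does not check that the cited chunk supports the claim; that belongs in the generation-side evaluation, scored separately from `recall_at_k`.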
The boring infra
A vector store is 20% of the work. The rest is ingestion, freshness, access control, and observability over the queries you didn't anticipate.
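As one illustration of the non-vector-store work, here is a rough sketch of access control and freshness applied as a filter on retrieved hits before they reach the model. It assumes each indexed chunk carries metadata for allowed groups and an indexing timestamp; `IndexedChunk`, `filter_hits`, and the field names are hypothetical, not any particular vector store's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class IndexedChunk:
    chunk_id: str
    allowed_groups: frozenset[str]  # assumed ACL metadata written at ingestion
    indexed_at: datetime            # assumed timezone-aware ingestion timestamp


def filter_hits(
    hits: list[IndexedChunk],
    user_groups: set[str],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> list[IndexedChunk]:
    """Keep only hits the user may read and that were indexed recently enough.
    Original ranking order is preserved."""
    now = now or datetime.now(timezone.utc)
    return [
        h
        for h in hits
        if h.allowed_groups & user_groups   # user shares at least one group
        and now - h.indexed_at <= max_age   # chunk is not stale
    ]
```

Many vector stores can apply metadata filters inside the query itself, which avoids fetching chunks only to discard them. Filtering after retrieval, as here, is the simpler fallback.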



