Notes

Working notes

Performance, cost, and the boring details of running embedding caches in production. No vendor pitches.

2026-05-12

Embedding API spend is a tax on stable inputs

Most of the budget you burn on hosted embedding APIs pays for vectors you've already computed. A cache turns that line item into a one-time cost.

#cost#rag#caching
2026-04-08

Content-hash keying vs LRU: which actually saves money

Two cache strategies for embeddings — one keyed on content, one keyed on access recency. They optimize for different things, and only one of them is correct for stable corpora.

#caching#architecture#cost
2026-03-03

Stale-while-revalidate for embeddings: the corner cases

SWR works well for HTTP, less obviously for embeddings. A pragmatic walk through when it helps, when it bites, and what to do about model migrations.

#caching#invalidation#swr