Working notes
Performance, cost, and the boring details of running embedding caches in production. No vendor pitches.
- 2026-05-12 Embedding API spend is a tax on stable inputs
Most of the budget you burn on hosted embedding APIs pays for vectors you've already computed. A cache turns that recurring line item into a one-time cost per unique input.
- 2026-04-08 Content-hash keying vs LRU: which actually saves money
Two cache strategies for embeddings: one keyed on a hash of the content, one that evicts by access recency. They optimize for different things, and only one of them is correct for stable corpora.
- 2026-03-03 Stale-while-revalidate for embeddings: the corner cases
SWR works well for HTTP caching but is a less obvious fit for embeddings. A pragmatic walk through when it helps, when it bites, and what to do about model migrations.