v0.1 · GPL-3.0 · Rust 1.70+

Stop recomputing embeddings.
Start shipping faster.

EmbedCache generates text embeddings locally using FastEmbed and caches them in SQLite. No API keys, no per-token billing, no rate limits. Use it as a Rust library or run it as a REST service.

cache hit < 5 ms · embed ~10–50 ms / text · cold start 100–500 ms (model load)
$ cargo install embedcache
$ embedcache
listening on http://127.0.0.1:8081
models loaded: BGESmallENV15, AllMiniLML6V2
cache: ./cache.db (sqlite)

$ curl -X POST localhost:8081/v1/embed \
    -H 'Content-Type: application/json' \
    -d '{"text":["hello world"]}'
{"vectors":[[ -0.021, 0.114, ... ]],
 "model":"BGESmallENV15",
 "cache":"hit",
 "elapsed_ms":3}
01 / local

22+ models, no network

BGE, MiniLM, Nomic, multilingual E5: all run on-box via FastEmbed. No OpenAI, no Cohere, no egress bill.

02 / cached

SQLite, survives restart

Process a URL or a string once; later requests for the same content and model are served from the SQLite cache in under 5 ms, even after a restart.
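The compute-once, hit-afterwards behaviour can be sketched in a few lines. This is an illustration only, not EmbedCache's code: an in-memory HashMap stands in for the SQLite table, and `EmbeddingCache`, `get_or_compute`, and `fake_embed` are hypothetical names invented for the example.

```rust
use std::collections::HashMap;

/// In-memory stand-in for the SQLite cache (assumption: the real store is a
/// SQLite file). Entries are keyed by (model, text).
struct EmbeddingCache {
    entries: HashMap<(String, String), Vec<f32>>,
}

impl EmbeddingCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Returns the vector plus a flag: `true` for a cache hit, `false` when
    /// `compute` had to run.
    fn get_or_compute<F>(&mut self, model: &str, text: &str, compute: F) -> (Vec<f32>, bool)
    where
        F: FnOnce(&str) -> Vec<f32>,
    {
        let key = (model.to_string(), text.to_string());
        if let Some(vector) = self.entries.get(&key) {
            return (vector.clone(), true);
        }
        let vector = compute(text);
        self.entries.insert(key, vector.clone());
        (vector, false)
    }
}

/// Hypothetical placeholder for model inference. NOT a real embedding: it
/// returns the byte length and the word count of the text.
fn fake_embed(text: &str) -> Vec<f32> {
    vec![text.len() as f32, text.split_whitespace().count() as f32]
}
```

The first call for a given (model, text) pair runs the closure; the second returns the stored vector without running it. Asking for the same text under a different model is a miss, because the model is part of the key.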

03 / shippable

Library or service

Embed the crate in your Rust app, or run the REST API behind your pipeline. Swagger, ReDoc, RapiDoc, Scalar included.
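For the service route, the curl request shown earlier translates to Rust roughly as follows. A minimal sketch using only the standard library, assuming the server listens on 127.0.0.1:8081 and accepts the `{"text":[...]}` body from that example. The helper names (`json_escape`, `embed_body`, `embed_request`, `post_embed`) are made up for this sketch and are not part of the embedcache crate.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Escape a string so it can sit inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build the JSON body used in the curl example: {"text":["...", ...]}
fn embed_body(texts: &[&str]) -> String {
    let items: Vec<String> = texts
        .iter()
        .map(|t| format!("\"{}\"", json_escape(t)))
        .collect();
    format!("{{\"text\":[{}]}}", items.join(","))
}

/// Build a raw HTTP/1.1 POST request for /v1/embed.
fn embed_request(host: &str, texts: &[&str]) -> String {
    let body = embed_body(texts);
    format!(
        "POST /v1/embed HTTP/1.1\r\n\
         Host: {host}\r\n\
         Content-Type: application/json\r\n\
         Content-Length: {len}\r\n\
         Connection: close\r\n\
         \r\n\
         {body}",
        host = host,
        len = body.len(),
        body = body
    )
}

/// Send the request and return the raw response (status line, headers, body).
/// No JSON parsing is attempted here.
fn post_embed(addr: &str, texts: &[&str]) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(embed_request(addr, texts).as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

With the service running, `post_embed("127.0.0.1:8081", &["hello world"])` returns the raw HTTP response as a string. In a real pipeline an HTTP client crate and a JSON parser would replace the hand-built request; the standard-library version is here only to keep the sketch dependency-free.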

What it is honest about

Yes

  • Local inference via fastembed (CPU; ONNX runtime).
  • SQLite-backed cache keyed by content + model.
  • REST endpoints: /v1/embed, /v1/process, /v1/params.
  • Optional LLM-based semantic chunking via Ollama or OpenAI.
  • 22+ embedding models from the FastEmbed catalogue.
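
"Keyed by content + model" means the same text embedded with two different models occupies two cache rows. One way such a key could be derived is sketched below. The actual key scheme EmbedCache uses is not documented on this page, so treat `cache_key` and the choice of FNV-1a as assumptions made for illustration; a production cache would more likely use a cryptographic hash such as SHA-256 to make collisions negligible.

```rust
/// FNV-1a 64-bit parameters (standard published constants).
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Plain FNV-1a over a byte stream: XOR each byte in, then multiply.
fn fnv1a(bytes: impl IntoIterator<Item = u8>) -> u64 {
    let mut hash = FNV_OFFSET;
    for b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hypothetical cache key: hash of model name, a 0x00 separator, then content.
/// The separator keeps ("ab", "c") and ("a", "bc") from hashing the same bytes.
fn cache_key(model: &str, content: &str) -> String {
    let bytes = model
        .bytes()
        .chain(std::iter::once(0u8))
        .chain(content.bytes());
    format!("{:016x}", fnv1a(bytes))
}
```

The key is deterministic across runs and machines, which is what a cache that survives restarts needs. The standard library's default hasher makes no such guarantee across Rust releases, which is why it is avoided here.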

Not

  • A drop-in cache wrapper for OpenAI / Cohere / Voyage. It replaces them.
  • A vector database. Pair it with Qdrant, pgvector, LanceDB, etc.
  • A managed cloud service. You run it.
  • A GPU-required system. Inference runs on CPU; GPU support is not advertised.
