mimir

LLM Semantic Cache

mimir is a drop-in proxy that caches LLM API responses using semantic similarity, reducing costs and latency for repeated or similar queries.

Features

  • Semantic Caching - Cache hits for semantically similar prompts, not just exact matches
  • Free Local Embeddings - Use Ollama for embeddings with zero API costs
  • OpenAI-Compatible - Drop-in replacement proxy for OpenAI API
  • Configurable Threshold - Tune similarity sensitivity (0.0-1.0)
  • TTL Support - Time-based cache expiration
  • Zero Dependencies - Single binary, no external database required
  • Docker Ready - Simple containerized deployment

How It Works

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Client    │────▶│    mimir    │────▶│  LLM API    │
│  (app/pod)  │◀────│   (proxy)   │◀────│ (OpenAI/..) │
└─────────────┘     └──────┬──────┘     └─────────────┘
                           │
                    ┌──────▼──────┐
                    │ Vector Store│
                    │ (embeddings)│
                    └─────────────┘
  1. Incoming request is converted to an embedding
  2. Cache is searched for semantically similar previous requests
  3. If similarity exceeds threshold → return cached response
  4. Otherwise → forward to upstream, cache response
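
The lookup step can be pictured as a nearest-neighbor search over stored embeddings. The sketch below is illustrative only (the function and variable names are hypothetical, not mimir's actual internals), and it assumes cosine similarity as the similarity measure:

import math

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def lookup(cache, query_embedding, threshold=0.95):
    # cache: list of (embedding, response) pairs from earlier requests.
    best_score, best_response = -1.0, None
    for embedding, response in cache:
        score = cosine_similarity(query_embedding, embedding)
        if score > best_score:
            best_score, best_response = score, response
    if best_score >= threshold:
        return best_response, best_score  # cache hit
    return None, best_score               # miss: forward upstream, then cache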

Quick Start

Option 1: Local Embeddings with Ollama (Free)

# Install Ollama (if not already installed)
brew install ollama  # macOS
# or: curl -fsSL https://ollama.com/install.sh | sh  # Linux

# Start Ollama and pull embedding model
ollama serve &
ollama pull nomic-embed-text

# Clone and run mimir
git clone https://github.com/aqstack/mimir.git
cd mimir
make build
./bin/mimir

Option 2: OpenAI Embeddings

# Clone and build
git clone https://github.com/aqstack/mimir.git
cd mimir
make build

# Run with OpenAI
export OPENAI_API_KEY=sk-...
./bin/mimir

Using Docker

# With Ollama (requires Ollama running on host)
docker run -p 8080:8080 -e OLLAMA_BASE_URL=http://host.docker.internal:11434 ghcr.io/aqstack/mimir:latest

# With OpenAI
docker run -p 8080:8080 -e OPENAI_API_KEY=$OPENAI_API_KEY ghcr.io/aqstack/mimir:latest

Usage

Point your OpenAI client to mimir instead of the OpenAI API:

from openai import OpenAI

# Point to mimir proxy
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="your-api-key"  # or use OPENAI_API_KEY env var
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)

# Check cache status in response headers
# X-Mimir-Cache: HIT or MISS
# X-Mimir-Similarity: 0.9823 (if HIT)
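
To inspect those headers from Python, the openai v1 client exposes the raw HTTP response. Assuming mimir returns the headers exactly as shown above:

raw = client.chat.completions.with_raw_response.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the capital city of France?"}],
)
print(raw.headers.get("X-Mimir-Cache"))       # HIT or MISS
print(raw.headers.get("X-Mimir-Similarity"))  # e.g. 0.9823 on a HIT
response = raw.parse()  # the usual ChatCompletion object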

Configuration

Environment Variable          Default                      Description
MIMIR_EMBEDDING_PROVIDER      ollama                       Embedding provider: ollama or openai
MIMIR_EMBEDDING_MODEL         nomic-embed-text             Embedding model name
OLLAMA_BASE_URL               http://localhost:11434       Ollama server URL
OPENAI_API_KEY                -                            OpenAI API key (auto-switches provider if set)
OPENAI_BASE_URL               https://api.openai.com/v1    Upstream API URL
MIMIR_PORT                    8080                         Server port
MIMIR_HOST                    0.0.0.0                      Server host
MIMIR_SIMILARITY_THRESHOLD    0.95                         Minimum similarity for a cache hit (0.0-1.0)
MIMIR_CACHE_TTL               24h                          Cache entry time-to-live
MIMIR_MAX_CACHE_SIZE          10000                        Maximum cache entries
MIMIR_LOG_JSON                false                        JSON log format
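
For example, to run mimir with a looser threshold and a shorter TTL, export the variables before launching the binary. A Python equivalent (illustrative only; the values are arbitrary) might look like:

import os
import subprocess

# Equivalent to exporting the variables in a shell before ./bin/mimir.
env = {
    **os.environ,
    "MIMIR_SIMILARITY_THRESHOLD": "0.90",  # looser matching
    "MIMIR_CACHE_TTL": "1h",               # shorter expiry
}
subprocess.run(["./bin/mimir"], env=env)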

Embedding Models

Ollama (free, local):

  • nomic-embed-text (768 dims, recommended)
  • mxbai-embed-large (1024 dims)
  • all-minilm (384 dims, fastest)

OpenAI (paid):

  • text-embedding-3-small (1536 dims, recommended)
  • text-embedding-3-large (3072 dims)
  • text-embedding-ada-002 (1536 dims)
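
A quick way to confirm an Ollama model is pulled and serving embeddings, assuming the default Ollama address and its /api/embeddings endpoint:

import requests

resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": "hello world"},
)
resp.raise_for_status()
print(len(resp.json()["embedding"]))  # expect 768 for nomic-embed-text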

API Endpoints

Endpoint                     Description
POST /v1/chat/completions    Chat completions (cached)
GET /health                  Health check
GET /stats                   Cache statistics
* /v1/*                      Other OpenAI endpoints (passthrough)

Cache Statistics

curl http://localhost:8080/stats
{
  "total_entries": 150,
  "total_hits": 1234,
  "total_misses": 567,
  "hit_rate": 0.685,
  "estimated_saved_usd": 1.234
}
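
The same statistics are easy to consume programmatically; the field names follow the sample response above:

import requests

stats = requests.get("http://localhost:8080/stats").json()
print(f"hit rate: {stats['hit_rate']:.1%}")                       # e.g. 68.5%
print(f"estimated savings: ${stats['estimated_saved_usd']:.2f}")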

Tuning the Similarity Threshold

The MIMIR_SIMILARITY_THRESHOLD controls how similar a query must be to trigger a cache hit:

Threshold    Behavior
0.99         Nearly exact matches only
0.95         Very similar queries (recommended)
0.90         Moderate similarity
0.85         Loose matching (may return less relevant responses)
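
One practical way to pick a value is to replay paraphrased prompts through the proxy and watch the cache headers. This sketch posts directly to the proxy; the prompts and expected outcomes are illustrative:

import requests

prompts = [
    "What is the capital of France?",
    "What's France's capital city?",  # paraphrase: likely a HIT near 0.95
    "What is the capital of Spain?",  # different meaning: should MISS
]
for prompt in prompts:
    r = requests.post(
        "http://localhost:8080/v1/chat/completions",
        headers={"Authorization": "Bearer your-api-key"},
        json={"model": "gpt-4",
              "messages": [{"role": "user", "content": prompt}]},
    )
    print(prompt, "->", r.headers.get("X-Mimir-Cache"),
          r.headers.get("X-Mimir-Similarity"))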

Roadmap

  • Local embeddings with Ollama
  • Redis/Qdrant backend for persistence
  • Prometheus metrics
  • Cache warming
  • Support for Anthropic, Gemini APIs

Contributing

Contributions are welcome! Please open an issue or submit a pull request.

License

MIT License - see LICENSE for details.
