Recipe-driven recommender training and serving, built on
irspack. One YAML recipe describes
where the data lives, how to train, and where to write the result —
recotem train produces a signed binary artifact, recotem serve
mounts it as a /predict/{name} HTTP endpoint and hot-swaps when a new
artifact appears. No database, no message broker, no admin UI.
Most recommender stacks pull in a service mesh of databases, queues, and control planes before you can train your first model. Recotem keeps the moving parts to a recipe file and a binary artifact:
- Single binary, two commands. recotem train runs as a batch job; recotem serve runs as a long-lived FastAPI process. They share nothing but the artifact file on disk (or object storage).
- Reproducible by construction. Recipes are versioned with your code; artifacts are HMAC-signed with a SHA-checked header you can inspect without loading the model.
- Hot-swap, no restart. The serving process watches the artifact directory and atomically swaps the in-memory model when training emits a new file.
- Bring-your-own scheduler. recotem train is a normal process: drive it from cron, Airflow, a Kubernetes CronJob, or anything else.
- Recipe-driven: 1 YAML = 1 model = 1 /predict/{name} endpoint
- Hyperparameter search across irspack algorithms via Optuna
- Pluggable data sources (built-in: CSV / Parquet / BigQuery / SQL / GA4; extend via Python entry points)
- HMAC-signed artifacts with multi-key rotation and a deterministic FQCN allow-list at deserialization time
- API-key authentication (X-API-Key); keys hashed at rest
- fsspec paths everywhere: local, S3, GCS, HTTPS, anything fsspec speaks
- Optional Prometheus metrics endpoint, structured JSON logs with built-in secret redaction
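
To make the "HMAC-signed artifacts with multi-key rotation" item above concrete, here is a minimal sketch of signing and verifying a payload against a keyring indexed by key ID. It is an illustration only: the function names and the idea of storing the kid next to the signature are assumptions, and Recotem's actual artifact layout is not shown in this README.

```python
import hashlib
import hmac


def sign(payload: bytes, kid: str, keys: dict[str, bytes]) -> tuple[str, str]:
    """Return (kid, hex HMAC-SHA256 signature) for payload (hypothetical helper)."""
    signature = hmac.new(keys[kid], payload, hashlib.sha256).hexdigest()
    return kid, signature


def verify(payload: bytes, kid: str, signature: str, keys: dict[str, bytes]) -> bool:
    """Check the signature with the key named by kid; unknown kids are rejected."""
    key = keys.get(kid)
    if key is None:
        return False
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

With this shape, rotation amounts to adding a new kid to the keyring, signing new artifacts with it, and removing the old kid once no artifact signed with it is still being served.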
- CSV / Parquet — local files or any fsspec-reachable URL (S3, GCS, Azure, HTTPS).
- BigQuery — SQL queries with Storage Read API support.
- SQL (PostgreSQL / MySQL / MariaDB / SQLite) via SQLAlchemy 2. See docs/data-sources/sql.md.
- Google Analytics 4: direct Data API integration (no BigQuery Export needed). See docs/data-sources/ga4.md.
- Custom plugins: implement the DataSourceProtocol and register via recotem.datasources entry points.
pip install recotem # core
pip install "recotem[bigquery]" # BigQuery data source
pip install "recotem[metrics]" # Prometheus metrics endpoint
pip install 'recotem[postgres]' # PostgreSQL via psycopg
pip install 'recotem[mysql]' # MySQL/MariaDB via PyMySQL
pip install 'recotem[sqlite]' # SQLite (stdlib)
pip install 'recotem[ga4]' # Google Analytics 4 Data API

Requires Python 3.12+. A multi-arch Docker image is published to
ghcr.io/codelibs/recotem.
The repository ships with a self-contained example at
examples/quickstart/ — recipe, dataset, and
artifact directory all in one place. Train a TopPop recommender from a
60-user CSV in under a minute.
# 1. Set demo keys. DEMO ONLY — for production, generate fresh keys with
# `recotem keygen --type signing` and `recotem keygen --type api`.
export RECOTEM_SIGNING_KEYS="dev:0000000000000000000000000000000000000000000000000000000000000000"
export RECOTEM_API_PLAINTEXT="recotem-quickstart-demo-key-0000"
export RECOTEM_API_KEYS="dev:sha256:21be5c3be85b8d68123df9f9b6a26d8e307db30350ea8bcc844883e22ebcf125"
# 2. Train, serve
recotem train examples/quickstart/recipe.yaml
recotem serve --recipes examples/quickstart/ &
# Wait for the server to become ready before sending traffic.
until curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/health | grep -q "200"; do sleep 1; done
# 3. Predict
curl -X POST http://localhost:8080/predict/top_picks \
-H "X-API-Key: $RECOTEM_API_PLAINTEXT" \
-H "Content-Type: application/json" \
-d '{"user_id": "u01", "cutoff": 5}'

{
"items": [{"item_id": "i00", "score": 0.91}],
"model": {"recipe": "top_picks", "trained_at": "...",
"best_class": "TopPopRecommender", "kid": "dev"},
"request_id": "..."
}

The recipe itself is 11 lines; every other field has a sensible default.
See examples/quickstart/recipe.yaml
for the source of truth and
docs/recipe-reference.md for the full schema.
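
The curl call in step 3 translates directly to Python. This is a minimal standard-library sketch: the endpoint path, header name, and body fields mirror the quickstart, while the helper names and the timeout value are assumptions made for illustration.

```python
import json
import urllib.request


def build_predict_request(base_url: str, recipe: str, api_key: str,
                          user_id: str, cutoff: int = 5) -> urllib.request.Request:
    """Build the POST /predict/{recipe} request without sending it."""
    body = json.dumps({"user_id": user_id, "cutoff": cutoff}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predict/{recipe}",
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


def predict(request: urllib.request.Request, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    import os

    req = build_predict_request(
        "http://localhost:8080", "top_picks",
        os.environ["RECOTEM_API_PLAINTEXT"], "u01",
    )
    print(predict(req)["items"])
```

Separating request construction from sending keeps the first half easy to unit test without a running server.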
| Variable | Required by | Purpose |
|---|---|---|
| RECOTEM_SIGNING_KEYS | train and serve | HMAC sign / verify artifact files (server keeps plaintext; needed for both sides) |
| RECOTEM_API_KEYS | serve | Authenticate /predict callers (server keeps hash only) |
| X-API-Key: <plaintext> | HTTP clients | Sent by clients on every /predict call; server re-hashes and compares |
Both variables accept multiple comma-separated entries (kid:value,kid2:value,…)
to enable zero-downtime key rotation — that is why they are pluralised.
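
The comma-separated kid:value format and the "re-hash and compare" step can be sketched as follows. This assumes the stored value has the sha256:<hex> shape shown in the quickstart and that the digest is a plain SHA-256 of the UTF-8 plaintext; Recotem's real hashing scheme may differ, and both function names are hypothetical.

```python
import hashlib
import hmac
from typing import Optional


def parse_keyring(raw: str) -> dict[str, str]:
    """Split 'kid:value,kid2:value' into {kid: value}; the value may contain ':'."""
    keyring: dict[str, str] = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        kid, sep, value = entry.partition(":")
        if not sep or not kid or not value:
            raise ValueError(f"malformed entry for kid {kid!r}: expected kid:value")
        keyring[kid] = value
    return keyring


def match_api_key(plaintext: str, keyring: dict[str, str]) -> Optional[str]:
    """Return the kid whose stored digest matches the presented key, else None."""
    digest = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
    candidate = f"sha256:{digest}".encode("utf-8")
    matched = None
    for kid, stored in keyring.items():
        # Compare against every entry in constant time; do not exit early.
        if hmac.compare_digest(candidate, stored.encode("utf-8")):
            matched = kid
    return matched
```

During a rotation both the old and the new kid are present, so clients holding either key keep working until the old entry is removed.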
┌────────────────────────────────────────────────────────────────────────┐
│ recotem (single Python package) │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ recipe.yaml ──▶ recotem train ──▶ artifact.recotem ──▶ recotem serve │
│ (batch job) (HMAC-signed) (FastAPI, │
│ hot-swap) │
│ │
│ any scheduler local FS, S3, POST /predict/{name}│
│ (cron / k8s / …) GCS, fsspec X-API-Key auth │
│ │
└────────────────────────────────────────────────────────────────────────┘
train and serve communicate only via signed artifact files. They
can run on different machines; the watcher swaps models per recipe based
on file mtime.
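
An mtime-based swap of the kind described above can be sketched as a small polling class. This is not Recotem's implementation: the class name, the polling approach, and the loader callback are assumptions, and a real loader would verify the artifact signature before returning a model.

```python
import os
import threading
from typing import Any, Callable, Optional


class ArtifactSlot:
    """Holds the current model for one recipe and reloads when the file mtime changes."""

    def __init__(self, path: str, loader: Callable[[str], Any]) -> None:
        self._path = path
        self._loader = loader
        self._lock = threading.Lock()
        self._mtime_ns: Optional[int] = None
        self._model: Any = None

    def poll(self) -> bool:
        """Reload if the artifact changed; return True when a swap happened."""
        try:
            mtime_ns = os.stat(self._path).st_mtime_ns
        except FileNotFoundError:
            return False  # nothing trained yet; keep serving the current model
        if mtime_ns == self._mtime_ns:
            return False
        # Load outside the lock so readers are never blocked by a slow load.
        # If the loader raises, the previous model stays in place.
        model = self._loader(self._path)
        with self._lock:
            self._model, self._mtime_ns = model, mtime_ns
        return True

    def get(self) -> Any:
        """Return the model currently being served (None before the first load)."""
        with self._lock:
            return self._model
```

A background thread or scheduled task would call poll() at a fixed interval, while request handlers call get(); because the new model is fully loaded before the reference is replaced, in-flight requests never see a half-loaded state.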
- Getting started — Docker Compose / pip walkthrough end-to-end
- Recipe reference — every field documented
- Operations — key rotation, sizing, troubleshooting
- Security — threat model, IAM scopes, secrets handling
- Plugin authoring — write a custom data source
- Documentation index
Issues and pull requests welcome. Development uses uv for dependency management:
uv sync --all-extras
uv run pytest tests
uv run ruff check src tests

See CLAUDE.md (or the project guidelines therein) for the full
contributor workflow.