Recotem

Recipe-driven recommender training and serving, built on irspack. One YAML recipe describes where the data lives, how to train, and where to write the result. Running recotem train produces a signed binary artifact; recotem serve mounts it as a /predict/{name} HTTP endpoint and hot-swaps the model when a new artifact appears. No database, no message broker, no admin UI.

Why Recotem

Most recommender stacks pull in a web of databases, queues, and control planes before you can train your first model. Recotem keeps the moving parts to a recipe file and a binary artifact:

Single package, two commands. recotem train runs as a batch job; recotem serve runs as a long-lived FastAPI process. They share nothing but the artifact file on disk (or object storage).
Reproducible by construction. Recipes are versioned with your code; artifacts are HMAC-signed with a SHA-checked header you can inspect without loading the model.
Hot-swap, no restart. The serving process watches the artifact directory and atomically swaps the in-memory model when training emits a new file.
Bring-your-own scheduler. recotem train is a normal process — drive it from cron, Airflow, a Kubernetes CronJob, or anything else.
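
To make the signing bullet concrete, here is a minimal sketch of an HMAC-signed envelope whose header can be read without touching the payload. The layout (one JSON header line followed by the payload bytes), the function names sign, read_header, and verify, and the choice of SHA-256 are all assumptions made for illustration; Recotem's actual artifact format is not specified in this README.

```python
import hashlib
import hmac
import json

# Hypothetical envelope, not Recotem's real format:
# one JSON header line, a newline, then the raw payload bytes.


def sign(payload: bytes, kid: str, key: bytes) -> bytes:
    """Prefix the payload with a header carrying its digest and HMAC tag."""
    header = {
        "kid": kid,
        "sha256": hashlib.sha256(payload).hexdigest(),
        "hmac": hmac.new(key, payload, hashlib.sha256).hexdigest(),
    }
    return json.dumps(header).encode("utf-8") + b"\n" + payload


def read_header(blob: bytes) -> dict:
    """Parse only the header line; the payload is never deserialized."""
    head, _, _ = blob.partition(b"\n")
    return json.loads(head)


def verify(blob: bytes, keyring: dict) -> bytes:
    """Return the payload if its tag matches the key named by the header's kid."""
    head, _, payload = blob.partition(b"\n")
    header = json.loads(head)
    key = keyring.get(header["kid"])
    if key is None:
        raise ValueError("unknown key id")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(expected, header["hmac"]):
        raise ValueError("signature mismatch")
    return payload
```

Passing a keyring with several key ids is what would let an old and a new signing key coexist during a rotation: artifacts signed with either key still verify until the old entry is removed.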

Features

Recipe-driven: 1 YAML = 1 model = 1 /predict/{name} endpoint
Hyperparameter search across irspack algorithms via Optuna
Pluggable data sources (built-in: CSV / Parquet / BigQuery / SQL / GA4; extend via Python entry points)
HMAC-signed artifacts with multi-key rotation and a deterministic FQCN allow-list at deserialization time
API-key authentication (X-API-Key); keys hashed at rest
fsspec paths everywhere — local, S3, GCS, HTTPS, anything fsspec speaks
Optional Prometheus metrics endpoint, structured JSON logs with built-in secret redaction

Data Sources

CSV / Parquet — local files or any fsspec-reachable URL (S3, GCS, Azure, HTTPS).
BigQuery — SQL queries with Storage Read API support.
SQL (PostgreSQL / MySQL / MariaDB / SQLite) — via SQLAlchemy 2. See docs/data-sources/sql.md.
Google Analytics 4 — direct Data API integration (no BigQuery Export needed). See docs/data-sources/ga4.md.
Custom plugins — implement the DataSource Protocol and register via recotem.datasources entry-points.

Install

pip install recotem                 # core
pip install "recotem[bigquery]"     # BigQuery data source
pip install "recotem[metrics]"      # Prometheus metrics endpoint
pip install "recotem[postgres]"     # PostgreSQL via psycopg
pip install "recotem[mysql]"        # MySQL/MariaDB via PyMySQL
pip install "recotem[sqlite]"       # SQLite (stdlib)
pip install "recotem[ga4]"          # Google Analytics 4 Data API

Requires Python 3.12+. A multi-arch Docker image is published to ghcr.io/codelibs/recotem.

Quickstart

The repository ships with a self-contained example at examples/quickstart/ — recipe, dataset, and artifact directory all in one place. Train a TopPop recommender from a 60-user CSV in under a minute.

# 1. Set demo keys. DEMO ONLY — for production, generate fresh keys with
#    `recotem keygen --type signing` and `recotem keygen --type api`.
export RECOTEM_SIGNING_KEYS="dev:0000000000000000000000000000000000000000000000000000000000000000"
export RECOTEM_API_PLAINTEXT="recotem-quickstart-demo-key-0000"
export RECOTEM_API_KEYS="dev:sha256:21be5c3be85b8d68123df9f9b6a26d8e307db30350ea8bcc844883e22ebcf125"

# 2. Train, serve
recotem train examples/quickstart/recipe.yaml
recotem serve --recipes examples/quickstart/ &

# Wait for the server to become ready before sending traffic.
until curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/health | grep -q "200"; do sleep 1; done

# 3. Predict
curl -X POST http://localhost:8080/predict/top_picks \
  -H "X-API-Key: $RECOTEM_API_PLAINTEXT" \
  -H "Content-Type: application/json" \
  -d '{"user_id": "u01", "cutoff": 5}'

{
  "items": [{"item_id": "i00", "score": 0.91}],
  "model": {"recipe": "top_picks", "trained_at": "...",
            "best_class": "TopPopRecommender", "kid": "dev"},
  "request_id": "..."
}
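
For callers that prefer Python to curl, the same request can be sketched with the standard library alone. The URL, header names, and body fields mirror the quickstart call; the function names build_predict_request and predict are hypothetical, and the response is assumed to be the JSON object shown in the quickstart.

```python
import json
import urllib.request


def build_predict_request(base_url, recipe, api_key, user_id, cutoff):
    """Build the POST /predict/{recipe} request without sending it."""
    body = json.dumps({"user_id": user_id, "cutoff": cutoff}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict/{recipe}",
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


def predict(req, timeout=10.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

With the quickstart server running, predict(build_predict_request("http://localhost:8080", "top_picks", os.environ["RECOTEM_API_PLAINTEXT"], "u01", 5)) would issue the same call as the curl command (after import os).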

The recipe itself is 11 lines — every other field has a sensible default. See examples/quickstart/recipe.yaml for the source of truth and docs/recipe-reference.md for the full schema.

Which env var is needed where?

| Variable | Required by | Purpose |
| --- | --- | --- |
| `RECOTEM_SIGNING_KEYS` | `train` and `serve` | HMAC sign / verify artifact files (server keeps plaintext; needed for both sides) |
| `RECOTEM_API_KEYS` | `serve` | Authenticate `/predict` callers (server keeps hash only) |
| `X-API-Key: <plaintext>` | HTTP clients | Sent by clients on every `/predict` call; server re-hashes and compares |

Both variables accept multiple comma-separated entries (kid:value,kid2:value,…) to enable zero-downtime key rotation — that is why they are pluralised.
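
The entry format and the hash-and-compare step can be sketched as follows. This assumes that each RECOTEM_API_KEYS value has the form sha256:<hex> (as in the quickstart) and that the stored hash is a plain, unsalted SHA-256 of the plaintext key; both the helper names and that hashing assumption are illustrative, not a description of Recotem's internals.

```python
import hashlib
import hmac
from typing import Optional


def parse_entries(raw: str) -> dict:
    """Split 'kid:value,kid2:value' into {kid: value}."""
    entries = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        # Only the first colon separates kid from value, so values such as
        # 'sha256:<hex>' keep their own colon.
        kid, sep, value = item.partition(":")
        if not sep or not kid or not value:
            # Report the kid only, never the secret value.
            raise ValueError(f"malformed entry for kid {kid!r}")
        entries[kid] = value
    return entries


def match_api_key(plaintext: str, raw_api_keys: str) -> Optional[str]:
    """Return the kid whose stored hash matches the plaintext, else None."""
    digest = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
    for kid, stored in parse_entries(raw_api_keys).items():
        algo, _, hex_hash = stored.partition(":")
        if algo == "sha256" and hmac.compare_digest(digest, hex_hash):
            return kid
    return None
```

During a rotation, both the outgoing and the incoming key would appear in the variable, so clients holding either plaintext keep authenticating until the old entry is dropped.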

Architecture

┌────────────────────────────────────────────────────────────────────────┐
│                  recotem (single Python package)                       │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│   recipe.yaml ──▶ recotem train ──▶ artifact.recotem ──▶ recotem serve │
│                   (batch job)        (HMAC-signed)        (FastAPI,    │
│                                                            hot-swap)   │
│                                                                        │
│   any scheduler          local FS, S3,             POST /predict/{name}│
│   (cron / k8s / …)       GCS, fsspec               X-API-Key auth      │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘

train and serve communicate only via signed artifact files. They can run on different machines; the watcher swaps models per recipe based on file mtime.
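
One way to picture the mtime-based swap is a small polling slot per recipe. This is a sketch under stated assumptions: the class name HotSwapSlot, the refresh and get methods, and the polling approach are hypothetical, and the real watcher may detect changes differently. The loader callable stands in for whatever verifies and deserializes an artifact.

```python
import os
import threading


class HotSwapSlot:
    """Holds one in-memory model and replaces it when the file's mtime changes."""

    def __init__(self, path, loader):
        self._path = path
        self._loader = loader  # callable: path -> model object
        self._lock = threading.Lock()
        self._mtime = None
        self._model = None

    def refresh(self) -> bool:
        """Reload if the artifact's mtime changed; return True when a swap happened."""
        try:
            mtime = os.stat(self._path).st_mtime_ns
        except FileNotFoundError:
            return False  # keep serving the current model
        if mtime == self._mtime:
            return False
        # Load outside the lock so readers are never blocked by a slow load.
        model = self._loader(self._path)
        with self._lock:
            self._model, self._mtime = model, mtime
        return True

    def get(self):
        """Return the model currently being served."""
        with self._lock:
            return self._model
```

A background thread would call refresh() on an interval while request handlers call get(); because the new model is fully loaded before the reference is replaced, a request sees either the old model or the new one, never a partially loaded state.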

Documentation

Getting started — Docker Compose / pip walkthrough end-to-end
Recipe reference — every field documented
Operations — key rotation, sizing, troubleshooting
Security — threat model, IAM scopes, secrets handling
Plugin authoring — write a custom data source
Documentation index

Contributing

Issues and pull requests welcome. Development uses uv for dependency management:

uv sync --all-extras
uv run pytest tests
uv run ruff check src tests

See CLAUDE.md (or the project guidelines therein) for the full contributor workflow.

License

Apache License 2.0.

