Lumen is a self-hosted AI chat portal for research institutions. It lets your users chat with AI models through a web browser, while giving administrators control over who can access which models and how many tokens each user or group can spend.
Key features:
- Chat interface for AI models (OpenAI-compatible endpoints, Ollama, vLLM, etc.)
- Login via your institution's identity provider through CILogon
- Token budgets per user and group — with optional auto-refresh
- Per-model access control: whitelist, blacklist, and graylist (requires user acknowledgment)
- Admin panel to manage users, groups, and usage
- Round-robin load balancing across multiple model backends
Prerequisites:
- Docker and Docker Compose
- A public domain name (required for CILogon OAuth)
CILogon provides federated login for research institutions (universities, national labs, etc.).
- Register your application at https://cilogon.org/oauth2/register
- Set the callback URL to `https://your-domain/callback`
- Request these scopes: `openid email profile org.cilogon.userinfo`
- Note your `client_id` and `client_secret`
Copy the example config and edit it:

```sh
cp config.yaml.example lumen/config.yaml
```

At minimum, set:
- `app.secret_key` — a long random string
- `oauth2.client_id` and `oauth2.client_secret` — from CILogon
- `oauth2.redirect_uri` — `https://your-domain/callback`
- `admins` — your email address
- `models` — at least one model endpoint (see below)
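One way to generate suitable values for the random secrets (any sufficiently long random string works; Python's standard `secrets` module is shown here as an example):

```python
# Generate two independent random secrets for config.yaml.
# secrets.token_urlsafe(32) yields 43 URL-safe characters of cryptographic randomness.
import secrets

print("secret_key:    ", secrets.token_urlsafe(32))
print("encryption_key:", secrets.token_urlsafe(32))
```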
```sh
docker compose up -d
```

Lumen will be available at https://your-domain.
If you want to run Lumen locally without Docker or CILogon credentials:
```sh
uv sync
cp config.yaml.example config.yaml
```

Edit config.yaml with at minimum:

```yaml
app:
  secret_key: "any-random-string"
  encryption_key: "another-random-string"
  database_url: sqlite:///lumen_dev.db
  debug: true
  dev_user: # bypasses OAuth — logs in as this email automatically
    email: dev@example.com
    groups: # optional: assign groups on every dev login
      - staff
```

And at least one model under `models:`. Two options:
Option A: Built-in echo server (no external dependencies)
The repo includes a lightweight echo server that mirrors your message back with sample math. Add this to your config.yaml:
```yaml
models:
  - name: dummy
    active: true
    input_cost_per_million: 0.0
    output_cost_per_million: 0.0
    endpoints:
      - url: http://localhost:9999/v1
        api_key: dummy
```

Start it in a separate terminal before running Lumen:

```sh
uv run dummy
```

Option B: Ollama (real local models)
Install Ollama, pull a model, and keep the llama3 entry in config.yaml pointing at http://localhost:11434/v1:

```sh
ollama pull llama3.2
```

Then run the database migrations and start Lumen:

```sh
uv run flask db upgrade
uv run lumen
```

Visit http://localhost:5000, click Login, and you'll be auto-logged in as dev@example.com.
Note: The `dev_user` option skips OAuth entirely. Remove it (or leave it empty) to use normal CILogon authentication.
```yaml
app:
  name: Lumen
  tagline: Illuminating AI access
  secret_key: change-me-to-something-random # any long random string; used for session cookies
  encryption_key: change-me-to-something-different # separate secret used to hash user API keys
  database_url: sqlite:///lumen.db # or a postgres:// URL
  debug: false
```

`encryption_key` can also be supplied via the `LUMEN_ENCRYPTION_KEY` environment variable, which takes precedence over the value in config.yaml. This is useful for injecting secrets at deploy time (e.g. via Docker secrets or a Kubernetes secret) without writing them into the config file.
Warning: Rotating `encryption_key` (or `LUMEN_ENCRYPTION_KEY`) invalidates all existing user API keys — users will need to generate new ones.
```yaml
oauth2:
  client_id: cilogon:/client_id/...
  client_secret: ...
  server_metadata_url: https://cilogon.org/.well-known/openid-configuration
  redirect_uri: https://your-domain/callback
  scopes: openid email profile org.cilogon.userinfo
  # Optional: restrict login to one institution
  # params:
  #   idphint: urn:mace:incommon:uiuc.edu
```

```yaml
admins:
  - you@example.edu
```

Admins have full access to the admin panel (users, groups, usage stats).
Each model entry defines a name users will see and one or more backend endpoints. Lumen round-robins across endpoints and skips unhealthy ones.
```yaml
models:
  - name: gpt-4o
    active: true
    input_cost_per_million: 5.0 # for usage tracking only
    output_cost_per_million: 15.0
    description: "OpenAI GPT-4o" # optional short description shown in the UI
    url: https://huggingface.co/... # optional HuggingFace URL — enables README tab on model page
    knowledge_cutoff: "2024-04" # optional, shown in model details
    supports_reasoning: false # set true to stream chain-of-thought tokens
    input_modalities: ["text", "image"] # optional, shown in model details
    output_modalities: ["text"]
    context_window: 128000 # optional token limit shown in model details
    max_output_tokens: 4096 # optional
    endpoints:
      - url: https://api.openai.com/v1
        api_key: sk-...
        # model: gpt-4o # optional — overrides the name sent to this endpoint
  - name: llama3
    active: true
    input_cost_per_million: 0.0
    output_cost_per_million: 0.0
    endpoints:
      - url: http://localhost:11434/v1
        api_key: ollama
        model: llama3.2
```

Set `active: false` to hide a model without removing it.
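The endpoint selection described earlier (round-robin, skipping unhealthy backends) might look roughly like this. This is an illustrative sketch, not Lumen's actual code; the `healthy` callback stands in for whatever health checks Lumen performs:

```python
class EndpointPool:
    """Round-robin over a model's endpoints, skipping ones reported unhealthy."""

    def __init__(self, endpoints):
        self.endpoints = list(endpoints)
        self._next = 0

    def pick(self, healthy):
        """Return the next healthy endpoint, or None if all are down."""
        for _ in range(len(self.endpoints)):
            ep = self.endpoints[self._next]
            self._next = (self._next + 1) % len(self.endpoints)
            if healthy(ep):
                return ep
        return None

pool = EndpointPool(["a", "b", "c"])
up = lambda ep: ep != "b"  # pretend endpoint "b" is down
print([pool.pick(up) for _ in range(4)])  # ['a', 'c', 'a', 'c']
```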
Lumen supports three access levels for each model:
| Level | Meaning |
|---|---|
| whitelist | Explicitly allowed — no acknowledgment required |
| graylist | Visible to users, but requires a one-time acknowledgment before use |
| blacklist | Blocked — model is hidden from users |
Access is resolved in this order for each user + model combination:
1. User override (admin-set per-user rule) — wins over everything else
2. Global blacklist — absolute; no group can override it
3. Group per-model rules — blacklist beats whitelist beats graylist
4. Global per-model rules (graylist / whitelist)
5. Effective default — the most permissive group `model_access.default` wins; falls back to the global `model_access.default`
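As a sketch, that precedence chain could be implemented like this (illustrative only; every name here is an assumption, not Lumen's API):

```python
# Ranking for defaults: most permissive wins.
PERMISSIVENESS = {"blacklist": 0, "graylist": 1, "whitelist": 2}
# Ranking for conflicting per-model group rules: blacklist beats whitelist beats graylist.
GROUP_RULE_PRIORITY = {"blacklist": 0, "whitelist": 1, "graylist": 2}

def resolve_access(model, user_override=None, global_rules=None,
                   group_rules=(), group_defaults=(), global_default="whitelist"):
    """Simplified sketch of the five-step precedence order described above."""
    global_rules = global_rules or {}
    if user_override:                             # 1. per-user override wins
        return user_override
    if global_rules.get(model) == "blacklist":    # 2. global blacklist is absolute
        return "blacklist"
    levels = [rules[model] for rules in group_rules if model in rules]
    if levels:                                    # 3. group per-model rules
        return min(levels, key=GROUP_RULE_PRIORITY.__getitem__)
    if model in global_rules:                     # 4. global graylist / whitelist
        return global_rules[model]
    if group_defaults:                            # 5. most permissive group default
        return max(group_defaults, key=PERMISSIVENESS.__getitem__)
    return global_default

print(resolve_access("experimental",
                     global_rules={"experimental": "graylist"},
                     group_rules=[{"experimental": "whitelist"}]))  # whitelist
```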
```yaml
model_access:
  default: whitelist # default for models not listed: whitelist|blacklist|graylist (default: whitelist)
  blacklist:
    - old-model # always blocked for everyone
  graylist:
    - experimental # requires one-time user acknowledgment
  whitelist:
    - safe-model # always allowed, no acknowledgment ever
```

Use `*` as a shorthand for setting the default:

```yaml
model_access:
  blacklist: ["*"] # same as: default: blacklist
```

Each group can define its own `model_access:` with the same structure:
```yaml
groups:
  restricted:
    model_access:
      default: blacklist # deny all models for this group
      whitelist: [safe-a, safe-b]
  vip:
    model_access:
      whitelist: [experimental] # VIP users skip graylist acknowledgment
  all-allowed:
    model_access:
      default: whitelist # allow everything for this group
```

When a user belongs to multiple groups, the most permissive default wins (e.g. if one group has `default: whitelist` and another has `default: blacklist`, the user gets whitelist). For per-model rules, blacklist always beats whitelist/graylist.
Groups control how many coins users can spend. Coins map to cost in USD (e.g. 1 coin ≈ $1 of model usage at your configured rates). Every user gets the default group automatically. You can create additional groups and assign users manually via the admin panel, or auto-assign them based on CILogon attributes.
```yaml
groups:
  default:
    max: 0 # coin budget (0 = blocked, -2 = unlimited)
    refresh: 0 # coins added per hour (0 = no auto-refresh)
    starting: 0 # coins granted on first login
  faculty:
    max: 50 # $50 total budget
    refresh: 0.5 # $0.50/hr auto-refresh
    starting: 10 # $10 on first login
```

Automatically add users to a group at login based on their CILogon attributes (requires the `org.cilogon.userinfo` scope):
```yaml
groups:
  staff:
    rules:
      - field: affiliation
        contains: staff@illinois.edu # substring match
      - field: idp
        equals: urn:mace:incommon:uiuc.edu # exact match
    max: 20
    refresh: 0.05
    starting: 20
```

Supported fields: `affiliation`, `member_of`, `idp`, `ou`. Groups assigned by rules are automatically removed if the rule no longer matches on next login.
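A single rule of this shape could be evaluated roughly as follows (a sketch of the described matching semantics; `rule_matches` is a hypothetical helper, and how multiple rules in one group combine is not specified here):

```python
def rule_matches(rule: dict, attrs: dict) -> bool:
    """'equals' is an exact match, 'contains' a substring match,
    against the CILogon attribute named by 'field'."""
    value = attrs.get(rule["field"], "") or ""
    if "equals" in rule:
        return value == rule["equals"]
    if "contains" in rule:
        return rule["contains"] in value
    return False

attrs = {"affiliation": "staff@illinois.edu;member@illinois.edu"}
print(rule_matches({"field": "affiliation", "contains": "staff@illinois.edu"}, attrs))  # True
```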
```yaml
chat:
  remove: hide # "hide" = soft-delete (recoverable) | "delete" = permanent
```

All endpoints are rate-limited per authenticated user (API key ID for /v1/* routes, session user ID for /chat/* routes). The limit is a single string in flask-limiter notation (N per second/minute/hour):
```yaml
rate_limiting:
  limit: "30 per minute"
  # storage_url: redis://localhost:6379/0 # optional; use Redis in multi-worker deployments
```

By default, limits are tracked in-memory (per-process). For multi-worker deployments (e.g. gunicorn with multiple workers), set `storage_url` to a shared Redis instance so limits are enforced across all workers. Changing `storage_url` requires a restart; changing `limit` takes effect within ~5 seconds (hot-reloaded).
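For illustration, a minimal parser for the subset of the notation shown above (flask-limiter itself accepts a richer grammar; `parse_limit` is a hypothetical helper, not part of Lumen or flask-limiter):

```python
import re

def parse_limit(spec: str) -> tuple[int, int]:
    """Parse 'N per second/minute/hour' into (count, window_seconds)."""
    m = re.fullmatch(r"(\d+)\s+per\s+(second|minute|hour)", spec.strip())
    if not m:
        raise ValueError(f"bad limit spec: {spec!r}")
    seconds = {"second": 1, "minute": 60, "hour": 3600}[m.group(2)]
    return int(m.group(1)), seconds

print(parse_limit("30 per minute"))  # (30, 60)
```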
Controls how SQLAlchemy manages database connections. The defaults (pool size 5, overflow 10) are fine for light use; increase them for production or high-concurrency workloads. Changes require a restart.
```yaml
app:
  db_pool:
    pool_size: 20 # persistent connections kept open
    max_overflow: 30 # burst connections allowed above pool_size
    pool_timeout: 10 # seconds to wait for a free connection before returning an error
    pool_recycle: 1800 # recycle connections after 30 min to avoid stale-connection errors
    pool_pre_ping: true # test each connection before use; silently replaces stale ones
```

`pool_size + max_overflow` is the maximum number of simultaneous DB connections. For 50 concurrent requests, set these to at least 50 combined.
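That sizing arithmetic as a tiny hypothetical helper (not part of Lumen):

```python
def pool_settings(expected_concurrency: int, pool_size: int = 5) -> dict:
    """Suggest pool settings so pool_size + max_overflow covers the
    expected number of concurrent requests, each holding one connection."""
    return {
        "pool_size": pool_size,
        "max_overflow": max(expected_concurrency - pool_size, 0),
    }

print(pool_settings(50, pool_size=20))  # {'pool_size': 20, 'max_overflow': 30}
```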