feat: API key schema isolation — database-level tenant separation#855
salvormallow wants to merge 9 commits into vectorize-io:main
Dashboard caveat
When [...] Root cause: [...] Workaround: Set [...]
Longer-term: The dashboard should support multi-tenant awareness: a tenant selector that switches which API key is used for dataplane calls. I'm working on a follow-up PR for this.
CI Status
Fixed: [...]
Expected fork failures: The remaining ~15 failing jobs ([...] These tests would need a trusted CI re-run from a maintainer to pass.
Hey @salvormallow, this feature is incomplete without the proper UI changes. Can you make some minimal changes in the UI to switch tenant, as you suggested?
Sounds good, I'll add it this weekend.
Adds ApiKeySchemaTenantExtension: maps API keys to isolated PostgreSQL schemas, providing database-level memory isolation between tenants.
Threat model: prompt injection against AI agents. Agents execute tool calls based on conversation content. A prompt injection can trick an agent into querying another tenant's banks. Schema isolation scopes all SQL to the authenticated schema; banks from other schemas don't exist.
Configuration:
  HINDSIGHT_API_TENANT_EXTENSION=...bank_scoped_tenant:ApiKeySchemaTenantExtension
  HINDSIGHT_API_TENANT_KEY_MAP=key_a:schema_a;key_b:schema_b
Follows the SupabaseTenantExtension pattern. Opt-in, zero breaking changes. Includes 20 tests.
Replaces singleton HINDSIGHT_CP_DATAPLANE_API_KEY with factory pattern. Supports HINDSIGHT_CP_TENANT_KEY_MAP=key:name;key:name for multi-tenant. Backwards-compatible: single key still works via default export.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add tenant-aware BFF: all API routes read ?tenant= param and use getClientForTenant() instead of singleton lowLevelClient
- Add /api/tenants route for tenant discovery
- Add TenantContext + TenantProvider for client-side tenant state
- ControlPlaneClient.fetchApi() auto-appends ?tenant= to all requests
- Tenant selector dropdown in header (hidden in single-tenant mode)
- BankProvider re-fetches banks and resets selection on tenant change
- Backwards-compatible: HINDSIGHT_CP_DATAPLANE_API_KEY still works
- New env var: HINDSIGHT_CP_TENANT_KEY_MAP=key:name;key:name
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix uploadFiles() bypassing fetchApi and missing ?tenant= param
- Remove unused sdk import from list/route.ts
- Guard BankProvider loadBanks() until tenant is resolved (avoid double-load)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix silent cross-tenant data leakage in getClientForTenant(): invalid tenant names now throw in multi-tenant mode instead of silently falling back to the first tenant.
Fix race condition in bank-context.tsx where rapid tenant switches could interleave bank list responses, showing banks from the wrong tenant. Uses a monotonic load ID to discard stale responses.
Add Playwright e2e test suite (18 tests) covering tenant discovery, switching, bank loading, navigation, and cross-tenant isolation. Includes an mTLS proxy for testing against prod deployments behind mutual TLS.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
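The monotonic load ID mentioned in this commit can be illustrated with a small sketch. The actual fix lives in bank-context.tsx (TypeScript/React); the Python version below only demonstrates the idea, and BankLoader, fake_fetch, and the timings are hypothetical names and values chosen for illustration.

```python
import asyncio


class BankLoader:
    """Keeps only the response that belongs to the most recent load request."""

    def __init__(self, fetch_banks):
        self._fetch_banks = fetch_banks  # hypothetical async fetcher: tenant -> list of bank names
        self._load_id = 0                # monotonic counter, bumped on every load
        self.banks = []

    async def load(self, tenant):
        self._load_id += 1
        my_id = self._load_id
        result = await self._fetch_banks(tenant)
        if my_id != self._load_id:
            # A newer load started while this one was in flight: discard the stale response.
            return
        self.banks = result


async def demo():
    async def fake_fetch(tenant):
        # Simulate a slow response for tenant_a and a fast one for tenant_b.
        await asyncio.sleep(0.05 if tenant == "tenant_a" else 0.01)
        return [f"{tenant}-bank"]

    loader = BankLoader(fake_fetch)
    # The user selects tenant_a, then quickly switches to tenant_b.
    await asyncio.gather(loader.load("tenant_a"), loader.load("tenant_b"))
    return loader.banks


final_banks = asyncio.run(demo())
```

Without the ID check, the slower tenant_a response would arrive last and overwrite the list, leaving tenant_a's banks on screen while tenant_b is selected.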
The control plane now falls back to HINDSIGHT_API_TENANT_KEY_MAP when HINDSIGHT_CP_TENANT_KEY_MAP is not set. Operators no longer need to duplicate API keys across two env vars: both the API and dashboard read from the same source.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix ruff import sort and formatting in test_bank_scoped.py
- Remove unused deprecated hindsightClient/lowLevelClient exports
- Strip tenant query param before forwarding to dataplane in audit-logs routes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
d02b668 to b89b101
- Remove Playwright e2e tests, config, and mTLS proxy (manual-run only, not suitable for upstream CI)
- Remove HINDSIGHT_CP_TENANT_KEY_MAP: dashboard reads HINDSIGHT_API_TENANT_KEY_MAP directly (one key map for both)
- Update .env.example to document the consolidated config
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dashboard changes complete
The multi-tenant dashboard UI is implemented, fully tested, and ready for review. PR description has been updated with the design, architecture, and test plan.
What's included:
Testing:
Summary
Adds ApiKeySchemaTenantExtension, a built-in tenant extension that maps API keys to isolated PostgreSQL schemas, providing database-level memory isolation between tenants. Follows the same pattern as SupabaseTenantExtension but uses static API key mapping instead of JWT auth.
Threat model: prompt injection against AI agents
AI agents execute tool calls, including Hindsight recall, retain, and reflect, based on conversation content. A prompt injection delivered via chat message, email, or web search result can trick an agent into querying another tenant's memory banks.
Example attack:
    hindsight recall --bank tenant-b-bank --query "private data"
Why application-layer bank filtering isn't enough:
RequestContext.allowed_bank_ids exists on the model but is not enforced by the engine. An OperationValidatorExtension could check it, but:
- When allowed_bank_ids is None (the default), all access is granted
- allowed_bank_ids is never set for background tasks
Why schema isolation works:
The API key determines the PostgreSQL schema at authentication time, before any bank lookup or query executes. The SQL itself is scoped via fully-qualified table names. Even a fully compromised agent can only access banks within its assigned schema. Banks from other schemas don't exist in its view of the database.
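To make the "fully-qualified table names" point concrete, here is a minimal sketch of how a query might be scoped to one schema. It is not the engine's actual code: the helper names, the memories table, and the bank_id column are hypothetical, and the quoting follows standard PostgreSQL identifier rules (wrap in double quotes, double any embedded quote).

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def scoped_table(schema: str, table: str) -> str:
    """Return a schema-qualified table reference such as "schema_a"."memories"."""
    return f"{quote_ident(schema)}.{quote_ident(table)}"


def recall_sql(schema: str) -> str:
    # "memories" and "bank_id" are placeholder names for illustration only.
    # The bank ID is passed as a bound parameter ($1), never interpolated.
    return f"SELECT * FROM {scoped_table(schema, 'memories')} WHERE bank_id = $1"


sql_a = recall_sql("schema_a")
```

Because the schema comes from the authenticated key and not from the request body, a bank ID that belongs to another tenant simply matches no rows in this query.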
How it works
[...] SupabaseTenantExtension)
Configuration
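The first commit message in this PR names two environment variables: HINDSIGHT_API_TENANT_EXTENSION=...bank_scoped_tenant:ApiKeySchemaTenantExtension and HINDSIGHT_API_TENANT_KEY_MAP=key_a:schema_a;key_b:schema_b. As a rough illustration of the key map format, the sketch below parses such a string and resolves a presented key to its schema. The function names are hypothetical and this is not the extension's actual implementation; it also assumes keys do not contain a colon.

```python
import hmac


def parse_key_map(raw: str) -> dict:
    """Parse 'key_a:schema_a;key_b:schema_b' into {api_key: schema_name}."""
    mapping = {}
    for entry in raw.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing ';'
        key, sep, schema = entry.partition(":")
        if not sep or not key or not schema:
            # Do not echo the entry: it may contain a secret key.
            raise ValueError("malformed key map entry, expected key:schema")
        if key in mapping:
            raise ValueError("duplicate API key in key map")
        mapping[key] = schema
    return mapping


def schema_for_key(mapping: dict, presented_key: str):
    """Return the schema for a presented key, or None if the key is unknown."""
    match = None
    for key, schema in mapping.items():
        # Constant-time comparison avoids leaking key contents through timing.
        if hmac.compare_digest(key.encode(), presented_key.encode()):
            match = schema
    return match


key_map = parse_key_map("key_a:schema_a;key_b:schema_b")
```

Each key resolves to exactly one schema, and an unknown key resolves to nothing, which matches the "one key = one schema" rule described in this PR.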
Design decisions
Opt-in, zero breaking changes. If HINDSIGHT_API_TENANT_EXTENSION is not set, Hindsight uses DefaultTenantExtension, which is identical to current behavior. Existing deployments are unaffected.
One key = one schema. Each API key maps to exactly one PostgreSQL schema. A single key cannot access multiple schemas. This is intentional: one key = one blast radius. The TenantContext returns a single schema_name, and the engine scopes all queries to it. Cross-schema queries are not possible without direct Postgres access.
Admin access. There is no "superuser key" that spans all schemas. Operators who need cross-tenant visibility should query Postgres directly or use separate keys per schema. This is a conscious trade-off: admin convenience vs. the guarantee that no single compromised key grants access to all tenants.
MCP auth disabled = default schema only. When mcp_auth_disabled=true, MCP requests fall back to the default schema (from HINDSIGHT_API_DATABASE_SCHEMA), not a tenant schema.
Schema name validation. Schema names must be valid Postgres identifiers (letters, digits, underscores). Hyphens, spaces, and names starting with digits are rejected at startup.
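The validation rule above can be expressed as a single regular expression. This is a sketch of one way to apply it at startup, assuming ASCII letters only; the function name and error message are hypothetical and may differ from what the extension does.

```python
import re

# Letters, digits, underscores; must not start with a digit.
_SCHEMA_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_schema_names(schemas):
    """Raise ValueError listing every schema name that is not a plain identifier."""
    bad = [s for s in schemas if not _SCHEMA_NAME.fullmatch(s)]
    if bad:
        raise ValueError(f"invalid schema name(s): {bad}")


# Passes silently for well-formed names.
validate_schema_names(["schema_a", "_tenant1"])
```

Rejecting bad names at startup, before any request is served, means a misconfigured key map fails loudly instead of producing a partially working deployment.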
Why not allowed_bank_ids + OperationValidatorExtension? See threat model above. Application-layer checks are defense-in-depth, not a security boundary. Schema isolation moves the enforcement into the database where it can't be bypassed by missed code paths.
Files changed
- hindsight-api-slim/.../builtin/bank_scoped_tenant.py: ApiKeySchemaTenantExtension (~170 lines)
- hindsight-api-slim/tests/test_bank_scoped.py
Test plan
- [...] run_migration works on first authenticated request
- [...] public schema reads existing data correctly
Dashboard: multi-tenant support
User-facing behavior
When HINDSIGHT_API_TENANT_KEY_MAP is configured, a tenant selector dropdown appears in the top bar next to the bank selector. Selecting a tenant scopes all dashboard operations (bank listing, recall, reflect, documents, entities, configuration) to that tenant's schema. The selection persists across page navigations via localStorage.
When the key map is not set, the dashboard behaves identically to before: no tenant selector, single-key mode, zero breaking changes.
Architecture
The control plane never talks to the dataplane without tenant scoping. The design has three layers:
Server layer (hindsight-client.ts): A tenant-aware client factory replaces the old singleton. getClientForTenant(name) returns cached SDK clients configured with that tenant's API key. The key map is read from HINDSIGHT_API_TENANT_KEY_MAP, the same env var the API server uses, so operators configure it once. Unknown tenant names throw in multi-tenant mode: fail-closed, not fail-open.
API route layer (~35 routes): Every Next.js API route extracts ?tenant= from the query string, calls getClientForTenant(tenant), and uses the returned scoped client. The tenant param is consumed by the control plane and not forwarded to the dataplane; tenant identity is carried in the Authorization header, not the URL.
Browser layer (tenant-context.tsx → api.ts → bank-context.tsx): TenantProvider loads tenant names from /api/tenants on mount, restores the saved selection from localStorage, and calls client.setTenant(). The ControlPlaneClient singleton auto-appends ?tenant= to every fetchApi() call. BankProvider watches the current tenant and resets the bank list when it changes.
Dashboard files changed
- hindsight-control-plane/src/lib/hindsight-client.ts
- hindsight-control-plane/src/lib/tenant-context.tsx
- hindsight-control-plane/src/lib/bank-context.tsx
- hindsight-control-plane/src/lib/api.ts: [...] ?tenant= to all API calls
- hindsight-control-plane/src/components/bank-selector.tsx
- hindsight-control-plane/src/app/layout.tsx
- hindsight-control-plane/src/app/api/tenants/route.ts
- hindsight-control-plane/src/app/api/*/route.ts
- .env.example
Dashboard test plan
- [...] ?tenant= parameter
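The server and API route layers described under Architecture can be summarized in a short sketch. The real code is TypeScript in hindsight-control-plane; the Python below only mirrors the described behavior (cached per-tenant clients, fail-closed lookup, tenant param stripped before forwarding). TenantClients, UnknownTenantError, forward, and the Bearer scheme are all assumptions made for illustration, and the sketch covers multi-tenant mode only.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


class UnknownTenantError(Exception):
    """Raised instead of falling back to some default tenant (fail-closed)."""


class TenantClients:
    def __init__(self, key_map):
        # key_map uses the key:name layout of the env var: {api_key: tenant_name}
        self._key_by_tenant = {name: key for key, name in key_map.items()}
        self._cache = {}

    def get(self, tenant):
        if tenant not in self._key_by_tenant:
            raise UnknownTenantError(f"unknown tenant: {tenant!r}")
        if tenant not in self._cache:
            # Stand-in for an SDK client: just the headers it would send.
            # The Bearer scheme is an assumption, not confirmed by the PR.
            key = self._key_by_tenant[tenant]
            self._cache[tenant] = {"Authorization": f"Bearer {key}"}
        return self._cache[tenant]


def forward(clients, url):
    """Resolve the tenant from ?tenant= and build the dataplane request parts."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    tenant = next((v for k, v in params if k == "tenant"), None)
    headers = clients.get(tenant)  # raises for a missing or unknown tenant
    # The tenant param is consumed here and never reaches the dataplane.
    query = urlencode([(k, v) for k, v in params if k != "tenant"])
    return parts.path, query, headers


clients = TenantClients({"key_a": "tenant_a", "key_b": "tenant_b"})
path, query, headers = forward(clients, "/api/banks?tenant=tenant_b&limit=10")
```

Raising on an unknown name is the important design choice: a silent fallback to the first tenant is exactly the cross-tenant leak that one of the commits in this PR fixes.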