Skip to content

perf: eliminate N+1 query patterns causing slow Heroku response times #190

@ericfitz

Description

@ericfitz

Problem

TMI on Heroku is significantly slower than local development:

  • Login providers list: several seconds (sub-second locally)
  • Threat model sub-entity lazy-loading: 10-20 seconds
  • Threat model edit page appears to have missing sub-entities until they eventually load

Root cause: Excessive database round-trips. Locally each query is <1ms over Unix socket; on Heroku each crosses a network boundary at ~2-5ms per round-trip. The multiplier effect on N+1 query patterns makes this severe.

Analysis

Issue 1: N+1 Query Cascade on Threat Model Fetch (CRITICAL)

Every ThreatModelStore.Get() (database_store_gorm.go:123) calls convertToAPIModel() which issues separate queries for each sub-resource:

Step Queries Location
Main TM + preloads (Owner, CreatedBy, SecurityReviewer) 1+3 database_store_gorm.go:137
loadAuthorization() — 1 for access entries, then 1 per entry for user/group lookup 1 + N database_store_gorm.go:698-749
loadMetadata() 1 database_store_gorm.go:752
loadThreats() — 1 for threats, then 1 loadThreatMetadata() per threat 1 + N database_store_gorm.go:757-827
loadDiagramsDynamically() — 1 for IDs, then DiagramStore.Get() per diagram (multiple preloads each) 1 + N×M database_store_gorm.go:849-887

A threat model with 3 auth entries, 5 threats, and 2 diagrams = ~20-30 DB round-trips. At 3-5ms each on Heroku = 60-150ms per single GET.

Issue 2: ThreatModelMiddleware Loads Full Model on Every Sub-Resource Request (CRITICAL)

middleware.go:328 calls ThreatModelStore.Get(id) for every request to /threat_models/{id}/* — including sub-resource endpoints. The middleware only needs authorization data for access checks, but loads the entire threat model with all sub-resources.

When the client opens a threat model and lazy-loads 6 sub-entity types in parallel, the middleware alone triggers 6 full threat model loads = 120-180 DB round-trips before any handler code runs.

Issue 3: List Endpoint Loads Full Models + Count Queries (HIGH)

ListWithCounts() (database_store_gorm.go:297):

  1. Loads all matching threat models
  2. Calls convertToAPIModel() for each (triggering the N+1 cascade)
  3. Runs 6 additional COUNT queries per model (database_store_gorm.go:371-376)

For 10 threat models: ~200-300 DB queries to render the list page.

Issue 4: Cold Start / Login Providers (MODERATE)

First request after dyno wake-up is slow: cold connection pool, SAML metadata parsing, no cache warming.

Issue 5: No Caching of Authorization Data (MODERATE)

Cache service caches sub-resources but not threat model authorization data (the most frequently needed piece, checked by middleware on every request).

Remediation Plan

Phase 1 — Quick wins (major impact, lower effort)

  1. Lightweight authorization-only middleware query: Replace ThreatModelStore.Get() in middleware with GetAuthorization(id) that only loads threat_model_access with a single JOIN. Eliminates full model load from middleware (80-90% reduction in middleware queries).

  2. Cache authorization data in Redis: Cache auth list per threat model with short TTL (60-120s), invalidate on auth changes. Most sub-resource requests skip DB for auth entirely.

  3. Batch count queries in ListWithCounts: Replace 6 individual COUNT queries per model with single query using GROUP BY:

    SELECT threat_model_id, COUNT(*) FROM threats WHERE threat_model_id IN (?) GROUP BY threat_model_id

Phase 2 — Eliminate N+1 patterns (major impact, moderate effort)

  1. Batch user/group lookup in loadAuthorization: Replace per-entry WHERE internal_uuid = ? with single WHERE internal_uuid IN (?).

  2. Batch threat metadata loading: Replace per-threat loadThreatMetadata with single query for all threats.

  3. Batch diagram loading: Create DiagramStore.GetBatch(ids) instead of per-diagram DiagramStore.Get().

Phase 3 — Architectural improvements (moderate impact, higher effort)

  1. Separate List from Get models: List endpoint should return lightweight projection (name, status, owner, counts) without loading threats, diagrams, or metadata.

  2. Response caching for GET /threat_models/{id}: Cache full response in Redis with write-through invalidation.

  3. Connection pool tuning: Consider reducing maxOpenConns to 5 for Heroku's limited connection budget.

Expected Impact

Phases 1-2 should reduce typical page load times from 10-20s to 1-2s on Heroku by cutting DB round-trips from hundreds to single digits.

Related Files

  • api/database_store_gorm.go — Store implementation with N+1 patterns
  • api/middleware.go — ThreatModelMiddleware full-model load
  • api/cache_service.go — Cache service (missing auth caching)
  • auth/db/gorm.go — Connection pool configuration

Metadata

Metadata

Assignees

Labels

No labels
No labels

Projects

Status

Done

Relationships

None yet

Development

No branches or pull requests

Issue actions