
feat: Fast Lanes — dedicated worker capacity for named queues#4515

Open
stuartc wants to merge 11 commits into main from fastlanes


Member

@stuartc stuartc commented Mar 9, 2026

Description

This PR adds Fast Lanes support to Lightning — dedicated worker capacity for
named queues, guaranteeing low-latency execution for sync workloads.

Closes #4498

This is a tracking PR that collects the Lightning-side implementation:

Related worker-side issues:

What's included

Data model (#4500)

  • runs.queue string column (NOT NULL, default "default")
  • Composite partial index on (state, queue, inserted_at) for active runs
  • Queue assignment at run creation:
    • Sync webhook (webhook_reply: :after_completion) → "fast_lane"
    • All other triggers (webhook/cron/kafka) → "default"
    • Manual runs and retries → "manual"
  • Validation restricts queue to default, fast_lane, or manual
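The assignment rules above can be modelled in a few lines. This is a minimal Python sketch of the routing as described, not the Elixir code in the PR; the `source` labels and the function name `assign_queue` are hypothetical, chosen only for illustration.

```python
# Queue names allowed by the validation rule in the description.
VALID_QUEUES = {"default", "fast_lane", "manual"}


def assign_queue(source, webhook_reply=None):
    """Pick a queue for a new run.

    `source` is a hypothetical label: "webhook", "cron", "kafka",
    "manual" or "retry". `webhook_reply` mirrors the trigger setting
    named in the description.
    """
    if source in ("manual", "retry"):
        queue = "manual"
    elif source == "webhook" and webhook_reply == "after_completion":
        # Sync webhooks wait for the run to finish, so they get the fast lane.
        queue = "fast_lane"
    else:
        queue = "default"
    if queue not in VALID_QUEUES:
        raise ValueError(f"invalid queue: {queue}")
    return queue
```

For example, `assign_queue("webhook", "after_completion")` yields `"fast_lane"`, while a webhook with any other reply mode falls through to `"default"`.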

Claim path (#4501)

  • claim message gains a queues parameter (ordered preference chain)
  • Filter mode (no *): strict pinning, only named queues returned
  • Preference mode (with *): all queues eligible, named ones prioritized
  • Backward compatible: omitting queues defaults to ["manual", "*"]
  • Queue filtering/ordering in shared Queue.claim/4 benefits both FifoRunQueue and RoundRobinQueue
  • sanitise_queues/1 validates input and falls back to defaults for malformed payloads
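To make the two modes concrete, here is a small Python model of the eligibility rule. It is a sketch under assumptions: the PR does not spell out what counts as a malformed payload, so the checks in `sanitise_queues` (non-list, empty list, non-string or empty entries) are guesses, and runs are represented as plain dictionaries.

```python
DEFAULT_QUEUES = ["manual", "*"]


def sanitise_queues(queues):
    """Return a usable preference chain, falling back to the default.

    What counts as "malformed" is an assumption made for this sketch.
    """
    if not isinstance(queues, list) or not queues:
        return list(DEFAULT_QUEUES)
    if not all(isinstance(q, str) and q for q in queues):
        return list(DEFAULT_QUEUES)
    return queues


def eligible_runs(runs, queues=None):
    """Apply filter mode or preference mode to a list of run dicts."""
    queues = sanitise_queues(queues)
    if "*" in queues:
        # Preference mode: every queue is eligible (ordering is a separate step).
        return list(runs)
    # Filter mode: strict pinning to the named queues.
    named = set(queues)
    return [run for run in runs if run["queue"] in named]
```

With this model, a worker asking for `["fast_lane"]` only ever sees `fast_lane` runs, and a worker that sends nothing behaves as if it had asked for `["manual", "*"]`.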

Validation steps

  1. Run mix ecto.reset to apply the split migrations cleanly
  2. mix test — all tests pass
  3. Verify sync webhook triggers produce runs with queue: "fast_lane"
  4. Verify cron/async webhook triggers produce runs with queue: "default"
  5. Verify manual runs and retries produce runs with queue: "manual"
  6. Verify filter mode: a worker claiming with ["fast_lane"] only receives fast_lane runs
  7. Verify preference mode: a worker claiming with ["manual", "*"] gets manual runs first, then FIFO
  8. Verify backward compat: a worker claiming without queues gets the default ["manual", "*"] behavior

Additional notes for the reviewer

  1. Phase 2 cleanup (removing the priority column, #4502) is out of scope
    and will happen after Fast Lanes is deployed and stable
  2. The priority column is kept and still written for manual/retry runs
    (:immediate) as a backward-compat safety net during rollout
  3. The implementation uses COALESCE(array_position(...), wildcard_pos) instead
    of CASE WHEN for queue preference ordering — semantically equivalent but
    cleaner for variable-length preference chains
  4. RoundRobinQueue (Thunderbolt) changes are out of scope — that's thunderbolt#627,
    which this PR unblocks
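The ordering expression in note 3 can be read as "rank each run by where its queue sits in the preference chain, and give every unnamed queue the wildcard's position". The Python model below sketches that reading. It assumes PostgreSQL's 1-based `array_position`, and it assumes `inserted_at` as the tie-breaker, which matches the "then FIFO" wording in the validation steps; the function names are illustrative only.

```python
def preference_rank(queue, queues):
    """Model of COALESCE(array_position(queues, queue), wildcard_pos).

    Positions are 1-based, as in PostgreSQL's array_position.
    """
    if queue in queues:
        return queues.index(queue) + 1
    # Unnamed queues take the wildcard's slot; in filter mode there is no
    # wildcard, but unnamed queues have already been filtered out by then.
    return queues.index("*") + 1


def order_runs(runs, queues):
    """Sort runs by queue preference, then oldest first (assumed tie-breaker)."""
    return sorted(
        runs,
        key=lambda run: (preference_rank(run["queue"], queues), run["inserted_at"]),
    )
```

Because the rank is derived from list positions, the same function handles a chain of any length, which is the advantage the note claims over a hand-written `CASE WHEN`.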

AI Usage

  • I have used Claude Code
  • I have used another model
  • I have not used AI

Pre-submission checklist

  • I have performed an AI review of my code (we recommend using /review
    with Claude Code)
  • I have implemented and tested all related authorization policies.
    (e.g., :owner, :admin, :editor, :viewer)
  • I have updated the changelog.
  • I have ticked a box in "AI usage" in this PR

@github-project-automation github-project-automation bot moved this to New Issues in Core Mar 9, 2026

codecov bot commented Mar 9, 2026

Codecov Report

❌ Patch coverage is 96.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 89.56%. Comparing base (d8eca73) to head (efd00df).

Files with missing lines Patch % Lines
lib/lightning/runs/queue.ex 90.90% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4515      +/-   ##
==========================================
+ Coverage   89.54%   89.56%   +0.01%     
==========================================
  Files         425      425              
  Lines       20307    20325      +18     
==========================================
+ Hits        18184    18204      +20     
+ Misses       2123     2121       -2     

@stuartc stuartc marked this pull request as ready for review March 12, 2026 13:59
stuartc added 6 commits March 18, 2026 08:35
* feat: add queue column to runs table for Fast Lanes support (#4500)

Add a `queue` VARCHAR column to the `runs` table (NOT NULL, default "default")
with a composite partial index on (state, queue, inserted_at) for active runs.

Queue assignment rules:
- Webhook triggers with webhook_reply: :after_completion → "fast_lane"
- All other triggers (webhook/cron/kafka) → "default"
- Manual runs and retries → "manual"

* refactor: split migration, move queue routing to WorkOrders

Split the combined add-column + concurrent-index migration into two
separate migrations (transactional and non-transactional). Move queue
determination logic from Run.for(%Trigger{}) into WorkOrders.build_for
where all other routing decisions live. Unify both Run.for/2 clauses
to use put_if_provided/3. Add tests for invalid queue rejection.

* feat: thread queue preferences through the claim path (#4501)

Workers can now pass a `queues` parameter in their claim request to
filter or prioritize runs by queue. Filter mode (no wildcard) returns
only runs from named queues. Preference mode (with `*`) orders named
queues first while still returning all runs. Defaults to ["manual", "*"]
for backward compatibility.

* test: add queue filtering and preference tests for claim path

Cover filter mode, preference mode, default behavior, and malformed
input fallback across Queue, FifoRunQueue, Runs, Services.RunQueue,
and WorkerChannel. Update existing claim tests to new 3-arity signatures.

* chore: fix worker_name typespec and remove dead test code

Runs on the fast_lane queue now bypass project concurrency checks
in the claim query, ensuring sync webhook responses are never
blocked by in-progress cron/async runs.

Queue.claim was stripping all ORDER BY clauses before applying queue
preference ordering. This broke Thunderbolt's round-robin scheduler
which relies on project_id ordering in the base query. Now saves
and restores the caller's ordering after queue preference.

Map priority 0 (immediate) to "manual" and priority 1 (normal) to
"default" instead of setting all rows to "default".
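The backfill mapping in this commit message amounts to a two-way lookup. As a minimal Python sketch (the real change is a database migration; treating any priority other than 0 as "default" is an assumption, since the message only names 0 and 1):

```python
def backfill_queue(priority):
    """Map a legacy priority value to a queue name for existing rows."""
    # 0 was :immediate (manual runs and retries); 1 was :normal.
    # Any other value falls back to "default" (assumption for this sketch).
    return "manual" if priority == 0 else "default"
```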
….22.1

Update @openfn/ws-worker from 1.17.0 to 1.22.1, which includes the
--workloops CLI option for splitting worker capacity into independent
slot groups with queue preferences (e.g., "fast_lane:1 manual>*:4").

RuntimeManager changes:
- Add workloops field to Config struct, passed as --workloops flag
- Remove default capacity (nil instead of 5) so the worker uses its
  own defaults unless explicitly configured
- Add WORKER_WORKLOOPS env var to bootstrap config
- capacity and workloops guards only emit when set; if both are
  configured the worker itself reports the conflict
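The only `--workloops` value shown here is `"fast_lane:1 manual>*:4"`. Reading that single example as space-separated groups of `queues:capacity`, with `>` separating queues in preference order, a parser could look like the sketch below. The grammar is inferred from that one example and the function is hypothetical; it is not the worker's actual parser.

```python
def parse_workloops(spec):
    """Parse a workloops string into a list of slot groups.

    Assumed grammar (inferred from one example): groups are separated by
    whitespace, each group is "<queue>[><queue>...]:<capacity>".
    """
    groups = []
    for token in spec.split():
        queues_part, sep, capacity_part = token.rpartition(":")
        if not sep or not queues_part:
            raise ValueError(f"expected '<queues>:<capacity>', got {token!r}")
        groups.append(
            {
                "queues": queues_part.split(">"),  # ordered preference chain
                "capacity": int(capacity_part),    # raises ValueError if not a number
            }
        )
    return groups
```

Under this reading, the example splits into one slot pinned to `fast_lane` and four slots that prefer `manual` but accept anything.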
@stuartc stuartc requested a review from taylordowns2000 March 18, 2026 13:39
stuartc and others added 4 commits March 18, 2026 15:44
Replace case expression with function clause pattern matching to bring
complexity under Credo's threshold.

What changed in lib/lightning/work_orders.ex:

  - retry/3 → passes Keyword.get(opts, :queue, "manual") to enqueue_retry, so single retries default to "manual"
  - retry_many([%RunStep{}], opts) → injects queue: "default" into opts before calling individual retry() calls (bulk retry from a specific job)
  - retry_many([%WorkOrder{}], opts) → unchanged; it enqueues Oban jobs which call enqueue_many_for_retry with default "default"
  - enqueue_many_for_retry/3 → accepts optional queue param (default "default") for bulk retries from start
  - enqueue_retry/3 and new_retry_run/8 — now accept and thread through the queue value

  Result:

  ┌────────────────────────┬───────────┐
  │          Path          │   Queue   │
  ├────────────────────────┼───────────┤
  │ Manual run (click Run) │ "manual"  │
  ├────────────────────────┼───────────┤
  │ Single retry           │ "manual"  │
  ├────────────────────────┼───────────┤
  │ Bulk retry from start  │ "default" │
  ├────────────────────────┼───────────┤
  │ Bulk retry from job    │ "default" │
  └────────────────────────┴───────────┘
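The defaults in the table follow from where each path picks its queue. The Python model below sketches that flow using the function names from the commit message; the signatures are simplified, each function returns only the queue name(s) it would assign, and none of this is the Elixir code itself.

```python
def retry(opts=None):
    """Single retry: use the caller's queue if given, else "manual"."""
    return (opts or {}).get("queue", "manual")


def retry_many_from_job(run_steps, opts=None):
    """Bulk retry from a specific job: force "default" before each retry."""
    opts = {**(opts or {}), "queue": "default"}
    return [retry(opts) for _ in run_steps]


def enqueue_many_for_retry(work_orders, queue="default"):
    """Bulk retry from start: optional queue, "default" unless overridden."""
    return [queue for _ in work_orders]
```

The effect is that only a retry triggered on its own competes with manual runs; anything retried in bulk waits in the default queue.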
Member

@taylordowns2000 taylordowns2000 left a comment

it's great. thank you. i've made a small change (bulk retries shouldn't get same priority as manual runs) which you're welcome to revert if it's more complicated than i thought. otherwise please merge at your leisure. i'm now moving on to the TB review.

Labels

None yet

Projects

Status: New Issues

Development

Successfully merging this pull request may close these issues.

Fast Lanes for Sync Workflows

2 participants