Summary
Replace the worker's single workloop with independent per-group workloops.
Each slot group gets its own workloop that claims with its queue preferences,
tracks its own capacity, and backs off independently.
Background
Currently the worker has one workloop that calls `claim()` with
`{ demand: 1, worker_name }`. After this change, each slot group runs its own
workloop that sends `{ demand: 1, worker_name, queues: [...] }`.
A worker configured with `--queues "fast_lane:1 manual,*:4"` will have:
- One workloop claiming `["fast_lane"]` with 1 slot
- One workloop claiming `["manual", "*"]` with 4 slots
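As a rough sketch, parsing that flag into slot groups could look like the following. The `SlotGroup` shape and `parseQueues` name are illustrative assumptions, not the worker's actual API:

```typescript
// Hypothetical shape for a slot group; field names are assumptions.
interface SlotGroup {
  queues: string[];      // queue preference sent with this group's claims
  maxSlots: number;      // concurrent runs this group may hold
  activeRuns: number;    // runs currently executing
  pendingClaims: number; // claims in flight, not yet resolved
}

// Parse a --queues value like "fast_lane:1 manual,*:4" into slot groups.
function parseQueues(spec: string): SlotGroup[] {
  return spec.split(/\s+/).map((part) => {
    const [names, slots] = part.split(":");
    return {
      queues: names.split(","),
      maxSlots: Number(slots),
      activeRuns: 0,
      pendingClaims: 0,
    };
  });
}
```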
Per-group capacity tracking
Each slot group independently checks its own capacity:
`if (group.activeRuns + group.pendingClaims >= group.maxSlots)` → stop workloop
The global app.workflows map continues to track all active runs. The
per-group tracking determines which group's workloop to resume when a run
completes.
When a run is claimed by a slot group, it is associated with that group for
its lifetime. On completion, the group's activeRuns decrements and its
workloop resumes if it was at capacity.
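This gating and resume-on-completion logic can be sketched as follows, under the same assumed `SlotGroup` shape (names here are illustrative, not the worker's actual internals):

```typescript
// Assumed slot-group shape; field names are illustrative.
interface SlotGroup {
  queues: string[];
  maxSlots: number;
  activeRuns: number;
  pendingClaims: number;
}

// The per-group capacity check from above.
function atCapacity(group: SlotGroup): boolean {
  return group.activeRuns + group.pendingClaims >= group.maxSlots;
}

// On run completion, decrement only the owning group's count and report
// whether its workloop should resume (i.e. it was blocked at capacity
// and now has a free slot).
function onRunComplete(group: SlotGroup): boolean {
  const wasAtCapacity = atCapacity(group);
  group.activeRuns -= 1;
  return wasAtCapacity && !atCapacity(group);
}
```

The key property is that completion only ever wakes the group that owned the run, which is what keeps groups from interfering with each other.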
Claim protocol
Each workloop sends its slot group's queue preference in the claim message:
```typescript
channel.push("claim", {
  demand: 1,
  worker_name: "pod-1",
  queues: ["fast_lane"] // pinned slot
})

channel.push("claim", {
  demand: 1,
  worker_name: "pod-1",
  queues: ["manual", "*"] // preference slot
})
```

Lightning interprets this as:
- No `*` → filter mode: only return runs from the named queues
- With `*` → preference mode: all runs, ordered by preference position
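The mode distinction comes down to whether the wildcard appears in the group's queue list. A minimal sketch (the function name is hypothetical, for illustration only):

```typescript
// Which claim mode Lightning would apply for a given queues array,
// per the filter/preference rule above.
function claimMode(queues: string[]): "filter" | "preference" {
  return queues.includes("*") ? "preference" : "filter";
}
```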
If Lightning hasn't been updated yet, it will ignore the queues field and
return runs as before (Phoenix channels ignore unknown payload keys).
work-available handling
When Lightning pushes work-available, attempt to claim for each slot
group that has free capacity. This means a single work-available event
may trigger multiple claim requests (one per group with free slots).
This is acceptable because:
- Most deployments will have 2-3 slot groups at most
- Claim requests are lightweight channel pushes
- Lightning handles concurrent claims via `FOR UPDATE SKIP LOCKED`
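Under the assumed `SlotGroup` shape from earlier, the handler can be sketched like this; `claimForGroup` is a hypothetical stand-in for the group's channel push:

```typescript
// Assumed slot-group shape; field names are illustrative.
interface SlotGroup {
  queues: string[];
  maxSlots: number;
  activeRuns: number;
  pendingClaims: number;
}

// On work-available, issue one claim per group with free capacity.
// Returns how many claims were sent.
function onWorkAvailable(
  groups: SlotGroup[],
  claimForGroup: (g: SlotGroup) => void
): number {
  let claims = 0;
  for (const g of groups) {
    if (g.activeRuns + g.pendingClaims < g.maxSlots) {
      g.pendingClaims += 1; // reserve the slot until the claim resolves
      claimForGroup(g);
      claims += 1;
    }
  }
  return claims;
}
```

Incrementing `pendingClaims` before the push is what prevents a burst of work-available events from over-claiming a group.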
Join payload
Update the channel join to include queue configuration:
```typescript
channel.join("worker:queue", {
  capacity: 5,
  queues: { "fast_lane": 1, "manual,*": 4 } // informational
})
```

The `queues` map is informational — Lightning doesn't use it for routing in
this release. It enables future presence-based queue tracking. `capacity`
remains the total for backward compatibility.
Backoff considerations
Fast-lane slots with strict pinning (`["fast_lane"]`, no wildcard) will
frequently get empty responses when there's no sync work. The existing backoff
(1s min → 10s max) handles this fine because:
- `work-available` pushes bypass backoff for immediate claiming
- A 1-10s backoff on an idle fast-lane slot has negligible cost
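For reference, a backoff step matching the stated 1s-10s bounds might look like this; the doubling factor is an assumption about the existing implementation, not confirmed by this issue:

```typescript
// Illustrative exponential backoff within the stated 1s-10s bounds.
// The x2 growth factor is an assumption.
function nextBackoff(currentMs: number, minMs = 1000, maxMs = 10000): number {
  return Math.min(Math.max(currentMs * 2, minMs), maxMs);
}
```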
Things to watch out for
- The refactor of server.ts / workloop management is the most complex part —
  the single workloop assumption is likely baked into several places
- Make sure `resumeWorkloop` only resumes the correct group's workloop, not all
- The run-to-group association needs to survive the full run lifecycle
- Test the edge case where a fast_lane group is idle (getting empty claims)
while the default group is fully busy — they should not interfere with each
other
Testing
- Each slot group runs its own independent workloop
- A slot group at capacity stops its workloop
- Run completion resumes the correct group's workloop (not all groups)
- Claim messages include the correct `queues` array per slot group
- `work-available` triggers claims on all groups with free slots
- Fast_lane group backs off independently when getting empty responses
- Default group claims normally while fast_lane is idle
- Join payload includes `queues` map
- Backward compat: single `manual,*:5` group behaves like old capacity-5