feat(runs): pause/resume/cancel + unified status primitives + notification center #345

Merged
AbirAbbas merged 9 commits into main from feat/runs-cancel-pause
Apr 7, 2026

@santoshkumarradha
Member

Summary

Adds pause / resume / cancel lifecycle controls for workflow runs across the runs table, run-detail page, and a new bulk action bar — together with a notification center in the top bar and a unified set of status primitives so every status pixel in the app reads from one source of truth (getStatusTheme() in utils/status.ts).

The branch also honestly represents the asymmetry of the cancel pathway: in-flight HTTP calls to agents can't be killed, so root_execution_status is exposed separately from the children-aggregated status and the UI uses it for the kebab, dot, and live-elapsed decisions.
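
A minimal sketch of the root-effective status rule described above. The field names `status` and `root_execution_status` come from this PR; the `RunSummary` type, the `ASSUMED_TERMINAL` set, and both helper names are illustrative assumptions, not the code in the branch.

```typescript
// Hypothetical shape; only `status` and `root_execution_status` are named in this PR.
interface RunSummary {
  run_id: string;
  status: string; // children-aggregated status
  root_execution_status?: string | null; // status of the root execution, when known
}

// Assumed set of terminal statuses, for illustration only.
const ASSUMED_TERMINAL = new Set(["succeeded", "failed", "cancelled", "timeout"]);

// Prefer the root's status; fall back to the aggregate when it is absent.
function effectiveStatus(run: RunSummary): string {
  return run.root_execution_status ?? run.status;
}

// True when the root is terminal but the aggregate still reports running children.
function isDraining(run: RunSummary): boolean {
  return ASSUMED_TERMINAL.has(effectiveStatus(run)) && run.status === "running";
}
```

Under these assumptions, a run with `status: "running"` and `root_execution_status: "cancelled"` reads as cancelled for the kebab, dot, and timer, while `isDraining` flags that children are still finishing.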

Lifecycle controls

  • Per-row kebab on the runs table — Pause / Resume / Cancel via RunLifecycleMenu with AlertDialog confirmation, gated on root status.
  • Bulk action bar at the bottom of the runs table — multi-select Pause / Resume / Cancel using Promise.allSettled so partial failures don't block the rest.
  • Run detail page header cluster — same lifecycle actions inline; cancel remains available for pending / queued / waiting roots, not just running.
  • Cancellation strip on the detail page — when the root is cancelled but children are still draining, an honest muted strip shows "N nodes still finishing".
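
The bulk action bar's use of `Promise.allSettled` can be sketched as follows. This is an illustration under stated assumptions: `runBulkAction`, `BulkResult`, the `mutate` callback, and the summary wording are hypothetical and stand in for whatever mutation hook and copy the branch actually uses.

```typescript
type LifecycleAction = "pause" | "resume" | "cancel";

interface BulkResult {
  succeeded: string[];
  failed: string[];
  summary: string;
}

const PAST_TENSE: Record<LifecycleAction, string> = {
  pause: "paused",
  resume: "resumed",
  cancel: "cancelled",
};

// `mutate` is a hypothetical per-run call that rejects when the action fails.
async function runBulkAction(
  action: LifecycleAction,
  runIds: string[],
  mutate: (action: LifecycleAction, runId: string) => Promise<void>,
): Promise<BulkResult> {
  // allSettled never rejects, so one failing run cannot block the others.
  const results = await Promise.allSettled(runIds.map((id) => mutate(action, id)));

  const succeeded: string[] = [];
  const failed: string[] = [];
  runIds.forEach((id, i) => {
    const result = results[i];
    if (result !== undefined && result.status === "fulfilled") {
      succeeded.push(id);
    } else {
      failed.push(id);
    }
  });

  // One summary message per bulk action (wording is illustrative).
  const total = runIds.length;
  const verb = PAST_TENSE[action];
  let summary: string;
  if (failed.length === 0) {
    summary = `${total} of ${total} ${verb}`;
  } else if (succeeded.length === 0) {
    summary = `None of ${total} ${verb}`;
  } else {
    summary = `${succeeded.length} of ${total} ${verb}; ${failed.length} failed`;
  }
  return { succeeded, failed, summary };
}
```

Collecting the failed ids separately lets a caller show one summary toast and still keep the failed rows selected for a retry.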

Notification center

  • Bell + popover lives in the sticky top bar (next to the ⌘K hint) with (N) unread prefix on document.title.
  • Compact tree grouped by run, collapsed by default, one-line latest-event summary, hover tooltip for full message.
  • Semantic icons per event kind — pause / resume / cancel / error / complete / start / info, no more "everything is a green check".
  • Sonner toasts for transient notifications with neutral cards + thin type-tinted left borders (no richColors flooding).
  • useRunNotification(opts) with an eventKind parameter; persistent log capped at 50 entries.
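
A rough sketch of the capped log and the unread title prefix, written as pure functions. `NotificationEventKind` and the 50-entry cap come from this PR; the `NotificationEntry` shape, the newest-first ordering, and the function names are assumptions made for illustration.

```typescript
type NotificationEventKind =
  | "pause" | "resume" | "cancel" | "error" | "complete" | "start" | "info";

// Hypothetical entry shape; the real provider may store more fields.
interface NotificationEntry {
  id: string;
  runId: string;
  eventKind: NotificationEventKind;
  message: string;
  read: boolean;
}

const MAX_LOG_ENTRIES = 50; // cap stated in this PR

// Prepend the newest entry and drop anything beyond the cap (newest-first is assumed).
function appendNotification(
  log: readonly NotificationEntry[],
  entry: NotificationEntry,
): NotificationEntry[] {
  return [entry, ...log].slice(0, MAX_LOG_ENTRIES);
}

// Build the tab title with an "(N) " unread prefix; no prefix when nothing is unread.
function titleWithUnread(baseTitle: string, log: readonly NotificationEntry[]): string {
  const unread = log.filter((n) => !n.read).length;
  return unread > 0 ? `(${unread}) ${baseTitle}` : baseTitle;
}
```

In a React provider, the result of `titleWithUnread` would typically be assigned to `document.title` from an effect that depends on the log.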

Unified status primitives

  • StatusTheme interface in utils/status.ts extended with icon: LucideIcon and motion: "none" | "live".
  • StatusDot, StatusIcon, StatusPill primitives in components/ui/status-pill.tsx — all read from getStatusTheme(), no hardcoded colors anywhere.
  • StatusDot has a motion-safe:animate-ping halo for live motion statuses, frozen otherwise.
  • Badge swapped Phosphor → Lucide and now derives shouldSpin from getStatusTheme(canonical).motion === "live" instead of hardcoding variant === "running".
  • WorkflowNode deletes its duplicate STATUS_TONE_TOKEN_MAP and switch-based getStatusIcon and routes through the same theme.
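
To illustrate the single-source-of-truth idea, here is a simplified sketch of a theme table and two consumers that derive motion from it. The names `StatusThemeSketch`, `THEMES`, `getStatusThemeSketch`, `shouldSpin`, and `dotHaloClass` are hypothetical, the status list is abbreviated, and `iconName` is a string stand-in for the real `icon: LucideIcon` field; the actual table in utils/status.ts may differ.

```typescript
type CanonicalStatus = "running" | "paused" | "succeeded" | "failed" | "cancelled";
type Motion = "none" | "live";

interface StatusThemeSketch {
  label: string;
  iconName: string; // stand-in for `icon: LucideIcon`; icon choices are assumptions
  motion: Motion;
}

// One table drives label, glyph, and motion for every status.
const THEMES: Record<CanonicalStatus, StatusThemeSketch> = {
  running: { label: "Running", iconName: "Loader2", motion: "live" },
  paused: { label: "Paused", iconName: "PauseCircle", motion: "none" },
  succeeded: { label: "Succeeded", iconName: "CheckCircle2", motion: "none" },
  failed: { label: "Failed", iconName: "AlertTriangle", motion: "none" },
  cancelled: { label: "Cancelled", iconName: "Ban", motion: "none" },
};

function getStatusThemeSketch(status: CanonicalStatus): StatusThemeSketch {
  return THEMES[status];
}

// Badge-style consumer: spin is derived from the theme, not from a hardcoded variant name.
function shouldSpin(status: CanonicalStatus): boolean {
  return getStatusThemeSketch(status).motion === "live";
}

// Dot-style consumer: the halo class is only applied for live statuses.
function dotHaloClass(status: CanonicalStatus): string {
  return getStatusThemeSketch(status).motion === "live" ? "motion-safe:animate-ping" : "";
}
```

Because both consumers read `motion` from the same table, marking another status as live would be a one-line change in the table rather than an edit in each component.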

Backend

  • WorkflowRunSummary now carries RootExecutionStatus populated from agg.RootStatus.
  • RunSummaryAggregation adds RootStatus *string; QueryRunSummaries and getRunAggregation select and normalize paused_count + root_status.
  • deriveStatusFromCounts adds an explicit paused branch before the succeeded fallback so all-paused runs no longer collapse to succeeded. The terminal check excludes paused, so completed_at / duration_ms stay nil.
  • Run trace and DAG graph now desaturate child bars/nodes when the root is terminal but children are still running, so the visual matches the data model.
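
The real deriveStatusFromCounts lives in the Go backend; the TypeScript model below only illustrates why the paused branch has to come before the succeeded fallback. The `StatusCounts` fields, the function names, and the precedence of every branch other than paused-before-succeeded are guesses for the sake of the example.

```typescript
// Hypothetical per-run counts of child executions by status.
interface StatusCounts {
  running: number;
  failed: number;
  cancelled: number;
  paused: number;
  succeeded: number;
}

function deriveStatusSketch(counts: StatusCounts): string {
  if (counts.running > 0) return "running"; // assumed precedence
  if (counts.failed > 0) return "failed"; // assumed precedence
  if (counts.cancelled > 0) return "cancelled"; // assumed precedence
  // Explicit paused branch: without it, an all-paused run would fall through
  // to the fallback below and be reported as succeeded.
  if (counts.paused > 0) return "paused";
  return "succeeded";
}

// Paused is not terminal, so completion fields would stay unset for it.
function isTerminalSketch(status: string): boolean {
  return status === "succeeded" || status === "failed" || status === "cancelled";
}
```

With `{ running: 0, failed: 0, cancelled: 0, paused: 3, succeeded: 0 }` this model returns "paused", and `isTerminalSketch("paused")` is false.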

Live elapsed scaling

  • RunsPage cells use an adaptive liveTickIntervalMs(ageMs) (1s → 5s → 30s → frozen past 1h) so visible-row timers don't pin a fan.
  • NewDashboardPage duration cell uses isTerminalStatus(effectiveStatus) instead of aggregate run.terminal, so cancelled/paused roots freeze the timer even while children drain.
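
The adaptive tick schedule can be sketched as a small pure function. The thresholds follow the commit message in this PR (1s under 1m, 5s under 5m, 30s under 1h, frozen past 1h); the name `liveTickIntervalSketch` and the choice of `null` to mean "frozen" are assumptions, since the real return convention of liveTickIntervalMs is not shown here.

```typescript
const SECOND_MS = 1000;
const MINUTE_MS = 60 * SECOND_MS;
const HOUR_MS = 60 * MINUTE_MS;

// Returns how often a live duration cell should re-render, in milliseconds,
// or null when the run is old enough that the timer should stop ticking.
function liveTickIntervalSketch(ageMs: number): number | null {
  if (ageMs < MINUTE_MS) return SECOND_MS;
  if (ageMs < 5 * MINUTE_MS) return 5 * SECOND_MS;
  if (ageMs < HOUR_MS) return 30 * SECOND_MS;
  return null; // frozen past 1h
}
```

A cell would recompute the interval each time it ticks, so a long-lived row naturally slows down as it ages instead of re-rendering every second.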

Multi-pass review

A 6-agent parallel review (code-quality, security, go-backend, frontend-a11y, performance, architecture) was run via Codex CLI + Gemini CLI in subprocesses. It surfaced 21 findings: 0 critical, 3 high, 13 medium, 5 low, with 0 medium-or-higher security issues.

Fixed in commit 47d5a90:

  • H1 workflow_runs.go: paused branch in deriveStatusFromCounts
  • H2 RunsPage: StatusMenuDot uses StatusDot; filteredRuns uses root_execution_status ?? r.status
  • H3 WorkflowNode: duplicate status map deleted, routes through getStatusTheme()
  • M1+M2 badge.tsx: Lucide icons + theme-driven motion
  • M3+M4 RunDetailPage: statusVariant removed, header uses StatusPill
  • M5 RunsPage filteredRuns: root-effective status fallback
  • M6 NewDashboardPage: isTerminalStatus(effective) instead of run.terminal
  • M7 RunDetailPage lifecycle cluster: Cancel available for non-terminal non-running roots
  • M8 NotificationBell: removed nested role="button" over a real <button>
  • M9 RunLifecycleMenu: kebab opacity /40 → /70 for WCAG contrast
  • M10 status-pill StatusDot: role="img" + aria-label when label is hidden
  • L2 NotificationBell: timestamp text-[10px]/80 → text-[11px] text-muted-foreground
  • L3 RunLifecycleMenu: aria-busy={isPending} on the trigger

Consciously deferred:

  • M11 execution_records.go ranked-root CTE: medium-confidence edge case; current MAX(CASE ...) projection is correct under the one-root-per-run invariant. Will revisit if multi-root data shows up.
  • L1 / L4 / L5: selectedRuns Map memo, WorkflowDAGLightweightResponse typing, nested-button a11y note. Non-blocking polish.

Verification

  • npx tsc --noEmit — clean (exit 0)
  • go build ./... — clean (exit 0)
  • go vet ./... — clean (exit 0)
  • No new lint errors in any touched file (3 remaining warnings are pre-existing on unchanged lines)
  • Manually smoke-tested via demo workflow harness (/tmp/af_demo_runs.py) firing 7 mixed multi-node runs (sequential, parallel, nested pipelines)

Test plan

  • Open /runs, fire several long runs, confirm per-row kebab → Pause / Resume / Cancel each show confirm dialogs and dispatch correctly
  • Multi-select 3+ rows, exercise the bulk action bar for each lifecycle action, confirm partial failures surface in toasts
  • Cancel a multi-node run while children are mid-flight, confirm:
    • the row's status dot flips to cancelled tone
    • the live duration freezes
    • the run-detail page shows the muted "N nodes still finishing" strip
    • per-node graph still spins for nodes that haven't yielded
  • Pause a run, confirm the runs table, dashboard, and run-detail badge all show paused (not succeeded), completed_at stays empty
  • Open the bell, fire a mix of start / pause / resume / cancel / error / complete events, confirm semantic icons render and (N) appears in the tab title
  • Tab through the kebab + bell with keyboard only, confirm focus rings + aria-busy announce correctly
  • Resize to mobile, confirm the bulk action bar and lifecycle cluster don't overflow

🤖 Generated with Claude Code

santoshkumarradha and others added 7 commits April 7, 2026 10:15
- Route RunTrace StatusDot/TraceRow colors through getStatusTheme
- Add strikethrough to cancelled reasoner labels in waterfall
- Add cancelled/paused cases to FloatingEdge and EnhancedEdge
- Fix DAG node component to honor own-status (not parent) for color
- No hardcoded colors; everything routed through existing theme system
Replaces the toast-only notification system with a dual-mode center:

- Persistent in-session log backing the new NotificationBell (sidebar
  header, next to ModeToggle). Shows an unread count Badge and opens a
  shadcn Popover with the full notification history, mark-read,
  mark-all-read, and clear-all controls.
- Transient bottom-right toasts continue to fire for live feedback and
  auto-dismiss on their existing schedule; dismissing a toast no longer
  removes it from the log.
- <NotificationProvider> mounted globally in App.tsx so any page can
  surface notifications without local wiring.
- Cleaned up NotificationToastItem styling to use theme-consistent
  tokens (left accent border per type, shadcn Card/Button) instead of
  hardcoded tailwind color classes.
- Existing useSuccess/Error/Info/WarningNotification hook signatures
  preserved — no downstream caller changes required.
Adds full lifecycle controls to the runs index page:

- Per-row kebab (MoreHorizontal) DropdownMenu with Pause / Resume /
  Cancel items, shown based on each run's status. Muted at rest,
  brightens on row hover via a group/run-row selector so it stays
  discoverable without adding visual noise. Cancel opens an AlertDialog
  with honest copy explaining that in-flight nodes finish their current
  step and their output is discarded.
- New RunLifecycleMenu component in components/runs/ centralises the
  menu, dialog, and the shared CANCEL_RUN_COPY constants so the bulk
  bar can mirror the exact same language.
- Bulk bar (shown when >=1 row is selected) upgraded from a single
  "Cancel running" button to Pause / Resume / Cancel alongside the
  existing Compare selected action. Buttons enable only when at least
  one selected row is eligible. A single shared AlertDialog with
  count-aware title confirms bulk cancels.
- Bulk mutations fire via Promise.allSettled and emit one summary
  notification — success, partial failure ("4 of 5 cancelled — 1 could
  not be stopped"), or full failure.
- Per-row spinner via pendingIds Set so each row reflects its own
  mutation state independently of the mutation hook's global isPending.
- Replay of existing success/error notifications via the global
  notification provider — no new toast plumbing.
… strip

- Replace the lone Cancel button in the run detail header with a full
  Pause / Resume / Cancel lifecycle cluster matching the h-8 text-xs
  sizing and outline/destructive variants used elsewhere in the header.
  All three share a single lifecycleBusy flag so mutations are
  serialized and the active control renders a spinner (Activity icon).
- Cancel opens a shadcn AlertDialog that reuses the CANCEL_RUN_COPY
  constants from the runs table, so the dialog body language is
  identical across single-run and bulk confirmation flows.
- Success and error surface through the global notification provider
  via useSuccessNotification / useErrorNotification, with no local toast.
- Add a muted "Cancellation registered" info strip that renders only
  when the run is in the cancelled state AND at least one child node
  is still reporting running. Copy makes the asymmetry explicit:
  "No new nodes will start; their output will be discarded." The strip
  disappears naturally once every node reaches a terminal state via
  react-query refetch / SSE.
…ent liveness

Cross-cutting UX pass addressing multiple issues from rapid review:

Backend
- Expose RootExecutionStatus in WorkflowRunSummary so the UI can reflect
  what the user actually controls (the root execution) instead of the
  children-aggregated status, which lies in the presence of in-flight
  stragglers after a pause or cancel.
- Add paused_count and a root_status column to the run summary SQL
  aggregation so both ListWorkflowRuns and getRunAggregation populate them.
- Normalise root status via types.NormalizeExecutionStatus on the way
  out so downstream consumers see canonical values.

Unified status primitives (web)
- Extend StatusTheme in utils/status.ts with `icon: LucideIcon` and
  `motion: "none" | "live"`. Single source of truth for glyph and motion
  per canonical status.
- Rebuild components/ui/status-pill.tsx into three shared primitives —
  StatusDot, StatusIcon, StatusPill — each deriving colour/glyph/motion
  from getStatusTheme(). Running statuses get a pinging halo on dots
  and a slow (2.5s) spin on icons.
- Replace inline StatusDot implementations in RunsPage and RunTrace
  with the shared primitive. Badge "running" variant auto-spins its
  icon via the same theme.

Runs table liveness
- RunsPage kebab + StatusDot + DurationCell + bulk bar eligibility all
  key on `root_execution_status ?? status`. Paused/cancelled rows stop
  ticking immediately even when aggregate stays running.
- Adaptive tick intervals: 1s under 1m, 5s under 5m, 30s under 1h,
  frozen past 1h. Duration format drops seconds after 5 min. Motion
  is proportional to information; no more 19m runs counting seconds.

Run detail page
- Lifecycle cluster (Pause/Resume/Cancel) uses root execution status
  from the DAG timeline instead of the aggregated workflow status.
- Status badge at the top reflects the root status.
- "Cancellation registered" info strip also recognises paused-with-
  running-children and adjusts copy.
- RunTrace receives rootStatus; child rows whose own status is still
  running but whose root is terminal render desaturated with motion
  suppressed — honest depiction of abandoned stragglers.

Dashboard
- partitionDashboardRuns active/terminal split now uses
  root_execution_status so a timed-out run with stale children no
  longer appears in "Active runs".
- All RunStatusBadge call sites pass the effective status.

Notification center — compact tree, semantic icons
- Add NotificationEventKind (pause/resume/cancel/error/complete/start/
  info) driving a dedicated icon + accent map. Pause uses PauseCircle
  amber, Resume PlayCircle emerald, Cancel Ban muted, Error
  AlertTriangle destructive. No more universal green checkmark.
- Sonner toasts now pass a custom icon element so the glyph matches
  the bell popover; richColors removed for a quiet neutral card with
  only a thin type-tinted left border.
- Bell popover redesigned as a collapsed-by-default run tree: each run
  group shows one header line + one latest-event summary (~44px);
  expand via chevron to see the full timeline with a connector line
  on the left. Event rows are single-line with hover tooltip for the
  full message, hover-reveal dismiss ×, and compact timestamps
  ("now", "2m", "3h").
- useRunNotification accepts an eventKind parameter; RunsPage and
  RunDetailPage handlers pass explicit kinds.
- Replace Radix ScrollArea inside the popover with a plain overflow
  div — Radix was eating wheel events.
- Fix "View run" navigation: Link uses `to={`/runs/${runId}`}`
  directly (no href string manipulation) so basename=/ui prepends
  properly. Sonner toast action builds the URL from VITE_BASE_PATH.
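
The compact timestamps mentioned above ("now", "2m", "3h") could be produced by a formatter along these lines. Only the three example outputs come from the commit message; the function name `compactAge`, the one-minute cutoff for "now", and the day suffix for older events are assumptions.

```typescript
// Formats the age of an event as a short label such as "now", "2m", or "3h".
function compactAge(eventMs: number, nowMs: number): string {
  const elapsedSec = Math.max(0, Math.floor((nowMs - eventMs) / 1000));
  if (elapsedSec < 60) return "now"; // assumed cutoff
  const minutes = Math.floor(elapsedSec / 60);
  if (minutes < 60) return `${minutes}m`;
  const hours = Math.floor(minutes / 60);
  if (hours < 24) return `${hours}h`;
  return `${Math.floor(hours / 24)}d`; // assumed fallback for older events
}
```

Passing `nowMs` explicitly keeps the function pure, which makes it easy to test and to re-run on a timer.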

Top bar + layout
- Move NotificationBell from the sidebar header to the main content
  top bar, next to the ⌘K hint. Sidebar header is back to just logo
  + ModeToggle.
- Constrain SidebarProvider to h-svh overflow-hidden so the inner
  content div is the scroll container — top header stays pinned at
  the viewport top without needing a sticky hack.
- NotificationProvider reflects unreadCount in the browser tab title
  as "(N) …" so notifications surface in the Chrome tab when the
  window is unfocused.

Dependencies
- Add sonner ^2.0.7 for standard shadcn toasts.
Backend
- H1 deriveStatusFromCounts: add explicit paused branch before succeeded
  fallback so all-paused runs no longer collapse to succeeded. Terminal
  check already excludes paused, so completed_at/duration_ms stay nil.

Frontend — single source of truth via getStatusTheme()
- H3 WorkflowNode: delete duplicate STATUS_TONE_TOKEN_MAP and switch-based
  getStatusIcon; route icon, color, motion through getStatusTheme().
- H2 RunsPage: StatusMenuDot delegates to <StatusDot/> instead of
  hardcoding bg-green/red/blue.
- M3+M4 RunDetailPage: statusVariant helper removed; header now uses
  <StatusPill/> for unified status visual.
- M1+M2 badge: swap Phosphor → Lucide icons; derive spin from
  StatusTheme.motion === "live" via variantToCanonical map instead of
  hardcoding `variant === "running"`.

Frontend — root-effective status consistency
- M5 RunsPage filteredRuns: filter on root_execution_status ?? r.status
  so client-side filter agrees with the dot.
- M6 NewDashboardPage: duration cell uses isTerminalStatus(effective)
  instead of aggregate run.terminal so cancelled/paused roots freeze the
  timer even while children drain.
- M7 RunDetailPage lifecycle cluster: render for any non-terminal root
  (Cancel now available for pending/queued/waiting), Pause/Resume still
  gated on running/paused.

Frontend — accessibility + contrast
- M8 NotificationBell: remove nested role="button" + key handler on
  NotificationRow (was wrapping a real <button>), drop ambiguous tree
  semantics.
- L2 NotificationBell: bump timestamp from text-[10px]/80 to text-[11px]
  text-muted-foreground to clear WCAG AA contrast for small metadata.
- M9 RunLifecycleMenu: kebab opacity text-muted-foreground/40 → /70 to
  meet contrast at rest; group-hover lifts to text-foreground.
- L3 RunLifecycleMenu: add aria-busy={isPending} on the trigger so AT
  hears the in-flight state, not just visually.
- M10 status-pill StatusDot: add role="img" + aria-label when label is
  hidden so the dot is not skipped by screen readers.

Deferred (recorded for follow-up, out of scope for this pass)
- M11 execution_records.go ranked-root CTE: medium-confidence edge case;
  current MAX(CASE ...) projection is correct under one-root-per-run
  invariant. Will revisit if multi-root data shows up in practice.
- L1/L4/L5: selectedRuns Map memo, WorkflowDAGLightweightResponse type
  field, nested-button a11y note — non-blocking polish.

Verified: tsc clean, go build clean, go vet clean, no new lint errors
in touched files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@santoshkumarradha santoshkumarradha requested review from a team and AbirAbbas as code owners April 7, 2026 07:04
AbirAbbas and others added 2 commits April 7, 2026 13:16
Prevents Hypothesis test framework cache files from being tracked.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The pause, resume, and cancel handlers required a workflow_executions
entry to exist, but simple async single-node executions only create
rows in the executions table. This caused a 404 for all lifecycle
actions in local mode.

Make the workflow_executions lookup non-fatal: use it when available
for UpdateWorkflowExecution and event metadata, otherwise fall back
to the execution record's RunID.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@AbirAbbas AbirAbbas added this pull request to the merge queue Apr 7, 2026
Merged via the queue into main with commit 3b8d302 Apr 7, 2026
17 checks passed
