Add dual-segment progress bar for in-progress vs completed rollouts #901
mikasenghaas wants to merge 11 commits into main
The progress bar now visually distinguishes between completed rollouts (green) and actively executing rollouts (amber), making it easy to see concurrency at a glance. The text also shows an "active" count when rollouts are in flight.
Changes:
- Add on_acquire callback to with_sem for tracking when rollouts start executing
- Add RolloutStartCallback type and wire it through generate/evaluate/run_evaluation
- Replace BarColumn with custom DualBarColumn showing three segments: completed (green), in-progress (amber), remaining (gray)
- Track started count in EnvEvalState to derive in-progress = started - completed
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
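A minimal sketch of the semaphore hook described in this commit, assuming with_sem is an async helper that runs a coroutine while holding a semaphore. Only the names with_sem and on_acquire come from the commit message; the signature, the demo function, and the counters are hypothetical and are not taken from verifiers/utils/async_utils.py.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def with_sem(
    sem: asyncio.Semaphore,
    coro: Awaitable[T],
    on_acquire: Optional[Callable[[], None]] = None,
) -> T:
    """Run `coro` under `sem`; call `on_acquire` once the slot is held.

    Hypothetical signature, for illustration only.
    """
    async with sem:
        if on_acquire is not None:
            on_acquire()  # the rollout is now executing, not just queued
        return await coro


async def demo(n_tasks: int = 5, limit: int = 2) -> tuple:
    sem = asyncio.Semaphore(limit)
    started = 0
    completed = 0
    peak_in_progress = 0

    def mark_started() -> None:
        nonlocal started, peak_in_progress
        started += 1
        peak_in_progress = max(peak_in_progress, started - completed)

    async def rollout() -> None:
        nonlocal completed
        await asyncio.sleep(0.01)  # stand-in for the real rollout work
        completed += 1

    await asyncio.gather(
        *(with_sem(sem, rollout(), on_acquire=mark_started) for _ in range(n_tasks))
    )
    return started, peak_in_progress


started, peak = asyncio.run(demo())
print(started, peak)
```

Because the callback fires inside the semaphore block, started - completed never exceeds the concurrency limit, which is what lets the bar show queued work as "remaining" rather than "in progress".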
In grouped scoring, each task covers multiple rollouts (rollouts_per_example). The on_acquire callback was incrementing started_count by 1 per group, but progress counts individual rollouts, causing in_progress = started - progress to be wrong. Now uses _make_on_acquire(len(group_input)) so started_count tracks rollouts consistently. Also removes "active" text from progress bar.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
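The unit mismatch can be illustrated with a small sketch. The helper below mirrors the idea behind _make_on_acquire, but its body, the Counter class, and the group sizes are assumptions made for this example rather than the PR's code.

```python
from typing import Callable


class Counter:
    """Tiny stand-in for the display state; not the real EnvEvalState."""

    def __init__(self) -> None:
        self.started_count = 0


def make_on_acquire(state: Counter, num_rollouts: int) -> Callable[[], None]:
    """Return a callback that credits `num_rollouts` when a task starts."""

    def on_acquire() -> None:
        state.started_count += num_rollouts

    return on_acquire


rollouts_per_example = 4
groups = [["rollout"] * rollouts_per_example for _ in range(3)]

per_group = Counter()    # old behaviour: +1 per group
per_rollout = Counter()  # fixed behaviour: +len(group_input) per group

for group_input in groups:
    make_on_acquire(per_group, 1)()
    make_on_acquire(per_rollout, len(group_input))()

# Progress is counted in rollouts, so only the second counter is comparable.
print(per_group.started_count, per_rollout.started_count)
```

With three groups of four rollouts, the per-group counter reaches 3 while the per-rollout counter reaches 12, and only the latter can be subtracted from a rollout-based progress count.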
Initialize started_count to len(builder.outputs) so it matches the resumed progress count. Without this, in_progress = started - progress is clamped to 0 until started_count catches up to the resumed count, making the amber bar segment invisible for resumed evals.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
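A worked example of the resume case, assuming the clamp is a simple max(0, ...). The function name and the numbers (40 resumed rollouts, 8 newly started) are made up for illustration.

```python
def in_progress(started: int, progress: int) -> int:
    """Rollouts currently executing, clamped so it never goes negative."""
    return max(0, started - progress)


resumed = 40       # rollouts already saved when the eval resumes
newly_started = 8  # rollouts that acquired the semaphore after resuming

# Before the fix: started_count begins at 0 while progress begins at 40.
before = in_progress(started=0 + newly_started, progress=resumed)

# After the fix: started_count begins at the resumed count.
after = in_progress(started=resumed + newly_started, progress=resumed)

print(before, after)
```

Before the fix the eight running rollouts are hidden by the clamp; after it, they show up as eight in-progress rollouts.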
- _DualBar now subclasses ProgressBar, inheriting __rich_measure__ and pulse animation instead of reimplementing them
- Render via Segment (like ProgressBar) instead of Text with appends
- Half-char precision using half-bar chars for smoother transitions
- Use Rich theme styles ("bar.complete", "bar.back") as defaults instead of hardcoded RGB for completed and remaining segments
- DualBarColumn follows BarColumn's constructor pattern with configurable styles
- Handle ASCII fallback and console.no_color like ProgressBar does
- Fix text truncation by not overriding default table_column
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Define COLOR_PENDING/RUNNING/COMPLETED/FAILED as module-level constants and use them for both panel border styles and progress bar segments. The completed bar is now green and the in-progress bar is yellow, matching the border colors for the completed and running states respectively.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pass Column(no_wrap=True, ratio=2) as the default table_column, matching what Rich's BarColumn uses when bar_width=None. Without ratio, the bar column wouldn't claim proportional space in the Progress table, resulting in a narrow bar instead of spanning the terminal width.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
generate() now just fires on_rollout_start(num_rollouts) with the batch size when rollouts acquire the semaphore and keeps no cumulative state. The TUI layer (eval_utils.py) owns the started_count accumulator, initializing it from the resumed count in on_start. This keeps generate() free of display-specific bookkeeping.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    # Remaining segment
    used = c_full + c_half + p_full + p_half
    remaining = width - used
Dual-segment bar overflows width by one character
Medium Severity
The _DualBar rendering can overflow the declared width by one character when both the completed and in-progress segments have an odd number of half-chars that sum to total_halves. Each half-char (╸) occupies a full terminal cell, so two trailing halves (one per segment) consume 2 cells but the halves-based clamping only budgets for 1 cell. For example, with width=40, total=100, completed=99, in_progress=1: c_halves=79, p_halves=1 yields used = 39+1+0+1 = 41 > 40. The character-position total needs to be checked and adjusted after divmod, not just the halves sum.
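The arithmetic in this report can be reproduced with a short sketch. The variable names follow the review comment, but the way the half-cell counts are derived and the clamp_cells option are assumptions for illustration; this is one possible fix, not necessarily the one the PR adopted.

```python
def segment_cells(
    width: int,
    total: int,
    completed: int,
    in_progress: int,
    clamp_cells: bool = False,
) -> dict:
    """Return terminal-cell usage for the completed and in-progress segments.

    The halves arithmetic is an assumed reconstruction, not the PR's code.
    """
    total_halves = width * 2
    # Position where each segment ends, measured in half-cells.
    c_halves = min(total_halves, total_halves * completed // total)
    end_halves = min(total_halves, total_halves * (completed + in_progress) // total)
    p_halves = end_halves - c_halves

    c_full, c_half = divmod(c_halves, 2)
    p_full, p_half = divmod(p_halves, 2)

    # Each trailing half-char still occupies one whole terminal cell.
    used = c_full + c_half + p_full + p_half
    if clamp_cells and used > width:
        # Overflow needs both segments to end on a half-char and is exactly
        # one cell, so dropping the completed half restores the budget while
        # keeping the in-progress segment visible.
        c_half = 0
        used = c_full + c_half + p_full + p_half

    return {"used": used, "remaining": width - used}


buggy = segment_cells(width=40, total=100, completed=99, in_progress=1)
fixed = segment_cells(width=40, total=100, completed=99, in_progress=1, clamp_cells=True)
print(buggy, fixed)
```

Checking the cell total after divmod, as the review suggests, is what catches the case; the halves sum alone stays within budget (79 + 1 = 80) even though the rendered cells do not.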
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tion
- on_progress → on_task_done (fires when a task completes)
- on_rollout_start → on_task_start (fires when a task begins executing)
- ProgressCallback → TaskDoneCallback
- RolloutStartCallback → TaskStartCallback
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
    on_progress: ProgressCallback | None = None,
    on_task_done: TaskDoneCallback | None = None,
    on_log: LogCallback | None = None,
    on_task_start: TaskStartCallback | None = None,
Renamed parameter breaks existing test caller
High Severity
Renaming the on_progress parameter to on_task_done in generate() breaks an existing test in tests/test_environment_extra.py (line 327) that still calls generate(on_progress=no_op). This will raise a TypeError at runtime since on_progress is no longer a valid keyword argument. The refactoring is incomplete — all callers need to be updated to use the new name.
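The failure mode, and one way to soften it, can be sketched as follows. The functions below are hypothetical stand-ins, not the real generate(); the callback type is assumed, and the deprecated alias is an option a maintainer might consider rather than something this PR does. Updating the test caller to the new name remains the direct fix.

```python
import warnings
from typing import Callable, Optional

Callback = Callable[[int], None]  # assumed shape, for illustration only


def generate_renamed(on_task_done: Optional[Callback] = None) -> str:
    """Stand-in for a function whose keyword was renamed outright."""
    return "ok"


def generate_compat(
    on_task_done: Optional[Callback] = None,
    on_progress: Optional[Callback] = None,  # deprecated alias
) -> str:
    """Same stand-in, still accepting the old keyword for a transition period."""
    if on_progress is not None:
        warnings.warn(
            "on_progress is deprecated; use on_task_done",
            DeprecationWarning,
            stacklevel=2,
        )
        if on_task_done is None:
            on_task_done = on_progress
    return "ok"


def no_op(_: int) -> None:
    pass


try:
    generate_renamed(on_progress=no_op)  # old call site, new signature
    failed = False
except TypeError:
    failed = True

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = generate_compat(on_progress=no_op)

print(failed, result, len(caught))
```

The alias keeps existing callers working while surfacing a DeprecationWarning, at the cost of carrying two parameter names until the old one is removed.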


Summary
- Add on_acquire callback to with_sem() that fires when a rollout acquires the semaphore and starts executing, enabling accurate tracking of in-flight work
- Wire RolloutStartCallback through the generate() → evaluate() → run_evaluation() callback chain so the display knows how many rollouts are currently active
- Show an active count in the progress text, e.g. (5/100 rollouts, 8 active), when rollouts are in flight

How it works
The bar has three visual segments:
- completed (green)
- in-progress (amber)
- remaining (gray)

in_progress = started - completed, where started is incremented each time a rollout acquires the concurrency semaphore and completed comes from the existing on_progress callback.

Files changed
- verifiers/utils/async_utils.py: add on_acquire callback to with_sem()
- verifiers/types.py: add RolloutStartCallback type alias
- verifiers/envs/environment.py: wire on_rollout_start through generate() and evaluate()
- verifiers/utils/eval_utils.py: wire on_rollout_start through run_evaluation() and run_with_progress()
- verifiers/utils/eval_display.py: add DualBarColumn/_DualBar renderable, track started count in EnvEvalState

Test plan
- uv run pytest tests/ (608 tests)
- uv run pre-commit run --all-files passes (ruff check + format)
- prime eval run with a real environment, and observe the dual-segment bar during execution

Generated with Claude Code
Note
Medium Risk
Changes the public callback surface (on_progress -> on_task_done plus new on_task_start) and adds semaphore-acquire hooks, which could affect integrations and progress accounting under concurrency/resume scenarios.

Overview
Adds rollout in-flight tracking and a new Rich UI to visualize it: the evaluation display now renders a three-segment progress bar (completed vs running vs remaining) and maintains a started counter to compute in_progress.

Refactors the generation/evaluation callback API by replacing on_progress with on_task_done and introducing on_task_start; Environment.generate() now triggers on_task_start when a task acquires the concurrency semaphore via a new with_sem(..., on_acquire=...) hook, and this signal is threaded through evaluate(), run_evaluation(), and the GEPA adapter.

Written by Cursor Bugbot for commit ad1f673.