Add dual-segment progress bar for in-progress vs completed rollouts by mikasenghaas · Pull Request #901 · PrimeIntellect-ai/verifiers

mikasenghaas · 2026-02-12T15:20:32Z

Summary

Adds a dual-segment progress bar that visually distinguishes between completed rollouts (green) and actively executing rollouts (amber), with remaining rollouts shown in dark gray
Adds an on_acquire callback to with_sem() that fires when a rollout acquires the semaphore and starts executing, enabling accurate tracking of in-flight work
Wires a new RolloutStartCallback through the generate() → evaluate() → run_evaluation() callback chain so the display knows how many rollouts are currently active
Shows an "active" count in the progress text (e.g., (5/100 rollouts, 8 active)) when rollouts are in flight
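A minimal sketch of the `on_acquire` hook, assuming a `with_sem(sem, coro, on_acquire=None)` shape. The actual signature in `verifiers/utils/async_utils.py` may differ, and `demo`/`rollout` are hypothetical helpers. The point it illustrates is that the callback runs only after the semaphore slot is held, so `started - done` never exceeds the concurrency limit.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def with_sem(
    sem: asyncio.Semaphore,
    coro: Awaitable[T],
    on_acquire: Optional[Callable[[], None]] = None,
) -> T:
    """Await `coro` while holding a slot in `sem` (hypothetical signature)."""
    async with sem:
        # Fires only once the slot is held, i.e. when the rollout really starts,
        # not when it is merely scheduled and waiting on the semaphore.
        if on_acquire is not None:
            on_acquire()
        return await coro


async def demo(n: int = 5, limit: int = 2) -> tuple[int, int]:
    """Run `n` fake rollouts with at most `limit` in flight; report started/peak."""
    sem = asyncio.Semaphore(limit)
    state = {"started": 0, "done": 0, "peak": 0}

    def on_acquire() -> None:
        state["started"] += 1
        state["peak"] = max(state["peak"], state["started"] - state["done"])

    async def rollout() -> None:
        await asyncio.sleep(0.01)  # stand-in for model calls and scoring
        state["done"] += 1

    await asyncio.gather(*(with_sem(sem, rollout(), on_acquire) for _ in range(n)))
    return state["started"], state["peak"]


started, peak = asyncio.run(demo())
print(started, peak)  # 5 2
```

Creating the `rollout()` coroutine objects up front is fine: they do not begin executing until `with_sem` awaits them inside the semaphore.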

How it works

The bar has three visual segments:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  completed (green)   active (amber)   remaining (gray)

in_progress = started - completed, where started is incremented each time a rollout acquires the concurrency semaphore and completed comes from the existing on_progress callback.
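To make the arithmetic concrete, here is a small sketch that turns the two counters into the three bar segments. The helper name `rollout_segments` is hypothetical and not part of the PR. With 100 rollouts, 13 started and 5 completed, it reproduces the `(5/100 rollouts, 8 active)` state from the summary.

```python
def rollout_segments(total: int, started: int, completed: int) -> tuple[int, int, int]:
    """Split `total` rollouts into (completed, in_progress, remaining) bar segments."""
    # in_progress = started - completed, clamped at 0 (e.g. a resumed run can report
    # completions for outputs that never fired a start in this process).
    in_progress = max(0, started - completed)
    # Defensive: never let the amber segment spill past the rollouts left to run.
    in_progress = min(in_progress, max(0, total - completed))
    remaining = max(0, total - completed - in_progress)
    return completed, in_progress, remaining


print(rollout_segments(total=100, started=13, completed=5))  # (5, 8, 87)
```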

Files changed

| File | Change |
| --- | --- |
| `verifiers/utils/async_utils.py` | Add optional `on_acquire` callback to `with_sem()` |
| `verifiers/types.py` | Add `RolloutStartCallback` type alias |
| `verifiers/envs/environment.py` | Wire `on_rollout_start` through `generate()` and `evaluate()` |
| `verifiers/utils/eval_utils.py` | Wire `on_rollout_start` through `run_evaluation()` and `run_with_progress()` |
| `verifiers/utils/eval_display.py` | Add `DualBarColumn` / `_DualBar` renderable, track `started` count in `EnvEvalState` |

Test plan

All existing tests pass (uv run pytest tests/ - 608 tests)
uv run pre-commit run --all-files passes (ruff check + format)
Manual verification: run prime eval run with a real environment and observe the dual-segment bar during execution
Verify the bar correctly shows active rollouts ramping up when the evaluation starts, and draining to zero as it finishes

Generated with Claude Code

Note

Medium Risk
Changes the public callback surface (on_progress -> on_task_done plus new on_task_start) and adds semaphore-acquire hooks, which could affect integrations and progress accounting under concurrency/resume scenarios.

Overview
Adds rollout in-flight tracking and a new Rich UI to visualize it: the evaluation display now renders a three-segment progress bar (completed vs running vs remaining) and maintains a started counter to compute in_progress.

Refactors the generation/evaluation callback API by replacing on_progress with on_task_done and introducing on_task_start; Environment.generate() now triggers on_task_start when a task acquires the concurrency semaphore via a new with_sem(..., on_acquire=...) hook, and this signal is threaded through evaluate(), run_evaluation(), and the GEPA adapter.

Written by Cursor Bugbot for commit ad1f673.

The progress bar now visually distinguishes between completed rollouts (green) and actively executing rollouts (amber), making it easy to see concurrency at a glance. The text also shows an "active" count when rollouts are in flight.

Changes:
- Add on_acquire callback to with_sem for tracking when rollouts start executing
- Add RolloutStartCallback type and wire it through generate/evaluate/run_evaluation
- Replace BarColumn with custom DualBarColumn showing three segments: completed (green), in-progress (amber), remaining (gray)
- Track started count in EnvEvalState to derive in-progress = started - completed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

verifiers/envs/environment.py

In grouped scoring, each task covers multiple rollouts (rollouts_per_example). The on_acquire callback was incrementing started_count by 1 per group, but progress counts individual rollouts, causing in_progress = started - progress to be wrong. Now uses _make_on_acquire(len(group_input)) so started_count tracks rollouts consistently. Also removes "active" text from progress bar. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
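A sketch of the grouped fix, assuming the helper binds the group size into a zero-argument callback. The name `make_on_acquire` matches the helper after the later underscore-removal commit, but its signature here and the `TaskStartCallback` alias are assumptions. One semaphore acquire per group reports all of that group's rollouts, so the started count stays in the same unit as the progress count.

```python
from typing import Callable, List, Optional

# Hypothetical alias: the start callback receives how many rollouts just began.
TaskStartCallback = Callable[[int], None]


def make_on_acquire(
    on_task_start: Optional[TaskStartCallback], num_rollouts: int
) -> Optional[Callable[[], None]]:
    """Bind the group size so a single semaphore acquire reports every rollout in it."""
    if on_task_start is None:
        return None

    def on_acquire() -> None:
        on_task_start(num_rollouts)

    return on_acquire


# Two groups with rollouts_per_example=4 each acquire the semaphore once.
reported: List[int] = []
for group_input in (["prompt-a"] * 4, ["prompt-b"] * 4):
    on_acquire = make_on_acquire(reported.append, len(group_input))
    assert on_acquire is not None
    on_acquire()

print(sum(reported))  # 8 rollouts, not 2 groups
```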

Initialize started_count to len(builder.outputs) so it matches the resumed progress count. Without this, in_progress = started - progress is clamped to 0 until started_count catches up to the resumed count, making the amber bar segment invisible for resumed evals. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

verifiers/utils/eval_display.py

- _DualBar now subclasses ProgressBar, inheriting __rich_measure__ and pulse animation instead of reimplementing them
- Render via Segment (like ProgressBar) instead of Text with appends
- Half-char precision using half-bar chars for smoother transitions
- Use Rich theme styles ("bar.complete", "bar.back") as defaults instead of hardcoded RGB for completed and remaining segments
- DualBarColumn follows BarColumn's constructor pattern with configurable styles
- Handle ASCII fallback and console.no_color like ProgressBar does
- Fix text truncation by not overriding default table_column

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Define COLOR_PENDING/RUNNING/COMPLETED/FAILED as module-level constants and use them for both panel border styles and progress bar segments. Completed bar is now green, in-progress is yellow — matching the border colors for completed and running states respectively. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Pass Column(no_wrap=True, ratio=2) as the default table_column, matching what Rich's BarColumn uses when bar_width=None. Without ratio, the bar column wouldn't claim proportional space in the Progress table, resulting in a narrow bar instead of spanning the terminal width. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

generate() now just fires on_rollout_start(num_rollouts) with the batch size when rollouts acquire the semaphore — no cumulative state. The TUI layer (eval_utils.py) owns the started_count accumulator, initializing it from the resumed count in on_start. This keeps generate() free of display-specific bookkeeping. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
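A minimal sketch of that split, with hypothetical names throughout: `generate()` would only call `on_task_start(n)` with a batch size, while a display-side closure owns the accumulator and seeds it from the resumed count. For simplicity `on_task_done` here takes a rollout count; the real callback arguments may differ.

```python
from typing import Callable, Tuple


def make_display_callbacks(
    resumed: int,
) -> Tuple[Callable[[int], None], Callable[[int], None], Callable[[], int]]:
    """Hypothetical TUI-side accumulator; generate() only reports batch sizes."""
    # Seed both counters from outputs restored by a resumed run, as on_start does,
    # so started - completed only reflects rollouts launched in this process.
    state = {"started": resumed, "completed": resumed}

    def on_task_start(num_rollouts: int) -> None:
        state["started"] += num_rollouts

    def on_task_done(num_rollouts: int) -> None:  # simplified: real callback args may differ
        state["completed"] += num_rollouts

    def in_progress() -> int:
        return max(0, state["started"] - state["completed"])

    return on_task_start, on_task_done, in_progress


start, done, active = make_display_callbacks(resumed=10)
start(4)         # a group of 4 rollouts acquires the semaphore
print(active())  # 4 (with started seeded at 0 instead, max(0, 4 - 10) would hide it)
done(4)
print(active())  # 0
```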

cursor · 2026-02-12T16:24:16Z

verifiers/utils/eval_display.py

+
+        # Remaining segment
+        used = c_full + c_half + p_full + p_half
+        remaining = width - used


Dual-segment bar overflows width by one character

Medium Severity

The _DualBar rendering can overflow the declared width by one character when both the completed and in-progress segments have an odd number of half-chars that sum to total_halves. Each half-char (╸) occupies a full terminal cell, so two trailing halves (one per segment) consume 2 cells but the halves-based clamping only budgets for 1 cell. For example, with width=40, total=100, completed=99, in_progress=1: c_halves=79, p_halves=1 yields used = 39+1+0+1 = 41 > 40. The character-position total needs to be checked and adjusted after divmod, not just the halves sum.
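One possible fix, sketched in plain Python without Rich. The function names are hypothetical and this is not the code that landed in the PR. The idea is to re-check the budget in terminal cells after the `divmod`, not only in halves. Colors are omitted; a Rich renderable would emit each run as its own styled `Segment`.

```python
from typing import Tuple

FULL, HALF = "━", "╸"  # full-cell bar glyph and the half-char named in the review


def dual_bar_cells(
    width: int, total: float, completed: float, in_progress: float
) -> Tuple[int, int, int, int, int]:
    """Return (c_full, c_half, p_full, p_half, remaining) cell counts for the bar."""
    if total <= 0 or width <= 0:
        return 0, 0, 0, 0, max(0, width)
    total_halves = width * 2
    c_halves = max(0, min(total_halves, int(total_halves * completed / total)))
    p_halves = int(total_halves * in_progress / total)
    if in_progress > 0:
        p_halves = max(1, p_halves)  # assumption: keep a running rollout visible
    p_halves = max(0, min(p_halves, total_halves - c_halves))  # halves-level budget

    c_full, c_half = divmod(c_halves, 2)
    p_full, p_half = divmod(p_halves, 2)

    # A half glyph still fills a whole cell. When both segments end on a half and
    # the halves sum to total_halves, the bar is one cell too wide; drop the
    # in-progress half so the character count matches `width`.
    if c_full + c_half + p_full + p_half > width:
        p_half = 0
    remaining = width - (c_full + c_half + p_full + p_half)
    return c_full, c_half, p_full, p_half, remaining


def render_dual_bar(width: int, total: float, completed: float, in_progress: float) -> str:
    """Plain-text rendering of the three segments (styles omitted)."""
    c_full, c_half, p_full, p_half, rest = dual_bar_cells(width, total, completed, in_progress)
    return FULL * c_full + HALF * c_half + FULL * p_full + HALF * p_half + FULL * rest


# Bugbot's example: width=40, total=100, completed=99, in_progress=1.
print(dual_bar_cells(40, 100, 99, 1))        # (39, 1, 0, 0, 0)
print(len(render_dual_bar(40, 100, 99, 1)))  # 40
```

Overflow can only occur when both half counts are odd and sum to `total_halves`, so dropping one half glyph always restores the exact width. The trade-off is that a single running rollout can disappear near 100%; taking the cell from the completed segment instead would be an equally reasonable choice.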

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…tion
- on_progress → on_task_done (fires when a task completes)
- on_rollout_start → on_task_start (fires when a task begins executing)
- ProgressCallback → TaskDoneCallback
- RolloutStartCallback → TaskStartCallback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


cursor · 2026-02-12T18:14:52Z

verifiers/envs/environment.py

-        on_progress: ProgressCallback | None = None,
+        on_task_done: TaskDoneCallback | None = None,
        on_log: LogCallback | None = None,
+        on_task_start: TaskStartCallback | None = None,


Renamed parameter breaks existing test caller

High Severity

Renaming the on_progress parameter to on_task_done in generate() breaks an existing test in tests/test_environment_extra.py (line 327) that still calls generate(on_progress=no_op). This will raise a TypeError at runtime since on_progress is no longer a valid keyword argument. The refactoring is incomplete — all callers need to be updated to use the new name.

cursor bot reviewed Feb 12, 2026


verifiers/envs/environment.py (outdated, resolved)

verifiers/envs/environment.py (outdated, resolved)

mikasenghaas and others added 2 commits February 12, 2026 16:30

cursor bot reviewed Feb 12, 2026


verifiers/utils/eval_display.py (resolved)

verifiers/utils/eval_display.py (resolved)

mikasenghaas and others added 6 commits February 12, 2026 16:54

Use Rich's default bar.back for remaining bar segment

0805388

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix ruff

076be19

cursor bot reviewed Feb 12, 2026


mikasenghaas and others added 2 commits February 12, 2026 17:27

Remove leading underscores from acquire helper functions

6da84f5

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cursor bot reviewed Feb 12, 2026

mikasenghaas wants to merge 11 commits into main from feat/dual-progress-bar
