feat: adaptive gating + cross-review dedup for review army (v0.15.2.0) #760

Open
garrytan wants to merge 5 commits into main from garrytan/learning-phase-2.5-clean
garrytan (Owner) commented Apr 1, 2026

Summary

Reviews now learn from your decisions and get smarter over time.

  • Cross-review finding dedup — skip a finding once, it stays quiet until the code changes. No more re-skipping the same intentional patterns every PR.
  • Test stub suggestions — specialists suggest skeleton tests alongside findings using the detected test framework (Jest, Vitest, RSpec, pytest, Go test). Findings with test stubs are surfaced as ASK items.
  • Adaptive specialist gating — specialists dispatched 10+ times with zero findings get auto-gated. Security and data-migration are exempt (insurance policies). Force any specialist back with --security, --performance, etc.
  • Per-specialist stats — every review records which specialists ran, findings per specialist, and skip/gate reasons. Powers adaptive gating and gives /retro richer data.
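
The gating rule above can be sketched as a small predicate. This is a minimal illustration, not the code in `scripts/resolvers/review-army.ts`: the `SpecialistStats` shape, the `shouldDispatch` name, and the `forced` set (standing in for flags like `--security`) are assumptions made for the example.

```typescript
// Hypothetical stats shape; the real format written by gstack-specialist-stats
// is not shown in this PR.
interface SpecialistStats {
  dispatches: number; // how many times the specialist has run
  findings: number;   // total findings across those runs
}

const GATE_THRESHOLD = 10; // "dispatched 10+ times" from the summary
const NEVER_GATED = new Set(["security", "data-migration"]); // exempt specialists

// Returns true when the specialist should run on this review.
function shouldDispatch(
  name: string,
  stats: SpecialistStats | undefined,
  forced: ReadonlySet<string>,
): boolean {
  if (forced.has(name)) return true;      // user forced it back, e.g. --performance
  if (NEVER_GATED.has(name)) return true; // insurance-policy specialists always run
  if (!stats) return true;                // no history yet, so nothing to gate on
  const silent = stats.dispatches >= GATE_THRESHOLD && stats.findings === 0;
  return !silent;
}
```

Under these assumptions, a `performance` specialist with 10 silent dispatches is skipped unless it appears in `forced`, while `security` runs regardless of its history.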

Files Changed

  • review/specialists/*.md — add test_stub optional field to all specialist schemas
  • review/design-checklist.md — document test_stub field
  • scripts/resolvers/review-army.ts — test framework detection, adaptive gating logic, per-specialist stats
  • review/SKILL.md.tmpl — Step 5.0 cross-review dedup, test stub override in Step 5a, enriched review-log
  • review/SKILL.md — regenerated
  • bin/gstack-specialist-stats — new binary for specialist hit rate tracking

Test Coverage

All new code paths are prompt template logic (natural language instructions to Claude) and a shell script. No app-level code paths to unit test. Existing test suite passes (all assertions green).

Pre-Landing Review

No issues found. Changes are prompt templates + infrastructure only. No SQL, auth, or trust boundary changes.

Test plan

  • All bun tests pass (0 failures)
  • SKILL.md regenerated successfully from template
  • Clean cherry-pick onto main (no conflicts)

🤖 Generated with Claude Code

garrytan and others added 5 commits April 1, 2026 14:33
All specialist prompts now document test_stub as an optional output field,
enabling specialists to suggest test code alongside findings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds gstack-specialist-stats binary for tracking specialist hit rates.
Resolver now detects test framework for test_stub generation, applies
adaptive gating to skip silent specialists, and compiles per-specialist
stats for the review-log entry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
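
One way to picture the test framework detection is a lookup over marker files. This is a sketch under assumptions: the PR does not show how the resolver detects a framework, so the marker file list, the `detectTestFramework` name, and the precedence order below are all hypothetical.

```typescript
type Framework = "jest" | "vitest" | "rspec" | "pytest" | "go test";

// Hypothetical marker files, checked in order; the resolver's real checks are
// not shown in this PR.
const MARKERS: Array<[Framework, string[]]> = [
  ["vitest", ["vitest.config.ts", "vitest.config.js"]],
  ["jest", ["jest.config.js", "jest.config.ts"]],
  ["rspec", [".rspec", "spec/spec_helper.rb"]],
  ["pytest", ["pytest.ini", "conftest.py"]],
  ["go test", ["go.mod"]],
];

// Takes repo-relative paths and returns the first framework whose marker is present.
function detectTestFramework(repoFiles: readonly string[]): Framework | null {
  const present = new Set(repoFiles);
  for (const [framework, files] of MARKERS) {
    if (files.some((f) => present.has(f))) return framework;
  }
  return null; // unknown framework: a specialist would omit test_stub
}
```

Taking a plain list of paths keeps the function free of filesystem access, so it is easy to exercise in isolation.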
…ew-log

Step 5.0 suppresses findings previously skipped by the user when the
relevant code hasn't changed. Test stub findings force ASK classification
so users approve test creation. Review-log now includes quality_score,
per-specialist stats, and per-finding action records.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
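
Step 5.0 itself is prompt template logic, but the rule it describes can be sketched in code. In this illustration a finding is suppressed only when the user skipped the same finding before and the relevant code is unchanged. The `Finding` and `SkipRecord` shapes, the key format, and the FNV-1a hash are assumptions for the example, not the format of the real review-log.

```typescript
// Hypothetical shapes; the real review-log schema is not shown in this PR.
interface Finding {
  specialist: string;
  file: string;
  rule: string;    // short identifier for what was flagged
  snippet: string; // the code the finding refers to
}

interface SkipRecord {
  key: string;      // identifies the finding, independent of the code text
  codeHash: string; // hash of the snippet at the time the user skipped it
}

// Small dependency-free 32-bit FNV-1a hash, used only to detect code changes.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

function findingKey(f: Finding): string {
  return `${f.specialist}:${f.file}:${f.rule}`;
}

// Drops findings the user already skipped, unless the code has changed since.
function suppressSkipped(findings: Finding[], skipped: SkipRecord[]): Finding[] {
  const skippedHashes = new Map<string, string>();
  for (const s of skipped) skippedHashes.set(s.key, s.codeHash);
  return findings.filter((f) => skippedHashes.get(findingKey(f)) !== fnv1a(f.snippet));
}
```

Keying on the finding and comparing a hash of the snippet is what lets a skipped finding resurface once the flagged code is edited.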
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
In a shell AND-OR list, && and || have equal precedence and associate
left to right, so [ -f a ] || [ -f b ] && X="y" already runs as
(A || B) && C rather than A || (B && C). Wrap the OR group in braces so
the intended grouping is explicit: { [ -f a ] || [ -f b ]; } && X="y".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
github-actions bot commented Apr 1, 2026

E2E Evals: ✅ PASS

5/5 tests passed | $0.47 total cost | 12 parallel runners

| Suite | Result | Status | Cost |
| --- | --- | --- | --- |
| e2e-review | 3/3 | | $0.43 |
| llm-judge | 2/2 | | $0.04 |

12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite
