From 6757f666ea2bcdb4bf74162f69dfa93d922f8361 Mon Sep 17 00:00:00 2001
From: Derek Misler
Date: Sat, 9 May 2026 00:33:26 +0000
Subject: [PATCH 1/2] fix(review-pr): schema mismatch, chunk size, and timeout
 diagnostics
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Three targeted fixes for the runaway-drafter timeout seen in docker/sailor
PR #754 (https://github.com/docker/sailor/actions/runs/25577728034/job/75088929732).

## Fix 1 — Schema mismatch in pr-review.yaml (root cause of infinite loop)

The root agent's transfer_task delegation message did not explicitly
document the drafter's enforced structured_output schema field names.
When the LLM generated the task string it used 'title'/'body' (natural
GitHub review terminology) instead of the schema-required
'category'/'issue'/'details'/'in_diff'. Every emit attempt was rejected
by schema validation, causing the drafter to loop indefinitely,
re-reading the same source files ~140 times each until SIGKILL at 1800 s.

Changes:
- Add a CRITICAL block in the delegation section documenting the exact
  drafter response schema, with an explicit warning never to use
  title/body
- Update step 4d (aggregate) to reference the correct field names
- Update step 5 (parse drafter response) to name 'issue' and 'details'
  explicitly
- Update step 9 (build comments) to list the correct finding fields

## Fix 2 — Lower soft chunk target 1000 → 600 lines (review-pr/action.yml)

Chunk 1 of sailor PR #754 was 1229 lines of dense Rust snapshot/restore
code — at the edge of what a drafter can process in one turn, which
amplified the impact of the schema loop. Lowering the soft target
produces smaller, more manageable chunks. The 2000-line hard cap is
unchanged.

## Fix 3 — Better exit-124 diagnostics (review-pr/action.yml)

When the agent timed out, the PR received a generic 'PR Review Failed'
comment and an unhelpful step summary, giving users no idea why the
review failed or what to do next.
Changes:
- Add a dedicated elif branch for EXIT_CODE=124 before the generic
  non-zero handler
- Post an actionable PR comment explaining the timeout and suggesting
  /review retry or splitting the PR into smaller pieces
- Add a TIMEOUT_NOTE variable with exit code, 1800 s limit, and verbose
  log artifact reference, printed in the step summary after the STATUS
  line
---
 review-pr/action.yml            | 24 ++++++++++++++++++++----
 review-pr/agents/pr-review.yaml | 25 +++++++++++++++++++++----
 2 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/review-pr/action.yml b/review-pr/action.yml
index af4bd2e..efb97bd 100644
--- a/review-pr/action.yml
+++ b/review-pr/action.yml
@@ -284,7 +284,7 @@ runs:
           echo "chunk_manifest=$(jq -nc --argjson f "$FILES" '{"1": $f}')" >> $GITHUB_OUTPUT
           echo "✅ Single chunk ($TOTAL_LINES lines)"
         else
-          # Large diff — split at file boundaries, targeting ~1000 lines per chunk
+          # Large diff — split at file boundaries, targeting ~600 lines per chunk
           CHUNK=1
           CHUNK_LINES=0
           CURRENT_FILE=""
@@ -301,9 +301,9 @@ runs:
             DIR=$(dirname "$FILE")
 
             # Start new chunk if:
-            # - over soft limit (1000 lines) and directory changed, OR
+            # - over soft limit (600 lines) and directory changed, OR
             # - over hard limit (2000 lines) regardless of directory
-            if ([ "$CHUNK_LINES" -gt 1000 ] && [ "$DIR" != "$CURRENT_DIR" ]) || [ "$CHUNK_LINES" -gt 2000 ]; then
+            if ([ "$CHUNK_LINES" -gt 600 ] && [ "$DIR" != "$CURRENT_DIR" ]) || [ "$CHUNK_LINES" -gt 2000 ]; then
               # Save current chunk's file list to manifest
               MANIFEST=$(echo "$MANIFEST" | jq --argjson files "$CHUNK_FILES" --arg k "$CHUNK" '.[$k] = $files')
               CHUNK=$((CHUNK + 1))
@@ -326,7 +326,7 @@ runs:
         echo "chunk_count=$CHUNK" >> $GITHUB_OUTPUT
         echo "chunk_manifest=$(echo "$MANIFEST" | jq -c .)" >> $GITHUB_OUTPUT
 
-        echo "✅ Split into $CHUNK chunks (target ~1000 lines each)"
+        echo "✅ Split into $CHUNK chunks (target ~600 lines each)"
 
         for i in $(seq 1 $CHUNK); do
           LINES=$(wc -l < /tmp/drafter_chunk_${i}.diff | tr -d ' ')
@@ -806,6 +806,7 @@ runs:
       run: |
        REVIEW_URL="https://github.com/$REPOSITORY/pull/$PR_NUMBER"
        echo "review-url=$REVIEW_URL" >> $GITHUB_OUTPUT
+       TIMEOUT_NOTE=""
 
        if [ "$SKIP_REASON" = "concurrent" ]; then
          # Stay silent — the 👀 reaction on the triggering comment (added by the
@@ -814,6 +815,17 @@ runs:
          STATUS="⏭️ **Review skipped** — another review is already in progress"
        elif [ -z "$EXIT_CODE" ]; then
          STATUS="⏭️ **Review skipped** — agent did not run"
+       elif [ "$EXIT_CODE" = "124" ]; then
+         # Timeout (SIGKILL after 1800 s) — provide actionable guidance
+         STATUS="⏱️ **Review timed out** (exit code: 124, limit: 1800 s)"
+         TIMEOUT_NOTE="- **Exit code:** 124 (SIGKILL — 1800 s timeout)
+- **Timeout limit:** 1800 s
+- Verbose log artifact (\'${VERBOSE_LOG_FILE}\') uploaded for debugging."
+         if ! gh api "repos/$REPOSITORY/issues/$PR_NUMBER/comments" \
+           -f body="⏱️ **PR Review Timed Out** — The review agent hit the 1800 s time limit before completing. This usually happens on large or complex diffs. You can re-trigger with \`/review\` — if it times out again, consider splitting the PR into smaller pieces." \
+           2>&1; then
+           echo "::warning::Failed to post timeout comment to PR"
+         fi
        elif [ "$EXIT_CODE" != "0" ]; then
          # Check if agent actually posted a review despite the error exit code.
          # This happens when a sub-agent fails (e.g., API overload) but the root
@@ -841,6 +853,10 @@ runs:
          echo "## PR Review Summary"
          echo ""
          echo "$STATUS"
+         if [ -n "$TIMEOUT_NOTE" ]; then
+           echo ""
+           printf '%s\n' "$TIMEOUT_NOTE"
+         fi
          echo ""
          echo "📝 [View Pull Request #$PR_NUMBER]($REVIEW_URL)"
        } >> $GITHUB_STEP_SUMMARY
diff --git a/review-pr/agents/pr-review.yaml b/review-pr/agents/pr-review.yaml
index 891e1e5..d349b0b 100644
--- a/review-pr/agents/pr-review.yaml
+++ b/review-pr/agents/pr-review.yaml
@@ -155,7 +155,8 @@ agents:
       c. Delegate to the drafter once with the path `./pr-review.diff`.
 
       **Both modes**:
-      d. Merge all findings arrays into a single list before proceeding.
+      d. Merge all findings arrays into a single list before proceeding. Each finding
+         object uses fields `category`, `issue`, `details`, `in_diff` — not `title`/`body`.
       e. Include any relevant learned patterns from memory in each delegation.
 
       ## CRITICAL: How to delegate to the drafter
@@ -170,12 +171,26 @@ agents:
       - File history: relevant entries from `/tmp/file_history.txt` (CI mode only — skip in console mode)
       - Available files: run `ls` on directories of changed files so drafter knows real paths
       - Learned patterns: any relevant memories
+
+      **CRITICAL — drafter response schema (schema-validated, strict):**
+      The drafter returns a JSON object with these exact top-level fields:
+      - `findings`: array of objects with fields `file`, `line`, `severity`, `category`,
+        `issue` (one-line summary), `details` (trigger + impact), `in_diff` (boolean)
+      - `summary`: string — overall assessment
+      - `review_complete`: boolean — true if the full chunk was reviewed
+
+      **NEVER mention `title` or `body` as finding field names in the delegation
+      message.** Those fields do not exist in the drafter schema. If the task string
+      asks for `title`/`body`, the schema validator rejects every emit attempt and the
+      drafter enters an infinite file-read retry loop. Always use `issue` and `details`.
    5. Parse the drafter's JSON response. Check `review_complete`:
       - If `review_complete` is `true` AND zero findings → skip directly to step 8 (Decision Rules).
       - If `review_complete` is `false` AND zero findings → the drafter likely exhausted its
         context. Post a COMMENT review noting the review was incomplete and do NOT approve.
-      - Otherwise, collect all findings with severity "high" or "medium"
-        and delegate them to the `verifier` in a single batch. Skip verification for "low" findings.
+      - Otherwise, collect all findings with severity "high" or "medium" — reference
+        each finding's one-line summary via the `issue` field and its explanation via
+        the `details` field (not `title`/`body`) — and delegate them to the `verifier`
+        in a single batch. Skip verification for "low" findings.
         Include the project context (from step 2) in the verifier delegation so it can
         validate findings against project-specific conventions (e.g., language version,
         available APIs). **The verifier has no file access — you must provide all code context inline.**
@@ -194,7 +209,9 @@ agents:
         `in_changed_code == false` or `in_diff == false`.
      7. **Verify line numbers** before posting (see below)
      8. Apply the Decision Rules (see below) to determine the review verdict
-     9. Build inline comments from CONFIRMED/LIKELY issues and post the review
+     9. Build inline comments from CONFIRMED/LIKELY issues using each finding's `issue`
+        (one-line summary), `details` (full explanation), `severity`, `category`, `file`,
+        and `line` fields. Post the review.
      10. Always report ALL HIGH severity bugs. Limit MEDIUM/LOW to 5 comments max.
 
      Find **real bugs in the changed code**, not style issues. If the changed code works correctly, approve it.

From 4dbb76bddbbe72ef86f4cdd45a2bc7134cbad3e9 Mon Sep 17 00:00:00 2001
From: Derek Misler
Date: Sat, 9 May 2026 00:36:56 +0000
Subject: [PATCH 2/2] fix(review-pr): remove invalid backslash escapes in
 TIMEOUT_NOTE string

Inside a double-quoted bash string, \' is not a valid escape sequence and
produces a literal backslash. The step summary was rendering
(\'some-log-file.txt\') instead of the intended (some-log-file.txt).

Drop the attempted single-quote wrapping and write () directly.
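A quick shell illustration of the quoting rule (the filename here is just an
example, not the real artifact name):

```shell
# Inside double quotes, bash keeps a backslash that precedes a single
# quote, so the attempted escape leaks into the output verbatim:
printf '%s\n' "(\'log.txt\')"   # prints: (\'log.txt\')

# A single quote has no special meaning inside double quotes, and plain
# parentheses need no escaping at all:
printf '%s\n' "(log.txt)"       # prints: (log.txt)
```

Within double quotes, backslash is only special before $, `, ", \, or a
newline — everywhere else it is passed through literally.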
---
 review-pr/action.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/review-pr/action.yml b/review-pr/action.yml
index efb97bd..0034017 100644
--- a/review-pr/action.yml
+++ b/review-pr/action.yml
@@ -820,7 +820,7 @@ runs:
          STATUS="⏱️ **Review timed out** (exit code: 124, limit: 1800 s)"
          TIMEOUT_NOTE="- **Exit code:** 124 (SIGKILL — 1800 s timeout)
 - **Timeout limit:** 1800 s
-- Verbose log artifact (\'${VERBOSE_LOG_FILE}\') uploaded for debugging."
+- Verbose log artifact (${VERBOSE_LOG_FILE}) uploaded for debugging."
          if ! gh api "repos/$REPOSITORY/issues/$PR_NUMBER/comments" \
            -f body="⏱️ **PR Review Timed Out** — The review agent hit the 1800 s time limit before completing. This usually happens on large or complex diffs. You can re-trigger with \`/review\` — if it times out again, consider splitting the PR into smaller pieces." \
            2>&1; then