feat: collapse root action.yml into single node24 bundle (dist/main.js) by docker-agent · Pull Request #182 · docker/cagent-action

docker-agent · 2026-05-06T19:15:35Z

Summary

Replaces the 872-line composite action.yml (bash steps) with a using: node24 action backed by a new TypeScript bundle dist/main.js. All 24 inputs and 10 outputs are preserved verbatim — this is a zero-breaking-change refactor.

Motivation

The composite-action-with-bash-steps model makes the logic hard to test, hard to type-check, and impossible to cover with unit tests. Moving to a single Node.js bundle enables:

Full Vitest unit coverage of every auth/exec/output path
Static type-checking via tsc --noEmit
Simpler debugging (single JS bundle, real stack traces)
No external bash dependencies at runtime

Changes

action.yml

Rewritten from 872 lines of composite bash steps to 5 lines:

runs:
  using: node24
  main: dist/main.js

All 24 inputs and 10 outputs are unchanged (public contract preserved).

New modules in src/main/

File	Description
`index.ts`	Orchestrates all original steps in order
`auth.ts`	4-tier authorization waterfall (skip-auth → trusted-bot bypass → org-membership → author_association)
`binary.ts`	Downloads docker-agent + mcp-gateway via `@actions/tool-cache` with caching
`exec.ts`	Spawns docker-agent via `child_process` with retry loop, timeout (SIGTERM/SIGKILL), stdin prompt
`outputs.ts`	TypeScript port of awk state-machine: strips tool-call noise, extracts `docker-agent-output` blocks
`artifact.ts`	Uploads verbose log via `@actions/artifact` DefaultArtifactClient
`summary.ts`	Writes GitHub Actions job summary via `core.summary`

Test coverage

src/main/__tests__/outputs.test.ts — 28 tests for the awk filter + docker-agent-output extractor
src/main/__tests__/auth.test.ts — 15 tests for the 4-tier auth waterfall (mocked Octokit/org-membership)

New deps

@actions/exec@3 — binary verification
@actions/tool-cache@4 — download + cache docker-agent binary
@actions/artifact@6 — upload verbose log artifact

Security properties preserved

All API keys masked with core.setSecret() before any exec
Keys in env, never argv (no eval anywhere)
sanitizeInput / sanitizeOutput / checkAuth imported from existing src/security/ modules (unmodified)
4-tier auth waterfall is a faithful port of the original bash logic, including the trusted-bot PAT bypass

Test results

Test Files  12 passed (12)
Tests       143 passed (143)

pnpm lint (biome + tsc + actionlint) passes cleanly.

Caveats / deferred items

No post: step — deferred per plan. Cleanup runs inline at end of main step.
dist/ not committed — gitignored for PRs per AGENTS.md; release.yml commits it for tagged releases.
binary.ts uses tool-cache for download caching (replaces actions/cache@v4 binary caching).

Refs: migration plan artifact 15eb0c67, inventory artifact 8981dc56

docker-agent · 2026-05-06T19:25:02Z

❌ PR Review Failed — The review agent encountered an error and could not complete the review. View logs.

docker-agent

Assessment: 🔴 CRITICAL

1 high-severity security regression + 2 notable + 2 minor issues found in the action.yml → node24 port. See inline comments for details.

docker-agent · 2026-05-06T19:25:24Z

+        core.error(`❌ Authorization failed: @${commentUserLogin} is not a ${authOrg} org member`);
+        return { authorized: false, outcome: 'denied' };
+      }
+    } catch (err: unknown) {


[MEDIUM] Tier 3 org membership check: any exception results in hard deny with no Tier 4 fallback

When orgMembershipToken and authOrg are configured, any exception (network timeout, invalid token, rate limit) in the checkOrgMembership call returns { authorized: false, outcome: 'denied' } and does not fall through to Tier 4 (author_association check):

} catch (err: unknown) { core.error(`Org membership check failed: ${(err as Error).message}`); return { authorized: false, outcome: 'denied' }; // hard deny }

This means a valid org member can be silently denied due to a transient API failure, with no recovery path and no retry. The original bash had the same behavior (exit on API failure), but this PR is a good opportunity to improve availability by falling through to Tier 4 on non-fatal errors, or at minimum adding a configurable retry.

Specific concern: checkOrgMembership makes an HTTP call to https://api.github.com/orgs/${org}/members/${user}. A 502 or 503 from the GitHub API will throw, and a legitimate contributor's PR will fail auth. Today's GitHub Actions ecosystem sees periodic API degradations — this is a real-world risk.

Suggested fix: On transient HTTP errors (5xx, network timeout), fall through to Tier 4 rather than hard-denying, or throw so the caller can add the step-level continue-on-error behaviour:

} catch (err: unknown) { core.warning(`Org membership check failed, falling through to author_association: ${(err as Error).message}`); // fall through to Tier 4 }

derekmisler · 2026-05-06T19:38:45Z

/review

docker-agent · 2026-05-06T19:47:18Z

❌ PR Review Failed — The review agent encountered an error and could not complete the review. View logs.

docker-agent

Assessment: 🟡 NEEDS ATTENTION

All 5 findings from the previous review are correctly fixed. The new test suite (203 tests) is well-structured — but 3 of the 5 fixes lack regression guards that would catch a future revert. See inline comments for specific gaps and suggested additions.

docker-agent · 2026-05-06T19:57:22Z

+
+// ── Wiring — sanitizeOutput runs before block extraction ─────────────────────
+
+describe('security pipeline ordering (FIX 1)', () => {


[MEDIUM] FIX 1 integration test doesn't catch the security ordering regression it was written to guard

The test 'sanitizeOutput scans full filtered output before block extraction narrows it' writes verbose log content with no leaked secret:

fsSync.appendFileSync( capturedVerboseLog, // ← No secret pattern here — sanitizeOutput is a no-op regardless of call order '--- Agent: root ---\nConversational text.\n```docker-agent-output\n## Result\n\nClean output.\n```\n', );

Because there is no secret to detect, sanitizeOutput is a no-op in both the correct order and a regressed order. The test's final assertion:

expect(outputCalls['secrets-detected']).toBe('false'); // trivially true either way

is satisfied whether sanitizeOutput runs before or after extractDockerAgentOutputBlock.

What a meaningful regression test looks like: include a known secret pattern in the conversational text before the block (e.g., sk-ant-api03-XXXXXXXX), and assert secrets-detected === 'true'. Under the old (broken) order, that content is stripped by block extraction before sanitizeOutput sees it → secrets-detected === 'false' → test fails. Under the new (correct) order, sanitizeOutput scans the full filtered text → detects the key → secrets-detected === 'true' → test passes.

Without this, the highest-severity fix from the previous review has no regression guard.

docker-agent · 2026-05-06T19:57:22Z

+    expect(result.outcome).toBe('denied');
+  });
+
+  it('denies when org membership check throws', async () => {


[MEDIUM] Tier 3 fallthrough test is vacuous — passes under both old (hard deny) and new (fallthrough) code

The test 'denies when org membership check throws' uses author_association: 'NONE':

it('denies when org membership check throws', async () => { await writePayload({ comment: { author_association: 'NONE', user: { login: 'outsider' } }, }); mockCheckOrgMembership.mockRejectedValue(new Error('Token invalid')); // ... expect(result.authorized).toBe(false); expect(result.outcome).toBe('denied'); });

With NONE association, Tier 4 (checkAuth('NONE', ['OWNER', 'MEMBER', 'COLLABORATOR'])) also returns false → outcome: 'denied'. So the test outcome is identical whether the catch hard-denies (old behavior) or falls through to Tier 4 (new behavior). Reverting the fix to a hard deny would not break this test.

The missing case that actually validates the fallthrough:

it('falls through to author_association when org check throws', async () => { await writePayload({ comment: { author_association: 'OWNER', user: { login: 'repo-owner' } }, }); mockGetAuthenticated.mockResolvedValue({ data: { login: 'bot' } }); mockCheckOrgMembership.mockRejectedValue(new Error('Network timeout')); const result = await checkAuthorization({ ...opts, orgMembershipToken: 'org-token', authOrg: 'my-org', }); // Old code: hard deny → authorized=false. New code: Tier 4 OWNER → authorized=true. expect(result.authorized).toBe(true); expect(result.outcome).toBe('author-association'); });

This case distinguishes the two behaviors and confirms the Tier 3 catch actually falls through.

docker-agent · 2026-05-06T19:57:22Z

+    expect(result).toContain('Actual answer');
+  });
+
+  it('strips --- Tool: blocks', () => {


[MEDIUM] No test for the FIX 4 behavior: blank line in inTool state is dropped (not emitted)

The existing 'strips --- Tool: blocks' test terminates inTool state via a --- Agent: header, not a blank line:

it('strips --- Tool: blocks', () => { const input = '--- Tool: read_file ---\nsome tool internals\n--- Agent: root ---\nClean output\n'; // ↑ exits inTool via header, not blank line const result = filterAgentOutput(input); expect(result).toContain('Clean output'); });

The fix (commit b7a5bb4) changed the blank-line exit from a fall-through (which would emit the blank line) to a continue (which drops it, matching awk next):

if (line.trim() === '') { state = 'normal'; continue; // ← the fix. Before: fell through and blank line was emitted }

No test exercises this path. Reverting continue back to a fall-through would emit an extra blank line after tool blocks, but the existing tests would not catch it.

Suggested addition:

it('drops blank line that terminates an inTool block (matches awk `next`)', () => { const input = [ '--- Tool: bash ---', 'some tool output', '', // ← blank line exits inTool, should be dropped 'Clean output', // ← back in normal state ].join('\n'); const result = filterAgentOutput(input); // Blank line must NOT appear — it was consumed by the inTool exit const lines = result.split('\n'); // 'Clean output' should not be preceded by a blank line const cleanIdx = lines.findIndex(l => l === 'Clean output'); expect(cleanIdx).toBeGreaterThan(-1); expect(lines[cleanIdx - 1]).not.toBe(''); });

docker-agent · 2026-05-06T19:57:22Z

+    expect(writtenData.toString()).toContain('Raw prompt');
+  });
+
+  it('returns TIMEOUT_EXIT_CODE (124) without retrying on timeout', async () => {


[LOW] Timeout test bypasses the real timer → SIGTERM path; child.kill is never called in the test

The test achieves exit code 124 by making the mock child emit close(124) immediately — bypassing the real timeout mechanism:

it('returns TIMEOUT_EXIT_CODE (124) without retrying on timeout', async () => { mockSpawn.mockReturnValue(makeMockChild(TIMEOUT_EXIT_CODE)); // child immediately emits close(124) const result = await runAgent(baseOpts({ timeout: 5, maxRetries: 3 })); expect(result.exitCode).toBe(TIMEOUT_EXIT_CODE); expect(mockSpawn).toHaveBeenCalledOnce(); // ✓ no retry on 124 // child.kill was never called — timer never fired });

The real timeout path (spawnAgent):

const timer = setTimeout(() => { timedOut = true; try { child.kill('SIGTERM'); } catch {} setTimeout(() => { try { child.kill('SIGKILL'); } catch {} }, 5000); resolve(TIMEOUT_EXIT_CODE); }, opts.timeoutSeconds * 1000);

The test correctly validates the no-retry-on-124 invariant, but does not verify that:

child.kill('SIGTERM') is called when the timer fires

The timedOut flag gates the resolve path correctly

Suggested complement (uses makeMockChild's delayMs parameter, which already exists):

it('kills process with SIGTERM when timeout fires', async () => { const child = makeMockChild(0, 5000); // child would exit in 5s naturally mockSpawn.mockReturnValue(child); const result = await runAgent(baseOpts({ timeout: 0.05, maxRetries: 0 })); // 50ms timeout expect(result.exitCode).toBe(TIMEOUT_EXIT_CODE); expect(child.kill).toHaveBeenCalledWith('SIGTERM'); }, 2000);

docker-agent · 2026-05-06T19:57:22Z

+  // ── 2. Remote @actions/cache hit (cross-run persistence) ───────────────
+  const tmpBinDir = await fs.promises.mkdtemp(path.join(os.tmpdir(), 'docker-agent-'));
+  const cacheKey = `docker-agent-${toolName}-${version}-${platform}-${arch}`;
+  const restoredKey = await actionsCache.restoreCache([tmpBinDir], cacheKey);


[LOW] restoreCache is not wrapped in try/catch, creating asymmetric error handling vs saveCache

saveCache is wrapped and non-fatal (logs a warning and continues). restoreCache is not:

// restoreCache — any exception propagates and aborts binary setup: const restoredKey = await actionsCache.restoreCache([tmpBinDir], cacheKey); // saveCache — wrapped (non-fatal): try { await actionsCache.saveCache([tmpBinDir], cacheKey); } catch (err: unknown) { core.warning(`Failed to save docker-agent to remote cache: ${(err as Error).message}`); }

@actions/cache typically returns undefined (cache miss) rather than throwing when the cache service is unavailable. However, in certain conditions — permission errors, ACTIONS_CACHE_URL misconfiguration, or future API changes — it may throw. If it does, the exception propagates out of ensureDockerAgent, skipping the download fallback and failing the entire action.

The same pattern applies to ensureMcpGateway (line 164).

Suggested fix (consistent with saveCache's pattern):

let restoredKey: string | undefined; try { restoredKey = await actionsCache.restoreCache([tmpBinDir], cacheKey); } catch (err: unknown) { core.warning(`Remote cache restore failed (${(err as Error).message}); falling back to download`); }

This ensures that a cache service blip never blocks a binary download.

docker-agent

Assessment: 🟢 APPROVE

All 10 findings across two review rounds (5 runtime bugs, 5 test-quality gaps) are fully addressed. Every fix is correct, and each regression guard is load-bearing — verified to fail on the un-fixed code.

Round 1 runtime fixes (b7a5bb4) — ✅ all verified:

FIX 1 · sanitizeOutput before block extraction — security ordering preserved
FIX 2 · Two-level cache: remote @actions/cache + local tool-cache
FIX 3 · Tier 3 org-membership catch falls through to Tier 4
FIX 4 · inTool blank-line exit uses continue (matches awk next)
FIX 5 · skip-summary uses core.getBooleanInput

Round 2 regression-guard fixes (5054132) — ✅ all verified load-bearing:

FIX A · Integration test includes real sk-ant-api03- key; asserts secrets-detected=true — fails if ordering is reverted
FIX B · Auth test uses OWNER + org throws → authorized=true expected — fails if Tier 3 hard-deny is restored
FIX C · Outputs test checks blank line is absent after inTool block — fails if continue reverts to fall-through
FIX D · makeMockChild.kill emits close; timeout test asserts child.kill('SIGTERM') called — exercises real timer path
FIX E · Both restoreCache calls (docker-agent + mcp-gateway) wrapped in try/catch — matching saveCache pattern

Replace the 872-line composite action with a `using: node24` action backed by a new TypeScript bundle `dist/main.js`. Key changes: - action.yml: rewritten to `using: node24, main: dist/main.js` — all 24 inputs and 10 outputs preserved verbatim - src/main/index.ts: orchestrates all original steps (validation, auth, input sanitization, binary setup, agent execution, output processing, output sanitization, artifact upload, job summary) - src/main/auth.ts: 4-tier authorization waterfall (skip-auth → trusted-bot bypass → org-membership → author_association) - src/main/binary.ts: downloads docker-agent (and optionally mcp-gateway) via @actions/tool-cache with caching - src/main/exec.ts: spawns docker-agent via child_process with retry loop, timeout (SIGTERM/SIGKILL), stdin prompt, API keys via env only (never argv) - src/main/outputs.ts: TypeScript port of the awk state-machine that strips tool-call noise from verbose agent logs; extracts docker-agent-output fenced blocks - src/main/artifact.ts: uploads verbose log via @actions/artifact DefaultArtifactClient - src/main/summary.ts: writes GitHub Actions job summary via core.summary - New deps: @actions/exec, @actions/tool-cache, @actions/artifact - tsup.config.ts: add 'main' entry point - All 143 unit tests pass; pnpm lint clean Security properties preserved: - All API keys masked with core.setSecret() before any exec - Keys in env only, never argv (no eval) - sanitizeInput / sanitizeOutput / checkAuth imported from existing src/security/ modules - Authorization 4-tier waterfall identical to original Assisted-By: docker-agent

FIX 1 (security): correct sanitizeOutput ordering in index.ts - Split processAgentOutput into two separate phases - Step 8: filterAgentOutput writes FULL filtered text to outputFile - Step 9: sanitizeOutput scans the full filtered text (before block extraction) - Step 9b: extractDockerAgentOutputBlock runs AFTER sanitize; skipped on leak - Leaked keys in conversational text are now caught even when a clean block follows FIX 2 (performance): add @actions/cache remote persistence to binary.ts - Add @actions/cache@6 dependency - restoreCache() before download attempt; saveCache() after successful download - Applies to both docker-agent and mcp-gateway binaries - tc.find/cacheDir kept for local in-process hits; @actions/cache for cross-run - saveCache failures are non-fatal (read-only in forked PRs) FIX 3 (auth): Tier 3 exception falls through to Tier 4 instead of hard-denying - On network error / 5xx in checkOrgMembership, emit core.warning and fall through - Tier 4 (author_association) now runs when Tier 3 throws, not just when skipped - Valid contributors with OWNER/MEMBER/COLLABORATOR association are no longer blocked FIX 4 (outputs): inTool blank-line exit now continues (drops line) - Split the combined if-condition into two separate branches - Blank-line path: state=normal + continue (matches awk `next`, drops the line) - Header line path: state=normal + fall-through (re-evaluates the header line) FIX 5 (robustness): replace case-sensitive skip-summary check - Remove unused _skipSummary variable from try block - Use core.getBooleanInput('skip-summary') in finally block (handles 'True'/'TRUE'/'1') Assisted-By: docker-agent

New test files: - src/main/__tests__/artifact.test.ts (8 tests) — makeArtifactName + uploadVerboseLog - src/main/__tests__/binary.test.ts (9 tests) — detectPlatform, local/remote-cache hits, full download, mcp-gateway - src/main/__tests__/exec.test.ts (17 tests) — buildArgs (pure) + runAgent with mocked spawn - src/main/__tests__/summary.test.ts (12 tests) — writeJobSummary status lines, table, output section - src/main/__tests__/main.integration.test.ts (13 tests) — full run() pipeline via mocked externals Also export 'run' from index.ts and guard auto-invocation with process.env.VITEST so the integration test can import and call run() without triggering action side-effects. Totals: 189 unit tests (16 files), 14 integration tests (1 file) — all passing. Assisted-By: docker-agent

pnpm format + biome check --unsafe applied to: - src/main/__tests__/artifact.test.ts - src/main/__tests__/binary.test.ts - src/main/__tests__/exec.test.ts - src/main/__tests__/main.integration.test.ts Changes: line wrapping (100-col), import sort order, useLiteralKeys, removed unused import (fsSync in summary.test.ts was already clean). Assisted-By: docker-agent

FIX A (integration test — security pipeline ordering): Replace vacuous conversational-only content with a real Anthropic API key (matching /sk-ant-[a-zA-Z0-9_-]{30,}/) in the verbose log before a clean docker-agent-output block. Test now asserts secrets-detected=true and security-blocked=true, which ONLY passes under the correct order (filter→sanitize→extract). Also asserts outputFile retains the leaked key (Step 9b is skipped on leak detection). FIX B (auth.test.ts — Tier 3 fallthrough is distinguishable): Add 'falls through to author_association when org check throws (FIX 3)'. Uses OWNER association so old hard-deny code would return denied but new fallthrough code reaches Tier 4 and returns authorized=true, outcome='author-association'. This test fails if the FIX 3 catch block is reverted to return { authorized: false }. FIX C (outputs.test.ts — blank-line inTool exit): Add 'drops blank line that terminates an inTool block (matches awk next)'. Checks that the line immediately before 'Clean output' is not blank, which fails if the continue is reverted to fall-through. FIX D (exec.test.ts — SIGTERM kill path): Add 'kills process with SIGTERM when timeout fires'. Uses a 50ms action timeout with a child that would naturally exit in 5s. Updated makeMockChild so kill() emits close via setImmediate (simulates OS killing the process), allowing the timer path to complete within the 2s test timeout. Asserts child.kill was called with 'SIGTERM'. FIX E (binary.ts — restoreCache non-fatal): Wrap both restoreCache calls (docker-agent and mcp-gateway) in try/catch with core.warning + fallthrough to download, matching the existing saveCache non-fatal pattern. A network error in restoreCache no longer fails the action. Assisted-By: docker-agent

Assisted-By: docker-agent

The test was hardcoding 'v1.54.0' in two places: the fake binary content and the cagent-version output assertion. When upstream bumped DOCKER_AGENT_VERSION to v1.56.0 (via the rebase), the assertion broke. Fix: read DOCKER_AGENT_VERSION from the canonical file at the start of the test module so both the fake binary echo and the assertion always reflect the pinned version. This makes the test resilient to future version bumps without requiring manual test updates. Assisted-By: docker-agent

docker-agent

Assessment: 🟡 NEEDS ATTENTION

Round 3 review — post-rebase sanity check

This PR was reviewed after rebasing onto 6 upstream commits (v1.54.0→v1.56.0 bump, inline mention-reply fixes, self-review local action paths).

Rebase interaction analysis: Clean — the 6 upstream commits modified src/mention-reply/, .github/actions/mention-reply/, and review-pr/ (none of which this PR touches). No file conflicts, no behavioral interactions. The version bump is automatically picked up via readDockerAgentVersion().

Overall: The TypeScript port is sound. All security primitives are correct (secrets masked before exec, sanitizeOutput always runs in finally, prompt injection detection preserved). Two findings surfaced:

🔴 [HIGH] Auth catch swallows HTTP 401 → fails open on revoked token — src/main/auth.ts line 129
🟡 [MEDIUM] Retry output accumulation — src/main/exec.ts line 296

See inline comments for details and suggested fixes.

docker-agent · 2026-05-07T14:23:11Z

+      // rather than hard-denying a valid contributor.
+      core.warning(
+        `Org membership check failed (${(err as Error).message}); falling back to author_association`,
+      );


[HIGH] Auth Tier 3 catch falls through to Tier 4 on ALL exceptions — fails open when token is revoked

The catch block here silences all exceptions from checkOrgMembership (network errors, 5xx, and HTTP 401) and falls through to the weaker author_association Tier 4 check.

checkOrgMembership throws on HTTP 401 with message "Org membership token is missing or invalid". When the org-membership-token is revoked or compromised, the original bash hard-denied (exit 1). The new code instead falls through to Tier 4, allowing a user with author_association of OWNER/MEMBER/COLLABORATOR to be authorized via the weaker fallback — even though the primary membership gate has failed.

The comment says "Network error / 5xx" but the catch applies to 401 as well.

Suggested fix: rethrow on 401 (token invalid) — only fall through for transient errors:

} catch (err: unknown) { const status = (err as { status?: number }).status; if (status === 401) { // Token is invalid/revoked — hard deny, don't fall through core.error(`Org membership token is invalid (HTTP 401): ${(err as Error).message}`); return { authorized: false, outcome: 'denied' }; } // Network / 5xx: warn and fall through to Tier 4 core.warning(`Org membership check failed (${(err as Error).message}); falling back to author_association`); }

Rebase note: This is not introduced by the rebase — it's present in the PR's own code and was intentionally labeled "FIX 3" in tests. Worth revisiting whether 401 should also fail-closed per the project's security-first design.

Fixed in commit 764558f. The catch block now inspects err.status:

HTTP 401 (revoked/invalid token) → core.error + hard-deny ({ authorized: false, outcome: 'denied' }). Does not fall through to the weaker Tier 4 author_association check.

Non-401 (network error, 5xx, timeout) → core.warning + fall through to Tier 4 as before.

Two new/updated tests in src/main/__tests__/auth.test.ts:

hard-denies when org membership token returns HTTP 401 (does not fall through to Tier 4) — uses OWNER association, which Tier 4 would authorize; the denied outcome proves the hard-deny fired instead of falling through.

falls through to author_association when org check throws a non-401 error — explicitly uses a plain Error with no .status + OWNER association, confirming the fallthrough path still works.

docker-agent · 2026-05-07T14:23:11Z

+        '',
+      ].join(os.EOL);
+      fs.appendFileSync(opts.verboseLogFile, separator, 'utf-8');
+    }


[MEDIUM] Retry loop appends to verbose log without resetting output state — all retry attempts' content ends up in the output file

The verbose log is intentionally cumulative (with separators), which is correct. But the clean output file in the original bash was truncated (> "$OUTPUT_FILE") before each retry attempt, ensuring downstream consumers only ever saw the last attempt's output.

In the new code, filterAgentOutput is called on the full verbose log after all retries complete in index.ts:

const rawVerbose = fs.readFileSync(verboseLogFile, 'utf-8'); const filteredOutput = filterAgentOutput(rawVerbose); // ALL attempts fs.writeFileSync(outputFile, filteredOutput, 'utf-8');

The extractDockerAgentOutputBlock extractor finds the first ```docker-agent-output fence and extracts until the first closing ```. If a failed attempt 1 emits a partial ```docker-agent-output block without a closing fence, attempt 2's closing ``` will terminate it — producing truncated/incorrect extracted output.

For most transient retry scenarios (API timeouts, transient errors) the risk is low since the failing agent is unlikely to emit a partial structured block. However the behavioral change from the original is real.

Suggested fix: After all retries in runAgent or before calling filterAgentOutput in index.ts, trim the verbose log content to only the final attempt's section (after the last RETRY ATTEMPT separator):

// In index.ts, before filterAgentOutput: const rawVerbose = fs.readFileSync(verboseLogFile, 'utf-8'); const lastAttemptMarker = /^={10,} RETRY ATTEMPT \d+/m; const parts = rawVerbose.split(lastAttemptMarker); const lastAttemptContent = parts[parts.length - 1]; const filteredOutput = filterAgentOutput(lastAttemptContent);

Or pass outputFile path to runAgent so it can be reset per-retry (matching the original bash behavior).

Fixed in commit 764558f. Before calling filterAgentOutput, the verbose log is now split on the retry-attempt marker (/^={10,} RETRY ATTEMPT \d+/m) and only the last section is filtered:

const lastAttemptMarker = /^={10,} RETRY ATTEMPT \d+/m; const parts = rawVerbose.split(lastAttemptMarker); const lastAttemptContent = parts[parts.length - 1]; const filteredOutput = filterAgentOutput(lastAttemptContent);

When there are no retries parts.length === 1 and parts[0] is the full log — correct behavior preserved.

Two new tests added to src/main/__tests__/main.integration.test.ts (describe block retry output trimming (FIX 2)):

Writes a verbose log with a partial first-attempt docker-agent-output block followed by a retry marker and the correct second-attempt block; asserts the output file contains only the second-attempt content.

Writes a verbose log with no retry markers; asserts the full content passes through unchanged.

Assisted-By: docker-agent

docker-agent

Assessment: 🔴 CRITICAL — FIX 1 is incomplete; PR is not ready to merge

Round 4 review — verifying commit 764558f

FIX 2 (retry output accumulation) ✅ — Correct and complete. The lastAttemptMarker regex split before filterAgentOutput mirrors the original bash > "$OUTPUT_FILE" truncation. No-retry case (single part) and multi-retry case both handled correctly. Integration tests cover both paths.

FIX 1 (auth 401 hard-deny) ❌ — The guard fires in unit tests but will never fire in production. The .status property is stripped when checkOrgMembership converts the Octokit RequestError (which has .status = 401) into a plain new Error('...HTTP 401...') before rethrowing. By the time auth.ts catches it, (err as {status?:number}).status is undefined, not 401. The unit test passes only because it mocks checkOrgMembership directly with an Object.assign(new Error(), {status:401}) error, which has a shape the real implementation never produces.

The original vulnerability is still present: on a revoked/invalid org-membership-token, the action silently falls through to Tier 4 (author_association) rather than hard-denying.

See inline comments for the exact call chain and two concrete fix options.

docker-agent · 2026-05-07T14:37:15Z

+      }
+    } catch (err: unknown) {
+      const status = (err as { status?: number }).status;
+      if (status === 401) {


[HIGH] FIX 1 is incomplete — .status is stripped before this guard is reached

This guard will never trigger in production. Here's the full call chain:

GitHub API → throws Octokit.RequestError with .status = 401

checkOrgMembership (in src/check-org-membership/index.ts line 32–37) catches it, then rethrows as a plain new Error('...HTTP 401...') — losing .status entirely:
if (status === 401) { throw new Error('Org membership token is missing or invalid (HTTP 401). ...'); // ↑ plain Error — .status is undefined on this object }

Here in auth.ts — (err as { status?: number }).status evaluates to undefined, so status === 401 is always false. The code falls through to Tier 4 exactly as before.

The unit test at auth.test.ts line 270 passes only because it mocks checkOrgMembership directly with Object.assign(new Error(), { status: 401 }), bypassing the real conversion. That error shape is never produced by the actual checkOrgMembership implementation.

Two correct fixes — pick one:

Option A — message-based guard (no changes to other files):

} catch (err: unknown) { const status = (err as { status?: number }).status; const msg = (err as Error).message ?? ''; if (status === 401 || msg.includes('HTTP 401')) { core.error(`Org membership token is invalid: ${msg}`); return { authorized: false, outcome: 'denied' }; } // Network / 5xx: fall through to Tier 4 core.warning(`Org membership check failed (${msg}); falling back to author_association`); }

Option B — preserve .status in check-org-membership/index.ts:

if (status === 401) { throw Object.assign( new Error('Org membership token is missing or invalid (HTTP 401). ...'), { status: 401 } ); }

Then update the test mock to match: mockRejectedValue(Object.assign(new Error('...'), { status: 401 })) — which it already does, so the test remains valid.

Option B is cleaner (the error carries its own status), but Option A works too. Either way the test at auth.test.ts:270 is testing the right behavior — just with an error shape that doesn't match what the real code produces.

Fixed in 95052ef. Used Object.assign(new Error(...), { status: 401 }) so the thrown error preserves .status — matching exactly the shape auth.ts Tier 3 guards on. Also updated the 401 test in src/check-org-membership/__tests__/check-org-membership.test.ts to assert .status === 401 on the rethrown error, not just the message. All 221 unit tests green.

docker-agent · 2026-05-07T14:37:15Z

+    });
+    mockGetAuthenticated.mockResolvedValue({ data: { login: 'bot' } });
+    const err401 = Object.assign(new Error('Unauthorized'), { status: 401 });
+    mockCheckOrgMembership.mockRejectedValue(err401);


[HIGH] Test passes but validates the wrong error shape

const err401 = Object.assign(new Error('Unauthorized'), { status: 401 }); mockCheckOrgMembership.mockRejectedValue(err401);

This mock constructs an error with .status = 401, which makes auth.ts's status === 401 guard fire. But the real checkOrgMembership on an HTTP 401 throws new Error('Org membership token is missing or invalid (HTTP 401)...') — a plain Error with no .status property.

The test correctly covers the intended behavior (401 → hard deny), but because the mock bypasses checkOrgMembership's internal conversion, it doesn't catch the gap between the expected and actual error shapes.

Fix: Once check-org-membership/index.ts is updated to preserve .status (Option B above), this mock is already correct and no test changes are needed. If using the message-based Option A instead, the mock can stay as-is — the behavior test remains valid either way.

Assisted-By: docker-agent

docker-agent

Assessment: 🟢 APPROVE

Round 5 review — verifying commit 95052ef

FIX 1 is now complete and correct. All prior findings are resolved. The PR is ready to merge.

FIX 1 (auth 401 hard-deny) ✅ — Root cause addressed in the right place.

check-org-membership/index.ts now rethrows the 401 error with .status preserved:

throw Object.assign(
  new Error('Org membership token is missing or invalid (HTTP 401). ...'),
  { status: 401 }   // ← .status preserved through the rethrow
);

The full call chain now works as intended:

GitHub API → Octokit.RequestError with .status = 401
checkOrgMembership catches, rethrows with .status = 401 preserved ✅
auth.ts catch → status === 401 is true → hard deny, no fallthrough to Tier 4 ✅

The test in check-org-membership/__tests__/ now asserts .status === 401 on the rethrown error (in addition to the message check), closing the gap that let the previous attempt pass tests while failing in production. The shape of the mock in auth.test.ts (Object.assign(new Error(), { status: 401 })) now correctly matches what checkOrgMembership actually throws.

FIX 2 (retry output accumulation) ✅ — Verified correct in round 4, unchanged.

All CI checks pass (221 unit + 16 integration). No new issues introduced.

docker-agent requested a review from a team as a code owner May 6, 2026 19:15

docker-agent requested a review from derekmisler May 6, 2026 19:15

docker-agent commented May 6, 2026

View reviewed changes

docker-agent added 7 commits May 7, 2026 14:10

chore: remove unused fsSync import in summary.test.ts

a3000f4

Assisted-By: docker-agent

chore: use indexOf over findIndex per Biome useIndexOf rule

fd4a5e2

Assisted-By: docker-agent

docker-agent force-pushed the feat/node24-main-bundle branch from d91e92a to fd4a5e2 Compare May 7, 2026 14:12

docker-agent commented May 7, 2026

View reviewed changes

fix: rethrow 401 in auth Tier 3; trim retry output to last attempt

764558f

Assisted-By: docker-agent

docker-agent commented May 7, 2026

View reviewed changes

fix: preserve .status on 401 rethrow in checkOrgMembership

95052ef

Assisted-By: docker-agent

docker-agent commented May 7, 2026

View reviewed changes

derekmisler enabled auto-merge (squash) May 7, 2026 15:15

derekmisler approved these changes May 7, 2026

View reviewed changes

derekmisler merged commit f7b4181 into docker:main May 7, 2026
9 checks passed


		// ── Wiring — sanitizeOutput runs before block extraction ─────────────────────

		describe('security pipeline ordering (FIX 1)', () => {

Conversation

docker-agent commented May 6, 2026

Summary

Motivation

Changes

action.yml

New modules in src/main/

Test coverage

New deps

Security properties preserved

Test results

Caveats / deferred items

Uh oh!

docker-agent Bot commented May 6, 2026

Uh oh!

docker-agent left a comment

Choose a reason for hiding this comment

Assessment: 🔴 CRITICAL

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

derekmisler commented May 6, 2026

Uh oh!

docker-agent Bot commented May 6, 2026

Uh oh!

docker-agent left a comment

Choose a reason for hiding this comment

Assessment: 🟡 NEEDS ATTENTION

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

docker-agent left a comment

Choose a reason for hiding this comment

Assessment: 🟢 APPROVE

Uh oh!

docker-agent left a comment

Choose a reason for hiding this comment

Assessment: 🟡 NEEDS ATTENTION

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

docker-agent left a comment

Choose a reason for hiding this comment

Assessment: 🔴 CRITICAL — FIX 1 is incomplete; PR is not ready to merge

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

docker-agent left a comment

Choose a reason for hiding this comment

Assessment: 🟢 APPROVE

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development