Skip to content

fix(provider): partition invalid review findings instead of failing the whole feature#84

Open
coletebou wants to merge 2 commits into
openclaw:mainfrom
coletebou:pr/safeparse-partition
Open

fix(provider): partition invalid review findings instead of failing the whole feature#84
coletebou wants to merge 2 commits into
openclaw:mainfrom
coletebou:pr/safeparse-partition

Conversation

@coletebou
Copy link
Copy Markdown

Summary

Today, when an LLM returns a single review finding whose category is outside the 10-value enum (e.g. "quality", "correctness"), reviewOutputSchema.parse(output) throws and the whole feature loses every finding it produced — even the valid sibling findings. We hit this 6 times in a 100-feature run and 25+ times in a 1000-feature run.

Upstream now also runs validateReviewOutput() after parse, which throws on the FIRST bad evidence (line range / quote / file inclusion). Same all-or-nothing failure mode at a different layer.

This PR partitions at both layers so one bad finding can no longer sink its siblings.

Layer 1 (commit 1) — parseReviewOutput:

  • Tries fast path with reviewOutputSchema.safeParse.
  • On failure, validates each finding individually with the new reviewFindingSchema.
  • Returns {validFindings, droppedFindings: [{path, message, sample}]}.
  • All four review() providers (codex/opencode/acpx/grok) now use this helper.

Layer 2 (commit 2) — validateReviewOutputPartitioned:

  • Wraps validateReviewOutput per-finding, catching ClawpatchError("malformed-output").
  • Same drop-and-continue shape; new layer: "schema" | "validation" discriminator on DroppedFinding.

Drops are persisted to the run's errors array as {message, code: "schema-drop" | "validation-drop"}. The run-level fatal-error gate filters these out, so a dropped finding never marks a run "failed".

Why

Run 20260517T190759-3c9e9e (1000 features, 78 errors). 26 of those errors were single bad findings sinking their whole feature. After this PR, they'd land as code: "schema-drop" or "validation-drop" and the valid siblings would persist normally.

Files

  • src/types.ts — extracted reviewFindingSchema + reviewInspectedSchema
  • src/provider.ts — new parseReviewOutput + DroppedFinding types; threaded through 4 providers + 2 mocks
  • src/app.tsreviewFeature carries droppedFindings; worker loop persists as non-fatal errors
  • src/review-validation.ts — new validateReviewOutputPartitioned
  • src/provider.test.ts + src/review-validation.test.ts — 8 new cases

Validation

  • pnpm format:check — clean
  • pnpm typecheck — clean
  • pnpm lint — clean
  • pnpm build — clean
  • pnpm test — 549 passed, 1 skipped (was 545 pre-PR)

Coordination

  • Stacks cleanly with the schema-tolerance PR (same family of failure modes).
  • Reviewers can land commit 1 alone if Layer 2 needs more discussion.

coletebou added 2 commits May 18, 2026 14:24
…g the whole feature

When a single finding in a provider review response had a `category`
outside the `findingCategories` enum (e.g. a hallucinated "quality"
label), `reviewOutputSchema.parse(output)` in each provider's `review()`
threw and `reviewFeature`'s outer try/catch marked the whole feature
errored. All sibling findings on that feature were lost. Six features
in run 20260517T184420-d52a72 lost every finding this way.

This change introduces `parseReviewOutput(output)` in `src/provider.ts`
which:

  1. Tries `reviewOutputSchema.safeParse` on the whole document (fast
     path; common case).
  2. On per-element failure, validates the container shape loosely
     (findings: unknown[], inspected strict) and partitions the
     `findings` array element-by-element using the new
     `reviewFindingSchema`. Successes go through; failures are
     recorded as `{path, message, sample}` entries in
     `droppedFindings`.
  3. Throws `ClawpatchError("malformed-output")` only when the
     container shape itself is invalid (e.g. `findings` is not an
     array) — preserving today's fail-loud contract for genuinely
     malformed provider output.

`reviewFindingSchema` and `reviewInspectedSchema` are now exported
from `src/types.ts` so the partitioner can reuse them; the composed
`reviewOutputSchema` keeps its existing shape (no breaking change to
consumers).

The four provider `review()` methods (codex, opencode, acpx, grok)
now call `parseReviewOutput` instead of `reviewOutputSchema.parse`.
The provider `Provider.review` return type is now
`PartitionedReviewOutput`, which is `ReviewOutput` plus an additive
`droppedFindings` field.

`reviewFeature` in `src/app.ts` threads `droppedFindings` back to the
review-command worker loop, which appends each as a non-fatal
`{message, code: "schema-drop"}` entry on the run's `errors[]`. The
run-failure threshold is now gated on `errors.some(e => e.code !==
"schema-drop")` so successful runs with dropped findings still
complete (and surface the drops in `run.errors[]` for triage)
instead of being silently flipped to "failed".

Tests:
  - parseReviewOutput preserves all findings when every category is
    valid (fast path)
  - parseReviewOutput keeps valid siblings when one finding has an
    invalid category
  - parseReviewOutput throws ClawpatchError when findings is not an
    array
  - parseReviewOutput truncates oversized samples to 200 characters
`validateReviewOutput` (added in 0.3.0) throws `ClawpatchError("malformed-output")`
on the FIRST evidence violation it finds — line range out of bounds, quote
not present, evidence file not in the prompt's included files. The outer
try/catch in `reviewFeature` then loses every sibling finding on the
feature and marks the whole feature errored.

Same problem the schema-layer partition (parseReviewOutput) addresses,
just one layer later. This commit extends the partition pattern:

- Add `validateReviewOutputPartitioned` in `src/review-validation.ts`.
  It runs the same per-evidence checks (now factored into a private
  `validateFinding` helper that both APIs call) but catches
  malformed-output ClawpatchErrors per finding and accumulates them
  into a `droppedFindings` array. The original `validateReviewOutput`
  keeps its throw-on-first-failure contract for any caller that wants
  it.

- Add a `layer?: "schema" | "validation"` discriminator to
  `DroppedFinding` so operators can tell the two failure modes apart
  in `run.errors`. parseReviewOutput stamps `"schema"`; the validation
  partition stamps `"validation"`.

- Wire reviewFeature in `src/app.ts` to call
  validateReviewOutputPartitioned and concatenate its drops onto the
  schema-layer drops already plumbed through. The orchestrator now
  emits `code: "validation-drop"` (distinct from `"schema-drop"`) for
  the new bucket and the fatal-errors filter excludes both.

Behavior change for callers: the `Provider.review` return type and
the schema-layer partition are unchanged. Adding new error codes is
additive — anyone filtering on `"schema-drop"` keeps working but now
silently misses validation drops. Update the filter to also exclude
`"validation-drop"` to recover the prior "non-fatal drops" semantics.

Tests: four new cases in `src/review-validation.test.ts` covering the
all-clean, partial-drop, included-paths, and empty-evidence flows.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant