fix(provider): partition invalid review findings instead of failing the whole feature by coletebou · Pull Request #84 · openclaw/clawpatch

coletebou · 2026-05-19T00:28:39Z

Summary

Today, when an LLM returns a single review finding whose category is outside the 10-value enum (e.g. "quality", "correctness"), reviewOutputSchema.parse(output) throws and the whole feature loses every finding it produced — even the valid sibling findings. We hit this 6 times in a 100-feature run and 25+ times in a 1000-feature run.

Upstream now also runs validateReviewOutput() after parse, which throws on the FIRST bad evidence (line range / quote / file inclusion). Same all-or-nothing failure mode at a different layer.

This PR partitions at both layers so one bad finding can no longer sink its siblings.

Layer 1 (commit 1) — parseReviewOutput:

Tries fast path with reviewOutputSchema.safeParse.
On failure, validates each finding individually with the new reviewFindingSchema.
Returns {validFindings, droppedFindings: [{path, message, sample}]}.
All four review() providers (codex/opencode/acpx/grok) now use this helper.

Layer 2 (commit 2) — validateReviewOutputPartitioned:

Wraps validateReviewOutput per-finding, catching ClawpatchError("malformed-output").
Same drop-and-continue shape; new layer: "schema" | "validation" discriminator on DroppedFinding.

Drops are persisted to the run's errors array as {message, code: "schema-drop" | "validation-drop"}. The run-level fatal-error gate filters these out, so a dropped finding never marks a run "failed".

Why

Run 20260517T190759-3c9e9e (1000 features, 78 errors). 26 of those errors were single bad findings sinking their whole feature. After this PR, they'd land as code: "schema-drop" or "validation-drop" and the valid siblings would persist normally.

Files

src/types.ts — extracted reviewFindingSchema + reviewInspectedSchema
src/provider.ts — new parseReviewOutput + DroppedFinding types; threaded through 4 providers + 2 mocks
src/app.ts — reviewFeature carries droppedFindings; worker loop persists as non-fatal errors
src/review-validation.ts — new validateReviewOutputPartitioned
src/provider.test.ts + src/review-validation.test.ts — 8 new cases

Validation

pnpm format:check — clean
pnpm typecheck — clean
pnpm lint — clean
pnpm build — clean
pnpm test — 549 passed, 1 skipped (was 545 pre-PR)

Coordination

Stacks cleanly with the schema-tolerance PR (same family of failure modes).
Reviewers can land commit 1 alone if Layer 2 needs more discussion.

…g the whole feature When a single finding in a provider review response had a `category` outside the `findingCategories` enum (e.g. a hallucinated "quality" label), `reviewOutputSchema.parse(output)` in each provider's `review()` threw and `reviewFeature`'s outer try/catch marked the whole feature errored. All sibling findings on that feature were lost. Six features in run 20260517T184420-d52a72 lost every finding this way. This change introduces `parseReviewOutput(output)` in `src/provider.ts` which: 1. Tries `reviewOutputSchema.safeParse` on the whole document (fast path; common case). 2. On per-element failure, validates the container shape loosely (findings: unknown[], inspected strict) and partitions the `findings` array element-by-element using the new `reviewFindingSchema`. Successes go through; failures are recorded as `{path, message, sample}` entries in `droppedFindings`. 3. Throws `ClawpatchError("malformed-output")` only when the container shape itself is invalid (e.g. `findings` is not an array) — preserving today's fail-loud contract for genuinely malformed provider output. `reviewFindingSchema` and `reviewInspectedSchema` are now exported from `src/types.ts` so the partitioner can reuse them; the composed `reviewOutputSchema` keeps its existing shape (no breaking change to consumers). The four provider `review()` methods (codex, opencode, acpx, grok) now call `parseReviewOutput` instead of `reviewOutputSchema.parse`. The provider `Provider.review` return type is now `PartitionedReviewOutput`, which is `ReviewOutput` plus an additive `droppedFindings` field. `reviewFeature` in `src/app.ts` threads `droppedFindings` back to the review-command worker loop, which appends each as a non-fatal `{message, code: "schema-drop"}` entry on the run's `errors[]`. The run-failure threshold is now gated on `errors.some(e => e.code !== "schema-drop")` so successful runs with dropped findings still complete (and surface the drops in `run.errors[]` for triage) instead of being silently flipped to "failed". Tests: - parseReviewOutput preserves all findings when every category is valid (fast path) - parseReviewOutput keeps valid siblings when one finding has an invalid category - parseReviewOutput throws ClawpatchError when findings is not an array - parseReviewOutput truncates oversized samples to 200 characters

`validateReviewOutput` (added in 0.3.0) throws `ClawpatchError("malformed-output")` on the FIRST evidence violation it finds — line range out of bounds, quote not present, evidence file not in the prompt's included files. The outer try/catch in `reviewFeature` then loses every sibling finding on the feature and marks the whole feature errored. Same problem the schema-layer partition (parseReviewOutput) addresses, just one layer later. This commit extends the partition pattern: - Add `validateReviewOutputPartitioned` in `src/review-validation.ts`. It runs the same per-evidence checks (now factored into a private `validateFinding` helper that both APIs call) but catches malformed-output ClawpatchErrors per finding and accumulates them into a `droppedFindings` array. The original `validateReviewOutput` keeps its throw-on-first-failure contract for any caller that wants it. - Add a `layer?: "schema" | "validation"` discriminator to `DroppedFinding` so operators can tell the two failure modes apart in `run.errors`. parseReviewOutput stamps `"schema"`; the validation partition stamps `"validation"`. - Wire reviewFeature in `src/app.ts` to call validateReviewOutputPartitioned and concatenate its drops onto the schema-layer drops already plumbed through. The orchestrator now emits `code: "validation-drop"` (distinct from `"schema-drop"`) for the new bucket and the fatal-errors filter excludes both. Behavior change for callers: the `Provider.review` return type and the schema-layer partition are unchanged. Adding new error codes is additive — anyone filtering on `"schema-drop"` keeps working but now silently misses validation drops. Update the filter to also exclude `"validation-drop"` to recover the prior "non-fatal drops" semantics. Tests: four new cases in `src/review-validation.test.ts` covering the all-clean, partial-drop, included-paths, and empty-evidence flows.

coletebou added 2 commits May 18, 2026 14:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(provider): partition invalid review findings instead of failing the whole feature#84

fix(provider): partition invalid review findings instead of failing the whole feature#84
coletebou wants to merge 2 commits into
openclaw:mainfrom
coletebou:pr/safeparse-partition

coletebou commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

coletebou commented May 19, 2026

Summary

Why

Files

Validation

Coordination

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant