Skip to content

findingFromOutput signature is not stable across re-reviews: JSON.stringify(evidence) is key-order-dependent #83

@antigency

Description

@antigency

Summary

findingFromOutput (src/findings.ts:49) computes the dedup signature by passing the LLM-returned evidence array through JSON.stringify directly. JSON property order is not guaranteed across runs, so the same logical finding can produce different signature hashes and therefore different findingIds on re-review. The intended dedup-on-mergeFinding lookup (src/app.ts:702readFinding(paths, finding.findingId)) silently misses, and a second copy of the finding is written to .clawpatch/findings/.

Repro

node -e "
const crypto = require('crypto');
const stableId = (prefix, parts) => {
  const hash = crypto.createHash('sha256').update(parts.join('\0')).digest('hex').slice(0, 10);
  return prefix + '_feat_' + hash;
};

// Same evidence, three plausible orderings from an LLM:
const e1 = [{path:'a.ts',startLine:1,endLine:5,symbol:null,quote:null}];
const e2 = [{startLine:1,path:'a.ts',symbol:null,quote:null,endLine:5}];
const e3 = [{path:'a.ts',endLine:5,startLine:1,symbol:null,quote:null}];

for (const e of [e1, e2, e3]) {
  console.log(stableId('sig', ['feat','bug','title', JSON.stringify(e)]));
}
"

Output:

sig_feat_ffd1e3d0d8
sig_feat_1693cfae2c
sig_feat_160b4f4aaf

Three IDs for the same evidence.

Why it matters

clawpatch review is meant to be re-runnable. Documented quick-start:

clawpatch review --limit 10
# (later)
clawpatch review --limit 10

If the LLM returns evidence with a slightly different key order on the second run (very common — LLMs are not deterministic about JSON key emission across separate calls, even at temperature 0), the second review writes a NEW finding file for an already-known issue. Over time, .clawpatch/findings/ accumulates duplicates of the same logical finding, the report grows misleading, and clawpatch fix --finding <id> can target a stale duplicate.

Confirmed against clawpatch@0.3.0, current main HEAD 6fdd8fa. src/findings.ts and src/id.ts have not been touched since initial commit.

Suggested fix

Canonicalize the evidence stringification before hashing. Three viable approaches:

(A) Sort keys at the leaf — cheapest, narrowest scope:

// src/findings.ts
function canonicalEvidence(evidence: ReviewOutput["findings"][number]["evidence"]): string {
  return JSON.stringify(
    evidence.map((e) => ({
      path: e.path,
      startLine: e.startLine,
      endLine: e.endLine,
      symbol: e.symbol,
      quote: e.quote,
    }))
  );
}

// ...
const signature = stableId("sig", [
  featureId,
  finding.category,
  finding.title,
  canonicalEvidence(finding.evidence),
]);

(B) Generic canonical-stringify — slightly more general, useful if other stableId callers grow:

function canonicalStringify(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalStringify).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as object).toSorted(([a], [b]) => a.localeCompare(b));
    return `{${entries.map(([k, v]) => `${JSON.stringify(k)}:${canonicalStringify(v)}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

(C) Switch signature input to a strictly typed shape — derive a path:start-end joined key string from each evidence entry. Loses some fidelity but is the most stable across LLM whim.

Option A is what I'd ship. It matches the existing evidenceRefSchema field set explicitly so a Zod schema change ripples cleanly into the signature.

Regression test

src/findings.ts has no test coverage today. Suggested addition (src/findings.test.ts):

import { describe, it, expect } from "vitest";
import { findingFromOutput } from "./findings.js";

describe("findingFromOutput signature stability", () => {
  it("produces the same signature for evidence with different key order", () => {
    const base = {
      title: "Bug X",
      category: "bug" as const,
      severity: "medium" as const,
      confidence: "high" as const,
      reasoning: "r",
      reproduction: null,
      recommendation: "fix it",
      whyTestsDoNotAlreadyCoverThis: "",
      suggestedRegressionTest: null,
      minimumFixScope: "",
    };
    const e1 = [{ path: "a.ts", startLine: 1, endLine: 5, symbol: null, quote: null }];
    const e2 = [{ endLine: 5, path: "a.ts", symbol: null, quote: null, startLine: 1 }];
    const f1 = findingFromOutput({ ...base, evidence: e1 }, "feat-1", "run-1");
    const f2 = findingFromOutput({ ...base, evidence: e2 }, "feat-1", "run-2");
    expect(f1.signature).toBe(f2.signature);
    expect(f1.findingId).toBe(f2.findingId);
  });
});

Severity

Medium. Doesn't break correctness (each finding is still valid), but defeats the dedup design and quietly degrades the findings inventory over time. Triggered by every re-review against an unchanged feature, which is one of the documented happy-path workflows.

Found while

Auditing clawpatch source after Derek (OpenClaw user) asked me to "check out current implementation for easy bug." Ran adversarially against src/, focused on JSON parsing, ID generation, shell quoting, and git ops. The signature instability is the cleanest small bug I found in this pass — small, exploitable in normal use, untested, fixable in <30 lines.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions