experiment(gra-217): tighten graduation dedup threshold 0.85→0.78 #220

Open
Gradata wants to merge 1 commit into main from experiment/gra-217-noise-correction-filter-rate

Gradata (Owner) commented May 21, 2026

Summary

Lab experiment GRA-217: noise_correction_filter_rate

Hypothesis: Tightening _GRADUATION_DEDUP_THRESHOLD from 0.85 to 0.78 catches more near-duplicate corrections at graduation time without losing distinct signals — raising the noise correction filter rate.

Change: src/gradata/enhancements/self_improvement/_confidence.py line 139

  • Before: _GRADUATION_DEDUP_THRESHOLD = 0.85
  • After: _GRADUATION_DEDUP_THRESHOLD = 0.78

Mechanism: In _graduation.py, this threshold gates whether a candidate RULE lesson is blocked as a near-duplicate of an existing rule via semantic vector similarity. Lowering from 0.85 to 0.78 means lessons with ≥0.78 cosine similarity (vs prior ≥0.85) are now filtered — a wider net that should catch more redundant near-duplicates while still allowing genuinely distinct signals through.
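
For reviewers who want to see the gate concretely, here is a minimal sketch of the comparison described above. It is not the code in _graduation.py: the function names, the plain-list vectors, and the toy embeddings are all assumptions made for illustration. Only the two threshold values come from this PR.

```python
import math

# Threshold values taken from this PR; everything else below is hypothetical.
OLD_THRESHOLD = 0.85
NEW_THRESHOLD = 0.78


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_near_duplicate(candidate, existing_rules, threshold):
    """True if the candidate is at least `threshold` similar to any existing rule."""
    return any(
        cosine_similarity(candidate, rule) >= threshold for rule in existing_rules
    )


if __name__ == "__main__":
    existing = [[1.0, 0.0]]   # toy embedding of an existing rule
    candidate = [0.8, 0.6]    # toy embedding with cosine similarity of about 0.8
    print(is_near_duplicate(candidate, existing, OLD_THRESHOLD))  # False
    print(is_near_duplicate(candidate, existing, NEW_THRESHOLD))  # True
```

A candidate at roughly 0.80 similarity graduated under the old threshold and is blocked under the new one. That 0.78 to 0.85 band is exactly the set of lessons this experiment affects.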

Metric direction: higher (more noise filtered = better)
Measurement window: 7 days
Lead: analyst (GRA-217)
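
This PR does not spell out how noise_correction_filter_rate is computed. The sketch below assumes one plausible definition: the share of graduation candidates blocked as near-duplicates. The function name and the similarity scores are hypothetical and are not measured data.

```python
def filter_rate(max_similarities, threshold):
    """Fraction of candidates whose best match meets or exceeds the threshold.

    `max_similarities` holds, per candidate, its highest cosine similarity
    to any existing rule. This definition is an assumption, not the
    project's documented metric.
    """
    if not max_similarities:
        return 0.0
    blocked = sum(1 for score in max_similarities if score >= threshold)
    return blocked / len(max_similarities)


# Illustrative scores only; not taken from graduation diagnostics.
scores = [0.95, 0.88, 0.81, 0.79, 0.60]
print(filter_rate(scores, 0.85))  # 0.4
print(filter_rate(scores, 0.78))  # 0.8
```

Under this assumed definition the rate can only stay flat or rise when the threshold drops, so a higher number alone does not show that the extra blocked lessons were noise rather than distinct signals.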

Closes: GRA-217

Lower _GRADUATION_DEDUP_THRESHOLD to catch more near-duplicate corrections
at graduation time, targeting higher noise_correction_filter_rate (GRA-217).
Window: 7d. Measurement: filter rate via graduation diagnostics.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


coderabbitai Bot commented May 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 15fb358d-5915-40ee-b77c-73a691698bb2

📥 Commits

Reviewing files that changed from the base of the PR and between a197bff and 720ec82.

📒 Files selected for processing (1)
  • Gradata/src/gradata/enhancements/self_improvement/_confidence.py
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: pytest ubuntu-latest / py3.11
  • GitHub Check: pytest macos-latest / py3.11
  • GitHub Check: pytest windows-latest / py3.11
  • GitHub Check: pytest (py3.12)
  • GitHub Check: pytest ubuntu-latest / py3.12
  • GitHub Check: pytest macos-latest / py3.12
  • GitHub Check: pytest windows-latest / py3.12
  • GitHub Check: pytest (py3.11)
🧰 Additional context used
📓 Path-based instructions (1)
Gradata/src/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/src/**/*.py: Prefer sentence-transformers for local embeddings, google-genai for Gemini embeddings, cryptography for AES-GCM encrypted system.db, bm25s for BM25 rule ranking, and mem0ai for external memory adapters — guard all optional dependency imports with try / except ImportError at the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
Never import from out-of-scope sibling directories ../Sprites/ or ../Hausgem/ within gradata/* code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to ../Sprites/, ../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from inside gradata/*
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes
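
As a rough illustration of three of the rules above (call-site import guards, logged rather than silent failures, and atomic JSON writes), the following sketch may help. `embed_locally` and `atomic_write_json` are hypothetical names, not helpers from the Gradata codebase, and the model name is only an example.

```python
import json
import logging
import os
import tempfile

logger = logging.getLogger(__name__)


def embed_locally(texts):
    """Hypothetical call site: the optional import is guarded here, not at module level."""
    try:
        from sentence_transformers import SentenceTransformer
    except ImportError:
        # Logged with traceback instead of a bare `except: pass`.
        logger.warning("sentence-transformers is not installed", exc_info=True)
        return None
    model = SentenceTransformer("all-MiniLM-L6-v2")  # example model name
    return model.encode(texts)


def atomic_write_json(path, payload):
    """Hypothetical stand-in for an atomic-write helper.

    Writes to a temp file in the target directory, then swaps it into place
    with os.replace so readers never see a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)
    except (OSError, TypeError, ValueError):
        logger.warning("atomic write to %s failed", path, exc_info=True)
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Creating the temp file in the same directory as the target keeps the final rename on one filesystem, which is what lets `os.replace` swap the file in a single step.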

Files:

  • Gradata/src/gradata/enhancements/self_improvement/_confidence.py
🔇 Additional comments (1)
Gradata/src/gradata/enhancements/self_improvement/_confidence.py (1)

139-139: LGTM!


📝 Walkthrough
  • Tightened _GRADUATION_DEDUP_THRESHOLD from 0.85 to 0.78 in src/gradata/enhancements/self_improvement/_confidence.py
  • Increases aggressiveness of near-duplicate rule detection at graduation time by lowering the cosine similarity threshold
  • Implements lab experiment GRA-217 (noise_correction_filter_rate) to catch more near-duplicate corrections without losing distinct signals
  • Expected outcome: higher noise correction filter rate (metric measured over 7-day window)
  • No breaking changes, security fixes, or new public APIs

Walkthrough

The near-duplicate detection threshold for graduation gating is tightened by lowering _GRADUATION_DEDUP_THRESHOLD from 0.85 to 0.78. Candidate rules whose cosine similarity to an existing rule is at least 0.78 but below 0.85 were previously allowed through and are now treated as duplicates during graduation.

Changes

Graduation Dedup Threshold

| Layer / File(s) | Summary |
| --- | --- |
| Graduation dedup threshold tightening: Gradata/src/gradata/enhancements/self_improvement/_confidence.py | _GRADUATION_DEDUP_THRESHOLD constant is reduced from 0.85 to 0.78, lowering the near-duplicate similarity threshold and causing more rules to be treated as duplicates during graduation gating. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Suggested labels

feature

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately reflects the main change: tightening the graduation dedup threshold from 0.85 to 0.78, which is the core modification in the changeset. |
| Description check | ✅ Passed | The description provides detailed context about the lab experiment GRA-217, explaining the hypothesis, mechanism, and expected impact of the threshold change. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 OpenGrep (1.21.0)

OpenGrep fatal error (exit code 2):
┌──────────────┐
│ Opengrep CLI │
└──────────────┘

✔ Opengrep OSS
✔ Basic security coverage for first-party code vulnerabilities.

Loading rules from local config...
[00.62][ERROR]: Error: exception Glob.Lexer.Syntax_error("malformed glob pattern: missing ']'")
Raised at Glob__Lexer.syntax_error in file "libs/glob/Lexer.mll", line 8, characters 2-26
Called from Glob__Lexer.__ocaml_lex_token_rec in file "libs/glob/Lexer.mll", line 29, characters 26-53
Cal



coderabbitai Bot added the feature label May 21, 2026