Watchman-based file watcher that keeps QMD indexes current automatically. Splits update (on file change, debounced) from embed (scheduled every 30 min) to avoid concurrent local model runs spiking CPU.

- vault-sync.sh: qmd update via watchman trigger with lock file
- vault-embed.sh: qmd embed with pending check and lock file
- install.sh: platform-aware (Linux systemd timer, macOS launchd plist)
- setup-trigger.sh: idempotent watchman trigger registration
Adds a new plugin that scores codebases across 8 dimensions for AI agent-readiness, framed around the Stripe benchmark of 1k+ AI-generated PRs per week. Uses a multi-agent parallel assessment pattern — an orchestrator skill gathers metadata, then launches 4 specialized agents concurrently, consolidating results into a weighted 0-100 score with a band rating and an improvement roadmap.

Dimensions assessed: Test Foundation (20%), Documentation & Context (15%), Code Clarity (15%), Type Safety (15%), Architecture Clarity (15%), Consistency & Conventions (10%), Feedback Loops (5%), Change Safety (5%).
Ruby and Python codebases were unfairly penalized for lacking a static
type system. In dynamic languages, comprehensive tests + contract systems
(dry-rb, Pydantic, Result patterns) serve the same role that type
checkers serve in TypeScript/Go.
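To make the contract-system point concrete, here is a minimal stdlib-only sketch of the Result pattern the paragraph refers to. In practice a Ruby codebase would reach for dry-rb and a Python one for Pydantic; `parse_port` and the `Ok`/`Err` types here are invented for illustration only.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Ok:
    value: object

@dataclass(frozen=True)
class Err:
    error: str

def parse_port(raw: str) -> Union[Ok, Err]:
    """Validate untrusted input at the boundary and return a Result.

    This runtime contract catches the same class of bugs a static type
    checker would, which is why comprehensive tests plus contracts can
    substitute for Sorbet/TypeScript in a dynamic codebase.
    """
    if not raw.isdigit():
        return Err(f"not a number: {raw!r}")
    port = int(raw)
    if not 1 <= port <= 65535:
        return Err(f"out of range: {port}")
    return Ok(port)
```

Callers must branch on `Ok` vs `Err` explicitly, so invalid data cannot silently flow deeper into the system — the property the rubric is actually trying to measure.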
Changes:
- architecture-assessor: language-aware Type Safety rubric with separate
bands for JavaScript, TypeScript, Python, and Ruby.
- Ruby scored on dry-rb adoption, ActiveRecord validations, service
object interfaces, and Result pattern — not Sorbet presence.
- Plain JavaScript projects are flagged as TypeScript migration
candidates in recommendations (highest-ROI type safety investment).
- SKILL.md: adaptive weighting by language tier. For dynamic languages,
Test Foundation increases from 20% to 25% and Type Safety drops from 15%
to 10%, so the combined safety signal stays at 35% (25% + 10%), matching
the static-language tiers (20% + 15%).
- SKILL.md: Phase 1 classifies LANGUAGE_TIER and passes it to all agents.
- SKILL.md: Phase 4 report includes language context block explaining
Ruby/Python score interpretation.
- architecture-assessor: Change Safety rubric notes that Rails co-change
patterns (model+migration, controller+view) are expected, not bad coupling.
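The adaptive weighting above can be sketched as a small lookup. The weights come from the change list; the `safety_weights` helper and the `"dynamic"`/`"static"` tier labels are assumptions for illustration (the real SKILL.md classifies LANGUAGE_TIER in Phase 1).

```python
def safety_weights(language_tier: str) -> dict:
    """Return the two safety-related dimension weights for a tier.

    Dynamic languages shift weight from Type Safety to Test Foundation;
    the combined safety signal is 35% in both tiers.
    """
    if language_tier == "dynamic":  # e.g. Ruby, Python, plain JavaScript
        return {"Test Foundation": 0.25, "Type Safety": 0.10}
    return {"Test Foundation": 0.20, "Type Safety": 0.15}

# dynamic: 0.25 + 0.10 = 0.35, static: 0.20 + 0.15 = 0.35
```

Because the totals match, a Ruby project with strong tests and contracts can reach the same band as a TypeScript project, which is the stated goal of the change.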
Incorporates research from Jason Wei's Verifier's Law and Keles's verifiability-as-limit thesis into the plugin's rubrics, scoring, and report structure.

Flaky test detection (test-coverage-assessor):
- Detects git commits mentioning flakiness, skipped/disabled tests, and test retry plugins (rspec-retry, pytest-rerunfailures) as indicators of a noisy oracle
- Detects property-based testing presence (Hypothesis, fast-check, etc.)
- Detects mock/stub density as an oracle quality signal

Oracle quality modifiers (Test Foundation):
- +5 for property-based testing
- −10 for test retry plugins (masking flakiness)
- −5 for >5% disabled tests
- −5 for high mock density (implementation testing)
- −10 for a unit-only suite with no integration layer

Non-functional verification signals (Feedback Loops):
- Detects security scanning in CI (Brakeman, CodeQL, Bandit, bundle-audit)
- Detects Dependabot / vulnerability scanning configuration
- Detects coverage delta reporting (Codecov, Coveralls)
- +5/+3/+2 bonuses for each

Feedback Loops weight bumped from 5% to 10% (both language tiers); Consistency & Conventions reduced from 10% to 5% to compensate.

Verification speed is now framed as a structural prerequisite, not a convenience: a 45-minute CI pipeline caps an agent at roughly 10 iterations per day.

Verification Cost Profile added to the Phase 4 report: a ✓/✗ table answering "how expensive is it to verify an agent's change?", covering pipeline speed, security scanning, property-based tests, reproducible dev state, and coverage reporting.

Stripe Benchmark section rewritten to explain the verification-asymmetry mechanism, not just cite the number.

"What Agent-Ready Means" section rewritten using the verification framing — agents don't eliminate verification, they relocate it to automated systems.

Phase 6 added: recommends btar for CI enforcement after the strategic assessment, with an install snippet and a context generation command.
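The oracle-quality modifiers can be sketched as a simple additive adjustment. The signal names and point values are taken from the list above; the detection step itself (grepping CI config, git log, and test files) is omitted, and `adjust_test_foundation` is a hypothetical helper, not the plugin's actual code.

```python
# Point values from the rubric; keys are illustrative signal names.
MODIFIERS = {
    "property_based_testing": +5,   # Hypothesis, fast-check, etc.
    "test_retry_plugin": -10,       # rspec-retry and friends mask flakiness
    "disabled_tests_over_5pct": -5,
    "high_mock_density": -5,        # tests pin implementation details
    "unit_only_suite": -10,         # no integration layer at all
}

def adjust_test_foundation(base: int, signals: list) -> int:
    """Apply detected oracle-quality signals, clamping to 0-100."""
    return max(0, min(100, base + sum(MODIFIERS[s] for s in signals)))

adjust_test_foundation(70, ["property_based_testing", "test_retry_plugin"])
# 70 + 5 - 10 = 65
```

The clamping matters at the extremes: a weak suite with a unit-only layout cannot go below 0, and property-based testing cannot push an already strong suite past 100.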
Summary
codebase-readiness plugin that scores codebases across 8 dimensions for AI agent-readiness

How It Works
Phase 1 — Orchestrator runs shell commands to build a Codebase Snapshot (language, size, CI config, docs, lint)
Phase 2 — 4 agents launch in parallel, each assessing 2-3 related dimensions:
- test-coverage-assessor → Test Foundation (20%) + Feedback Loops (5%)
- documentation-assessor → Documentation & Context (15%)
- code-clarity-assessor → Code Clarity (15%) + Consistency & Conventions (10%)
- architecture-assessor → Type Safety (15%) + Architecture Clarity (15%) + Change Safety (5%)

Phase 3-4 — Weighted score (0-100) calculated, band rating assigned, full report assembled with improvement roadmap
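The Phase 2/3 fan-out and consolidation can be sketched as below. The agent names and dimension weights come from the summary; the thread pool and the flat stand-in scores are illustrative assumptions (the real orchestrator launches skill agents, not Python threads).

```python
from concurrent.futures import ThreadPoolExecutor

# Agent → owned dimensions and weights, as listed in the summary.
AGENTS = {
    "test-coverage-assessor": {"Test Foundation": 0.20, "Feedback Loops": 0.05},
    "documentation-assessor": {"Documentation & Context": 0.15},
    "code-clarity-assessor": {"Code Clarity": 0.15,
                              "Consistency & Conventions": 0.10},
    "architecture-assessor": {"Type Safety": 0.15,
                              "Architecture Clarity": 0.15,
                              "Change Safety": 0.05},
}

def assess(agent):
    """Stand-in assessor: score every owned dimension a flat 75."""
    return {dim: 75 for dim in AGENTS[agent]}

# Phase 2: launch the four agents concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(zip(AGENTS, pool.map(assess, AGENTS)))

# Phase 3: weights sum to 1.0 across all agents, so uniform 75s
# consolidate to a weighted total of (approximately) 75.
total = sum(weight * results[agent][dim]
            for agent, dims in AGENTS.items()
            for dim, weight in dims.items())
```

Splitting the eight dimensions across four agents keeps each agent's context focused on 2-3 related concerns while the orchestrator only has to do arithmetic at the end.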
Phase 5 — Offers to save the report as AGENT_READY_ASSESSMENT.md

Score Bands
Test Plan
- cd to any existing project and invoke: "Run a codebase readiness assessment"
- AGENT_READY_ASSESSMENT.md
- /plugin marketplace list