
feat: add codebase-readiness plugin#15

Open
dgalarza wants to merge 7 commits into main from dg-codebase-agent-ready

Conversation

@dgalarza
Owner

Summary

  • Adds a new codebase-readiness plugin that scores codebases across 8 dimensions for AI agent-readiness
  • Framed around the Stripe benchmark of 1,000+ AI-generated PRs per week as a non-political external standard
  • Uses the parallel-code-review pattern: orchestrator skill gathers metadata, then launches 4 specialized agents concurrently

How It Works

Phase 1 — Orchestrator runs shell commands to build a Codebase Snapshot (language, size, CI config, docs, lint)

Phase 2 — 4 agents launch in parallel, each assessing 2-3 related dimensions:

  • test-coverage-assessor → Test Foundation (20%) + Feedback Loops (5%)
  • documentation-assessor → Documentation & Context (15%)
  • code-clarity-assessor → Code Clarity (15%) + Consistency & Conventions (10%)
  • architecture-assessor → Type Safety (15%) + Architecture Clarity (15%) + Change Safety (5%)

Phases 3-4 — Weighted score (0-100) is calculated, a band rating is assigned, and the full report is assembled with an improvement roadmap

Phase 5 — Offers to save report as AGENT_READY_ASSESSMENT.md
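As a rough illustration of the Phase 3 weighted combination, a sketch using the weights from the dimension list above (the function name and per-dimension inputs are assumptions, not the plugin's actual code):

```python
# Hypothetical sketch of the Phase 3 weighted-score calculation.
# Weights come from the PR description; scores below are illustrative.
WEIGHTS = {
    "Test Foundation": 0.20,
    "Documentation & Context": 0.15,
    "Code Clarity": 0.15,
    "Type Safety": 0.15,
    "Architecture Clarity": 0.15,
    "Consistency & Conventions": 0.10,
    "Feedback Loops": 0.05,
    "Change Safety": 0.05,
}

def weighted_score(dimension_scores: dict[str, float]) -> float:
    """Combine 0-100 per-dimension scores into a single 0-100 score."""
    return sum(dimension_scores[d] * w for d, w in WEIGHTS.items())

scores = {d: 80 for d in WEIGHTS}            # every dimension scored 80
print(round(weighted_score(scores), 1))      # -> 80.0
```

Because the weights sum to 1.0, a uniform set of dimension scores reproduces itself as the overall score.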

Score Bands

| Score  | Band             | Meaning                         |
|--------|------------------|---------------------------------|
| 85-100 | Agent-Ready      | Autonomous agent work           |
| 70-84  | Agent-Assisted   | Agents with human oversight     |
| 50-69  | Agent-Supervised | Heavy review needed             |
| 30-49  | Agent-Caution    | Foundational work first         |
| 0-29   | Not Agent-Ready  | Significant investment required |
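The band lookup implied by the table is straightforward; a hypothetical sketch:

```python
# Hypothetical band lookup for the Score Bands table above.
def band(score: float) -> str:
    if score >= 85:
        return "Agent-Ready"
    if score >= 70:
        return "Agent-Assisted"
    if score >= 50:
        return "Agent-Supervised"
    if score >= 30:
        return "Agent-Caution"
    return "Not Agent-Ready"

print(band(72))  # -> Agent-Assisted
```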

Test Plan

  • cd to any existing project and invoke: "Run a codebase readiness assessment"
  • Verify Phase 1 reconnaissance runs and produces a readable Codebase Snapshot
  • Verify 4 agents launch in parallel and return scored dimension reports
  • Verify final report includes weighted score table, band rating, and improvement roadmap
  • Verify save-to-file prompt produces valid AGENT_READY_ASSESSMENT.md
  • Verify plugin appears in /plugin marketplace list

Watchman-based file watcher that keeps QMD indexes current automatically.
Splits update (on file change, debounced) from embed (scheduled every 30
min) to avoid concurrent local model runs spiking CPU.

- vault-sync.sh: qmd update via watchman trigger with lock file
- vault-embed.sh: qmd embed with pending check and lock file
- install.sh: platform-aware (Linux systemd timer, macOS launchd plist)
- setup-trigger.sh: idempotent watchman trigger registration
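The lock-file pattern described for vault-sync.sh and vault-embed.sh can be sketched in Python (the function name, lock path, and `qmd` invocation are illustrative assumptions; the actual scripts are shell):

```python
# Hypothetical sketch of the lock-file guard that keeps `qmd update`
# and `qmd embed` from running concurrently and spiking CPU.
import fcntl
import subprocess

def run_exclusive(lock_path: str, cmd: list[str]) -> bool:
    """Run cmd only if the lock can be acquired; return True if it ran."""
    with open(lock_path, "w") as lock:
        try:
            # Non-blocking exclusive lock: fail fast if another run holds it.
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # another run is in progress; skip this cycle
        subprocess.run(cmd, check=True)
        return True

# e.g. run_exclusive("/tmp/vault-embed.lock", ["qmd", "embed"])
```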

Adds a new plugin that scores codebases across 8 dimensions for AI
agent-readiness, framed around the Stripe benchmark of 1k+ AI-generated
PRs per week. Uses a multi-agent parallel assessment pattern — an
orchestrator skill gathers metadata then launches 4 specialized agents
concurrently, consolidating results into a weighted 0-100 score with
band rating and improvement roadmap.

Dimensions assessed: Test Foundation (20%), Documentation & Context
(15%), Code Clarity (15%), Type Safety (15%), Architecture Clarity
(15%), Consistency & Conventions (10%), Feedback Loops (5%), Change
Safety (5%).

Ruby and Python codebases were unfairly penalized for lacking a static
type system. In dynamic languages, comprehensive tests + contract systems
(dry-rb, Pydantic, Result patterns) serve the same role that type
checkers serve in TypeScript/Go.

Changes:
- architecture-assessor: language-aware Type Safety rubric with separate
  bands for JavaScript, TypeScript, Python, and Ruby.
  - Ruby scored on dry-rb adoption, ActiveRecord validations, service
    object interfaces, and Result pattern — not Sorbet presence.
  - Plain JavaScript projects are flagged as TypeScript migration
    candidates in recommendations (highest-ROI type safety investment).
- SKILL.md: adaptive weighting by language tier. For dynamic languages,
  Test Foundation increases from 20% to 25% and Type Safety drops from
  15% to 10%, so the combined safety signal (25% + 10% = 35%) matches the
  static-language total (20% + 15% = 35%).
- SKILL.md: Phase 1 classifies LANGUAGE_TIER and passes it to all agents.
- SKILL.md: Phase 4 report includes language context block explaining
  Ruby/Python score interpretation.
- architecture-assessor: Change Safety rubric notes that Rails co-change
  patterns (model+migration, controller+view) are expected, not bad coupling.
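A minimal sketch of the tier-based weight adjustment, assuming a LANGUAGE_TIER value of "dynamic" or "static" (names and structure are assumptions):

```python
# Hypothetical sketch of the language-tier weight adjustment.
# Base weights come from the PR summary; tier names are assumptions.
BASE = {"Test Foundation": 0.20, "Type Safety": 0.15}

def tier_weights(language_tier: str) -> dict[str, float]:
    w = dict(BASE)
    if language_tier == "dynamic":   # e.g. Ruby, Python
        w["Test Foundation"] = 0.25  # tests carry more of the safety signal
        w["Type Safety"] = 0.10      # don't penalize missing Sorbet/mypy
    return w

print(tier_weights("dynamic"))
```

Either way, the two dimensions together contribute 35% of the total score.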

Incorporates research from Jason Wei's Verifier's Law and Keles's
verifiability-as-limit thesis into the plugin's rubrics, scoring,
and report structure.

Flaky test detection (test-coverage-assessor):
- Detects git commits mentioning flakiness, skipped/disabled tests,
  and test retry plugins (rspec-retry, pytest-rerunfailures) as
  indicators of a noisy oracle
- Detects property-based testing presence (Hypothesis, fast-check, etc.)
- Detects mock/stub density as oracle quality signal

Oracle quality modifiers (Test Foundation):
- +5 for property-based testing
- −10 for test retry plugins (masking flakiness)
- −5 for >5% disabled tests
- −5 for high mock density (implementation testing)
- −10 for unit-only suite with no integration layer
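Applying these modifiers to a base Test Foundation score might look like the following sketch (the signal keys are made up for illustration):

```python
# Hypothetical application of the oracle-quality modifiers listed above.
MODIFIERS = {
    "property_based_testing": +5,
    "retry_plugins": -10,           # masking flakiness
    "disabled_tests_over_5pct": -5,
    "high_mock_density": -5,        # implementation testing
    "unit_only_suite": -10,         # no integration layer
}

def apply_modifiers(base: float, signals: set[str]) -> float:
    adjusted = base + sum(MODIFIERS[s] for s in signals)
    return max(0.0, min(100.0, adjusted))  # clamp to the 0-100 scale

print(apply_modifiers(70, {"property_based_testing", "retry_plugins"}))  # -> 65.0
```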

Non-functional verification signals (Feedback Loops):
- Detects security scanning in CI (Brakeman, CodeQL, Bandit, bundle-audit)
- Detects Dependabot / vulnerability scanning configuration
- Detects coverage delta reporting (Codecov, Coveralls)
- +5/+3/+2 bonuses for each

Feedback Loops weight bumped from 5% → 10% (both language tiers).
Consistency & Conventions reduced from 10% → 5% to compensate.
Verification speed now framed as a structural prerequisite, not
a convenience — 45-min CI = ~10 agent iterations/day.
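The iterations/day figure checks out under an assumed 8-hour working day (the workday length is an assumption, not stated in the PR):

```python
# Rough arithmetic behind "45-min CI ≈ ~10 agent iterations/day".
ci_minutes = 45
workday_minutes = 8 * 60   # assumed 8-hour working day
print(workday_minutes // ci_minutes)  # -> 10
```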

Verification Cost Profile added to Phase 4 report: a ✓/✗ table
answering "how expensive is it to verify an agent's change?" covering
pipeline speed, security scanning, property-based tests, reproducible
dev state, and coverage reporting.

Stripe Benchmark section rewritten to explain the verification
asymmetry mechanism, not just cite the number.

"What Agent-Ready Means" section rewritten using verification framing —
agents don't eliminate verification, they relocate it to automated systems.

Phase 6 added: recommends btar for CI enforcement after strategic
assessment, with install snippet and context generation command.