feat: add /speckit-red-team — adversarial spec review before /speckit-plan#2303
feat: add /speckit-red-team — adversarial spec review before /speckit-plan#2303ashbrener wants to merge 3 commits intogithub:mainfrom
Conversation
…-plan Adds a new core command that attacks a functional spec with 3-5 parallel adversarial lens agents before `/speckit-plan` locks in architecture. The protocol is complementary to the existing `/speckit-clarify` (correctness) and `/speckit-analyze` (consistency) commands — it adds an adversarial layer that structurally catches issues those tools cannot. Design: - Project supplies a lens catalog at `.specify/red-team-lenses.yml` declaring available adversarial lenses (each with core_questions, trigger_match categories, severity_weight, finding_bound). - Six default trigger categories: money_path, regulatory_path, ai_llm, immutability_audit, multi_party, contracts. A spec matching ≥1 category qualifies for red team. - Lenses that match any of the spec's triggers run in parallel via the host agent's sub-agent primitive (e.g., Claude Code's Agent tool), each producing top-N findings ranked by severity. - When >5 lenses match, propose-and-confirm UX (ranked by overlap count primary + severity_weight tie-break); --yes accepts the proposed default; non-interactive without --yes fails fast. - Findings aggregate into a structured report at `specs/<feature-id>/red-team-findings-YYYY-MM-DD[-NN].md`. - Four resolution categories for each finding: spec-fix / new-OQ / accepted-risk / out-of-scope. - Hard-and-fast rule: resolution edits MUST land in forward-facing canonical locations (04_Functional_Specs, 03_Product_Requirements, 02_System_Architecture, etc.). Historical SpecKit working records in specs/<feature-id>/ (spec.md, plan.md, tasks.md, research.md, data-model.md, contracts/, quickstart.md, checklists/) MUST NOT be rewritten — they are immutable audit records of what was decided at their point in time. This is a PROPOSAL / discussion-opener per the CONTRIBUTING note on larger changes. Happy to iterate on: - Extension vs core-command placement (this PR assumes core-command; could instead ship as an Extension under extensions/). - Whether to also register in a handler (currently relies on templates/commands/ auto-discovery via base.py). - Schema references — the SKILL expects lens-catalog-schema.md and findings-report-schema.md to ship alongside; I can add those as sibling files in a follow-up commit if the core approach lands. - Test coverage — would add to tests/ in a follow-up once the shape is agreed. Dogfood validation: the protocol was dogfooded against a real functional spec in a private project (25 findings surfaced in ~1.5 min wall-clock, 19 of which met the 'meaningful finding' bar of severity ≥ HIGH AND represent an adversarial scenario clarify/analyze cannot structurally catch). One finding — a cross-spec UUID/ULID drift across two halves of the same interface contract — was a gap that a ULID alignment commit landed 1 hour earlier had missed. That meta-catch is the strongest real-world demonstration of the protocol's unique value over existing tooling.
There was a problem hiding this comment.
Pull request overview
Adds a new core command template, /speckit-red-team, intended to run an adversarial (“red team”) review of a functional spec before moving into planning, producing a findings report and a maintainer-driven resolution workflow.
Changes:
- Introduces
templates/commands/red-team.mddefining the red-team workflow (invocation parsing, trigger matching, lens selection, parallel dispatch, aggregation, and resolution flow). - Adds extension hook handling for a new
hooks.before_red_teamevent and a handoff into planning.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Resolves 5 inline comments from the bot review on github#2303: 1. Removed the `scripts:` frontmatter block since the body never references `{SCRIPT}` — spec-kit's template processor would strip an unused block and the reference was misleading. Trigger matching uses direct constitution/catalog reads, not the prerequisites script's FEATURE_DIR output. 2. Rewrote the "schema references ship alongside" paragraph — the PR doesn't ship schema sidecar files; now explicitly flagged as TODO for a follow-up if the core approach lands. The minimal lens-catalog shape is inline in §2 preconditions (see github#5). 3. Converted all slash-command references to dot-notation (/speckit.clarify, /speckit.analyze, /speckit.plan, /speckit.red-team) to match the rest of templates/commands/. 4. Dropped the undocumented `--accept-defaults` alias — usage line and flag prose are now consistent on `--yes` alone. 5. Replaced the "See skill README §Adoption" error message with an inline minimal-required-YAML-shape block so users get actionable guidance from the error itself, without depending on a README that this template doesn't ship. No behavioural changes to the protocol; all fixes are template-hygiene and contract-clarity.
There was a problem hiding this comment.
Pull request overview
Adds a new core command template, red-team, to run an adversarial (“red team”) review of a functional spec prior to planning, producing a structured findings report and guiding maintainers through resolution without rewriting historical SpecKit working records.
Changes:
- Introduces
templates/commands/red-team.mdwith an end-to-end red-team workflow (trigger matching, lens selection, parallel dispatch, aggregation, and resolution flow). - Defines a project lens-catalog expectation at
.specify/red-team-lenses.ymlplus a report format written underspecs/<feature-id>/. - Adds a strict “immutable historical working records” rule to prevent edits to
specs/<feature-id>/*records during resolution.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Resolves inline comments from review round 2:
1. Added explicit fenced Usage block under ## Outline; §1 now
references it by name ("print the fenced Usage block from §Outline")
instead of the ambiguous "print the usage block above" instruction.
2. Simplified the >5-lens flow: removed the "non-interactive detection"
branch entirely. Interactivity is now determined solely by whether
--yes was passed — CI / batch runs MUST pass --yes; otherwise the
run prompts. Added a clarifying parenthetical and updated §8 row.
3. Aligned dry-run session-ID example with the canonical format.
Both now use RT-<feature-id>-<YYYY-MM-DD>[-<NN>] with the -<NN>
as optional.
4. Converted UK spelling to US to match the rest of templates/commands/:
"artefact" -> "artifact", "behaviour" -> "behavior".
5. Picked a single threshold for overwhelming-findings: >=25
HIGH+CRITICAL combined. Dropped the inconsistent ">50 HIGH"
example; the row now states one actionable heuristic.
No behavior changes to the protocol; all fixes are template-hygiene
and contract-clarity.
|
Nice! Please deliver this as an extensions as per https://github.com/github/spec-kit/tree/main/extensions and host the extensions in your own GitHub repository so we can add it to the community catalog |
|
Thanks for the guidance, @mnriem — restructured as a community extension as requested:
The extension command is Closing this PR in favour of #2306. Thanks for the fast turnaround on the review. |
* feat(catalog): add red-team extension Adds the `red-team` community extension to the catalog: - Adversarial review of functional specs before /speckit.plan locks in architecture. - Complements /speckit.clarify (correctness) and /speckit.analyze (consistency) with parallel adversarial lens agents. - One command: speckit.red-team.run - MIT licensed; requires spec-kit >= 0.7.0. Origin: this extension was originally proposed as a core command (#2303). Per maintainer guidance (mnriem's comment on that PR), it's been restructured as a community extension hosted at https://github.com/ashbrener/spec-kit-red-team. Dogfood-validated on a 500-line functional spec: 5 lens agents dispatched in parallel returned 25 findings in ~1.5 min wall-clock, 19 of which met the meaningful-finding bar (severity >= HIGH AND novel adversarial angle that clarify/analyze structurally cannot catch). Full detail in the extension's CHANGELOG. * catalog: shorten red-team description to fit <200 char schema limit Resolves Copilot review comment on #2306. Previous description (259 chars) exceeded the extensions/EXTENSION-PUBLISHING-GUIDE.md Appendix schema ceiling. Shortened to 188 chars, keeping the distinctive value proposition (adversarial, complements clarify/analyze) and moving the per-phase mechanics to the extension's own README. * catalog: bump red-team to v1.0.1 (lower required spec-kit version) Follow-up to v1.0.0 catalog entry: - version: 1.0.0 -> 1.0.1 - download_url: points at v1.0.1 release asset - requires.speckit_version: >=0.7.0 -> >=0.1.0 The v1.0.0 requirement was too strict and blocked installation on common 0.6.x field versions (confirmed via local install attempt). The extension uses no 0.7.x-specific APIs; matches community norm (reconcile, refine, others use >=0.1.0). * catalog: bump red-team to v1.0.2 (adds mandatory before_plan gate) v1.0.2 ships a /speckit.red-team.gate command wired as a mandatory before_plan hook so /speckit.plan auto-invokes it on every run against qualifying specs. Non-qualifying specs return PROCEED silently; qualifying specs without findings on record return HALT with explicit remediation (run /speckit.red-team.run, or opt out via --skip-red-team-gate: <reason> which is recorded as an Accepted Risk [red-team-skipped] in the plan). Catalog metadata delta: - version: 1.0.1 -> 1.0.2 - download_url: v1.0.2/red-team-v1.0.2.zip - provides.commands: 1 -> 2 (adds speckit.red-team.gate) - provides.hooks: 0 -> 1 (adds before_plan hook) No breaking changes. Projects that do not want the gate simply do not install the extension. --------- Co-authored-by: Ash Brener <ashley@midletearth.com>
Summary
Adds a new core command,
/speckit-red-team, that attacks a functional spec with 3–5 parallel adversarial lens agents before/speckit-planlocks in architecture.It's complementary to the existing workflow:
/speckit-clarify/speckit-analyze/speckit-red-team(new)Why
Clarify and analyze are structurally incapable of surfacing certain classes of issue — prompt injection in untrusted LLM inputs, self-approval segregation-of-duties gaps, race conditions at configuration-change boundaries, cross-spec drift between cooperating halves of an interface contract, missing audit-chain integrity on "immutable" records. These failure modes look internally consistent and individually correct; only an adversarial lens surfaces them.
Without a red team step, these defects tend to land as production incidents instead of spec-phase fixes.
Design
.specify/red-team-lenses.yml— each lens has a description, 3–5core_questions(the attack brief),trigger_match(which categories it applies to),severity_weight(tie-breaker in selection), andfinding_bound(top-N emitted).money_path,regulatory_path,ai_llm,immutability_audit,multi_party,contracts. A spec matching ≥1 category qualifies.--yesauto-accepts; non-interactive without--yesfails fast.specs/<feature-id>/red-team-findings-YYYY-MM-DD[-NN].md: session metadata header + findings table + resolutions log + (for dogfood sessions) validation decision + machine-readable session YAML block.spec-fix/new-OQ/accepted-risk/out-of-scope. Skill does NOT auto-apply spec changes — every resolution requires maintainer authorisation.04_Functional_Specs/,03_Product_Requirements/,02_System_Architecture/,.specify/memory/constitution.md,.specify/templates/). Historical SpecKit working records inspecs/<feature-id>/(spec.md, plan.md, tasks.md, research.md, data-model.md, contracts/, quickstart.md, checklists/) MUST NOT be rewritten during resolution — they are immutable point-in-time audit records.Discussion-opener — note on CONTRIBUTING guidance
Per CONTRIBUTING.md, large changes that materially impact the CLI or workflow should be discussed and agreed upon before opening a PR. This PR is opened as a proposal / discussion starter, not a merge-ready change. Happy to iterate on:
templates/commands/. The alternative is shipping it as an Extension underextensions/(since it's "domain-specific" in the sense of being adversarial-review rather than core spec-kit workflow). I'm open to either. Core-command placement is what I'd prefer because it parallelsclarifyandanalyzedirectly.base.py(every.mdintemplates/commands/becomes a registered command). No code changes needed. If a central registry exists I missed, I'll add the entry.tests/once shape is agreed.clarify.mdandanalyze.md. Happy to tighten.Dogfood validation
The protocol was dogfooded against a real functional spec in a private project:
Test plan
specify initprojecttests/once shape is agreed🤖 Generated with Claude Code