feat: add /speckit-red-team — adversarial spec review before /speckit-plan by ashbrener · Pull Request #2303 · github/spec-kit

ashbrener · 2026-04-21T21:25:23Z

Summary

Adds a new core command, /speckit-red-team, that attacks a functional spec with 3–5 parallel adversarial lens agents before /speckit-plan locks in architecture.

It's complementary to the existing workflow:

Command	Role
`/speckit-clarify`	Correctness (is the spec internally correct?)
`/speckit-analyze`	Consistency (do spec + plan + tasks agree?)
`/speckit-red-team` (new)	Adversarial (what can a hostile actor, auditor, or silent-failure mode do against this spec?)

Why

Clarify and analyze are structurally incapable of surfacing certain classes of issue — prompt injection in untrusted LLM inputs, self-approval segregation-of-duties gaps, race conditions at configuration-change boundaries, cross-spec drift between cooperating halves of an interface contract, missing audit-chain integrity on "immutable" records. These failure modes look internally consistent and individually correct; only an adversarial lens surfaces them.

Without a red team step, these defects tend to land as production incidents instead of spec-phase fixes.

Design

Project-specific lens catalog at .specify/red-team-lenses.yml — each lens has a description, 3–5 core_questions (the attack brief), trigger_match (which categories it applies to), severity_weight (tie-breaker in selection), and finding_bound (top-N emitted).
Six default trigger categories (OR-combined): money_path, regulatory_path, ai_llm, immutability_audit, multi_party, contracts. A spec matching ≥1 category qualifies.
Lenses that match any trigger run in parallel via the host agent's sub-agent primitive (e.g., Claude Code's Agent tool), each producing top-N findings ranked by severity (CRITICAL > HIGH > MEDIUM > LOW).
Propose-and-confirm UX when >5 lenses match (ranked by overlap count primary + severity_weight tie-break). --yes auto-accepts; non-interactive without --yes fails fast.
Structured findings report at specs/<feature-id>/red-team-findings-YYYY-MM-DD[-NN].md: session metadata header + findings table + resolutions log + (for dogfood sessions) validation decision + machine-readable session YAML block.
Four resolution categories per finding: spec-fix / new-OQ / accepted-risk / out-of-scope. Skill does NOT auto-apply spec changes — every resolution requires maintainer authorisation.
Hard-and-fast rule: resolution edits MUST land in forward-facing canonical locations (04_Functional_Specs/, 03_Product_Requirements/, 02_System_Architecture/, .specify/memory/constitution.md, .specify/templates/). Historical SpecKit working records in specs/<feature-id>/ (spec.md, plan.md, tasks.md, research.md, data-model.md, contracts/, quickstart.md, checklists/) MUST NOT be rewritten during resolution — they are immutable point-in-time audit records.

Discussion-opener — note on CONTRIBUTING guidance

Per CONTRIBUTING.md, large changes that materially impact the CLI or workflow should be discussed and agreed upon before opening a PR. This PR is opened as a proposal / discussion starter, not a merge-ready change. Happy to iterate on:

Extension vs core-command placement. This PR adds a core command under templates/commands/. The alternative is shipping it as an Extension under extensions/ (since it's "domain-specific" in the sense of being adversarial-review rather than core spec-kit workflow). I'm open to either. Core-command placement is what I'd prefer because it parallels clarify and analyze directly.
Handler registration. The PR relies on auto-discovery in base.py (every .md in templates/commands/ becomes a registered command). No code changes needed. If a central registry exists I missed, I'll add the entry.
Schema references. The command body references a lens-catalog schema and a findings-report schema. I haven't committed those to this PR to keep the blast radius small; happy to add them as sibling files in a follow-up if the core approach lands.
Tests. Would add to tests/ once shape is agreed.
Prose style. My draft is more verbose than the existing clarify.md and analyze.md. Happy to tighten.

Dogfood validation

The protocol was dogfooded against a real functional spec in a private project:

5 adversary agents dispatched in parallel (Regulatory, AI/LLM, Immutability, Configuration-Drift, Documentation-Drift lenses).
25 findings surfaced in ~1.5 min wall-clock (vs 30-min target).
19 of 25 met the "meaningful finding" bar (severity ≥ HIGH + represents an adversarial scenario clarify/analyze structurally cannot catch).
Standout result: one Documentation-Drift finding caught a cross-spec identifier-type drift between two halves of the same interface contract that had been introduced by a separate commit landed 1 hour earlier. Clarify and analyze are single-spec scoped and structurally could not surface this. That meta-catch is the strongest real-world signal that the red team adds value over existing tooling.

Test plan

Decide: core command vs Extension placement
Verify auto-discovery installs the command into a fresh specify init project
Add schema references (lens catalog YAML, findings report markdown) as sibling skill files if desired
Add test coverage under tests/ once shape is agreed
Manual end-to-end run against a sample spec in a fresh project

🤖 Generated with Claude Code

…-plan Adds a new core command that attacks a functional spec with 3-5 parallel adversarial lens agents before `/speckit-plan` locks in architecture. The protocol is complementary to the existing `/speckit-clarify` (correctness) and `/speckit-analyze` (consistency) commands — it adds an adversarial layer that structurally catches issues those tools cannot. Design: - Project supplies a lens catalog at `.specify/red-team-lenses.yml` declaring available adversarial lenses (each with core_questions, trigger_match categories, severity_weight, finding_bound). - Six default trigger categories: money_path, regulatory_path, ai_llm, immutability_audit, multi_party, contracts. A spec matching ≥1 category qualifies for red team. - Lenses that match any of the spec's triggers run in parallel via the host agent's sub-agent primitive (e.g., Claude Code's Agent tool), each producing top-N findings ranked by severity. - When >5 lenses match, propose-and-confirm UX (ranked by overlap count primary + severity_weight tie-break); --yes accepts the proposed default; non-interactive without --yes fails fast. - Findings aggregate into a structured report at `specs/<feature-id>/red-team-findings-YYYY-MM-DD[-NN].md`. - Four resolution categories for each finding: spec-fix / new-OQ / accepted-risk / out-of-scope. - Hard-and-fast rule: resolution edits MUST land in forward-facing canonical locations (04_Functional_Specs, 03_Product_Requirements, 02_System_Architecture, etc.). Historical SpecKit working records in specs/<feature-id>/ (spec.md, plan.md, tasks.md, research.md, data-model.md, contracts/, quickstart.md, checklists/) MUST NOT be rewritten — they are immutable audit records of what was decided at their point in time. This is a PROPOSAL / discussion-opener per the CONTRIBUTING note on larger changes. Happy to iterate on: - Extension vs core-command placement (this PR assumes core-command; could instead ship as an Extension under extensions/). - Whether to also register in a handler (currently relies on templates/commands/ auto-discovery via base.py). - Schema references — the SKILL expects lens-catalog-schema.md and findings-report-schema.md to ship alongside; I can add those as sibling files in a follow-up commit if the core approach lands. - Test coverage — would add to tests/ in a follow-up once the shape is agreed. Dogfood validation: the protocol was dogfooded against a real functional spec in a private project (25 findings surfaced in ~1.5 min wall-clock, 19 of which met the 'meaningful finding' bar of severity ≥ HIGH AND represent an adversarial scenario clarify/analyze cannot structurally catch). One finding — a cross-spec UUID/ULID drift across two halves of the same interface contract — was a gap that a ULID alignment commit landed 1 hour earlier had missed. That meta-catch is the strongest real-world demonstration of the protocol's unique value over existing tooling.

Copilot

Pull request overview

Adds a new core command template, /speckit-red-team, intended to run an adversarial (“red team”) review of a functional spec before moving into planning, producing a findings report and a maintainer-driven resolution workflow.

Changes:

Introduces templates/commands/red-team.md defining the red-team workflow (invocation parsing, trigger matching, lens selection, parallel dispatch, aggregation, and resolution flow).
Adds extension hook handling for a new hooks.before_red_team event and a handoff into planning.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Resolves 5 inline comments from the bot review on github#2303: 1. Removed the `scripts:` frontmatter block since the body never references `{SCRIPT}` — spec-kit's template processor would strip an unused block and the reference was misleading. Trigger matching uses direct constitution/catalog reads, not the prerequisites script's FEATURE_DIR output. 2. Rewrote the "schema references ship alongside" paragraph — the PR doesn't ship schema sidecar files; now explicitly flagged as TODO for a follow-up if the core approach lands. The minimal lens-catalog shape is inline in §2 preconditions (see github#5). 3. Converted all slash-command references to dot-notation (/speckit.clarify, /speckit.analyze, /speckit.plan, /speckit.red-team) to match the rest of templates/commands/. 4. Dropped the undocumented `--accept-defaults` alias — usage line and flag prose are now consistent on `--yes` alone. 5. Replaced the "See skill README §Adoption" error message with an inline minimal-required-YAML-shape block so users get actionable guidance from the error itself, without depending on a README that this template doesn't ship. No behavioural changes to the protocol; all fixes are template-hygiene and contract-clarity.

Copilot

Pull request overview

Adds a new core command template, red-team, to run an adversarial (“red team”) review of a functional spec prior to planning, producing a structured findings report and guiding maintainers through resolution without rewriting historical SpecKit working records.

Changes:

Introduces templates/commands/red-team.md with an end-to-end red-team workflow (trigger matching, lens selection, parallel dispatch, aggregation, and resolution flow).
Defines a project lens-catalog expectation at .specify/red-team-lenses.yml plus a report format written under specs/<feature-id>/.
Adds a strict “immutable historical working records” rule to prevent edits to specs/<feature-id>/* records during resolution.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Resolves inline comments from review round 2: 1. Added explicit fenced Usage block under ## Outline; §1 now references it by name ("print the fenced Usage block from §Outline") instead of the ambiguous "print the usage block above" instruction. 2. Simplified the >5-lens flow: removed the "non-interactive detection" branch entirely. Interactivity is now determined solely by whether --yes was passed — CI / batch runs MUST pass --yes; otherwise the run prompts. Added a clarifying parenthetical and updated §8 row. 3. Aligned dry-run session-ID example with the canonical format. Both now use RT-<feature-id>-<YYYY-MM-DD>[-<NN>] with the -<NN> as optional. 4. Converted UK spelling to US to match the rest of templates/commands/: "artefact" -> "artifact", "behaviour" -> "behavior". 5. Picked a single threshold for overwhelming-findings: >=25 HIGH+CRITICAL combined. Dropped the inconsistent ">50 HIGH" example; the row now states one actionable heuristic. No behavior changes to the protocol; all fixes are template-hygiene and contract-clarity.

mnriem · 2026-04-21T22:54:05Z

Nice! Please deliver this as an extensions as per https://github.com/github/spec-kit/tree/main/extensions and host the extensions in your own GitHub repository so we can add it to the community catalog

ashbrener · 2026-04-22T09:36:20Z

Thanks for the guidance, @mnriem — restructured as a community extension as requested:

Extension repo: https://github.com/ashbrener/spec-kit-red-team
v1.0.0 release: https://github.com/ashbrener/spec-kit-red-team/releases/tag/v1.0.0
Catalog PR (supersedes this one): feat(catalog): add red-team extension to community catalog #2306

The extension command is speckit.red-team.run (previously proposed here as core speckit.red-team). The command body incorporates both rounds of Copilot feedback from this PR (dot-notation alignment, CLI contract hygiene, inline error-message shapes, US spelling, threshold consistency, simplified interactivity model).

Closing this PR in favour of #2306. Thanks for the fast turnaround on the review.

* feat(catalog): add red-team extension Adds the `red-team` community extension to the catalog: - Adversarial review of functional specs before /speckit.plan locks in architecture. - Complements /speckit.clarify (correctness) and /speckit.analyze (consistency) with parallel adversarial lens agents. - One command: speckit.red-team.run - MIT licensed; requires spec-kit >= 0.7.0. Origin: this extension was originally proposed as a core command (#2303). Per maintainer guidance (mnriem's comment on that PR), it's been restructured as a community extension hosted at https://github.com/ashbrener/spec-kit-red-team. Dogfood-validated on a 500-line functional spec: 5 lens agents dispatched in parallel returned 25 findings in ~1.5 min wall-clock, 19 of which met the meaningful-finding bar (severity >= HIGH AND novel adversarial angle that clarify/analyze structurally cannot catch). Full detail in the extension's CHANGELOG. * catalog: shorten red-team description to fit <200 char schema limit Resolves Copilot review comment on #2306. Previous description (259 chars) exceeded the extensions/EXTENSION-PUBLISHING-GUIDE.md Appendix schema ceiling. Shortened to 188 chars, keeping the distinctive value proposition (adversarial, complements clarify/analyze) and moving the per-phase mechanics to the extension's own README. * catalog: bump red-team to v1.0.1 (lower required spec-kit version) Follow-up to v1.0.0 catalog entry: - version: 1.0.0 -> 1.0.1 - download_url: points at v1.0.1 release asset - requires.speckit_version: >=0.7.0 -> >=0.1.0 The v1.0.0 requirement was too strict and blocked installation on common 0.6.x field versions (confirmed via local install attempt). The extension uses no 0.7.x-specific APIs; matches community norm (reconcile, refine, others use >=0.1.0). * catalog: bump red-team to v1.0.2 (adds mandatory before_plan gate) v1.0.2 ships a /speckit.red-team.gate command wired as a mandatory before_plan hook so /speckit.plan auto-invokes it on every run against qualifying specs. Non-qualifying specs return PROCEED silently; qualifying specs without findings on record return HALT with explicit remediation (run /speckit.red-team.run, or opt out via --skip-red-team-gate: <reason> which is recorded as an Accepted Risk [red-team-skipped] in the plan). Catalog metadata delta: - version: 1.0.1 -> 1.0.2 - download_url: v1.0.2/red-team-v1.0.2.zip - provides.commands: 1 -> 2 (adds speckit.red-team.gate) - provides.hooks: 0 -> 1 (adds before_plan hook) No breaking changes. Projects that do not want the gate simply do not install the extension. --------- Co-authored-by: Ash Brener <ashley@midletearth.com>

ashbrener requested a review from mnriem as a code owner April 21, 2026 21:25

Copilot AI review requested due to automatic review settings April 21, 2026 21:25

Copilot started reviewing on behalf of ashbrener April 21, 2026 21:25 View session

Copilot AI reviewed Apr 21, 2026

View reviewed changes

Comment thread templates/commands/red-team.md Outdated

Comment thread templates/commands/red-team.md Outdated

Comment thread templates/commands/red-team.md Outdated

Comment thread templates/commands/red-team.md Outdated

Comment thread templates/commands/red-team.md Outdated

ashbrener requested a review from Copilot April 21, 2026 21:56

Copilot started reviewing on behalf of ashbrener April 21, 2026 21:56 View session

Copilot AI reviewed Apr 21, 2026

View reviewed changes

Comment thread templates/commands/red-team.md Outdated

Comment thread templates/commands/red-team.md Outdated

Comment thread templates/commands/red-team.md Outdated

Comment thread templates/commands/red-team.md Outdated

Comment thread templates/commands/red-team.md Outdated

ashbrener mentioned this pull request Apr 22, 2026

feat(catalog): add red-team extension to community catalog #2306

Merged

5 tasks

ashbrener closed this Apr 22, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add /speckit-red-team — adversarial spec review before /speckit-plan#2303

feat: add /speckit-red-team — adversarial spec review before /speckit-plan#2303
ashbrener wants to merge 3 commits intogithub:mainfrom
ashbrener:add-red-team-command

ashbrener commented Apr 21, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mnriem commented Apr 21, 2026

Uh oh!

ashbrener commented Apr 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

ashbrener commented Apr 21, 2026

Summary

Why

Design

Discussion-opener — note on CONTRIBUTING guidance

Dogfood validation

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mnriem commented Apr 21, 2026

Uh oh!

ashbrener commented Apr 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants