From cd20685ecde428b162e5529939028f885c525429 Mon Sep 17 00:00:00 2001 From: Ash Brener Date: Tue, 21 Apr 2026 23:24:41 +0200 Subject: [PATCH 1/3] =?UTF-8?q?feat:=20add=20/speckit-red-team=20=E2=80=94?= =?UTF-8?q?=20adversarial=20spec=20review=20before=20/speckit-plan?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds a new core command that attacks a functional spec with 3-5 parallel adversarial lens agents before `/speckit-plan` locks in architecture. The protocol is complementary to the existing `/speckit-clarify` (correctness) and `/speckit-analyze` (consistency) commands — it adds an adversarial layer that structurally catches issues those tools cannot. Design: - Project supplies a lens catalog at `.specify/red-team-lenses.yml` declaring available adversarial lenses (each with core_questions, trigger_match categories, severity_weight, finding_bound). - Six default trigger categories: money_path, regulatory_path, ai_llm, immutability_audit, multi_party, contracts. A spec matching ≥1 category qualifies for red team. - Lenses that match any of the spec's triggers run in parallel via the host agent's sub-agent primitive (e.g., Claude Code's Agent tool), each producing top-N findings ranked by severity. - When >5 lenses match, propose-and-confirm UX (ranked by overlap count primary + severity_weight tie-break); --yes accepts the proposed default; non-interactive without --yes fails fast. - Findings aggregate into a structured report at `specs//red-team-findings-YYYY-MM-DD[-NN].md`. - Four resolution categories for each finding: spec-fix / new-OQ / accepted-risk / out-of-scope. - Hard-and-fast rule: resolution edits MUST land in forward-facing canonical locations (04_Functional_Specs, 03_Product_Requirements, 02_System_Architecture, etc.). Historical SpecKit working records in specs// (spec.md, plan.md, tasks.md, research.md, data-model.md, contracts/, quickstart.md, checklists/) MUST NOT be rewritten — they are immutable audit records of what was decided at their point in time. This is a PROPOSAL / discussion-opener per the CONTRIBUTING note on larger changes. Happy to iterate on: - Extension vs core-command placement (this PR assumes core-command; could instead ship as an Extension under extensions/). - Whether to also register in a handler (currently relies on templates/commands/ auto-discovery via base.py). - Schema references — the SKILL expects lens-catalog-schema.md and findings-report-schema.md to ship alongside; I can add those as sibling files in a follow-up commit if the core approach lands. - Test coverage — would add to tests/ in a follow-up once the shape is agreed. Dogfood validation: the protocol was dogfooded against a real functional spec in a private project (25 findings surfaced in ~1.5 min wall-clock, 19 of which met the 'meaningful finding' bar of severity ≥ HIGH AND represent an adversarial scenario clarify/analyze cannot structurally catch). One finding — a cross-spec UUID/ULID drift across two halves of the same interface contract — was a gap that a ULID alignment commit landed 1 hour earlier had missed. That meta-catch is the strongest real-world demonstration of the protocol's unique value over existing tooling. --- templates/commands/red-team.md | 255 +++++++++++++++++++++++++++++++++ 1 file changed, 255 insertions(+) create mode 100644 templates/commands/red-team.md diff --git a/templates/commands/red-team.md b/templates/commands/red-team.md new file mode 100644 index 0000000000..d2621e1442 --- /dev/null +++ b/templates/commands/red-team.md @@ -0,0 +1,255 @@ +--- +description: Attack a functional specification with parallel adversarial lens agents before `/speckit-plan` locks in architecture. Aggregates findings and walks the maintainer through resolution into one of four categories. +handoffs: + - label: Proceed to planning + agent: speckit.plan + prompt: "Create a plan for the spec, which has now been red-team-reviewed and hardened." +scripts: + sh: scripts/bash/check-prerequisites.sh --json --paths-only + ps: scripts/powershell/check-prerequisites.ps1 -Json -PathsOnly +--- + +## User Input + +```text +$ARGUMENTS +``` + +You **MUST** consider the user input before proceeding (if not empty). + +## Pre-Execution Checks + +**Check for extension hooks (before red team)**: +- Check if `.specify/extensions.yml` exists in the project root. +- If it exists, read it and look for entries under the `hooks.before_red_team` key +- If the YAML cannot be parsed or is invalid, skip hook checking silently and continue normally +- Filter out hooks where `enabled` is explicitly `false`. Treat hooks without an `enabled` field as enabled by default. +- For each remaining hook, do **not** attempt to interpret or evaluate hook `condition` expressions: + - If the hook has no `condition` field, or it is null/empty, treat the hook as executable + - If the hook defines a non-empty `condition`, skip the hook and leave condition evaluation to the HookExecutor implementation +- For each executable hook, output the following based on its `optional` flag: + - **Optional hook** (`optional: true`): + ``` + ## Extension Hooks + + **Optional Pre-Hook**: {extension} + Command: `/{command}` + Description: {description} + + Prompt: {prompt} + To execute: `/{command}` + ``` + - **Mandatory hook** (`optional: false`): + ``` + ## Extension Hooks + + **Automatic Pre-Hook**: {extension} + Executing: `/{command}` + EXECUTE_COMMAND: {command} + + Wait for the result of the hook command before proceeding to the Outline. + ``` +- If no hooks are registered or `.specify/extensions.yml` does not exist, skip silently + +## Outline + +Goal: Run an adversarial review of a functional spec using project-configured lenses. Produce a structured findings report at `specs//red-team-findings-[-NN].md`; walk the maintainer through per-finding resolution. Do NOT auto-apply spec changes. Complementary to `/speckit-clarify` (correctness) and `/speckit-analyze` (consistency) — the red team is the adversarial layer. + +Schema references for input (lens catalog) and output (findings report) ship alongside this command in the skill directory and are authoritative for format. + +Invocation: `/speckit-red-team [--yes] [--lenses name1,name2,...] [--dry-run] [--session-suffix NN]` + +## 1. Invocation parsing + +Parse `$ARGUMENTS` into: + +- `` (positional, required): path relative to repo root of the functional spec to attack. Examples: `specs//spec.md` (SpecKit working record), or a project-specific canonical location for graduated specs (e.g. `04_Functional_Specs/_Functional_Spec_v0.1_DRAFT.md` in repos that use a graduated docs tree). Both are valid inputs. +- `--yes` / `--accept-defaults` (flag): auto-confirm the proposed lens selection when >5 lenses match triggers. Required for non-interactive / CI invocations. +- `--lenses ` (flag with value): explicit override of the lens set. Skips trigger-matching; runs exactly the listed lenses. +- `--dry-run` (flag): report which lenses would run and why, without dispatching adversary agents. +- `--session-suffix ` (flag with value): override the session ID's trailing ordinal when multiple sessions occur on the same day. + +If `$ARGUMENTS` is empty OR the target spec path is missing, print the usage block above and STOP. Do NOT try to infer the target from context. + +## 2. Preconditions check + +Before dispatching any adversary, verify in order. Fail fast on the first failure. + +1. **Target spec exists**. Resolve the given path relative to repo root. If not found, print: `ERROR: target spec not found at ` and STOP. +2. **Lens catalog exists** at `/.specify/red-team-lenses.yml`. If not found, print: `ERROR: no lens catalog at .specify/red-team-lenses.yml. See the skill README §Adoption for setup.` and STOP. +3. **Lens catalog parses**. Read the YAML. If parse fails, print: `ERROR: .specify/red-team-lenses.yml failed to parse: ` and STOP. +4. **Catalog non-empty**. If top-level `lenses` list is missing or empty, print: `ERROR: lens catalog has no lenses defined` and STOP. +5. **Each lens entry has required fields**: `name`, `description`, `core_questions`, `trigger_match`. Entries missing any of these are skipped with a warning — session proceeds on the remainder. If ALL entries are malformed, fail the catalog check. +6. **Project declares trigger criteria** (soft check — warn, don't fail). Read `/.specify/memory/constitution.md`. Search for a `## Red Team Trigger Criteria` section (or equivalent). If absent, print: `WARNING: constitution does not yet declare red team trigger criteria (expected at ## Red Team Trigger Criteria). Proceeding in bootstrap mode using the six default categories enumerated in the lens catalog schema.` and continue. The six default categories (money_path, regulatory_path, ai_llm, immutability_audit, multi_party, contracts) are used for §3 trigger matching. If `--lenses` was passed, trigger matching is bypassed entirely and this check is a no-op. + +## 3. Trigger matching + +Skip this section entirely if `--lenses` was passed (jump to §4 using the explicit list). + +Otherwise: + +1. **Read the target spec** content (full file). +2. **Scan for trigger evidence** against the six trigger categories: + - `money_path` — keywords/patterns: fee, amount, $, currency, rate, allocation, commitment size, AUM, price, cost, transfer. + - `regulatory_path` — keywords/patterns: KYC, AML, compliance, regulator, audit, GDPR, SEC, SFC, FCA, jurisdiction, kill filter, fee structure regulatory. + - `ai_llm` — keywords/patterns: LLM, Claude, GPT, prompt, scoring, classification (when LLM-based), summary generation, hallucination. + - `immutability_audit` — keywords/patterns: immutable, audit trail, permanent, never deleted, append-only, version preserved. + - `multi_party` — keywords/patterns: partner, IC, approval, analyst, maintainer, role, authority, sign-off, gate. + - `contracts` — keywords/patterns: upstream, downstream, API, interface, input from, output to, handoff, integration, document pipeline. +3. **Judgement call**: keyword presence is a heuristic. The final decision is the agent's — if the spec genuinely touches the concern described in the category, include it. If the keyword is incidental (e.g., "audit" in a non-audit sentence), exclude it. +4. **Emit matched-trigger list**. If zero triggers match AND `--lenses` was not passed, print: `INFO: target spec matches no trigger categories — no red team required. Pass --lenses to run voluntarily.` and STOP (not an error — this is the opt-in voluntary path working correctly). + +## 4. Lens selection (propose-and-confirm) + +Given the matched-triggers list (from §3) or explicit `--lenses` list (from §1): + +### If explicit `--lenses` was passed + +Resolve each name against the catalog. Unknown names produce a warning and are dropped. If all are dropped, fail with: `ERROR: none of the specified lenses exist in the catalog`. Otherwise proceed to §5 with the resolved list as `selected_lenses`. + +### If trigger-matched + +1. **Filter the catalog** to lenses where `trigger_match` intersects the matched-triggers list. Call this `matched_lenses`. +2. **If `len(matched_lenses) == 0`**: No lens in the catalog covers the matched triggers. Print: `ERROR: lens catalog has no lens covering the matched triggers . Extend the catalog or pass --lenses explicitly.` and STOP. +3. **If `len(matched_lenses) <= 5`**: Use all of them as `selected_lenses` with `selection_method: auto`. Skip to §5. +4. **If `len(matched_lenses) > 5`**: Enter the propose-and-confirm flow: + - **Rank** by: primary — count of overlapping trigger-matches with the spec's triggers (higher = preferred); tie-breaker — `severity_weight` from the catalog (higher = preferred); final tie-breaker — alphabetical by name. + - **Propose the top 5** as the default selection. + - **Show the maintainer**: + - The matched-triggers list. + - The proposed top-5 default with, for each, a one-line rationale (which triggers it covers + severity_weight). + - The dropped lenses with the reason they ranked below. + - **If `--yes` was passed**: auto-accept the proposed default. Set `selection_method: auto` in the session record with a note that --yes was used. Skip to §5. + - **If `--yes` was NOT passed AND the invocation is non-interactive**: FAIL FAST. Print: `ERROR: more than 5 lenses match; interactive confirmation required. Re-run with --yes to accept the proposed default, or --lenses to specify explicitly.` and STOP. + - **If interactive**: ask the maintainer to respond: + - "accept" / "yes" → use proposed default; `selection_method: proposed-and-confirmed`. + - "swap A for B" → swap a default lens with a dropped lens; `selection_method: swapped`. + - "expand to N" (N > 5) → run more than 5 lenses (maintainer opts into the cost); `selection_method: expanded`. + - Anything else → re-prompt with the three options above. + +Write the final `selected_lenses` list. Validate 3 ≤ `len(selected_lenses)` ≤ 5 (unless `selection_method == expanded`). If below 3, warn the maintainer that lens diversity is weak — offer to abort. + +## 5. Parallel adversary dispatch + +### If `--dry-run` was passed + +Print: +``` +DRY RUN — no agents dispatched. +Target: +Matched triggers: +Selected lenses: +Selection method: +Proposed session ID: RT--- +``` +and STOP. + +### Otherwise + +1. **Compute session ID**: `RT--[-]` where `` is derived from the target path (e.g., `` from `specs//spec.md`, or the containing feature when attacking a graduated spec — best-effort match; if ambiguous, derive from the filename). +2. **Build adversary-agent prompts**. For each selected lens, construct a prompt with: + - The lens's `description`. + - The lens's `core_questions` as the attack brief. + - The target spec file path (the agent reads it directly). + - Supporting context paths (if the target is a graduated spec, automatically include its SpecKit working directory — `specs//plan.md`, `tasks.md`, `contracts/` — if present). + - Instruction: return ≤`finding_bound` findings ranked by severity (CRITICAL > HIGH > MEDIUM > LOW), each with: location in the spec (section or FR ref), 1-4 sentence finding description, 1-2 sentence suggested resolution. + - Output format: strict JSON or fenced-code markdown table so aggregation is deterministic. +3. **Dispatch all adversary agents in a single parallel batch** using the host agent's sub-agent / task-dispatch primitive (e.g., Claude Code's Agent tool). All calls go in the same tool-use message so they run concurrently. +4. **Record per-lens start/end times** for wall-clock tracking. Project-level success criteria SHOULD target under 30 minutes for a mid-sized functional spec (roughly 500 lines, 4–6 user stories, 20–30 FRs). Larger specs warrant a proportionally larger budget. + +## 6. Findings aggregation + +Collect the responses from all dispatched agents. + +1. **Parse each response** into structured findings per the findings-report schema: + - `id`: assigned here — format `F--` zero-padded ordinal, monotonic across the whole session. + - `lens`: name of originating lens. + - `severity`: one of CRITICAL / HIGH / MEDIUM / LOW. + - `location`: section or FR reference in the target spec. + - `description`: the finding. + - `suggested_resolution`: adversary's proposed fix. + - `status`: blank (filled in §7). +2. **Enforce per-lens finding bound**. For each lens, retain only the top `finding_bound` findings by severity (default 5 per catalog). Drop the rest. Record dropped count in session metadata. +3. **Detect and handle lens failures**. If an agent returned no findings, returned an error, or the response could not be parsed: + - Record the failure in session metadata with the lens name and reason. + - Continue with other lenses — do NOT abort the session. + - Flag the failed lens as a candidate for `--lenses ` re-run after refinement. +4. **Build the aggregated findings table** in markdown. Group by lens; within each lens sub-order by severity descending. +5. **Write the initial report file** at `specs//red-team-findings-[-NN].md` with: + - Header block (session ID, target, date, maintainer, lenses, selection method, supporting context, wall-clock) + - §1 Session Summary: placeholder for maintainer to fill post-review. + - §2 Findings table: fully populated. + - §3 Resolutions Log: empty stubs per finding ID. + - §4 Validation Decision: include ONLY if this is a designated dogfood session (a first-run validation of the protocol against a real project spec; the target spec for a given project is declared in that project's constitution or extension-adoption docs). + - §5 Session metadata YAML block per the session-record schema. +6. **Announce completion**: print summary (count by lens, count by severity, path to report) and transition to §7. + +## 7. Resolution flow + +### ⚠️ HARD-AND-FAST RULE — Historical SpecKit records are NEVER modified + +**Before editing any file during resolution, classify it using this decision procedure. No exceptions.** + +| Path pattern | Category | Editable during resolution? | +|---|---|---| +| `04_Functional_Specs/*` (or project-equivalent graduated docs tree) | **Forward-facing canonical spec** | ✅ Yes — this is where spec-fix edits land | +| `03_Product_Requirements/PRD_*` | **Forward-facing canonical spec** | ✅ Yes | +| `02_System_Architecture/*` | **Forward-facing canonical spec** | ✅ Yes | +| `01_Business_Overview/*` | **Forward-facing canonical spec** | ✅ Yes | +| `.specify/memory/constitution.md` | **Forward-facing governance** | ✅ Yes (via the constitution-amendment pathway; not a normal spec-fix) | +| `.specify/templates/*` | **Forward-facing tooling config** | ✅ Yes | +| `specs//spec.md` | **HISTORICAL SpecKit working record** | ❌ **NO — never edit** | +| `specs//plan.md` | **HISTORICAL SpecKit working record** | ❌ **NO — never edit** | +| `specs//tasks.md` | **HISTORICAL SpecKit working record** | ❌ **NO — never edit** | +| `specs//research.md` | **HISTORICAL SpecKit working record** | ❌ **NO — never edit** | +| `specs//data-model.md` | **HISTORICAL SpecKit working record** | ❌ **NO — never edit** | +| `specs//contracts/*` | **HISTORICAL SpecKit working record** | ❌ **NO — never edit** | +| `specs//quickstart.md` | **HISTORICAL SpecKit working record** | ❌ **NO — never edit** | +| `specs//checklists/*` | **HISTORICAL SpecKit working record** | ❌ **NO — never edit** | +| `specs//red-team-findings-*.md` | **Session artefact** (created by THIS skill) | ✅ Yes — this skill owns it | +| `99_Archive/*` | **Archived historical** | ❌ **NO — never edit** | + +**Rationale**: SpecKit working records in `specs//` capture a point-in-time decision state. They serve as the audit trail of "what was decided at time T." Rewriting them destroys that audit trail. If the correct fix for a red team finding would require editing one of these files, the correct resolution category is: + +- **spec-fix** — land the fix in the **forward-facing canonical** location instead (graduated functional-specs tree, product-requirements tree, system-architecture tree), OR +- **out-of-scope** — cross-reference a different forward-facing spec that owns the concern, OR +- **accepted-risk** — record the risk on the forward-facing spec's `## Accepted Risks` section and move on. + +**Before applying ANY edit, the skill MUST check the target path against this table and refuse (with a clear error message pointing the maintainer at this section) if the target is a historical SpecKit working record.** If the maintainer insists the edit is necessary, the skill redirects them: "The fix belongs in the forward-facing equivalent. Which graduated canonical doc should land this change?" + +### Finding-by-finding walk + +Walk the maintainer through each finding. For each finding in the table (group by lens, severity descending): + +1. **Display the finding**: ID, lens, severity, location, description, suggested resolution. +2. **Ask for resolution category**: one of — + - **spec-fix** — edit the **forward-facing canonical** spec. Before proposing any edit, apply the Editable-File classification above. If the suggested resolution references a historical SpecKit working record, redirect the edit to the corresponding forward-facing location and announce this redirect to the maintainer. Prompt the maintainer for a specific diff or description. Record a `downstream_ref` (commit SHA after the maintainer applies + commits, OR a line-ranged description if deferred). The `downstream_ref` MUST reference a forward-facing path. + - **new-OQ** — route to the target spec's `## Open Questions` section. The "target spec" here is the forward-facing canonical doc, not the SpecKit working record. Prompt for owner name + what it blocks. Compose the OQ entry and offer to insert it; maintainer confirms. Record assigned OQ ID (format `OQ--`) as `downstream_ref`. + - **accepted-risk** — route to the target spec's `## Accepted Risks` section on the forward-facing canonical doc. Prompt for rationale; auto-detect `[regulatory-review]` tag candidates by scanning the finding description for regulated-domain signals (money path, regulatory path, compliance-critical guarantees, disclosure surfaces — as defined by the project's constitution). Offer to insert the AR entry with the next sequential `AR-NNN`. Record AR ID as `downstream_ref`. If tagged, add a reminder about the project's governance-confirmation requirement. + - **out-of-scope** — finding belongs to a different spec, or the fix belongs in a forward-facing doc that does not yet exist (in which case create a placeholder `specs//README.md` and cross-reference it). Prompt for the owning spec path. Record the cross-reference in the resolution log only; do not edit the other spec. +3. **`/speckit-analyze` cross-reference rule**: for each finding, detect overlap with `/speckit-analyze` output if a recent analyze report is present in the feature directory. If overlap is detected, ASK the maintainer: "This finding appears to overlap with analyze finding . Cross-reference instead of duplicating? (yes/no)". If yes, record as `resolution.downstream_ref: analyze:` and SKIP creating a new OQ/AR entry — the analyze finding is authoritative. +4. **MUST NOT auto-apply spec changes**. For spec-fix resolutions, the skill may PROPOSE an edit diff but the maintainer applies it themselves. For AR/OQ insertions, the skill may draft the entry text but the maintainer confirms before the Edit tool actually writes. +5. **MUST NOT edit historical SpecKit working records** (the hard-and-fast rule above). If the maintainer directs the skill to edit such a file, the skill refuses and proposes a forward-facing alternative. If the finding genuinely cannot be resolved at a forward-facing location, the correct category is **accepted-risk** (recording the historical-record gap as an accepted risk) or **out-of-scope** (pointing at a future feature that will do the work properly). +6. **Update the findings table** (`status` column) and append the resolution block to §3 Resolutions Log as each finding is categorised. +7. **After all findings resolved**: update §5 session metadata YAML with final counts per resolution category and `unresolved: 0`. If the maintainer chose to defer some findings, record `unresolved: N` with a note. +8. **For dogfood sessions only**: after all findings resolved, prompt the maintainer to write §4 Validation Decision. Outcome: `proceed-to-codification` / `refine-and-retry` / `abandon`. For `proceed-to-codification`, enforce: at least one meaningful finding (severity ≥ HIGH AND novel adversarial angle) with explicit reason existing tooling couldn't have caught it. + +## 8. Failure-mode handling + +| Condition | Behaviour | +|---|---| +| Target spec missing | Fail fast with `ERROR: target spec not found at `. No session created. | +| Lens catalog missing | Fail fast with `ERROR: no lens catalog at .specify/red-team-lenses.yml. See skill README §Adoption.` | +| Catalog unparseable | Fail fast with `ERROR: .specify/red-team-lenses.yml failed to parse: `. | +| Catalog empty (no `lenses` list) | Fail fast with `ERROR: lens catalog has no lenses defined`. | +| Individual lens entry malformed | Warn, skip that lens, proceed with the rest. If ALL entries malformed, fail. | +| Constitution lacks trigger criteria | Warn and proceed in bootstrap mode using the six default categories. UNLESS `--lenses` was passed (bypass). | +| Target spec matches zero triggers AND no `--lenses` | Print info message and STOP. Not an error. | +| No lens in catalog covers matched triggers | Fail fast — asks maintainer to extend catalog or pass --lenses. | +| >5 matches in non-interactive without `--yes` | Fail fast. Maintainer must re-run with `--yes` or `--lenses`. | +| Individual adversary agent fails (timeout, parse error, empty response) | Record failure in session metadata with lens name + reason. Continue with other lenses. Flag for re-run via `--lenses`. Do NOT abort the session. | +| Overwhelming findings (e.g., >50 HIGH after aggregation) | After §6 completes, if total findings exceed a threshold (default: ≥25 HIGH+CRITICAL), warn the maintainer the spec may not be ready for red team and offer an abort path. Abort records session state for later resumption. | +| Spec was updated since prior red team | On session start, check for prior findings report in the feature dir. If present and older than a material-change threshold (heuristic: target spec has new FRs or section count changed), warn the maintainer and ask whether to proceed or abort. | +| Session interrupted mid-resolution | Report file is saved atomically after every resolution update. On re-invocation with same session ID, offer to resume from last-resolved finding. | + +All fail-fast conditions MUST produce actionable error messages — naming the file, the expected location, and pointing at the README or architecture doc for recovery. From 39887def8e92de063bbac4099b2537e83a0ee358 Mon Sep 17 00:00:00 2001 From: Ash Brener Date: Tue, 21 Apr 2026 23:35:40 +0200 Subject: [PATCH 2/3] red-team: address Copilot review feedback MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Resolves 5 inline comments from the bot review on #2303: 1. Removed the `scripts:` frontmatter block since the body never references `{SCRIPT}` — spec-kit's template processor would strip an unused block and the reference was misleading. Trigger matching uses direct constitution/catalog reads, not the prerequisites script's FEATURE_DIR output. 2. Rewrote the "schema references ship alongside" paragraph — the PR doesn't ship schema sidecar files; now explicitly flagged as TODO for a follow-up if the core approach lands. The minimal lens-catalog shape is inline in §2 preconditions (see #5). 3. Converted all slash-command references to dot-notation (/speckit.clarify, /speckit.analyze, /speckit.plan, /speckit.red-team) to match the rest of templates/commands/. 4. Dropped the undocumented `--accept-defaults` alias — usage line and flag prose are now consistent on `--yes` alone. 5. Replaced the "See skill README §Adoption" error message with an inline minimal-required-YAML-shape block so users get actionable guidance from the error itself, without depending on a README that this template doesn't ship. No behavioural changes to the protocol; all fixes are template-hygiene and contract-clarity. --- templates/commands/red-team.md | 34 +++++++++++++++++++++++----------- 1 file changed, 23 insertions(+), 11 deletions(-) diff --git a/templates/commands/red-team.md b/templates/commands/red-team.md index d2621e1442..5f57c90d32 100644 --- a/templates/commands/red-team.md +++ b/templates/commands/red-team.md @@ -1,12 +1,9 @@ --- -description: Attack a functional specification with parallel adversarial lens agents before `/speckit-plan` locks in architecture. Aggregates findings and walks the maintainer through resolution into one of four categories. +description: Attack a functional specification with parallel adversarial lens agents before `/speckit.plan` locks in architecture. Aggregates findings and walks the maintainer through resolution into one of four categories. handoffs: - label: Proceed to planning agent: speckit.plan prompt: "Create a plan for the spec, which has now been red-team-reviewed and hardened." -scripts: - sh: scripts/bash/check-prerequisites.sh --json --paths-only - ps: scripts/powershell/check-prerequisites.ps1 -Json -PathsOnly --- ## User Input @@ -53,18 +50,18 @@ You **MUST** consider the user input before proceeding (if not empty). ## Outline -Goal: Run an adversarial review of a functional spec using project-configured lenses. Produce a structured findings report at `specs//red-team-findings-[-NN].md`; walk the maintainer through per-finding resolution. Do NOT auto-apply spec changes. Complementary to `/speckit-clarify` (correctness) and `/speckit-analyze` (consistency) — the red team is the adversarial layer. +Goal: Run an adversarial review of a functional spec using project-configured lenses. Produce a structured findings report at `specs//red-team-findings-[-NN].md`; walk the maintainer through per-finding resolution. Do NOT auto-apply spec changes. Complementary to `/speckit.clarify` (correctness) and `/speckit.analyze` (consistency) — the red team is the adversarial layer. -Schema references for input (lens catalog) and output (findings report) ship alongside this command in the skill directory and are authoritative for format. +**Schema references**: a concrete schema for the lens catalog (`.specify/red-team-lenses.yml`) and the findings report format is **TODO — to be added as sibling template files in a follow-up PR if this core approach lands**. Until then, the minimal required lens-catalog shape is inline in §2 preconditions and the findings-report sections are enumerated in §6.5 below. -Invocation: `/speckit-red-team [--yes] [--lenses name1,name2,...] [--dry-run] [--session-suffix NN]` +Invocation: `/speckit.red-team [--yes] [--lenses name1,name2,...] [--dry-run] [--session-suffix NN]` ## 1. Invocation parsing Parse `$ARGUMENTS` into: - `` (positional, required): path relative to repo root of the functional spec to attack. Examples: `specs//spec.md` (SpecKit working record), or a project-specific canonical location for graduated specs (e.g. `04_Functional_Specs/_Functional_Spec_v0.1_DRAFT.md` in repos that use a graduated docs tree). Both are valid inputs. -- `--yes` / `--accept-defaults` (flag): auto-confirm the proposed lens selection when >5 lenses match triggers. Required for non-interactive / CI invocations. +- `--yes` (flag): auto-confirm the proposed lens selection when >5 lenses match triggers. Required for non-interactive / CI invocations. - `--lenses ` (flag with value): explicit override of the lens set. Skips trigger-matching; runs exactly the listed lenses. - `--dry-run` (flag): report which lenses would run and why, without dispatching adversary agents. - `--session-suffix ` (flag with value): override the session ID's trailing ordinal when multiple sessions occur on the same day. @@ -76,7 +73,22 @@ If `$ARGUMENTS` is empty OR the target spec path is missing, print the usage blo Before dispatching any adversary, verify in order. Fail fast on the first failure. 1. **Target spec exists**. Resolve the given path relative to repo root. If not found, print: `ERROR: target spec not found at ` and STOP. -2. **Lens catalog exists** at `/.specify/red-team-lenses.yml`. If not found, print: `ERROR: no lens catalog at .specify/red-team-lenses.yml. See the skill README §Adoption for setup.` and STOP. +2. **Lens catalog exists** at `/.specify/red-team-lenses.yml`. If not found, print the error below and STOP: + ``` + ERROR: no lens catalog at .specify/red-team-lenses.yml + Minimal required shape: + version: v1 + lenses: + - name: + description: + core_questions: + - + - + trigger_match: [] + severity_weight: # optional + finding_bound: # optional + Define at least 1 lens covering each trigger category your project cares about. + ``` 3. **Lens catalog parses**. Read the YAML. If parse fails, print: `ERROR: .specify/red-team-lenses.yml failed to parse: ` and STOP. 4. **Catalog non-empty**. If top-level `lenses` list is missing or empty, print: `ERROR: lens catalog has no lenses defined` and STOP. 5. **Each lens entry has required fields**: `name`, `description`, `core_questions`, `trigger_match`. Entries missing any of these are skipped with a warning — session proceeds on the remainder. If ALL entries are malformed, fail the catalog check. @@ -227,7 +239,7 @@ Walk the maintainer through each finding. For each finding in the table (group b - **new-OQ** — route to the target spec's `## Open Questions` section. The "target spec" here is the forward-facing canonical doc, not the SpecKit working record. Prompt for owner name + what it blocks. Compose the OQ entry and offer to insert it; maintainer confirms. Record assigned OQ ID (format `OQ--`) as `downstream_ref`. - **accepted-risk** — route to the target spec's `## Accepted Risks` section on the forward-facing canonical doc. Prompt for rationale; auto-detect `[regulatory-review]` tag candidates by scanning the finding description for regulated-domain signals (money path, regulatory path, compliance-critical guarantees, disclosure surfaces — as defined by the project's constitution). Offer to insert the AR entry with the next sequential `AR-NNN`. Record AR ID as `downstream_ref`. If tagged, add a reminder about the project's governance-confirmation requirement. - **out-of-scope** — finding belongs to a different spec, or the fix belongs in a forward-facing doc that does not yet exist (in which case create a placeholder `specs//README.md` and cross-reference it). Prompt for the owning spec path. Record the cross-reference in the resolution log only; do not edit the other spec. -3. **`/speckit-analyze` cross-reference rule**: for each finding, detect overlap with `/speckit-analyze` output if a recent analyze report is present in the feature directory. If overlap is detected, ASK the maintainer: "This finding appears to overlap with analyze finding . Cross-reference instead of duplicating? (yes/no)". If yes, record as `resolution.downstream_ref: analyze:` and SKIP creating a new OQ/AR entry — the analyze finding is authoritative. +3. **`/speckit.analyze` cross-reference rule**: for each finding, detect overlap with `/speckit.analyze` output if a recent analyze report is present in the feature directory. If overlap is detected, ASK the maintainer: "This finding appears to overlap with analyze finding . Cross-reference instead of duplicating? (yes/no)". If yes, record as `resolution.downstream_ref: analyze:` and SKIP creating a new OQ/AR entry — the analyze finding is authoritative. 4. **MUST NOT auto-apply spec changes**. For spec-fix resolutions, the skill may PROPOSE an edit diff but the maintainer applies it themselves. For AR/OQ insertions, the skill may draft the entry text but the maintainer confirms before the Edit tool actually writes. 5. **MUST NOT edit historical SpecKit working records** (the hard-and-fast rule above). If the maintainer directs the skill to edit such a file, the skill refuses and proposes a forward-facing alternative. If the finding genuinely cannot be resolved at a forward-facing location, the correct category is **accepted-risk** (recording the historical-record gap as an accepted risk) or **out-of-scope** (pointing at a future feature that will do the work properly). 6. **Update the findings table** (`status` column) and append the resolution block to §3 Resolutions Log as each finding is categorised. @@ -239,7 +251,7 @@ Walk the maintainer through each finding. For each finding in the table (group b | Condition | Behaviour | |---|---| | Target spec missing | Fail fast with `ERROR: target spec not found at `. No session created. | -| Lens catalog missing | Fail fast with `ERROR: no lens catalog at .specify/red-team-lenses.yml. See skill README §Adoption.` | +| Lens catalog missing | Fail fast with the minimal-required-shape error printed in §2.2 above (no external doc references). | | Catalog unparseable | Fail fast with `ERROR: .specify/red-team-lenses.yml failed to parse: `. | | Catalog empty (no `lenses` list) | Fail fast with `ERROR: lens catalog has no lenses defined`. | | Individual lens entry malformed | Warn, skip that lens, proceed with the rest. If ALL entries malformed, fail. | From 1da172a3bf70463a40ebb4b2af4e9de5b316d6cb Mon Sep 17 00:00:00 2001 From: Ash Brener Date: Wed, 22 Apr 2026 00:24:35 +0200 Subject: [PATCH 3/3] red-team: address second Copilot review pass (5 new comments) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Resolves inline comments from review round 2: 1. Added explicit fenced Usage block under ## Outline; §1 now references it by name ("print the fenced Usage block from §Outline") instead of the ambiguous "print the usage block above" instruction. 2. Simplified the >5-lens flow: removed the "non-interactive detection" branch entirely. Interactivity is now determined solely by whether --yes was passed — CI / batch runs MUST pass --yes; otherwise the run prompts. Added a clarifying parenthetical and updated §8 row. 3. Aligned dry-run session-ID example with the canonical format. Both now use RT--[-] with the - as optional. 4. Converted UK spelling to US to match the rest of templates/commands/: "artefact" -> "artifact", "behaviour" -> "behavior". 5. Picked a single threshold for overwhelming-findings: >=25 HIGH+CRITICAL combined. Dropped the inconsistent ">50 HIGH" example; the row now states one actionable heuristic. No behavior changes to the protocol; all fixes are template-hygiene and contract-clarity. --- templates/commands/red-team.md | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/templates/commands/red-team.md b/templates/commands/red-team.md index 5f57c90d32..885877a1e9 100644 --- a/templates/commands/red-team.md +++ b/templates/commands/red-team.md @@ -54,7 +54,11 @@ Goal: Run an adversarial review of a functional spec using project-configured le **Schema references**: a concrete schema for the lens catalog (`.specify/red-team-lenses.yml`) and the findings report format is **TODO — to be added as sibling template files in a follow-up PR if this core approach lands**. Until then, the minimal required lens-catalog shape is inline in §2 preconditions and the findings-report sections are enumerated in §6.5 below. -Invocation: `/speckit.red-team [--yes] [--lenses name1,name2,...] [--dry-run] [--session-suffix NN]` +### Usage + +``` +/speckit.red-team [--yes] [--lenses name1,name2,...] [--dry-run] [--session-suffix NN] +``` ## 1. Invocation parsing @@ -66,7 +70,7 @@ Parse `$ARGUMENTS` into: - `--dry-run` (flag): report which lenses would run and why, without dispatching adversary agents. - `--session-suffix ` (flag with value): override the session ID's trailing ordinal when multiple sessions occur on the same day. -If `$ARGUMENTS` is empty OR the target spec path is missing, print the usage block above and STOP. Do NOT try to infer the target from context. +If `$ARGUMENTS` is empty OR the target spec path is missing, print the fenced Usage block from §Outline above and STOP. Do NOT try to infer the target from context. ## 2. Preconditions check @@ -132,13 +136,14 @@ Resolve each name against the catalog. Unknown names produce a warning and are d - The proposed top-5 default with, for each, a one-line rationale (which triggers it covers + severity_weight). - The dropped lenses with the reason they ranked below. - **If `--yes` was passed**: auto-accept the proposed default. Set `selection_method: auto` in the session record with a note that --yes was used. Skip to §5. - - **If `--yes` was NOT passed AND the invocation is non-interactive**: FAIL FAST. Print: `ERROR: more than 5 lenses match; interactive confirmation required. Re-run with --yes to accept the proposed default, or --lenses to specify explicitly.` and STOP. - - **If interactive**: ask the maintainer to respond: + - **Otherwise** (no `--yes`): prompt the maintainer to respond: - "accept" / "yes" → use proposed default; `selection_method: proposed-and-confirmed`. - "swap A for B" → swap a default lens with a dropped lens; `selection_method: swapped`. - "expand to N" (N > 5) → run more than 5 lenses (maintainer opts into the cost); `selection_method: expanded`. - Anything else → re-prompt with the three options above. + *(CI / batch runs MUST pass `--yes` to auto-accept the proposed default; running without `--yes` in a non-interactive context will stall waiting for input. This keeps the behavior simple: interactivity is determined by whether `--yes` was passed, not by detecting the terminal.)* + Write the final `selected_lenses` list. Validate 3 ≤ `len(selected_lenses)` ≤ 5 (unless `selection_method == expanded`). If below 3, warn the maintainer that lens diversity is weak — offer to abort. ## 5. Parallel adversary dispatch @@ -152,7 +157,7 @@ Target: Matched triggers: Selected lenses: Selection method: -Proposed session ID: RT--- +Proposed session ID: RT--[-] ``` and STOP. @@ -218,7 +223,7 @@ Collect the responses from all dispatched agents. | `specs//contracts/*` | **HISTORICAL SpecKit working record** | ❌ **NO — never edit** | | `specs//quickstart.md` | **HISTORICAL SpecKit working record** | ❌ **NO — never edit** | | `specs//checklists/*` | **HISTORICAL SpecKit working record** | ❌ **NO — never edit** | -| `specs//red-team-findings-*.md` | **Session artefact** (created by THIS skill) | ✅ Yes — this skill owns it | +| `specs//red-team-findings-*.md` | **Session artifact** (created by THIS skill) | ✅ Yes — this skill owns it | | `99_Archive/*` | **Archived historical** | ❌ **NO — never edit** | **Rationale**: SpecKit working records in `specs//` capture a point-in-time decision state. They serve as the audit trail of "what was decided at time T." Rewriting them destroys that audit trail. If the correct fix for a red team finding would require editing one of these files, the correct resolution category is: @@ -248,7 +253,7 @@ Walk the maintainer through each finding. For each finding in the table (group b ## 8. Failure-mode handling -| Condition | Behaviour | +| Condition | Behavior | |---|---| | Target spec missing | Fail fast with `ERROR: target spec not found at `. No session created. | | Lens catalog missing | Fail fast with the minimal-required-shape error printed in §2.2 above (no external doc references). | @@ -258,9 +263,9 @@ Walk the maintainer through each finding. For each finding in the table (group b | Constitution lacks trigger criteria | Warn and proceed in bootstrap mode using the six default categories. UNLESS `--lenses` was passed (bypass). | | Target spec matches zero triggers AND no `--lenses` | Print info message and STOP. Not an error. | | No lens in catalog covers matched triggers | Fail fast — asks maintainer to extend catalog or pass --lenses. | -| >5 matches in non-interactive without `--yes` | Fail fast. Maintainer must re-run with `--yes` or `--lenses`. | +| >5 matches without `--yes` | Prompt the maintainer for accept / swap / expand. CI / batch runs MUST pass `--yes` to auto-accept the proposed default; otherwise the run will stall waiting for input. | | Individual adversary agent fails (timeout, parse error, empty response) | Record failure in session metadata with lens name + reason. Continue with other lenses. Flag for re-run via `--lenses`. Do NOT abort the session. | -| Overwhelming findings (e.g., >50 HIGH after aggregation) | After §6 completes, if total findings exceed a threshold (default: ≥25 HIGH+CRITICAL), warn the maintainer the spec may not be ready for red team and offer an abort path. Abort records session state for later resumption. | +| Overwhelming findings (≥25 HIGH+CRITICAL combined after aggregation) | After §6 completes, if the combined count of HIGH and CRITICAL findings meets or exceeds **25**, warn the maintainer the spec may not be ready for red team and offer an abort path. Abort records session state for later resumption. | | Spec was updated since prior red team | On session start, check for prior findings report in the feature dir. If present and older than a material-change threshold (heuristic: target spec has new FRs or section count changed), warn the maintainer and ask whether to proceed or abort. | | Session interrupted mid-resolution | Report file is saved atomically after every resolution update. On re-invocation with same session ID, offer to resume from last-resolved finding. |