diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md index 3a96cfe..bca73f2 100644 --- a/.claude/CLAUDE.md +++ b/.claude/CLAUDE.md @@ -72,9 +72,11 @@ devpace 分为两个独立层次,**产品层不得依赖开发层**: | 规范文件 | 职责 | |---------|------| +| `project-structure.md` | 项目目录结构、文件放置规则、配置文件索引;分层架构约束见本文件"分层架构"章节 | | `common.md` | 响应语言、Git 提交规范、文档命名 | | `dev-workflow.md` | 开发会话协议、任务执行、质量检查、跨会话连续性、文档级联 | -| `plugin-dev-spec.md` | Claude Code 组件开发规范(Plugin 结构、Skill/Agent/Hook/MCP 规范、常见陷阱) | +| `plugin-dev-spec.md` | Claude Code 核心组件规范(Plugin 结构、Skill 规范、常见陷阱;Agent/Hook/MCP 参考见 `references/component-reference.md`) | +| `info-architecture.md` | 信息架构(devpace 适配):IA-1 至 IA-11 索引、六层架构映射、约束分级、分发层分离规则;完整原则见 `references/ia-principles.md` | ## 质量检查 @@ -82,33 +84,6 @@ devpace 分为两个独立层次,**产品层不得依赖开发层**: - 每个 rules/ 和 _schema/ 文件有 §0 速查卡片 - 模板文件用 `{{PLACEHOLDER}}` 标记需填充的内容 - Skill 的 SKILL.md 遵循 `.claude/rules/plugin-dev-spec.md` 的 frontmatter 字段定义 -- Skill 分拆模式:SKILL.md 放输入/输出/高层步骤("做什么"),当详细规则超过 ~50 行时拆出 `*-procedures.md`("怎么做")。参考 pace-dev 和 pace-change +- Skill 分拆模式:详见 `plugin-dev-spec.md` "分拆模式"章节。参考 pace-dev 和 pace-change - **分层完整性**:产品层文件不得引用 `docs/` 或 `.claude/`(见分层架构章节) -- **多处出现内容的同步维护**:以下信息在多个文件中出现,修改时须全部同步(箭头表示权威方向:源→派生): - - accept 能力描述:`skills/pace-test/SKILL.md`(权威)→ `rules/devpace-rules.md §15`(教学派生)→ `docs/user-guide.md`(文档派生) - - 子命令列表:各 `SKILL.md`(权威)→ `devpace-rules.md §0`(目录索引)→ `user-guide.md`(文档派生)→ `test-procedures.md 职责行`(测试派生) - - 推荐使用流程:`SKILL.md`(权威)→ `user-guide.md`(文档派生) - - 特性文档同步:各 `SKILL.md`(权威)→ `docs/features/.md`(文档派生)→ `docs/features/_zh.md`(翻译派生) - - pace-next 信号摘要:`knowledge/signal-priority.md` + `knowledge/signal-collection.md`(权威)→ `skills/pace-next/SKILL.md` Step 2/3(内联摘要派生)→ `skills/pace-next/next-procedures-output-default.md`(命令引导派生)→ `docs/features/pace-next.md` + `pace-next_zh.md`(信号概览和示例派生) - - Schema→脚本规则同步:`knowledge/_schema/*.md`(权威)→ `scripts/validate-schema.mjs` RULES 注册表(派生)→ `scripts/collect-signals.mjs` 信号条件(派生)→ `scripts/compute-metrics.mjs` 指标公式(派生) -- 
**pace-role 角色扩展清单**:新增角色时须同步以下文件(按顺序): - 1. `skills/pace-role/role-procedures-dimensions.md`:角色定义表 - 2. `skills/pace-role/role-procedures-switch.md`:别名映射 - 3. `skills/pace-role/role-procedures-inference.md`:关键词映射 - 4. `skills/pace-role/role-procedures-compare.md`:输出格式加一行 - 5. `skills/pace-status/status-procedures-roles.md`:完整角色模板 - 6. `skills/pace-retro/retro-procedures.md`:角色适配表 - 7. `skills/pace-change/change-procedures-impact.md`:措辞模板表 - 8. `skills/pace-pulse/SKILL.md`:角色感知表 - 9. `skills/pace-theory/theory-procedures-default.md`:角色适配输出框架 - 10. `skills/pace-next/next-procedures.md`:视角调整表 - 11. `skills/pace-release/release-procedures-notes.md`:角色视角 Release Notes - 12. `knowledge/_schema/project-format.md`:preferred-role 枚举 - 13. `docs/features/pace-role.md` + `pace-role_zh.md`:特性文档 -- **pace-plan 子命令扩展清单**:添加新子命令时须同步以下文件(按顺序): - 1. 新建 `skills/pace-plan/-procedures.md` - 2. `skills/pace-plan/SKILL.md`:路由表 + 输入 + argument-hint - 3. `knowledge/_schema/iteration-format.md`:写入规则(如新子命令写入迭代文件) - 4. `rules/devpace-rules.md §11`:迭代节奏信号(如新子命令产生信号) - 5. `docs/features/pace-plan.md` + `pace-plan_zh.md`:核心特性摘要 + 相关资源链接 - 6. `docs/user-guide.md` + `user-guide_zh.md`:参数表 + 功能描述 +- **多处出现内容的同步维护**:修改 Skill 子命令、能力描述、信号定义或 Schema 时,查阅 `.claude/references/sync-checklists.md` 获取完整同步链路和扩展清单 diff --git a/.claude/references/cascade-procedures.md b/.claude/references/cascade-procedures.md new file mode 100644 index 0000000..1588d7f --- /dev/null +++ b/.claude/references/cascade-procedures.md @@ -0,0 +1,123 @@ +# 文档级联处理步骤 + +> **按需加载**:仅当 `dev-workflow.md` §1 检测到上游变更,或 §3 触发反向反馈/自触发级联时加载本文件。日常使用见 `dev-workflow.md` §0 级联速查。 + +## §8.1 权威链定义 + +``` +vision.md (WHY) → design.md (HOW) → requirements.md (WHAT) → roadmap.md (WHEN) +``` + +变更只能沿此方向级联(上游 → 下游),不可反向。 + +**多文件同时变更**:当多个上游文件在同一时段被修改时,按权威链从上游到下游依次处理:先 vision.md → 再 design.md → 最后 requirements.md。上游文件的级联结果可能覆盖下游文件的独立变更,因此必须按此顺序,避免重复或矛盾的级联更新。 + +## §8.2 场景 A:vision.md 变更 + +**触发**:OBJ 增删改、北极星调整、护城河策略变化。 + +**影响分析**: + +1. 
识别受影响的 OBJ → 通过 roadmap.md "对应 OBJ" 定位受影响的 Phase +2. 检查 design.md 哪些章节基于该 OBJ 设计(参考 design.md §1 设计优先级) +3. 检查 requirements.md 哪些 S/F 条目源自该 OBJ +4. 评估 progress.md 当前任务是否受影响 + +**动作**:更新受影响的下游文档,或标记 ``。 + +## §8.3 场景 B:design.md 变更 + +**触发**:UX 原则变化、状态机修改、工作流重设计、新增/删除章节。 + +**影响分析**: + +1. 通过 design.md Skill 映射(§12)定位受影响的 Skill +2. 检查 requirements.md 哪些 S/F/NF 条目基于变更的设计章节 +3. 检查已实现的 Skill 是否需要适配 +4. 评估 progress.md 是否需要新增任务 + +**动作**:更新 requirements.md 受影响条目 + 必要时在 progress.md 新增任务。 + +## §8.4 场景 C:requirements.md 变更 + +**触发**:新增场景、验收标准修改、功能需求变更、优先级调整。 + +**影响分析**: + +1. 识别受影响的 S/F/NF 条目 → 定位 progress.md 对应的任务 +2. 检查已实现的 Skill 是否需要返工 +3. 评估是否需要在 progress.md 新增任务 + +**动作**:更新 progress.md 任务列表 + 必要时调整 roadmap.md 里程碑。 + +## §8.5 场景 D:自触发级联(Claude 修改上游文档) + +**触发**:当前任务本身要求修改 vision.md / design.md / requirements.md。由 `dev-workflow.md` §3 第 5 条或反向反馈(§8.10)触发。 + +**与场景 A/B/C 的区别**: +- 不需要"提示用户建议评估"——Claude 自己就是修改者,直接评估 +- 变更内容已知——不需要 diff,直接从修改内容出发分析影响 + +**处理步骤**: + +1. 完成上游文档修改并 git commit +2. 明确记录本次修改了什么(哪个文档、哪些章节、变更性质) +3. 根据修改的文档级别,按场景 A/B/C 对应的影响分析维度,评估对下游的影响 +4. 检查 progress.md "当前任务"表中其他"进行中"或"待做"任务是否受影响 + - 受影响的任务:在"说明"列添加备注 `[design.md §X 已更新,需适配]` + - 需要新增任务:立即添加到 progress.md(遵循 `dev-workflow.md` §5 关联条目填写要求) +5. 在 progress.md "变更记录"添加条目,原因列标注"自触发:任务 [任务名]" + +## §8.6 场景 E:roadmap.md 变更 + +**触发**:里程碑增删改、Phase 调整、验证计划变更。 + +**影响分析**: + +1. 识别受影响的里程碑 → 检查 progress.md 对应任务 +2. 评估任务定义是否需要调整(关联条目、里程碑归属) + +**动作**:更新 progress.md 任务列表。roadmap.md 是权威链终点,不再向下级联。 + +## §8.7 级联执行清单(通用) + +1. 识别变更范围(哪个文档、哪个章节/条目) +2. 沿权威链向下追踪影响 +3. 对每个受影响的下游文档:读取、评估、更新或标记待审 +4. 在 progress.md "变更记录"添加条目:`| 日期 | 变更描述 | 原因 |` +5. 若有进行中的任务受影响,在 progress.md "当前任务"的"说明"列添加备注 +6. 
若变更涉及 `.claude/` 文件,评估 `rules/` 下其他文件是否受影响 + +## §8.8 陈旧标记 + +使用 HTML 注释标记在下游文档的受影响位置: + +``` + +``` + +解决后移除标记。 + +## §8.9 变更决策记录 + +所有级联处理的结果记入 progress.md "变更记录"表。 + +格式:`| 日期 | [源文档] 变更:[内容] → [下游影响] | [原因] |` + +## §8.10 反向反馈流程 + +> 由 `dev-workflow.md` §3 触发:实现中发现上游文档有歧义、缺失或不可行之处。 + +**触发条件**(满足任一): +- design.md 的设计规格不可行或有矛盾 +- requirements.md 的验收标准存在歧义或无法满足 +- vision.md 的 OBJ/MoS 定义与实际不匹配 + +**处理步骤**: +1. 暂停当前实现,向用户描述问题和建议修正方案 +2. 获得用户确认后,修改上游文档并 git commit +3. 执行 §8.5(自触发级联),评估修正对其他任务的影响 +4. 在 progress.md "变更记录"添加条目,原因列标注"反向反馈:实现 [任务名] 时发现 [问题简述]" +5. 继续当前任务(基于修正后的上游文档) + +**原则**:反向反馈不是"反向级联",而是"修正上游 → 正向级联"的闭环。下游实现永远不能直接改变上游设计意图,只能报告→确认→修正→正向级联。 diff --git a/.claude/references/component-reference.md b/.claude/references/component-reference.md new file mode 100644 index 0000000..be6b407 --- /dev/null +++ b/.claude/references/component-reference.md @@ -0,0 +1,136 @@ +# Claude Code 组件参考 + +> **职责**:Agent、Hook、MCP Server 的完整规格参考。按需加载,仅在开发/修改对应组件时使用。 +> +> 核心开发规范见 `.claude/rules/plugin-dev-spec.md`(始终加载)。 + +**章节索引**:[Agent 定义](#agent-定义) | [Hooks](#hooks) | [MCP Server 配置](#mcp-server-配置) | [规范查证方法](#规范查证方法) + +## Agent 定义 + +Agent 文件放在 `agents/` 目录,frontmatter 必须包含 `name` 和 `description`。 + +**合法 frontmatter 字段**: + +| 字段 | 说明 | +|------|------| +| `name` | **必填**,Agent 名称 | +| `description` | **必填**,Agent 描述 | +| `tools` | 允许的工具列表 | +| `disallowedTools` | 禁止的工具列表 | +| `model` | `sonnet` / `opus` / `haiku` | +| `color` | UI 背景色(`blue`/`cyan`/`green`/`yellow`/`red`/`magenta`) | +| `permissionMode` | 权限模式 | +| `maxTurns` | 最大轮次 | +| `skills` | 可用 Skill 列表 | +| `mcpServers` | 可用 MCP Server | +| `memory` | 记忆持久化级别(见下方备注) | +| `hooks` | Agent 级 Hook 配置 | +| `isolation` | `worktree` = 独立 git worktree 运行 | + +`memory` 持久化路径:`user` → `~/.claude/agent-memory//`,`project` → `.claude/agent-memory//`,`local` → 仅当前会话。下次 fork 时自动加载。`isolation: worktree` 时无变更自动清理。 + +通过 Task 工具调用:`Task(subagent_type="agent-name", prompt="...", description="...")`。子 agent 不能再嵌套调用 Task。 + +## 
Hooks + +Hook 事件名称区分大小写。可用事件: + +| 事件 | 触发时机 | 可阻断? | +|------|---------|---------| +| `PreToolUse` | 工具执行前 | 是(exit 2) | +| `PostToolUse` | 工具执行成功后 | 否 | +| `PostToolUseFailure` | 工具执行失败后 | 否 | +| `UserPromptSubmit` | 用户提交 prompt | 是 | +| `PreCompact` | 上下文压缩前(manual/auto) | 否 | +| `Stop` | Claude 完成响应 | 是 | +| `SessionStart` / `SessionEnd` | 会话开始/结束 | 否 | +| `SubagentStart` / `SubagentStop` | 子 agent 启停 | 部分 | +| `TeammateIdle` / `TaskCompleted` | 团队协作事件 | 是(exit 2) | + +配置位置(优先级从高到低): + +1. managed settings +2. `.claude/settings.json`(项目共享) +3. `.claude/settings.local.json`(项目本地) +4. `~/.claude/settings.json`(全局) +5. Plugin `hooks/hooks.json` + +Hook 脚本中使用 `${CLAUDE_PLUGIN_ROOT}` 引用 Plugin 根目录。exit 0 = 成功,exit 2 = 阻断,其他 = 非阻断错误。 + +### Hook 类型 + +| 类型 | 说明 | 超时默认 | +|------|------|---------| +| `command` | 执行 shell 命令,通过 stdin 接收 JSON 输入 | 无默认 | +| `prompt` | LLM 评估 prompt 内容,决定是否放行 | 30s | +| `agent` | LLM agent 执行(有工具访问权限),决定是否放行 | 60s | + +`command` 类型额外支持 `"async": true`(后台执行,不阻塞主流程)。`prompt`/`agent` 类型具有语义理解能力,适合替代简单的正则匹配做复杂判断。 + +### Skill 级 Hooks + +SKILL.md 的 `hooks` frontmatter 字段支持定义仅在该 Skill 激活时生效的 Hook: + +```yaml +hooks: + PreToolUse: + - matcher: + tool_name: "Write|Edit" + hooks: + - type: prompt + prompt: "验证此写入是否合法..." + timeout: 15 +``` + +Skill 级 Hook 与全局 hooks.json 互补——全局做通用检查,Skill 级做精细控制。 + +### 约束执行分级 + +选择约束的执行保障级别时参考: + +| 可靠性 | 机制 | 适用场景 | +|--------|------|---------| +| 最高 | Hook command + exit 2 | 不可逆操作阻断、模式保护 | +| 高 | Hook prompt/agent | 需语义理解的检查 | +| 中 | Skill 指令 + 铁律标记 | 工作流约束 | +| 基线 | Rules 文本建议 | 行为规范、风格指引 | + +## MCP Server 配置 + +项目级配置放在根目录 `.mcp.json`,格式: + +```json +{ + "mcpServers": { + "server-name": { + "command": "path/to/server", + "args": ["--flag"], + "env": { "KEY": "${ENV_VAR}", "KEY2": "${VAR:-default}" } + } + } +} +``` + +Plugin 内部引用路径时使用 `${CLAUDE_PLUGIN_ROOT}`。也可在 `plugin.json` 的 `mcpServers` 字段内联定义。 + +## 规范查证方法 + +不确定时,按优先级查证: + +1. 
`Task(subagent_type="claude-code-guide", prompt="查询 [具体问题]")`——内置 agent,可访问官方文档 +2. 官方文档:`https://code.claude.com/docs/en/`(plugins、skills、hooks、mcp、sub-agents、agent-teams) +3. `claude --debug` 查看加载日志排查问题 + +### 官方 plugin-dev 工具(推荐) + +Anthropic 官方 plugin-dev Plugin 提供综合开发工具。安装后可用于 devpace 开发验证: + +| 组件 | 用途 | 使用场景 | +|------|------|---------| +| **plugin-validator** Agent | 10 步综合验证(Manifest/目录/Skills/Hooks/安全) | 任何 Plugin 结构变更后 | +| **skill-reviewer** Agent | Skill 质量审查(description/内容/渐进披露) | Skill 新增或修改后 | +| **agent-creator** Agent | AI 辅助 Agent 创建 | 新增 Agent 定义时 | +| `/plugin validate` | 内置命令,验证 plugin.json 基本结构 | 快速检查 | + +安装:`/plugin install plugin-dev@claude-plugins-official` diff --git a/.claude/references/ia-principles.md b/.claude/references/ia-principles.md new file mode 100644 index 0000000..56b0a01 --- /dev/null +++ b/.claude/references/ia-principles.md @@ -0,0 +1,29 @@ +# 通用信息架构原则 + +> **职责**:Claude Code 项目信息组织的 11 项通用原则。可跨项目复用,不含特定项目结构映射。 +> **定位**:通用原则定义层——具体项目的适配规则由各项目的 `info-architecture.md` 负责。 +> **按需加载**:本文件由 `info-architecture.md` 按需引用,不自动加载到上下文。 + +## 11 原则 + +| # | 原则 | 核心要求 | 应用指南 | +|---|------|---------|---------| +| IA-1 | 单向依赖 | 信息从抽象流向具体,不可反向 | 上层定义概念,下层实现细节;下层只引用同层或上层 | +| IA-2 | 抽象分层 | 不同抽象层级放在不同文件 | 路由(做什么)与步骤(怎么做)分文件;Manifest 与组件分离 | +| IA-3 | 稳定-易变分离 | 高扇入文件隔离易变内容以保持稳定 | 被多处引用的文件保持低变更频率;易变部分拆出独立文件 | +| IA-4 | 信息分类 | 不同信息类型不混放 | 步骤、约束、概念、结构、路由、实例各归其位 | +| IA-5 | 按需加载 | 只加载当前上下文所需的最小信息集 | description 仅提供触发信息;复杂内容分级延迟加载 | +| IA-6 | 单一权威 | 每条信息有且仅有一个权威定义点 | 派生文件注明来源;变更沿源→派生方向传播 | +| IA-7 | 确定性分级 | 约束的执行保障级别匹配其关键程度 | 铁律用 Hook 保障;推荐用文本声明;按关键度选机制 | +| IA-8 | 可发现性优先 | 入口信息为发现而设计,不为完整而设计 | description 写触发条件(When),不写行为描述(What) | +| IA-9 | 认知清晰 | 面向 LLM 的指令精确无歧义,阻止合理化绕过 | 铁律配反合理化清单;模糊表述用具体关键词替代 | +| IA-10 | 契约隔离 | 数据格式由独立契约层约束 | 共享数据定义独立 Schema;生产方和消费方都依赖契约 | +| IA-11 | 单一职责 | 每个文件只有一个核心职责 | 多个独立变更原因 = 需要分拆;越界内容压缩为引用 | + +## 约束分级标记 + +| 标记 | 含义 | 执行保障 | +|------|------|---------| +| `iron rule` | 铁律,不可违反 | 应有 Hook 或自动化测试 | +| `required` | 
必须遵循 | 代码审查检查 | +| `recommended` | 推荐实践 | 最佳实践指引 | diff --git a/.claude/references/plugin-info-layers.md b/.claude/references/plugin-info-layers.md new file mode 100644 index 0000000..4fb701b --- /dev/null +++ b/.claude/references/plugin-info-layers.md @@ -0,0 +1,27 @@ +# devpace 信息分层架构 + +> **职责**:devpace Plugin 的六层信息架构和资产类型定义。 + +## 六层架构 + +``` +Layer 6: knowledge/theory (Why — 概念知识,被动加载,极少变更) +Layer 5: rules/ (Must — 行为约束,会话启动时自动加载) +Layer 4: knowledge/_schema/ (Shape — 数据格式契约,按需加载) +Layer 3: skills/*/SKILL.md (What — 路由层,description 触发加载) +Layer 2: skills/*/*-procedures.md (How — 操作步骤,按状态/子命令条件加载) +Layer 1: knowledge/_templates/ (Instance — 具体实例,实例化时加载) +``` + +依赖方向:**仅允许下层引用上层**。Layer 2 引用 Layer 4 的格式定义;Layer 4 不引用 Layer 2 的实现细节。 + +## 信息类型 → 资产映射 + +| 信息类型 | Plugin 资产 | 特征 | +|---------|------------|------| +| 步骤(Procedure) | `*-procedures.md` | 按步操作指令 | +| 约束(Principle) | `rules/*.md` | 行为规范 | +| 概念(Concept) | `knowledge/*.md` | 背景知识 | +| 结构(Structure) | `knowledge/_schema/*.md` | 数据格式定义 | +| 路由(Process) | `SKILL.md` | 工作流分发 | +| 实例(Fact) | `knowledge/_templates/*.md` | 具体模板 | diff --git a/.claude/references/skill-content-check.md b/.claude/references/skill-content-check.md new file mode 100644 index 0000000..c6ed087 --- /dev/null +++ b/.claude/references/skill-content-check.md @@ -0,0 +1,15 @@ +# Skill 内容质量验证方法 + +> **按需加载**:仅在 Skill 开发/修改时参考。推荐方法,非强制流程。 + +## RED-GREEN-REFACTOR 验证法 + +1. **基线观测(RED)**:禁用目标 Skill → 观察 Claude 的默认行为 → 记录与期望行为的具体偏差 + - 记录格式:"无 Skill 时 Claude 做了 [X],期望行为是 [Y]" + - 至少用 2 个不同复杂度的场景观测 +2. **最小规则(GREEN)**:针对观测到的偏差写最小修正规则 → 启用 Skill → 确认偏差被修正 + - 原则:一条规则修正一个偏差,不做预防性规则 +3. 
**漏洞补充(REFACTOR)**:用不同复杂度场景(S/M/L)压力测试 → 发现新偏差 → 补充合理化预防表 + - 重点关注:Claude 在长会话或复杂任务中是否"合理化"跳过规则 + +**适用场景**:新 Skill 开发、Skill 规则重大修改、发现 Skill 行为偏差时。 diff --git a/.claude/references/skill-creator-integration.md b/.claude/references/skill-creator-integration.md new file mode 100644 index 0000000..031e2b7 --- /dev/null +++ b/.claude/references/skill-creator-integration.md @@ -0,0 +1,52 @@ +# skill-creator 集成约定 + +> **按需参考**:仅在使用 `/skill-creator` 创建或评估 Skill 时加载本文件。 + +`/skill-creator` 的评估工作区(`skills/-workspace/`)已通过 `.gitignore` 排除入库。使用时遵循以下约定: + +## 产物位置规范 + +| 产物 | 位置 | 入库 | +|------|------|------| +| 评估工作区 | `skills/-workspace/`(skill-creator 默认行为) | 否(.gitignore) | +| behavioral eval | `tests/evaluation//evals.json` | 是(权威源) | +| trigger eval | `tests/evaluation//trigger-evals.json` | 是(权威源) | +| `--static` HTML | `skills/-workspace/iteration-N/review.html` | 否 | +| 跨 Skill 文件 | `tests/evaluation/_cross-cutting/` | 是 | + +**`--static` 路径规则**:当 `generate_review.py` 需要使用 `--static` 模式时,输出路径必须放在工作区内:`/iteration-N/review.html`,不要使用 `/tmp/` 或其他外部路径。 + +## Eval 目录结构 + +``` +tests/evaluation/ +├── _cross-cutting/ # 跨 Skill 全局文件(acceptance-matrix、shared-assertions 等) +├── pace-init/ # per-Skill 子目录(目录名 = Skill 名) +│ ├── evals.json # behavioral eval(统一命名,不带 Skill 名前缀) +│ └── trigger-evals.json # trigger eval +├── pace-dev/ +│ ├── evals.json +│ └── trigger-evals.json +└── ... 
+``` + +**命名规则**: +- 子目录名 = Skill 目录名(如 `pace-dev`) +- eval 文件统一命名 `evals.json` 和 `trigger-evals.json`(由目录区分 Skill) +- `_cross-cutting/` 前缀 `_` 确保排在最前 + +## 三层评估体系 + +| 层级 | 文件 | 用途 | 执行时机 | +|------|------|------|---------| +| T1 Trigger | `trigger-evals.json` | 触发精度(~20 查询/Skill) | description 修改后 | +| T2 Behavioral | `evals.json` | 行为正确性断言 | procedures/SKILL.md 修改后 | +| T3 Full Cycle | skill-creator 完整流程 | with/without 对比 + grading | 重大重构/新 Skill | + +## Eval 格式权威来源 + +eval JSON 格式由 **skill-creator 自身**(`references/schemas.md`)定义。devpace 不在 `knowledge/_schema/` 中重复定义 eval 格式——eval 是开发层关注点,不属于产品层。 + +## 跨 Skill 交叉污染测试 + +创建 trigger eval 时,必须包含兄弟 Skill 的典型查询作为负面测试用例。重点测试对见 `tests/evaluation/_cross-cutting/shared-assertions.md`。 diff --git a/.claude/references/sync-checklists.md b/.claude/references/sync-checklists.md new file mode 100644 index 0000000..3565798 --- /dev/null +++ b/.claude/references/sync-checklists.md @@ -0,0 +1,43 @@ +# 扩展同步清单 + +> **按需参考**:仅在新增角色、新增 pace-plan 子命令,或修改多处出现的内容(Skill 子命令、能力描述、信号定义、Schema)时加载本文件。 + +## pace-role 角色扩展清单 + +新增角色时须同步以下文件(按顺序): + +1. `skills/pace-role/role-procedures-dimensions.md`:角色定义表 +2. `skills/pace-role/role-procedures-switch.md`:别名映射 +3. `skills/pace-role/role-procedures-inference.md`:关键词映射 +4. `skills/pace-role/role-procedures-compare.md`:输出格式加一行 +5. `skills/pace-status/status-procedures-roles.md`:完整角色模板 +6. `skills/pace-retro/retro-procedures.md`:角色适配表 +7. `skills/pace-change/change-procedures-impact.md`:措辞模板表 +8. `skills/pace-pulse/SKILL.md`:角色感知表 +9. `skills/pace-theory/theory-procedures-default.md`:角色适配输出框架 +10. `skills/pace-next/next-procedures.md`:视角调整表 +11. `skills/pace-release/release-procedures-notes.md`:角色视角 Release Notes +12. `knowledge/_schema/project-format.md`:preferred-role 枚举 +13. `docs/features/pace-role.md` + `pace-role_zh.md`:特性文档 + +## pace-plan 子命令扩展清单 + +添加新子命令时须同步以下文件(按顺序): + +1. 新建 `skills/pace-plan/-procedures.md` +2. `skills/pace-plan/SKILL.md`:路由表 + 输入 + argument-hint +3. 
`knowledge/_schema/iteration-format.md`:写入规则(如新子命令写入迭代文件) +4. `rules/devpace-rules.md §11`:迭代节奏信号(如新子命令产生信号) +5. `docs/features/pace-plan.md` + `pace-plan_zh.md`:核心特性摘要 + 相关资源链接 +6. `docs/user-guide.md` + `user-guide_zh.md`:参数表 + 功能描述 + +## 多处出现内容的同步维护 + +以下信息在多个文件中出现,修改时须全部同步(箭头表示权威方向:源→派生): + +- accept 能力描述:`skills/pace-test/SKILL.md`(权威)→ `rules/devpace-rules.md §15`(教学派生)→ `docs/user-guide.md`(文档派生) +- 子命令列表:各 `SKILL.md`(权威)→ `devpace-rules.md §0`(目录索引)→ `user-guide.md`(文档派生)→ `test-procedures.md 职责行`(测试派生) +- 推荐使用流程:`SKILL.md`(权威)→ `user-guide.md`(文档派生) +- 特性文档同步:各 `SKILL.md`(权威)→ `docs/features/.md`(文档派生)→ `docs/features/_zh.md`(翻译派生) +- pace-next 信号摘要:`knowledge/signal-priority.md` + `knowledge/signal-collection.md`(权威)→ `skills/pace-next/SKILL.md` Step 2/3(内联摘要派生)→ `skills/pace-next/next-procedures-output-default.md`(命令引导派生)→ `docs/features/pace-next.md` + `pace-next_zh.md`(信号概览和示例派生) +- Schema→脚本规则同步:`knowledge/_schema/*.md`(权威)→ `skills/pace-init/scripts/validate-schema.mjs` RULES 注册表(派生)→ `skills/pace-next/scripts/collect-signals.mjs` 信号条件(派生)→ `skills/pace-retro/scripts/compute-metrics.mjs` 指标公式(派生) diff --git a/.claude/rules/common.md b/.claude/rules/common.md index e819546..3bf706c 100644 --- a/.claude/rules/common.md +++ b/.claude/rules/common.md @@ -1,5 +1,13 @@ # 通用规则 +## §0 速查卡片 + +| 规则域 | 要点 | +|--------|------| +| 语言 | 中文对话/文档,英文保留:CLI、技术术语、代码标识符、文件名 | +| Git | `<类型>(<范围>): <描述>`,类型:feat/fix/docs/refactor/test/chore | +| 命名 | 全部 kebab-case:文件、脚本、Schema、目录 | + ## 响应语言 **所有对话和文档必须使用中文。** diff --git a/.claude/rules/dev-workflow.md b/.claude/rules/dev-workflow.md index c263c19..9b16713 100644 --- a/.claude/rules/dev-workflow.md +++ b/.claude/rules/dev-workflow.md @@ -6,42 +6,14 @@ ### 会话生命周期 -```mermaid -graph LR - S1[§1 开始] --> S2[§2 选任务] --> S3[§3 执行] --> S4[§4 质检] --> S5[§5 完成] - S5 --> S6[§6 结束] - S6 -->|下一会话| S1 - S7["§7 中断恢复
(progress.md=恢复点)"] -.-> S2 - S1 -.->|检测到上游变更| S8[§8 级联处理] - S3 -.->|反向反馈/自触发| S8 -``` - -### 级联系统速查 - -**权威链**:`vision.md(WHY) → design.md(HOW) → reqs.md(WHAT) → roadmap.md(WHEN=终点)` — 只能向下级联,不可反向。多文件变更按此顺序依次处理。 - -| 场景 | 触发源 | 检查范围 | 动作 | -|------|--------|---------|------| -| A: vision 变 | OBJ 增删改 | design + reqs + progress | 更新下游或标记 REVIEW | -| B: design 变 | UX/状态机/流程 | reqs + 已实现 Skill + progress | 更新 reqs + 新增任务 | -| C: reqs 变 | 场景/验收/功能 | progress + 已实现 Skill | 更新任务 + 调整 roadmap | -| D: 自触发/反向反馈 | Claude 改上游 | 同 A/B/C 对应维度 | 直接评估 + 备注受影响任务 | -| E: roadmap 变 | 里程碑调整 | progress 任务 | 更新任务(终点,不再级联) | - -通用清单:识别范围 → 沿链追踪 → 逐文档更新 → 记入变更记录 → 备注进行中任务 -陈旧标记:`` - -### 各章节速查 - -| 阶段 | 操作 | 流程 | -|------|------|------| -| §1 开始 | 读 progress.md | 快照+当前任务 → 上游变更检测 → 1 句话报告 → 等指令 | -| §2 选任务 | 最高优先级待做 | 强制追溯验证(关联条目非空) → 加载关联文档 → 开始实现 | -| §3 执行 | 按 design.md | 实现 → 上游问题? → 反向反馈(§3.3) → 自触发级联(§8.5) | -| §4 质检 | 自动+手动 | `bash scripts/validate-all.sh` → 修复失败 → 手动验收 | -| §5 完成 | 更新 progress | 里程碑全完成? → 回顾+更新 roadmap → 新增任务? → 填关联条目 | -| §6 结束 | 更新 progress | 快照+任务状态+会话记录+变更记录 → 3 行摘要 → git commit | -| §7 恢复 | progress.md | 唯一恢复点 → 快照 → 当前任务(继续/已完成/涉及) → 近期会话 | +§1 开始 → §2 选任务 → §3 执行 → §4 质检 → §5 完成 → §6 结束 → (下一会话) §1 +中断恢复: §7 → §2 | 上游变更/反向反馈: §1/§3 → §8 级联 + +### 级联速查 + +**权威链**:`vision.md(WHY) → design.md(HOW) → reqs.md(WHAT) → roadmap.md(WHEN=终点)` — 只能向下级联。 + +触发时(§1 检测到上游变更 / §3 反向反馈+自触发),加载 `.claude/references/cascade-procedures.md`。日常记住:识别范围 → 沿链追踪 → 更新下游 → 记入变更记录。 ## §1 会话开始协议 @@ -76,16 +48,8 @@ graph LR ## §3 开发执行 1. 按 design.md 规格和 requirements.md 验收标准实现 -2. 遵循开发守则(CLAUDE.md "开发守则"章节的 7 条) -3. **反向反馈**:实现过程中若发现上游文档(vision.md/design.md/requirements.md)有歧义、缺失或不可行之处: - - **触发条件**(满足任一):design.md 的设计规格不可行或有矛盾、requirements.md 的验收标准存在歧义或无法满足、vision.md 的 OBJ/MoS 定义与实际不匹配 - - **处理步骤**: - 1. 暂停当前实现,向用户描述问题和建议修正方案 - 2. 获得用户确认后,修改上游文档并 git commit - 3. 执行 §8.5(自触发级联),评估修正对其他任务的影响 - 4. 在 progress.md "变更记录"添加条目,原因列标注"反向反馈:实现 [任务名] 时发现 [问题简述]" - 5. 
继续当前任务(基于修正后的上游文档) - - **原则**:反向反馈不是"反向级联",而是"修正上游 → 正向级联"的闭环。下游实现永远不能直接改变上游设计意图,只能报告问题、等待确认、修正上游后再正向级联 +2. 遵循开发守则(CLAUDE.md "开发守则"章节) +3. **反向反馈**:实现中发现上游文档有歧义/缺失/不可行时,暂停实现并走反向反馈流程(详见 `cascade-procedures.md` §8.10)。原则:下游不直接改变上游设计意图,只能报告→确认→修正→正向级联 4. 每完成一个有意义的工作单元,git commit(遵循 common.md 提交规范) 5. **自触发级联**:若当前任务涉及修改上游文档(vision.md / design.md / requirements.md),完成修改并 commit 后,立即执行 §8.5(自触发级联),评估对其他任务的影响,再继续后续工作 @@ -95,30 +59,14 @@ graph LR ### 自动检查(必须先通过) -运行 `bash scripts/validate-all.sh`(或 `pytest tests/static/ -v`),修复所有失败后再进行后续手动检查。 - -自动检查覆盖项(无需手动重复): -- 分层完整性(`test_layer_separation.py`) -- plugin.json 同步(`test_plugin_json_sync.py`) -- Schema 结构合规(`test_schema_compliance.py`) -- §0 速查卡片(`test_markdown_structure.py`) -- 模板占位符(`test_template_placeholders.py`) -- Frontmatter 合规(`test_frontmatter.py`) -- Skill 分拆启发(`test_markdown_structure.py`) -- 交叉引用完整性(`test_cross_references.py`) -- 命名规范(`test_naming_conventions.py`) -- 状态机一致性(`test_state_machine.py`) - -### plugin-dev 验证(推荐,自动检查通过后执行) - -安装 Anthropic 官方 plugin-dev Plugin 后可使用以下验证(安装方式见 CONTRIBUTING.md): +运行 `bash dev-scripts/validate-all.sh`(或 `pytest tests/static/ -v`),修复所有失败后再进行后续手动检查。 -- [ ] **Plugin 结构验证**:调用 plugin-validator Agent(10 步综合验证:Manifest + 目录 + Commands + Agents + Skills + Hooks + MCP + 安全检查 → PASS/FAIL 报告) -- [ ] **Skill 质量审查**(Skill 开发/修改时):调用 skill-reviewer Agent(description 质量 + 内容评估 + 渐进披露 + 改进建议 → Rating 报告) -- [ ] **基础验证**:`/plugin validate`(内置命令,验证 plugin.json 语法和基本结构) +自动检查覆盖项(无需手动重复):分层完整性、plugin.json 同步、Schema 结构合规、§0 速查卡片、模板占位符、Frontmatter 合规、Skill 分拆启发、交叉引用完整性、命名规范、状态机一致性。(详细映射见 `dev-scripts/validate-all.sh`) ### 手动检查(自动检查通过后执行) +- [ ] **plugin-dev 验证**(推荐,需安装):plugin-validator 结构验证 + skill-reviewer 质量审查 + `/plugin validate` 基础验证(安装见 CONTRIBUTING.md) + - [ ] Schema 语义合规:产出文件符合 `knowledge/_schema/` 的语义要求(自动检查仅验证结构) - [ ] 验收验证(按任务类型): - Skill 开发:`claude --plugin-dir ./` 加载无报错 + 手动触发目标 Skill 验证输出格式 @@ -128,19 +76,7 @@ graph LR - 通用:对照 requirements.md 相关 S/F 条目的验收标准逐条检查 - [ ] 特性文档同步:修改 
Skill 子命令/行为时,检查 `docs/features/` 对应文档是否需要更新 -### Skill 内容质量验证方法(推荐) - -开发或修改 Skill 内容时,推荐使用 RED-GREEN-REFACTOR 方法验证规则的准确性和完整性(非强制流程,作为质量提升指引): - -1. **基线观测(RED)**:禁用目标 Skill → 观察 Claude 的默认行为 → 记录与期望行为的具体偏差 - - 记录格式:"无 Skill 时 Claude 做了 [X],期望行为是 [Y]" - - 至少用 2 个不同复杂度的场景观测 -2. **最小规则(GREEN)**:针对观测到的偏差写最小修正规则 → 启用 Skill → 确认偏差被修正 - - 原则:一条规则修正一个偏差,不做预防性规则 -3. **漏洞补充(REFACTOR)**:用不同复杂度场景(S/M/L)压力测试 → 发现新偏差 → 补充合理化预防表 - - 重点关注:Claude 在长会话或复杂任务中是否"合理化"跳过规则 - -**适用场景**:新 Skill 开发、Skill 规则重大修改、发现 Skill 行为偏差时。 +Skill 开发/修改时,推荐使用 RED-GREEN-REFACTOR 验证方法(详见 `.claude/references/skill-content-check.md`)。 ## §5 任务完成与更新 @@ -175,115 +111,9 @@ graph LR 1. 快照(定位当前阶段和里程碑) 2. 当前任务表(定位"进行中"任务的中断点) 3. 近期会话(理解最近几次会话的上下文演进) -- "进行中"任务的"说明"列 = 结构化中断点描述 -- 恢复步骤: - 1. 读取"继续"段确定下一步操作 - 2. 若存在"涉及"段,先读取列出的文件确认当前状态(文件可能被其他会话或用户修改) - 3. 读取"已完成"段避免重复工作 +- 恢复步骤(解析 §6 中断点格式):读"继续"段 → 定下一步;有"涉及"段 → 先读文件确认状态;读"已完成"段 → 避免重复 - 若"说明"列为空但状态为"进行中",先 `git log` 查看最近提交确认进度 ## §8 文档级联处理 -当检测到上游文档变更(§1 自动检测)或用户主动修改设计文档时,按本节执行影响分析和级联更新。 - -### §8.1 权威链定义 - -``` -vision.md (WHY) → design.md (HOW) → requirements.md (WHAT) → roadmap.md (WHEN) -``` - -变更只能沿此方向级联(上游 → 下游),不可反向。 - -**多文件同时变更**:当多个上游文件在同一时段被修改时,按权威链从上游到下游依次处理:先 vision.md → 再 design.md → 最后 requirements.md。上游文件的级联结果可能覆盖下游文件的独立变更,因此必须按此顺序,避免重复或矛盾的级联更新。 - -### §8.2 场景 A:vision.md 变更 - -**触发**:OBJ 增删改、北极星调整、护城河策略变化。 - -**影响分析**: - -1. 识别受影响的 OBJ → 通过 roadmap.md "对应 OBJ" 定位受影响的 Phase -2. 检查 design.md 哪些章节基于该 OBJ 设计(参考 design.md §1 设计优先级) -3. 检查 requirements.md 哪些 S/F 条目源自该 OBJ -4. 评估 progress.md 当前任务是否受影响 - -**动作**:更新受影响的下游文档,或标记 ``。 - -### §8.3 场景 B:design.md 变更 - -**触发**:UX 原则变化、状态机修改、工作流重设计、新增/删除章节。 - -**影响分析**: - -1. 通过 design.md Skill 映射(§12)定位受影响的 Skill -2. 检查 requirements.md 哪些 S/F/NF 条目基于变更的设计章节 -3. 检查已实现的 Skill 是否需要适配 -4. 评估 progress.md 是否需要新增任务 - -**动作**:更新 requirements.md 受影响条目 + 必要时在 progress.md 新增任务。 - -### §8.4 场景 C:requirements.md 变更 - -**触发**:新增场景、验收标准修改、功能需求变更、优先级调整。 - -**影响分析**: - -1. 识别受影响的 S/F/NF 条目 → 定位 progress.md 对应的任务 -2. 
检查已实现的 Skill 是否需要返工 -3. 评估是否需要在 progress.md 新增任务 - -**动作**:更新 progress.md 任务列表 + 必要时调整 roadmap.md 里程碑。 - -### §8.5 场景 D:自触发级联(Claude 修改上游文档) - -**触发**:当前任务本身要求修改 vision.md / design.md / requirements.md。由 §3 第 5 条或反向反馈触发。 - -**与场景 A/B/C 的区别**: -- 不需要"提示用户建议评估"——Claude 自己就是修改者,直接评估 -- 变更内容已知——不需要 diff,直接从修改内容出发分析影响 - -**处理步骤**: - -1. 完成上游文档修改并 git commit -2. 明确记录本次修改了什么(哪个文档、哪些章节、变更性质) -3. 根据修改的文档级别,按场景 A/B/C 对应的影响分析维度,评估对下游的影响 -4. 检查 progress.md "当前任务"表中其他"进行中"或"待做"任务是否受影响 - - 受影响的任务:在"说明"列添加备注 `[design.md §X 已更新,需适配]` - - 需要新增任务:立即添加到 progress.md(遵循 §5 关联条目填写要求) -5. 在 progress.md "变更记录"添加条目,原因列标注"自触发:任务 [任务名]" - -### §8.6 场景 E:roadmap.md 变更 - -**触发**:里程碑增删改、Phase 调整、验证计划变更。 - -**影响分析**: - -1. 识别受影响的里程碑 → 检查 progress.md 对应任务 -2. 评估任务定义是否需要调整(关联条目、里程碑归属) - -**动作**:更新 progress.md 任务列表。roadmap.md 是权威链终点,不再向下级联。 - -### §8.7 级联执行清单(通用) - -1. 识别变更范围(哪个文档、哪个章节/条目) -2. 沿权威链向下追踪影响 -3. 对每个受影响的下游文档:读取、评估、更新或标记待审 -4. 在 progress.md "变更记录"添加条目:`| 日期 | 变更描述 | 原因 |` -5. 若有进行中的任务受影响,在 progress.md "当前任务"的"说明"列添加备注 -6. 
若变更涉及 `.claude/` 文件,评估 `rules/` 下其他文件是否受影响 - -### §8.8 陈旧标记 - -使用 HTML 注释标记在下游文档的受影响位置: - -``` - -``` - -解决后移除标记。 - -### §8.9 变更决策记录 - -所有级联处理的结果记入 progress.md "变更记录"表。 - -格式:`| 日期 | [源文档] 变更:[内容] → [下游影响] | [原因] |` +详细步骤见 `.claude/references/cascade-procedures.md`(按需加载,在 §1 检测到上游变更或 §3 触发反向反馈/自触发级联时使用)。§0 的级联速查已足够日常使用。 diff --git a/.claude/rules/info-architecture.md b/.claude/rules/info-architecture.md new file mode 100644 index 0000000..2a8d36d --- /dev/null +++ b/.claude/rules/info-architecture.md @@ -0,0 +1,197 @@ +# 信息架构规则(devpace 适配) + +> **职责**:将通用 IA 原则(`references/ia-principles.md`)映射到 devpace 项目结构(`references/plugin-info-layers.md`)的具体规则。 + +Plugin 不是程序,是通过组织化 Markdown 塑造 LLM 行为的**信息架构**。传统软件靠编译器确定性执行;Plugin 靠 LLM 概率性解释——信息组织失误直接导致行为偏差。 + +## §0 速查卡片 + +### 11 原则一览(devpace 层级映射) + +| # | 原则 | 一句话 | 对应层级 | +|---|------|--------|---------| +| IA-1 | 单向依赖 | 信息从抽象流向具体,不可反向 | 全局(层间关系) | +| IA-2 | 抽象分层 | 不同抽象层级放在不同文件 | Layer 6-1 | +| IA-3 | 稳定-易变分离 | 高扇入文件隔离易变内容以保持稳定 | 全局(变更频率) | +| IA-4 | 信息分类 | 不同信息类型不混放 | 全局(类型维度) | +| IA-5 | 按需加载 | 只加载当前上下文所需的最小信息集 | Layer 3-1 | +| IA-6 | 单一权威 | 每条信息有且仅有一个权威定义点 | 全局(同步维度) | +| IA-7 | 确定性分级 | 约束的执行保障级别匹配其关键程度 | Layer 5 + Hooks | +| IA-8 | 可发现性优先 | 入口信息为发现而设计,不为完整而设计 | Layer 3(description) | +| IA-9 | 认知清晰 | 面向 LLM 的指令精确无歧义,阻止合理化绕过 | 全局(精确性) | +| IA-10 | 契约隔离 | 数据格式由独立契约层约束 | Layer 4 | +| IA-11 | 单一职责 | 每个文件只有一个核心职责 | 全局(职责内聚) | + +通用原则定义与约束分级标记:详见 `references/ia-principles.md`。 +六层架构与信息类型映射:详见 `references/plugin-info-layers.md`。 + +## §1 单向依赖(IA-1) + +> **核心**:信息从抽象流向具体——上层定义,下层实现,不可反向引用。 + +**规则**: +1. 分发层(`rules/`、`skills/`、`knowledge/`)不得引用开发层(`docs/`、`.claude/`) `iron rule` +2. 六层架构中,下层文件只引用同层或上层文件 `required` +3. 
合法的"反向注释"仅限两种:上层的**映射说明**(解释如何对应到下层)和**权威委托**(显式标记"详见 XXX") `recommended` + +**正确**:`rules/devpace-rules.md` 引用 `knowledge/theory.md`(同层/上层引用) +**错误**:`skills/pace-dev/SKILL.md` 引用 `docs/design/design.md`(分发层 → 开发层) + +**检测**:`grep -r "docs/\|\.claude/" rules/ skills/ knowledge/` 应返回空 + +**预防合理化**:`"只是引用一下设计文档的章节编号" → 分发层必须独立可分发,任何开发层引用都破坏这一点` + +## §2 抽象分层(IA-2) + +> **核心**:不同抽象层级的内容放在不同文件——路由与步骤分离,Manifest 与组件分离。 + +**规则**: +1. `.claude-plugin/` 仅放 `plugin.json` + `marketplace.json`;所有组件在 Plugin 根目录 `iron rule` +2. SKILL.md 放路由逻辑("做什么");详细步骤超 ~50 行时拆出 `*-procedures.md`("怎么做") `recommended` +3. 每个文件包含单一主抽象层级 `recommended` + +**正确**:SKILL.md 含 100 行路由表 + 输入/输出定义;procedures 文件含分步指令 +**错误**:SKILL.md 内嵌 500 行详细指令与路由逻辑混杂 + +**检测**:SKILL.md 超过 ~500 行 → 疑似层级混杂,检查是否需要分拆 + +**预防合理化**:`"内容不多,放一起方便" → 超过 50 行详细规则就该拆;合并不是方便,是让 LLM 混淆路由与执行` + +## §3 稳定-易变分离(IA-3) + +> **核心**:高扇入(被多处引用)的文件必须保持稳定——通过将易变内容隔离到独立文件实现。 + +**规则**: +1. 被多个 Skill 引用的文件(schema、theory)维持低变更频率 `recommended` +2. 始终加载的文件(rules)控制体量;膨胀时将稳定核心与易变索引分拆 `recommended` +3. 加载策略与稳定性对齐:最稳定的被动加载(按引用);最易变的条件路由加载 `recommended` + +**检测**:`git log --oneline | wc -l` — 被多处引用的文件若变更频率高于其依赖者,考虑分拆 + +## §4 信息分类(IA-4) + +> **核心**:不同类型的信息不混放——步骤、约束、概念、结构、路由、实例各归其位。 + +**规则**: +1. 一个文件包含一种主信息类型 `recommended` +2. 任务内容(执行指令)与参考内容(背景知识)分开存放 `recommended` +3. 背景知识放 `knowledge/`,约束放 `rules/`,执行步骤放 `*-procedures.md` `recommended` + +**检测**:审查文件内容,若同一文件包含大段理论 + 分步指令 + 格式定义,则违反 IA-4 + +## §5 按需加载(IA-5) + +> **核心**:只加载当前上下文所需的最小信息集——上下文窗口是硬约束,浪费即降质。 + +**规则**: +1. `description` 仅提供触发信息;完整内容在激活后才加载 `iron rule` +2. 复杂 Skill 使用两级延迟加载:SKILL.md(路由)→ procedures(操作) `recommended` +3. `knowledge/` 文件被动加载(按引用),不主动注入上下文 `recommended` +4. 
上下文预算:description < 300 字符,SKILL.md < 500 行,单次执行加载 < 800 行 `recommended` + +加载策略详见 `references/plugin-info-layers.md`。 + +**正确**:SKILL.md 加载 80 行路由表,根据当前状态加载一个 200 行 procedures 文件 +**错误**:SKILL.md 一次性加载全部 6 个 procedures 文件(1200+ 行),不论当前状态 + +**检测**:衡量每次 Skill 调用消耗的上下文 token;单次加载超 800 行需警示 + +**预防合理化**:`"提前加载省得来回切换" → 多加载的内容不只浪费 token,还会污染推理(步骤泄漏)` + +## §6 单一权威(IA-6) + +> **核心**:每条信息有且仅有一个权威定义点——DRY 从"消除重复"变为"管理重复"。 + +**规则**: +1. 每个信息项有一个权威文件;派生文件注明来源 `recommended` +2. 多处出现的内容使用"源 → 派生"树模型管理 `recommended` +3. 变更沿 源 → 派生 方向传播;派生文件不得单方面修改权威内容 `recommended` + +**合法反 DRY 场景**: +- §0 速查摘要——为快速索引的刻意冗余(IA-5 按需加载优先于 IA-6) +- 权威委托标记——上层用"(权威源)"后缀委托下层详细定义 + +**正确**:SKILL.md 定义子命令(源)→ rules.md §0 索引子命令(派生,注明"详见 SKILL.md") +**错误**:SKILL.md 和 rules.md 各自独立定义子命令行为,无权威声明 + +**检测**:对每个多处出现的内容,确认有一个文件被显式标记为权威;grep 确认派生文件引用它 + +## §7 确定性分级(IA-7) + +> **核心**:约束的执行保障级别必须匹配其关键程度——安全关键规则不能只靠文本。 + +**规则**: +1. 约束按关键程度选择执行机制(四级保障详见 `references/component-reference.md`) `iron rule` +2. 高级安全规则采用"三重保险":Hook 阻断 + Rules 声明 + 铁律标记 `recommended` + +**检测**:每条铁律都有对应的 Hook 或自动化测试执行保障 + +## §8 可发现性优先(IA-8) + +> **核心**:入口信息为发现(When)而设计,不为完整(What)而设计。 + +**规则**: +1. SKILL.md `description` 仅写触发条件(When),不写行为描述(What) `iron rule` +2. `description` 使用具体触发关键词("Use when user says '开始做/实现/修复'") `recommended` +3. 每个 rules/ 和 _schema/ 文件提供 §0 速查卡片 `recommended` + +**正确**:`description: Use when user requests development work or says "implement/fix/build"` +**错误**:`description: Analyzes code quality, runs gate checks, generates diff summary, updates status` + +**检测**:审查所有 `description` 字段——不应包含描述内部动作的动词 + +**预防合理化**:`"写清楚做什么帮助用户理解" → description 的消费者是 Claude 的路由逻辑,不是用户;写了 What 会导致 Claude 跳过读完整 SKILL.md 直接按摘要行动` + +## §9 认知清晰(IA-9) + +> **核心**:面向 LLM 的指令精确无歧义——阻止合理化绕过。 + +**规则**: +1. 平台约束(frontmatter 字段、Hook 大小写、路径格式、零摩擦入门)严格遵循官方规格(详见 `plugin-dev-spec.md`) `iron rule` +2. 
铁律配反合理化清单——列举可能的绕过借口及反驳 `recommended` + +**正确**:铁律 + 反合理化:"IR: 需人工审批。绕过借口:'太简单了' → 简化审批已覆盖此场景;'用户信任我' → 信任 ≠ 跳过" +**错误**:模糊规则:"重要变更大概应该找人审一下" + +**检测**:每条铁律可被自动化测试或确定性检查验证 + +**预防合理化**:`"这条规则含义很明确,不需要反合理化清单" → 恰恰是'看起来明确'的规则最容易被合理化绕过,因为 LLM 会找到你没想到的边界情况` + +## §10 契约隔离(IA-10) + +> **核心**:共享数据格式由独立契约层定义——生产方和消费方都依赖契约,不依赖对方内部实现。 + +**规则**: +1. 有状态交互的 Plugin 定义独立 Schema 文件 `recommended` +2. Skill 输出必须符合 Schema 定义 `recommended` +3. Schema 变更支持影响追溯(哪些 Skill 受影响) `recommended` + +**契约交互模型**: + +``` +Skill A(生产方) Schema(契约) Skill B(消费方) +写入 state.md --> record-format.md <-- 读取 state.md +符合 Schema 定义字段/类型/ 按 Schema 校验 + 必填章节 +``` + +**正确**:Skill A(写入)→ `knowledge/_schema/cr-format.md`(契约)← Skill B(读取)——双方都依赖 Schema +**错误**:Skill A 用自创格式写入;Skill B 基于对 Skill A 输出的逆向假设解析 + +**检测**:每个共享状态文件有对应 Schema 文件;自动化测试验证结构合规 + +**预防合理化**:`"只有一个 Skill 写这个文件,不需要 Schema" → 未来会有新 Skill 读取它;Schema 是预防性投资,不是事后补救` + +## §11 单一职责(IA-11) + +> **核心**:每个文件只有一个核心职责——一个主要的变更原因。 + +**与 IA-2/IA-4 的关系**:IA-2 按抽象层级分文件,IA-4 按信息类型分文件,IA-11 提供判断"何时需要分拆"的元标准——当文件内容有多个独立的变更原因时。 + +**规则**: +1. 每个文件有一个可清晰陈述的核心职责 `recommended` +2. 
越界内容压缩为引用优先于创建新文件("详见 X") `recommended` + +**检测**:若某章节的变更原因与文件其他部分完全无关,则疑似 SRP 违规 + +**预防合理化**:`"放在一起方便查找" → 职责混杂导致维护时意外副作用;交叉引用同样可达` diff --git a/.claude/rules/plugin-dev-spec.md b/.claude/rules/plugin-dev-spec.md index fa009c3..bc8089c 100644 --- a/.claude/rules/plugin-dev-spec.md +++ b/.claude/rules/plugin-dev-spec.md @@ -1,26 +1,52 @@ # Claude Code 组件开发规范 -> **职责**:开发 devpace Plugin 时 Claude 必须遵循的组件规范。基于官方文档(2026-02)。 +> **职责**:开发 devpace Plugin 时 Claude 必须遵循的组件规范。基于官方文档。 -本项目是一个 Claude Code Plugin。以下是组件开发规范,开发过程中必须遵循。 +## §0 速查卡片 + +### Plugin 目录规则 + +| 位置 | 内容 | 注意 | +|------|------|------| +| `.claude-plugin/` | 仅 `plugin.json` + `marketplace.json` | 组件不放这里 | +| Plugin 根目录 | `commands/`、`agents/`、`skills/`、`hooks/`、`rules/` | 所有组件在此 | + +### SKILL.md 关键字段 + +| 字段 | 用途 | 常见错误 | +|------|------|---------| +| `description` | 触发条件(When),不写行为(What) | 写成行为摘要导致跳过读 SKILL.md | +| `allowed-tools` | 免确认工具,逗号分隔 | — | +| `disable-model-invocation` | `true` = 仅用户可调用 | — | +| `context` | `fork` = 子 agent 运行 | — | + +### 分拆模式 + +SKILL.md 放"做什么"(输入/输出/路由),详细规则超 ~50 行拆出 `*-procedures.md`("怎么做")。参考 `pace-dev/` 和 `pace-change/`。 + +### 查证优先级 + +1. `claude-code-guide` agent → 2. 官方文档 `code.claude.com/docs/en/` → 3. 
`claude --debug` + +### 组件参考 + +Agent / Hook / MCP 详细规格 → `.claude/references/component-reference.md`(按需加载) ## Plugin 结构 ``` -devpace/ # Plugin 根目录 -├── .claude-plugin/ # plugin.json + marketplace.json -├── commands/ # 命令文件(根目录,不在 .claude-plugin/ 内) -├── agents/ # Agent 定义(根目录) -├── skills/ # Skill 目录(根目录,自动发现) -├── hooks/ # Hook 配置 -├── rules/ # Rules 文件(自动加载) -├── output-styles/ # 输出风格定义(plugin.json outputStyles 声明) -├── settings.json # Plugin 默认配置(Agent 设置等) -└── .mcp.json # MCP Server 配置 +devpace/ +├── .claude-plugin/ +├── commands/ +├── agents/ +├── skills/ +├── hooks/ +├── rules/ +├── output-styles/ +├── settings.json +└── .mcp.json ``` -**关键规则**:只有 `plugin.json` 和 `marketplace.json` 放在 `.claude-plugin/` 内,其余组件(commands/、agents/、skills/、hooks/)必须在 Plugin 根目录。 - ## plugin.json 当前采用最小格式,仅含 `name`、`description`、`author`。`name` 是唯一必填字段(当 manifest 存在时),同时作为 Skill 的命名空间前缀(`devpace:pace-init`)。 @@ -65,8 +91,6 @@ devpace/ # Plugin 根目录 **字符串替换**:Skill 内容中可使用 `$ARGUMENTS`(全部参数)、`$0`/`$1`(按位参数)、`` !`command` ``(预处理器,执行 shell 命令并替换输出)。 -**分拆模式**:SKILL.md 放"做什么"(输入/输出/高层步骤),详细规则超 ~50 行时拆出 `*-procedures.md`("怎么做")。参考 `pace-dev/` 和 `pace-change/`。 - ### SKILL.md 章节顺序规范 标准章节顺序如下(可选章节仅在需要时出现): @@ -82,85 +106,6 @@ devpace/ # Plugin 根目录 ## 输出 ``` -## Agent 定义 - -Agent 文件放在 `agents/` 目录,frontmatter 必须包含 `name` 和 `description`。 - -**合法 frontmatter 字段**:`name`(必填)、`description`(必填)、`tools`、`disallowedTools`、`model`、`color`、`permissionMode`、`maxTurns`、`skills`、`mcpServers`、`memory`、`hooks`、`isolation`。 - -`color` 字段:Agent 在 UI 中的背景色标识。官方 Plugin(feature-dev、plugin-dev)使用此字段。可选值:`blue`、`cyan`、`green`、`yellow`、`red`、`magenta`。 - -`memory` 字段(`user`/`project`/`local`):Agent 记忆自动持久化到 `~/.claude/agent-memory//`(user)或 `.claude/agent-memory//`(project)。下次 fork 时自动加载。 - -`isolation` 字段(`worktree`):Agent 在独立的 git worktree 中运行。无变更时自动清理。 - -通过 Task 工具调用:`Task(subagent_type="agent-name", prompt="...", description="...")`。子 agent 不能再嵌套调用 Task。 - -## Hooks - -Hook 事件名称区分大小写。可用事件: - -| 事件 | 触发时机 
| 可阻断? | -|------|---------|---------| -| `PreToolUse` | 工具执行前 | 是(exit 2) | -| `PostToolUse` | 工具执行成功后 | 否 | -| `PostToolUseFailure` | 工具执行失败后 | 否 | -| `UserPromptSubmit` | 用户提交 prompt | 是 | -| `PreCompact` | 上下文压缩前(manual/auto) | 否 | -| `Stop` | Claude 完成响应 | 是 | -| `SessionStart` / `SessionEnd` | 会话开始/结束 | 否 | -| `SubagentStart` / `SubagentStop` | 子 agent 启停 | 部分 | -| `TeammateIdle` / `TaskCompleted` | 团队协作事件 | 是(exit 2) | - -配置位置(按优先级):managed settings → `.claude/settings.json`(项目共享)→ `.claude/settings.local.json`(项目本地)→ `~/.claude/settings.json`(全局)→ Plugin `hooks/hooks.json`。 - -Hook 脚本中使用 `${CLAUDE_PLUGIN_ROOT}` 引用 Plugin 根目录。exit 0 = 成功,exit 2 = 阻断,其他 = 非阻断错误。 - -### Hook 类型 - -| 类型 | 说明 | 超时默认 | -|------|------|---------| -| `command` | 执行 shell 命令,通过 stdin 接收 JSON 输入 | 无默认 | -| `prompt` | LLM 评估 prompt 内容,决定是否放行 | 30s | -| `agent` | LLM agent 执行(有工具访问权限),决定是否放行 | 60s | - -`command` 类型额外支持 `"async": true`(后台执行,不阻塞主流程)。`prompt`/`agent` 类型具有语义理解能力,适合替代简单的正则匹配做复杂判断。 - -### Skill 级 Hooks - -SKILL.md 的 `hooks` frontmatter 字段支持定义仅在该 Skill 激活时生效的 Hook: - -```yaml -hooks: - PreToolUse: - - matcher: - tool_name: "Write|Edit" - hooks: - - type: prompt - prompt: "验证此写入是否合法..." - timeout: 15 -``` - -Skill 级 Hook 与全局 hooks.json 互补——全局做通用检查,Skill 级做精细控制。 - -## MCP Server 配置 - -项目级配置放在根目录 `.mcp.json`,格式: - -```json -{ - "mcpServers": { - "server-name": { - "command": "path/to/server", - "args": ["--flag"], - "env": { "KEY": "${ENV_VAR}", "KEY2": "${VAR:-default}" } - } - } -} -``` - -Plugin 内部引用路径时使用 `${CLAUDE_PLUGIN_ROOT}`。也可在 `plugin.json` 的 `mcpServers` 字段内联定义。 - ## 常见陷阱 | 问题 | 原因 | 解决 | @@ -173,74 +118,6 @@ Plugin 内部引用路径时使用 `${CLAUDE_PLUGIN_ROOT}`。也可在 `plugin.j | Plugin 路径用绝对路径 | 必须相对且以 `./` 开头 | 改为相对路径 | | MCP 环境变量不展开 | 语法错误 | 使用 `${VAR}` 或 `${VAR:-default}` | -## 规范查证方法 - -不确定时,按优先级查证: - -1. `Task(subagent_type="claude-code-guide", prompt="查询 [具体问题]")`——内置 agent,可访问官方文档 -2. 官方文档:`https://code.claude.com/docs/en/`(plugins、skills、hooks、mcp、sub-agents、agent-teams) -3. 
`claude --debug` 查看加载日志排查问题 - -### 官方 plugin-dev 工具(推荐) - -Anthropic 官方 plugin-dev Plugin 提供综合开发工具。安装后可用于 devpace 开发验证: - -| 组件 | 用途 | 使用场景 | -|------|------|---------| -| **plugin-validator** Agent | 10 步综合验证(Manifest/目录/Skills/Hooks/安全) | 任何 Plugin 结构变更后 | -| **skill-reviewer** Agent | Skill 质量审查(description/内容/渐进披露) | Skill 新增或修改后 | -| **agent-creator** Agent | AI 辅助 Agent 创建 | 新增 Agent 定义时 | -| `/plugin validate` | 内置命令,验证 plugin.json 基本结构 | 快速检查 | - -安装:`/plugin install plugin-dev@claude-plugins-official` - ## skill-creator 集成约定 -`/skill-creator` 的评估工作区(`skills/-workspace/`)已通过 `.gitignore` 排除入库。使用时遵循以下约定: - -### 产物位置规范 - -| 产物 | 位置 | 入库 | -|------|------|------| -| 评估工作区 | `skills/-workspace/`(skill-creator 默认行为) | 否(.gitignore) | -| behavioral eval | `tests/evaluation//evals.json` | 是(权威源) | -| trigger eval | `tests/evaluation//trigger-evals.json` | 是(权威源) | -| `--static` HTML | `skills/-workspace/iteration-N/review.html` | 否 | -| 跨 Skill 文件 | `tests/evaluation/_cross-cutting/` | 是 | - -**`--static` 路径规则**:当 `generate_review.py` 需要使用 `--static` 模式时,输出路径必须放在工作区内:`/iteration-N/review.html`,不要使用 `/tmp/` 或其他外部路径。 - -### Eval 目录结构 - -``` -tests/evaluation/ -├── _cross-cutting/ # 跨 Skill 全局文件(acceptance-matrix、shared-assertions 等) -├── pace-init/ # per-Skill 子目录(目录名 = Skill 名) -│ ├── evals.json # behavioral eval(统一命名,不带 Skill 名前缀) -│ └── trigger-evals.json # trigger eval -├── pace-dev/ -│ ├── evals.json -│ └── trigger-evals.json -└── ... 
-``` - -**命名规则**: -- 子目录名 = Skill 目录名(如 `pace-dev`) -- eval 文件统一命名 `evals.json` 和 `trigger-evals.json`(由目录区分 Skill) -- `_cross-cutting/` 前缀 `_` 确保排在最前 - -### 三层评估体系 - -| 层级 | 文件 | 用途 | 执行时机 | -|------|------|------|---------| -| T1 Trigger | `trigger-evals.json` | 触发精度(~20 查询/Skill) | description 修改后 | -| T2 Behavioral | `evals.json` | 行为正确性断言 | procedures/SKILL.md 修改后 | -| T3 Full Cycle | skill-creator 完整流程 | with/without 对比 + grading | 重大重构/新 Skill | - -### Eval 格式权威来源 - -eval JSON 格式由 **skill-creator 自身**(`references/schemas.md`)定义。devpace 不在 `knowledge/_schema/` 中重复定义 eval 格式——eval 是开发层关注点,不属于产品层。 - -### 跨 Skill 交叉污染测试 - -创建 trigger eval 时,必须包含兄弟 Skill 的典型查询作为负面测试用例。重点测试对见 `tests/evaluation/_cross-cutting/shared-assertions.md`。 +详见 `.claude/references/skill-creator-integration.md`(按需参考,仅在创建/评估 Skill 时使用)。 diff --git a/.claude/rules/project-structure.md b/.claude/rules/project-structure.md new file mode 100644 index 0000000..e77ddb3 --- /dev/null +++ b/.claude/rules/project-structure.md @@ -0,0 +1,125 @@ +# devpace 项目目录结构 + +> **职责**:文件放置规则。新建文件时查此文件确定目标位置。 + +## §0 速查卡片 + +### 目录-层级映射表 + +| 目录 | 层级 | 自动加载 | 随 Plugin 分发 | 说明 | +|------|------|---------|---------------|------| +| `.claude-plugin/` | 产品 | — | 是 | 仅 plugin.json + marketplace.json | +| `rules/` | 产品 | 是(Rules) | 是 | Plugin 运行时行为规则 | +| `skills/` | 产品 | 按触发 | 是 | Skill 定义(每 Skill 一个目录) | +| `knowledge/` (含 `_schema/`) | 产品 | 按引用 | 是 | 理论、指标、数据格式契约 | +| `hooks/` | 产品 | 事件驱动 | 是 | Hook 脚本 + hooks.json | +| `agents/` | 产品 | 按调用 | 是 | Agent 定义 | +| `output-styles/` | 产品 | 按选择 | 是 | 输出风格定义 | +| `settings.json` | 产品 | 是 | 是 | Plugin 默认配置 | +| `.claude/rules/` | 开发 | 是(Rules) | 否 | 开发规范(自动加载) | +| `.claude/references/` | 开发 | 按需读取 | 否 | 参考文档(不自动加载) | +| `docs/` | 开发 | 否 | 否 | design/、planning/、features/、research/、plans/、brand/、scratch/ | +| `dev-scripts/` | 开发 | 否 | 否 | 开发工具脚本 | +| `eval/` | 开发 | 否 | 否 | Skill 评估自动化工具 | +| `tests/` | 开发 | 否 | 否 | 测试套件 | +| `examples/` | 开发 | 否 | 否 | 示例项目 | +| `.github/` | 
CI/配置 | — | 否 | GitHub Actions、模板 | +| `.githooks/` | CI/配置 | — | 否 | Git hooks(pre-commit 等) | + +### 配置文件速查 + +**必须**:`.claude-plugin/plugin.json`(manifest) +**可选**:`marketplace.json`、`settings.json`、`hooks/hooks.json`、`Makefile`、`pytest.ini`、`.markdownlint-cli2.jsonc` + +### 禁止事项 + +- 组件放 `.claude-plugin/`(仅 plugin.json + marketplace.json) +- 测试文件散在 `skills/` 中(统一放 `tests/`) +- 分层架构约束(5 条)→ 详见 CLAUDE.md "分层架构"章节 + +## §1 产品层目录结构 + +``` +devpace/ +├── .claude-plugin/ +│ ├── plugin.json +│ └── marketplace.json +├── rules/ +│ └── devpace-rules.md +├── skills/ +│ ├── pace-xxx/ ← SKILL.md + *-procedures*.md +│ └── scripts/ +├── knowledge/ +│ ├── _schema/*-format.md +│ └── *.md +├── hooks/ +│ ├── hooks.json +│ ├── lib/ +│ ├── skill/ +│ └── *.mjs / *.sh +├── agents/pace-*.md +├── output-styles/ +└── settings.json +``` + +## §2 开发层目录结构 + +``` +devpace/ +├── .claude/ +│ ├── CLAUDE.md +│ ├── rules/ +│ └── references/ +├── docs/ +│ ├── design/ +│ ├── planning/ +│ ├── features/ +│ ├── research/ +│ ├── plans/ +│ ├── brand/ +│ └── scratch/ +├── dev-scripts/ +├── eval/ +├── tests/ +│ ├── static/ +│ ├── evaluation/pace-xxx/ +│ ├── hooks/ +│ ├── integration/ +│ └── scenarios/ +└── examples/ +``` + +## §3 新文件放置决策树 + +``` +新文件 +├─ Plugin 运行时需要? +│ ├─ 行为规则 → rules/ +│ ├─ Skill → skills/pace-xxx/ +│ ├─ Schema → knowledge/_schema/ +│ ├─ 参考知识 → knowledge/ +│ ├─ Hook → hooks/(Skill 域 → hooks/skill/) +│ ├─ Agent → agents/ +│ └─ 输出风格 → output-styles/ +├─ 开发规范? +│ ├─ 自动加载规则 → .claude/rules/ +│ └─ 按需参考 → .claude/references/ +├─ 文档?(均在 docs/ 下) +│ → design/ | planning/ | features/ | research/ | plans/ +├─ 测试?(均在 tests/ 下) +│ → static/ | evaluation/pace-xxx/ | hooks/ | integration/ | scenarios/ +├─ Eval 工具? 
→ eval/ +├─ 脚本 → dev-scripts/ +├─ CI/CD → .github/workflows/ +└─ 不确定 → 先问,不要放项目根目录 +``` + +## §4 跨文档引用 + +| 内容 | 参见 | +|------|------| +| 分层架构 5 条硬性约束 | CLAUDE.md "分层架构"章节(权威源) | +| 组件格式(SKILL.md frontmatter 等) | `plugin-dev-spec.md` | +| 文件命名规范 | `common.md` | +| 信息架构原则(IA-1 至 IA-11) | `info-architecture.md` | +| 新建 Skill 同步清单 | `references/sync-checklists.md` | diff --git a/.githooks/pre-commit b/.githooks/pre-commit new file mode 100755 index 0000000..26acc5b --- /dev/null +++ b/.githooks/pre-commit @@ -0,0 +1,15 @@ +#!/bin/bash +# Lightweight pre-commit hook for devpace +# Runs fast quality checks (~seconds) before each commit. +# Enable: git config core.hooksPath .githooks +# or: make setup-hooks + +set -euo pipefail + +echo "▸ pre-commit: layer-check" +make layer-check + +echo "▸ pre-commit: lint" +make lint + +echo "✓ pre-commit checks passed" diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md index dcd7ee1..4d8f56a 100644 --- a/.github/pull_request_template.md +++ b/.github/pull_request_template.md @@ -14,12 +14,12 @@ ## Scope / 影响范围 - [ ] Product layer / 产品层(rules/, skills/, knowledge/, .claude-plugin/) -- [ ] Dev layer / 开发层(.claude/, docs/, tests/, scripts/) +- [ ] Dev layer / 开发层(.claude/, docs/, tests/, dev-scripts/) - [ ] Both / 两者 ## Checklist / 检查清单 -- [ ] `bash scripts/validate-all.sh` passes / 通过 +- [ ] `bash dev-scripts/validate-all.sh` passes / 通过 - [ ] Layer separation check passes / 分层检查通过 - [ ] `plugin.json` synced with filesystem / 与实际文件同步 - [ ] New Skills use legal frontmatter fields only / 新 Skill 仅使用合法字段 diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml index 6238906..5b1c93a 100644 --- a/.github/workflows/release.yml +++ b/.github/workflows/release.yml @@ -61,7 +61,7 @@ jobs: - name: Check layer separation run: | - result=$(grep -r "docs/\|\.claude/" rules/ skills/ knowledge/ 2>/dev/null || true) + result=$(grep -r --exclude-dir='*-workspace' "docs/\|\.claude/" rules/ skills/ 
knowledge/ 2>/dev/null || true) if [ -n "$result" ]; then echo "::error::Layer separation violation" echo "$result" @@ -90,7 +90,7 @@ jobs: - name: Extract changelog id: changelog run: | - NOTES=$(python3 scripts/extract-changelog.py "${{ steps.version.outputs.version }}") + NOTES=$(python3 dev-scripts/extract-changelog.py "${{ steps.version.outputs.version }}") # Write to file for gh release echo "$NOTES" > /tmp/release-notes.md diff --git a/.github/workflows/validate.yml b/.github/workflows/validate.yml index 76a4360..1d58cef 100644 --- a/.github/workflows/validate.yml +++ b/.github/workflows/validate.yml @@ -6,6 +6,15 @@ on: pull_request: branches: [main] workflow_dispatch: + inputs: + eval_skill: + description: 'Skill name for live eval (e.g. pace-dev, or "all")' + required: false + default: '' + eval_runs: + description: 'Runs per query for live eval' + required: false + default: '3' jobs: lint: @@ -24,7 +33,7 @@ jobs: - name: Check layer separation run: | - result=$(grep -r "docs/\|\.claude/" rules/ skills/ knowledge/ 2>/dev/null || true) + result=$(grep -r --exclude-dir='*-workspace' "docs/\|\.claude/" rules/ skills/ knowledge/ 2>/dev/null || true) if [ -n "$result" ]; then echo "::error::Layer separation violation — product layer references dev layer" echo "$result" @@ -51,3 +60,140 @@ jobs: - name: Run static tests run: pytest tests/static/ -v + + hooks: + name: Hook Tests (Node.js) + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Set up Node.js + uses: actions/setup-node@v4 + with: + node-version: '20' + + - name: Run hook tests + run: | + PASS=0 + FAIL=0 + for f in tests/hooks/test_*.mjs; do + [ -f "$f" ] || continue + if node --test "$f" > /dev/null 2>&1; then + PASS=$((PASS + 1)) + else + echo "FAIL: $f" + node --test "$f" 2>&1 | tail -20 + FAIL=$((FAIL + 1)) + fi + done + echo "Hook tests: $PASS passed, $FAIL failed" + [ "$FAIL" -eq 0 ] + + eval-stale: + name: Eval Staleness Check + runs-on: ubuntu-latest + steps: + - uses: 
actions/checkout@v4 + with: + fetch-depth: 0 + + - name: Detect stale evals + run: | + STALE=0 + for skill_dir in skills/pace-*/; do + skill=$(basename "$skill_dir") + case "$skill" in *-workspace) continue;; esac + eval_dir="tests/evaluation/$skill" + [ -d "$eval_dir" ] || continue + skill_ts=$(git log -1 --format=%ct -- "skills/$skill/" 2>/dev/null || echo 0) + eval_ts=$(git log -1 --format=%ct -- "$eval_dir/" 2>/dev/null || echo 0) + if [ "$skill_ts" -gt "$eval_ts" ] 2>/dev/null; then + echo "::warning::$skill — Skill updated after eval (eval may be stale)" + STALE=$((STALE + 1)) + fi + done + if [ "$STALE" -gt 0 ]; then + echo "::warning::$STALE Skill(s) have stale evals — consider updating" + fi + echo "Eval staleness check complete: $STALE stale" + + # P4.1: Offline regression check (zero API cost) + eval-regress: + name: Eval Regression Check (offline) + runs-on: ubuntu-latest + if: github.event_name == 'pull_request' + steps: + - uses: actions/checkout@v4 + + - name: Check for skill changes + id: skill-changes + run: | + CHANGED=$(git diff --name-only origin/main -- skills/ | head -1) + if [ -n "$CHANGED" ]; then + echo "has_changes=true" >> "$GITHUB_OUTPUT" + else + echo "has_changes=false" >> "$GITHUB_OUTPUT" + fi + + - name: Set up Python 3.12 + if: steps.skill-changes.outputs.has_changes == 'true' + uses: actions/setup-python@v5 + with: + python-version: '3.12' + + - name: Install dependencies + if: steps.skill-changes.outputs.has_changes == 'true' + run: pip install -r requirements-dev.txt + + - name: Run offline regression check + if: steps.skill-changes.outputs.has_changes == 'true' + run: | + echo "Comparing baseline vs latest results (no API calls)..." 
+ python3 -m eval regress || { + echo "::error::Eval regression detected — baseline vs latest comparison failed" + exit 1 + } + + - name: Upload regression report + if: steps.skill-changes.outputs.has_changes == 'true' && always() + uses: actions/upload-artifact@v4 + with: + name: eval-regress-report + path: tests/evaluation/regress/latest-report.json + if-no-files-found: ignore + + # P4.2: Live eval (manual dispatch, requires API key) + eval-live: + name: Live Eval (${{ github.event.inputs.eval_skill || 'skipped' }}) + runs-on: ubuntu-latest + if: github.event_name == 'workflow_dispatch' && github.event.inputs.eval_skill != '' + steps: + - uses: actions/checkout@v4 + + - name: Set up Python 3.12 + uses: actions/setup-python@v5 + with: + python-version: '3.12' + + - name: Install dependencies + run: pip install -r requirements-dev.txt + + - name: Run live trigger eval + env: + ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} + run: | + SKILL="${{ github.event.inputs.eval_skill }}" + RUNS="${{ github.event.inputs.eval_runs }}" + if [ "$SKILL" = "all" ]; then + make eval-trigger RUNS="$RUNS" + else + make eval-trigger-one S="$SKILL" RUNS="$RUNS" + fi + + - name: Upload eval results + if: always() + uses: actions/upload-artifact@v4 + with: + name: eval-results + path: tests/evaluation/*/results/latest.json + if-no-files-found: ignore diff --git a/.gitignore b/.gitignore index a2874a5..97feb54 100644 --- a/.gitignore +++ b/.gitignore @@ -27,6 +27,9 @@ venv/ # skill-creator 评估工作区(运行时产物,不入库) skills/*-workspace/ +# Eval 运行结果(不入库) +tests/evaluation/*/results/ + # Playwright MCP .playwright-mcp/ diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 8cd5ae4..d4cdaef 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -21,7 +21,7 @@ Thank you for your interest in contributing to devpace! 
This guide covers everyt | Step 2 | Understand the architecture (10 min) | "Project Structure" and "Plugin runtime architecture" sections in this file | | Step 3 | Understand design intent (10 min) | `docs/design/vision.md` + `docs/design/design.md` §0 quick reference | | Step 4 | Understand dev conventions (5 min) | Three files in `.claude/rules/` (`common.md` / `plugin-dev-spec.md` / `dev-workflow.md`) | -| Step 5 | Hands-on verification (2 min) | `make setup && make validate && claude --plugin-dir ./` | +| Step 5 | Hands-on verification (2 min) | `make init && make check && claude --plugin-dir ./` | ### Quick reference for key files @@ -42,11 +42,11 @@ Thank you for your interest in contributing to devpace! This guide covers everyt git clone https://github.com/arch-team/devpace.git cd devpace -# Install test dependencies -pip install -r requirements-dev.txt +# One-step setup (Python deps + git hooks + tool checks) +make init -# Verify environment -make validate +# Quick verification +make check # (Recommended) Install official dev tools # Run in a Claude Code session: @@ -60,7 +60,7 @@ devpace has a strict **layered architecture**. You must understand this before m | Layer | Directories | Purpose | Distributed? | |-------|-------------|---------|:------------:| | **Product layer** | `rules/`, `skills/`, `knowledge/`, `.claude-plugin/`, `hooks/`, `agents/`, `output-styles/`, `settings.json` | Plugin runtime assets delivered to users | Yes | -| **Dev layer** | `.claude/`, `docs/`, `tests/`, `scripts/` | Internal dev conventions and documentation | No | +| **Dev layer** | `.claude/`, `docs/`, `tests/`, `dev-scripts/` | Internal dev conventions and documentation | No | **Hard constraint**: Product layer files must not reference dev layer files (`docs/` or `.claude/`). 
Verification: @@ -84,7 +84,7 @@ graph LR end subgraph "Dev layer (not distributed)" ClaudeDir[".claude/"] --> Docs["docs/"] - Tests["tests/"] --> Scripts["scripts/"] + Tests["tests/"] --> Scripts["dev-scripts/"] end ClaudeDir -.->|"may reference"| RulesP RulesP -.-x|"must not reference"| Docs @@ -134,7 +134,7 @@ graph TB ```bash # Full validation suite (recommended before PR) -bash scripts/validate-all.sh +bash dev-scripts/validate-all.sh # Markdown linting (product layer) make lint @@ -241,13 +241,13 @@ test(scripts): add hook cross-platform test 1. Create a feature branch from `main` 2. Make changes following the guidelines above -3. Run the full validation suite: `bash scripts/validate-all.sh` +3. Run the full validation suite: `bash dev-scripts/validate-all.sh` 4. Verify plugin loading: `claude --plugin-dir ./` 5. Write a clear PR description explaining what changed and why ### PR Checklist -- [ ] `bash scripts/validate-all.sh` passes +- [ ] `bash dev-scripts/validate-all.sh` passes - [ ] Layer separation check passes (no product → dev references) - [ ] `plugin.json` in sync with actual files - [ ] New Skills use only valid frontmatter fields diff --git a/CONTRIBUTING_zh.md b/CONTRIBUTING_zh.md index 71336c8..be880d0 100644 --- a/CONTRIBUTING_zh.md +++ b/CONTRIBUTING_zh.md @@ -20,7 +20,7 @@ | 第 2 步 | 理解架构(10 min) | 本文件的"项目结构"和"插件运行时架构"两节 | | 第 3 步 | 理解设计意图(10 min) | `docs/design/vision.md` + `docs/design/design.md` §0 速查卡片 | | 第 4 步 | 理解开发规范(5 min) | `.claude/rules/` 三个文件(`common.md` / `plugin-dev-spec.md` / `dev-workflow.md`) | -| 第 5 步 | 动手验证(2 min) | `make setup && make validate && claude --plugin-dir ./` | +| 第 5 步 | 动手验证(2 min) | `make init && make check && claude --plugin-dir ./` | ### 权威文件速查表 @@ -41,11 +41,11 @@ git clone https://github.com/arch-team/devpace.git cd devpace -# 安装测试依赖 -pip install -r requirements-dev.txt +# 一键初始化(Python 依赖 + git hooks + 工具检查) +make init -# 验证环境 -make validate +# 快速验证 +make check # (推荐)安装官方开发工具 # 在 Claude Code 会话中执行: @@ 
-59,7 +59,7 @@ devpace 有严格的**分层架构**。在做任何修改前必须理解这一 | 层次 | 目录 | 用途 | 是否分发 | |------|------|------|:--------:| | **产品层** | `rules/`、`skills/`、`knowledge/`、`.claude-plugin/`、`hooks/`、`agents/`、`output-styles/`、`settings.json` | 交付给用户的插件运行时资产 | 是 | -| **开发层** | `.claude/`、`docs/`、`tests/`、`scripts/` | 内部开发规范和文档 | 否 | +| **开发层** | `.claude/`、`docs/`、`tests/`、`dev-scripts/` | 内部开发规范和文档 | 否 | **硬性约束**:产品层文件不得引用开发层文件(`docs/` 或 `.claude/`)。验证方法: @@ -83,7 +83,7 @@ graph LR end subgraph "开发层(不分发)" ClaudeDir[".claude/"] --> Docs["docs/"] - Tests["tests/"] --> Scripts["scripts/"] + Tests["tests/"] --> Scripts["dev-scripts/"] end ClaudeDir -.->|"可以引用"| RulesP RulesP -.-x|"禁止引用"| Docs @@ -133,7 +133,7 @@ graph TB ```bash # 完整验证套件(PR 前推荐) -bash scripts/validate-all.sh +bash dev-scripts/validate-all.sh # Markdown 格式检查(产品层) make lint @@ -240,13 +240,13 @@ test(scripts): add hook cross-platform test 1. 从 `main` 创建功能分支 2. 按照上述指南进行修改 -3. 运行完整验证套件:`bash scripts/validate-all.sh` +3. 运行完整验证套件:`bash dev-scripts/validate-all.sh` 4. 验证插件加载:`claude --plugin-dir ./` 5. 
编写清晰的 PR 描述,说明改了什么以及为什么 ### PR 检查清单 -- [ ] `bash scripts/validate-all.sh` 通过 +- [ ] `bash dev-scripts/validate-all.sh` 通过 - [ ] 分层检查通过(无产品→开发引用) - [ ] `plugin.json` 与实际文件同步 - [ ] 新 Skill 仅使用合法的 frontmatter 字段 diff --git a/Makefile b/Makefile index bd536c3..4bfca26 100644 --- a/Makefile +++ b/Makefile @@ -1,21 +1,74 @@ -.PHONY: help test validate lint layer-check plugin-load setup clean release-check bump \ - eval-trigger eval-trigger-one eval-behavior eval-coverage eval-stale +.PHONY: help init setup setup-hooks clean check test test-static test-hooks \ + validate lint layer-check plugin-load release-check bump \ + eval-trigger eval-trigger-one eval-behavior eval-behavior-one eval-behavior-all eval-coverage eval-stale eval-all \ + eval-trigger-smoke eval-trigger-deep eval-fix eval-fix-diff eval-fix-apply \ + eval-regress eval-regress-offline eval-baseline-save eval-baseline-diff eval-baseline-save-all \ + eval-trigger-changed + +# Eval 配置 +RUNS ?= 3 +TIMEOUT ?= 90 +SMOKE_N ?= 5 +MAX_TURNS ?= 5 +MODEL ?= + +# Skill 列表(排除 *-workspace 目录) +SKILLS := $(shell ls -d skills/pace-*/ 2>/dev/null | xargs -I{} basename {} | grep -v '\-workspace$$') help: ## 显示帮助 - @grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-15s\033[0m %s\n", $$1, $$2}' + @awk 'BEGIN {FS = ":.*##"; printf "Usage: make \033[36m\033[0m\n"} \ + /^##@/ {printf "\n\033[1m%s\033[0m\n", substr($$0, 5)} \ + /^[a-zA-Z_-]+:.*?## / {printf " \033[36m%-18s\033[0m %s\n", $$1, $$2}' $(MAKEFILE_LIST) + +##@ Development + +init: setup setup-hooks ## 完整开发环境初始化(依赖 + hooks + lint 工具) + @command -v node >/dev/null 2>&1 || echo "⚠️ Node.js not found — needed for lint & hook tests" + @command -v markdownlint-cli2 >/dev/null 2>&1 || \ + (command -v npx >/dev/null 2>&1 && echo "ℹ️ markdownlint-cli2 will run via npx") || \ + echo "⚠️ markdownlint-cli2 not available — run: npm install -g markdownlint-cli2" + @echo "✅ Development environment ready. Run 'make check' to verify." 
+ +setup: ## 安装 Python 开发依赖 + pip install -r requirements-dev.txt + +setup-hooks: ## 启用本地 git hooks + git config core.hooksPath .githooks + @echo "Git hooks path set to .githooks" + +clean: ## 清理缓存文件和 eval workspace + find . -type d -name __pycache__ -exec rm -rf {} + 2>/dev/null || true + find . -type d -name .pytest_cache -exec rm -rf {} + 2>/dev/null || true + find . -type f -name "*.pyc" -delete 2>/dev/null || true + find . -type d -name "*-workspace" -path "*/skills/*" -exec rm -rf {} + 2>/dev/null || true + +plugin-load: ## 以插件模式启动 Claude + @command -v claude >/dev/null 2>&1 || { echo "Error: claude CLI not found. Install: https://docs.anthropic.com/en/docs/claude-code"; exit 1; } + claude --plugin-dir ./ -test: ## 运行静态测试 +##@ Testing + +check: layer-check lint test ## 快速本地验证(无需 claude CLI) + +test-static: ## 仅运行 Python 静态测试 pytest tests/static/ -v +test-hooks: ## 运行 Node.js Hook 测试 + node --test tests/hooks/test_*.mjs + +test: test-static test-hooks ## 运行所有测试(静态 + Hook) + +##@ Quality + +validate: ## 运行完整验证(含集成测试) + bash dev-scripts/validate-all.sh + lint: ## Markdown 格式检查(产品层) npx markdownlint-cli2 "rules/**/*.md" "skills/**/*.md" "knowledge/**/*.md" -validate: ## 运行完整验证 - bash scripts/validate-all.sh - layer-check: ## 检查分层完整性 @echo "检查产品层是否引用开发层..." - @result=$$(grep -r "docs/\|\.claude/" rules/ skills/ knowledge/ 2>/dev/null || true); \ + @result=$$(grep -r --exclude-dir='*-workspace' "docs/\|\.claude/" rules/ skills/ knowledge/ 2>/dev/null || true); \ if [ -z "$$result" ]; then \ echo "通过:产品层未引用开发层"; \ else \ @@ -24,20 +77,11 @@ layer-check: ## 检查分层完整性 exit 1; \ fi -plugin-load: ## 以插件模式启动 Claude - claude --plugin-dir ./ +##@ Release -setup: ## 安装开发依赖 - pip install -r requirements-dev.txt - -clean: ## 清理缓存文件 - find . -type d -name __pycache__ -exec rm -rf {} + 2>/dev/null || true - find . -type d -name .pytest_cache -exec rm -rf {} + 2>/dev/null || true - find . 
-type f -name "*.pyc" -delete 2>/dev/null || true - -release-check: ## 预发布验证(validate-all + 版本一致性) +release-check: ## 预发布验证(validate-all + 版本 + eval 健康) @echo "Running pre-release checks..." - bash scripts/validate-all.sh + bash dev-scripts/validate-all.sh @PLUGIN_V=$$(python3 -c "import json; print(json.load(open('.claude-plugin/plugin.json'))['version'])"); \ MARKET_V=$$(python3 -c "import json; print(json.load(open('.claude-plugin/marketplace.json'))['plugins'][0]['version'])"); \ echo "plugin.json: $$PLUGIN_V"; \ @@ -47,18 +91,19 @@ release-check: ## 预发布验证(validate-all + 版本一致性) exit 1; \ fi; \ echo "Version consistency: OK ($$PLUGIN_V)" + @echo ""; echo "Checking eval freshness..." + @$(MAKE) eval-stale bump: ## 版本 bump(make bump V=1.5.0) @if [ -z "$(V)" ]; then echo "Usage: make bump V=1.5.0"; exit 1; fi - bash scripts/bump-version.sh $(V) + bash dev-scripts/bump-version.sh $(V) -# ── Eval targets ───────────────────────────────────────────────────────── +##@ Eval eval-coverage: ## 报告 eval 覆盖率(哪些 Skill 有/缺 eval) @echo "Eval coverage report:"; \ total=0; covered=0; \ - for skill in $(shell ls -d skills/pace-*/ | xargs -I{} basename {}); do \ - case "$$skill" in *-workspace) continue;; esac; \ + for skill in $(SKILLS); do \ total=$$((total + 1)); \ if [ -f "tests/evaluation/$$skill/evals.json" ] && [ -f "tests/evaluation/$$skill/trigger-evals.json" ]; then \ echo " ✅ $$skill (evals + trigger)"; covered=$$((covered + 1)); \ @@ -74,38 +119,160 @@ eval-coverage: ## 报告 eval 覆盖率(哪些 Skill 有/缺 eval) eval-stale: ## 检测过期 eval(Skill 变更但 eval 未更新) @echo "Stale eval detection:"; \ - for skill in $(shell ls -d skills/pace-*/ | xargs -I{} basename {}); do \ - case "$$skill" in *-workspace) continue;; esac; \ + for skill in $(SKILLS); do \ eval_dir="tests/evaluation/$$skill"; \ [ -d "$$eval_dir" ] || continue; \ - skill_ts=$$(git log -1 --format=%ct -- "skills/$$skill/" 2>/dev/null || echo 0); \ - eval_ts=$$(git log -1 --format=%ct -- "$$eval_dir/" 2>/dev/null || echo 0); \ - 
if [ "$$skill_ts" -gt "$$eval_ts" ] 2>/dev/null; then \ + skill_ts=$$(git log -1 --format=%ct -- "skills/$$skill/" 2>/dev/null); \ + [ -z "$$skill_ts" ] && continue; \ + eval_ts=$$(git log -1 --format=%ct -- "$$eval_dir/" 2>/dev/null); \ + if [ -z "$$eval_ts" ]; then \ + echo " ⚠️ $$skill — eval has no git history"; continue; \ + fi; \ + if [ "$$skill_ts" -gt "$$eval_ts" ]; then \ echo " ⚠️ $$skill — Skill updated after eval (eval may be stale)"; \ fi; \ done; \ echo "Done." -eval-trigger-one: ## 单 Skill 触发测试(make eval-trigger-one S=pace-dev) +eval-trigger-one: ## 单 Skill 触发测试(make eval-trigger-one S=pace-dev [RUNS=3] [TIMEOUT=90] [MODEL=]) @if [ -z "$(S)" ]; then echo "Usage: make eval-trigger-one S="; exit 1; fi - @eval_file="tests/evaluation/$(S)/trigger-evals.json"; \ - if [ ! -f "$$eval_file" ]; then echo "Error: $$eval_file not found"; exit 1; fi; \ - echo "Running trigger eval for $(S)..."; \ - skill-creator eval-trigger --skill "skills/$(S)" --evals "$$eval_file" + @echo "Running trigger eval for $(S)..." 
+ python3 -m eval trigger --skill "$(S)" --runs $(RUNS) --timeout $(TIMEOUT) --max-turns $(MAX_TURNS) $(if $(MODEL),--model $(MODEL)) eval-trigger: ## 全量触发测试(所有有 trigger-evals.json 的 Skill) - @echo "Running trigger evals for all Skills..."; \ - for skill in $(shell ls -d skills/pace-*/ | xargs -I{} basename {}); do \ - case "$$skill" in *-workspace) continue;; esac; \ - eval_file="tests/evaluation/$$skill/trigger-evals.json"; \ - [ -f "$$eval_file" ] || continue; \ + @start=$$(date +%s); passed=0; failed=0; \ + echo "Running trigger evals for all Skills..."; \ + for skill in $(SKILLS); do \ + [ -f "tests/evaluation/$$skill/trigger-evals.json" ] || continue; \ echo " → $$skill"; \ - skill-creator eval-trigger --skill "skills/$$skill" --evals "$$eval_file" || true; \ - done + if python3 -m eval trigger --skill "$$skill" --runs $(RUNS) --timeout $(TIMEOUT) --max-turns $(MAX_TURNS) $(if $(MODEL),--model $(MODEL)) > /dev/null; then \ + passed=$$((passed + 1)); \ + else \ + failed=$$((failed + 1)); \ + fi; \ + done; \ + elapsed=$$(($$(date +%s) - start)); \ + echo ""; echo "Done in $${elapsed}s ($$passed passed, $$failed failed)"; \ + if [ $$failed -gt 0 ]; then exit 1; fi + +eval-trigger-smoke: ## 快速冒烟测试(runs=1, 每 Skill 取 5 条关键查询) + @start=$$(date +%s); passed=0; failed=0; \ + echo "Running smoke trigger evals..."; \ + for skill in $(SKILLS); do \ + [ -f "tests/evaluation/$$skill/trigger-evals.json" ] || continue; \ + echo " → $$skill"; \ + if python3 -m eval trigger --skill "$$skill" --runs 1 --timeout $(TIMEOUT) --max-turns $(MAX_TURNS) --smoke --smoke-n $(SMOKE_N) $(if $(MODEL),--model $(MODEL)) > /dev/null; then \ + passed=$$((passed + 1)); \ + else \ + failed=$$((failed + 1)); \ + fi; \ + done; \ + elapsed=$$(($$(date +%s) - start)); \ + echo ""; echo "Smoke done in $${elapsed}s ($$passed passed, $$failed failed)"; \ + if [ $$failed -gt 0 ]; then exit 1; fi + +eval-trigger-deep: ## 深度测试(runs=5, 全部查询) + @start=$$(date +%s); passed=0; failed=0; \ + echo "Running deep 
trigger evals (runs=5)..."; \ + for skill in $(SKILLS); do \ + [ -f "tests/evaluation/$$skill/trigger-evals.json" ] || continue; \ + echo " → $$skill"; \ + if python3 -m eval trigger --skill "$$skill" --runs 5 --timeout $(TIMEOUT) --max-turns $(MAX_TURNS) $(if $(MODEL),--model $(MODEL)) > /dev/null; then \ + passed=$$((passed + 1)); \ + else \ + failed=$$((failed + 1)); \ + fi; \ + done; \ + elapsed=$$(($$(date +%s) - start)); \ + echo ""; echo "Deep done in $${elapsed}s ($$passed passed, $$failed failed)"; \ + if [ $$failed -gt 0 ]; then exit 1; fi + +eval-trigger-changed: ## 仅测试有变更的 Skill(make eval-trigger-changed [BASE=origin/main]) + @echo "Detecting changed skills..."; \ + changed=$$(python3 -m eval changed --base $(or $(BASE),origin/main) 2>/dev/null | grep "changed:" | awk '{print $$NF}'); \ + if [ -z "$$changed" ]; then echo " no skill changes detected"; exit 0; fi; \ + passed=0; failed=0; \ + for skill in $$changed; do \ + [ -f "tests/evaluation/$$skill/trigger-evals.json" ] || continue; \ + echo " → $$skill (changed)"; \ + if python3 -m eval trigger --skill "$$skill" --runs $(RUNS) --timeout $(TIMEOUT) --max-turns $(MAX_TURNS) $(if $(MODEL),--model $(MODEL)) > /dev/null; then \ + passed=$$((passed + 1)); \ + else \ + failed=$$((failed + 1)); \ + fi; \ + done; \ + echo "Changed skills: $$passed passed, $$failed failed"; \ + if [ $$failed -gt 0 ]; then exit 1; fi + +eval-behavior-one: eval-behavior ## eval-behavior 的别名(命名一致性) eval-behavior: ## 单 Skill 行为 eval(make eval-behavior S=pace-dev) @if [ -z "$(S)" ]; then echo "Usage: make eval-behavior S="; exit 1; fi @eval_file="tests/evaluation/$(S)/evals.json"; \ if [ ! 
-f "$$eval_file" ]; then echo "Error: $$eval_file not found"; exit 1; fi; \
 	echo "Running behavioral eval for $(S)..."; \
-	skill-creator eval --skill "skills/$(S)" --evals "$$eval_file"
+	bash eval/eval-runner.sh eval --skill "skills/$(S)" --evals "$$eval_file"
+
+eval-behavior-all: ## 全量行为测试(所有有 evals.json 的 Skill)
+	@start=$$(date +%s); passed=0; failed=0; \
+	echo "Running behavioral evals for all Skills..."; \
+	for skill in $(SKILLS); do \
+	  eval_file="tests/evaluation/$$skill/evals.json"; \
+	  [ -f "$$eval_file" ] || continue; \
+	  echo " → $$skill"; \
+	  if bash eval/eval-runner.sh eval --skill "skills/$$skill" --evals "$$eval_file"; then \
+	    passed=$$((passed + 1)); \
+	  else \
+	    failed=$$((failed + 1)); \
+	  fi; \
+	done; \
+	elapsed=$$(($$(date +%s) - start)); \
+	echo ""; echo "Done in $${elapsed}s ($$passed passed, $$failed failed)"; \
+	if [ $$failed -gt 0 ]; then exit 1; fi
+
+eval-all: ## 一键运行所有 eval(trigger + behavior + coverage + stale)
+	@echo "=== Trigger Evals ==="; $(MAKE) eval-trigger; \
+	echo ""; echo "=== Behavioral Evals ==="; $(MAKE) eval-behavior-all; \
+	echo ""; echo "=== Coverage Report ==="; $(MAKE) eval-coverage; \
+	echo ""; echo "=== Stale Detection ==="; $(MAKE) eval-stale
+
+##@ Eval Optimize
+
+eval-fix: ## 自动改进 description(make eval-fix S=pace-dev MODEL= [N=5])
+	@if [ -z "$(S)" ]; then echo "Usage: make eval-fix S= MODEL= [N=5]"; exit 1; fi
+	@if [ -z "$(MODEL)" ]; then echo "Error: MODEL is required (e.g. 
MODEL=claude-sonnet-4-20250514)"; exit 1; fi
+	python3 -m eval loop --skill "$(S)" --model "$(MODEL)" --iterations $(or $(N),5) --timeout $(TIMEOUT) --max-turns $(MAX_TURNS)
+
+eval-fix-diff: ## 对比当前 vs 最优 description(make eval-fix-diff S=pace-dev)
+	@if [ -z "$(S)" ]; then echo "Usage: make eval-fix-diff S="; exit 1; fi
+	python3 eval/apply.py diff --skill "$(S)"
+
+eval-fix-apply: ## 应用最优 description 到 SKILL.md(make eval-fix-apply S=pace-dev)
+	@if [ -z "$(S)" ]; then echo "Usage: make eval-fix-apply S="; exit 1; fi
+	python3 eval/apply.py apply --skill "$(S)"
+
+##@ Eval Regression
+
+eval-regress: ## 全量回归检查(重新运行 eval + 多维对比)
+	@echo "Running regression check..."
+	@$(MAKE) eval-trigger
+	@python3 -m eval regress
+
+eval-regress-offline: ## 离线回归检查(仅 JSON diff,零 API 调用)
+	@echo "Running offline regression check (baseline vs latest)..."
+	@python3 -m eval regress
+
+eval-baseline-save: ## 将当前 latest.json 保存为 baseline(make eval-baseline-save S=pace-dev)
+	@if [ -z "$(S)" ]; then echo "Usage: make eval-baseline-save S="; exit 1; fi
+	python3 -m eval baseline save --skill "$(S)"
+
+eval-baseline-diff: ## 对比当前结果与基线(make eval-baseline-diff S=pace-dev)
+	@if [ -z "$(S)" ]; then echo "Usage: make eval-baseline-diff S="; exit 1; fi
+	python3 -m eval baseline diff --skill "$(S)"
+
+eval-baseline-save-all: ## 保存所有 Skill 的基线
+	@for skill in $(SKILLS); do \
+	  [ -f "tests/evaluation/$$skill/results/latest.json" ] || continue; \
+	  echo " saving baseline: $$skill"; \
+	  python3 -m eval baseline save --skill "$$skill"; \
+	done
diff --git a/README.md b/README.md
index 8e36728..83e4ec3 100644
--- a/README.md
+++ b/README.md
@@ -4,6 +4,8 @@
 Give your Claude Code projects a steady development pace — requirements change, rhythm stays.
+> A development harness for Claude Code — rules, schemas, gates, and feedback loops that keep AI-assisted development traceable and measurable.
+
 ![version](https://img.shields.io/github/v/release/arch-team/devpace?label=version) ![license](https://img.shields.io/badge/license-MIT-green) ![type](https://img.shields.io/badge/Claude%20Code-Plugin-purple)
 ## Why devpace
diff --git a/README_zh.md b/README_zh.md
index 38e2cf2..6230565 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -4,6 +4,8 @@
 给 Claude Code 项目一个稳定的研发节奏——需求在变,节奏不乱。
+> Claude Code 的研发节奏 harness——用规则、Schema、门禁和反馈循环,让 AI 辅助开发可追溯、可度量。
+
 ![version](https://img.shields.io/github/v/release/arch-team/devpace?label=version) ![license](https://img.shields.io/badge/license-MIT-green) ![type](https://img.shields.io/badge/Claude%20Code-Plugin-purple)
 ## 为什么需要 devpace
diff --git a/scripts/bump-version.sh b/dev-scripts/bump-version.sh
similarity index 91%
rename from scripts/bump-version.sh
rename to dev-scripts/bump-version.sh
index c00708a..1e53a58 100755
--- a/scripts/bump-version.sh
+++ b/dev-scripts/bump-version.sh
@@ -2,12 +2,12 @@
 # Bump version number across all project files and optionally commit + tag.
 #
 # Usage:
-# bash scripts/bump-version.sh # bump only
-# bash scripts/bump-version.sh --commit # bump + commit
-# bash scripts/bump-version.sh --tag # bump + commit + tag
-# bash scripts/bump-version.sh --release # bump + commit + tag + push + trigger CI release
+# bash dev-scripts/bump-version.sh # bump only
+# bash dev-scripts/bump-version.sh --commit # bump + commit
+# bash dev-scripts/bump-version.sh --tag # bump + commit + tag
+# bash dev-scripts/bump-version.sh --release # bump + commit + tag + push + trigger CI release
 #
-# Example: bash scripts/bump-version.sh 1.5.1 --release
+# Example: bash dev-scripts/bump-version.sh 1.5.1 --release
 set -euo pipefail
@@ -184,6 +184,6 @@ fi
 if [ -z "$ACTION" ]; then
   echo "Next steps:"
   echo " 1. Edit CHANGELOG.md — fill the v$NEW_VERSION section"
-  echo " 2. bash scripts/bump-version.sh $NEW_VERSION --release"
+  echo " 2. 
bash dev-scripts/bump-version.sh $NEW_VERSION --release"
   echo " (or manually: git commit → git tag v$NEW_VERSION → git push --tags)"
 fi
diff --git a/scripts/extract-changelog.py b/dev-scripts/extract-changelog.py
similarity index 100%
rename from scripts/extract-changelog.py
rename to dev-scripts/extract-changelog.py
diff --git a/scripts/record-demos/README.md b/dev-scripts/record-demos/README.md
similarity index 100%
rename from scripts/record-demos/README.md
rename to dev-scripts/record-demos/README.md
diff --git a/scripts/record-demos/gif-1-pace-init.tape b/dev-scripts/record-demos/gif-1-pace-init.tape
similarity index 100%
rename from scripts/record-demos/gif-1-pace-init.tape
rename to dev-scripts/record-demos/gif-1-pace-init.tape
diff --git a/scripts/record-demos/gif-2-natural-language-dev.tape b/dev-scripts/record-demos/gif-2-natural-language-dev.tape
similarity index 100%
rename from scripts/record-demos/gif-2-natural-language-dev.tape
rename to dev-scripts/record-demos/gif-2-natural-language-dev.tape
diff --git a/scripts/record-demos/gif-3-cross-session-restore.tape b/dev-scripts/record-demos/gif-3-cross-session-restore.tape
similarity index 100%
rename from scripts/record-demos/gif-3-cross-session-restore.tape
rename to dev-scripts/record-demos/gif-3-cross-session-restore.tape
diff --git a/scripts/validate-all.sh b/dev-scripts/validate-all.sh
similarity index 76%
rename from scripts/validate-all.sh
rename to dev-scripts/validate-all.sh
index 8a44808..b36a49c 100755
--- a/scripts/validate-all.sh
+++ b/dev-scripts/validate-all.sh
@@ -1,7 +1,7 @@
 #!/bin/bash
 # validate-all.sh — Run all devpace static validations
 #
-# Usage: bash scripts/validate-all.sh
+# Usage: bash dev-scripts/validate-all.sh
 # Exit: 0 = all pass, 1 = failures detected
 set -euo pipefail
@@ -58,19 +58,8 @@ fi
 echo ""
-# ── Tier 1.5: Layer separation quick-check (redundant with pytest) ─────
-echo -e "${YELLOW}[Tier 1.5] Layer separation grep check${NC}"
-
-LAYER_VIOLATIONS=$(grep -r 
--exclude-dir='*-workspace' "docs/\|\.claude/" "$PROJECT_ROOT/rules/" "$PROJECT_ROOT/skills/" "$PROJECT_ROOT/knowledge/" 2>/dev/null || true)
-if [ -z "$LAYER_VIOLATIONS" ]; then
-  echo -e "${GREEN} ✓ No product→dev layer references${NC}"
-else
-  echo -e "${RED} ✗ Product layer references dev layer:${NC}"
-  echo "$LAYER_VIOLATIONS"
-  FAILURES=$((FAILURES + 1))
-fi
-
-echo ""
+# ── Tier 1.5: Layer separation — covered by pytest test_layer_separation.py ──
+# (No standalone grep check; pytest is the SSOT for layer separation validation)
 # ── Tier 1.7: Token budget check ─────────────────────────────────────
 echo -e "${YELLOW}[Tier 1.7] Token budget check${NC}"
@@ -79,6 +90,40 @@ echo -e " ℹ Product layer total: ${TOTAL_PRODUCT} lines"
 echo ""
+# ── Tier 1.9: Hook tests (Node.js) ─────────────────────────────────────
+echo -e "${YELLOW}[Tier 1.9] Hook tests (Node.js)${NC}"
+
+HOOKS_TEST_DIR="$PROJECT_ROOT/tests/hooks"
+if [ -d "$HOOKS_TEST_DIR" ]; then
+  if command -v node &>/dev/null; then
+    HOOK_PASS=0
+    HOOK_FAIL=0
+    for test_file in "$HOOKS_TEST_DIR"/test_*.mjs; do
+      [ -f "$test_file" ] || continue
+      if node --test "$test_file" >/dev/null 2>&1; then
+        HOOK_PASS=$((HOOK_PASS + 1))
+      else
+        HOOK_FAIL=$((HOOK_FAIL + 1))
+        echo -e "${RED} ✗ $(basename "$test_file")${NC}"
+      fi
+    done
+    if [ "$HOOK_FAIL" -eq 0 ] && [ "$HOOK_PASS" -gt 0 ]; then
+      echo -e "${GREEN} ✓ Hook tests passed (${HOOK_PASS}/${HOOK_PASS})${NC}"
+    elif [ "$HOOK_PASS" -eq 0 ] && [ "$HOOK_FAIL" -eq 0 ]; then
+      echo -e "${YELLOW} ⚠ No hook test files found${NC}"
+    else
+      echo -e "${RED} ✗ Hook tests: ${HOOK_PASS} passed, ${HOOK_FAIL} failed${NC}"
+      FAILURES=$((FAILURES + 1))
+    fi
+  else
+    echo -e "${YELLOW} ⚠ node not found — skipping hook tests${NC}"
+  fi
+else
+  echo -e "${YELLOW} ⚠ tests/hooks/ directory not found${NC}"
+fi
+
+echo ""
+
 # ── Tier 2: Integration test (optional — requires claude CLI) ──────────
 echo -e "${YELLOW}[Tier 2] Integration test (plugin loading)${NC}"
diff --git 
a/docs/brand/devpace-one-pager.md b/docs/brand/devpace-one-pager.md deleted file mode 100644 index b20100a..0000000 --- a/docs/brand/devpace-one-pager.md +++ /dev/null @@ -1,163 +0,0 @@ -# devpace — One-Pager (Product Showcase) - -> 以下为 PPT 单页内容脚本。布局建议:深色背景 + 亮色强调,左右分栏或上下三段式。 - ---- - -## 标题区 - -**devpace** -*给 AI 编程一个稳定的研发节奏* - -副标题:Claude Code 首个 BizDevOps 研发管理插件 — 从业务目标到代码交付,全链路可追溯 - ---- - -## 左栏:痛点 → 方案 - -### 没有 devpace 时 - -| 痛点 | 真实场景 | -|------|---------| -| 每次开聊都要"从头解释" | 会话中断后,Claude 忘了你做到哪 | -| 需求一变就失控 | 改一个需求,不知道影响多少在做的功能 | -| 质量全靠自觉 | Claude 有时跳过测试、忘了检查 | -| 代码和业务目标脱节 | 做着做着偏离方向,回头才发现白做 | -| 交付过程黑盒 | 组织要审计,但 Claude 的决策不可追溯 | - -### 有 devpace 后 - -``` -会话断了 → 自动恢复上下文,零手动解释 -需求变了 → 即时影响分析,有序调整 -质量跑偏 → 4 级质量门自动拦截 -方向偏了 → 业务目标→功能→代码,始终可追溯 -要审计 → 完整决策轨迹 + DORA 度量 -``` - ---- - -## 中栏:核心架构图(视觉焦点) - -``` - ┌─────────────────────────────────────────────┐ - │ BizDevOps 价值交付链路 │ - │ │ - │ 业务目标(OBJ) → 产品功能(PF) → 代码变更(CR) │ - │ WHY WHAT HOW │ - │ │ - │ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌─────┐│ - │ │创建 │→│开发 │→│验证 │→│审批 │→│合并 ││ - │ │ │ │ │ │ │ │ │ │ ││ - │ └──────┘ └──────┘ └──────┘ └──────┘ └─────┘│ - │ ↑Gate1 ↑Gate2 ↑Gate3 │ - │ Claude 自动 Claude 自动 人类审批 │ - │ │ - │ ⟲ 需求变更? 
→ 影响分析 → 有序调整 → 继续推进 │ - └─────────────────────────────────────────────┘ -``` - ---- - -## 右栏:18 个智能命令,覆盖完整研发周期 - -### 规划 - -| 命令 | 一句话 | -|------|-------| -| `/pace-init` | 一句话初始化项目,2 步开始 | -| `/pace-plan` | 迭代规划 + 智能排期 | -| `/pace-next` | AI 推荐下一步最重要的事 | - -### 开发 - -| 命令 | 一句话 | -|------|-------| -| `/pace-dev` | 进入推进模式,绑定 CR 写代码 | -| `/pace-change` | 需求变了?影响分析 + 有序调整 | -| `/pace-review` | 代码审查 + 对抗式检验 | -| `/pace-test` | 三层测试管理(执行/策略/AI 验收) | - -### 交付 - -| 命令 | 一句话 | -|------|-------| -| `/pace-release` | 发布管理全生命周期 | -| `/pace-sync` | 同步到 GitHub/Linear/Jira | -| `/pace-guard` | 风险预判 + 实时监控 | - -### 洞察 - -| 命令 | 一句话 | -|------|-------| -| `/pace-status` | 项目全景仪表盘 | -| `/pace-retro` | 迭代回顾 + DORA 度量 | -| `/pace-role` | 切换视角(PM/Dev/Tester/Ops) | -| `/pace-trace` | 追溯任何 AI 决策的完整轨迹 | - ---- - -## 底部:差异化定位 - -``` - 任务管理工具 devpace - (TodoWrite/Issues) (BizDevOps 研发管理) - ┌──────────────────────┐ ┌──────────────────────────┐ - │ 核心:任务列表 │ │ 核心:价值交付链路 │ - │ 变更:计划是稳定的 │ │ 变更:变更是一等公民 │ - │ Claude:执行者 │ │ Claude:自治协作者 │ - │ 追溯:任务→代码 │ │ 追溯:目标→功能→变更→代码 │ - │ 度量:完成数 │ │ 度量:质量 + 价值对齐 │ - └──────────────────────┘ └──────────────────────────┘ -``` - ---- - -## 快速开始 & CTA - -```bash -# 一行安装 -/plugin install devpace@devpace - -# 两步启动 -/pace-init my-project -# → Claude 问你一句话,然后你就可以开始了 -``` - -**需求永远在变,但研发节奏不应该因此失控。** - -GitHub: github.com/anthropics-contrib/devpace | License: MIT - ---- - -## PPT 设计建议 - -### 布局方案(推荐 16:9) - -``` -┌─────────────────────────────────────────────────────────────┐ -│ LOGO + 标题 + 副标题 快速开始 │ -├──────────────────┬──────────────────┬───────────────────────┤ -│ │ │ │ -│ 痛点→方案 │ 核心架构图 │ 18 个命令 │ -│ (5 行对比) │ (价值链 + 状态机) │ (分 4 组,图标) │ -│ │ │ │ -├──────────────────┴──────────────────┴───────────────────────┤ -│ 差异化对比表 CTA: 一行安装 + slogan │ -└─────────────────────────────────────────────────────────────┘ -``` - -### 配色建议 - -- 背景:#0F172A(深蓝黑) -- 主强调:#38BDF8(天蓝) -- 副强调:#A78BFA(薰衣草紫) -- 文字:#F8FAFC(近白) -- 成功色:#34D399(翠绿,用于 Gate 通过等) -- 警告色:#FBBF24(琥珀,用于痛点高亮) - -### 
字体建议 - -- 标题:Inter Bold / 思源黑体 Bold -- 正文:Inter Regular / 思源黑体 Regular -- 代码:JetBrains Mono diff --git a/docs/brand/devpace-one-pager.pptx b/docs/brand/devpace-one-pager.pptx deleted file mode 100644 index 58a4243..0000000 Binary files a/docs/brand/devpace-one-pager.pptx and /dev/null differ diff --git a/model.drawio b/docs/design/model.drawio similarity index 100% rename from model.drawio rename to docs/design/model.drawio diff --git a/principle.md b/docs/design/principles-notes.md similarity index 100% rename from principle.md rename to docs/design/principles-notes.md diff --git a/docs/design/skill-dependencies.md b/docs/design/skill-dependencies.md index 10408e3..1dc961a 100644 --- a/docs/design/skill-dependencies.md +++ b/docs/design/skill-dependencies.md @@ -1,10 +1,12 @@ # Skill 间依赖关系 -> **职责**:记录 devpace 18 个 Skill + 6 个 Hook 之间的耦合关系,便于变更影响评估。 +> **职责**:记录 devpace 19 个 Skill + 11 个 Hook(8 全局 + 3 Skill 级)之间的耦合关系,便于变更影响评估。 > > **维护规则**:修改任何 Skill 的 procedures 文件或 Hook 逻辑时,对照本文档评估是否需要同步更新关联方。 +> +> **最后更新**:2026-03-14 -## §0 速查:三大耦合集群 +## §0 速查:四大耦合集群 ``` ┌─────────────────────────────────────────────────────────┐ @@ -12,28 +14,43 @@ │ │ │ pace-dev ──writes──> CR files ──reads──> pace-review │ │ pace-dev ──triggers──> pace-learn (merged 后管道) │ -│ pace-dev ──triggers──> pace-guard (L/XL pre-flight) │ -│ pace-dev ──triggers──> pace-test (dryrun 建议) │ -│ pace-review ──reads──> pace-test accept 证据 │ +│ pace-dev ──delegates──> pace-guard scan (risk-format) │ +│ pace-dev ──delegates──> pace-test strategy/dryrun │ +│ pace-review ──consumes──> pace-test accept (契约) │ +│ pace-review ──feeds──> pace-learn (打回信息) │ +│ pace-test dryrun ──refs──> pace-release procedures │ │ post-cr-update Hook ──triggers──> pace-learn 管道 │ └─────────────────────────────────────────────────────────┘ ┌─────────────────────────────────────────────────────────┐ │ 计划与变更管理(中等耦合) │ │ │ -│ pace-change ──inlines──> pace-plan adjust 逻辑 │ +│ pace-change ──delegates──> pace-plan adjust 逻辑 │ │ 
pace-change ──suggests──> pace-dev / pace-sync │ -│ pace-retro ──produces──> 学习请求 ──consumed──> pace-learn│ +│ pace-retro ──delegates──> pace-learn 管道 (7 处引用) │ │ pace-retro ──writes──> iterations/ ──consumed──> pace-plan│ +│ pace-biz ──suggests──> pace-change / pace-plan / pace-dev│ └─────────────────────────────────────────────────────────┘ ┌─────────────────────────────────────────────────────────┐ -│ 节奏监测(扇入耦合) │ +│ 节奏监测(扇入耦合 + 信号缓存枢纽) │ │ │ -│ pace-pulse ──reads──> 多 Skill 的数据文件(13 信号源) │ -│ pace-next ──hardcodes──> 14 个 Skill 命令名映射 │ +│ pace-pulse ──writes──> .signal-cache ──reads──> pace-next│ +│ pace-pulse ──writes──> .signal-cache ──reads──> pace-status│ +│ pace-next ──hardcodes──> 21 个信号→命令映射 │ +│ pace-status ──positioned-as──> pace-next 轻量子集 │ │ pulse-counter Hook ──coordinates──> pace-pulse │ └─────────────────────────────────────────────────────────┘ + +┌─────────────────────────────────────────────────────────┐ +│ 支撑与辅助(松散耦合) │ +│ │ +│ pace-role ──defines──> 角色维度 ──consumed-by──> 6 Skill │ +│ pace-init ──delegates──> pace-sync setup │ +│ pace-release ──delegates──> pace-feedback / pace-test │ +│ pace-feedback ──routes──> pace-change / pace-biz │ +│ pace-trace ──navigates──> pace-theory / pace-learn │ +└─────────────────────────────────────────────────────────┘ ``` ## §1 核心开发管道 @@ -47,51 +64,56 @@ | 写入方 | pace-dev(创建 CR、更新状态/事件记录) | | 读取方 | pace-review(读取 CR 状态、验收条件、事件记录、复杂度、分支名) | | 共享格式 | `knowledge/_schema/cr-format.md` | -| 触发位置 | `skills/pace-dev/SKILL.md:96`("到达 in_review → 自动运行 /pace-review 逻辑") | +| 触发位置 | `skills/pace-dev/SKILL.md:95`("到达 in_review → 自动运行 /pace-review 逻辑") | | 反向感知 | `skills/pace-review/review-procedures-common.md:30`(简化审批由 pace-dev 直接处理) | -### pace-dev → pace-learn(merged 后管道) +### pace-dev → pace-test(Schema 契约 + 命令委托) | 属性 | 值 | |------|-----| -| 耦合类型 | 管道触发(CR 进入 merged 状态后自动执行) | -| 风险等级 | 中 | -| 触发机制 | `post-cr-update.mjs:43` 输出 `pace-learn knowledge extraction` 步骤 | -| 位置 | `hooks/post-cr-update.mjs:42-43` | +| 耦合类型 | 
**Schema 契约**(通过 test-strategy-format.md)+ 命令委托 + 数据文件共享 | +| 风险等级 | **低**(已解耦) | +| 契约位置 | `dev-procedures-developing.md:147`(引用 `knowledge/_schema/test-strategy-format.md`,或委托 `/pace-test strategy`) | +| 命令建议 | `dev-procedures-developing.md:172`(`/pace-test dryrun 1`)、`dev-procedures-gate.md:23`(`/pace-test generate`)、`dev-procedures-intent.md:181`(`/pace-test generate`) | +| 共享数据 | `.devpace/rules/test-strategy.md`、`.devpace/rules/checks.md` | -### pace-dev → pace-guard(L/XL 预检 + Schema 契约) +### pace-dev → pace-guard(Schema 契约 + 命令委托) | 属性 | 值 | |------|-----| -| 耦合类型 | **Schema 契约**(通过 risk-format.md 接口)+ 命令委托 | +| 耦合类型 | **Schema 契约**(通过 risk-format.md)+ 命令委托 | | 风险等级 | **低**(已解耦) | -| 位置 | `skills/pace-dev/dev-procedures-intent.md:206`(引用 `knowledge/_schema/risk-format.md` 风险预评估章节,或委托 `/pace-guard scan`) | -| 描述 | pace-dev 通过共享 Schema 定义输出格式,不再直接引用 pace-guard 内部 procedures 文件 | +| 位置 | `dev-procedures-intent.md:252`(引用 `knowledge/_schema/risk-format.md`,或委托 `/pace-guard scan`) | | 影响 | 修改 guard-procedures 内部实现不影响 pace-dev(只要 risk-format.md 契约不变) | | 反向感知 | `skills/pace-guard/SKILL.md:2`(NOT 声明排除触发混淆) | -### pace-dev → pace-test(Schema 契约 + 命令委托) +### pace-dev → pace-learn(merged 后管道 + Gate 反思) | 属性 | 值 | |------|-----| -| 耦合类型 | **Schema 契约**(通过 test-strategy-format.md 接口)+ 命令委托 + 数据文件共享 | -| 风险等级 | **低**(已解耦) | -| 契约位置 | `skills/pace-dev/dev-procedures-developing.md:146`(引用 `knowledge/_schema/test-strategy-format.md`,或委托 `/pace-test strategy`) | -| 命令建议 | `dev-procedures-developing.md:135`(`/pace-test generate`)、`:171`(`/pace-test dryrun 1`)、`dev-procedures-gate.md:17`(`/pace-test generate`) | -| 共享数据 | `.devpace/rules/test-strategy.md`、`.devpace/rules/checks.md` | -| 影响 | 修改 test-procedures 内部实现不影响 pace-dev(只要 test-strategy-format.md 契约不变) | +| 耦合类型 | 管道触发 + 数据共享 | +| 风险等级 | 中 | +| 触发机制 | `hooks/post-cr-update.mjs:46` 输出 `devpace:post-merge` → §11 管道含 pace-learn | +| Gate 反思 | `dev-procedures-gate.md:38`(反思内容在 merged 后纳入 pace-learn 范围) | +| Defect 模式 | 
`dev-procedures-defect.md:29`("merged 后自动触发 pace-learn 提取根因 pattern") | + +### pace-dev → pace-sync(命令委托) + +| 属性 | 值 | +|------|-----| +| 耦合类型 | 命令委托 | +| 风险等级 | 低 | +| 位置 | `dev-procedures-common.md:33`(用户同意后执行 `/pace-sync create CR-{id}`) | ### pace-review → pace-test accept(共享契约) | 属性 | 值 | |------|-----| -| 耦合类型 | **共享 Schema 契约**(通过 accept-report-contract.md 接口) | -| 风险等级 | **中**(已从"很高"降级——格式变更通过契约文件协调) | -| 契约文件 | `knowledge/_schema/accept-report-contract.md`(生产方和消费方的共享接口) | +| 耦合类型 | **共享 Schema 契约**(通过 accept-report-contract.md) | +| 风险等级 | **中** | +| 契约文件 | `knowledge/_schema/accept-report-contract.md` | | 声明位置 | `skills/pace-test/SKILL.md:19`("pace-review Gate 2:可消费 /pace-test accept 的验收映射报告") | -| 消费逻辑 | `skills/pace-review/review-procedures-gate.md`(引用 accept-report-contract.md 提取规则) | -| 模板嵌入 | `review-procedures-gate.md` 摘要模板的 accept 字段按契约定义的格式填充 | -| 影响 | 变更 accept 输出格式时修改 accept-report-contract.md,双方同步适配 | +| 消费逻辑 | `review-procedures-gate.md:191,220,264`(引用契约提取 accept 验证结果) | ### pace-review → pace-learn(打回信息数据流) @@ -99,17 +121,58 @@ |------|-----| | 耦合类型 | 数据文件共享(CR 事件表写入供下游消费) | | 风险等级 | 低 | -| 位置 | `skills/pace-review/review-procedures-feedback.md:7,35`(结构化打回信息写入事件表供 pace-learn 提取) | -| 位置 | `skills/pace-review/review-procedures-common.md:112`("支持 /pace-learn:不仅有打回原因,还有完整审查发现") | +| 位置 | `review-procedures-feedback.md:7,35`(打回信息写入事件表供 pace-learn 提取) | +| 位置 | `review-procedures-common.md:112`("支持 /pace-learn:不仅有打回原因,还有完整审查发现") | -### pace-test → pace-dev(Gate 检查互认) +### pace-review → pace-trace(决策轨迹) | 属性 | 值 | |------|-----| -| 耦合类型 | 概念对齐 | +| 耦合类型 | 数据共享(Review 判断作为决策轨迹) | +| 风险等级 | 低 | +| 位置 | `review-procedures-common.md:113` | + +### pace-test → pace-release(跨 Skill 文件路径引用) + +| 属性 | 值 | +|------|-----| +| 耦合类型 | **直接文件路径引用** | +| 风险等级 | **中**(非命令委托,直接引用 procedures 文件) | +| 位置 | `test-procedures-dryrun.md:24`(直接引用 `skills/pace-release/release-procedures-create-enhanced.md` 的 Gate 4 检查项) | +| 影响 | pace-release 重命名或重组 procedures 文件会导致 
pace-test 引用断裂 | + +### pace-test → pace-retro(基准线数据共享) + +| 属性 | 值 | +|------|-----| +| 耦合类型 | 数据共享 | +| 风险等级 | 低 | +| 位置 | `SKILL.md:41`、`test-procedures-baseline.md:13`("baseline 供 /pace-retro 度量使用") | + +### pace-guard → pace-pulse(monitor 触发) + +| 属性 | 值 | +|------|-----| +| 耦合类型 | 命令触发(pace-pulse 周期性触发 monitor) | | 风险等级 | 低 | -| 位置 | `skills/pace-test/SKILL.md:18`("/pace-dev Gate 1/2:消费 checks.md 中的测试命令") | -| 位置 | `skills/pace-test/test-procedures-dryrun.md:8`("与 /pace-dev Gate 的区别"对照表) | +| 位置 | `SKILL.md:18,28-29`(monitor 由 pace-pulse 第 8 信号触发) | + +### pace-guard → pace-retro(trends 数据消费) + +| 属性 | 值 | +|------|-----| +| 耦合类型 | 数据共享 | +| 风险等级 | 低 | +| 位置 | `guard-procedures-trends.md:68`(被 /pace-retro 消费时自动升级详细度) | + +### insights.md 写入冲突风险(已修复) + +| 属性 | 值 | +|------|-----| +| 风险等级 | ~~中(SSOT 冲突)~~ → **低**(已修复) | +| SSOT 声明方 | pace-learn(`learn-procedures.md:11` 声称唯一写入者) | +| 修复方案 | pace-test flaky Step 6 改为构造学习请求交给 pace-learn 统一管道(与 pace-retro 模式一致) | +| 修复日期 | 2026-03-14 | ## §2 计划与变更管理 @@ -117,76 +180,121 @@ | 属性 | 值 | |------|-----| -| 耦合类型 | **命令委托**(通过 `/pace-plan adjust` 委托 + iteration-format.md 契约) | +| 耦合类型 | **命令委托**(通过 `/pace-plan adjust` + iteration-format.md 契约) | | 风险等级 | **低**(已解耦) | -| 位置 | `skills/pace-change/change-procedures-types.md:85` | -| 描述 | "用户确认 → 委托 `/pace-plan adjust` 执行,迭代写入规则见 iteration-format.md" | -| 影响 | 修改 adjust-procedures.md 内部实现不影响 pace-change(只要 iteration-format.md 契约不变) | +| 位置 | `change-procedures-types.md:85`(容量溢出时自动委托 `/pace-plan adjust`) | +| 契约 | `knowledge/_schema/iteration-format.md` | ### pace-change → pace-dev / pace-sync(流转建议) | 属性 | 值 | |------|-----| -| 耦合类型 | 命令建议(变更执行后的下一步引导) | +| 耦合类型 | 命令建议 | | 风险等级 | 低 | -| 位置 | `skills/pace-change/change-procedures-types.md:99-103`(5 种变更类型各有对应的下一步命令) | -| 位置 | `skills/pace-change/change-procedures-execution.md:52-53`(建议 `/pace-sync push`) | +| 位置 | `change-procedures-types.md:99-103`(5 种变更类型的下一步引导) | +| 位置 | `change-procedures-execution.md:52-53`(建议 `/pace-sync 
push`) | ### pace-change → pace-test impact(测试影响建议) | 属性 | 值 | |------|-----| -| 耦合类型 | 命令建议 | +| 耦合类型 | 命令建议 + 数据共享 | | 风险等级 | 低 | -| 位置 | `skills/pace-change/change-procedures-types.md:71`(建议 `/pace-test impact`) | +| 位置 | `change-procedures-types.md:71`(建议 `/pace-test impact`、`/pace-test strategy`) | +| 位置 | `change-procedures-risk.md:7`(引用 pace-test 写入的影响分析 section) | ### pace-retro → pace-learn(学习请求管道) | 属性 | 值 | |------|-----| -| 耦合类型 | **数据格式依赖** | -| 风险等级 | 中 | -| 位置 | `skills/pace-retro/retro-procedures.md:159-206` | -| 描述 | pace-retro Step 5 构造"学习请求",交给 pace-learn 统一写入管道处理 | -| 共享格式 | 学习请求结构(pace-learn 定义,pace-retro 生产) | +| 耦合类型 | **数据格式依赖**(最强外部依赖,7 处引用) | +| 风险等级 | **中** | +| 位置 | `retro-procedures.md:190,201,210,215,230,237,253`(pattern 统一写入管道) | +| 共享格式 | `knowledge/_schema/insights-format.md`(retro:195 引用) | ### pace-retro → pace-plan(迭代传递清单) | 属性 | 值 | |------|-----| -| 耦合类型 | 数据文件共享 | +| 耦合类型 | 数据文件共享(结构化传递) | | 风险等级 | 中 | -| 位置 | `skills/pace-retro/retro-procedures.md:529`(写入 `iterations/current.md` 的回顾 section) | -| 消费方 | `skills/pace-retro/SKILL.md:86`("供 /pace-plan next 消费") | +| 位置 | `retro-procedures.md:560`(传递清单写入 `iterations/current.md`) | +| 消费方 | `SKILL.md:89`("供 /pace-plan next 消费") | + +### pace-retro → pace-role(角色适配) + +| 属性 | 值 | +|------|-----| +| 耦合类型 | 概念对齐(角色适配权威源引用) | +| 风险等级 | 低 | +| 位置 | `retro-procedures.md:74`(引用 `skills/pace-role/role-procedures-dimensions.md`) | +| 位置 | `retro-procedures.md:73`(引用 `devpace-rules.md §13` 读取当前角色) | + +### pace-biz → pace-change / pace-plan(命令建议密集) + +| 属性 | 值 | +|------|-----| +| 耦合类型 | 命令建议(最大引用发出者) | +| 风险等级 | 低(纯建议,无数据依赖) | +| 向 pace-change | 6 处引用(SKILL.md、epic、decompose、output 等 procedures 中的 downstream 引导) | +| 向 pace-plan | 8 处引用(SKILL.md、decompose、discover、import、infer 等 procedures 中的引导) | ### pace-change → pace-init(降级引导) | 属性 | 值 | |------|-----| -| 耦合类型 | 命令建议(未初始化时引导) | +| 耦合类型 | 命令建议 | | 风险等级 | 低 | -| 位置 | `skills/pace-change/change-procedures-degraded.md:38,54-55` | +| 位置 | 
`change-procedures-degraded.md:38,54-55`(未初始化时引导) | ## §3 节奏监测 -### pace-next → 14 个 Skill 命令映射(硬编码) +### 信号缓存枢纽(.signal-cache) + +| 属性 | 值 | +|------|-----| +| 耦合类型 | **数据文件共享**(三 Skill 核心枢纽) | +| 风险等级 | **中** | +| 写入方 | pace-pulse session-start(`pulse-procedures-session-start.md:48`) | +| 读取方 | pace-next(`next-procedures.md:30`)、pace-status overview(`status-procedures-overview.md:21`) | +| TTL | 5 分钟 | +| 共享格式 | `knowledge/signal-collection.md`(缓存格式定义) | + +### pace-next → 21 个信号命令映射(硬编码) | 属性 | 值 | |------|-----| | 耦合类型 | **数据依赖(硬编码命令名表)** | | 风险等级 | 中 | -| 位置 | `skills/pace-next/next-procedures-output-default.md:27-40` | -| 映射表 | S1→pace-review, S2→pace-guard report, S3→pace-dev, S4→pace-change resume, S5→pace-release, S6→pace-guard report, S7→pace-plan adjust, S8→pace-retro+pace-plan, S9→pace-retro, S10→pace-guard report, S11→pace-sync push, S12→pace-retro, S14→pace-plan | -| 影响 | 新增/重命名 Skill 命令时须同步更新此映射表 | +| SKILL.md 映射 | `next-procedures-output-default.md:27-41`(S1-S16 共 16 个信号→命令映射) | +| 脚本映射 | `scripts/collect-signals.mjs:354-543`(S1-S25 共 21 个信号→命令映射,含 S16-S19 pace-biz、S21-S25 新增信号) | +| 影响 | 新增/重命名 Skill 命令时须同步更新两处映射表 | + +### pace-status ↔ pace-next(双向互斥 + 轻量子集) + +| 属性 | 值 | +|------|-----| +| 耦合类型 | 概念对齐(双向互斥边界声明) | +| 风险等级 | 低 | +| pace-status → pace-next | `SKILL.md:2`("NOT for next-step recommendations")、`status-procedures-overview.md:23`(定位为"pace-next 轻量子集")、`:38`(升级导航到 `/pace-next detail`) | +| pace-next → pace-status | `SKILL.md:2`("NOT for current progress overview") | -### pace-pulse → 多 Skill 数据文件(只读扇入) +### pace-pulse ↔ pace-status(推/拉去重) | 属性 | 值 | |------|-----| -| 耦合类型 | 只读数据消费(13 信号源) | -| 风险等级 | 低(只读不影响源 Skill) | -| 位置 | `skills/pace-pulse/pulse-procedures-core.md:10-20` | -| 信号源 | `current.md` PF 完成率、CR 滞留时间、checks.md 失败率、current.md 变更记录、`.devpace/risks/` 风险文件、`sync-mapping.md` 同步状态、`dashboard.md` 度量更新日期、`state.md` 会话时间 | +| 耦合类型 | 数据共享(去重协调) | +| 风险等级 | 低 | +| 位置 | `pulse-procedures-session-start.md:29`(引用 
`status-procedures-overview.md` "推/拉去重"节) | +| 机制 | pulse 是"推送式",status 是"拉取式",距会话开始 < 5 分钟时省略建议行 | + +### pace-next ↔ pace-pulse(session-start 去重) + +| 属性 | 值 | +|------|-----| +| 耦合类型 | 概念对齐(无直接文件引用) | +| 风险等级 | 低 | +| 位置 | `next-procedures.md:67-69`(距 session-start < 5 分钟时跳过已通知信号) | ### pulse-counter Hook ↔ pace-pulse(协调互补) @@ -194,53 +302,238 @@ |------|-----| | 耦合类型 | 运行时协调(时间戳文件) | | 风险等级 | 低 | -| 位置 | `hooks/pulse-counter.mjs:7-16` | -| 协调机制 | pulse-counter 检查 `.devpace/.pulse-last-run`(pace-pulse 写入),若 pace-pulse 近 5 分钟内运行过则跳过提醒 | +| 位置 | `hooks/pulse-counter.mjs:92-98` | +| 协调机制 | pulse-counter 读取 `.devpace/.pulse-last-run`(pace-pulse 写入),5 分钟内不重复提醒 | -## §4 Hook ↔ Rules 章节引用 +### 共享 knowledge 依赖 -| Hook 文件 | 引用的 Rules 章节 | 位置 | -|-----------|-------------------|------| -| `intent-detect.mjs` | `devpace-rules.md §9` | :48 | -| `sync-push.mjs` | `§16 rule 9`(噪声控制)、`§11 step 7`(close-loop) | :14, :84-85 | -| `pre-tool-use.mjs` | `devpace-rules.md §2`(双模式、Gate 3) | :12-13 | -| `post-cr-update.mjs` | `§11`(合并后管道步骤) | :6, :40 | +| knowledge 文件 | pace-next | pace-pulse | pace-status | +|---------------|-----------|------------|-------------| +| `signal-priority.md` | 排序规则权威源 | — | 信号子集权威源 | +| `signal-collection.md` | 缓存规则 | 缓存格式 | 缓存规则 | -## §5 变更影响矩阵 +## §4 支撑与辅助 -修改左侧 Skill/Hook 时,需检查右侧关联方: +### pace-role → 6 个消费方(角色维度权威源) + +| 属性 | 值 | +|------|-----| +| 耦合类型 | **概念对齐**(`role-procedures-dimensions.md` 被 6 个 Skill 引用为权威源) | +| 风险等级 | **中**(修改角色定义影响面广) | +| 消费方 | pace-retro(`:74`)、pace-next(`:104`)、pace-pulse(`SKILL.md:60`)、pace-status(`status-procedures-roles.md`)、pace-change(`change-procedures-impact.md:72`)、pace-theory(`theory-procedures-default.md:78`) | +| 影响 | 新增/修改角色维度时须同步 CLAUDE.md "pace-role 角色扩展清单"中列出的 13 个文件 | + +### pace-init → pace-sync setup(命令委托) + +| 属性 | 值 | +|------|-----| +| 耦合类型 | 命令委托(8 处引用,含 2 处自动执行) | +| 风险等级 | 低 | +| 自动执行 | `init-procedures-full.md:124,126`、`init-procedures-core.md:198-199`(用户选择后自动执行 `/pace-sync setup`) | +| 命令建议 | 
`init-procedures-core.md:130-132,203,268,285` 等 | + +### pace-release → pace-feedback(命令委托) + +| 属性 | 值 | +|------|-----| +| 耦合类型 | 命令委托 | +| 风险等级 | 低 | +| 位置 | `release-procedures-common.md:74`(deployed 后问题走 `/pace-feedback report`) | +| 位置 | `release-procedures-verify.md:78`(验证失败引导 `/pace-feedback report`) | + +### pace-release → pace-test(命令委托) + +| 属性 | 值 | +|------|-----| +| 耦合类型 | 命令委托(可选联动) | +| 风险等级 | 低 | +| 位置 | `release-procedures-create-enhanced.md:55-58,146,149`(测试覆盖联动、Release 测试报告) | + +### pace-release ↔ pace-pulse / pace-plan(概念联动) + +| 属性 | 值 | +|------|-----| +| 耦合类型 | 概念对齐(无直接数据依赖) | +| 风险等级 | 低 | +| 位置 | `release-procedures-scheduling.md:5,11,36`(发布窗口信号由 pace-pulse 触发) | +| 位置 | `release-procedures-scheduling.md:11,40`(迭代结束提示联动 pace-plan) | + +### pace-feedback → pace-change / pace-biz(功能请求路由) + +| 属性 | 值 | +|------|-----| +| 耦合类型 | 命令委托(功能请求分流) | +| 风险等级 | 低 | +| 位置 | `feedback-procedures-intake.md:42,46-47`(有 Epic 结构 → `/pace-biz discover`,无 Epic → `/pace-change add`) | + +### pace-trace → pace-theory / pace-learn(导航链接) + +| 属性 | 值 | +|------|-----| +| 耦合类型 | 命令建议(导航链接表) | +| 风险等级 | 低 | +| 位置 | `trace-procedures-analysis.md:21-24`(6 个导航链接指向 pace-theory、pace-change、pace-test、pace-guard、pace-learn、pace-role) | +| 位置 | `trace-procedures-gates.md:20-22`(指向 pace-learn、pace-retro、pace-status) | + +### pace-theory → knowledge/theory.md(数据源) + +| 属性 | 值 | +|------|-----| +| 耦合类型 | 数据共享(运行时数据源) | +| 风险等级 | 低 | +| 位置 | `SKILL.md:44`(按路由表选择性读取)、`theory-procedures-why.md:62`(§11 设计决策权威源)、`theory-procedures-search.md:14`(Grep 搜索) | + +### pace-sync → devpace-rules.md §2(状态转换验证) + +| 属性 | 值 | +|------|-----| +| 耦合类型 | 概念对齐 | +| 风险等级 | 低 | +| 位置 | `sync-procedures-pull.md:26`(外部状态同步回写时验证状态转换合法性) | + +## §5 Hook 系统 + +### 全局 Hook 事件绑定 + +| Hook 文件 | 事件 | async | 性质 | +|-----------|------|-------|------| +| `intent-detect.mjs` | UserPromptSubmit | true | Advisory | +| `pre-tool-use.mjs` | PreToolUse (Write\|Edit) | false | **BLOCKING** (exit 2) + Advisory | +| 
`post-cr-update.mjs` | PostToolUse (Write\|Edit) | true | Advisory | +| `pulse-counter.mjs` | PostToolUse (Write\|Edit) | true | Advisory | +| `sync-push.mjs` | PostToolUse (Write\|Edit) | true | Advisory | +| `post-schema-check.mjs` | PostToolUse (Write\|Edit) | true | Advisory | +| `post-tool-failure.mjs` | PostToolUseFailure (Write\|Edit) | false | Advisory | +| `subagent-stop.mjs` | SubagentStop | false | Advisory | + +Shell Hook:`session-start.sh`(SessionStart)、`pre-compact.sh`(PreCompact)、`session-stop.sh`(Stop)、`session-end.sh`(SessionEnd)。 + +Skill 级 Hook:`pace-dev-scope-check.mjs`(PreToolUse)、`pace-review-scope-check.mjs`(PreToolUse)、`pace-init-scope-check.mjs`(PreToolUse)。 + +### Hook → devpace-rules.md 章节引用 + +| Hook 文件 | 引用章节 | 行号 | 类型 | +|-----------|---------|------|------| +| `intent-detect.mjs` | §9 | :51 | **运行时输出**(变更管理流程) | +| `pre-tool-use.mjs` | §2 | :15 | 注释(Gate 3 人类审批来源) | +| `post-cr-update.mjs` | §11 | :46 | **运行时输出**(post-merge pipeline) | +| `sync-push.mjs` | §11 step 7 | :89-90 | **运行时输出**(close-loop) | +| `sync-push.mjs` | §16 rule 9 | :14 | 注释(噪音抑制设计) | + +### Hook → /pace-* 命令引用 + +| 命令 | Hook 来源 | 行号 | 引用方式 | +|------|----------|------|---------| +| `/pace-dev`(间接) | pre-tool-use.mjs | :48 | 阻断消息建议进入推进模式 | +| `/pace-learn` | post-cr-update.mjs | :53, :58 | 学习触发建议 | +| `/pace-sync push` | sync-push.mjs | :90, :93 | 外部同步建议 | +| `/pace-status` | pulse-counter.mjs | :86, :105 | 进度检查建议 | +| `/pace-status` | subagent-stop.mjs | :106 | 状态修正建议 | + +### Hook 间协调关系 + +| 协调对 | 机制 | +|--------|------| +| `pulse-counter` ↔ `pace-pulse` (Skill) | `.pulse-last-run` 时间戳,5 分钟内不重复提醒 | +| `pre-tool-use` ↔ Skill 级 hooks | 全局 hook 执行 Gate 3 阻断,Skill 级 hook 注释 "delegated to global hook" | +| `pre-tool-use` ↔ `post-tool-failure` | 共用 `isAdvanceMode` 判断(同一 `state.md` 数据源) | +| 4 个 PostToolUse hooks | 并行 async 执行,职责互补不重叠 | + +### devpace: 消息前缀汇总 + +| 前缀 | 来源 | 性质 | +|------|------|------| +| `devpace:change-detected` | intent-detect.mjs:51 | advisory | +| 
`devpace:blocked` | pre-tool-use.mjs:48,62 | **BLOCKING** | +| `devpace:gate-reminder` | pre-tool-use.mjs:70,73,76 | advisory | +| `devpace:post-merge` | post-cr-update.mjs:46 | advisory | +| `devpace:learn-trigger` | post-cr-update.mjs:53,58 | advisory | +| `devpace:sync-push` | sync-push.mjs:90,93 | advisory | +| `devpace:stuck-warning` | pulse-counter.mjs:86 | advisory | +| `devpace:write-volume` | pulse-counter.mjs:105 | advisory | +| `devpace:tool-failure` | post-tool-failure.mjs:28,30 | advisory | +| `devpace:schema-check` | post-schema-check.mjs:73 | advisory | +| `devpace:subagent-check` | subagent-stop.mjs:102 | advisory | + +## §6 Change Impact Matrix + +When modifying a component on the left, check the related parties on the right: | Modified object | Check | |---------|--------| -| pace-dev (CR state transition logic) | pace-review, post-cr-update Hook, sync-push Hook, pace-pulse | +| pace-dev (CR state transition logic) | pace-review, post-cr-update Hook, sync-push Hook, pace-pulse, subagent-stop Hook | | pace-dev (Gate check logic) | pace-test (dryrun comparison table), pre-tool-use Hook | -| pace-guard `guard-procedures-scan.md` | No direct dependents (pace-dev now references the risk-format.md contract) | -| pace-test `test-procedures-strategy-gen.md` | No direct dependents (pace-dev now references the test-strategy-format.md contract) | -| pace-test accept (output format) | **accept-report-contract.md** (update the contract file in sync when the format changes) | +| pace-guard `guard-procedures-scan.md` | No direct dependents (via the risk-format.md contract) | +| pace-test `test-procedures-strategy-gen.md` | No direct dependents (via the test-strategy-format.md contract) | +| pace-test accept (output format) | **accept-report-contract.md** (contract file) → **pace-review** (consumer) | +| pace-test dryrun (Gate 4 check items) | Directly references `pace-release/release-procedures-create-enhanced.md` (file path coupling) | | pace-plan `adjust-procedures.md` | No direct dependents (pace-change now delegates to `/pace-plan adjust`) | | **accept-report-contract.md** | **pace-test accept** (producer) + **pace-review** (consumer) | -| pace-learn (write pipeline format) | pace-retro (learning request format), post-cr-update Hook | +| pace-learn (write pipeline format) | pace-retro (learning request format, 7 references), post-cr-update Hook | | pace-retro (iteration handoff list format) | pace-plan next (consumes the handoff list) | -| pace-next command mapping table | When adding/renaming any Skill 
command | -| pace-pulse signal→command mapping | When adding/renaming any Skill command (4 procedures files) | -| CR Schema (`cr-format.md`) | pace-dev, pace-review, pace-change, pace-status, pace-pulse, all Hooks | -| `devpace-rules.md §2` | pre-tool-use Hook (comment) | -| `devpace-rules.md §9` | intent-detect Hook (**runtime output** :48) | -| `devpace-rules.md §11` | post-cr-update Hook (comment), sync-push Hook (**runtime output** :85) | -| `devpace-rules.md §16` | sync-push Hook (comment) | - -## §6 Shared Data File Index +| pace-next command mapping table | When adding/renaming any Skill command (two places: SKILL.md + collect-signals.mjs) | +| pace-pulse signal→command mapping | When adding/renaming any Skill command (two procedures files: core + session-start) | +| **pace-role role dimension definitions** | **6 consumers**: pace-retro, pace-next, pace-pulse, pace-status, pace-change, pace-theory (+ the 13 files in the CLAUDE.md extension checklist) | +| pace-release (procedures file structure) | pace-test dryrun (direct file path reference) | +| pace-feedback (feedback-log format) | pace-retro (data available during retrospectives), pace-plan (scanned during planning) | +| CR Schema (`cr-format.md`) | pace-dev, pace-review, pace-change, pace-status, pace-feedback, pace-pulse, all Hooks | +| `devpace-rules.md §2` | pre-tool-use Hook (comment), pace-sync pull (state transition validation) | +| `devpace-rules.md §9` | intent-detect Hook (**runtime output** :51) | +| `devpace-rules.md §10` | pace-pulse SKILL.md (pulse trigger timing), pulse-counter Hook | +| `devpace-rules.md §11` | post-cr-update Hook (**runtime output** :46), sync-push Hook (**runtime output** :90), pace-feedback (§11 chained scan) | +| `devpace-rules.md §13` | pace-retro (role reading), pace-next (role awareness), pace-role inference (authoritative source) | +| `devpace-rules.md §15` | pace-status roles (teaching-marker deduplication) | +| `devpace-rules.md §16` | sync-push Hook (comment, noise suppression) | +| `knowledge/signal-priority.md` | pace-next (authoritative source for ranking), pace-status overview (signal subset) | +| `knowledge/signal-collection.md` | pace-next (cache rules), pace-pulse session-start (cache format), pace-status overview (cache rules) | + +## §7 Shared Data File Index | Data file | Writers | Readers | |---------|--------|--------| -| `.devpace/backlog/CR-*.md` | pace-dev, pace-change | pace-review, pace-status, pace-pulse, pace-next, 4 Hooks | -| `.devpace/state.md` | pace-dev, pace-init | pre-tool-use Hook, 
pace-dev-scope-check Hook, post-tool-failure Hook, pace-pulse | -| `.devpace/rules/checks.md` | pace-dev, pace-test | pace-dev (Gate 1/2), pace-pulse | -| `.devpace/rules/test-strategy.md` | pace-test strategy | pace-dev (test-first guidance) | -| `.devpace/dashboard.md` | pace-retro | pace-pulse, pace-status | -| `.devpace/iterations/current.md` | pace-plan, pace-retro | pace-pulse, pace-plan next | -| `.devpace/risks/` | pace-guard | pace-pulse | -| `.devpace/integrations/sync-mapping.md` | pace-sync | sync-push Hook, post-cr-update Hook, pace-pulse | +| `.devpace/backlog/CR-*.md` | pace-dev, pace-change | pace-review, pace-status, pace-pulse, pace-next, pace-retro, pace-test, pace-guard, pace-feedback, 5 Hooks (pre-tool-use, post-cr-update, sync-push, pulse-counter, subagent-stop) | +| `.devpace/state.md` | pace-dev, pace-init, pace-plan | pre-tool-use Hook, post-tool-failure Hook, subagent-stop Hook, pulse-counter Hook (indirect), intent-detect Hook (existence check), pace-pulse, pace-next, pace-status, pace-theory | +| `.devpace/project.md` | pace-init, pace-retro (accept MoS), pace-role (set-default), pace-pulse (autonomy level) | pace-plan, pace-retro, pace-change, pace-guard, pace-next, pace-status, pace-biz, pace-theory | +| `.devpace/rules/checks.md` | pace-init, pace-test | pace-dev (Gate 1/2), pace-change, pace-pulse, pace-status, pace-theory | +| `.devpace/rules/test-strategy.md` | pace-dev (auto-generated), pace-test (strategy/generate/coverage/flaky) | pace-dev, pace-test (multiple subcommands), pace-change (staleness marker) | +| `.devpace/rules/test-baseline.md` | pace-test (core/baseline) | pace-retro (common/focus) | +| `.devpace/context.md` | pace-dev (first advance) | pace-test (common/verify), pace-guard (scan) | +| `.devpace/metrics/insights.md` | pace-learn (**SSOT**), pace-init (from import) | pace-guard (common/scan/trends), pace-retro (retro/focus), pace-plan, pace-next, pace-change, pace-status metrics | +| `.devpace/metrics/dashboard.md` | pace-retro (common), pace-test (report), pace-plan (close), pace-feedback (status) | pace-pulse, pace-next, 
pace-status, pace-plan, pace-guard (scan), pace-retro (compare/history) | +| `.devpace/iterations/current.md` | pace-plan, pace-retro (handoff list), pace-change (change records) | pace-dev (postmerge), pace-pulse, pace-next, pace-status, pace-retro, pace-change, pace-theory | +| `.devpace/risks/RISK-*.md` | pace-guard (scan/common) | pace-guard (monitor/report/trends/resolve), pace-learn, pace-pulse, pace-next, pace-plan, pace-retro (forecast) | +| `.devpace/releases/*.md` | pace-release | pace-test (report), pace-retro (DORA/focus), pace-next, pace-change, pace-feedback, pace-status | +| `.devpace/integrations/sync-mapping.md` | pace-sync (setup) | sync-push Hook, pace-dev (common), pace-pulse, pace-next, pace-status, pace-init (reset) | +| `.devpace/integrations/config.md` | pace-sync (setup), pace-init (full) | pace-test (CI), pace-feedback (hotfix) | +| `.devpace/.signal-cache` | pace-pulse (session-start) | pace-next, pace-status overview | +| `.devpace/.pulse-last-run` | pace-pulse (advance mode) | pulse-counter Hook | +| `.devpace/.pulse-counter` | pulse-counter Hook | pulse-counter Hook | +| `.devpace/.pulse-cr-writes` | pulse-counter Hook | pulse-counter Hook | | `.devpace/.sync-state-cache` | sync-push Hook | sync-push Hook | -| `.devpace/.pulse-last-run` | pace-pulse (advance mode) | pulse-counter Hook | -| `.devpace/insights.md` | pace-learn | pace-retro (reference), pace-feedback | +| `.devpace/feedback-log.md` | pace-feedback | pace-feedback (trace/status) | +| `.devpace/feedback-inbox.md` | pace-feedback | pace-plan (scanned during planning) | +| `.devpace/incidents/*.md` | pace-feedback (incident) | pace-feedback (listing) | +| `.devpace/decisions/ADR-*.md` | pace-trace (arch) | pace-trace (listing) | +| `.devpace/epics/EPIC-*.md` | pace-biz (epic) | pace-biz (scan) | +| `.devpace/opportunities.md` | pace-biz (opportunity) | pace-biz (discover) | +| `.devpace/scope-discovery.md` | pace-biz (discover) | pace-biz (discover) | +| `.devpace/rules/workflow.md` | pace-init (template copy) | pace-dev | +| `.devpace/rules/change-templates.md` | 
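(creation suggested by pace-learn) | pace-change (apply) | +| `.devpace/features/*` | pace-dev (postmerge overflow) | -- | +| `.devpace/requirements/*` | pace-dev (postmerge overflow) | -- | +| `.devpace/reports/test-report-*.md` | pace-test (report) | (consumed by humans) | + +## §8 Shared Utility Functions (lib/utils.mjs) + +All Hooks share the utility functions in `hooks/lib/utils.mjs`: + +| Function | Hooks that use it | +|------|-----------------| +| `readStdinJson` | All 8 | +| `getProjectDir` | All 8 | +| `extractFilePath` | pre-tool-use, post-cr-update, sync-push, pulse-counter, post-tool-failure, post-schema-check | +| `isCrFile` | pre-tool-use, post-cr-update, sync-push, pulse-counter, post-tool-failure | +| `readCrState` | pre-tool-use, post-cr-update, sync-push, pulse-counter, subagent-stop | +| `isDevpaceFile` | pre-tool-use, post-schema-check | +| `isAdvanceMode` | pre-tool-use, post-tool-failure, subagent-stop | +| `CR_STATES` | pre-tool-use, post-cr-update, sync-push, subagent-stop | diff --git a/docs/design/vision.md b/docs/design/vision.md index 1303215..b6f455d 100644 --- a/docs/design/vision.md +++ b/docs/design/vision.md @@ -15,7 +15,7 @@ ## One-Sentence Overview -**devpace** is a Claude Code plugin that brings complete BizDevOps development-rhythm management to AI-assisted development. It links "business goals → product features → code changes" into a traceable value chain, so Claude is not just a code-writing tool but a development collaborator that understands business intent. Session interrupted? Context is restored automatically, with no need to re-explain. Requirements changed? The impact scope is assessed immediately, and work is adjusted in an orderly way rather than started over. Quality drifting? Built-in gates intercept automatically, so changes that are not ready cannot flow downstream. **Requirements will always change, but the development rhythm from planning to delivery should not spin out of control because of it.** devpace fully covers the three BizDevOps domains (business, development, operations); the Ops domain is deepened gradually through a phased strategy. +**devpace** is a Claude Code plugin that brings complete BizDevOps development-rhythm management to AI-assisted development. It links "business goals → product features → code changes" into a traceable value chain, so Claude is not just a code-writing tool but a development collaborator that understands business intent. Session interrupted? Context is restored automatically, with no need to re-explain. Requirements changed? The impact scope is assessed immediately, and work is adjusted in an orderly way rather than started over. Quality drifting? Built-in gates intercept automatically, so changes that are not ready cannot flow downstream. **Requirements will always change, but the development rhythm from planning to delivery should not spin out of control because of it.** devpace fully covers the three BizDevOps domains (business, development, operations); the Ops domain is deepened gradually through a phased strategy. From a Harness Engineering perspective, devpace is a development-rhythm harness for Claude Code: it constrains Agent behavior with rules, Schemas, gates, and feedback loops, moving AI-assisted development from vibe coding toward a traceable, measurable engineering practice. ## North Star diff --git a/docs/features/pace-change_zh.md b/docs/features/pace-change_zh.md new file mode 100644 index 0000000..33e8358 --- /dev/null +++ b/docs/features/pace-change_zh.md @@ -0,0 +1,299 @@ +🌐 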
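[English](pace-change.md) | Chinese version + +# Requirement Change Management (`/pace-change`) + +In most AI-assisted workflows, a requirement change is a disruption handled ad hoc: a comment added here, a file hand-edited there, and scattered notes that nobody can trace. devpace treats change as a **first-class citizen**. `/pace-change` provides structured triage, multi-layer impact analysis, risk quantification, and traceable execution, so shifting requirements no longer disrupt the development rhythm. + +## Quick Start + +``` +1. /pace-change add "Support OAuth login" → triage → impact analysis → proposal +2. User confirms the proposal → change is applied across all project files +3. Change record written to the iteration log → full traceability +``` + +Or let Claude guide you interactively: + +``` +You: /pace-change +Claude: [Smart recommendation based on project context] + Or choose: add / pause / resume / reprioritize / modify / batch +``` + +## Workflow + +### Step 0: Experience Preloading + +Before analysis begins, `/pace-change` reads historical patterns in `insights.md` that match the current change type. Historical data is cited in the impact report, and rollback patterns raise the risk level. No historical data? The step is skipped silently. + +### Step 1: Triage + +Not every change request needs a full analysis. Before impact assessment, `/pace-change` routes each request through a triage gate: + +| Decision | Meaning | Next action | +|------|------|---------| +| **Accept** | Proceed with full analysis | Enter impact analysis | +| **Decline** | Reject and record the rationale | Log the reason, end | +| **Snooze** | Defer until a trigger condition is met | Persist the record + trigger condition, end. Pulse automatically checks trigger conditions at session start, new iteration creation, and CR merge | + +Claude automatically suggests a triage decision based on alignment with the current iteration goals, urgency, and project direction. Hotfix/critical changes skip triage entirely. The user can always override the suggestion. + +### Step 2: Impact Analysis (4 layers, 3 output levels) + +Changes that pass triage enter **BR-to-code impact tracing**, which follows the same value chain devpace maintains from business requirements to code changes: + +1. **Business Requirement (BR) layer**: Is an entire business goal affected? Are success measures (MoS) at risk? +2. **Product Feature (PF) layer**: Which features are affected? How many need scope or acceptance-criteria changes? +3. **Change Request (CR) layer**: Which in-progress or planned CRs are affected? What dependencies exist? +4. **Code layer**: Which modules, files, and interfaces need to be modified? 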
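The four layers can be pictured as a top-down walk over the value chain. The snippet below is a minimal sketch, not devpace's implementation: the `project` object, its field names, and the `traceImpact` helper are all hypothetical, and real devpace data lives in Markdown files under `.devpace/` rather than in memory.

```javascript
// Hypothetical in-memory model of the value chain. Field names are
// assumptions for illustration, not devpace's actual file schema.
const project = {
  requirements: [{ id: "BR-001", features: ["PF-001", "PF-002"] }],
  changeRequests: [
    { id: "CR-003", feature: "PF-001", status: "developing", files: ["src/auth/login.js"] },
    { id: "CR-004", feature: "PF-002", status: "merged", files: ["src/search/index.js"] },
  ],
};

// Walk BR -> PF -> CR -> code for a set of affected feature IDs.
function traceImpact(model, affectedFeatures) {
  const hit = new Set(affectedFeatures);

  // BR layer: any requirement that owns at least one affected feature.
  const requirements = model.requirements
    .filter((br) => br.features.some((pf) => hit.has(pf)))
    .map((br) => br.id);

  // CR layer: CRs of affected features that are not yet merged.
  const openCrs = model.changeRequests.filter(
    (cr) => hit.has(cr.feature) && cr.status !== "merged"
  );

  // Code layer: de-duplicated files touched by those CRs.
  const files = [...new Set(openCrs.flatMap((cr) => cr.files))];

  return {
    requirements,
    features: [...hit],
    changeRequests: openCrs.map((cr) => cr.id),
    files,
  };
}

const report = traceImpact(project, ["PF-001"]);
console.log(JSON.stringify(report, null, 2));
```

Returning one object with a key per layer keeps the sketch close to the report structure described above; a surface-level summary could be derived from the array lengths alone.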
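+ +**Three-level progressive output** (aligned with design.md §2): +- **Surface level** (default): a 1-line conclusion, e.g. "This change affects 2 features; risk is low." +- **Intermediate level** (on follow-up or medium risk): 3-5 lines covering affected features and a risk summary +- **Deep level** (on further follow-up or high risk): full 4-layer trace, risk matrix, dependency chain, optional Mermaid visualization + +For medium/high-risk changes, transitive dependencies are traced up to 3 levels deep (CR-A → CR-B → CR-C). + +When a change affects 50% or more of a BR's features (or the user explicitly targets a BR), the report escalates to a BR-level view showing the full downstream cascade. + +### Step 3: Risk Quantification + Cost Estimation + +After impact analysis, `/pace-change` produces a semi-quantitative risk summary across three dimensions: + +| Dimension | Low | Medium | High | +|------|:--:|:--:|:--:| +| Affected modules | <=2 | 3-5 | >5 | +| Affected in-progress CRs | 0 | 1-2 | >=3 | +| Quality checks to reset | 0 | 1-3 | >3 | + +Overall risk: **Low** (all dimensions low), **Medium** (any dimension medium, none high), or **High** (any dimension high). High-risk changes additionally receive suggested test focus areas and a phased execution strategy. + +**Cost estimation**: based on historical data in `insights.md` (when available) or heuristic rules, an estimate of the additional effort is attached to support the decision. + +If the CR already contains an impact analysis section (written by `/pace-test impact`), that data is reused rather than re-evaluated. + +### Step 4: Proposal, Preview, and Confirmation + +Claude presents the concrete adjustment plan (new CRs to create, states to transition, quality checks to reset, iteration capacity impact) along with a **preview of affected files**. The user can request `--dry-run` to view the preview without executing. + +**Nothing is modified until you say "execute".** + +### Step 5: Execution and Recording + +After confirmation, all affected project files are updated atomically: + +1. **CR files**: state, intent, quality checks, event log +2. **project.md**: feature tree (new entries, pause/resume markers, status changes) +3. **PF files**: acceptance-criteria updates + history annotations (for `modify` on features already split out) +4. **iterations/current.md**: change log entry (when iteration tracking is active) +5. **state.md**: current work snapshot and next-step suggestion +6. **dashboard.md**: change management metrics (incremental update) +7. **git commit**: all changes recorded in a single traceable commit + +### Step 6: Downstream Guidance + External Sync + +After execution, `/pace-change` offers **type-specific next-step guidance**: + +| Change type | Guidance | +|---------|------| +| add | "Start development?" → on confirmation, enter `/pace-dev` | +| modify | "N checks need re-verification. Continue?" → resume `/pace-dev` | +| pause | "Adjust the iteration scope?" → on confirmation, enter `/pace-plan adjust` | +| resume | "Continue development?" → on confirmation, enter `/pace-dev` | +| reprioritize | "Switch to the new item and start working?" 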
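→ on confirmation, enter `/pace-dev` | + +It also checks external Issue links and generates a sync summary when needed, keeping your GitHub Issues (or other tools) consistent without automatic side effects. + +**Capacity coordination**: when `add` overflows the iteration capacity, Claude asks directly whether to adjust the scope (inline `/pace-plan adjust`). When `pause` frees capacity, it suggests pulling in waiting features. + +## Change Types + +| Type | Syntax | Description | +|------|------|------| +| **add** | `/pace-change add <description>` | Insert a new requirement. Creates PF + CR entries and assesses iteration capacity. | +| **pause** | `/pace-change pause <feature>` | Pause a feature. All linked CRs move to `paused`, the prior state is preserved, and dependency blocks are released. The feature tree shows a pause marker. | +| **resume** | `/pace-change resume <feature>` | Resume a paused feature. CRs return to their pre-pause state. Quality checks are re-verified against code changes made during the pause. | +| **reprioritize** | `/pace-change reprioritize <description>` | Adjust priority order. Updates the next-step recommendation and reorders the iteration plan. | +| **modify** | `/pace-change modify <feature> <change>` | Change the scope or acceptance criteria of an existing feature. Affected quality checks are reset precisely based on sensitivity-scope matching. A history annotation is added to the PF file. | +| **batch** | `/pace-change batch <description>` | Execute multiple changes at once. Merged impact analysis, cross-impact detection, single confirmation, single git commit. | +| **undo** | `/pace-change undo` | Undo the last `/pace-change` operation (current session only). Uses `git revert` for a precise rollback. | +| **history** | `/pace-change history [feature\|--all\|--recent N]` | Query change history, aggregated from iteration logs, CR events, PF annotations, and git log. | +| **apply** | `/pace-change apply <template>` | Apply a predefined change template from `.devpace/rules/change-templates.md`. | + +### Quick References + +- `#N`: reference a CR by number (e.g. `pause #3` = pause the feature of CR-003) +- `--last`: apply to the most recently worked-on CR (e.g. `modify --last`) +- `--dry-run`: preview only, do not execute + +Omitting the type argument starts a **smart guided dialogue**: Claude scans the project context (paused CRs, Snooze trigger conditions, iteration capacity, frequently changed features) and gives a tailored recommendation, then falls back to the standard option list. + +## Core Features + +### Context-Aware Smart Guidance + +When invoked without arguments, `/pace-change` scans the project state and recommends the most likely action ("Resume paused feature X?" or "Evaluate previously snoozed change Y?") before showing the standard option list. + +### Three-Level Progressive Output + +Impact reports follow a surface → intermediate → deep progression. Routine low-risk changes show just 1 line; complex changes expand on follow-up. Reading volume for a typical change drops from 15-20 lines to 3-5 lines. + +### Precise Quality Check Reset + +When a requirement is modified, `/pace-change` compares the change scope against each quality check's `sensitivity` field (a glob pattern in `checks.md`). Only checks whose scope overlaps are reset, which avoids both over-resetting and missed resets. + +### Batch Changes + +Handle multiple changes at once: merged impact analysis, cross-impact detection (e.g. pausing A while B depends on A), single confirmation, single git commit. Supports both explicit `/pace-change batch` and natural language ("pause A and B"). + +### Change Undo + +Precise rollback based on git commit history. Appends an "undo" entry to the iteration change log. Limited to the current session to prevent cross-session state inconsistency. + +### Change History Query + +Aggregates change history from four scattered sources (iteration logs, CR events, PF annotations, git log) into a unified timeline. Proactively warns when a feature has been changed more than 3 times. + +### Proactive Snooze Reminders + +Snoozed changes are persisted together with their trigger conditions. The Pulse system automatically checks the conditions at session start, new iteration creation, and CR merge, ensuring deferred changes are not forgotten. Each 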
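Snooze item triggers only one reminder. + +### Experience-Driven Analysis + +Historical patterns in `insights.md` inform impact analysis ("similar changes have historically affected N modules") and raise the risk level for patterns that previously required a rollback. + +### Transitive Dependency Tracing + +Impact analysis traces transitive dependencies up to 3 levels, with display depth scaling with the risk level. Direct, indirect, and deep impacts are shown through indentation. + +### Downstream Flow Automation + +After execution, type-specific guidance helps the user transition smoothly to the next step (starting development, re-verifying checks, or adjusting iteration scope), entering the relevant Skill with a single confirmation. + +### Structured Triage Routing + +Every change request goes through Accept/Decline/Snooze routing before any analysis effort is spent. This keeps low-value or immature requests from disrupting active work while ensuring no request is silently dropped. + +### BR-Level Impact View + +When a change is large enough to affect an entire business requirement, the impact report automatically escalates from feature level to BR level, showing the full cascade: business goal impact, success-measure risk, all affected features, and all affected CRs. + +### Impact Visualization (Mermaid) + +For medium/high-risk changes (or on request), a Mermaid diagram visualizes the impact chain from the change to affected features to individual CRs. Low-risk changes skip visualization to avoid noise. + +### Change Cost Estimation + +Risk quantification includes an estimated-effort dimension, based on historical checkpoint counts in `insights.md` or on heuristic rules when no historical data exists. + +### External Sync Check + +After any state-changing operation, `/pace-change` generates a sync summary for externally linked CRs and prompts for a sync when needed, keeping your GitHub Issues (or other tools) consistent without automatic side effects. + +### Change Management Metrics + +Execution data feeds the change management metrics defined in `metrics.md` (frequency, type distribution, rework rate, triage distribution, execution time), updated incrementally in `dashboard.md`. + +## Degraded Mode + +When `.devpace/` is not initialized, `/pace-change` still works: it degrades gracefully instead of failing: + +- **Triage** is simplified to a verbal confirmation: "Handle this now, or note it for later?" +- **Impact analysis** scans the codebase directly with `Glob`/`Grep`/`Read`, analyzes the `import`/`require` dependency graph, and identifies affected modules and risk areas +- **Git history enhancement**: co-change analysis (groups of files frequently modified together) and hotspot identification (recent change hotspots) +- **Risk quantification** assesses module spread from the dependency graph; the CR and quality-check dimensions are marked "unavailable" +- **Execution** operates on code only; no `.devpace/` files are created or modified +- **Incremental enrichment**: analysis results can optionally be persisted to `.devpace-lite/changes.md`. After 3 or more cumulative degraded uses, the upgrade prompt becomes more specific +- **On completion**, a natural mention: "Full initialization enables persistent traceability, quality gates, and metrics. Run `/pace-init` to get started." + +This means you can use `/pace-change` on any project at any time, even before deciding to fully adopt devpace. + +## Usage Scenarios + +### Scenario 1: Inserting a New Requirement Mid-Iteration + +A stakeholder asks for OAuth login support while the team is focused on core API development. + +``` +You: /pace-change add "Support OAuth login via Google and GitHub" +Claude: Triage: this is a new feature unrelated to the current API sprint. + Suggestion: Snooze until the API milestone is complete. + Accept / Decline / Snooze? + +You: Accept, it is needed for the customer demo. +Claude: This change adds 1 feature; risk is low. + [On follow-up: "What exactly?"] + Proposal: create 2 CRs and defer "rate limiting" to the next iteration. + Iteration capacity exceeded. Adjust scope now? (Y/N) + +You: Yes +Claude: Done. 2 CRs created, iteration plan updated, rate limiting deferred. + Start development? → /pace-dev +``` + +### Scenario 2: Batch Pause and Reprioritize + +``` +You: /pace-change batch "Pause the notification feature, and put export ahead of search" +Claude: Detected 2 change intents: + ① pause: notification system + ② reprioritize: export ahead of search + Is this correct? + +You: Yes +Claude: Merged impact: 3 CRs paused (notifications), 1 CR reordered. + No cross-impact detected. Risk: low. + Continue? 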
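+ +You: Execute +Claude: Done. Notifications paused (3 CRs preserved), export moved up in priority. + Start working on export? → /pace-dev +``` + +### Scenario 3: Modify with Precise Check Reset + +``` +You: /pace-change modify search "Response time must be within 200ms, not 500ms" +Claude: This change affects 1 feature; risk is medium. + CR-005 (search implementation): Gate 1 passed; 1 of 3 checks needs to be reset. + [Precise match: the "performance test" check has sensitivity scope "src/search/", + which overlaps the change. "lint" and "type check" are kept.] + Continue? + +You: Yes +Claude: Done. 1 check on CR-005 reset. History annotation added. + 1 check needs re-verification. Continue development? +``` + +### Scenario 4: Undoing the Last Change + +``` +You: /pace-change undo +Claude: Last change (2 minutes ago): paused "notification system" (3 CRs). + Undo this change? + +You: Yes +Claude: Undone. Notification system restored; 3 CRs are back in their previous states. +``` + +## Integration with Other Commands + +| Command | Relationship | +|------|------| +| `/pace-dev` | After a change is executed, `/pace-dev` picks up the new or modified CRs for implementation. Paused CRs are excluded from the development queue. Downstream guidance offers a one-step transition. | +| `/pace-plan` | PF and CR entries created by `/pace-change add` become part of the plan. Capacity overflow triggers inline `adjust` coordination. Reprioritization is reflected in the `/pace-plan` iteration view. | +| `/pace-sync` | After state-changing operations, `/pace-change` generates a sync summary and prompts a push sync. | +| `/pace-test` | Quality checks reset by `modify` (via precise sensitivity-scope matching) are re-evaluated at the next `/pace-test` gate review. Impact data from `/pace-test impact` is reused in `/pace-change` risk quantification. | +| `/pace-status` | Reflects all changes made by `/pace-change`: paused features, new CRs, updated priorities. | +| `/pace-pulse` | Snooze wake-up checks are run by Pulse at session start, new iteration, and CR merge. Change early-warning signals (acceptance drift, repeated failures, requirement conflicts) are Pulse signals. | +| `/pace-retro` | Change management metrics (frequency, type distribution, rework rate) are summarized in the retrospective report. | + +## Related Resources + +- [User Guide, /pace-change section](../user-guide.md): quick reference +- [Design Document, Change Management](../design/design.md): architecture and design principles +- [skills/pace-change/](../../skills/pace-change/): operating procedures (split by step: common, triage, impact, risk, execution, types; split by subcommand: batch, undo, history, apply, degraded) +- [cr-format.md](../../knowledge/_schema/cr-format.md): CR file Schema (includes the `paused` state definition) +- [checks-format.md](../../knowledge/_schema/checks-format.md): quality check Schema (includes sensitivity scope) +- [metrics.md](../../knowledge/metrics.md): change management metric definitions +- [devpace-rules.md](../../rules/devpace-rules.md): runtime behavior rules diff --git a/docs/features/pace-dev_zh.md b/docs/features/pace-dev_zh.md new file mode 100644 index 0000000..cc8a17c --- /dev/null +++ b/docs/features/pace-dev_zh.md @@ -0,0 +1,234 @@ +# Development Workflow (`/pace-dev`) + +`/pace-dev` is devpace's core development 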
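Skill. It drives a Change Request (CR) through its full lifecycle, from intent clarification to code implementation to quality gates, within a single autonomous workflow. Claude enters "advance mode": it writes code, runs tests, self-corrects on failure, and commits at each meaningful step. The Skill adapts its rigor to the complexity of the change: a single-file typo fix takes a minimal path, while a multi-module feature gets a full execution plan and a request for user confirmation. + +## Quick Start + +``` +1. /pace-dev "add user login" --> Locate or create the CR, clarify intent, start coding +2. (Claude works autonomously) --> Implement, test, commit, run Gate 1 and Gate 2 +3. "LGTM" --> Human approval (Gate 3) --> CR merged +``` + +After the initial command, Claude drives the workflow autonomously with no further prompts until it needs a decision from you or reaches the human approval gate. + +## CR Lifecycle Overview + +Every change goes through a six-state lifecycle. `/pace-dev` manages state transitions automatically; you only step in at Gate 3 (human approval). + +``` +created --> developing --> verifying --> in_review --> approved --> merged + | | | | | | + | Intent | Code & | Gate 1 | Gate 2 | Gate 3 | Post-merge + | checkpoint | tests | (code | (requirement | (human) | update + | | | quality) | quality) | | +``` + +| State | What happens | Who | +|------|---------|--------| +| `created` | CR created with title and intent; complexity assessment completed | Claude | +| `developing` | Code implementation, test writing, a git commit per step | Claude | +| `verifying` | Gate 1: automated code quality checks; self-corrects on failure | Claude | +| `in_review` | Gate 2: comparison against acceptance criteria; review summary generated | Claude | +| `approved` | Gate 3: human reviews the diff summary and approves | You | +| `merged` | Branch merged, state.md updated, linked PF refreshed | Claude | + +## Core Features + +### Intent Checkpoint + +When a CR first enters `developing`, Claude performs a self-preparation step to lock down scope and acceptance criteria. This is not a form for you to fill in: Claude completes it internally and tells you "Scope confirmed, starting work". + +- **Simple (S)**: records your original request plus one free-text acceptance condition. +- **Standard (M)**: adds a numbered list of acceptance criteria, marking ambiguous items with a `[TBD]` tag. +- **Complex (L/XL)**: generates full Given/When/Then acceptance criteria and an execution plan, and asks you at most 2 clarifying questions (each with a recommended answer). + +If you ask "What's the plan?", Claude shows the full intent section, including the execution plan. + +### Complexity Assessment + +Claude assesses each CR along four dimensions and assigns a size: S, M, L, or XL. + +| Dimension | S | M | L | XL | +|------|---|---|---|-----| +| Files involved | 1-3 | 4-7 | 8-15 | >15 | +| Directories involved | 1 | 2-3 | 4-5 | >5 | +| Acceptance criteria | 1 | 2-3 | 4+ | Multiple groups | +| Cross-module dependencies | None | One-way | Two-way | Architectural | + +The highest dimension determines the final rating. L/XL CRs automatically enter a split-evaluation flow in which Claude suggests breaking the work into smaller CRs. + +### Adaptive Paths + +Complexity determines how much ceremony the workflow involves: + +| Path | Complexity | Behavior | +|------|--------|------| +| **Quick** | S, single file | Minimal intent record, no execution plan, straight to coding | +| **Standard** | S multi-file, M | Standard intent + numbered criteria; optional execution plan | +| **Full** | L, XL | Full intent + mandatory execution plan + plan reflection + user confirmation gate before coding | + +The **escalation guard** watches for scope creep during the intent checkpoint. If an S-sized CR actually touches multiple modules, Claude suggests escalating to M or L. The suggestion does not block the flow; you can choose to keep the original level. + +### Execution Plan (L/XL) + 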
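+For complex changes, Claude generates a step-by-step plan before writing any code: + +- Each step is an atomic operation corresponding to one meaningful commit. +- Steps include exact file paths, the action to perform, and a verifiable expected result. +- Dependencies between steps are marked explicitly. +- When a test strategy exists, test-skeleton steps come before implementation steps. + +After generating the plan, Claude performs a **plan reflection** along four dimensions (requirement coverage, over-engineering risk, need for splitting, technical assumptions) and records 1-3 lines of observations. It then presents the plan for your confirmation before starting to code. + +### Gate Reflection + +After each quality gate passes, Claude appends a brief self-assessment to the CR event log: + +- **Gate 1 reflection**: technical-debt observations, test coverage assessment, test-first compliance review. +- **Gate 2 reflection**: observations on edge-case coverage and acceptance completeness. + +These reflections do not block the workflow. They accumulate quality signals that feed into experience extraction when the CR is merged. + +### Drift Detection + +Two complementary monitors run at every checkpoint (git commit + CR update): + +- **Intent drift**: compares changed files against the declared scope. If more than 30% of the files fall outside the intent boundary, Claude flags it: "These files are outside the declared scope. Intentional expansion or scope creep?" +- **Complexity drift**: compares actual file/directory counts against the initial complexity thresholds. If an S-sized CR has touched 4+ files, Claude suggests escalating. Flagged at most once per CR. + +Both checks are advisory; they do not block your workflow. + +### PF Overflow Check + +When a CR is created or merged, Claude checks whether the linked Product Feature (PF) has outgrown its inline capacity in `project.md`: + +- **Trigger conditions**: the feature spec exceeds 15 lines, there are 3+ linked CRs, or `/pace-change modify` was previously run on the PF. +- **Action**: automatically extracts the PF into a standalone file at `.devpace/features/PF-xxx.md` and replaces it with a link in `project.md`. +- **Zero friction**: no confirmation required. The extraction is reported in the execution summary. + +### Quick CR Switching + +Several CRs in progress? Switch quickly: + +- `/pace-dev #3`: jump straight to CR-003 by number. +- `/pace-dev --last`: resume the most recently worked-on CR. +- `/pace-dev "login"`: match by keyword (existing behavior). + +### Simplified Approval + +The simplified approval path (skipping the `in_review` wait) now distinguishes cosmetic fixes from structural fixes: + +- **Cosmetic fixes** (lint, formatting, import ordering) do not count as "not first-pass". This means more S-sized CRs qualify for simplified approval. +- **Structural fixes** (logic errors, type errors, missing return) still disqualify a CR from simplified approval. +- **Batch approval**: when 2+ CRs are in `in_review` at the same time and all qualify for simplified approval, Claude offers a batch confirmation prompt. + +### L/XL Step-Level Progress + +For L/XL CRs with an execution plan, each completed step produces: + +1. A step checkpoint marker in the CR event log (`[checkpoint: step-3-done]`). +2. A step-level locator in state.md (`→ Step 3/5: middleware implementation`). +3. A one-line progress notice to the user: `[Step 3/5] middleware implementation complete`. + +This enables precise cross-session recovery: the next session resumes at the exact step. + +### Execution Plan Editing + +During the confirmation gate of an L/XL CR, you can adjust the plan in natural language: + +- "Delete step 3" / "Merge steps 4 and 5" / "Add a step after step 2: ..." 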
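/ "Swap steps 3 and 4" +- Claude applies the changes and re-presents the updated plan for confirmation. +- You can also set **pause points**: "Pause after step 3". Claude stops there, outputs a progress summary, and waits for confirmation before continuing. + +### Context Inheritance from Explore to Advance + +When switching from explore mode to advance mode within the same session, Claude automatically extracts relevant context from the earlier discussion: + +- Chosen approach and its rationale +- Constraints and dependencies discovered +- Conclusions reached during exploration + +This context is pre-filled into the CR intent section with source annotations, so decisions already made need not be re-explained. + +### Mid-Development Compaction Suggestion + +For L/XL CRs that have completed 60% or more of the execution plan, when the current step is unrelated to earlier steps, Claude may suggest `/compact` to free up context space for the remaining work. + +### Inline Exploration During Advance + +If you need to think or discuss alternatives during implementation ("Let me think..." / "Is there a better way?"), Claude pauses the advance workflow for free discussion, then resumes seamlessly when you say "continue". The outcome of the discussion is automatically written back into the CR intent. + +## Usage Scenarios + +### Scenario 1: Quick Bug Fix + +``` +You: /pace-dev "fix null check in auth middleware" +Claude: No existing CR found. Creating CR-012 (type: defect, severity: medium). + Scope confirmed, starting work. + + ...(implements the fix, adds tests, commits)... + + Change summary + - Files: 1 modified / 1 added + - CR state: created --> developing --> verifying --> in_review + - Gate 1: passed. Gate 2: passed. + Ready for your review. Approve the merge? +``` + +### Scenario 2: Multi-Module Feature (L) + +``` +You: /pace-dev "add role-based access control" +Claude: CR-013 created. Complexity: L (8 files, 4 directories, two-way dependencies). + Suggest splitting into 2 CRs: + - CR-013a: role model + permission middleware + - CR-013b: admin UI for role assignment + Dependency: CR-013a before CR-013b. + Split, or continue as a single CR? + +You: Split. +Claude: Created CR-013a and CR-013b. Starting CR-013a. + Here is the execution plan (5 steps): + 1. Define the Role schema in models/role.ts ... + ... + Confirm to start, or adjust? + +You: Go ahead. +Claude: ...(implements autonomously, with checkpoints)... 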
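+``` + +### Scenario 3: Resuming Interrupted Work + +``` +You: /pace-dev +Claude: Resuming CR-010 (state: developing, execution plan step 3/5). + Last checkpoint: "API routes complete; next: write integration tests." + Continuing from step 4. +``` + +## Integration with Other Commands + +| Command | Relationship to `/pace-dev` | +|------|----------------------| +| `/pace-init` | Initializes the `.devpace/` project structure that `/pace-dev` runs in | +| `/pace-review` | Invoked automatically when a CR reaches `in_review`; generates the diff summary for Gate 2 | +| `/pace-test` | Can be invoked during development to generate test skeletons aligned with acceptance criteria | +| `/pace-change` | Handles requirement changes (adding, removing, modifying PFs); `/pace-dev` handles implementation | +| `/pace-sync` | After a CR state transition, the sync-push hook reminds you to push the state to GitHub | +| `/pace-status` | View current CR state and project progress at any time | +| `/pace-guard` | Runs a risk pre-scan during the intent checkpoint of L/XL CRs | +| `/pace-next` | Suggests the next action after a CR is merged or when no work is in progress | + +## Related Resources + +- [dev-procedures-common.md](../../skills/pace-dev/dev-procedures-common.md) -- common rules (context.md generation, sync suggestions, decision log, transparency summary) +- [dev-procedures-intent.md](../../skills/pace-dev/dev-procedures-intent.md) -- intent checkpoint, complexity assessment, execution plan, approach confirmation +- [dev-procedures-developing.md](../../skills/pace-dev/dev-procedures-developing.md) -- step isolation, drift detection, L/XL checkpoints +- [dev-procedures-gate.md](../../skills/pace-dev/dev-procedures-gate.md) -- post-pass reflection for Gate 1/2 +- [dev-procedures-postmerge.md](../../skills/pace-dev/dev-procedures-postmerge.md) -- feature discovery, PF overflow check +- [dev-procedures-defect.md](../../skills/pace-dev/dev-procedures-defect.md) -- defect/hotfix CR creation and post-fix handling +- [cr-format.md](../../knowledge/_schema/cr-format.md) -- CR file schema (fields, states, event log format) +- [devpace-rules.md](../../rules/devpace-rules.md) -- runtime behavior rules (advance-mode constraints, dual-mode system) +- [User Guide](../user-guide.md) -- quick reference for all commands diff --git a/docs/features/pace-release_zh.md b/docs/features/pace-release_zh.md new file mode 100644 index 0000000..fd1ce54 --- /dev/null +++ b/docs/features/pace-release_zh.md @@ -0,0 +1,330 @@ +# Release Management (`/pace-release`) + +devpace treats a release as an **active orchestration activity** rather than passive status tracking. `/pace-release` drives the full release lifecycle (collecting merged changes, deploying across environments, verifying results, and closing the release with automated changelog, version management, and tagging), built throughout on standard tools such as `git` and `gh`. It orchestrates your release pipeline; it does not replace your CI/CD. + +## Prerequisites + +| Condition | Purpose | Required? 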
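| +|------|------|:---------:| +| `.devpace/` initialized | Core devpace project structure containing merged CRs | Yes | +| `.devpace/releases/` directory | Release file storage (created automatically on first `create`) | Auto | +| `integrations/config.md` | Gate 4 checks, deploy commands, version file configuration, environment promotion | Optional | +| `gh` CLI | Creating a GitHub Release via the `tag` subcommand | Optional | + +> **Graceful degradation**: every feature works without `integrations/config.md`; you just need to supply more information manually. Changelog generation is always available because it reads CR metadata directly. + +## Quick Start + +``` +1. /pace-release create --> Collect merged CRs, suggest a version, create REL-001 (staging) +2. /pace-release deploy --> Record the deployment to the target environment (deployed) +3. /pace-release verify --> Run the verification checklist (verified) +4. /pace-release close --> Generate changelog + bump version + create tag + cascade updates (closed) +``` + +Or just invoke `/pace-release` with no arguments: the guided wizard detects the current state and walks you to the right next step. + +## Command Reference + +### User Layer + +The following six commands cover the standard release lifecycle. Most teams need only these. + +#### `create` + +Create a new Release from merged CRs. + +**Syntax**: `/pace-release create` + +Scans `.devpace/backlog/` for CRs in the `merged` state that are not yet linked to a Release. Displays candidates sorted by type (hotfix > defect > feature), asks which CRs to include, suggests a semantic version, and creates a `REL-xxx.md` file in the `staging` state. If `integrations/config.md` exists, Gate 4 system-level checks can optionally be run. See [release-procedures-create.md](../../skills/pace-release/release-procedures-create.md) for details. + +#### `deploy` + +Record an environment deployment. + +**Syntax**: `/pace-release deploy` + +Supports single-environment and multi-environment promotion. In a multi-environment configuration, it follows the defined promotion path (`env1 -> env2 -> ... 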
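-> envN`), running deploy + verify in each environment before promoting to the next. Appends the deployment record to the Release file and transitions the state from `staging` to `deployed`. See [release-procedures-deploy.md](../../skills/pace-release/release-procedures-deploy.md) for details. + +#### `verify` + +Run post-deployment verification. + +**Syntax**: `/pace-release verify` + +Presents the verification checklist (when `integrations/config.md` defines verification commands, automated results are pre-filled). Walks you through confirming each item. If a problem is found, it records the problem and helps create a defect/hotfix CR linked to the current Release. Once everything passes, the state transitions to `verified`. See [release-procedures-verify.md](../../skills/pace-release/release-procedures-verify.md) for details. + +#### `close` + +Run all closing operations and complete the release. + +**Syntax**: `/pace-release close` + +Requires the `verified` state. Automatically runs the full closing chain: changelog generation, version file bump, Git tag creation (each step shows a brief prompt and can be skipped), followed by cascading status updates: CR state set to `released`, project.md feature tree marking, iteration tracking, state.md cleanup, and dashboard metric updates. See the 8-step closing chain in [release-procedures-close.md](../../skills/pace-release/release-procedures-close.md). + +#### `full` + +The recommended alias for `close`, with clearer semantics ("complete the release" rather than "close"). + +**Syntax**: `/pace-release full` + +Behaves identically to `close`. See [release-procedures-close.md](../../skills/pace-release/release-procedures-close.md) for details. + +#### `status` + +View the current Release state and the suggested next action. + +**Syntax**: `/pace-release status` + +Shows the active Release's CR breakdown grouped by type, deployment issue count, verification progress, and the recommended next action. When no active Release exists, shows the number of merged CRs available for release. See [release-procedures-status.md](../../skills/pace-release/release-procedures-status.md) for details. + +### Expert Layer + +The following commands can be used individually by teams that need fine-grained control over specific release steps. In the normal `close` flow, steps 1-3 run automatically. + +#### `changelog` + +Automatically generate CHANGELOG.md from CR metadata. + +**Syntax**: `/pace-release changelog` + +Reads the CRs included in the active Release, groups them by type (Features / Bug Fixes / Hotfixes) with PF links, and writes the entries to the Release file and to `CHANGELOG.md` in the project root. See [release-procedures-changelog.md](../../skills/pace-release/release-procedures-changelog.md) for details. + +#### `version` + +Bump the semantic version. + +**Syntax**: `/pace-release version` + +Reads the version file configuration from `integrations/config.md` (JSON, TOML, YAML, and plain text are supported). Infers the bump level from CR types: contains a feature = minor, defect/hotfix only = patch. The user can override. Updates the version file in place. See [release-procedures-version.md](../../skills/pace-release/release-procedures-version.md) for details. + +#### `tag` + +Create a Git tag and, optionally, a GitHub Release. + +**Syntax**: `/pace-release tag` + +Uses the 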
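Release version and the configured prefix (default `v`) to create an annotated Git tag. When the `gh` CLI is available, offers the option to create a GitHub Release with the changelog content as Release Notes. See [release-procedures-tag.md](../../skills/pace-release/release-procedures-tag.md) for details. + +#### `notes` + +Generate user-facing Release Notes organized by business impact. + +**Syntax**: `/pace-release notes [--role biz|ops|pm]` + +Unlike the developer-facing changelog (grouped by CR type), Release Notes are organized by BR (business requirement) and PF (product feature), use product language, and contain no technical identifiers. They include a "Business Impact" section that traces upward to OBJ-level goals and MoS progress. + +The `--role` parameter generates notes from a specific role's perspective: `biz` (business impact report for management), `ops` (deployment runbook for operations), `pm` (feature delivery list for product managers). See [release-procedures-notes.md](../../skills/pace-release/release-procedures-notes.md) for details. + +#### `branch` + +Manage release branches. + +**Syntax**: `/pace-release branch [create|pr|merge]` + +Supports three branch modes configured in `integrations/config.md`: direct release (default, tag on main), release branch (`release/v{version}` for final fixes), and Release PR (a PR-driven release flow inspired by Release Please). When no branch mode is configured, all operations happen on the main branch. See [release-procedures-branch.md](../../skills/pace-release/release-procedures-branch.md) for details. + +#### `rollback` + +Record a rollback when a deployed Release has a serious problem. + +**Syntax**: `/pace-release rollback` + +Available only when the Release is in the `deployed` state. Records the rollback reason, appends a rollback entry to the deployment log, transitions the state to `rolled_back` (terminal), and guides creation of a defect/hotfix CR for root-cause tracking. When a new Release is created after a rollback, the unaffected CRs from the rolled-back Release are automatically pre-filled as candidates, reducing repeated selection. See [release-procedures-rollback.md](../../skills/pace-release/release-procedures-rollback.md) for details. + +#### `status history` + +View the release history timeline and DORA trends. + +**Syntax**: `/pace-release status history` + +Scans all Release files to produce a longitudinal cross-release view: version evolution, CR count/type per Release, rollback markers, average release cycle duration, and DORA metric trends (deployment frequency, lead time, change failure rate). Shows the 10 most recent Releases by default. See [release-procedures-status.md](../../skills/pace-release/release-procedures-status.md) for details. + +## Release State Machine + +``` + create deploy verify close + (merged CRs) -----> staging -----> deployed -----> verified -----> closed + | + | rollback + v + rolled_back +``` + +| State | Meaning | Allowed transitions | +|------|------|-----------| +| `staging` | Release created, CRs collected, ready to deploy | `deployed` | +| `deployed` | Deployed to the target environment | `verified`, `rolled_back` | +| `verified` | Post-deployment verification passed | `closed` | +| `closed` | Release complete, all closing operations executed (terminal) | -- | +| `rolled_back` | 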
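Deployment rolled back due to a serious problem (terminal) | -- | + +The `deployed` and `verified` transitions require human confirmation. The transition from `verified` to `closed` is performed automatically by Claude (including the closing chain). + +## Core Features + +### Gate 4: System-Level Release Checks + +An optional pre-deployment gate that runs after `create`: + +1. **Build verification**: runs the build command from `integrations/config.md`; on failure, shows the last 10 lines of error output and suggested fix steps +2. **CI status check**: queries the CI pipeline status (auto-detects the CI configuration when none is explicitly configured); on failure, provides the CI run URL via `gh run view --web` +3. **Candidate integrity**: confirms all included CRs have passed Gate 1/2/3 (`merged` state); on failure, lists the specific CRs and the Gates they failed +4. **Test report**: automatically generates a Release-level quality report via `/pace-test report` + +Gate 4 does not block Release creation; it surfaces problems before deployment. Check results are persisted in the Release file for audit traceability. + +### CR Dependency Detection + +Dependencies between candidate CRs are detected automatically during `create`: +- **Functional dependency**: CRs linked to the same PF are flagged as functionally related +- **Code-level dependency**: CRs that modify the same files are flagged as a code-overlap risk +- Displays a dependency graph with include/exclude suggestions + +### Release Readiness Check + +An optional pre-validation during `create` that scans the code changes of candidate CRs: +- Temporary code markers (`TODO`, `FIXME`, `console.log`, `debugger`) +- Missing test coverage (CRs with no accept record) +- Produces a readiness grade (A/B/C), for reference only; it does not block the flow + +### Release Impact Preview + +Generated automatically after `create`, providing a release-level panoramic view: +- Code change statistics (lines added/deleted, files affected) +- Module-level change heatmap +- Risk area highlights (multiple CRs modifying the same file) +- Business impact tracing (this Release's contribution to OBJ/BR progress) + +### Automatic Changelog Generation + +Changelog entries are generated entirely from CR metadata (title, type, PF link), with no manual writing. Output is written to both the Release file and `CHANGELOG.md` in the project root (inserted at the top, preserving history). + +### Release Notes with Business Impact + +A standalone output distinct from the changelog: organized by BR/PF, written in product language, with a "Business Impact" section that traces upward to OBJ-level goals and MoS milestones. Role perspectives are supported via the `--role` parameter: +- `--role biz`: business-facing (OBJ progress, MoS attainment) +- `--role ops`: operations-facing (deployment details, risk assessment, rollback plan) +- `--role pm`: product-facing (feature delivery list, completion percentage) + +The threshold for generating Release Notes has been lowered: they can be generated when the Release contains at least 1 feature CR (previously 2+ CRs were required). + +### Environment Promotion with Panoramic View + +When `integrations/config.md` defines multiple environments, `deploy` follows the promotion path in order, running deploy + verify in each environment. Every deploy/verify operation shows the full promotion panorama and the current position: `[staging ✅] → [canary ✅] → [production 👈 current] → [done]`. + +### Closing Progress Tracking + +The 8-step closing chain shows progress step by step (`[N/8] ✅ Step name`) and supports resuming after interruption: if any step fails, the earlier steps are preserved and the flow can resume from the failed step. + +### Rollback Path with Candidate Pre-fill + +When a deployed Release has a serious problem, `rollback` provides a structured response: record the incident, roll back to a known state, and create a CR to track the fix. When a new Release is created after a rollback, the unaffected CRs from the rolled-back Release are automatically pre-filled as candidates, reducing repeated selection. + +### Guided Wizard with Rollback Tracking + +The no-argument wizard now tracks rolled-back Releases and shows the status of fix CRs before guiding the next action. This keeps users from forgetting unresolved rollback follow-ups. + +### Release History Timeline + +`status history` provides a longitudinal cross-release view: version evolution, per-Release CR 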
数量/类型、回滚标记,以及 DORA 趋势摘要。 + +### 发布调度 + +可选的发布节奏配置(`integrations/config.md`)支持在向导和 pace-pulse 中显示窗口提醒,以迭代结束作为自然发布窗口。 + +## 使用场景 + +### 场景 1:标准发布 + +``` +You: /pace-release +Claude: Found 3 merged CRs ready for release. + CR-001 (feature), CR-003 (defect), CR-005 (hotfix). + Create a new release? [Y/n] + +You: Y +Claude: Suggested version: 1.3.0 (contains features). + Confirm version? [1.3.0] + +You: 1.3.0 +Claude: REL-002 created (v1.3.0, staging). + Gate 4: Build OK, CI green, all CRs passed. + Next: /pace-release deploy + +You: /pace-release deploy +Claude: Deploy REL-002 (v1.3.0) to production? [Y/n] + +You: Y +Claude: Deployment recorded. Status: deployed. + Next: /pace-release verify + +You: /pace-release verify +Claude: Verification checklist (3/3 passed). + All items confirmed. Status: verified. + Next: /pace-release close + +You: /pace-release close +Claude: Generating changelog... done. + Version bump 1.2.0 -> 1.3.0? [Y] ... done. + Create tag v1.3.0? [Y] ... done. + + Release REL-002 complete (v1.3.0). + - Changelog written to CHANGELOG.md + - Version updated in package.json + - Tag v1.3.0 created + - 3 CRs marked as released + - Metrics updated (deploy frequency, lead time) +``` + +### 场景 2:部署后回滚 + +``` +You: /pace-release +Claude: REL-002 (v1.3.0) is deployed. Start verification? [Y/n/problem] + +You: problem +Claude: What happened? + +You: Payment processing is failing for international orders +Claude: Recording rollback for REL-002. + Reason: Payment processing failure (international orders). + Status: rolled_back. + + Create a hotfix CR for this issue? [Y/n] + +You: Y +Claude: CR-006 created (hotfix, linked to REL-002). + After fixing, create a new release with /pace-release create. 
+``` + +## 与其他命令的集成 + +| 命令 | 集成点 | +|------|--------| +| `/pace-dev` | CR 通过开发流水线达到 `merged` 状态,成为发布候选项 | +| `/pace-test` | `create` 通过 `/pace-test report` 自动生成 Release 级测试报告 | +| `/pace-feedback` | `verify` 中的问题触发通过反馈流创建 defect CR | +| `/pace-retro` | `close` 更新 dashboard.md 的 DORA 指标(部署频率、前置时间、变更失败率) | +| `/pace-sync` | 未来集成(Phase 19),用于外部平台发布状态同步 | + +## 相关资源 + +- [SKILL.md](../../skills/pace-release/SKILL.md) -- Skill 入口点和路由表 +- [release-procedures-common.md](../../skills/pace-release/release-procedures-common.md) -- 共享规则(版本推断 SSOT、发布规则、集成规则) +- [release-procedures-wizard.md](../../skills/pace-release/release-procedures-wizard.md) -- 引导向导(无参数流程) +- [release-procedures-create.md](../../skills/pace-release/release-procedures-create.md) -- 创建流程(CR 收集、版本建议) +- [release-procedures-create-enhanced.md](../../skills/pace-release/release-procedures-create-enhanced.md) -- 创建增强(依赖检测、就绪检查、Gate 4) +- [release-procedures-deploy.md](../../skills/pace-release/release-procedures-deploy.md) -- 部署流程(环境晋升) +- [release-procedures-verify.md](../../skills/pace-release/release-procedures-verify.md) -- 验证流程(健康检查) +- [release-procedures-close.md](../../skills/pace-release/release-procedures-close.md) -- 关闭/完成流程(8 步链) +- [release-procedures-changelog.md](../../skills/pace-release/release-procedures-changelog.md) -- Changelog 生成 +- [release-procedures-version.md](../../skills/pace-release/release-procedures-version.md) -- 版本号升级 +- [release-procedures-tag.md](../../skills/pace-release/release-procedures-tag.md) -- Git 标签和 GitHub Release +- [release-procedures-rollback.md](../../skills/pace-release/release-procedures-rollback.md) -- 回滚流程(候选项预填充) +- [release-procedures-notes.md](../../skills/pace-release/release-procedures-notes.md) -- Release Notes(角色视角) +- [release-procedures-branch.md](../../skills/pace-release/release-procedures-branch.md) -- 分支管理 +- [release-procedures-scheduling.md](../../skills/pace-release/release-procedures-scheduling.md) -- 发布调度 +- 
[release-procedures-status.md](../../skills/pace-release/release-procedures-status.md) -- 状态和历史 +- [integrations-format.md](../../knowledge/_schema/integrations-format.md) -- 集成配置 Schema +- [devpace-rules.md](../../rules/devpace-rules.md) -- 运行时行为规则 diff --git a/docs/features/pace-review_zh.md b/docs/features/pace-review_zh.md new file mode 100644 index 0000000..477f63c --- /dev/null +++ b/docs/features/pace-review_zh.md @@ -0,0 +1,206 @@ +# 代码审查与质量门禁(`/pace-review`) + +`/pace-review` 为处于 `in_review` 状态的变更请求(CR)生成结构化审查摘要。它将自动质量门禁检查(Gate 2)与对抗审查层和累积 Diff 报告相结合,然后移交给人类进行最终审批(Gate 3)。目标是以最低的认知负担,为审查者提供做出知情批准/拒绝决策所需的全部信息。 + +核心能力:复杂度自适应摘要深度(S 微型 / M 标准 / L-XL 含 TL;DR 的完整摘要)、reject-fix 周期的增量重审(Delta Review)、跨 CR 冲突检测、accept 报告引导的对抗聚焦、审查历史持久化,以及审批过程中的探索模式。 + +## 快速开始 + +``` +1. CR 到达 in_review 状态(通过 /pace-dev) +2. /pace-review → Gate 2 + 对抗审查 + 生成摘要 +3. 人类审查摘要 → "approved" / 拒绝 / 具体反馈 +4. approved → git merge → CR 转换为 merged +``` + +Gate 3(人类审批)在任何情况下都不可绕过(Iron Law IR-2)。 + +## 审查流程 + +### Step 1: 识别待审 CR + +扫描 `.devpace/backlog/` 中处于 `in_review` 状态的 CR。可选的关键词参数用于缩小到特定 CR。如果没有符合条件的 CR,**状态感知引导**会介入:建议检查 `verifying` 状态的 CR 或推进 `developing` 状态的 CR——用户无需了解状态机知识。 + +### Step 1.5: 跨 CR 冲突检测 + +扫描所有活跃 CR(developing/verifying/in_review),检测与当前 CR 在文件和模块层面的重叠。仅在检测到重叠时显示——无冲突时零噪音。 + +### Step 2: Gate 2——自动质量检查 + +Gate 2 是人类审查前的最后一道自动门禁。它遵循**独立验证原则**:不信任 Gate 1 的任何结果——所有证据重新采集(重新读取验收标准、重新获取 git diff)。 + +**执行顺序(强制)**: + +1. 从 CR 文件中重新读取验收标准 +2. 获取最新的 `git diff main...` +3. 
首先检查意图一致性——如果意图不匹配,Gate 2 立即失败(不再执行后续检查) + +**意图一致性检查**将每条验收标准标记为:满足(pass)、未满足(fail + 缺少什么)、或部分满足(已完成与待完成的详细说明)。同时检测**范围外变更**和**范围内遗漏**。 + +**失败时**:CR 返回 `developing` 状态。Claude 修复差距、重新运行检查并重新提交——无需手动重启。 + +### Step 3: 对抗审查 + +Gate 2 通过后,思维模式从"验证正确性"切换为"发现缺陷"——这是对确认偏误的反制措施。 + +**核心规则**:零发现不可接受。如果所有维度都没有发现问题,至少输出一条可选优化建议作为下限。 + +**四个强制维度**(每个至少考虑一次): + +| 维度 | 示例 | +|------|------| +| 边界与错误路径 | 空输入、极端值、并发、超时 | +| 安全风险 | 注入、权限提升、敏感数据泄露 | +| 性能隐患 | N+1 查询、大量内存分配、阻塞操作 | +| 集成风险 | API 契约变更、向后兼容性 | + +严重性标签:`🔴 建议修复` / `🟡 建议改进` / `🟢 可选优化`。每条发现都附带误报免责声明。 + +**Accept 报告集成**:当存在 `/pace-test accept` 报告时,覆盖薄弱和低置信度区域会引导对抗聚焦——在易受攻击的代码路径上检查更多维度。 + +**可配置维度**:项目可在 `checks.md` 中定义自定义对抗维度(如数据一致性、无障碍访问、向后兼容性)。未配置时使用默认的 4 个维度。 + +**关键规则**:对抗发现不会阻断 Gate 2——它们是提供给人类审查者的参考信息。简单 CR(复杂度 S)完全跳过对抗审查。 + +### Step 4: 累积 Diff 报告 + +对于中等及以上复杂度的 CR,生成按模块分组的 diff 报告,将每个变更文件映射到对应的验收标准: + +``` +累积 diff 报告: + 模块 A (+N/-M 行): + - file1.ts (新增) → 验收标准 1 + - file2.ts (修改) → 验收标准 2 + ⚠️ 未覆盖的标准:[列表] + ⚠️ 范围外变更:[文件 + 理由] +``` + +这与 Gate 2 互补:Gate 2 检查"是否完成?",diff 报告展示"如何完成?"。简单 CR 跳过此报告。 + +### Step 5: 审查摘要 + +摘要深度根据 CR 复杂度自适应调整(遵循 P2 渐进暴露 + P6 三层透明原则): + +| 复杂度 | 摘要级别 | 内容 | +|--------|---------|------| +| S | 微型(3-5 行) | 变更内容 + 质量状态 + 等待审批 | +| M | 标准 | 意图匹配(含推理后缀)+ 对抗审查 + 累积 diff | +| L/XL | 完整 + TL;DR | 前置 2-3 行执行摘要,后接完整详情 | + +每条意图匹配判定附带不超过 15 字符的证据后缀(如 `RetryPolicy.ts:23 exponentialBackoff`)。用户可以追问"为什么?"来展开完整推理链。 + +**业务可追溯性**:自动追溯 CR → PF → BR 价值链。对不完整的追溯链如实标注,而非编造链接。 + +**审查历史**:摘要持久化到 CR 的验证证据章节(`### Review Summary (Round N, YYYY-MM-DD)`),支持会话恢复和增量重审基线。 + +详细规则见审查流程文件:[common](../../skills/pace-review/review-procedures-common.md)(始终加载)、[gate](../../skills/pace-review/review-procedures-gate.md)(M+ 审查)、[delta](../../skills/pace-review/review-procedures-delta.md)(增量审查)、[feedback](../../skills/pace-review/review-procedures-feedback.md)(决策后处理)。 + +### Step 6: 人类决定(Gate 3) + +| 人类响应 | 动作 | +|---------|------| +| "approved" / "lgtm" | CR → `approved` → `git merge` → CR → `merged` → 级联更新 | +| 拒绝 + 原因 | CR 
→ `developing`;原因记录在事件表中(scope / quality / design) | +| 具体反馈 | Claude 修改代码 → 重新运行受影响的检查 → 更新摘要 | +| 探索性问题 | 暂停审批,进入探索模式(CR 保持 `in_review`),做出决定后恢复 | + +## 核心特性 + +### 反表演性意见处理 + +收到审查反馈时,Claude 遵循:**理解**真实意图(澄清歧义)→ **评估**与 CR 范围的对齐度 → **执行 + 验证**。禁止行为:回复"您说得对!"、接受违反 YAGNI 原则的建议、在理解意图之前修改代码。 + +### 独立验证原则 + +Gate 2 从零开始采集所有证据。不信任 Gate 1 的快照——上下文窗口会漂移,且代码可能在 `verifying` 阶段发生变更。这是一条硬性要求。 + +### 增量重审(Delta Review) + +当 CR 存在近期的 reject-fix 历史时,`/pace-review` 仅针对被拒绝时涉及的维度进行检查,而非重新运行完整审查。未变更的验收标准标记为"与上一轮相同"。用户可通过"完整审查"或"重新审查全部"强制执行完整审查。 + +### 简化审批集成 + +满足所有快速通道条件的简单 CR(S 复杂度、Gate 1/2 一次通过、0% 偏移)由 `/pace-dev` 内联审批处理。当用户在简化审批中选择"让我看看"时,`/pace-review` 以 S 级微型摘要激活——对于已通过验证的 CR 无需完整审查。 + +### 结构化拒绝记录 + +当人类拒绝 CR 时,Claude 提示输入分类原因(最多一次追问)。类别:**scope**(验收差距或范围蔓延)、**quality**(代码或测试覆盖率)、**design**(架构或 API)。记录存储在 CR 事件表中,供 `/pace-learn` 提取模式。 + +## 使用场景 + +### 场景 1:标准审查流程 + +``` +用户: /pace-review +Claude:找到 CR-007(in_review) + + ## 为支付网关添加重试逻辑 + **变更内容**:src/payment/ 下 3 个文件,新增 RetryPolicy 类 + **意图匹配**: + ✅ AC-1:指数退避 — 满足 + ✅ AC-2:最多重试 3 次 — 满足 + **对抗审查**(2 条发现): + 🟡 maxDelay 缺少上限校验 + 🟢 建议添加重试指标日志 + **质量**:✅ lint、types、tests | ⏳ 等待人类审批 + +用户: approved +Claude:✅ CR-007 → approved → 已合并到 main +``` + +### 场景 2:审查中的反馈修改 + +``` +用户: /pace-review CR-012 +Claude:[生成摘要] + +用户: parser.ts 的错误处理太宽泛了——请使用具体的 catch 块。 + +Claude:理解:按错误类型使用细粒度 catch 块。与 AC-2 对齐。 + 修改 parser.ts ... 重新运行检查 ... ✅ 全部通过。 + 更新摘要:✅ 所有标准满足 | ⏳ 等待人类审批 + +用户: lgtm +Claude:✅ CR-012 → approved → 已合并到 main +``` + +### 场景 3:拒绝后的增量重审 + +``` +用户: 错误处理太宽泛了——请使用具体的 catch 块。 +Claude:理解。CR-012 → developing。原因:quality: 宽泛的错误处理。 + + ...(修复完成,Gate 1/2 重新运行)... 
+ +用户: /pace-review +Claude:找到 CR-012(in_review)—— 检测到之前的拒绝记录,执行增量重审。 + + ## 改进 parser 的错误处理(Delta Review · 第 2 轮) + **上次拒绝原因**:quality: 宽泛的错误处理 + **修复状态**: + ✅ 针对 ParseError、ValidationError、IOError 使用具体 catch 块 + **未变更**:AC-1(通过)、AC-3(通过)—— 未重新检查 + **质量**:✅ 全部通过 | ⏳ 等待人类审批 + +用户: lgtm +Claude:✅ CR-012 → approved → 已合并到 main +``` + +## 与其他 Skill 的集成 + +| Skill | 关系 | +|-------|------| +| `/pace-dev` | Gate 1 通过后将 CR 转换为 `in_review` 状态,移交给 `/pace-review` | +| `/pace-test` | 提供 `accept` 验证证据,在审查摘要中展示 | +| `/pace-change` | CR 状态转换(拒绝 → `developing`)遵循状态机 | +| `/pace-learn` | 结构化拒绝记录为模式提取提供数据,用于未来改进 | + +## 相关资源 + +- [SKILL.md](../../skills/pace-review/SKILL.md) -- Skill 入口点和触发描述 +- [review-procedures-common.md](../../skills/pace-review/review-procedures-common.md) -- 通用审查规则(始终加载) +- [review-procedures-gate.md](../../skills/pace-review/review-procedures-gate.md) -- M+ 审查流水线(意图、对抗、diff) +- [review-procedures-delta.md](../../skills/pace-review/review-procedures-delta.md) -- 增量审查流程 +- [review-procedures-feedback.md](../../skills/pace-review/review-procedures-feedback.md) -- 决策后处理 +- [设计文档](../design/design.md) -- 质量门禁定义和 CR 状态机 +- [devpace-rules.md](../../rules/devpace-rules.md) -- 运行时行为规则 diff --git a/docs/features/pace-test.md b/docs/features/pace-test.md index 0edd155..01e4e2e 100644 --- a/docs/features/pace-test.md +++ b/docs/features/pace-test.md @@ -184,7 +184,7 @@ AI-powered acceptance verification against PF criteria. **Syntax**: `/pace-test accept [CR-ID]` -The core differentiator of devpace testing. For each PF acceptance criterion, Claude selects a verification level -- L1 dynamic (execute tests/CLI), L2 static semantic (read code with line references), L3 manual (generate human checklists) -- and produces per-criterion evidence. Also performs a Test Oracle Check: reviews whether existing tests actually verify what they claim, downgrading weak or false coverage in `test-strategy.md`. 
See [verify-procedures.md](../../skills/pace-test/verify-procedures.md) for steps. +The core differentiator of devpace testing. For each PF acceptance criterion, Claude selects a verification level -- L1 dynamic (execute tests/CLI), L2 static semantic (read code with line references), L3 manual (generate human checklists) -- and produces per-criterion evidence. Also performs a Test Oracle Check: reviews whether existing tests actually verify what they claim, downgrading weak or false coverage in `test-strategy.md`. See [test-procedures-verify.md](../../skills/pace-test/test-procedures-verify.md) for steps. **Output example**: ``` @@ -255,6 +255,6 @@ Traditional tools measure **code coverage** ("what percentage of lines are execu - [test-procedures-coverage.md](../../skills/pace-test/test-procedures-coverage.md) -- Coverage - [test-procedures-impact.md](../../skills/pace-test/test-procedures-impact.md) -- Impact - [test-procedures-report.md](../../skills/pace-test/test-procedures-report.md) -- Reports -- [verify-procedures.md](../../skills/pace-test/verify-procedures.md) -- Acceptance verification +- [test-procedures-verify.md](../../skills/pace-test/test-procedures-verify.md) -- Acceptance verification - [test-procedures-advanced.md](../../skills/pace-test/test-procedures-advanced.md) -- Flaky, dryrun, baseline - [devpace-rules.md](../../rules/devpace-rules.md) -- Runtime behavior rules diff --git a/docs/features/pace-test_zh.md b/docs/features/pace-test_zh.md new file mode 100644 index 0000000..a874109 --- /dev/null +++ b/docs/features/pace-test_zh.md @@ -0,0 +1,261 @@ +# 测试策略与质量管理(`/pace-test`) + +devpace 的测试远不止"跑测试、看结果"。`/pace-test` 管理的是一条**需求驱动**的测试生命周期:Product Feature (PF) 的验收标准定义了需要验证什么,测试策略将标准映射到测试类型,覆盖分析衡量的是需求覆盖率(而非仅仅是代码覆盖率),AI 驱动的验收验证则为每条标准提供带代码级引用的证据。最终结果是一套每一步都可追溯到业务意图的测试流程。 + +## 前置条件 + +| 条件 | 用途 | 是否必须 | +|------|------|:--------:| +| `.devpace/` 已初始化 | 项目结构、PF 定义、CR 追踪 | 是 | +| `.devpace/rules/checks.md` | 为 `run` 和 `dryrun` 配置测试命令 | 推荐 | +| 
`.devpace/rules/test-strategy.md` | PF 到测试的映射(由 `strategy` 生成) | 推荐 | + +> **优雅降级**:若未初始化 `.devpace/`,Skill 回退到纯代码模式(自动检测测试命令)。若缺少 `checks.md`,会从 `package.json`、`pyproject.toml`、`go.mod` 或 `Cargo.toml` 中自动检测常见测试命令。若缺少 `test-strategy.md`,需求级分析仍可通过直接读取 `project.md` 中的 PF 标准运作。 + +## 快速开始 + +``` +1. /pace-test strategy --> 将 PF 验收标准映射到测试类型 +2. /pace-test generate PF-001 --full --> 基于标准生成测试用例 +3. /pace-test coverage --> 分析需求覆盖缺口 +4. /pace-test --> 运行所有已配置的测试 +5. /pace-test accept --> AI 验收验证 +6. /pace-test report --> 生成可供评审的摘要报告 +``` + +日常工作流:大多数会话只需 `/pace-test`(run)+ `/pace-test accept`。迭代期间使用 `impact --run` 可快速执行受影响的测试。 + +## 命令参考 + +### 执行层(Execution Layer) + +#### `run`(默认,无参数) + +执行所有已配置的测试命令并生成结构化报告。 + +**语法**:`/pace-test` + +读取 `checks.md` 中的测试命令(回退到自动检测),按依赖顺序执行,生成按 Gate 分组的通过/失败报告。当测试失败时,分析最近的 `git diff` 推断可能的根因。配置了 `.devpace/integrations/config.md` 时会自动附加 CI 运行状态。详细步骤见 [test-procedures-core.md](../../skills/pace-test/test-procedures-core.md)。 + +**输出示例**: +``` +| # | Check | Gate | Status | Time | Notes | +|---|----------|--------|--------|------|---------------------| +| 1 | npm test | Gate 1 | PASS | 3.2s | -- | +| 2 | eslint . | Gate 1 | FAIL | 1.1s | 3 errors in auth.ts | +Summary: 1/2 passed +Suggestion: run /pace-test dryrun 1 to pre-check Gate 1 +``` + +#### `generate` + +基于 PF 验收标准创建测试用例。 + +**语法**:`/pace-test generate [PF-title] [--full]` + +默认(skeleton)模式生成带 `// TODO` 占位符的脚手架代码。`--full` 模式生成包含断言、边界条件和错误路径的完整实现(标记 `// REVIEW: AI-generated`)。自动注册到 `test-strategy.md`。在 TDD 上下文中会追加 Red-Green-Refactor 提醒。详细步骤见 [test-procedures-generate.md](../../skills/pace-test/test-procedures-generate.md)。 + +**输出示例**: +``` +Generated 4 test cases [full] for PF "User Authentication" (4 criteria): +1. Users can log in --> test_login_email_password [3 assertions + 2 boundary] +2. 
Failed login error --> test_login_failure_message [2 assertions] +File: tests/test_auth.py | Mode: full (review REVIEW markers) +``` + +#### `dryrun` + +模拟 Gate 检查,不触发状态转换。 + +**语法**:`/pace-test dryrun [1|2|4]` + +以只读模式执行完整的 Gate 检查流程(命令检查、意图检查、Gate 2 的对抗性审查)。不产生 CR 状态变更,不写入事件日志。详细步骤见 [test-procedures-advanced.md](../../skills/pace-test/test-procedures-advanced.md)。 + +**输出示例**: +``` +Gate 1 Dry-Run: 1 PASS / 1 FAIL +Prediction: Gate will FAIL +Fix: resolve eslint errors, then re-run /pace-test dryrun 1 +``` + +### 策略层(Strategy Layer) + +#### `strategy` + +基于 PF 验收标准生成系统化测试策略。 + +**语法**:`/pace-test strategy` + +针对每条验收标准,推荐一种主要测试类型(unit、integration、E2E、performance、security、accessibility、manual)和 0-2 种辅助类型。通过名称和内容分析匹配现有测试文件。输出测试金字塔健康评估和实施指引。持久化到 `.devpace/rules/test-strategy.md`。详细步骤见 [test-procedures-strategy-gen.md](../../skills/pace-test/test-procedures-strategy-gen.md)。 + +**输出示例**: +``` +Strategy: 3 PFs, 12 criteria --> 5 unit / 3 integration / 2 E2E / 1 perf [+security] / 1 manual +Covered: 7 | To build: 5 +Pyramid health: needs attention (unit 42%, below 50% threshold) +Next: /pace-test generate [PF] to create tests for uncovered criteria +``` + +#### `coverage` + +分析有多少 PF 验收标准已有对应测试。 + +**语法**:`/pace-test coverage` + +交叉比对 PF 标准与 `test-strategy.md`、`checks.md` 及扫描到的测试文件。可选地收集代码覆盖率作为补充信号(Jest、pytest、go test、cargo tarpaulin)。当 `test-strategy.md` 包含阈值配置时,检查数值是否达标。详细步骤见 [test-procedures-coverage.md](../../skills/pace-test/test-procedures-coverage.md)。 + +**输出示例**: +``` +| PF | Feature | Criteria | Covered | Rate | +|--------|----------------|----------|---------|------| +| PF-001 | Authentication | 5 | 3 | 60% | +| PF-002 | Data Import | 3 | 0 | 0% | +Requirements coverage: 3/8 (38%) +Code coverage (supplementary): 72% line, 58% branch (Jest) +``` + +#### `impact` + +分析变更影响并推荐回归测试范围。 + +**语法**:`/pace-test impact [CR-ID] [--run]` + +从 `git diff` 提取变更文件,构建文件到 PF 的反向映射,识别直接和间接受影响的 PF,并评定风险等级。使用 `--run` 时,在分析后自动执行"必须运行"的测试。详细步骤见 
[test-procedures-impact.md](../../skills/pace-test/test-procedures-impact.md)。 + +**输出示例**: +``` +CR-005 "Add CSV export" | Scope: 6 files / 2 modules | Risk: MEDIUM +| PF | Feature | Impact | Suggested Tests | +|--------|-------------|---------|------------------------| +| PF-002 | Data Import | Direct | test_import, test_csv | +| PF-003 | Reports | Indirect| Spot-check recommended | +Must-run: test_import, test_csv +``` + +### 分析层(Analysis Layer) + +#### `report` + +生成面向评审或发布的可读测试摘要。 + +**语法**:`/pace-test report [CR-ID|REL-xxx]` + +**CR 模式**(默认):聚合 Layer 1(测试执行)、Layer 2(需求覆盖)、Layer 3(AI 验收验证),生成合并/拒绝建议。**Release 模式**(`REL-xxx`):聚合发布内所有 CR,提供逐 CR 质量摘要和发布/延期建议。遵循"有什么报什么"原则——缺失的层级会注明,但不阻断报告生成。详细步骤见 [test-procedures-report.md](../../skills/pace-test/test-procedures-report.md)。 + +**输出示例**(CR 模式): +``` +CR-005 | L1: 8/8 passed | L2: 3/5 covered (60%) | L3: 4/5 passed, 1 manual +Recommendation: supplement tests before merge +``` + +#### `flaky` + +检测不稳定测试和主动维护问题。 + +**语法**:`/pace-test flaky` + +扫描 CR 事件历史中的间歇性故障、环境依赖故障、顺序依赖故障和超时波动。执行主动维护检测:空断言、时间膨胀、长期未更新的测试和被跳过的测试。持久化到 `insights.md`,并在 `test-strategy.md` 中降级不稳定测试。详细步骤见 [test-procedures-advanced.md](../../skills/pace-test/test-procedures-advanced.md)。 + +**输出示例**: +``` +Unstable: e2e-login 2/5 (40%) intermittent [CR-003, CR-005] +Maintenance: test_utils::helper (empty assert) | lint (+217% bloat) +Priority: fix empty assertions first (false security) +``` + +#### `baseline` + +建立或更新测试执行基线,用于趋势追踪。 + +**语法**:`/pace-test baseline` + +运行完整测试套件,记录通过率和执行时间,与上一次基线对比。持久化到 `.devpace/rules/test-baseline.md`。由 `/pace-retro` 消费以用于度量分析。详细步骤见 [test-procedures-advanced.md](../../skills/pace-test/test-procedures-advanced.md)。 + +**输出示例**: +``` +Baseline updated: pass rate 85%->92% (+7%) | exec 12.3s->10.1s (-2.2s) | checks 8->10 (+2) +``` + +### 验证层(Verification Layer) + +#### `accept` + +基于 PF 标准的 AI 驱动验收验证。 + +**语法**:`/pace-test accept [CR-ID]` + +这是 devpace 测试的核心差异化能力。针对每条 PF 验收标准,Claude 选择一个验证级别——L1 动态验证(执行测试/CLI)、L2 
静态语义验证(读取代码并提供行号引用)、L3 手动验证(生成人工检查清单)——并为每条标准生成证据。同时执行 Test Oracle Check:审查现有测试是否真正验证了其声称的内容,并在 `test-strategy.md` 中降级弱覆盖或虚假覆盖。详细步骤见 [test-procedures-verify.md](../../skills/pace-test/test-procedures-verify.md)。 + +**输出示例**: +``` +CR-005 "Add CSV export" (PF: Data Import) +| # | Criterion | Status | Level | Evidence | +|---|----------------------------|---------|--------|---------------------------------| +| A1| CSV parses all columns | PASS | L1 | test_csv_parser passed | +| A2| Error on malformed rows | PASS | L2 | src/parser.ts:45 validates rows | +| A3| Progress bar during upload | PARTIAL | L2 | Complex async, needs runtime | +| A4| Accessibility | MANUAL | L3 | Checklist generated | +Summary: 2 passed, 1 needs supplement, 1 needs manual check +``` + +**降级行为**:若无 PF 关联,回退到 CR 意图验收标准(在报告中注明)。若两者均缺失,退出并给出明确提示。 + +## 核心差异化:需求驱动的测试 + +传统工具衡量的是**代码覆盖率**("多少百分比的代码行被执行了")。devpace 衡量的是**需求覆盖率**("多少百分比的 PF 验收标准有对应验证")。一个项目可以拥有 95% 的代码覆盖率,却有 0% 的需求覆盖率。`/pace-test` 通过 strategy-generate-coverage-accept-report 管道弥合这一缺口——每一个测试都可追溯到一条 PF 验收标准。 + +## 使用场景 + +### 场景 1:新功能测试规划(TDD) + +``` +/pace-test strategy --> PF-003: 6 criteria, 3 unit / 2 integration / 1 E2E +/pace-test generate PF-003 --full --> 6 test cases. TDD: run to confirm Red phase. +/pace-test coverage --> PF-003 requirements: 6/6 (100%). Code: 0% (expected). +``` + +### 场景 2:合并前质量检查 + +``` +/pace-test accept CR-005 --> 4/5 passed, 1 manual. Oracle: weak coverage found. +/pace-test report CR-005 --> 3-layer report. Recommendation: supplement, then merge. +``` + +### 场景 3:发布就绪评估 + +``` +/pace-test report REL-001 --> 5 CRs, 3 PFs. Risk: LOW. Can ship. 
+``` + +## 与其他命令的集成 + +| 命令 | 集成点 | +|------|--------| +| `/pace-dev` | Gate 1/2 消费 `checks.md`(与 `run` 执行相同命令)。`accept` 报告作为 Gate 2 证据。 | +| `/pace-review` | Gate 2 消费 `accept` 验证报告作为结构化评审证据。 | +| `/pace-release` | `report REL-xxx` 生成发布级质量报告。`dryrun 4` 验证发布前检查。 | +| `/pace-retro` | `baseline` 为回顾提供度量数据。`flaky` 发现写入 `insights.md`。 | +| `/pace-change` | `impact` 使用 CR 变更范围确定回归测试建议。 | + +## 向后兼容 + +| 旧名称 | 当前名称 | 说明 | +|--------|---------|------| +| `verify` | `accept` | AI 验收验证 | +| `regress` | `impact` | 变更影响分析 | +| `gen` | `generate` | 测试用例生成 | +| `gate` | `dryrun` | Gate 模拟 | + +## 相关资源 + +- [User Guide -- /pace-test 章节](../user-guide.md) +- [SKILL.md](../../skills/pace-test/SKILL.md) -- 入口与路由表 +- [test-procedures-core.md](../../skills/pace-test/test-procedures-core.md) -- Run、CI 集成 +- [test-procedures-strategy-gen.md](../../skills/pace-test/test-procedures-strategy-gen.md) -- Strategy +- [test-procedures-coverage.md](../../skills/pace-test/test-procedures-coverage.md) -- Coverage +- [test-procedures-impact.md](../../skills/pace-test/test-procedures-impact.md) -- Impact +- [test-procedures-report.md](../../skills/pace-test/test-procedures-report.md) -- Reports +- [test-procedures-verify.md](../../skills/pace-test/test-procedures-verify.md) -- 验收验证 +- [test-procedures-advanced.md](../../skills/pace-test/test-procedures-advanced.md) -- Flaky、dryrun、baseline +- [test-procedures-generate.md](../../skills/pace-test/test-procedures-generate.md) -- Generate +- [devpace-rules.md](../../rules/devpace-rules.md) -- 运行时行为规则 diff --git a/docs/planning/progress.md b/docs/planning/progress.md index e34c479..9d937fe 100644 --- a/docs/planning/progress.md +++ b/docs/planning/progress.md @@ -184,6 +184,7 @@ | T129 | pace-retro forecast 子命令 | M23.1 | OBJ-1, D1 | ✅ 完成 | retro-procedures-forecast.md(交付概率算法+瓶颈识别+风险预警)+ SKILL.md 更新 | | T130 | 安全维度深化 + Compact 恢复优化 | M23.2-M23.3 | OBJ-2, OBJ-3, G5, UX4 | ✅ 完成 | guard-procedures-scan.md 安全深度检查(Layer 1 关键词 + Layer 2 OWASP 6 类)+ 
pre-compact.sh 结构化恢复上下文 | | T131 | 语义漂移检测增强 | M23.4 | OBJ-3, D2 | ✅ 完成 | dev-procedures-developing.md 语义漂移检测(持续验收对齐)+ review-procedures-gate.md 语义一致性评分(🟢/🟡/🔴) | +| T132 | Agent 驱动行为验证:pace-test L1+ 浏览器验收与 Gate 流程打通 | -- | OBJ-3 | 待做 | 来源:Harness Engineering 调研 P2 #8。已有 L1+ procedures 设计(test-procedures-verify.md:105-140),需打通 Gate 流程:1) init-checks 前端项目 Playwright 建议 2) verify Step 5 证据标记 browser-verified 3) gate.md Gate 2 引用浏览器证据。详见 docs/research/harness-engineering-practices-2026-03-14.md §六 | | T114 | A4:6 个核心 Skill 特性文档 | -- | OBJ-9, OBJ-10 | ✅ 完成 | pace-dev(177 行) + pace-status(221 行) + pace-change(214 行) + pace-review(158 行) + pace-test(260 行) + pace-release(264 行)。共 1294 行。224 pytest + 83 markdownlint + 层隔离 + plugin 加载全通过 | | | **pace-plan UX 优化与功能增强** | | | | | | T115 | P0 组:空树引导 + 智能建议 | -- | OBJ-1, OBJ-8, S15, F3.5 | ✅ 完成 | E1 空功能树引导式规划(Step 3.1 降级分支)+ E2 Plan Proposal 智能建议(Step 3.6 改造为建议+确认模式) | diff --git a/docs/plans/p0-1-init-hook.md b/docs/plans/p0-1-init-hook.md deleted file mode 100644 index 06c160f..0000000 --- a/docs/plans/p0-1-init-hook.md +++ /dev/null @@ -1,160 +0,0 @@ -# P0-1: pace-init 写入守卫 prompt Hook → command Hook - -## 概述 - -将 `skills/pace-init/SKILL.md` 中的 `type: prompt` Hook 替换为 `type: command` Hook,从 LLM 路径判断(~10s/次)降级为 Node.js 程序化路径匹配(~5ms/次)。 - -## 现状分析 - -### 当前 Hook(SKILL.md:7-14) - -```yaml -hooks: - PreToolUse: - - matcher: - tool_name: "Write|Edit" - hooks: - - type: prompt - prompt: "/pace-init 写入范围守卫:仅允许写入 .devpace/ 目录下的文件、项目根目录的 CLAUDE.md、项目根目录的 .gitignore。如果写入目标不在这三个范围内,必须阻止。" - timeout: 10 -``` - -### 问题 - -- pace-init 执行过程中会触发 20+ 次 Write/Edit(创建 state.md、project.md、模板文件等) -- 每次触发 ~10s LLM 评估 → 总开销 ~200s 纯等待 -- 逻辑本质是**纯路径前缀匹配**(3 个允许范围),不需要 LLM 语义理解 - -### 允许写入的范围(精确定义) - -1. `.devpace/` 目录下的任何文件(项目根目录下的 `.devpace/`) -2. 项目根目录的 `CLAUDE.md` -3. 
项目根目录的 `.gitignore`
-
-## 方案设计
-
-### 新增文件
-
-**`hooks/pace-init-scope-check.mjs`**(~30 行)
-
-```javascript
-#!/usr/bin/env node
-/**
- * pace-init scope check — fast command Hook replacing LLM prompt Hook
- *
- * Checks: Is the target file within pace-init's allowed write scope?
- *   1. .devpace/ directory (any file)
- *   2. Project root CLAUDE.md
- *   3. Project root .gitignore
- *
- * Exit codes:
- *   0 = allow (in scope)
- *   2 = block (out of scope)
- */
-
-import path from 'node:path';
-import { readStdinJson, getProjectDir, extractFilePath } from './lib/utils.mjs';
-
-const input = await readStdinJson();
-const projectDir = path.resolve(getProjectDir());
-const filePath = extractFilePath(input);
-
-if (!filePath) {
-  process.exit(0); // No file path — not our concern
-}
-
-// Normalize: resolve relative to project dir (this also collapses ../ segments)
-const normalizedPath = path.resolve(projectDir, filePath);
-
-// Check 1: .devpace/ directory directly under the project root
-if (normalizedPath.startsWith(`${projectDir}${path.sep}.devpace${path.sep}`)) {
-  process.exit(0);
-}
-
-// Check 2: Project root CLAUDE.md
-// Exact match only, so subdirectory copies such as .claude/CLAUDE.md stay blocked
-if (normalizedPath === path.join(projectDir, 'CLAUDE.md')) {
-  process.exit(0);
-}
-
-// Check 3: Project root .gitignore (exact match only)
-if (normalizedPath === path.join(projectDir, '.gitignore')) {
-  process.exit(0);
-}
-
-// Out of scope — block
-console.error(
-  `devpace:blocked /pace-init 写入范围守卫:仅允许写入 .devpace/、CLAUDE.md、.gitignore。目标文件 ${filePath} 不在允许范围内。`
-);
-process.exit(2);
-```
-
-### 修改文件
-
-**`skills/pace-init/SKILL.md`** frontmatter hooks 段
-
-```yaml
-# 替换前(prompt Hook)
-hooks:
-  PreToolUse:
-    - matcher:
-        tool_name: "Write|Edit"
-      hooks:
-        - type: 
prompt - prompt: "/pace-init 写入范围守卫:..." - timeout: 10 - -# 替换后(command Hook) -hooks: - PreToolUse: - - matcher: - tool_name: "Write|Edit" - hooks: - - type: command - command: "${CLAUDE_PLUGIN_ROOT}/hooks/pace-init-scope-check.mjs" - timeout: 5 -``` - -## 设计决策 - -| 决策 | 选择 | 理由 | -|------|------|------| -| Hook 类型 | command(替代 prompt) | 逻辑是纯路径匹配,无需 LLM 语义 | -| 脚本语言 | Node.js ESM | 与现有 Hook 基础设施一致(pace-dev-scope-check.mjs) | -| 共享库 | 复用 hooks/lib/utils.mjs | readStdinJson、getProjectDir、extractFilePath 已有 | -| 路径匹配策略 | 字符串 includes/endsWith | 简单可靠,无需正则 | -| CLAUDE.md 范围 | 仅项目根目录 | 避免写入 `.claude/CLAUDE.md` 等非目标文件 | -| 超时 | 5s(从 10s 降低) | command Hook 实际执行 <50ms,5s 是安全余量 | - -## 预期效果 - -| 指标 | 改进前 | 改进后 | 提升 | -|------|--------|--------|------| -| 单次 Hook 延迟 | ~10s(LLM) | ~5ms(Node.js) | **2000x** | -| pace-init 全流程 Hook 总开销 | ~200s(20 次 Write) | ~100ms | **2000x** | -| 判断可靠性 | LLM 可能误判 | 确定性匹配 | 100% 可靠 | - -## 改动文件清单 - -| 文件 | 操作 | 说明 | -|------|------|------| -| `hooks/pace-init-scope-check.mjs` | **新增** | command Hook 实现(~50 行) | -| `skills/pace-init/SKILL.md` | **修改** | frontmatter hooks 段替换 prompt → command | - -## 验证方案 - -1. `node hooks/pace-init-scope-check.mjs` 用模拟 stdin JSON 测试路径匹配 -2. `claude --plugin-dir ./` 加载后执行 `/pace-init --dry-run` 确认 Hook 不阻断正常写入 -3. 
手动构造越界写入路径,确认 exit 2 阻断 - -## 风险评估 - -- **风险低**:逻辑极其简单(3 个路径条件),且有现有模板(pace-dev-scope-check.mjs) -- **回退容易**:恢复 SKILL.md frontmatter 为 prompt Hook 即可 diff --git a/docs/plans/p0-2-review-hook.md b/docs/plans/p0-2-review-hook.md deleted file mode 100644 index 004a97d..0000000 --- a/docs/plans/p0-2-review-hook.md +++ /dev/null @@ -1,165 +0,0 @@ -# P0-2: pace-review Gate 3 守卫 prompt Hook 优化 - -## 概述 - -优化 `skills/pace-review/SKILL.md` 的 `type: prompt` Hook。经查证 Hook 执行机制(所有 Hook 并行执行,"最严格者胜"),确认原 prompt Hook 在 approved 场景下是死代码。采用方案 B:替换为 command Hook 做快速路径放行,Gate 3 由全局 Hook 统一管理。 - -**查证结论**(2026-03-09):全局 Hook 与 Skill 级 Hook 并行执行,任一 deny → 最终 deny,无 override 机制。方案 A 不可行,采用方案 B。 - -## 现状分析 - -### 当前 Skill 级 Hook(SKILL.md:6-13) - -```yaml -hooks: - PreToolUse: - - matcher: - tool_name: "Write|Edit" - hooks: - - type: prompt - prompt: "You are a devpace review gate. During /pace-review, only the following writes are allowed: 1) Updating CR status from in_review to approved (ONLY after explicit human approval in the conversation) 2) Updating CR event table with review notes 3) Recording review rejection and returning to developing. BLOCK any write that changes CR status to approved without clear human approval text in the conversation." - timeout: 15 -``` - -### 全局 Hook 对比(hooks/pre-tool-use.mjs:45-57) - -全局 PreToolUse Hook **已有** Gate 3 command 守卫: -- 检测 `in_review` 状态的 CR 文件 -- 若新内容包含 `approved` 状态 → **无条件阻断**(exit 2) -- 没有"人类已批准则放行"的语义 - -### 两层 Hook 的角色差异 - -| Hook | 类型 | 行为 | 增量价值 | -|------|------|------|---------| -| 全局 pre-tool-use.mjs | command | 无条件阻断所有 `approved` 变更 | 防御基线(始终生效) | -| Skill 级 pace-review | prompt | 有条件阻断(无人类批准时阻断) | 允许"有人类批准的 approved"(仅 pace-review 激活时) | - -**关键洞察**:全局 Hook 已经覆盖了 Gate 3 的核心防护(无条件阻断 approved)。Skill 级 prompt Hook 的增量价值是"在 pace-review 期间,允许有人类明确批准的 approved 变更"。但当前实现中,**全局 Hook 先执行**(exit 2 无条件阻断),Skill 级 prompt Hook **可能根本无法生效**。 - -### 问题 - -1. **性能**:每次 Write/Edit ~15s LLM 评估,pace-review 过程中约 5-10 次写入 → ~75-150s 开销 -2. 
**冗余**:绝大多数写入(更新 review notes、event table)不涉及 approved 状态变更 -3. **全局冲突**:全局 command Hook 可能先行阻断 approved,导致 Skill 级 prompt Hook 永远无法执行有条件放行 - -## 方案设计 - -### 方案 A:分两层(command 快速路径 + prompt 降级)— 推荐 - -**新增文件**:`hooks/pace-review-scope-check.mjs` - -```javascript -#!/usr/bin/env node -/** - * pace-review scope check — fast path for non-approved writes - * - * Quick checks: - * 1. Not a CR file → allow (exit 0) - * 2. CR file but no approved state change → allow (exit 0) - * 3. CR file with approved state change → needs semantic check - * → output advisory and exit 0 (let prompt Hook handle) - * - * Exit codes: - * 0 = allow (fast path) or needs-semantic-check (advisory) - * 2 = block (never — blocking delegated to prompt Hook) - */ - -import { - readStdinJson, getProjectDir, extractFilePath, - extractWriteContent, isCrFile, isStateChangeToApproved -} from './lib/utils.mjs'; - -const input = await readStdinJson(); -const projectDir = getProjectDir(); -const filePath = extractFilePath(input); -const backlogDir = `${projectDir}/.devpace/backlog`; - -// Fast path 1: not a CR file → no Gate 3 concern -if (!isCrFile(filePath, backlogDir)) { - process.exit(0); -} - -// Fast path 2: CR file but no approved state change -const newContent = extractWriteContent(input); -if (!isStateChangeToApproved(newContent)) { - process.exit(0); -} - -// Slow path: approved state change detected → prompt Hook will evaluate -console.log('devpace:review-gate approved 状态变更检测到,需要语义验证人类审批。'); -process.exit(0); -``` - -**修改 SKILL.md frontmatter**: - -```yaml -hooks: - PreToolUse: - - matcher: - tool_name: "Write|Edit" - hooks: - - type: command - command: "${CLAUDE_PLUGIN_ROOT}/hooks/pace-review-scope-check.mjs" - timeout: 5 - - type: prompt - prompt: "You are a devpace review gate. ONLY evaluate if the previous hook output contains 'approved 状态变更检测到'. If it does: check if there is explicit human approval in the conversation. If human approved → allow. If no human approval → block. 
If the previous hook did NOT output this message, always allow." - timeout: 15 -``` - -**问题**:Skill 级 Hook 的多条 hooks 是串行执行还是并行?需要验证 prompt Hook 能否读取前一个 command Hook 的 stdout 输出。如果不能,此方案需要调整。 - -### 方案 B:简化为纯 command Hook + 移除冗余 — 备选 - -**分析**:全局 `pre-tool-use.mjs` 已无条件阻断所有 `approved` 变更。pace-review 的增量语义("有人类批准则放行")在当前架构下**可能无法生效**(全局 Hook 先执行并阻断)。 - -**如果确认全局 Hook 优先级高于 Skill 级 Hook**: -- Skill 级 prompt Hook 事实上是死代码(永远不会被执行到 approved 场景) -- 可以安全移除,或替换为纯 command Hook 做范围检查(如方案 A 的 fast path 部分) - -**修改 SKILL.md frontmatter**: - -```yaml -hooks: - PreToolUse: - - matcher: - tool_name: "Write|Edit" - hooks: - - type: command - command: "${CLAUDE_PLUGIN_ROOT}/hooks/pace-review-scope-check.mjs" - timeout: 5 -``` - -此方案中 `pace-review-scope-check.mjs` 只做 fast path 放行(非 CR 或非 approved → exit 0),approved 场景由全局 Hook 阻断。 - -## 实施前需要验证的问题 - -1. **Hook 执行顺序**:全局 hooks.json 的 PreToolUse Hook 与 Skill 级 hooks 的执行顺序是什么? - - 如果全局先执行 → 全局阻断 approved → Skill 级 prompt Hook 永远不会遇到 approved 场景 - - 如果 Skill 级先执行 → Skill 级 prompt Hook 可以做"有条件放行" - -2. **Skill 级多 Hook 交互**:同一 matcher 下多条 hooks 是否串行执行?前一条的 stdout 是否对后一条可见? - -3. **全局 Hook 覆盖/豁免**:Skill 级 Hook 能否覆盖全局 Hook 的行为?或者 Skill 级 Hook 能否为特定 Skill 豁免全局 Hook? 
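
作为补充,下面给出一个最小示意,用来说明概述中查证结论所描述的语义:全局 Hook 与 Skill 级 Hook 并行执行,任一 deny 即最终 deny。这只是在上述假设下的草图,并非 Claude Code 的实际实现;`combineHookDecisions` 以及结果对象的 `source`、`decision` 字段都是为示例而设的假设命名。

```javascript
// Hypothetical sketch of "most restrictive wins" for hooks that run in parallel.
// combineHookDecisions and the { source, decision } shape are illustrative
// assumptions, not part of any real Claude Code API.
function combineHookDecisions(results) {
  // A single deny from any hook decides the outcome; there is no override.
  const blocker = results.find((r) => r.decision === 'deny');
  return blocker
    ? { decision: 'deny', decidedBy: blocker.source }
    : { decision: 'allow', decidedBy: null };
}

// Approved-state write during /pace-review: the global command Hook denies
// unconditionally, so the Skill-level "human approved" allow has no effect.
const approvedWrite = [
  { source: 'global pre-tool-use.mjs', decision: 'deny' },
  { source: 'skill-level prompt hook', decision: 'allow' },
];

// Ordinary review-notes write: nothing denies, so the write goes through.
const notesWrite = [
  { source: 'global pre-tool-use.mjs', decision: 'allow' },
  { source: 'skill-level command hook', decision: 'allow' },
];

console.log(combineHookDecisions(approvedWrite));
console.log(combineHookDecisions(notesWrite));
```

在这种语义下,Skill 级 Hook 只能收紧而不能放宽全局 Hook 的结论,这也是方案 A 的"有条件放行"无法成立、转而采用方案 B 的原因。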
- -> **建议**:在实施前通过 `claude-code-guide` agent 查证以上 3 个问题,再决定采用方案 A 还是方案 B。 - -## 预期效果 - -| 场景 | 改进前 | 改进后(方案 A/B) | -|------|--------|-------------------| -| 非 CR 文件写入(~80% 操作) | ~15s LLM | ~5ms command | -| CR 文件非 approved 写入(~18%) | ~15s LLM | ~5ms command | -| CR 文件 approved 变更(~2%) | ~15s LLM | ~15s prompt(方案 A)/ ~5ms command + 全局阻断(方案 B) | - -## 改动文件清单 - -| 文件 | 操作 | 说明 | -|------|------|------| -| `hooks/pace-review-scope-check.mjs` | **新增** | command Hook 快速路径(~30 行) | -| `skills/pace-review/SKILL.md` | **修改** | frontmatter hooks 段替换/优化 | - -## 风险评估 - -- **风险中**:需要先验证 Hook 执行顺序(全局 vs Skill 级),否则可能降低 Gate 3 防护强度 -- **回退容易**:恢复 SKILL.md frontmatter 为原始 prompt Hook 即可 diff --git a/docs/plans/p0-3-release-scripts.md b/docs/plans/p0-3-release-scripts.md deleted file mode 100644 index 47a4dc1..0000000 --- a/docs/plans/p0-3-release-scripts.md +++ /dev/null @@ -1,170 +0,0 @@ -# P0-3: pace-release 版本号操作脚本增强 - -## 概述 - -增强 pace-release 的版本推断和 Changelog 生成能力,将确定性计算逻辑从 LLM 推理迁移到专用脚本,提高可靠性。 - -## 现状分析 - -### 已有脚本 - -| 脚本 | 用途 | 当前消费者 | -|------|------|-----------| -| `scripts/bump-version.sh` | devpace 自身的版本号更新(4 文件 + CHANGELOG 占位) | devpace 开发者(手动) | -| `scripts/extract-changelog.py` | 从 git log 提取 CHANGELOG | devpace 开发者(手动) | - -### pace-release 当前实现 - -版本推断(`release-procedures-version.md` + `release-procedures-common.md`): -1. LLM 读取 `integrations/config.md` 获取版本文件路径 -2. LLM 扫描候选 CR 标题/描述,检测 breaking change 信号 -3. LLM 应用推断规则:breaking→major、feature→minor、defect-only→patch -4. LLM 展示推断依据,请用户确认 - -Changelog 生成(`release-procedures-changelog.md`): -- LLM 从 `.devpace/backlog/` CR 文件提取元数据 -- LLM 按 Epic→BR→PF 组织内容 -- LLM 写入 CHANGELOG.md - -### 问题 - -1. **版本推断不可靠**:breaking change 关键词检测、版本号递增计算是确定性逻辑,LLM 可能遗漏或计算错误 -2. **重复造轮子**:`bump-version.sh` 已实现版本号更新但仅服务 devpace 自身,未被 pace-release 使用 -3. 
**Changelog 效率低**:LLM 逐个读取 CR 文件、提取字段、格式化输出,大量 token 消耗在结构化数据处理上 - -## 方案设计 - -### 新增脚本 1:版本推断 - -**文件**:`scripts/infer-version-bump.mjs` - -```javascript -#!/usr/bin/env node -/** - * Infer semantic version bump from CR metadata. - * - * Usage: node scripts/infer-version-bump.mjs [current-version] - * - * Reads merged CRs not yet in a Release, analyzes for breaking/feature/defect, - * outputs JSON: { current, suggested, bump_type, reasoning[] } - */ -``` - -**核心逻辑**: - -1. 扫描 `.devpace/backlog/CR-*.md`,过滤 `状态: merged` 且未关联 Release 的 CR -2. 对每个候选 CR 提取: - - 标题 - - 描述/意图 - - 类型(feature/defect/hotfix) - - breaking 标记(关键词检测:`BREAKING`、`breaking change`、`不兼容`、`破坏性变更`、`breaking: true`) -3. 推断规则应用: - - 任一 CR 有 breaking → major - - 任一 CR 类型为 feature(无 breaking)→ minor - - 全部为 defect/hotfix → patch -4. 读取当前版本号(从 `integrations/config.md` 指定的版本文件,或命令行参数) -5. 计算新版本号(确定性递增) -6. 输出 JSON: - -```json -{ - "current": "1.2.0", - "suggested": "1.3.0", - "bump_type": "minor", - "reasoning": [ - "CR-012 (feature): 新增用户搜索功能 → minor", - "CR-015 (defect): 修复登录超时 → patch", - "最高级别:minor → 1.2.0 → 1.3.0" - ], - "candidates": [ - { "id": "CR-012", "title": "用户搜索", "type": "feature", "breaking": false }, - { "id": "CR-015", "title": "登录超时修复", "type": "defect", "breaking": false } - ] -} -``` - -**Skill 集成**:pace-release 的 `version` 子命令在 procedures 中调用 `Bash` 工具执行此脚本,消费 JSON 输出做格式化展示和用户确认。LLM 不再手动扫描和计算,只做展示和交互。 - -### 新增脚本 2:CR 元数据提取 - -**文件**:`scripts/extract-cr-metadata.mjs` - -```javascript -#!/usr/bin/env node -/** - * Extract structured metadata from CR markdown files. - * - * Usage: node scripts/extract-cr-metadata.mjs [--status <状态>] [--release ] - * - * Outputs JSON array of CR metadata objects. - */ -``` - -**核心逻辑**: - -1. 扫描 `.devpace/backlog/CR-*.md` -2. 解析每个 CR 文件,提取结构化字段: - - 编号、标题、状态、类型 - - 关联 PF、BR、Epic - - 事件表(创建/完成日期) - - breaking 标记 -3. 支持过滤:`--status merged`、`--release REL-001` -4. 
输出 JSON 数组 - -**复用价值**:此脚本不仅服务 pace-release,也可被 pace-retro(指标计算)、pace-status(概览)、pace-pulse(信号采集)复用。是 P1-2(信号采集引擎)和 P2-1(度量计算引擎)的基础组件。 - -### Skill 集成方式 - -pace-release procedures 文件中增加脚本调用指导: - -```markdown -## 版本推断(使用脚本) - -1. 执行版本推断脚本: - ``` - Bash: node ${CLAUDE_PLUGIN_ROOT}/scripts/infer-version-bump.mjs .devpace - ``` -2. 解析 JSON 输出,向用户展示推断结果 -3. 等待用户确认或自定义版本号 -4. 确认后使用 Edit 工具更新版本文件 -``` - -## 设计决策 - -| 决策 | 选择 | 理由 | -|------|------|------| -| 脚本语言 | Node.js ESM | 与 hooks/ 基础设施一致,无额外依赖 | -| 输出格式 | JSON | 结构化数据便于 LLM 消费和格式化 | -| CR 解析 | 正则 + 行匹配 | CR 格式由 `_schema/cr-format.md` 契约保证,结构稳定 | -| 集成方式 | Skill procedures 指导 Bash 调用 | 最小侵入,保持 Skill 的 Markdown 指导模式 | -| 脚本位置 | `scripts/`(产品层) | 随 Plugin 分发,用户项目可用 | - -## 改动文件清单 - -| 文件 | 操作 | 说明 | -|------|------|------| -| `scripts/infer-version-bump.mjs` | **新增** | 版本推断脚本(~120 行) | -| `scripts/extract-cr-metadata.mjs` | **新增** | CR 元数据提取脚本(~150 行),复用基础 | -| `skills/pace-release/release-procedures-version.md` | **修改** | 增加脚本调用指导 | -| `skills/pace-release/release-procedures-common.md` | **无变更** | SSOT 推断规则保留(脚本实现与之对齐) | - -## 预期效果 - -| 指标 | 改进前 | 改进后 | -|------|--------|--------| -| 版本推断可靠性 | LLM 推理(可能遗漏 breaking 信号) | 确定性正则匹配 | -| 版本号计算准确性 | LLM 手动递增(可能算错) | 程序化递增(100% 正确) | -| Token 消耗 | LLM 逐个读取 CR + 推理 | 脚本一次输出 JSON,LLM 仅做展示 | -| 复用性 | 无 | extract-cr-metadata.mjs 可被 3+ Skill 复用 | - -## 风险评估 - -- **风险低**:不改变 pace-release 的用户交互流程,仅将"LLM 手动计算"替换为"脚本计算 + LLM 展示" -- **回退容易**:移除 procedures 中的脚本调用指导即可回退到纯 LLM 模式 -- **注意**:CR 文件格式变更(_schema/cr-format.md 更新)时需同步更新脚本的正则解析 - -## 验证方案 - -1. 创建测试用 CR 文件(feature/defect/breaking 各一),验证推断输出 -2. 在实际 `.devpace/` 项目中执行脚本,与 LLM 推断结果对比 -3. 边界测试:空 backlog、全部 defect、多个 breaking、无版本文件配置 diff --git a/docs/plans/promotion-tracker.md b/docs/plans/promotion-tracker.md deleted file mode 100644 index b60b2f6..0000000 --- a/docs/plans/promotion-tracker.md +++ /dev/null @@ -1,152 +0,0 @@ -# devpace Promotion Tracker - -> Operational tracking for the growth action plan. 
Updated as items complete. - -## 90-Day Targets - -| Metric | 30 days | 60 days | 90 days | Current | -|--------|---------|---------|---------|---------| -| GitHub Stars | 20-50 | 80-150 | 150-300 | 0 | -| Marketplace installs | 10-30 | 50-100 | 100-200 | N/A | -| Blog posts published | 1 | 2 | 3 | 0 | -| User feedback received | 3-5 | 10+ | 15+ | 0 | - ---- - -## Wave 1: Immediate Actions (Week 1, ~4h) - -### 1.1 Aggregator Platform Registration -- [ ] claudemarketplaces.com: Submit GitHub URL → [submission link] -- [ ] awesome-agent-skills: commit ready (`cb74f49` in `/tmp/awesome-agent-skills`), push blocked by Code Defender → needs personal device push + PR creation -- [ ] awesome-claude-code: PR submitted → [PR link] -- **Reference**: `docs/plans/aggregator-submissions.md` ← ready-to-paste content prepared - -### 1.2 Social Preview Version Update -- [x] `.github/social-preview.svg` updated v1.4.0 → v1.5.0 -- [ ] `.github/social-preview.png` regenerated from SVG - - **Manual step**: Open SVG in browser → screenshot at 1280x640, or: - - `brew install cairo && python3 -c "import cairosvg; cairosvg.svg2png(url='.github/social-preview.svg', write_to='.github/social-preview.png', output_width=1280, output_height=640)"` - -### 1.3 GitHub Repository Optimization -- [ ] Verify About description: "Give your Claude Code projects a steady development pace — requirements change, rhythm stays." 
-- [ ] Verify 8 GitHub Topics are active: `claude-code`, `plugin`, `devops`, `bizdevops`, `ai-development`, `project-management`, `quality-gates`, `change-management` -- [ ] Enable GitHub Discussions (Settings → Features → Discussions) -- [x] Discussion templates created (`.github/DISCUSSION_TEMPLATE/share-your-experience.yml`, `show-and-tell.yml`) -- [x] Feedback issue template created (`.github/ISSUE_TEMPLATE/feedback.md`) -- [x] Issue config links to Discussions and docs (`.github/ISSUE_TEMPLATE/config.yml`) - ---- - -## Wave 2: Core Preparation (Weeks 2-3, ~8-12h) - -### 2.1 Terminal GIF Demos -- [x] VHS tape scripts prepared (`scripts/record-demos/gif-1|2|3-*.tape`) -- [x] Recording guide written (`scripts/record-demos/README.md`) -- [ ] Prepare clean demo project environment -- [ ] Record GIF-1: /pace-init (15-20s) -- [ ] Record GIF-2: Natural language dev (20-30s) -- [ ] Record GIF-3: Cross-session restore (15-20s) -- [ ] Embed GIFs in README.md hero section - -### 2.2 Marketplace Description -- [x] marketplace.json description optimized (user-pain-point-driven) - -### 2.3 Marketplace Submission -- [x] Confirmed: **无官方提交流程**。Anthropic 官方 Marketplace 由 Anthropic 自行维护,无公开 submission form。分发方式是自建 marketplace repo(`/plugin marketplace add arch-team/devpace-marketplace`),已配置在 README Installation 中 -- [ ] 可选:通过 Anthropic Community Forum 或 GitHub Issues 请求收录到官方 Marketplace -- [ ] Prepare submission materials (README + GIFs) -- [ ] Track review status → [submission link] - -### 🔖 暂停点 (2026-02-26) - -**已完成**: -- Wave 1 awesome-agent-skills: README 编辑 + commit 完成(`cb74f49` @ `/tmp/awesome-agent-skills:add-devpace`),push 被 Code Defender 阻止 -- Wave 2.2: marketplace.json 已优化 -- Wave 2.3: 确认 Marketplace 无官方提交流程,自建 marketplace 已就位 - -**未完成(需手动)**: -- Wave 1: push awesome-agent-skills + 创建 PR(需个人设备或申请 git-defender 白名单) -- Wave 2.1: 安装 VHS(`brew install charmbracelet/tap/vhs`)→ 准备 demo 项目 → 录制 3 个 GIF → 嵌入 README -- Wave 3-5: 全部待启动 - -**恢复指引**:下次继续时读此文件 §暂停点,按 Wave 顺序推进 - 
---- - -## Wave 3: Content Launch (Weeks 3-4, ~6-8h) - -### 3.1 Blog Post #1: Cross-Session Context Loss -- [x] Draft written → `docs/plans/blog-cross-session-context.md` (EN + ZH, ~1,400 words each) -- [ ] English version published on: [ ] Dev.to [ ] Medium -- [ ] Chinese version published on: [ ] juejin.cn [ ] sspai.com - -### 3.2 Community Engagement -- [x] Community playbook written → `docs/plans/community-playbook.md` - - Show HN draft ready - - Reddit r/ClaudeAI templates (3 variants) - - Twitter/X content templates (5 tweets + 2 threads) - - Chinese community templates (V2EX post + Jike short posts) - - Engagement rules and weekly time budget -- [ ] Community accounts created -- [ ] First 3 community interactions completed -- Weekly target: 1-2 hours participation - -### 3.3 Blog Post #2: Meta-Narrative -- [x] Draft written → `docs/plans/blog-meta-narrative.md` (EN + ZH, ~2,900 words total) -- [ ] Published on 2+ platforms - ---- - -## Wave 4: Feedback Loop (Weeks 5-8) - -### 4.1 Feedback Collection -- [x] GitHub pinned issue created: [#3 Share your devpace experience](https://github.com/arch-team/devpace/issues/3) (pinned) -- [x] README Feedback section added with link to pinned issue (EN + ZH) -- [ ] 5+ real user feedback collected - -### 4.2 Quick Iteration -- [ ] Feedback categorized (Bug / UX / Feature) -- [ ] Top 3 issues fixed -- [ ] Fix announcements shared publicly - -### 4.3 Secondary Content -- [ ] "v1.6.0: 5 improvements from user feedback" post published - ---- - -## Wave 5: Sustained Growth (Weeks 9-12) - -### 5.1 New Example Project -- [ ] Non-Todo walkthrough (REST API or CLI tool) -- [ ] Optional: Runnable demo repository - -### 5.2 90-Day Review -- [ ] Data collected: Stars / Installs / Issues / Blog traffic -- [ ] Strategy effectiveness evaluated -- [ ] Next phase direction decided - ---- - -## Asset Inventory (2026-02-26) - -| Asset | File | Status | -|-------|------|--------| -| Blog #1 (Cross-session) | 
`docs/plans/blog-cross-session-context.md` | Draft ready (EN+ZH) | -| Blog #2 (Meta-narrative) | `docs/plans/blog-meta-narrative.md` | Draft ready (EN+ZH) | -| Community playbook | `docs/plans/community-playbook.md` | Complete | -| Aggregator submissions | `docs/plans/aggregator-submissions.md` | Copy-paste ready | -| VHS tape scripts | `scripts/record-demos/*.tape` | 3 scripts ready | -| Discussion templates | `.github/DISCUSSION_TEMPLATE/*.yml` | 2 templates | -| Feedback issue template | `.github/ISSUE_TEMPLATE/feedback.md` | Ready | -| Social preview SVG | `.github/social-preview.svg` | v1.5.0 | -| Marketplace config | `.claude-plugin/marketplace.json` | Optimized | - ---- - -## Key Principles - -1. **"Solve problems" not "promote product"** — Every piece of content starts from user pain -2. **Simplify narrative** — One-liner: "Claude Code helps you code without losing context" -3. **90-day feature freeze** — Shift energy from Phase 19 to user acquisition -4. **GIF > Text** — Visual demos are highest ROI investment -5. **English first** — Global Claude Code users are the target market diff --git a/docs/research/engineering-quality-audit-2026-03-14.md b/docs/research/engineering-quality-audit-2026-03-14.md new file mode 100644 index 0000000..a8ea4fe --- /dev/null +++ b/docs/research/engineering-quality-audit-2026-03-14.md @@ -0,0 +1,427 @@ +# devpace 工程质量深度分析报告 + +**项目**:devpace — Claude Code Plugin for BizDevOps 研发节奏管理 +**版本**:v1.6.2-beta | **分支**:feature/engineering-quality-uplift +**规模**:658 文件(产品层 ~220、开发层 ~172、根/共享 ~15、gitignored workspace ~239) +**分析日期**:2026-03-14 + +--- + +## 1. 
评分卡(Executive Summary) + +| 维度 | 评分 | 关键依据 | +|------|------|---------| +| **目录结构** | 4.5/5 | 两层架构清晰,约定优于配置,自动发现模式。瑕疵:根目录孤儿文件 `promt.md` | +| **关注点分离(分层)** | 5/5 | 产品层→开发层零引用违规,CI + 静态测试 + Makefile 三重强制 | +| **依赖管理** | 4/5 | Hooks 零 npm 依赖,inter-skill 耦合极低(仅 3 处)。但 knowledge 反向列消费者增加维护负担 | +| **可维护性** | 3.5/5 | 分拆模式优秀,但 133 procedures + 22 schema + 580 行 rules 构成高认知负荷;同步矩阵复杂 | +| **可复用性** | 3.5/5 | hooks/lib/utils.mjs、conftest.py、validate-all.sh 可提取;但整体高度领域特化 | +| **测试与质量保障** | 4/5 | 四层测试体系(静态/Hook/集成/Eval),CI 矩阵 Python 3.9+3.12。Eval 未入 CI、单 Skill 深度不均 | +| **文档** | 4.5/5 | 38 篇特性文档(中英双语)、权威文件索引、§0 速查卡片、完整 CONTRIBUTING | +| **Plugin 最佳实践对齐** | 4.5/5 | plugin.json 极简、frontmatter 合规、Hook 事件名正确、Agent 定义完整、Skill 级 Hook 运用得当 | + +**综合评分:4.1/5** — 工程成熟度很高的 Claude Code Plugin 项目,分层架构和关注点分离达到标杆水平,可维护性因功能复杂度存在结构性挑战。 + +--- + +## 2. 架构强项 + +### 2.1 教科书级分层架构(产品层 vs 开发层) + +**事实**:`grep -r "docs/\|\.claude/" rules/ skills/ knowledge/` 返回零结果。 + +**三重防御深度**: +- `test_layer_separation.py`(每次 pytest 执行) +- `validate-all.sh` Tier 1.5(shell 层 grep 检查) +- `.github/workflows/validate.yml`(CI 独立 step) + +CLAUDE.md 还明确声明"编辑范围严格分层",禁止同一批次跨层编辑。这种"规范 + 自动检测 + CI 强制"的防御深度在 Claude Code Plugin 生态中罕见。 + +### 2.2 SKILL.md + procedures 分拆模式 + +19 个 Skill 全部一致遵循: +- SKILL.md = "做什么"(frontmatter + 输入/输出/路由表) +- `*-procedures.md` = "怎么做"(详细步骤) +- **执行路由表**实现按子命令/状态懒加载,仅读取必要 procedures,节省 token +- 命名统一:`-procedures-.md` +- 依赖单向:SKILL.md → procedures,procedures **从不**反向引用 SKILL.md + +### 2.3 Hook 工程化 + +- `hooks/lib/utils.mjs`(203 行,12 个纯函数)作为星形拓扑中心,零 npm 依赖 +- `pre-tool-use.mjs` 实现确定性 enforcement(正则匹配,非 LLM 判断):Gate 3 阻断用 exit 2 +- 3 个 Skill 级 scope-check Hook(`hooks/skill/`)与全局 Hook 互补 +- 9 个 Hook 测试用真实 subprocess 执行 + 临时目录 + 模拟 stdin JSON + +### 2.4 Schema 契约中心 + +`knowledge/_schema/` 的 22 个 schema 被 73+ 处引用,作为数据契约层: +- Skills 输出格式有权威参照 +- `test_schema_compliance.py` 验证模板与 schema 一致性 +- `cr-format.md` 是最高耦合实体(~15 消费者),符合 CR 作为核心工作单元的领域模型 + +### 2.5 conftest.py 集中式治理 + +- 集中定义 
`SKILL_NAMES`(19)、`SCHEMA_FILES`(22)、`TEMPLATE_FILES`(11)、`CR_STATES`(8) 等清单 +- `test_conftest_sync.py` 自引用检查确保清单与文件系统同步——测试基础设施自身也被测试 +- `parse_frontmatter()` + `headings()` 提供共享 Markdown 解析工具 + +### 2.6 CI/CD 工程化 + +- `validate.yml`:lint / test / hooks 三路并行,Python 3.9+3.12 矩阵 +- `release.yml`:plugin.json / marketplace.json / Git tag 三方版本一致性检查 + 自动打包发布 +- `Makefile` 提供 16 个 target(test, lint, validate, layer-check, plugin-load, eval-*, bump, release-check 等) + +--- + +## 3. 架构关注点(按严重性排序) + +### 3.1 [HIGH] 认知负荷与同步维护瓶颈 + +**问题**:项目复杂度已达需要警惕的临界点。 + +**证据**: +- 产品层:19 Skill + 133 procedures + 22 schema + 580 行 devpace-rules.md ≈ 12,900+ 行 Markdown +- CLAUDE.md "多处出现内容的同步维护"清单有 6 大类,最复杂的(pace-role 角色扩展)需同步 **13 个文件** +- pace-plan 子命令扩展需同步 6 个文件 +- `signal-priority.md` 显式列出 5 个消费方 + +**影响**:新增功能时开发者需手动追踪 10+ 文件同步关系。`test_sync_maintenance.py` 仅覆盖 4 种同步点(命令表、accept 关键词、schema 映射、特性文档子命令),其余依赖人工。 + +**建议**: +1. 扩展 `test_sync_maintenance.py` 覆盖 CLAUDE.md 列出的全部同步点 +2. 考虑生成式同步——从权威源脚本派生下游内容(扩展 `collect-signals.mjs` 模式) +3. 在 CLAUDE.md 中为每个同步清单标注自动化程度(手动/半自动/全自动) + +### 3.2 [HIGH] Eval 体系未入 CI + +**问题**:38 套 trigger/behavioral eval 存在但不在 CI 中执行。 + +**证据**: +- `validate.yml` 无 eval job +- `validate-all.sh` Tier 3 仅检查覆盖率,不执行 eval +- Makefile 有 `eval-trigger`/`eval-behavior`/`eval-stale` 但依赖 `skill-creator` CLI + +**影响**:Skill description 触发精度和行为正确性缺乏持续验证,procedures 修改可能引入静默回归。 + +**建议**: +1. **短期**(P1):CI 中添加 `eval-stale` 检查(仅需 git log 比对,不需要 API key) +2. **中期**:评估 trigger eval 的 CI 可行性(可能需 Anthropic API key) +3. 
**长期**:建立 eval 基线,PR 时对比检测回归 + +### 3.3 [MEDIUM] knowledge 文件反向引用 + +**问题**:knowledge 层文件反向列出消费方 Skill,技术上不违反分层(同属产品层),但增加维护成本。 + +**证据**: +- `knowledge/signal-priority.md:96-100` — 列出 5 个消费方 Skill +- `knowledge/_schema/cr-format.md:54,65,102,271,425,447` — 引用 pace-dev 和 pace-guard 的具体 procedures +- `knowledge/theory.md:544-557` — 包含 Skill 目录路径 + +**影响**:消费方增减时 knowledge 需同步更新。本质上将"被依赖关系"硬编码到了源文件中。 + +**建议**:将反向引用移入注释或独立索引文件;或在 `test_sync_maintenance.py` 中添加反向引用一致性检查。 + +### 3.4 [MEDIUM] devpace-rules.md Token 预算压力 + +**问题**:580 行 rules 文件接近 600 行警告阈值(`validate-all.sh` Tier 1.7)。 + +**影响**:每次会话全量注入 Claude 上下文,即使大部分场景只涉及 §1-§12 核心规则。功能增长将持续恶化。 + +**建议**: +1. 将 §14(Release/反馈/集成)和 §16(同步管理)拆为独立 rules 文件 +2. 利用 Claude Code 的 rules 多文件自动发现机制:`rules/devpace-rules-release.md` 等 +3. 审查"权威委托"模式——§9 已成功将变更管理详细规则委托给 pace-change procedures + +### 3.5 [MEDIUM] 单 Skill 测试深度不均匀 + +**问题**:pace-init 有 65 个专项测试,其他 18 个 Skill 仅有通用参数化测试覆盖。 + +**证据**: +- `test_pace_init.py`:65 测试函数(模板/schema/初始化/路由表/迁移链) +- 无 `test_pace_dev.py`、`test_pace_change.py` 等专项测试 +- pace-dev(6 procedures + Skill 级 Hook)和 pace-change(11 procedures)复杂度匹配不足 + +**建议**:优先为 pace-dev 和 pace-change 创建专项静态测试,重点验证 procedures 路由完整性和状态转换覆盖。 + +### 3.6 [LOW] 根目录孤儿文件 `promt.md` + +拼写错误的未追踪文件(43KB),应删除或移入 `docs/scratch/`。 + +### 3.7 [LOW] 无 git pre-commit hook + +本地提交不经过任何自动质量检查,依赖 CI 和开发者自觉运行 `make validate`。 + +**建议**:添加轻量 pre-commit hook 运行 `make layer-check && make lint`(秒级完成)。 + +--- + +## 4. 
Claude Code Plugin 最佳实践对齐 + +### 4.1 plugin.json — 5/5 + +- `name` 正确作为 Skill 命名空间前缀(`devpace:pace-init`) +- `repository` 使用字符串格式(避免对象格式兼容性问题) +- 未显式声明 skills/agents/hooks——依赖约定目录自动发现(最佳实践) +- `outputStyles` 正确使用 `./` 相对路径 + +### 4.2 Skill 组织 — 4.5/5 + +- 全部 19 个 SKILL.md frontmatter 通过 `test_frontmatter.py` 自动验证 +- `description` 以触发条件开头 + `NOT for` 排除模式减少误触发 +- 10/19 使用 `context: fork` + `agent` 字段 +- 3/19 定义 Skill 级 `hooks` frontmatter +- **改进点**:部分 description 超 200 字符,可能影响触发效率 + +### 4.3 Hook 架构 — 5/5 + +- 8 种事件全覆盖,事件名大小写正确 +- `async: true` 用于非阻塞 PostToolUse Hook +- exit 2 阻断 / exit 0 成功的规范运用 +- `${CLAUDE_PLUGIN_ROOT}` 路径引用 +- Skill 级 Hook 位于 `hooks/skill/` 子目录 + +### 4.4 Agent 定义 — 4/5 + +- 3 个 Agent 完整配置(tools/model/color/maxTurns/memory) +- `memory: project` 正确启用跨会话记忆 +- **注意**:pace-review SKILL.md `model: opus` 但路由到 pace-engineer Agent(sonnet),需确认优先级语义 + +### 4.5 自动发现 — 5/5 + +- 无 `commands/` 目录,全部通过 Skills 实现入口 +- Skills / Agents / Rules 均通过约定目录自动发现 +- Output styles 通过 plugin.json 显式声明(唯一需要声明的资产) + +--- + +## 5. 可维护性深度分析 + +### 5.1 认知负荷 + +| 负荷源 | 量级 | 缓解措施 | +|--------|------|---------| +| 价值链层次 | 5 层(Opportunity→Epic→BR→PF→CR) | §0 速查卡片 | +| 状态机 | CR 8 态 + Epic/BR/Opportunity/Release 各 4-5 态 | test_state_machine.py 自动验证 | +| 同步矩阵 | 6 大类,最复杂 13 文件 | test_sync_maintenance.py(部分覆盖) | +| 文档跳转 | devpace-rules → SKILL.md → procedures → schema(4 层) | 路由表模式减少跳转 | +| 规则体量 | 580 行 devpace-rules.md | Token budget 警告 | + +**结论**:认知负荷 HIGH,但已有多项缓解措施。核心瓶颈是同步矩阵的人工依赖。 + +### 5.2 变更爆炸半径 + +| 变更类型 | 爆炸半径 | 自动检测能力 | +|---------|---------|-------------| +| 新增 Skill | 低 | test_plugin_json_sync + test_conftest_sync | +| 修改 cr-format.md | **高**(15+ 消费者) | test_schema_compliance(结构),无语义检测 | +| 修改状态机 | **高** | test_state_machine + hook tests | +| 新增角色 | **高**(13 文件) | test_sync_maintenance(部分) | +| 修改 Hook 逻辑 | 中 | 9 个 Hook 测试 | +| 新增 procedures | 低 | test_cross_references | + +**最高风险**:cr-format.md 语义变更 — 静态测试只验证结构,不验证消费方处理逻辑。 + +--- + +## 6. 
可复用性评估 + +### 可直接提取的通用模式 + +| 组件 | 复用价值 | 提取难度 | +|------|---------|---------| +| `hooks/lib/utils.mjs`(203 行纯函数) | 高 — 通用 Hook 工具库 | 低 | +| `test_layer_separation.py` | 高 — 任何 Plugin 分层检查 | 低 | +| `test_frontmatter.py` 参数化验证 | 高 — frontmatter 合规通用需求 | 低 | +| `conftest.py` 清单治理 + 自引用检查 | 高 — 自洽性检查通用模式 | 中 | +| `validate-all.sh` 多 Tier 验证框架 | 中 — 分层验证 + 优雅降级 | 低 | +| release.yml 版本一致性检查 | 中 — Plugin 发布通用需求 | 低 | + +### 难以通用化的 + +- SKILL.md + procedures 分拆**模式**可复用,但**内容**高度领域特化 +- signal-priority / 状态机 / 质量门体系——设计精良但绑定 BizDevOps 域 + +--- + +## 7. 优先级建议(按影响/努力比) + +### P1 — 高影响、低努力 + +| # | 建议 | 具体行动 | +|---|------|---------| +| 1 | CI 添加 eval-stale 检查 | `validate.yml` 添加 job,运行 `make eval-stale`,不需要 API key | +| 2 | 清理 `promt.md` | 删除或移入 `docs/scratch/` | +| 3 | 添加轻量 pre-commit hook | `make layer-check && make lint`(秒级) | + +### P2 — 高影响、中努力 + +| # | 建议 | 具体行动 | +|---|------|---------| +| 4 | devpace-rules.md Token 优化 | 拆 §14/§16 为独立 rules 文件,利用 rules/ 自动发现 | +| 5 | 扩展 test_sync_maintenance.py | 覆盖 CLAUDE.md 全部 6 类同步点 | +| 6 | pace-dev/pace-change 专项测试 | 创建 test_pace_dev.py、test_pace_change.py | + +### P3 — 中影响、高努力 + +| # | 建议 | 具体行动 | +|---|------|---------| +| 7 | knowledge 反向引用治理 | 消费者列表移入独立索引或添加同步检测 | +| 8 | 同步维护自动化 | 扩展脚本从权威源生成派生内容 | +| 9 | 提取通用 Plugin 测试框架 | 独立包:conftest + layer + frontmatter + plugin-json 验证 | + +--- + +## 8. 总结 + +devpace 作为一个 Claude Code Plugin 项目,在以下方面达到了**标杆水平**: +- **分层架构**:产品层/开发层零违规,三重强制执行 +- **Plugin 规范合规**:frontmatter、Hook、Agent、自动发现全部正确运用 +- **测试工程化**:四层测试体系 + CI 矩阵 + 自引用完整性检查 + +主要挑战来自**功能复杂度带来的可维护性压力**: +- 19 个 Skill × 平均 7 个 procedures = 同步矩阵膨胀 +- knowledge 反向引用 + devpace-rules.md 体量增长 + +这不是架构设计问题,而是**成功的副作用**——项目功能覆盖面广导致的结构性复杂度。优先级建议聚焦于通过自动化检测和拆分策略来管控这种复杂度,而非架构重构。 + +--- + +## 9. 
Plugin 生态对比分析 + +基于对当前工作环境中所有已安装 Plugin 的实际调研,将 devpace 与生态中的代表性项目进行横向对比。 + +### 9.1 对比对象 + +| 项目 | 类型 | 规模 | 代表性 | +|------|------|------|--------| +| **devpace** | 第三方 Plugin | 19 Skills, 133 procedures, 22 schemas | 本次分析主体 | +| **skill-creator** | Anthropic 官方 Plugin | 1 Skill (480 行), agents/, references/ | 官方最复杂 Plugin | +| **everything-claude-code (ECC)** | 社区 Plugin | 65 Skills, 16 agents, 30 rules | 社区最大规模 Plugin | +| **SuperClaude** | 用户级框架(非 Plugin) | 7 modes, 17 agents, 26 commands | Prompt 工程框架参考 | +| **hookify / ralph-loop** | Anthropic 官方 Plugin | 极简(1-4 hooks) | 官方极简参考 | + +### 9.2 架构复杂度频谱 + +``` +简单 ←————————————————————————————————————————→ 复杂 + +ralph-loop hookify skill-creator ECC devpace +(1 hook) (4 hooks) (1 skill,480L) (65S) (19S+133P+22Sch) +``` + +devpace 位于频谱最右端。**但复杂度不等于过度工程**——devpace 的复杂度来自领域需求(BizDevOps 完整价值链),而非技术堆砌。 + +### 9.3 维度对比矩阵 + +| 维度 | devpace | ECC | skill-creator (官方) | 官方简单 Plugin | +|------|---------|-----|---------------------|----------------| +| **Skill 组织** | SKILL.md + procedures 分拆(路由表懒加载) | 单文件 SKILL.md(无分拆) | 单文件 480 行 | 单文件,极简 | +| **Skill 数量** | 19 | 65 | 1 | 0-3 | +| **Procedures 模式** | 133 个独立 procedures 文件 | 无 | references/ 目录 | 无 | +| **分层架构** | 产品层/开发层严格分离,三重强制 | 无分层概念 | 无分层概念 | 无 | +| **Agent 定义** | 3 个,用 memory/maxTurns/color | 16 个,标准字段 | 内置 agents/ | 0-1 个 | +| **Agent memory** | `memory: project`(跨会话持久) | 无 | 无 | 无 | +| **Hook 事件覆盖** | 8 种事件,17 脚本 | 6 种事件,15+ 脚本 | 无 | 0-4 种 | +| **Skill 级 Hook** | 3 个(frontmatter 内嵌) | 无 | 无 | 无 | +| **Hook 分发模式** | 直接调用 + skill/ 子目录 | 中央 flag 分发器 | 无 | 直接调用 | +| **测试体系** | pytest + node:test + eval + CI | node.js 自定义 + CI | 无测试 | 无测试 | +| **Schema 契约** | 22 个 `_schema/*.md` 文件 | 无 | 无 | 无 | +| **Knowledge 层** | theory/metrics/signals/teaching (8 文件) | 无独立知识层 | references/ (2 文件) | 无 | +| **Rules 文件** | 1 个 580 行(接近上限) | 30 个按语言分类 | 无 | 0-1 个 | +| **Output Styles** | 1 个(plugin.json 声明) | 无 | 无 | 独立发布 | +| **文档** | 38 篇特性文档(中英双语)+ 用户指南 | README 仅 | README 仅 | README 仅 
| +| **CI/CD** | validate.yml + release.yml | 有 CI | 无 | 无 | +| **plugin.json** | 极简 + outputStyles | 极简 | 极简 | 极简 | +| **非标字段** | 无(全部合规) | `origin: ECC`(非标) | 无 | 无 | + +### 9.4 devpace 的独创架构模式 + +以下模式在整个生态中**仅 devpace 使用**: + +#### 1. Procedures 分拆 + 路由表懒加载 + +``` +# devpace 独创模式 +SKILL.md(~80 行) + └─ 路由表 → 按子命令/状态加载对应 procedures + ├─ dev-procedures-intent.md(created 状态) + ├─ dev-procedures-developing.md(developing 状态) + └─ dev-procedures-common.md(所有状态共享) +``` + +**对比**: +- skill-creator 将 480 行全部放在 SKILL.md(超出官方建议的 500 行上限) +- ECC 每个 Skill 是独立的单文件(65 个 SKILL.md,各自 50-200 行) +- 官方简单 Plugin 无此需求 + +**评价**:devpace 的模式是唯一能同时满足"SKILL.md < 500 行"和"复杂子命令逻辑"的方案。官方文档推荐"超过 ~50 行拆出 procedures",devpace 是最忠实的实践者。 + +#### 2. 产品层/开发层分离 + 自动检测 + +**对比**:整个生态中无任何其他 Plugin 有分层概念。ECC 将 tests/ 和源码混在一起。官方 Plugin 无测试文件。 + +**评价**:这使 devpace 成为唯一可以"只分发产品层"的 Plugin。其他 Plugin 要么全部分发,要么无法干净分离。 + +#### 3. Schema 契约驱动 + +**对比**:无任何其他 Plugin 有数据格式契约层。ECC 和官方 Plugin 的输出格式隐含在 Skill 内容中。 + +**评价**:Schema 层解决了"19 个 Skill 共享数据格式"的一致性问题。对于单 Skill Plugin 不需要,但对多 Skill Plugin 是必要基础设施。 + +#### 4. Agent `memory: project` 跨会话记忆 + +**对比**:无任何其他 Plugin 使用 Agent memory 功能。ECC 通过 homunculus 系统实现类似功能(但不通过官方 memory API)。 + +**评价**:devpace 是官方 `memory` 字段最早的深度使用者。三个 Agent(pace-engineer/pm/analyst)各自维护项目级记忆。 + +#### 5. 
Skill 级 Hooks(frontmatter 内嵌) + +**对比**:无任何其他 Plugin 使用此功能。这是官方支持但极少被采用的高级特性。 + +**评价**:devpace 的 3 个 scope-check Hook(pace-dev/init/review)展示了"全局 Hook 做通用检查,Skill Hook 做精细控制"的互补模式。 + +### 9.5 从生态中可借鉴的模式 + +| 来源 | 模式 | 适用性 | 建议 | +|------|------|--------|------| +| **ECC** | Rules 按领域分文件(`rules/typescript/`, `rules/python/`) | 高 | devpace-rules.md 580 行问题的参考解法:按章节拆为 `rules/devpace-core.md` + `rules/devpace-release.md` + `rules/devpace-sync.md` | +| **ECC** | 中央 flag 分发器(`run-with-flags.js`) | 低 | devpace 的直接调用模式更简洁,无需引入额外抽象 | +| **ECC** | Strictness 级别(minimal/standard/strict) | 中 | 可考虑为 devpace hooks 增加可配置的严格度级别 | +| **skill-creator** | `references/` 目录放补充材料 | 已采用 | devpace 的 `knowledge/` 层是此模式的升级版 | +| **SuperClaude** | Flag 系统(`--think`, `--ultrathink`) | 低 | devpace 已有自主模式系统(explore/advance),领域不同 | +| **官方 Plugin** | `color` 字段标识 Agent | 已采用 | devpace 3 个 Agent 均使用 color(blue/green/yellow) | + +### 9.6 devpace 在生态中的定位 + +``` + 架构成熟度 + ^ + | + 5 | * devpace + | + 4 | * ECC + | + 3 | * skill-creator + | + 2 | * hookify + | + 1 |* ralph-loop + +————————————————————————————→ 功能复杂度 + 1 2 3 4 5 +``` + +**结论**:devpace 在 Claude Code Plugin 生态中处于**架构成熟度最高**的位置。它不是最大的 Plugin(ECC 有 65 个 Skill),但在工程治理(分层、测试、CI、Schema 契约、Hook 工程化)方面远超所有对比对象,包括 Anthropic 官方 Plugin。 + +### 9.7 对比分析总结 + +**devpace 做对了什么(生态视角)**: +1. **严格遵循官方规范**:全部 frontmatter 字段合规,无非标字段(ECC 用了 `origin`,SuperClaude 用了 `category`) +2. **率先采用高级特性**:Skill 级 Hooks、Agent memory、context:fork+agent 委托——这些官方支持但生态中几乎无人使用 +3. **procedures 分拆是正确选择**:在 19 Skill 规模下,单文件模式(如 ECC)会导致大量 SKILL.md 超过 500 行 +4. **测试是差异化优势**:官方 Plugin 无测试,ECC 有基础测试,devpace 有四层测试体系 + +**devpace 需要注意什么(生态视角)**: +1. **Rules 文件体量**:ECC 的按领域分文件策略值得借鉴,devpace 应将 580 行 rules 拆分 +2. **复杂度是孤例**:生态中无参考先例可借鉴。devpace 的架构决策(分层、Schema、procedures)是自创的,意味着维护和演进只能自己摸索 +3. 
**生态工具支持有限**:skill-creator 的 eval 框架是唯一的外部质量工具,devpace 的 Schema 验证、同步检测等都是自建的 diff --git a/docs/research/harness-engineering-practices-2026-03-14.md b/docs/research/harness-engineering-practices-2026-03-14.md new file mode 100644 index 0000000..9f2ab2a --- /dev/null +++ b/docs/research/harness-engineering-practices-2026-03-14.md @@ -0,0 +1,301 @@ +# Harness Engineering 调研与 devpace 对标分析 + +> **调研日期**:2026-03-14 | **调研方法**:Tavily Deep Research(OpenAI 原文 + Martin Fowler 分析 + 社区解读 20+ 源) | **置信度**:0.90 + +--- + +## 一、什么是 Harness Engineering + +### 1.1 一句话定义 + +**Harness Engineering 是设计"让 AI Agent 可靠工作的环境"的工程学科**——工程师从写代码转向设计环境、指定意图、构建反馈循环。 + +> "Humans steer. Agents execute." —— OpenAI, 2026-02-11 + +### 1.2 术语起源 + +| 时间 | 事件 | 关键人物 | +|------|------|---------| +| 2025-11 | Anthropic 将 Claude Agent SDK 描述为 "a powerful, general-purpose agent harness" | Anthropic | +| 2026-01 | Aakash Gupta 宣称 "2025 was agents. 2026 is agent harnesses",Phil Schmid 提出 agent harness 定义年 | 社区 | +| **2026-02-04** | **Mitchell Hashimoto**(HashiCorp 联合创始人、Terraform 作者)发表博客,**首次命名 "Engineer the Harness"** | Mitchell Hashimoto | +| **2026-02-11** | **OpenAI 发表 "[Harness engineering: leveraging Codex in an agent-first world](https://openai.com/index/harness-engineering/)"** | Ryan Lopopolo (OpenAI) | +| 2026-02-17 | Martin Fowler(Thoughtworks)发表分析文章,将 harness 归纳为三类组件 | Birgitta Boeckeler | +| 2026-02-18 | Ethan Mollick 将其 AI 指南框架重组为 "Models, Apps, and Harnesses",术语迅速普及 | Ethan Mollick | + +**隐喻来源**:"Harness" 原义是马具(缰绳、鞍具、嚼子)——用于将马匹的力量引导到正确方向、防止失控、实现稳定长途运作。 + +### 1.3 三层嵌套关系 + +``` +┌─────────────────────────────────────────────────────────┐ +│ Harness Engineering │ +│ Agent/工作流的系统级设计与控制 │ +│ │ +│ ┌───────────────────────────────────────────────────┐ │ +│ │ Context Engineering │ │ +│ │ 设计和管理输入给 LLM 的所有上下文 │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────────┐ │ │ +│ │ │ Prompt Engineering │ │ │ +│ │ │ 优化人类给 LLM 的指令文本 │ │ │ +│ │ └─────────────────────────────────────────────┘ │ │ +│ │ │ │ 
+│ │ + Tool 定义、RAG、消息历史、输出 Schema、 │ │ +│ │ Memory、MCP 数据 ... │ │ +│ └───────────────────────────────────────────────────┘ │ +│ │ +│ + 架构约束(linter、结构测试、依赖规则) │ +│ + 反馈循环(CI/CD、可观测性集成) │ +│ + 工作流控制(任务拆分、并行、权限) │ +│ + 改进循环(熵管理、文档保鲜) │ +└─────────────────────────────────────────────────────────┘ +``` + +> **关键认知**:Prompt Engineering < Context Engineering < Harness Engineering。Harness Engineering 关注的是 Context 之外的一切——架构约束、反馈循环、熵管理。 + +--- + +## 二、OpenAI 实验:百万行代码、零手写 + +### 2.1 实验概况 + +| 指标 | 数据 | +|------|------| +| 起始时间 | 2025 年 8 月底(空 Git 仓库) | +| 持续时间 | ~5 个月 | +| 代码规模 | 100 万+ 行(应用逻辑、测试、CI 配置、文档、可观测性、内部工具) | +| 手写代码 | **零行**——所有代码由 Codex Agent 生成 | +| 时间效率 | 约为手写代码的 **1/10** | +| 吞吐量 | 平均每位工程师每天 3.5 个 PR,且随团队增长而提升 | +| 产品状态 | 内部日活用户 + 外部 Alpha 测试者,正常部署/运行/修复 | + +### 2.2 工程师角色的重新定义 + +传统工程师:**写代码** → Harness Engineering 下的工程师: + +1. **设计环境**(Design environments)——构建 Agent 可靠工作的基础设施 +2. **指定意图**(Specify intent)——用声明式方式定义"做什么",而非"怎么做" +3. **提供结构化反馈**(Provide structured feedback)——优先优化 prompt、翻译用户反馈为验收标准、验证结果 + +> "When the agent struggles, we treat it as a signal: identify what is missing — tools, guardrails, documentation — and feed it back into the repository." + +**核心转变**:调试 AI 的思路从"模型为什么这么蠢?"变为"缺少什么上下文或约束?" 
+ +--- + +## 三、Harness Engineering 三大支柱 + +### 3.1 Context Engineering — 让 Agent 看到正确的信息 + +**核心**:仓库知识成为记录系统(Repository knowledge as the system of record)。 + +**OpenAI 的实践**: +- **AGENTS.md**:不是一个巨大的指令文件,而是指向更深层真相源(设计文档、架构图、执行计划、质量评级)的入口——全部版本控制 +- **动态上下文**:Agent 可访问可观测性数据(日志、指标、trace)和浏览器导航(Chrome DevTools MCP) +- **Agent 可理解性(Agent Legibility)**:目标不是"人类可读",而是"Agent 可以量化地掌握系统行为" + +**关键理念**: +> "Agent legibility is the goal" —— 代码和文档的组织方式应以"Agent 能否高效理解"为标准,而非仅仅"人类能否阅读"。 + +### 3.2 Architectural Constraints — 确定性规则乘以 Agent 速度 + +**核心**:用确定性机制(非 LLM 判断)强制执行架构规范。 + +**OpenAI 的实践**: +- **Custom Linter + 结构测试**:静态强制结构化日志、命名规范、Schema/类型命名、文件大小限制、平台特定可靠性要求 +- **自定义错误消息**:Linter 的报错信息专门设计为可注入 Agent 上下文的修复指令(remediation instructions) +- **"Golden Principles"**:机械化、有主张的规则编码入仓库—— + - 优先用共享 utility 包而非手写 helper(集中化不变量) + - 在边界校验数据,而非探测式地猜测数据结构 + - 使用团队的 OpenTelemetry-instrumented 并发工具,而非行为不透明的第三方库 +- **"Taste Invariants"**:品味级别的约束也被编码为可执行规则 + +> "In a human-first workflow, these rules might feel pedantic or constraining. With agents, they become multipliers: once encoded, they apply everywhere at once." 
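The idea of linter errors that double as remediation instructions can be sketched as follows. This is an illustrative toy and not OpenAI's linter: the rule names, the 500-line limit, and the `lintFile` / `formatForAgent` helpers are all assumptions made for the example.

```javascript
// Hypothetical rules; limits and wording are illustrative only.
const MAX_LINES = 500;

// Return a list of findings, each carrying a fix instruction an agent can act on.
function lintFile(path, source) {
  const findings = [];

  const lineCount = source.split('\n').length;
  if (lineCount > MAX_LINES) {
    findings.push({
      rule: 'file-size',
      message: `${path} has ${lineCount} lines (limit ${MAX_LINES}).`,
      remediation: `Split ${path} into smaller modules and re-export from an index file.`,
    });
  }

  if (/console\.log\(/.test(source)) {
    findings.push({
      rule: 'structured-logging',
      message: `${path} calls console.log directly.`,
      remediation: 'Replace console.log with the shared structured logger.',
    });
  }

  return findings;
}

// Render findings as plain text that can be injected into the agent's context.
function formatForAgent(findings) {
  return findings
    .map((f) => `[${f.rule}] ${f.message}\nFix: ${f.remediation}`)
    .join('\n\n');
}
```

The design choice illustrated here is that each finding states the repair as well as the violation, so the agent does not have to infer the fix from the rule name alone.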
+ +**关键理念**:在人类主导的工作流中,严格规则感觉迂腐;在 Agent 主导的工作流中,**规则就是速度的放大器**。 + +### 3.3 Garbage Collection — 对抗 Agent 驱动开发的熵增 + +**核心**:Agent 会复制仓库中已存在的模式——包括次优模式,导致渐进漂移。 + +**OpenAI 的实践**: +- **问题发现**:初期团队每周五花 20% 时间手动清理 "AI slop"——无法规模化 +- **解决方案**:一组后台 Codex 任务按固定节奏运行—— + - 扫描架构偏差 + - 更新质量评级 + - 打开定向重构 PR(大多数可在一分钟内审核并自动合并) + - 检查文档陈旧度 +- **文档保鲜**:Agent 为 Agent 维护文档——"documentation _for_ agents, _by_ agents" + +> 这类似于给代码仓库加了"垃圾回收机制":小问题随时清理,技术债不会堆积。 + +--- + +## 四、关键工程洞察 + +### 4.1 吞吐量改变 Merge 哲学 + +当 Agent 吞吐量远超人类注意力时,传统工程规范变得适得其反: + +| 传统做法 | Harness Engineering 做法 | 逻辑 | +|---------|------------------------|------| +| 严格 Merge Gate 阻断 | 最小化阻断性 Merge Gate | Agent 吞吐量 >> 人类注意力,等待比修正更昂贵 | +| 长生命周期 PR | 短生命周期 PR | 修正成本低,快速迭代 | +| 测试 Flake 阻断构建 | Follow-up run 修复 | 匹配实际成本结构 | +| 人类审查每一行代码 | 异步反馈 + Agent 自行响应 | 人类时间是唯一稀缺资源 | + +> "In a system where agent throughput far exceeds human attention, corrections are cheap, and waiting is expensive." + +### 4.2 自治度的递进提升 + +OpenAI 仓库最终达到的自治度——Agent 收到单个 prompt 后可以: + +1. 验证代码库当前状态 +2. 复现报告的 Bug +3. 录制演示失败的视频 +4. 实现修复 +5. 通过驱动应用来验证修复 +6. 录制演示修复的视频 +7. 打开 Pull Request +8. 响应 Agent 和人类的反馈 +9. 检测并修复构建失败 +10. **仅在需要判断时升级给人类** +11. 合并变更 + +> "This behavior depends heavily on the specific structure and tooling of this repository and should not be assumed to generalize without similar investment—at least, not yet." 
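+
+### 4.3 Critical Observations from Martin Fowler's Site
+
+Birgitta Böckeler (Thoughtworks Distinguished Engineer) notes on Martin Fowler's website:
+
+- **Missing functional and behavioral verification**: every measure in the OpenAI article focuses on "long-term internal quality and maintainability", with no discussion of functional correctness or behavior verification
+- **Harnesses may become the new service templates**: teams may in future pick from a set of standard harness templates to bootstrap new applications, much like today's Golden Paths / service templates
+- **Architecture may evolve toward being "easy to harness"**: codebase structure and topology may default to patterns that are easier for AI to maintain
+
+---
+
+## 5. Benchmarking devpace: devpace Is a Harness
+
+### 5.1 Structural Mapping
+
+| Harness Engineering pillar | devpace implementation | Maturity |
+|-------------------------|-----------------|--------|
+| **Context Engineering** | `knowledge/` knowledge base, `_schema/` format contracts, `rules/` behavior rules (the AGENTS.md equivalent), SKILL.md description (on-demand load routing), six-layer information architecture | ★★★★★ |
+| **Architectural Constraints** | 5 Iron Rules, deterministic Hook blocking (`pre-tool-use.mjs` regex), `validate-schema.mjs` structural tests, custom linter (`validate-all.sh`, 10 checks), CI matrix | ★★★★☆ |
+| **Garbage Collection** | `pace-pulse` health checks, `pre-compact.sh` pre-compaction snapshots, `pace-learn` experience extraction, confidence score decay mechanism | ★★★☆☆ |
+
+### 5.2 Mapping of Core Ideas
+
+| Harness Engineering idea | devpace counterpart | Alignment |
+|-------------------------|-------------|--------|
+| "Humans steer. Agents execute." | Gate 3 Iron Rule (always requires human approval) + three AI autonomy levels (Assist/Standard/Autonomous) | **Strongly aligned** |
+| Repository knowledge as system of record | `.devpace/` directory = single source of truth for project state | **Strongly aligned** |
+| Agent legibility > human readability | CSO description design (optimized for Claude routing, not human reading), six-layer on-demand loading | **Strongly aligned** |
+| Custom linter error → agent remediation | Hook blocking messages are designed as instructions Claude can understand and act on directly | **Strongly aligned** |
+| Throughput changes merge philosophy | devpace's CR state machine allows fast progression + Gate self-healing loop (fail → fix → retry) | **Partially aligned** |
+| Entropy and garbage collection | pace-pulse runs health checks, but **automated periodic refactoring scans** are missing | **Gap** |
+| "When agent struggles, fix the environment" | Signal system (agent difficulty = a signal of missing context/constraints) | **Aligned in principle; mechanism can be strengthened** |
+
+### 5.3 devpace's Differentiators Relative to OpenAI
+
+| Dimension | OpenAI Harness | devpace | devpace advantage |
+|------|---------------|---------|-------------|
+| **Coverage** | Code → deployment (the last mile) | Business intent → code (the first mile) | devpace manages the WHY; OpenAI manages the HOW |
+| **Value-chain traceability** | None; business intent is not traced | Full Opportunity→Epic→BR→PF→CR traceability | Ensures "doing the right thing", not just "doing the thing right" |
+| **Change management** | Requirement changes not mentioned | Four change scenarios + impact analysis + triage | Requirement changes are managed in an orderly way |
+| **Metrics** | Not mentioned | DORA proxy + quality gate pass rate + pace-retro | A quantifiable development cadence |
+| **Trust management** | None | Confidence score decay mechanism (validation +0.1, challenge -0.2) | Finer-grained pattern trust management than OpenAI |
+| **Adversarial review** | Not mentioned | Adversarial review ("find at least one problem") | Actively questions rather than passively accepts |
+
+### 5.4 devpace's Shortcomings (Measured Against Harness Engineering)
+
+| Gap | Description | Recommendation |
+|------|------|------|
+| **Insufficient Garbage Collection automation** | No periodically run "cleanup agent" that scans for architectural deviations and stale docs | Extend pace-pulse into a periodic GC task |
+| **Autonomy ladder not concrete enough** | OpenAI lists an 11-step autonomy ladder; devpace's three levels (Assist/Standard/Autonomous) are coarse | Use OpenAI's ladder to spell out the concrete capability boundary of each level |
+| **Merge philosophy not adapted** | devpace's Gate flow still leans "blocking" and does not account for high-throughput settings where "correcting is cheaper than waiting" | Evaluate trimming Gate 1 (small changes skip some checks) |
+| **Functional behavior verification** | The same gap as OpenAI's: no agent-driven end-to-end functional verification (recording videos, driving the application) | Long-term direction: integrate Playwright into pace-test for agent-driven behavior verification |
+
+---
+
+## 6. Actionable Recommendations
+
+### P0 (High Value + Low Cost, Can Start Immediately)
+
+| # | Recommendation | Inspired by | Files involved | Complexity |
+|---|------|---------|---------|--------|
+| 1 | **Explicitly position devpace as a "Claude Code Harness"** | Harness Engineering terminology alignment | `docs/design/vision.md`, README | S |
+| 2 | **Rewrite Hook error messages as agent remediation instructions** | OpenAI custom linter remediation | `hooks/` scripts | S |
+| 3 | **Smart Gate 1 trimming**: small changes skip some checks | OpenAI merge philosophy | `skills/pace-dev/dev-procedures-gate.md` | M |
+
+### P1 (High Value + Medium Cost)
+
+| # | Recommendation | Inspired by | Files involved | Complexity |
+|---|------|---------|---------|--------|
+| 4 | **pace-pulse GC mode**: periodically scan for stale docs, architectural deviations, and Schema inconsistencies | OpenAI garbage collection agents | `skills/pace-pulse/` | M |
+| 5 | **Refine the autonomy ladder**: expand from three levels to a concrete capability checklist | OpenAI 11-step autonomy | `rules/devpace-rules.md`, `knowledge/theory.md` | M |
+| 6 | **"Agent struggle = environment defect" signal mechanism**: when Claude fails repeatedly while executing a Skill, automatically record it as a "harness improvement signal" | OpenAI "treat struggle as signal" | `skills/pace-learn/` | M |
+
+### P2 (Architectural Changes, Design Justification Required)
+
+| # | Recommendation | Inspired by | Files involved | Complexity |
+|---|------|---------|---------|--------|
+| 7 | **Harness templating**: abstract devpace's rules + schema + hooks into a reusable "project harness template" | Martin Fowler "harnesses as future service templates" | New `knowledge/_harness-templates/` | L |
+| 8 | **Agent-driven behavior verification**: integrate Playwright into pace-test for end-to-end functional verification | OpenAI Codex + Chrome DevTools | `skills/pace-test/` | XL |
+
+---
+
+## 7. Conceptual Takeaways
+
+### 7.1 "Intelligence Is a Commodity; the Harness Is the Asset"
+
+> "Intelligence is a commodity now. Making AI useful requires you to inject your constraints and context." (Chris Lettieri, The Augmented Weekly)
+
+This is the deepest insight of Harness Engineering: **model capability is a standardized commodity; the environment built around the model (the harness) is the differentiating asset**. devpace's 19 Skills + six-layer information architecture + Iron Rules + Hook system are, in essence, a carefully designed harness.
+
+### 7.2 "Debugging AI = Debugging the Environment"
+
+> "Instead of asking 'why is the model dumb?' you ask 'what context or constraint is missing?'"
+
+This shift in thinking directly guides devpace development: when Claude performs poorly in a Skill, the problem is almost always in SKILL.md, the procedures, the Schema, or the rules, not in the model itself.
+
+### 7.3 "In the Agent World, Rules Are Multipliers, Not Constraints"
+
+> "In a human-first workflow, these rules might feel pedantic. With agents, they become multipliers."
+
+This explains why devpace's Iron Rules, Schema contracts, and Hook blocking work so well in practice: they do not "restrict Claude"; they "multiply the quality of Claude's output". Every additional encoded rule applies to all future agent runs at once.
+
+### 7.4 "Human Time Is the Only Scarce Resource"
+
+> "The horse is fast. The harness is everything."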
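+
+The ultimate conclusion of the OpenAI experiment: agent throughput is no longer the bottleneck; human attention is. Every harness design aims to **maximize output per unit of human attention**. devpace's design principles of zero-friction onboarding (P1), by-products rather than prerequisites (P3), and progressive disclosure (P2) all serve this goal.
+
+---
+
+## 8. Harness Engineering Practices That Do Not Apply to devpace
+
+| OpenAI practice | Why it does not apply |
+|-------------|-----------|
+| Codex Box (cloud dev servers + parallel agent runs) | devpace runs in the user's local Claude Code CLI and has no cloud infrastructure |
+| Chrome DevTools MCP integration for agent QA | devpace manages development cadence rather than running applications; there is no UI to drive |
+| Consumption on the scale of hundreds of billions of tokens per week | devpace targets individuals and small teams, whose token budgets are orders of magnitude smaller |
+| "Zero blocking merge gates" strategy | In devpace, Gate 3 human approval is the safety baseline and cannot be removed |
+| Dozens of agents working in parallel | Claude Code currently uses a single-session, single-agent model |
+
+---
+
+## 9. References
+
+| Source | URL | Date |
+|------|-----|------|
+| OpenAI original article | https://openai.com/index/harness-engineering/ | 2026-02-11 |
+| Martin Fowler analysis | https://martinfowler.com/articles/exploring-gen-ai/harness-engineering.html | 2026-02-17 |
+| Mitchell Hashimoto blog | https://mitchellh.com/writing/my-ai-adoption-journey#step-5-engineer-the-harness | 2026-02 |
+| SmartScope concept overview | https://smartscope.blog/en/blog/harness-engineering-overview/ | 2026-02 |
+| NxCode complete guide | https://www.nxcode.io/resources/news/harness-engineering-complete-guide-ai-agent-codex-2026 | 2026-03 |
+| InfoQ technical coverage | https://www.infoq.com/news/2026/02/openai-harness-engineering-codex/ | 2026-02 |
+| Latent Space debate | https://www.latent.space/p/ainews-is-harness-engineering-real | 2026-03 |
+| The Augmented Weekly | https://bitsofchris.com/p/harness-engineering-why-context-beats | 2026-03 |
+| GTCode in-depth analysis | https://gtcode.com/articles/harness-engineering/ | 2026-02 |
diff --git a/docs/research/trigger-eval-postmortem-2026-03-15.md b/docs/research/trigger-eval-postmortem-2026-03-15.md
new file mode 100644
index 0000000..b850c89
--- /dev/null
+++ b/docs/research/trigger-eval-postmortem-2026-03-15.md
@@ -0,0 +1,157 @@
+# Trigger Eval Automation Postmortem
+
+> Date: 2026-03-15 | Time spent: ~4 hours | Status: infrastructure complete; positive detection limited by upstream
+
+## 1. Background
+
+`make eval-trigger-one S=pace-dev` runs, but all 18/18 positive cases fail (trigger_rate=0). The initial hypothesis was that skill-creator's `parse_skill_md()` parses an empty `name:` field, producing a malformed temporary command name.
+
+Goal: fix the infrastructure and build an eval→fix→regress→CI automation pipeline.
+
+## 2. The Root-Cause Chain (7 Layers)
+
+The investigation peeled back **7 stacked problems**, not the single one the original plan assumed:
+
+| # | Problem | How it was found | Fix |
+|---|------|---------|------|
+| 1 | SKILL.md has no `name:` field → file name becomes `-skill-` | Reading the `utils.py:parse_skill_md()` source | shim injects `name:` |
+| 2 | `eval-runner.sh` runs `cd "$sc_root"` → `find_project_root()` resolves to `~/.claude` instead of devpace | Running `find_project_root()` and printing the result | Use PYTHONPATH instead of cd |
+| 3 | Skills from global plugins compete with the test command | Observing the skills list in the `claude -p` init output | `claude plugin disable --all` |
+| 4 | MCP server initialization ~36s > default timeout 30s | Measuring with `time claude -p "say hi"` | Raise timeout to 90s |
+| 5 | Default permissions in `-p` mode block tool calls | Observing `claude -p` hang | `--dangerously-skip-permissions` |
+| 6 | The SuperClaude framework in the user's `~/.claude/CLAUDE.md` forces brainstorming routing | A debug script captured the Skill call content as `superpowers:brainstorming` | Temporarily mv CLAUDE.md |
+| 7 | `run_eval.py:141` does an immediate `return False` on the first non-Skill tool_use; built-in tools are not governed by `--allowedTools` | A debug script showed that, even in a fully isolated environment, Claude still calls ToolSearch/Glob/Bash first | **Upstream limitation; cannot be worked around** |
+
+## 3. Key Debugging Moments
+
+### 3.1 The Misleading Gate Test
+
+The original plan had a Step 0 Gate: "quickly verify the name-injection hypothesis in bash; done in 3 minutes". In practice, positive cases still passed 0/2 after injecting name. **Strictly following the gate's "all 0 → stop and re-analyze" rule** turned out to be the turning point toward deeper investigation.
+
+Lesson: **the gate mechanism works, but budget time for the investigation that follows a gate failure.**
+
+### 3.2 The Unreliability of Nested `claude -p`
+
+Running `claude -p` from inside a Claude Code session has serious pipe-buffering problems:
+- Bash tool piped into a Python parser → times out consistently
+- Direct file redirection → also times out
+- ProcessPoolExecutor (the run_eval.py approach) → works fine, surprisingly
+
+Cause: Claude Code's Bash tool handles stdout differently from a native shell. `run_eval.py` does non-blocking reads with `subprocess.Popen` + `select.select` + `os.read`, which sidesteps the buffering problem.
+
+Lesson: **when debugging `claude -p` behavior, use a Python script (Popen+select) rather than Bash pipes.**
+
+### 3.3 The Limits of `--allowedTools`
+
+`--allowedTools "Skill"` only governs user-defined tools (MCP tools and the like); it **does not govern Claude Code's built-in tools** (ToolSearch, Glob, Bash, Read, Edit, etc.). This is a CLI design decision rather than a bug: built-in tools are the foundation Claude Code runs on.
+
+How it was found: in a fully isolated environment (no CLAUDE.md, no plugins), a debug script showed Claude still calling ToolSearch, Glob, and Bash.
+
+Lesson: **do not assume how a CLI flag behaves; verify it first with a minimal experiment.**
+
+### 3.4 The Real Blocker Is Upstream
+
+The detection logic at lines 133-141 of `run_eval.py`:
+
+```python
+if cb.get("type") == "tool_use":
+    tool_name = cb.get("name", "")
+    if tool_name in ("Skill", "Read"):
+        pending_tool_name = tool_name  # start tracking
+    else:
+        return False  # ← first non-Skill/Read tool → give up immediately
+```
+
+This early-exit design assumes that Claude's **first tool call** is Skill. In a real environment, however, Claude always calls ToolSearch first (to discover lazily loaded tools) or Glob/Bash (to explore the project), and only then considers Skill routing.
+
+Correct detection should scan every content block in every turn and treat any Skill call containing the target command name as a trigger.
+
+## 4. Final Deliverables
+
+### 4.1 Shim Infrastructure (Fully Usable)
+
+```
+eval/
+├── shim.py              # core adapter layer (trigger/loop/regress/baseline/smoke)
+├── eval-runner.sh       # routing entry point (migrated from dev-scripts)
+└── apply.py             # description diff/apply
+```
+
+The Makefile gains 18 new targets: trigger-one/trigger/smoke/deep/fix/fix-diff/fix-apply/regress/baseline-save/diff/save-all, among others.
+
+### 4.2 Actual Verification Results
+
+| Metric | Result |
+|------|------|
+| Negative cases (should_trigger=false) | **17/17 passed (100%)** |
+| Positive cases (should_trigger=true) | 0/18 passed (0%), upstream limitation |
+| Infrastructure stability | All 35 queries ran to completion; no more timeouts or hangs |
+| Result persistence | latest.json + history/ generated correctly |
+| Environment restoration | Plugins and CLAUDE.md are correctly restored in finally |
+
+## 5. Lessons Learned
+
+### 5.1 The Cost of Recursive Debugging
+
+The original plan estimated Phase 1 (core fix) at about 1-2 hours. It actually took ~4 hours, because fixing each layer exposed the next:
+
+```
+inject name → still 0
+  → investigate project_root → still 0
+  → investigate plugin competition → still 0
+  → investigate timeout → still 0
+  → investigate permission blocking → still 0
+  → investigate CLAUDE.md → still 0
+  → finally traced to upstream detection logic → cannot be worked around
+```
+
+Lesson: **for integration problems involving multiple processes + an external CLI + model behavior, estimate debugging time at 3-5x.**
+
+### 5.2 The "Fixed One Thing but Nothing Improved" Trap
+
+Each layer's fix was logically correct (name really did need injecting; project_root really did point to the wrong place). But because the problems were stacked, fixing any single one showed no effect. This easily leads to:
+- Doubting whether your fix is correct (it is)
+- Repeatedly trying variants at the same layer (wasted time)
+- Abandoning the right direction too early
+
+Lesson: **when a fix is confirmed logically correct but shows no effect, assume a deeper blocker exists instead of repeatedly adjusting the current layer.**
+
+### 5.3 Observability Comes First
+
+The breakthroughs came from two observation tools:
+1. `time claude -p "say hi"`: a one-line command that revealed the 36s initialization time
+2. `_debug_single_query.py`: captures stream events with Popen+select
+
+Writing the debug tool up front, instead of repeated trial and error with Bash pipes, would have saved at least an hour.
+
+Lesson: **for a black-box system (`claude -p`), invest 10 minutes in observation tooling before starting to fix anything.**
+
+### 5.4 Completeness of Environment Isolation
+
+What needs isolating is not just the "obvious dependencies" (plugins), but also:
+- User-level configuration (`~/.claude/CLAUDE.md`)
+- Framework-level instructions (SuperClaude's forced brainstorming instruction)
+- Built-in CLI behavior (ToolSearch is not governed by allowedTools)
+
+Lesson: **environment isolation for automated tests must be an allowlist of everything, not a blocklist built one item at a time.**
+
+### 5.5 Final Root Cause: `claude -p` Does Not Trigger Automatic Skill Routing
+
+After trying 7 layers of environment fixes + custom detection logic + 3 project isolation strategies, the final finding was:
+
+**`claude -p` is a single-turn, stateless API that does not reproduce the interactive mode's automatic skill-triggering pipeline.**
+
+In an interactive Claude Code session, system skills such as `using-superpowers` are activated at session start, and their "MUST invoke skills before ANY response" instruction persists for the whole session, driving Claude to check for a skill match on every user message. `-p` mode skips this startup flow: each query is independent, with no persistent skill-routing context.
+
+This means **a trigger eval built on `claude -p` is not a viable design**, however good the detection logic. skill-creator's run_eval.py works only under the premise of a minimal environment (no plugins, no system prompt), and that premise does not hold in a production-grade Claude Code environment.
+
+Lesson: **before investing N hours in fixing the "how", spend 30 minutes verifying the "whether".**
+
+## 6. Follow-Up Actions
+
+| Priority | Action | Status |
+|--------|------|------|
+| P0 | File an issue with skill-creator: `claude -p` does not trigger automatic skill routing; trigger eval needs an interactive-mode API | To be filed |
+| P1 | Explore alternatives: whether the Claude Code SDK's Agent API supports skill trigger detection | To be investigated |
+| P2 | Behavior eval (evals.json) does not depend on trigger detection; prioritize improving this path | Can proceed now |
+| P3 | Phase 6 CI integration (after P0/P1 reach a conclusion) | Blocked |
diff --git a/docs/scratch/prompt-notes.md b/docs/scratch/prompt-notes.md
new file mode 100644
index 0000000..49c8734
--- /dev/null
+++ b/docs/scratch/prompt-notes.md
@@ -0,0 +1,357 @@
+Evaluate the token efficiency of this skill: /Users/jinhuasu/Project_Workspace/Anker-Projects/ml-platform-research/llm-platform-solution/claude-code-forge/devpace/skills/pace-init/SKILL.md
+
+How did /Users/jinhuasu/Project_Workspace/Anker-Projects/ml-platform-research/llm-platform-solution/claude-code-forge/devpace/tests/scenarios come about, and is it suitable to feed these scenario tests to skill-creator as seeds?
+
+
+
+  Core principle: all 18 Skills should have complete three-tier evaluation cases (trigger + behavioral + full cycle); the optimization lies in when to run which tier, not in which Skill gets which tier.
+
+
+
+
+  Design points
+
+  - Cross-Skill contamination: the 10 negative tests in each trigger eval include typical queries for sibling Skills (e.g. the pace-dev negative tests include pace-change's "add a requirement")
+  - Shared assertion patterns: reference SA-01~SA-06 to avoid duplicate definitions (state_updated, cr_lifecycle, natural_language, p2_progressive, git_committed, schema_compliant)
+  - Scenario coverage: each Skill covers the normal flow, boundary conditions (empty arguments, uninitialized project), and error recovery (rejection, interrupted review)
+
+  Next steps
+
+  Use /skill-creator to run the evaluation Skill by Skill (Phase A.3-A.4) and optimize description and behavior. Extend the remaining 12 Skills with the same pattern (Phase A.5).
+
+
+  /skill-creator For /Users/jinhuasu/Project_Workspace/Anker-Projects/ml-platform-research/llm-platform-solution/claude-code-forge/devpace/skills/pace-init/SKILL.md,
+  evaluate the token efficiency of this skill
+
+  /skill-creator
+  For this skill: /Users/jinhuasu/Project_Workspace/Anker-Projects/ml-platform-research/llm-platform-solution/claude-code-forge/devpace/skills/pace-init/SKILL.md, add richer evaluations
+
+
+────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
+❯ /skill-creator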
为这个skill:/Users/jinhuasu/Project_Workspace/Anker-Projects/ml-platform-research/llm-platform-solution/claude-code-forge/devpace/skills/pace-init/SKILL.md,建立全面深入的测试与评估 + + devpace作为bizdevops全生命周期各种活动的管理器,从业务、需求、迭代、开发、测试、发布等,目前的最新的实现中devpace都有对应的skills覆盖,作为一个bizdevops专家,你从devpace提供的skills是否完整、流畅地提供给最终用户使用,以便这些用户可以使用devpace完成基于AI的bizdevops全生命周期活动进行审查。 + 审查可以从宏观到微观、从整体到局部,从粗粒度到细粒度。最终提供提供改进方案,让devpace成为世界上最优秀、最受访问的bizdevops全生命周期活动的AI工具 + + + devpace作为bizdevops全生命周期各种活动的管理器,需要分析devpace在 + 业务需求(BR - Business Requirement)和 产品需求/产品功能(PF - Product Feature),如下截图所示的 + 业务规划、客户诉求、市场洞察、产品和技术规划,战略/策略、愿景、业务目标、产品目标。devpace是否建立了这个阶段清晰的转换和信息关联。 + 请你在充分理解当前devpace实现现状的基础上为建立优秀的方案,你如果不清晰我要表达的意图请跟我确认 + + devpace项目还需要做什么优化和功能增强,才能有效的建立起 + +CodeArts +ALM + + + 截图展示了两个子域(业务规划 + 产品规划)之间的转换链路。我逐一标注 devpace 的覆盖情况: + + 业务规划子域 产品规划子域 + + 输入源 ──────────────────→ 业务机会 ────────→ 专题 ──────→ 业务需求 ──────→ 产品特性 + ┌───────────┐ ❌ ❌ ⚠️ (空壳) ✅(较好) + │ 业务规划 │ ❌ 无建模 无漏斗 无实体 仅一行标题 有完整字段 + │ 客户诉求 │ ❌ 无建模 无来源追溯 无分组能力 无来源/优先级 用户故事/验收标准 + │ 市场洞察 │ ❌ 无建模 无转化路径 无成功标准关联 无状态 有溢出模式 + │ 产品技术规划│ ❌ 无建模 + └───────────┘ + + + + + 组织层: + 业务线 ❌ → 战略/策略 ❌ → 业务目标 ✅(OBJ) ←对齐→ 产品目标 ❌ ← 愿景 ⚠️ (一行) ← 产品线 ❌ + + 管理层: + 投资组合 ❌ 成功标准 ✅(MoS,仅关联OBJ) 需求工作流 ⚠️ (CR有,BR→PF无) 版本规划 ✅ + + ┌──────────┬──────────────────────────────┐ + │ 阶段 │ 能力 │ + ├──────────┼──────────────────────────────┤ + │ 需求管理 │ 需求采集、追踪、变更控制 │ + ├──────────┼──────────────────────────────┤ + │ 项目规划 │ 迭代规划、资源分配、里程碑 │ + ├──────────┼──────────────────────────────┤ + │ 开发管理 │ 代码管理、分支策略、构建集成 │ + ├──────────┼──────────────────────────────┤ + │ 测试管理 │ 测试用例、执行、缺陷追踪 │ + ├──────────┼──────────────────────────────┤ + │ 发布管理 │ 部署流水线、版本控制、审批流 │ + ├──────────┼──────────────────────────────┤ + │ 运维反馈 │ 监控、事件管理、反馈闭环 │ + └──────────┴──────────────────────────────┘ + + ┌─────────────────┬────────────────────────────┐ + │ 传统 ALM 概念 │ devpace 对应 │ + ├─────────────────┼────────────────────────────┤ + │ Epic / Feature │ BR(Business Requirement) │ + 
├─────────────────┼────────────────────────────┤ + │ User Story │ PF(Product Feature) │ + ├─────────────────┼────────────────────────────┤ + │ Task / Sub-task │ CR(Change Request) │ + ├─────────────────┼────────────────────────────┤ + │ 需求→代码追溯 │ BR→PF→CR 价值链 │ + ├─────────────────┼────────────────────────────┤ + │ 质量门禁 │ Gate 1/2/3 │ + ├─────────────────┼────────────────────────────┤ + │ 状态工作流 │ CR 状态机 │ + └─────────────────┴────────────────────────────┘ + + + + +如下的问题,导致devpace这个项目无法兑现服务bizdevops全生命周期活动的承诺。我认为应该覆盖从 BR 向上几乎是空白的部分。 +问题说明: + 关键断裂点 + 截图的价值流: + 业务规划/客户诉求/市场洞察 → 业务机会 → 专题 → 业务需求 → 产品特性 → 版本规划 + + devpace 的实现: + [完全空白] ─────────────────→ BR(一行标题) → PF(较完善) → CR(非常完善) + ↑ + 断裂点:BR 是"空壳" + +另外截图中的概念并不是全部必须,比如:业务线,产品线, 投资组合是否不用考虑,对于其他的概念模型,你需要分析或者设计最为合适的实体关系图,这些实体关系图是设计skill覆盖价值流的关键 + ┌───────────────────┬────────────────┬──────────────────────────────────────────────────┐ + │ 截图中的概念 │ devpace 覆盖度 │ 现状说明 │ + ├───────────────────┼────────────────┼──────────────────────────────────────────────────┤ + │ 业务线 │ 无 │ 不存在此概念 │ + ├───────────────────┼────────────────┼──────────────────────────────────────────────────┤ + │ 战略/策略 │ 无 │ 不存在,被 OBJ 隐含 │ + ├───────────────────┼────────────────┼──────────────────────────────────────────────────┤ + │ 业务目标 │ 较好 │ 已建模为 OBJ,有 MoS checkbox │ + ├───────────────────┼────────────────┼──────────────────────────────────────────────────┤ + │ 日常需求 │ 无 │ 不存在从日常需求到 BR 的转化路径 │ + ├───────────────────┼────────────────┼──────────────────────────────────────────────────┤ + │ 业务机会 │ 无 │ 不存在"业务机会→专题→BR"的漏斗 │ + ├───────────────────┼────────────────┼──────────────────────────────────────────────────┤ + │ 专题(Epic) │ 无 │ 不存在独立概念,BR 勉强对应但缺乏结构 │ + ├───────────────────┼────────────────┼──────────────────────────────────────────────────┤ + │ 投资组合 │ 无 │ 不存在 BR/专题间的优先级排序和资源分配 │ + ├───────────────────┼────────────────┼──────────────────────────────────────────────────┤ + │ 成功标准 │ 较好 │ MoS 覆盖,但仅关联 OBJ,不关联专题 │ + 
├───────────────────┼────────────────┼──────────────────────────────────────────────────┤ + │ 产品线 │ 无 │ 不存在此概念 │ + ├───────────────────┼────────────────┼──────────────────────────────────────────────────┤ + │ 愿景 │ 弱 │ project.md 一句话 blockquote,无结构化字段 │ + ├───────────────────┼────────────────┼──────────────────────────────────────────────────┤ + │ 产品目标 │ 无 │ 不存在独立概念,通过 OBJ→BR→PF 间接关联 │ + ├───────────────────┼────────────────┼──────────────────────────────────────────────────┤ + │ 业务需求/产品特性 │ 极弱→较好 │ BR 仅一行标题(极弱);PF 有完整字段体系(较好) │ + ├───────────────────┼────────────────┼──────────────────────────────────────────────────┤ + │ 需求工作流 │ 部分 │ PF→CR 有状态机,但 BR→PF 无显式工作流 │ + ├───────────────────┼────────────────┼──────────────────────────────────────────────────┤ + │ 版本规划 │ 较好 │ Release 生命周期完善 │ + └───────────────────┴────────────────┴──────────────────────────────────────────────────┘ + + + + + + + + +devpace作为bizdevops全生命周期各种活动的管理器,从业务、需求、迭代、开发、测试、发布等,目前的最新的实现中devpace都有对应的skills覆盖,作为一个bizdevops专家,在充分理解devpace项目的最新状态和产品愿意用户价值基础上,你从devpace提供的skills是否完整、流畅地提供给最终用户使用,以便这些用户可以使用devpace完成基于AI的bizdevops全生命周期活动进行审查。审查可以从宏观到微观、从整体到局部,从粗粒度到细粒度。最终提供提供改进方案,让devpace成为世界上最优秀、最受访问的bizdevops全生命周期活动的AI工具 + +基于这个要求,给出符合要求的重构方案。 +你对我意图和要求如果有不明确必须跟我确认,不要自己猜测 + + +第一部分:现状评估 + + 1.1 BizDevOps 三域覆盖矩阵 + + ┌───────────┬────────────────────┬────────────────────────────────┬─────────────────────────────────┬──────┐ + │ 域 │ 阶段 │ 对应 Skill │ 覆盖状态 │ 评分 │ + ├───────────┼────────────────────┼────────────────────────────────┼─────────────────────────────────┼──────┤ + │ Biz │ 业务机会识别 │ pace-biz opportunity │ Schema 已定义,Skill 待实现 │ 3/10 │ + ├───────────┼────────────────────┼────────────────────────────────┼─────────────────────────────────┼──────┤ + │ Biz │ 专题规划 │ pace-biz epic │ Schema 已定义,Skill 待实现 │ 3/10 │ + ├───────────┼────────────────────┼────────────────────────────────┼─────────────────────────────────┼──────┤ + │ Biz │ 需求分解 │ pace-biz decompose │ Schema 已定义,Skill 待实现 │ 3/10 │ + 
├───────────┼────────────────────┼────────────────────────────────┼─────────────────────────────────┼──────┤ + │ Biz │ 战略对齐 │ pace-biz align │ Schema 已定义,Skill 待实现 │ 3/10 │ + ├───────────┼────────────────────┼────────────────────────────────┼─────────────────────────────────┼──────┤ + │ Biz │ 需求发现/导入/推断 │ pace-biz discover/import/infer │ Schema 已定义,Skill 待实现 │ 3/10 │ + ├───────────┼────────────────────┼────────────────────────────────┼─────────────────────────────────┼──────┤ + │ Product │ 迭代规划 │ pace-plan │ 核心完成,adjust/health 待完善 │ 7/10 │ + ├───────────┼────────────────────┼────────────────────────────────┼─────────────────────────────────┼──────┤ + │ Product │ 变更管理 │ pace-change(9 子命令) │ 完整 │ 9/10 │ + ├───────────┼────────────────────┼────────────────────────────────┼─────────────────────────────────┼──────┤ + │ Product │ 角色视角 │ pace-role(5 角色) │ 完整 │ 9/10 │ + ├───────────┼────────────────────┼────────────────────────────────┼─────────────────────────────────┼──────┤ + │ Dev │ 项目初始化 │ pace-init │ 完整(含 monorepo/迁移/模板) │ 9/10 │ + ├───────────┼────────────────────┼────────────────────────────────┼─────────────────────────────────┼──────┤ + │ Dev │ CR 推进 │ pace-dev │ 完整(状态机 7+1 态,Gate 1-3) │ 9/10 │ + ├───────────┼────────────────────┼────────────────────────────────┼─────────────────────────────────┼──────┤ + │ Dev │ 代码审查 │ pace-review(opus) │ 完整(对抗审查+业务追溯) │ 9/10 │ + ├───────────┼────────────────────┼────────────────────────────────┼─────────────────────────────────┼──────┤ + │ Dev │ 测试策略 │ pace-test(10 子命令) │ 完整(策略+覆盖+AI 验收) │ 9/10 │ + ├───────────┼────────────────────┼────────────────────────────────┼─────────────────────────────────┼──────┤ + │ Dev │ 风险管理 │ pace-guard(5 子命令) │ 完整(预检+监控+趋势) │ 8/10 │ + ├───────────┼────────────────────┼────────────────────────────────┼─────────────────────────────────┼──────┤ + │ Ops │ 发布管理 │ pace-release(14 子命令) │ 完整(含 rollback/branch/tag) │ 9/10 │ + 
├───────────┼────────────────────┼────────────────────────────────┼─────────────────────────────────┼──────┤ + │ Ops │ 外部同步 │ pace-sync(GitHub MVP) │ 部分(Linear/Jira 待扩展) │ 6/10 │ + ├───────────┼────────────────────┼────────────────────────────────┼─────────────────────────────────┼──────┤ + │ Ops │ 反馈收集 │ pace-feedback │ 完整(defect/hotfix CR 闭环) │ 8/10 │ + ├───────────┼────────────────────┼────────────────────────────────┼─────────────────────────────────┼──────┤ + │ Observe │ 状态查询 │ pace-status(9 子命令) │ 完整(L1-L3 分级) │ 9/10 │ + ├───────────┼────────────────────┼────────────────────────────────┼─────────────────────────────────┼──────┤ + │ Observe │ 导航推荐 │ pace-next(16 信号组) │ 完整 │ 9/10 │ + ├───────────┼────────────────────┼────────────────────────────────┼─────────────────────────────────┼──────┤ + │ Observe │ 节奏脉搏 │ pace-pulse(自动触发) │ 完整 │ 8/10 │ + ├───────────┼────────────────────┼────────────────────────────────┼─────────────────────────────────┼──────┤ + │ Observe │ 回顾度量 │ pace-retro(DORA 代理指标) │ 完整 │ 9/10 │ + ├───────────┼────────────────────┼────────────────────────────────┼─────────────────────────────────┼──────┤ + │ Observe │ 决策轨迹 │ pace-trace(三层渐进透明) │ 完整 │ 9/10 │ + ├───────────┼────────────────────┼────────────────────────────────┼─────────────────────────────────┼──────┤ + │ Knowledge │ 经验积累 │ pace-learn(纠正即学习) │ 完整 │ 8/10 │ + ├───────────┼────────────────────┼────────────────────────────────┼─────────────────────────────────┼──────┤ + │ Knowledge │ 理论参考 │ pace-theory(15 子命令) │ 完整 │ 9/10 │ + └───────────┴────────────────────┴────────────────────────────────┴─────────────────────────────────┴──────┘ + + ┌──────────────┬────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ + │ 域 │ 评分 │ 说明 │ + 
├──────────────┼────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ + │ Biz 域 │ 6/10 │ Schema 完整且设计优秀(OPP/EPIC/BR),pace-biz Skill 已定义 8 子命令 + 8 procedures,但 S35-S42 验收标准全部未打勾(验证待完成)。与上次审查(3/10)大幅提升 │ + ├──────────────┼────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ + │ Dev 域 │ 9/10 │ 完整:pace-init/dev/test/guard/review 全部实现并验证 │ + ├──────────────┼────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ + │ Ops 域 │ 7.5/10 │ pace-release 完整,pace-sync GitHub MVP 已完成(Phase 19/20 待扩展) │ + ├──────────────┼────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ + │ Observe 域 │ 9/10 │ pace-status/next/pulse/retro/trace 全部完整 │ + ├──────────────┼────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ + │ Knowledge 域 │ 8.5/10 │ pace-learn/theory 完整,devpace-cadence 可视化平台已设计 + + + + bizdevops-review-v2.md │ + + +一块探讨一下,pace-biz 这个 + + + Part IV: 竞争力差异化分析 + + 4.1 devpace 独特定位 + + ┌────────────────┬────────────────────────────────────────┬─────────────────────────────────────────┬─────────────────────────────┐ + │ 维度 │ 传统工具 (Jira/Linear/GitHub Projects) │ 通用 AI 编码 (Cursor/Copilot Workspace) │ devpace │ + ├────────────────┼────────────────────────────────────────┼─────────────────────────────────────────┼─────────────────────────────┤ + │ 业务到代码追溯 │ 手动关联 │ 无 │ 自动(OBJ->BR->PF->CR) │ + ├────────────────┼────────────────────────────────────────┼─────────────────────────────────────────┼─────────────────────────────┤ + │ 
变更管理 │ 手动调整看板 │ 无 │ 一等公民,影响分析+有序调整 │ + ├────────────────┼────────────────────────────────────────┼─────────────────────────────────────────┼─────────────────────────────┤ + │ 质量门 │ CI/CD 外挂 │ 无 │ 内嵌,4 级 Gate 自动执行 │ + ├────────────────┼────────────────────────────────────────┼─────────────────────────────────────────┼─────────────────────────────┤ + │ 跨会话连续性 │ N/A │ 有限 │ 完整(state.md 锚点) │ + ├────────────────┼────────────────────────────────────────┼─────────────────────────────────────────┼─────────────────────────────┤ + │ 度量与回顾 │ 需额外配置 │ 无 │ 内建 DORA + 迭代回顾 │ + ├────────────────┼────────────────────────────────────────┼─────────────────────────────────────────┼─────────────────────────────┤ + │ 自然语言交互 │ 有限(AI 辅助搜索) │ 代码层面 │ 全流程自然语言驱动 │ + ├────────────────┼────────────────────────────────────────┼─────────────────────────────────────────┼─────────────────────────────┤ + │ AI 决策透明度 │ N/A │ 无 │ 完整审计(pace-trace) │ + └────────────────┴────────────────────────────────────────┴─────────────────────────────────────────┴─────────────────────────────┘ + + + ┌──────────┬──────────────────────────────────────────────┬──────────────────────────────────┐ + │ 方向 │ 措施 │ 目的 │ + ├──────────┼──────────────────────────────────────────────┼──────────────────────────────────┤ + │ 智能编排 │ 引入"旅程模板",自动编排多 Skill 流程 │ 从"工具集"升级为"智能工作流平台" │ + ├──────────┼──────────────────────────────────────────────┼──────────────────────────────────┤ + │ 预测能力 │ 强化 pace-retro forecast + pace-guard trends │ 从"被动管理"升级为"主动预测" │ + ├──────────┼──────────────────────────────────────────────┼──────────────────────────────────┤ + │ 生态集成 │ 完成 Phase 19-20 外部同步 + CI/CD 深度集成 │ 从"独立工具"升级为"研发中枢" │ + ├──────────┼──────────────────────────────────────────────┼──────────────────────────────────┤ + │ 团队适配 │ 多用户状态隔离 + 角色权限 + 共享仪表盘 │ 从"个人工具"升级为"团队平台" │ + ├──────────┼──────────────────────────────────────────────┼──────────────────────────────────┤ + │ 可视化 │ Phase 24 devpace-cadence Next.js 仪表盘 │ 从"CLI 工具"升级为"可视化平台" │ + 
└──────────┴──────────────────────────────────────────────┴──────────────────────────────────┘ + + + opportunity + +在devpace项目中opportunity的定义是什么?应用场景是什么?它跟Epic、BR、PF之间的关联关系是什么?为什么同时存在这两个概念? + +将愿景、业务目标作为独立的概念,最终用户使用devpace时,devpace可以帮使用devpace开发的项目,按照devpace提供的元模型逐步清晰、丰富被开发产品的愿景、业务目标等等 + +Epic、BR、PF之间最为合适的关联关系是?OBJ定义为[business | product | tech] 对应的obj-format.md需要完善吗?obj-format.md和vision-format.md的定义是否完善 + + +效指标(MoS)跟 + + +devpace中的skill 没有做到自包含 + + +为特定skill制定sub agent +为特定sub agent指定skills + +user-->(skills) + +Claude code 触发skill时可以指定使用subagent吗? +subagent可以指定特定的skills + +Claude code slash command 可以指定使用skill 和subagent吗? + +执行某个bizdevops的活动如(/pace-biz discover)时,我希望它在执行时使用特定的skill和subagent 我应该如何设计与实现呢? + +如何在hook中触发某个特定的任务如代码审查(这个代码审查的动作需要用subagent来执行,并使用特定的skills) + +设置context: fork时,Claude Code默认的上下文如Claude.md和rules下的上下文是否会加载到这个sub agent中 + +这个devapace目的是将bizdevops中的活动采用Claude code的组件和机制抽象为驱动Claude code代理bizdevops中的某些活动的组件。 +对应这项特定的活动,我希望它在执行的时候使用自定义的skill,自定义的agent +比如:opportunity、epic、BR、PF这些组件是devpace中抽象出来的业务实体,这些业务实体有对应的Schema定义(如opp-format.md、epic-format.md、br-format.md、pf-format.md),这些Schema定义了这些业务实体的字段和结构,这些业务实体在devpace中由对应的Skill来管理(如pace-biz),这些Skill通过slash command来触发(如/pace-biz opportunity),当用户触发这个slash command时,Claude code会调用对应的Skill来执行这个命令,这个Skill会根据命令的参数和上下文来创建或更新对应的业务实体,并将结果返回给用户。 + +skill触发subagent,subagent使用某个指定的skill执行任务,Claude Code 支持这种方式吗,如何支持,查官方文档确认不要自己猜测 + +Claude Code的plugin 可以包含 skill,slash command,mcp,hooks,subagent,那触发plugin的执行时,这些组件时怎么协作起来,不要猜测,不确认的查询claude code官方文档 + +devpace项目中的skills 关联的hook是否符合最小自治原则 + + + + 不在本次范围(需更大重构): + - 合并 4 个 PostToolUse hook 为单一 dispatcher(架构变更) + - 添加 CR_STATES 常量(跨 7+ 文件含 shell) + - skills/ 目录的重复工具函数(不属于 hooks/) + - shell 脚本重写为 .mjs + + +你是资深的业务分析专家,善于对软件研发中的业务分析、需求分析与设计指定通用性的标准。 +当前的devpace项目中的pace-biz skill及其子命令,是对软件研发中的业务分析、需求分析与设计的抽象设计。 + +请你从业务分析专家的角度,评估pace-biz这个skill的设计与实现,并且给出改进建议。评估可以从以下几个维度进行: +1. 业务覆盖度:pace-biz是否覆盖了软件研发中业务分析、需求分析与设计的核心活动和流程?是否有明显的缺失或冗余? +2. 
角色适配度:pace-biz是否适配了软件研发中不同角色(如产品经理、业务分析师、架构师等)的需求和使用习惯?是否提供了针对不同角色的定制化功能或视角? +3. 流程连贯度:pace-biz的子命令和功能是否按照合理的业务流程进行组织和衔接?是否支持从业务机会识别到需求分解再到战略对齐的完整链路? +4. 用户体验:pace-biz的交互设计是否符合用户习惯?是否提供了清晰的输入输出、反馈机制和错误处理?是否支持自然语言交互和智能推荐? +5. 可扩展性:pace-biz的设计是否考虑了未来业务需求的变化和扩展?是否提供了灵活的配置和定制能力? +请你基于以上维度,给出对pace-biz这个skill的评估,并且提出具体的改进建议,以提升它在软件研发中的业务分析、需求分析与设计方面的价值和竞争力。 + + /claude-md-improver 分析一下当前项目中有哪些属于Claude + code的上下文规范,整体分析一下这些上下文规范,在不影响其作用的情况下,从文件信息组织、依赖关系,可读性,可维护性,token效率提升上有什么优化方案 + + + + /claude-md-management:claude-md-improver /Users/jinhuasu/Project_Workspace/Anker-Projects/ml-platform-research/ + llm-platform-solution/claude-code-forge/devpace/.claude/rules/info-architecture.md对这个文件的信息组织、依赖关系,可读性,可维护性,token效率上有什么优化方案 + + + /claude-md-improver 分析一下当前项目中的Claude code的上下文规范文件,只需要分析:/Users/jinhuasu/Project_Workspace/Anker-Projects/ml-platform-research/llm-platform-solution/claude-code-forge/devpace/.claude这个目录下的,这些上下文规范,在不影响其作用的情况下,从文件信息组织、依赖关系,可读性,可维护性,token效率提升上有什么优化方案 + + +/Users/jinhuasu/Project_Workspace/Anker-Projects/ml-platform-research/llm-platform-solution/claude-code-forge/devpace/.claude/rules/info-architecture.md + +将这个文件中的内容分为三部分 +1、common-ia.md : 11 原则一览 │ 原则名 + 一句话 │ ✅ 完全通用 和 约束分级标记 │ iron rule / required / recommended │ ✅ 完全通用 +2、project-structure.md : 六层架构 devpace 专有映射 +3、project-ia-detail.md : 结合1和2在当前devpace项目的规则说明 +并且要求1和2互相不存在依赖,3可以依赖1和2;但1和2不能依赖3 + + 实际含义:以后开发 devpace 时,如果 Claude 在某个 Skill 中表现不好——不要调 prompt、不要换模型、不要怀疑 Claude 能力。先问:SKILL.md 缺了什么?Schema + 是不是不够明确?procedures 是不是有歧义? 
\ No newline at end of file diff --git a/promt.md b/docs/scratch/promt.md similarity index 97% rename from promt.md rename to docs/scratch/promt.md index 6b8df1a..9f7a841 100644 --- a/promt.md +++ b/docs/scratch/promt.md @@ -311,15 +311,9 @@ skill触发subagent,subagent使用某个指定的skill执行任务,Claude Co Claude Code的plugin 可以包含 skill,slash command,mcp,hooks,subagent,那触发plugin的执行时,这些组件时怎么协作起来,不要猜测,不确认的查询claude code官方文档 +devpace项目中的skills关联的hook都放在到了/Users/jinhuasu/Project_Workspace/Anker-Projects/ml-platform-research/llm-platform-solution/claude-code-forge/devpace/hooks,而不是属于某个skill的hook放到这个hook所在的目录。分析说明这个选择是否需要优化,为什么,收益什么 -仓库未公开 + 未提交 Marketplace 审核。版本号仍为 1.6.2-beta。再次核实一下当前项目真的没有实现吗 - │ 20.1 │ /pace-guard -> /pace-risk │ 未实现 │ skills/pace-guard/ 仍存在(63 文件 178 处引用) │ :向我解释一下当前项目/pace-guard的作用为我再决定是否要改名 - │ 22.1 │ 用户反馈收集机制 │ 基础就绪 │ .github/ISSUE_TEMPLATE/ 有 bug_report/feature_request/feedback 三个模板 │ 这个在项目中如何起作用的? - - │ 24.1 │ 审计日志导出 │ 未实现 │ 无 JSON/CSV 导出功能 │ 这里的审计日志导出具体是什么的审计日志呢 - - - AI 过程不透明、质量不一致、无法汇报 ROI - - Sub Agent 支持 memory 字段机制是什么有什么应用场景,我当前的devpace项目中的sub agent根据他们的各自的应用场景,是否应该启用这个字段为什么 \ No newline at end of file +───────────────────────────────────────────────────────────────────────────────────────────── +分析一下当前devpace项目作为一个Claude Code的plugin项目,结合Claude Code的plugin项目的最佳实践来分析和软件工程质量的原则 +深度分析一下这个项目的工程质量(可维护性、可复用性等等),特别是devpace项目的目录结构组织,职责分离,依赖关系的合理性等 \ No newline at end of file diff --git a/eval/README.md b/eval/README.md new file mode 100644 index 0000000..d58999f --- /dev/null +++ b/eval/README.md @@ -0,0 +1,314 @@ +# devpace Eval Toolkit + +Skill 触发评估、description 自动优化、回归检测的完整工具链。 + +## 架构 + +``` +eval/ +├── __init__.py # 包入口 (v0.2.0) +├── __main__.py # python3 -m eval 入口 +├── cli.py # 统一 CLI:trigger / loop / regress / baseline / changed +├── skill_io.py # SKILL.md frontmatter 读写 +├── results.py # 评估结果持久化 + 元数据 +├── trigger.py # Agent SDK 触发检测(核心引擎) +├── improve.py # Anthropic API description 生成 +├── loop.py # 优化循环 + train/test 分割 +├── regress.py # 多维回归检测 + 变更发现 
+├── baseline.py # Baseline management +├── apply.py # description diff/apply (standalone script) +├── _gate_sdk.py # Quick gate check (standalone script) +├── eval-runner.sh # Behavioral eval routing (skill-creator) +└── README.md # This file +``` + +## Installation + +```bash +# From the devpace root directory +make setup +# Equivalent to: pip install -r requirements-dev.txt +``` + +Dependencies: + +| Package | Purpose | Requirements | +|----|------|------| +| `claude-agent-sdk` | Trigger detection | Python >= 3.10 | +| `anthropic` | Description optimization | Python >= 3.10, requires an API key | +| `pytest` | Unit tests | Development environment | + +## Quick Start + +```bash +# List all commands +python3 -m eval --help + +# Run trigger evaluation for pace-dev +make eval-trigger-one S=pace-dev + +# Show evaluation coverage +make eval-coverage +``` + +## Command Reference + +### Trigger Evaluation + +Checks whether Claude correctly triggers the target Skill for a given query. + +```bash +# Single Skill (defaults: runs=3, timeout=90s, max_turns=5) +make eval-trigger-one S=pace-dev + +# Custom parameters +make eval-trigger-one S=pace-dev RUNS=5 TIMEOUT=120 MAX_TURNS=8 + +# Specify a model +make eval-trigger-one S=pace-dev MODEL=claude-sonnet-4-20250514 + +# Smoke test (runs=1, 5 key queries; suited to quick checks) +make eval-trigger-smoke + +# Deep test (runs=5, all queries; suited to pre-release validation) +make eval-trigger-deep + +# Full run (every Skill that has a trigger-evals.json) +make eval-trigger + +# Changed Skills only (based on git diff; suited to PR validation) +make eval-trigger-changed +``` + +**Calling the CLI directly:** + +```bash +python3 -m eval trigger --skill pace-dev --runs 3 --timeout 90 --max-turns 5 +python3 -m eval trigger --skill pace-dev --smoke --smoke-n 5 +``` + +**Output**: results are saved to `tests/evaluation//results/latest.json`. + +### Description Optimization + +Automatically generates a better SKILL.md description to improve trigger accuracy. + +**Prerequisite**: set the `ANTHROPIC_API_KEY` environment variable. + +```bash +export ANTHROPIC_API_KEY=sk-ant-... + +# Run the optimization loop (5 iterations by default) +make eval-fix S=pace-dev MODEL=claude-sonnet-4-20250514 + +# Specify the number of iterations +make eval-fix S=pace-dev MODEL=claude-sonnet-4-20250514 N=10 + +# Compare the optimized result with the current description +make eval-fix-diff S=pace-dev + +# Apply the best description to SKILL.md +make eval-fix-apply S=pace-dev +``` + +**How it works**: +1. Split the eval queries into train (70%) and test (30%) sets +2. Each iteration: generate a candidate description with the Anthropic API (extended thinking) +3. Evaluate the candidate on the train set → keep the better one +4. 
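Finally, validate on the test set and check for overfitting (a train-test gap > 20% raises a warning) + +**Output**: results are saved to `tests/evaluation//results/loop/`. + +### Regression Detection + +Compares baseline and latest results to detect regressions across multiple dimensions. + +```bash +# Save the current results as the baseline +make eval-baseline-save S=pace-dev + +# Offline regression check (zero API calls, pure JSON diff) +make eval-regress-offline + +# Full regression (re-run eval + multi-dimensional comparison) +make eval-regress + +# Compare baseline vs latest for a single Skill +make eval-baseline-diff S=pace-dev + +# Save baselines for all Skills in one batch +make eval-baseline-save-all +``` + +**Regression metrics and thresholds**: + +| Metric | WARNING | FAILURE | +|------|---------|---------| +| Drop in positive trigger rate | > 10% | > 20% | +| Increase in false positives | >= 1 | >= 1 | +| Increase in false negatives | >= 2 | >= 4 | +| Drop in overall pass rate | > 5% | > 15% | + +**Output**: the report is saved to `tests/evaluation/regress/latest-report.json`. + +### Auxiliary Commands + +```bash +# Coverage report +make eval-coverage + +# Staleness check (eval not updated after a Skill changed) +make eval-stale + +# Everything in one go (trigger + behavior + coverage + stale) +make eval-all + +# Detect which Skills have git changes +python3 -m eval changed --base origin/main +``` + +## End-to-End Workflows + +### Scenario 1: Establishing an evaluation baseline for a new Skill + +```bash +# 1. Confirm that trigger-evals.json exists +ls tests/evaluation/pace-xxx/trigger-evals.json + +# 2. First evaluation +make eval-trigger-one S=pace-xxx + +# 3. Save as the baseline +make eval-baseline-save S=pace-xxx +``` + +### Scenario 2: Optimizing a Skill with a low trigger rate + +```bash +# 1. Evaluate the current state +make eval-trigger-one S=pace-dev RUNS=5 + +# 2. Run the optimization loop +export ANTHROPIC_API_KEY=sk-ant-... +make eval-fix S=pace-dev MODEL=claude-sonnet-4-20250514 N=5 + +# 3. Review the difference +make eval-fix-diff S=pace-dev + +# 4. Apply the improvement +make eval-fix-apply S=pace-dev + +# 5. Verify the improvement +make eval-trigger-one S=pace-dev RUNS=5 + +# 6. Update the baseline once no regression is confirmed +make eval-baseline-save S=pace-dev +``` + +### Scenario 3: Validation before submitting a PR + +```bash +# Quickly check whether the changed Skills have regressed +make eval-trigger-changed +make eval-regress-offline +``` + +## CI Integration + +### Automatic trigger (PR) + +When files under `skills/` are modified, CI automatically runs the offline regression check: +- Job: `eval-regress`: compares the committed baseline.json vs latest.json +- Zero API calls, no cost +- On failure, annotates the PR with `::error` + +### Manual trigger (workflow_dispatch) + +Trigger a live eval manually from the GitHub Actions page: +1. Go to Actions → Validate → Run workflow +2. 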
Fill in `eval_skill` (a Skill name or `all`) and `eval_runs` +3. Requires `ANTHROPIC_API_KEY` as a Repository Secret + +## Data Directory Layout + +``` +tests/evaluation/ +├── pace-dev/ +│ ├── trigger-evals.json # Trigger evaluation query set (maintained by hand) +│ ├── evals.json # Behavioral evaluation cases (maintained by hand) +│ └── results/ +│ ├── latest.json # Most recent evaluation result +│ ├── baseline.json # Baseline snapshot +│ ├── history/ # Historical results archived by timestamp +│ │ └── 2026-03-15T14-30.json +│ └── loop/ +│ ├── results.json # Optimization loop results +│ └── best-description.txt +├── pace-change/ +│ └── ... +└── regress/ + └── latest-report.json # Regression detection report +``` + +### trigger-evals.json format + +```json +[ + {"query": "帮我开始做用户认证功能", "should_trigger": true}, + {"query": "帮我实现登录页面", "should_trigger": true}, + {"query": "今天天气怎么样", "should_trigger": false}, + {"query": "查看项目进度", "should_trigger": false} +] +``` + +### Key fields in latest.json + +```json +{ + "skill": "pace-dev", + "timestamp": "2026-03-15T14:30:00+00:00", + "description_hash": "a1b2c3d4e5f67890", + "summary": {"total": 35, "passed": 28, "failed": 7}, + "positive": {"total": 20, "passed": 16, "failed": 4}, + "negative": {"total": 15, "passed": 12, "failed": 3}, + "false_negatives": [{"id": 3, "query": "..."}], + "false_positives": [{"id": 7, "query": "..."}], + "metadata": { + "model": "claude-sonnet-4-20250514", + "sdk_options": {"max_turns": 5, "timeout": 90}, + "environment": {"python": "3.13", "sdk": "0.1.44"}, + "duration_seconds": 245.3 + } +} +``` + +## Parameter Quick Reference + +| Parameter | Make variable | CLI flag | Default | Description | +|------|-----------|----------|--------|------| +| Runs | `RUNS` | `--runs` | 3 | Number of repeated runs per query | +| Timeout | `TIMEOUT` | `--timeout` | 90s | Timeout for a single query | +| Max turns | `MAX_TURNS` | `--max-turns` | 5 | Maximum Agent SDK conversation turns | +| Smoke count | `SMOKE_N` | `--smoke-n` | 5 | Number of queries in a smoke test | +| Model | `MODEL` | `--model` | (default) | Claude model ID to use | +| Test set ratio | - | `--holdout` | 0.3 | Share of queries held out as the test set in loop | +| Random seed | - | `--seed` | 42 | Seed for the train/test split | + +## Troubleshooting + +**Q: Trigger rate is 0%** +- Confirm that `claude-agent-sdk` is installed and its version is >= 0.1.44 +- Confirm that you are not running inside a nested Claude Code session (the CLAUDECODE environment variable is cleared automatically) +- Try increasing to `MAX_TURNS=8` + +**Q: `make 
eval-fix` fails with an ANTHROPIC_API_KEY error** +- Set the environment variable: `export ANTHROPIC_API_KEY=sk-ant-...` +- Or configure AWS Bedrock (set `AWS_REGION`; no API key needed) + +**Q: Regression detection produces no output** +- Confirm that `eval-baseline-save` has been run to save a baseline +- Confirm that `tests/evaluation//results/` contains both `baseline.json` and `latest.json` + +**Q: CI eval-regress was not triggered** +- It runs only on PRs that change files under `skills/` +- Pushes to main do not trigger this job diff --git a/eval/__init__.py b/eval/__init__.py new file mode 100644 index 0000000..97b53b4 --- /dev/null +++ b/eval/__init__.py @@ -0,0 +1,17 @@ +"""devpace eval - Skill evaluation toolkit. + +Modular package for trigger evaluation, description optimization, +regression detection, and baseline management of devpace skills. + +Submodules: + skill_io - SKILL.md read/write utilities + results - Evaluation results persistence + baseline - Baseline save/diff management + trigger - Agent SDK trigger detection + regress - Multi-dimensional regression analysis + improve - Anthropic API description generation + loop - Description optimization loop + cli - Unified CLI entry point +""" + +__version__ = "0.2.0" diff --git a/eval/__main__.py b/eval/__main__.py new file mode 100644 index 0000000..caf81c7 --- /dev/null +++ b/eval/__main__.py @@ -0,0 +1,6 @@ +"""Allow running eval as a package: python3 -m eval""" +import sys + +from .cli import main + +sys.exit(main()) diff --git a/eval/_gate_sdk.py b/eval/_gate_sdk.py new file mode 100644 index 0000000..ccaf12c --- /dev/null +++ b/eval/_gate_sdk.py @@ -0,0 +1,111 @@ +#!/usr/bin/env python3 +"""Gate check: verify Agent SDK can detect skill triggering. + +Runs 3 queries (2 positive, 1 negative) to validate the SDK path works. +Gate passes if >= 1 positive query triggers the target skill. 
+ +Usage: + python3 eval/_gate_sdk.py +""" + +import asyncio +import json +import os +import sys +from pathlib import Path + +from claude_agent_sdk import ( + AssistantMessage, + ClaudeAgentOptions, + ToolUseBlock, + query, +) + +DEVPACE_ROOT = str(Path(__file__).resolve().parent.parent) + +# Remove CLAUDECODE to allow SDK to spawn claude subprocess without +# "nested session" error when running inside a Claude Code session. +os.environ.pop("CLAUDECODE", None) + +GATE_QUERIES = [ + {"query": "帮我开始做用户认证功能", "skill": "pace-dev", "should_trigger": True}, + {"query": "帮我实现登录页面的前端代码", "skill": "pace-dev", "should_trigger": True}, + {"query": "今天天气怎么样", "skill": "pace-dev", "should_trigger": False}, +] + + +async def check_skill_trigger(prompt: str, skill_name: str, timeout: int = 120) -> bool: + """Run a single query and check if the target skill is triggered.""" + options = ClaudeAgentOptions( + cwd=DEVPACE_ROOT, + permission_mode="bypassPermissions", + max_turns=3, + plugins=[{"type": "local", "path": DEVPACE_ROOT}], + ) + + triggered = False + try: + async for message in query(prompt=prompt, options=options): + if triggered: + continue # drain remaining messages for clean shutdown + if isinstance(message, AssistantMessage): + for block in message.content: + if isinstance(block, ToolUseBlock) and block.name == "Skill": + input_str = json.dumps(block.input) + if skill_name in input_str: + triggered = True + break + except Exception as e: + print(f" ERROR: {e}", file=sys.stderr) + return triggered + + +async def main(): + print("Agent SDK Gate Check", file=sys.stderr) + print(f" cwd: {DEVPACE_ROOT}", file=sys.stderr) + print(f" queries: {len(GATE_QUERIES)}", file=sys.stderr) + print(file=sys.stderr) + + positive_pass = 0 + negative_pass = 0 + + for item in GATE_QUERIES: + q = item["query"] + skill = item["skill"] + should = item["should_trigger"] + + print(f" Testing: {q[:50]}...", file=sys.stderr) + try: + triggered = await asyncio.wait_for( + check_skill_trigger(q, 
skill), + timeout=120, + ) + except asyncio.TimeoutError: + print(f" TIMEOUT", file=sys.stderr) + triggered = False + + if should: + passed = triggered + if passed: + positive_pass += 1 + print(f" {'PASS' if passed else 'FAIL'}: expected=trigger, got={'trigger' if triggered else 'no-trigger'}", file=sys.stderr) + else: + passed = not triggered + if passed: + negative_pass += 1 + print(f" {'PASS' if passed else 'FAIL'}: expected=no-trigger, got={'trigger' if triggered else 'no-trigger'}", file=sys.stderr) + + print(file=sys.stderr) + total_positive = sum(1 for q in GATE_QUERIES if q["should_trigger"]) + total_negative = sum(1 for q in GATE_QUERIES if not q["should_trigger"]) + print(f" Positive: {positive_pass}/{total_positive}", file=sys.stderr) + print(f" Negative: {negative_pass}/{total_negative}", file=sys.stderr) + + gate_passed = positive_pass >= 1 + print(file=sys.stderr) + print(f" Gate: {'PASSED' if gate_passed else 'FAILED'}", file=sys.stderr) + return 0 if gate_passed else 1 + + +if __name__ == "__main__": + sys.exit(asyncio.run(main())) diff --git a/eval/apply.py b/eval/apply.py new file mode 100644 index 0000000..b985cc9 --- /dev/null +++ b/eval/apply.py @@ -0,0 +1,215 @@ +#!/usr/bin/env python3 +"""Apply best description from eval loop to SKILL.md. 
+ +Subcommands: + diff - Show difference between current and best description + apply - Replace SKILL.md description with the best one +""" + +import argparse +import json +import re +import shutil +import sys +from pathlib import Path + +DEVPACE_ROOT = Path(__file__).resolve().parent.parent +SKILLS_DIR = DEVPACE_ROOT / "skills" +EVAL_DATA_DIR = DEVPACE_ROOT / "tests" / "evaluation" + + +def get_current_description(skill_dir: Path) -> str: + """Extract current description from SKILL.md frontmatter.""" + content = (skill_dir / "SKILL.md").read_text() + lines = content.split("\n") + if lines[0].strip() != "---": + return "" + + desc_lines = [] + in_desc = False + for line in lines[1:]: + if line.strip() == "---": + break + if line.startswith("description:"): + value = line[len("description:"):].strip() + if value in (">", "|", ">-", "|-"): + in_desc = True + continue + else: + return value.strip('"').strip("'") + elif in_desc: + if line.startswith(" ") or line.startswith("\t"): + desc_lines.append(line.strip()) + else: + in_desc = False + return " ".join(desc_lines) + + +def get_best_description(skill_name: str) -> str | None: + """Read best description from eval loop results.""" + best_file = EVAL_DATA_DIR / skill_name / "results" / "loop" / "best-description.txt" + if best_file.exists(): + return best_file.read_text().strip() + + results_file = EVAL_DATA_DIR / skill_name / "results" / "loop" / "results.json" + if results_file.exists(): + data = json.loads(results_file.read_text()) + return data.get("best_description") + + return None + + +def replace_description(skill_dir: Path, new_desc: str) -> None: + """Replace description in SKILL.md frontmatter.""" + skill_md = skill_dir / "SKILL.md" + + # Backup + backup = skill_md.with_suffix(".md.bak") + shutil.copy2(skill_md, backup) + + content = skill_md.read_text() + lines = content.split("\n") + + if lines[0].strip() != "---": + print("Error: SKILL.md has no frontmatter", file=sys.stderr) + return + + # Find 
description field boundaries + desc_start = None + desc_end = None + for i, line in enumerate(lines[1:], start=1): + if line.strip() == "---": + if desc_start is not None and desc_end is None: + desc_end = i + break + if line.startswith("description:"): + desc_start = i + value = line[len("description:"):].strip() + if value not in (">", "|", ">-", "|-"): + desc_end = i + 1 + continue + if desc_start is not None and desc_end is None: + if line.startswith(" ") or line.startswith("\t"): + continue + else: + desc_end = i + + if desc_start is None: + print("Error: no description field found in frontmatter", file=sys.stderr) + return + + if desc_end is None: + desc_end = desc_start + 1 + + # Format new description + if len(new_desc) <= 200: + new_lines = [f"description: {new_desc}"] + else: + # Use folded block scalar for long descriptions + wrapped = new_desc.replace("\n", " ") + new_lines = ["description: >"] + # Split into ~80 char lines + words = wrapped.split() + current_line = " " + for word in words: + if len(current_line) + len(word) + 1 > 80 and len(current_line) > 2: + new_lines.append(current_line) + current_line = " " + word + else: + current_line += (" " if len(current_line) > 2 else "") + word + if current_line.strip(): + new_lines.append(current_line) + + # Replace + result_lines = lines[:desc_start] + new_lines + lines[desc_end:] + new_content = "\n".join(result_lines) + + # Validate frontmatter is parseable + fm_lines = new_content.split("\n") + if fm_lines[0].strip() != "---": + print("Error: broken frontmatter after replacement, restoring backup", file=sys.stderr) + shutil.copy2(backup, skill_md) + return + + has_closing = False + for line in fm_lines[1:]: + if line.strip() == "---": + has_closing = True + break + if not has_closing: + print("Error: broken frontmatter (no closing ---), restoring backup", file=sys.stderr) + shutil.copy2(backup, skill_md) + return + + skill_md.write_text(new_content) + backup.unlink() + print(f" description updated in 
{skill_md.relative_to(DEVPACE_ROOT)}") + + +def cmd_diff(args: argparse.Namespace) -> int: + """Show diff between current and best description.""" + skill_name = args.skill + skill_dir = SKILLS_DIR / skill_name + + current = get_current_description(skill_dir) + best = get_best_description(skill_name) + + if best is None: + print(f"No best description found for {skill_name}. Run eval-fix first.", file=sys.stderr) + return 1 + + print(f"--- current ({skill_name})") + print(current) + print() + print(f"+++ best ({skill_name})") + print(best) + print() + + if current == best: + print(" (identical)") + else: + print(f" current: {len(current)} chars") + print(f" best: {len(best)} chars") + + return 0 + + +def cmd_apply(args: argparse.Namespace) -> int: + """Apply best description to SKILL.md.""" + skill_name = args.skill + skill_dir = SKILLS_DIR / skill_name + + best = get_best_description(skill_name) + if best is None: + print(f"No best description found for {skill_name}. Run eval-fix first.", file=sys.stderr) + return 1 + + current = get_current_description(skill_dir) + if current == best: + print(f" {skill_name}: description already matches best") + return 0 + + replace_description(skill_dir, best) + return 0 + + +def main(): + parser = argparse.ArgumentParser(description="Apply eval results to SKILL.md") + sub = parser.add_subparsers(dest="action", required=True) + + p_diff = sub.add_parser("diff", help="Show diff between current and best") + p_diff.add_argument("--skill", "-s", required=True) + + p_apply = sub.add_parser("apply", help="Apply best description") + p_apply.add_argument("--skill", "-s", required=True) + + args = parser.parse_args() + if args.action == "diff": + return cmd_diff(args) + elif args.action == "apply": + return cmd_apply(args) + return 1 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/eval/baseline.py b/eval/baseline.py new file mode 100644 index 0000000..2cfec0d --- /dev/null +++ b/eval/baseline.py @@ -0,0 +1,47 @@ 
+"""Baseline management for eval results. + +Save current results as baselines and compare against them. +""" +from __future__ import annotations + +import json +import shutil +import sys +from pathlib import Path + +from .results import DEVPACE_ROOT, EVAL_DATA_DIR + + +def save_baseline(skill_name: str) -> int: + """Copy latest.json to baseline.json for a skill. Returns exit code.""" + rdir = EVAL_DATA_DIR / skill_name / "results" + latest = rdir / "latest.json" + baseline = rdir / "baseline.json" + + if not latest.exists(): + print(f"Error: no latest.json for {skill_name}", file=sys.stderr) + return 1 + + shutil.copy2(latest, baseline) + print(f" baseline saved: {baseline.relative_to(DEVPACE_ROOT)}") + return 0 + + +def diff_baseline(skill_name: str) -> int: + """Show diff between baseline and latest for a skill. Returns exit code.""" + rdir = EVAL_DATA_DIR / skill_name / "results" + baseline = rdir / "baseline.json" + latest = rdir / "latest.json" + + if not baseline.exists() or not latest.exists(): + print(f"Missing baseline or latest for {skill_name}", file=sys.stderr) + return 1 + + bl = json.loads(baseline.read_text()) + lt = json.loads(latest.read_text()) + bl_s, lt_s = bl["summary"], lt["summary"] + print( + f" {skill_name}: baseline {bl_s['passed']}/{bl_s['total']}" + f" -> latest {lt_s['passed']}/{lt_s['total']}" + ) + return 0 diff --git a/eval/cli.py b/eval/cli.py new file mode 100644 index 0000000..42b4936 --- /dev/null +++ b/eval/cli.py @@ -0,0 +1,176 @@ +"""Unified CLI entry point for devpace eval toolkit. 
+ +Usage: + python3 -m eval trigger --skill pace-dev [--runs N] [--timeout T] [--max-turns M] + python3 -m eval loop --skill pace-dev --model MODEL [--iterations N] + python3 -m eval regress [--threshold 0.1] + python3 -m eval baseline save|diff --skill pace-dev + python3 -m eval changed [--base origin/main] +""" +from __future__ import annotations + +import argparse +import json +import sys +from pathlib import Path + +from .baseline import diff_baseline, save_baseline +from .regress import detect_changed_skills, run_regress +from .results import DEVPACE_ROOT, EVAL_DATA_DIR, SKILLS_DIR, build_metadata, save_trigger_results +from .skill_io import read_description +from .trigger import DEFAULT_MAX_TURNS, DEFAULT_RUNS, DEFAULT_TIMEOUT, run_eval_set + + +def cmd_trigger(args: argparse.Namespace) -> int: + """Run trigger evaluation.""" + skill_name = args.skill + skill_dir = SKILLS_DIR / skill_name + eval_file = EVAL_DATA_DIR / skill_name / "trigger-evals.json" + + if not skill_dir.is_dir(): + print(f"Error: skill directory not found: {skill_dir}", file=sys.stderr) + return 1 + if not eval_file.exists(): + print(f"Error: eval file not found: {eval_file}", file=sys.stderr) + return 1 + + eval_set = json.loads(eval_file.read_text()) + if getattr(args, "smoke", False): + pos = [e for e in eval_set if e.get("should_trigger")] + neg = [e for e in eval_set if not e.get("should_trigger")] + n = getattr(args, "smoke_n", 5) + eval_set = pos[:3] + neg[:max(n - 3, 2)] + + description = read_description(skill_dir) + timeout = args.timeout + runs = args.runs + max_turns = args.max_turns + + print(f" skill: {skill_name}", file=sys.stderr) + print(f" timeout: {timeout}s, runs: {runs}, max_turns: {max_turns}, queries: {len(eval_set)}", file=sys.stderr) + + raw = run_eval_set( + eval_set=eval_set, skill_name=skill_name, description=description, + num_workers=min(10, len(eval_set)), timeout=timeout, + project_root=str(DEVPACE_ROOT), + runs_per_query=runs, model=getattr(args, "model", 
None), + max_turns=max_turns, + ) + + metadata = build_metadata( + model=getattr(args, "model", None), + max_turns=max_turns, + timeout=timeout, + runs_per_query=runs, + duration_seconds=raw.get("duration_seconds"), + ) + latest = save_trigger_results(skill_name, raw, metadata=metadata) + + s = raw["summary"] + print(f" results: {latest.relative_to(DEVPACE_ROOT)}", file=sys.stderr) + print(f"\n {skill_name}: {s['passed']}/{s['total']} passed", file=sys.stderr) + print(json.dumps(raw, indent=2, ensure_ascii=False)) + return 0 if s["failed"] == 0 else 1 + + +def cmd_loop(args: argparse.Namespace) -> int: + """Run description optimization loop.""" + from .loop import run_loop + return run_loop( + skill_name=args.skill, + model=args.model, + iterations=args.iterations, + timeout=args.timeout, + runs=args.runs, + max_turns=args.max_turns, + holdout=args.holdout, + seed=args.seed, + ) + + +def cmd_regress(args: argparse.Namespace) -> int: + """Run regression check.""" + return run_regress(threshold=args.threshold) + + +def cmd_baseline(args: argparse.Namespace) -> int: + """Manage baselines.""" + if args.baseline_action == "save": + return save_baseline(args.skill) + if args.baseline_action == "diff": + return diff_baseline(args.skill) + return 1 + + +def cmd_changed(args: argparse.Namespace) -> int: + """Show changed skills relative to a git ref.""" + changed = detect_changed_skills(args.base) + if not changed: + print(" no skill changes detected") + return 0 + for s in changed: + print(f" changed: {s}") + return 0 + + +def build_parser() -> argparse.ArgumentParser: + """Build the argument parser.""" + p = argparse.ArgumentParser( + prog="python3 -m eval", + description="devpace skill evaluation toolkit", + ) + sub = p.add_subparsers(dest="command", required=True) + + # trigger + t = sub.add_parser("trigger", help="Run trigger evaluation") + t.add_argument("--skill", "-s", required=True) + t.add_argument("--runs", "-r", type=int, default=DEFAULT_RUNS) + 
t.add_argument("--timeout", "-t", type=int, default=DEFAULT_TIMEOUT) + t.add_argument("--max-turns", type=int, default=DEFAULT_MAX_TURNS) + t.add_argument("--model", "-m") + t.add_argument("--smoke", action="store_true") + t.add_argument("--smoke-n", type=int, default=5) + + # loop + lo = sub.add_parser("loop", help="Run description optimization loop") + lo.add_argument("--skill", "-s", required=True) + lo.add_argument("--model", "-m", required=True) + lo.add_argument("--iterations", "-n", type=int, default=5) + lo.add_argument("--runs", "-r", type=int, default=DEFAULT_RUNS) + lo.add_argument("--timeout", "-t", type=int, default=DEFAULT_TIMEOUT) + lo.add_argument("--max-turns", type=int, default=DEFAULT_MAX_TURNS) + lo.add_argument("--holdout", type=float, default=0.3) + lo.add_argument("--seed", type=int, default=42) + + # regress + r = sub.add_parser("regress", help="Check for regressions") + r.add_argument("--threshold", type=float, default=0.1) + + # baseline + b = sub.add_parser("baseline", help="Manage baselines") + b.add_argument("baseline_action", choices=["save", "diff"]) + b.add_argument("--skill", "-s", required=True) + + # changed + c = sub.add_parser("changed", help="Detect changed skills") + c.add_argument("--base", default="origin/main") + + return p + + +def main(argv: list[str] | None = None) -> int: + """CLI entry point.""" + parser = build_parser() + args = parser.parse_args(argv) + handlers = { + "trigger": cmd_trigger, + "loop": cmd_loop, + "regress": cmd_regress, + "baseline": cmd_baseline, + "changed": cmd_changed, + } + return handlers[args.command](args) + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/eval/eval-runner.sh b/eval/eval-runner.sh new file mode 100755 index 0000000..834e32e --- /dev/null +++ b/eval/eval-runner.sh @@ -0,0 +1,79 @@ +#!/bin/bash +# Routes behavioral eval commands through skill-creator. +# +# The shim (eval/shim.py) has been replaced by the modular eval package. 
+# Trigger/loop/regress/baseline are now handled by: python3 -m eval +# +# This script is retained ONLY for behavioral eval routing (evals.json). +# +# Usage: +# bash eval/eval-runner.sh eval --skill skills/pace-dev --evals tests/evaluation/pace-dev/evals.json + +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" +PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)" + +# Discover skill-creator Python scripts directory from plugin cache. +find_sc_scripts_root() { + local sc_base="$HOME/.claude/plugins/cache/claude-plugins-official/skill-creator" + [ -d "$sc_base" ] || return 1 + local hash_dir + hash_dir=$(ls "$sc_base" | head -1) + [ -n "$hash_dir" ] || return 1 + local scripts_root="$sc_base/$hash_dir/skills/skill-creator" + [ -f "$scripts_root/scripts/run_eval.py" ] && echo "$scripts_root" && return 0 + return 1 +} + +subcmd="${1:-}" + +case "$subcmd" in + eval|eval-behavior) + shift + local_skill_path="" evals_path="" + while [ $# -gt 0 ]; do + case "$1" in + --skill) local_skill_path="$2"; shift 2 ;; + --evals) evals_path="$2"; shift 2 ;; + *) shift ;; + esac + done + if [ -z "$local_skill_path" ] || [ -z "$evals_path" ]; then + echo "Error: eval requires --skill and --evals" >&2 + exit 1 + fi + sc_root=$(find_sc_scripts_root) || { + echo "Error: skill-creator scripts not found. Behavioral eval requires skill-creator plugin." 
>&2 + echo " Install via: claude /install skill-creator" >&2 + exit 1 + } + [[ "$local_skill_path" != /* ]] && local_skill_path="$PROJECT_ROOT/$local_skill_path" + [[ "$evals_path" != /* ]] && evals_path="$PROJECT_ROOT/$evals_path" + echo " (using skill-creator for behavioral eval)" + PYTHONPATH="$sc_root:${PYTHONPATH:-}" exec python3 -m scripts.run_eval \ + --eval-set "$evals_path" \ + --skill-path "$local_skill_path" \ + --verbose \ + --num-workers 10 \ + --timeout 90 \ + --runs-per-query 1 + ;; + eval-trigger*|eval-loop|regress|baseline|changed) + # Redirect to new modular eval package + echo " Note: '$subcmd' is now handled by: python3 -m eval" >&2 + echo " Redirecting..." >&2 + exec python3 -m eval "$@" + ;; + *) + echo "Usage: bash eval/eval-runner.sh eval --skill --evals " + echo "" + echo "For trigger/loop/regress/baseline, use:" + echo " python3 -m eval trigger --skill [--runs N]" + echo " python3 -m eval loop --skill --model " + echo " python3 -m eval regress" + echo " python3 -m eval baseline save|diff --skill " + echo " python3 -m eval changed [--base origin/main]" + exit 1 + ;; +esac diff --git a/eval/improve.py b/eval/improve.py new file mode 100644 index 0000000..587a8a3 --- /dev/null +++ b/eval/improve.py @@ -0,0 +1,187 @@ +"""Anthropic API-based description generation (P1.2). 
+ +Replaces the old `claude -p` subprocess approach with direct Anthropic API +calls, gaining: +- Extended thinking (budget_tokens=10000) for better reasoning +- Full SKILL.md content context (not just description) +- History of previous attempts to prevent repetition +- Hard limit of 1024 characters on generated descriptions +- Support for ANTHROPIC_API_KEY or AWS Bedrock via environment +""" +from __future__ import annotations + +import json +import os +import sys + +MAX_DESCRIPTION_LENGTH = 1024 + + +def _get_client(): + """Get an Anthropic client, supporting both direct API and AWS Bedrock.""" + try: + import anthropic + except ImportError: + print( + "Error: anthropic package not installed. " + "Run: pip install 'anthropic>=0.50.0'", + file=sys.stderr, + ) + return None + + # Try AWS Bedrock first if configured + if os.environ.get("AWS_REGION") and not os.environ.get("ANTHROPIC_API_KEY"): + try: + return anthropic.AnthropicBedrock() + except Exception: + pass + + # Fall back to direct API + api_key = os.environ.get("ANTHROPIC_API_KEY") + if not api_key: + print( + "Error: ANTHROPIC_API_KEY not set and AWS Bedrock not configured.", + file=sys.stderr, + ) + return None + + return anthropic.Anthropic(api_key=api_key) + + +def _build_prompt( + skill_name: str, + current_desc: str, + skill_md_content: str, + eval_results: dict, + history: list[dict] | None = None, +) -> str: + """Build the improvement prompt with full context.""" + false_negatives = [ + r for r in eval_results.get("results", []) + if r.get("should_trigger") and not r.get("pass") + ] + false_positives = [ + r for r in eval_results.get("results", []) + if not r.get("should_trigger") and not r.get("pass") + ] + + fn_queries = json.dumps( + [r["query"] for r in false_negatives], ensure_ascii=False + ) + fp_queries = json.dumps( + [r["query"] for r in false_positives], ensure_ascii=False + ) + + # Build history context + history_text = "" + if history: + prev_attempts = [] + for h in history[-5:]: # 
last 5 attempts + score = h.get("score", "?") + desc = h.get("description", "?")[:200] + prev_attempts.append(f" - score={score}: {desc}") + if prev_attempts: + history_text = ( + "\n\nPrevious attempts (do NOT repeat these):\n" + + "\n".join(prev_attempts) + ) + + return f"""You are optimizing a Claude Code skill description for auto-triggering accuracy. + +SKILL NAME: {skill_name} + +CURRENT DESCRIPTION: +{current_desc} + +FULL SKILL.MD CONTENT (for context on what this skill does): +{skill_md_content[:3000]} + +FALSE NEGATIVES (should trigger but didn't): +{fn_queries} + +FALSE POSITIVES (shouldn't trigger but did): +{fp_queries} +{history_text} + +Generate an improved description that: +1. Better captures the missed positive cases (false negatives) +2. Better excludes the wrongly matched negative cases (false positives) +3. Starts with "Use when" +4. Includes specific trigger keywords in quotes (Chinese and English) +5. Includes NOT-for exclusions to prevent false positives +6. Does NOT exceed {MAX_DESCRIPTION_LENGTH} characters +7. Does NOT describe what the skill does (no internal steps) +8. Only describes WHEN it should trigger + +Output ONLY the improved description text. No markdown, no quotes around it, no explanation.""" + + +def generate_improved_description( + skill_name: str, + current_desc: str, + skill_md_content: str, + eval_results: dict, + model: str = "claude-sonnet-4-20250514", + history: list[dict] | None = None, +) -> str | None: + """Generate an improved skill description using Anthropic API. + + Returns the improved description string, or None on failure. 
+ """ + client = _get_client() + if client is None: + return None + + prompt = _build_prompt( + skill_name, current_desc, skill_md_content, eval_results, history + ) + + try: + # Use extended thinking; max_tokens must be greater than budget_tokens + response = client.messages.create( + model=model, + max_tokens=16000, + thinking={ + "type": "enabled", + "budget_tokens": 10000, + }, + messages=[{"role": "user", "content": prompt}], + ) + + # Extract text from response (skip thinking blocks) + desc = "" + for block in response.content: + if block.type == "text": + desc = block.text.strip() + break + + if not desc: + print(" generation returned empty output", file=sys.stderr) + return None + + # Clean up common wrapping artifacts + if desc.startswith('"') and desc.endswith('"'): + desc = desc[1:-1] + if desc.startswith("```") and desc.endswith("```"): + desc = desc[3:-3].strip() + if desc.startswith("`") and desc.endswith("`"): + desc = desc[1:-1] + + # Hard limit enforcement + if len(desc) > MAX_DESCRIPTION_LENGTH: + print( + f" generated description too long ({len(desc)} chars), " + f"truncating to {MAX_DESCRIPTION_LENGTH}", + file=sys.stderr, + ) + # Truncate at last space before limit + desc = desc[:MAX_DESCRIPTION_LENGTH] + last_space = desc.rfind(" ") + if last_space > MAX_DESCRIPTION_LENGTH * 0.8: + desc = desc[:last_space] + + return desc + + except Exception as e: + print(f" generation error: {e}", file=sys.stderr) + return None diff --git a/eval/loop.py b/eval/loop.py new file mode 100644 index 0000000..153b289 --- /dev/null +++ b/eval/loop.py @@ -0,0 +1,271 @@ +"""Description optimization loop with train/test split (P1.3). + +Iteratively improves a skill's description by: +1. Splitting eval queries into train (70%) and test (30%) sets +2. Generating improved descriptions via Anthropic API +3. Evaluating candidates on the train set +4. 
Validating the best candidate on the test set to detect overfitting +""" +from __future__ import annotations + +import json +import random +import sys +import time +from datetime import datetime, timezone +from pathlib import Path + +from .improve import generate_improved_description +from .results import DEVPACE_ROOT, EVAL_DATA_DIR, SKILLS_DIR, eval_score, results_dir_for +from .skill_io import read_description, read_skill_md, replace_description +from .trigger import DEFAULT_MAX_TURNS, DEFAULT_RUNS, DEFAULT_TIMEOUT, run_eval_set + + +def _split_train_test( + eval_set: list[dict], + holdout: float = 0.3, + seed: int | None = None, +) -> tuple[list[dict], list[dict]]: + """Split eval set into train and test sets. + + Maintains proportion of positive/negative queries in both sets. + """ + if seed is not None: + rng = random.Random(seed) + else: + rng = random.Random() + + pos = [e for e in eval_set if e.get("should_trigger")] + neg = [e for e in eval_set if not e.get("should_trigger")] + + def _split(items: list[dict]) -> tuple[list[dict], list[dict]]: + shuffled = items[:] + rng.shuffle(shuffled) + n_test = max(1, round(len(shuffled) * holdout)) + return shuffled[n_test:], shuffled[:n_test] + + train_pos, test_pos = _split(pos) + train_neg, test_neg = _split(neg) + + return train_pos + train_neg, test_pos + test_neg + + +def run_loop( + skill_name: str, + model: str, + iterations: int = 5, + timeout: int = DEFAULT_TIMEOUT, + runs: int = DEFAULT_RUNS, + max_turns: int = DEFAULT_MAX_TURNS, + holdout: float = 0.3, + seed: int | None = 42, +) -> int: + """Run description optimization loop with train/test split. + + Returns exit code (0 = success). 
+ """ + skill_dir = SKILLS_DIR / skill_name + skill_md = skill_dir / "SKILL.md" + eval_file = EVAL_DATA_DIR / skill_name / "trigger-evals.json" + + if not skill_dir.is_dir(): + print(f"Error: skill directory not found: {skill_dir}", file=sys.stderr) + return 1 + if not eval_file.exists(): + print(f"Error: eval file not found: {eval_file}", file=sys.stderr) + return 1 + + eval_set = json.loads(eval_file.read_text()) + rdir = results_dir_for(skill_name) + project_root = str(DEVPACE_ROOT) + + # Train/test split + train_set, test_set = _split_train_test(eval_set, holdout=holdout, seed=seed) + print(f" skill: {skill_name}, iterations: {iterations}", file=sys.stderr) + print( + f" split: {len(train_set)} train / {len(test_set)} test " + f"(holdout={holdout})", + file=sys.stderr, + ) + + best_desc = read_description(skill_dir) + skill_md_content = read_skill_md(skill_dir) + print( + f" initial description ({len(best_desc)} chars): " + f"{best_desc[:80]}...", + file=sys.stderr, + ) + + # --- Initial eval on train set --- + print(f"\n [0/{iterations}] evaluating current description on train set...", file=sys.stderr) + best_results = run_eval_set( + eval_set=train_set, skill_name=skill_name, description=best_desc, + num_workers=min(5, len(train_set)), timeout=timeout, + project_root=project_root, runs_per_query=runs, model=model, + max_turns=max_turns, verbose=False, + ) + best_score = eval_score(best_results) + print(f" [0/{iterations}] train_score: {best_score:.0%}", file=sys.stderr) + + history: list[dict] = [ + {"iteration": 0, "description": best_desc, "train_score": best_score} + ] + + if best_score >= 1.0: + print(" perfect train score, nothing to optimize", file=sys.stderr) + else: + for i in range(1, iterations + 1): + print( + f"\n [{i}/{iterations}] generating improved description...", + file=sys.stderr, + ) + candidate = generate_improved_description( + skill_name=skill_name, + current_desc=best_desc, + skill_md_content=skill_md_content, + 
eval_results=best_results, + model=model, + history=history, + ) + + if candidate is None or candidate == best_desc: + print( + f" [{i}/{iterations}] no improvement generated, skipping", + file=sys.stderr, + ) + history.append({ + "iteration": i, + "description": best_desc, + "train_score": best_score, + "skipped": True, + }) + continue + + print( + f" [{i}/{iterations}] candidate ({len(candidate)} chars): " + f"{candidate[:80]}...", + file=sys.stderr, + ) + + # Temporarily swap description for eval + original_content = replace_description(skill_md, candidate) + try: + print( + f" [{i}/{iterations}] evaluating candidate on train set...", + file=sys.stderr, + ) + candidate_results = run_eval_set( + eval_set=train_set, + skill_name=skill_name, + description=candidate, + num_workers=min(5, len(train_set)), + timeout=timeout, + project_root=project_root, + runs_per_query=runs, + model=model, + max_turns=max_turns, + verbose=False, + ) + finally: + # Always restore original SKILL.md + skill_md.write_text(original_content) + + candidate_score = eval_score(candidate_results) + print( + f" [{i}/{iterations}] train_score: {candidate_score:.0%} " + f"(best: {best_score:.0%})", + file=sys.stderr, + ) + + if candidate_score > best_score: + print( + f" [{i}/{iterations}] improved! 
" + f"{best_score:.0%} -> {candidate_score:.0%}", + file=sys.stderr, + ) + best_desc = candidate + best_score = candidate_score + best_results = candidate_results + + history.append({ + "iteration": i, + "description": candidate, + "train_score": candidate_score, + }) + + if best_score >= 1.0: + print( + f" [{i}/{iterations}] perfect train score, stopping early", + file=sys.stderr, + ) + break + + # --- Validate on test set --- + test_score = None + if test_set: + print(f"\n validating best description on test set...", file=sys.stderr) + original_content = replace_description(skill_md, best_desc) + try: + test_results = run_eval_set( + eval_set=test_set, + skill_name=skill_name, + description=best_desc, + num_workers=min(5, len(test_set)), + timeout=timeout, + project_root=project_root, + runs_per_query=runs, + model=model, + max_turns=max_turns, + verbose=False, + ) + finally: + skill_md.write_text(original_content) + + test_score = eval_score(test_results) + print(f" test_score: {test_score:.0%} (train: {best_score:.0%})", file=sys.stderr) + + # Overfitting detection + gap = best_score - test_score + if gap > 0.2: + print( + f" WARNING: possible overfitting " + f"(train-test gap: {gap:.0%})", + file=sys.stderr, + ) + + # --- Save results --- + loop_results = { + "skill": skill_name, + "timestamp": datetime.now(timezone.utc).isoformat(), + "best_description": best_desc, + "best_train_score": best_score, + "test_score": test_score, + "holdout": holdout, + "train_size": len(train_set), + "test_size": len(test_set), + "iterations": len(history) - 1, + "history": history, + } + + loop_dir = rdir / "loop" + (loop_dir / "results.json").write_text( + json.dumps(loop_results, indent=2, ensure_ascii=False) + ) + (loop_dir / "best-description.txt").write_text(best_desc) + + original_desc = read_description(skill_dir) + print(f"\n loop complete: train={best_score:.0%}", file=sys.stderr) + if test_score is not None: + print(f" test={test_score:.0%}", file=sys.stderr) + if 
best_desc != original_desc: + print( + f" improved description saved to: " + f"{loop_dir.relative_to(DEVPACE_ROOT)}/best-description.txt", + file=sys.stderr, + ) + print(f" apply with: make eval-fix-apply S={skill_name}", file=sys.stderr) + else: + print(" no improvement found over current description", file=sys.stderr) + + print(json.dumps(loop_results, indent=2, ensure_ascii=False)) + return 0 diff --git a/eval/regress.py b/eval/regress.py new file mode 100644 index 0000000..b1a7acd --- /dev/null +++ b/eval/regress.py @@ -0,0 +1,195 @@ +"""Multi-dimensional regression detection (P3.1). + +Compares baseline vs latest results across multiple metrics: +- Positive trigger rate drop +- False positive increase +- False negative increase +- Overall pass rate drop + +Also includes change detection (P3.2) for selective eval. +""" +from __future__ import annotations + +import json +import subprocess +import sys +from pathlib import Path + +from .results import DEVPACE_ROOT, EVAL_DATA_DIR + +# Regression thresholds +THRESHOLDS = { + "positive_trigger_rate_drop": {"warning": 0.10, "failure": 0.20}, + "false_positive_increase": {"warning": 1, "failure": 1}, # any new false positive is warning+failure + "false_negative_increase": {"warning": 2, "failure": 4}, + "overall_pass_rate_drop": {"warning": 0.05, "failure": 0.15}, +} + + +def _compute_metrics(baseline: dict, latest: dict) -> dict: + """Compute regression metrics between baseline and latest.""" + bl_s, lt_s = baseline["summary"], latest["summary"] + + bl_total = max(bl_s.get("total", 0), 1) + lt_total = max(lt_s.get("total", 0), 1) + + bl_rate = bl_s.get("passed", 0) / bl_total + lt_rate = lt_s.get("passed", 0) / lt_total + + # Positive trigger rate + bl_pos = baseline.get("positive", {}) + lt_pos = latest.get("positive", {}) + bl_pos_total = max(bl_pos.get("total", 0), 1) + lt_pos_total = max(lt_pos.get("total", 0), 1) + bl_pos_rate = bl_pos.get("passed", 0) / bl_pos_total + lt_pos_rate = lt_pos.get("passed", 0) / 
lt_pos_total + + # False negatives and false positives + bl_fn = len(baseline.get("false_negatives", [])) + lt_fn = len(latest.get("false_negatives", [])) + bl_fp = len(baseline.get("false_positives", [])) + lt_fp = len(latest.get("false_positives", [])) + + return { + "positive_trigger_rate_drop": round(bl_pos_rate - lt_pos_rate, 4), + "false_positive_increase": lt_fp - bl_fp, + "false_negative_increase": lt_fn - bl_fn, + "overall_pass_rate_drop": round(bl_rate - lt_rate, 4), + "baseline_rate": bl_rate, + "latest_rate": lt_rate, + "baseline_pos_rate": bl_pos_rate, + "latest_pos_rate": lt_pos_rate, + } + + +def _classify(metric_name: str, value: float | int) -> str: + """Classify a metric value as OK/WARNING/FAILURE.""" + t = THRESHOLDS.get(metric_name, {}) + failure = t.get("failure", float("inf")) + warning = t.get("warning", float("inf")) + + if value >= failure: + return "FAILURE" + if value >= warning: + return "WARNING" + return "OK" + + +def run_regress(threshold: float = 0.1) -> int: + """Check for regressions across all skills with baselines. + + Returns exit code: 0 if no failures, 1 if any FAILURE-level regression. 
+ """ + any_fail = False + report: dict = {"skills": {}, "overall": "OK"} + + for d in sorted(EVAL_DATA_DIR.iterdir()): + if d.name.startswith("_") or not d.is_dir(): + continue + + bl_path = d / "results" / "baseline.json" + lt_path = d / "results" / "latest.json" + if not bl_path.exists() or not lt_path.exists(): + continue + + bl = json.loads(bl_path.read_text()) + lt = json.loads(lt_path.read_text()) + metrics = _compute_metrics(bl, lt) + + # Classify each metric + classifications = {} + skill_fail = False + skill_warn = False + + for metric_name in THRESHOLDS: + val = metrics.get(metric_name, 0) + cls = _classify(metric_name, val) + classifications[metric_name] = {"value": val, "level": cls} + if cls == "FAILURE": + skill_fail = True + elif cls == "WARNING": + skill_warn = True + + # Determine per-query regressions + bl_results = {r["query"]: r for r in bl.get("raw_results", [])} + lt_results = {r["query"]: r for r in lt.get("raw_results", [])} + new_failures = [] + for q, lt_r in lt_results.items(): + bl_r = bl_results.get(q) + if bl_r and bl_r.get("pass") and not lt_r.get("pass"): + new_failures.append(q) + + skill_status = "FAILURE" if skill_fail else ("WARNING" if skill_warn else "OK") + report["skills"][d.name] = { + "status": skill_status, + "metrics": classifications, + "baseline_rate": f"{metrics['baseline_rate']:.0%}", + "latest_rate": f"{metrics['latest_rate']:.0%}", + "new_failures": new_failures, + } + + if skill_fail: + any_fail = True + print(f" FAILURE {d.name}: {metrics['baseline_rate']:.0%} -> {metrics['latest_rate']:.0%}") + for mn, mc in classifications.items(): + if mc["level"] != "OK": + print(f" {mc['level']} {mn}: {mc['value']}") + elif skill_warn: + print(f" WARNING {d.name}: {metrics['baseline_rate']:.0%} -> {metrics['latest_rate']:.0%}") + else: + print(f" OK {d.name}: {metrics['baseline_rate']:.0%} -> {metrics['latest_rate']:.0%}") + + report["overall"] = "FAILURE" if any_fail else "OK" + + # Write report + report_dir = EVAL_DATA_DIR 
/ "regress" + report_dir.mkdir(parents=True, exist_ok=True) + report_path = report_dir / "latest-report.json" + report_path.write_text(json.dumps(report, indent=2, ensure_ascii=False)) + print(f"\n report: {report_path.relative_to(DEVPACE_ROOT)}") + + return 1 if any_fail else 0 + + +def detect_changed_skills(base_ref: str = "origin/main") -> list[str]: + """Detect skills changed relative to base_ref using git diff (P3.2). + + Returns list of skill names that have changes. + """ + try: + result = subprocess.run( + ["git", "diff", "--name-only", base_ref, "--", "skills/"], + capture_output=True, text=True, timeout=10, + cwd=str(DEVPACE_ROOT), + ) + if result.returncode != 0: + return [] + + changed = set() + for line in result.stdout.strip().split("\n"): + if not line: + continue + parts = line.split("/") + if len(parts) >= 2 and parts[0] == "skills": + changed.add(parts[1]) + return sorted(changed) + except Exception: + return [] + + +def get_sibling_skills(skill_name: str) -> list[str]: + """Get 'sibling' skills that might be affected by a skill's changes. + + For example, pace-dev changes should also check pace-change (similar triggers). + """ + siblings_map = { + "pace-dev": ["pace-change", "pace-test"], + "pace-change": ["pace-dev", "pace-biz"], + "pace-review": ["pace-test", "pace-guard"], + "pace-test": ["pace-review", "pace-dev"], + "pace-status": ["pace-next", "pace-pulse"], + "pace-next": ["pace-status"], + "pace-init": ["pace-biz"], + "pace-biz": ["pace-init", "pace-change"], + } + return siblings_map.get(skill_name, []) diff --git a/eval/results.py b/eval/results.py new file mode 100644 index 0000000..70ac9ab --- /dev/null +++ b/eval/results.py @@ -0,0 +1,137 @@ +"""Evaluation results persistence and data model. + +Handles saving, loading, and structuring trigger evaluation results +with enhanced metadata (P1.4: model, sdk_options, environment, duration). 
+""" +from __future__ import annotations + +import json +import platform +import sys +from datetime import datetime, timezone +from pathlib import Path + +from . import __version__ +from .skill_io import description_hash + +# Default paths +DEVPACE_ROOT = Path(__file__).resolve().parent.parent +EVAL_DATA_DIR = DEVPACE_ROOT / "tests" / "evaluation" +SKILLS_DIR = DEVPACE_ROOT / "skills" + + +def results_dir_for(skill_name: str) -> Path: + """Get or create the results directory for a skill.""" + d = EVAL_DATA_DIR / skill_name / "results" + d.mkdir(parents=True, exist_ok=True) + (d / "history").mkdir(exist_ok=True) + (d / "loop").mkdir(exist_ok=True) + return d + + +def _get_sdk_version() -> str: + """Get claude_agent_sdk version if available.""" + try: + from importlib.metadata import version + return version("claude-agent-sdk") + except Exception: + return "unknown" + + +def build_metadata( + *, + model: str | None = None, + max_turns: int = 5, + timeout: int = 90, + runs_per_query: int = 1, + duration_seconds: float | None = None, +) -> dict: + """Build enhanced metadata dict (P1.4).""" + meta: dict = {} + if model: + meta["model"] = model + meta["sdk_options"] = { + "max_turns": max_turns, + "timeout": timeout, + "runs_per_query": runs_per_query, + } + meta["environment"] = { + "python": platform.python_version(), + "sdk": _get_sdk_version(), + "eval_version": __version__, + "platform": sys.platform, + } + if duration_seconds is not None: + meta["duration_seconds"] = round(duration_seconds, 1) + return meta + + +def save_trigger_results( + skill_name: str, + raw: dict, + *, + metadata: dict | None = None, +) -> Path: + """Save trigger evaluation results to disk. + + Writes both latest.json and a timestamped history file. + Returns path to latest.json. 
+ """ + rdir = results_dir_for(skill_name) + res = raw.get("results", []) + pos = [r for r in res if r.get("should_trigger")] + neg = [r for r in res if not r.get("should_trigger")] + + structured = { + "skill": skill_name, + "timestamp": datetime.now(timezone.utc).isoformat(), + "description_hash": description_hash(SKILLS_DIR / skill_name), + "summary": raw.get("summary", {}), + "positive": { + "total": len(pos), + "passed": sum(1 for r in pos if r["pass"]), + "failed": sum(1 for r in pos if not r["pass"]), + }, + "negative": { + "total": len(neg), + "passed": sum(1 for r in neg if r["pass"]), + "failed": sum(1 for r in neg if not r["pass"]), + }, + "false_negatives": [ + {"id": i, "query": r["query"]} + for i, r in enumerate(pos) if not r["pass"] + ], + "false_positives": [ + {"id": i, "query": r["query"]} + for i, r in enumerate(neg) if not r["pass"] + ], + "runs_per_query": res[0].get("runs", 1) if res else 1, + "raw_results": res, + } + + if metadata: + structured["metadata"] = metadata + + latest = rdir / "latest.json" + latest.write_text(json.dumps(structured, indent=2, ensure_ascii=False)) + ts = datetime.now().strftime("%Y-%m-%dT%H-%M") + (rdir / "history" / f"{ts}.json").write_text( + json.dumps(structured, indent=2, ensure_ascii=False) + ) + return latest + + +def load_results(skill_name: str, which: str = "latest") -> dict | None: + """Load results for a skill. which='latest' or 'baseline'.""" + rdir = EVAL_DATA_DIR / skill_name / "results" + path = rdir / f"{which}.json" + if not path.exists(): + return None + return json.loads(path.read_text()) + + +def eval_score(results: dict) -> float: + """Compute a single score from eval results (0.0 - 1.0).""" + s = results.get("summary", {}) + total = s.get("total", 0) + return s.get("passed", 0) / max(total, 1) diff --git a/eval/shim.py b/eval/shim.py new file mode 100644 index 0000000..2623cda --- /dev/null +++ b/eval/shim.py @@ -0,0 +1,46 @@ +#!/usr/bin/env python3 +"""Eval shim: compatibility wrapper. 
+ +This file previously contained the full eval implementation (617 lines). +It now delegates to the modular eval package while maintaining backward +compatibility for existing Makefile targets and scripts. + +Usage (unchanged): + python3 eval/shim.py trigger --skill pace-dev [--runs N] [--timeout T] + python3 eval/shim.py loop --skill pace-dev --model MODEL [--iterations N] + python3 eval/shim.py regress [--threshold 0.1] + python3 eval/shim.py baseline save --skill pace-dev +""" +from __future__ import annotations + +import sys +from pathlib import Path + +# Ensure the devpace root (parent of eval/) is on sys.path so that +# `import eval` works when this file is run as `python3 eval/shim.py`. +_DEVPACE_ROOT = str(Path(__file__).resolve().parent.parent) +if _DEVPACE_ROOT not in sys.path: + sys.path.insert(0, _DEVPACE_ROOT) + +# Re-export key functions for any code that imports from shim directly +from eval.skill_io import ( # noqa: F401 + description_hash, + read_description, + replace_description as _replace_description_in_file, +) +from eval.results import ( # noqa: F401 + eval_score as _eval_score, + results_dir_for, + save_trigger_results, +) +from eval.trigger import ( # noqa: F401 + run_eval_set as _run_eval_set, + run_single_query as _run_single_query_sdk, +) +from eval.baseline import save_baseline, diff_baseline # noqa: F401 +from eval.regress import run_regress # noqa: F401 +from eval.cli import main + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/eval/skill_io.py b/eval/skill_io.py new file mode 100644 index 0000000..4c2f921 --- /dev/null +++ b/eval/skill_io.py @@ -0,0 +1,104 @@ +"""SKILL.md read/write utilities. + +Handles frontmatter parsing, description extraction, replacement, +and hashing for devpace skill files. +""" +from __future__ import annotations + +import hashlib +from pathlib import Path + + +def read_description(skill_dir: Path) -> str: + """Extract description from SKILL.md frontmatter. 
+ + Supports both inline and multi-line (folded/literal block) descriptions. + """ + content = (skill_dir / "SKILL.md").read_text() + lines = content.split("\n") + for i, line in enumerate(lines): + if line.startswith("description:"): + value = line[len("description:"):].strip() + if value in (">", "|", ">-", "|-"): + parts = [] + for cont in lines[i + 1:]: + if cont.startswith(" ") or cont.startswith("\t"): + parts.append(cont.strip()) + else: + break + return " ".join(parts) + # Only strip wrapping quotes if fully quoted + if (value.startswith('"') and value.endswith('"')) or \ + (value.startswith("'") and value.endswith("'")): + return value[1:-1] + return value + return "" + + +def read_skill_md(skill_dir: Path) -> str: + """Read full SKILL.md content.""" + return (skill_dir / "SKILL.md").read_text() + + +def description_hash(skill_dir: Path) -> str: + """SHA256 prefix (16 chars) of the current description.""" + return hashlib.sha256(read_description(skill_dir).encode()).hexdigest()[:16] + + +def _format_description_lines(desc: str) -> list[str]: + """Format a description string into YAML frontmatter lines.""" + if len(desc) <= 200: + return [f"description: {desc}"] + + new_lines = ["description: >"] + words = desc.split() + current_line = " " + for word in words: + if len(current_line) + len(word) + 1 > 80 and len(current_line) > 2: + new_lines.append(current_line) + current_line = " " + word + else: + current_line += (" " if len(current_line) > 2 else "") + word + if current_line.strip(): + new_lines.append(current_line) + return new_lines + + +def replace_description(skill_md: Path, new_desc: str) -> str: + """Replace description in SKILL.md frontmatter. + + Returns the original file content (for rollback). 
+ """ + original = skill_md.read_text() + lines = original.split("\n") + if lines[0].strip() != "---": + return original + + desc_start = None + desc_end = None + for i, line in enumerate(lines[1:], start=1): + if line.strip() == "---": + if desc_start is not None and desc_end is None: + desc_end = i + break + if line.startswith("description:"): + desc_start = i + value = line[len("description:"):].strip() + if value not in (">", "|", ">-", "|-"): + desc_end = i + 1 + continue + if desc_start is not None and desc_end is None: + if line.startswith(" ") or line.startswith("\t"): + continue + else: + desc_end = i + + if desc_start is None: + return original + if desc_end is None: + desc_end = desc_start + 1 + + new_lines = _format_description_lines(new_desc) + result_lines = lines[:desc_start] + new_lines + lines[desc_end:] + skill_md.write_text("\n".join(result_lines)) + return original diff --git a/eval/trigger.py b/eval/trigger.py new file mode 100644 index 0000000..2e7b951 --- /dev/null +++ b/eval/trigger.py @@ -0,0 +1,220 @@ +"""Agent SDK-based trigger detection for skill evaluation. + +Uses claude_agent_sdk.query() to run queries and detect whether +a target skill is triggered via ToolUseBlock inspection. + +Enhancements over original shim.py: +- P2.1: max_turns default raised from 3 to 5 +- P2.2: Structured ToolUseBlock matching via block.input.get("skill") +- P2.2: All ToolUseBlock names recorded for debugging +- P2.3: Wilson score confidence interval for statistical stability +""" +from __future__ import annotations + +import asyncio +import json +import math +import os +import sys +import time + +from .results import build_metadata, eval_score, save_trigger_results +from .skill_io import read_description + +# Remove CLAUDECODE to allow SDK to spawn claude subprocess without +# "nested session" error when running inside a Claude Code session. 
+os.environ.pop("CLAUDECODE", None) + +DEFAULT_TIMEOUT = 90 +DEFAULT_RUNS = 3 +DEFAULT_MAX_TURNS = 5 + + +def _wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]: + """Wilson score confidence interval for a proportion. + + Returns (lower, upper) bounds at the given z-level (default 95%). + """ + if total == 0: + return (0.0, 0.0) + p = successes / total + denom = 1 + z * z / total + centre = p + z * z / (2 * total) + spread = z * math.sqrt((p * (1 - p) + z * z / (4 * total)) / total) + return ( + round((centre - spread) / denom, 4), + round((centre + spread) / denom, 4), + ) + + +async def run_single_query( + query_text: str, + skill_name: str, + timeout: int, + project_root: str, + model: str | None = None, + max_turns: int = DEFAULT_MAX_TURNS, +) -> dict: + """Run one query via Agent SDK and detect if a Skill matching skill_name fires. + + Returns a dict with: + triggered: bool + tool_uses: list of all ToolUseBlock names seen (for debugging) + """ + from claude_agent_sdk import ( + AssistantMessage, + ClaudeAgentOptions, + ToolUseBlock, + query as sdk_query, + ) + + options = ClaudeAgentOptions( + cwd=project_root, + permission_mode="bypassPermissions", + max_turns=max_turns, + model=model, + plugins=[{"type": "local", "path": project_root}], + ) + + triggered = False + tool_uses: list[str] = [] + + try: + async for message in sdk_query(prompt=query_text, options=options): + if isinstance(message, AssistantMessage): + for block in message.content: + if not isinstance(block, ToolUseBlock): + continue + tool_uses.append(block.name) + if triggered: + continue + if block.name == "Skill": + # P2.2: Structured matching — check input dict first, + # fall back to JSON string search for robustness + inp = block.input if isinstance(block.input, dict) else {} + if inp.get("skill") == skill_name: + triggered = True + elif skill_name in json.dumps(inp): + triggered = True + except Exception: + pass + + return {"triggered": triggered, 
"tool_uses": tool_uses} + + +async def run_eval_set_async( + eval_set: list[dict], + skill_name: str, + num_workers: int, + timeout: int, + project_root: str, + runs_per_query: int = 1, + model: str | None = None, + max_turns: int = DEFAULT_MAX_TURNS, +) -> list[tuple[dict, dict]]: + """Run eval set concurrently. Returns (item, result_dict) pairs.""" + sem = asyncio.Semaphore(num_workers) + + async def _run(item: dict) -> tuple[dict, dict]: + async with sem: + try: + result = await asyncio.wait_for( + run_single_query( + item["query"], skill_name, timeout, + project_root, model, max_turns, + ), + timeout=timeout, + ) + return (item, result) + except asyncio.TimeoutError: + return (item, {"triggered": False, "tool_uses": []}) + except Exception: + return (item, {"triggered": False, "tool_uses": []}) + + tasks = [_run(item) for item in eval_set for _ in range(runs_per_query)] + return await asyncio.gather(*tasks) + + +def run_eval_set( + eval_set: list[dict], + skill_name: str, + description: str, + num_workers: int, + timeout: int, + project_root: str, + runs_per_query: int = 1, + trigger_threshold: float = 0.5, + model: str | None = None, + max_turns: int = DEFAULT_MAX_TURNS, + verbose: bool = True, +) -> dict: + """Run the full eval set. 
Returns results dict with per-query details.""" + start_time = time.monotonic() + + raw_results = asyncio.run( + run_eval_set_async( + eval_set=eval_set, + skill_name=skill_name, + num_workers=num_workers, + timeout=timeout, + project_root=project_root, + runs_per_query=runs_per_query, + model=model, + max_turns=max_turns, + ) + ) + + duration = time.monotonic() - start_time + + # Aggregate per-query results + query_triggers: dict[str, list[bool]] = {} + query_items: dict[str, dict] = {} + query_tool_uses: dict[str, list[list[str]]] = {} + + for item, result in raw_results: + q = item["query"] + query_items[q] = item + query_triggers.setdefault(q, []).append(result["triggered"]) + query_tool_uses.setdefault(q, []).append(result["tool_uses"]) + + results = [] + for q, triggers in query_triggers.items(): + item = query_items[q] + rate = sum(triggers) / len(triggers) + should = item["should_trigger"] + passed = (rate >= trigger_threshold) if should else (rate < trigger_threshold) + ci_lower, ci_upper = _wilson_interval(sum(triggers), len(triggers)) + + results.append({ + "query": q, + "should_trigger": should, + "trigger_rate": rate, + "triggers": sum(triggers), + "runs": len(triggers), + "pass": passed, + "confidence_interval": [ci_lower, ci_upper], + "tool_uses_seen": query_tool_uses.get(q, []), + }) + + n_pass = sum(1 for r in results if r["pass"]) + total = len(results) + + if verbose: + print(f"Results: {n_pass}/{total} passed", file=sys.stderr) + for r in results: + tag = "PASS" if r["pass"] else "FAIL" + ci = r["confidence_interval"] + print( + f" [{tag}] rate={r['triggers']}/{r['runs']}" + f" CI=[{ci[0]:.2f},{ci[1]:.2f}]" + f" expected={r['should_trigger']}: {r['query'][:70]}", + file=sys.stderr, + ) + + return { + "skill_name": skill_name, + "description": description, + "results": results, + "summary": {"total": total, "passed": n_pass, "failed": total - n_pass}, + "duration_seconds": round(duration, 1), + } diff --git a/hooks/hooks.json b/hooks/hooks.json 
index 4a019be..9316a6b 100644
--- a/hooks/hooks.json
+++ b/hooks/hooks.json
@@ -40,6 +40,7 @@
             "type": "command",
             "command": "${CLAUDE_PLUGIN_ROOT}/hooks/pulse-counter.mjs",
             "timeout": 5,
+            "async": true,
             "statusMessage": "devpace pulse counter"
           },
           {
diff --git a/hooks/intent-detect.mjs b/hooks/intent-detect.mjs
index 143fff9..2b7703e 100755
--- a/hooks/intent-detect.mjs
+++ b/hooks/intent-detect.mjs
@@ -26,6 +26,9 @@ const userPrompt = input?.content ?? '';
 // Change management trigger words
 // Synced with skills/pace-change/SKILL.md description (authority source)
 // Categories: add / pause / resume / reprioritize / modify + English variants
+// Technical context words — if the prompt is about code/git operations, skip change detection
+const techContextPattern = /注释|缩进|格式化|配置文件|代码风格|git\s|stash|commit|branch|merge|rebase|checkout/;
+
 const triggerPattern = new RegExp([
   // --- add ---
   '加一个', '加需求', '新增需求', '插入', '还需要', '补一个', '追加',
@@ -33,7 +36,7 @@ const triggerPattern = new RegExp([
   '不做了', '先不搞', '砍掉', '搁置', '放一放', '暂停', '停掉', '先放着',
   '不要这个功能了', '延后',
   // --- resume ---
-  '恢复之前', '恢复', '重新开始', '捡回来', '继续之前',
+  '恢复之前', '重新开始', '捡回来', '继续之前',
   // --- reprioritize ---
   '优先级', '先做这个', '提前', '排到前面', '优先', '调个顺序',
   // --- modify ---
@@ -44,8 +47,8 @@ const triggerPattern = new RegExp([
   '\\bdefer\\b', '\\bshelve\\b', '\\breprioritize\\b',
 ].join('|'));

-if (triggerPattern.test(userPrompt)) {
-  console.log('devpace:change-detected Change intent detected in user prompt. Follow devpace-rules.md §9 change management workflow: classify → impact analysis → confirmation → execute.');
+if (triggerPattern.test(userPrompt) && !techContextPattern.test(userPrompt)) {
+  console.log('devpace:change-detected 用户输入包含变更意图。ACTION: 调用 /pace-change 处理,不要直接修改 CR/PF。/pace-change 会自动执行分类、影响分析、确认、执行流程。');
 }

 process.exit(0);
diff --git a/hooks/lib/utils.mjs b/hooks/lib/utils.mjs
index 375a9de..387bdde 100644
--- a/hooks/lib/utils.mjs
+++ b/hooks/lib/utils.mjs
@@ -7,6 +7,21 @@ import { readFileSync, writeFileSync, mkdirSync } from 'node:fs';
 import { createInterface } from 'node:readline';
 import { dirname } from 'node:path';

+/**
+ * Canonical CR state values.
+ * All .mjs hooks should import and use these constants instead of raw strings.
+ * Shell hooks (session-end.sh, pre-compact.sh) use grep equivalents — keep in sync.
+ */
+export const CR_STATES = Object.freeze({
+  CREATED: 'created',
+  DEVELOPING: 'developing',
+  VERIFYING: 'verifying',
+  IN_REVIEW: 'in_review',
+  APPROVED: 'approved',
+  MERGED: 'merged',
+  PAUSED: 'paused',
+});
+
 /**
  * Read stdin and JSON.parse it. Returns {} on any failure.
  */
@@ -55,11 +70,15 @@ export function isCrFile(filePath, backlogDir) {
  * Read a CR markdown file and return the value of the "状态" field.
  * CR format: "- **状态**:" (Markdown bold + Chinese/ASCII colon)
  * Returns empty string if not found or file unreadable.
+ *
+ * Shell equivalents (keep in sync if format changes):
+ *   session-end.sh: grep + sed to extract state from "- **状态**:" line
+ *   pre-compact.sh: grep for active state keywords in state.md
  */
-export function readCrState(filePath) {
+export function readCrState(filePath, content) {
   try {
-    const content = readFileSync(filePath, 'utf-8');
-    const match = content.match(/^- \*\*状态\*\*[::]\s*(.+)$/m);
+    const text = content ?? readFileSync(filePath, 'utf-8');
+    const match = text.match(/^- \*\*状态\*\*[::]\s*(.+)$/m);
     return match ? match[1].trim() : '';
   } catch {
     return '';
@@ -80,14 +99,11 @@ export function isDevpaceFile(filePath) {
  * we consider the session in advance mode. Otherwise, it's explore mode (default).
  * Returns true if advance mode, false if explore mode.
  */
-export function isAdvanceMode(projectDir) {
+export function isAdvanceMode(projectDir, content) {
   try {
-    const statePath = `${projectDir}/.devpace/state.md`;
-    const content = readFileSync(statePath, 'utf-8');
-    // Look for "进行中" indicator in the "当前工作" section
-    return /\*\*进行中\*\*/.test(content);
+    const text = content ?? readFileSync(`${projectDir}/.devpace/state.md`, 'utf-8');
+    return /\*\*进行中\*\*/.test(text);
   } catch {
-    // state.md doesn't exist or unreadable → not initialized, not advance mode
     return false;
   }
 }
@@ -112,16 +128,27 @@ export function isStateChangeToApproved(content) {
   return /\*\*状态\*\*[::]\s*approved/.test(content);
 }

+/**
+ * Check if write content sets CR state to an advance-mode-only value.
+ * These states (developing, verifying, in_review) represent active progress
+ * and should not be set in explore mode — use advance mode via /pace-dev.
+ * States like created and paused are allowed in explore mode (pace-change needs them).
+ */
+export function isStateEscalation(content) {
+  if (!content) return false;
+  return /\*\*状态\*\*[::]\s*(developing|verifying|in_review)/.test(content);
+}
+
 /**
  * Read the last event from a CR file's event table.
  * Parses the structured event format: | timestamp | event_type | actor | note | handoff |
  * @param {string} crFilePath - CR file path
  * @returns {{ ts: string, type: string, actor: string, note: string } | null}
  */
-export function getLastEvent(crFilePath) {
+export function getLastEvent(crFilePath, content) {
   try {
-    const content = readFileSync(crFilePath, 'utf-8');
-    const lines = content.split('\n');
+    const text = content ?? readFileSync(crFilePath, 'utf-8');
+    const lines = text.split('\n');

     let inEventTable = false;
     let lastDataLine = null;
@@ -174,10 +201,10 @@ export function readSyncStateCache(projectDir) {
  * Update a single entry in the sync state cache.
  * Creates the cache file and .devpace/ directory if needed.
  */
-export function updateSyncStateCache(projectDir, crName, newState) {
+export function updateSyncStateCache(projectDir, crName, newState, existingCache) {
   try {
     const cachePath = `${projectDir}/.devpace/.sync-state-cache`;
-    const cache = readSyncStateCache(projectDir);
+    const cache = existingCache ?? readSyncStateCache(projectDir);
     cache.set(crName, newState);
     const lines = [];
     for (const [name, state] of cache) {
diff --git a/hooks/post-cr-update.mjs b/hooks/post-cr-update.mjs
index afc17b1..4812fcf 100755
--- a/hooks/post-cr-update.mjs
+++ b/hooks/post-cr-update.mjs
@@ -3,15 +3,15 @@
  * devpace PostToolUse hook — detect CR merged state and trigger knowledge pipeline
  *
  * Purpose: After a Write/Edit to a CR file, check if the CR transitioned to 'merged'.
- * If so, output a reminder for Claude to trigger the post-merge pipeline (§11 aligned):
- * 7-step pipeline for merged CR processing, with conditional step 7 for external sync.
+ * If so, output a signal reference for Claude to execute the §11 post-merge pipeline.
+ * Also detects gate failures and rejections as learning triggers.
  *
  * This is an advisory hook (exit 0), not blocking.
  */

-import { readFileSync, writeFileSync, existsSync, mkdirSync } from 'node:fs';
-import { basename, dirname } from 'node:path';
-import { readStdinJson, getProjectDir, extractFilePath, isCrFile, readCrState, getLastEvent } from './lib/utils.mjs';
+import { existsSync, readFileSync } from 'node:fs';
+import { basename } from 'node:path';
+import { readStdinJson, getProjectDir, extractFilePath, isCrFile, readCrState, getLastEvent, CR_STATES } from './lib/utils.mjs';

 const input = await readStdinJson();
 const projectDir = getProjectDir();
@@ -32,67 +32,30 @@ if (!isCrFile(filePath, backlogDir)) {

 // Check CR state and recent events for learning triggers
 if (existsSync(filePath)) {
-  const currentState = readCrState(filePath);
-  const crName = basename(filePath, '.md');
-
-  if (currentState === 'merged') {
-    // Build pipeline message — steps 1-6 always present (§11 aligned)
-    const steps = [
-      '1) Cascading updates (PF + project.md + state.md + iterations + Release)',
-      '2) pace-learn knowledge extraction',
-      '3) dashboard.md incremental metrics',
-      '4) PF completion → release note',
-      '5) Iteration completion check (>90% → suggest retro)',
-      '6) First-CR review (teaching dedup)',
-    ];
-
-    // Step 7: conditional — only if sync-mapping exists and CR has external link
-    const syncMappingPath = `${projectDir}/.devpace/integrations/sync-mapping.md`;
-    if (existsSync(syncMappingPath)) {
-      try {
-        const content = readFileSync(filePath, 'utf-8');
-        const hasExternalLink = /\*\*外部关联\*\*[::]/.test(content);
-        if (hasExternalLink) {
-          steps.push(`7) External sync push: auto-execute /pace-sync push ${crName}`);
-        }
-      } catch {
-        // Read error — skip step 7
-      }
-    }
+  let content;
+  try {
+    content = readFileSync(filePath, 'utf-8');
+  } catch {
+    process.exit(0);
+  }

-    console.log(`devpace:post-merge ${crName} merged. Execute post-merge pipeline: ${steps.join(' ')}`);
+  const currentState = readCrState(filePath, content);
+  const crName = basename(filePath, '.md');

-    // Write .learn-pending flag for session-start reminder
-    try {
-      const pendingPath = `${projectDir}/.devpace/.learn-pending`;
-      mkdirSync(dirname(pendingPath), { recursive: true });
-      const existing = existsSync(pendingPath) ? readFileSync(pendingPath, 'utf-8').trim() : '';
-      const entry = `${crName} ${new Date().toISOString()}`;
-      const newContent = existing ? `${existing}\n${entry}` : entry;
-      writeFileSync(pendingPath, newContent + '\n', 'utf-8');
-    } catch {
-      // Non-critical — learn-pending write failure doesn't block pipeline
-    }
+  if (currentState === CR_STATES.MERGED) {
+    console.log(`devpace:post-merge ${crName} 已 merge。ACTION: 执行 post-merge 管线——1) 更新 state.md 移除该 CR 的进行中标记 2) 若 CR 有外部关联则执行 /pace-sync push ${crName} 关闭 Issue 3) 执行 /pace-learn merge ${crName} 萃取经验。`);
   }

   // Gate fail learning trigger — gate_fail is a valuable learning opportunity
-  const recentEvent = getLastEvent(filePath);
+  const recentEvent = getLastEvent(filePath, content);
   if (recentEvent && (recentEvent.type === 'gate1_fail' || recentEvent.type === 'gate2_fail')) {
-    const gateNum = recentEvent.type.includes('1') ? '1' : '2';
-    console.log([
-      `devpace:learn-trigger ${crName} Gate ${gateNum} 失败是学习机会。`,
-      ` 建议: 调用 /pace-learn 提取 Gate ${gateNum} 失败原因`,
-      ` 关注: 失败的检查项是否应该调整阈值,或 Claude 有可避免的盲区`,
-    ].join('\n'));
+    const gateNum = recentEvent.type === 'gate1_fail' ? '1' : '2';
+    console.log(`devpace:learn-trigger ${crName} Gate ${gateNum} 未通过。ACTION: 先修复 Gate 失败原因并重试;Gate 通过后执行 /pace-learn gate-failure ${crName} 萃取教训。`);
   }

   // Rejected learning trigger — human rejection reveals understanding gaps
   if (recentEvent && recentEvent.type === 'rejected') {
-    console.log([
-      `devpace:learn-trigger ${crName} 人类打回是理解差距的信号。`,
-      ` 建议: 调用 /pace-learn 分析打回原因`,
-      ` 关注: Claude 的意图理解是否与人类预期一致`,
-    ].join('\n'));
+    console.log(`devpace:learn-trigger ${crName} 被人类驳回。ACTION: 查看 CR 事件表最新 rejected 记录确认驳回原因,修复后重新提交 review;完成后执行 /pace-learn rejection ${crName} 分析认知差距。`);
   }
 }

diff --git a/hooks/post-schema-check.mjs b/hooks/post-schema-check.mjs
index 4af12c6..95e59b1 100755
--- a/hooks/post-schema-check.mjs
+++ b/hooks/post-schema-check.mjs
@@ -11,7 +11,8 @@
  */

 import { existsSync } from 'node:fs';
-import { basename, join } from 'node:path';
+import { basename, dirname, join } from 'node:path';
+import { fileURLToPath } from 'node:url';
 import { execFileSync } from 'node:child_process';
 import { readStdinJson, getProjectDir, extractFilePath, isDevpaceFile } from './lib/utils.mjs';

@@ -46,7 +47,7 @@ const devpaceDir = devpaceMatch[1];

 // Run validation
 try {
-  const scriptDir = new URL('.', import.meta.url).pathname;
+  const scriptDir = dirname(fileURLToPath(import.meta.url));
   const scriptPath = join(scriptDir, '..', 'scripts', 'validate-schema.mjs');

   if (!existsSync(scriptPath)) {
@@ -69,7 +70,9 @@ try {
   if (!result.valid) {
     const r = result.results[0];
     const issues = [...r.errors.map(e => `error: ${e}`), ...r.warnings.map(w => `warning: ${w}`)];
-    console.log(`devpace:schema-check ${name} 校验发现 ${r.errors.length} 个错误、${r.warnings.length} 个警告:${issues.slice(0, 3).join('; ')}${issues.length > 3 ? ` (+${issues.length - 3} more)` : ''}`);
+    const schemaMap = { 'state.md': 'state-format', 'project.md': 'project-format' };
+    const schemaName = schemaMap[name] || (name.startsWith('CR-') ? 'cr-format' : name.startsWith('PF-') ? 'pf-format' : name.startsWith('BR-') ? 'br-format' : 'unknown');
+    console.log(`devpace:schema-check ${name} 校验不通过(${r.errors.length} error, ${r.warnings.length} warning):${issues.slice(0, 3).join('; ')}${issues.length > 3 ? ` (+${issues.length - 3} more)` : ''}. ACTION: 重新读取 ${name},按上述错误逐一修复,修复后重新写入触发再次校验。格式参考:knowledge/_schema/${schemaName}.md。`);
   }
 } catch {
   // Validation failure is non-critical — skip silently
diff --git a/hooks/post-tool-failure.mjs b/hooks/post-tool-failure.mjs
index 9f29371..a75ce48 100755
--- a/hooks/post-tool-failure.mjs
+++ b/hooks/post-tool-failure.mjs
@@ -25,9 +25,9 @@ const filePath = extractFilePath(input);

 // Check if the failed write was targeting a CR file
 if (isCrFile(filePath, backlogDir)) {
-  console.log('devpace:tool-failure Write/Edit to CR file failed. Check CR state consistency: 1) Verify CR status field matches last successful state 2) Check if event table needs rollback entry 3) Consider git stash or revert if partial write occurred.');
+  console.log(`devpace:tool-failure CR 文件写入失败。ACTION: 1) 读取 CR 文件确认状态字段是否仍为上次成功值 2) 若状态不一致则在事件表补记 write_failed 条目 3) 执行 git diff ${filePath} 检查部分写入,必要时 git checkout -- ${filePath} 恢复。`);
 } else if (filePath && filePath.includes('.devpace/')) {
-  console.log('devpace:tool-failure Write/Edit to .devpace/ file failed. Verify state.md is still consistent with current progress.');
+  console.log(`devpace:tool-failure .devpace/ 文件写入失败。ACTION: 读取 state.md 确认与当前进度一致;若不一致则执行 git checkout -- ${filePath} 恢复后重试写入。`);
 }

 process.exit(0);
diff --git a/hooks/pre-compact.sh b/hooks/pre-compact.sh
index a5989c0..fa18e23 100755
--- a/hooks/pre-compact.sh
+++ b/hooks/pre-compact.sh
@@ -39,6 +39,7 @@ if [ -d "${DEVPACE_DIR}/backlog" ]; then
   ACTIVE_CRS=$(grep -rl "developing\|verifying\|in_review" "${DEVPACE_DIR}/backlog/" 2>/dev/null | head -3)
   if [ -n "$ACTIVE_CRS" ]; then
     for cr in $ACTIVE_CRS; do
+      [ ! -f "$cr" ] && continue
       CR_NAME=$(basename "$cr" .md)
       CR_STATUS=$(grep -m1 "状态" "$cr" 2>/dev/null | head -1)
       echo "devpace:pre-compact Active CR: $CR_NAME — $CR_STATUS"
diff --git a/hooks/pre-tool-use.mjs b/hooks/pre-tool-use.mjs
index f00cb78..258287e 100755
--- a/hooks/pre-tool-use.mjs
+++ b/hooks/pre-tool-use.mjs
@@ -5,18 +5,21 @@
  * Purpose: Enforce devpace iron rules at the mechanism level, not just text-based rules.
  *
  * Enforcement levels:
- *   1. BLOCKING (exit 2): Explore mode writes to .devpace/, Gate 3 bypass attempts
+ *   1. BLOCKING (exit 2): Explore mode state escalation, Gate 3 bypass attempts
  *   2. ADVISORY (exit 0): Gate 1/2 reminders during normal development flow
  *
  * Iron rules enforced:
- *   - Explore mode: no writes to .devpace/ (devpace-rules.md §2)
+ *   - Explore mode: block state.md writes and CR state escalation to advance-mode
+ *     states (developing/verifying/in_review). Allow other .devpace/ writes so
+ *     management Skills (pace-change, pace-biz, pace-plan) can operate.
  *   - Gate 3: human approval required, no automated state change to approved (devpace-rules.md §2)
  */

 import { existsSync } from 'node:fs';
 import {
   readStdinJson, getProjectDir, extractFilePath, extractWriteContent,
-  isCrFile, readCrState, isDevpaceFile, isAdvanceMode, isStateChangeToApproved
+  isCrFile, readCrState, isDevpaceFile, isAdvanceMode, isStateChangeToApproved,
+  isStateEscalation, CR_STATES
 } from './lib/utils.mjs';

 const input = await readStdinJson();
@@ -31,44 +34,46 @@ if (!existsSync(backlogDir)) {
 const filePath = extractFilePath(input);

 // ── ENFORCEMENT 1: Explore mode protection ──────────────────────────
-// Iron rule: explore mode must not write to .devpace/ files
+// Narrowed scope: only block high-risk state operations in explore mode.
+// Management Skills (pace-change, pace-biz, pace-plan) need to write to
+// .devpace/ files even without an active CR, so we only block:
+//   1. state.md direct modification — progress state shouldn't change in explore mode
+//   2. CR state escalation — setting developing/verifying/in_review requires advance mode
 if (isDevpaceFile(filePath) && !isAdvanceMode(projectDir)) {
-  // Allow writes to .devpace/rules/ (configuration, not state)
-  // Allow writes to .devpace/context.md (tech convention tracking)
-  const isConfigFile = filePath.includes('.devpace/rules/') || filePath.includes('.devpace/context.md');
-  if (!isConfigFile) {
-    console.error('devpace:blocked 探索模式下不允许修改 .devpace/ 状态文件。请先进入推进模式(说"帮我实现/修改 X")再修改。');
+  const isStateMd = filePath.endsWith('/state.md') || filePath.endsWith('/.devpace/state.md');
+  const isCrStateEsc = isCrFile(filePath, backlogDir)
+    && isStateEscalation(extractWriteContent(input));
+
+  if (isStateMd || isCrStateEsc) {
+    console.error('devpace:blocked 探索模式禁止修改进度状态(state.md 或 CR 状态升级)。ACTION: 告知用户需要先进入推进模式,引导用户说"帮我实现 X"或"开始做 CR-NNN"以激活 /pace-dev。');
     process.exit(2);
   }
 }

-// ── ENFORCEMENT 2: Gate 3 — human approval required ─────────────────
-// Iron rule: in_review → approved transition requires explicit human approval
+// ── ENFORCEMENT 2 + ADVISORY: Gate checks ───────────────────────────
+// Single read for both Gate 3 enforcement and advisory reminders
 if (isCrFile(filePath, backlogDir) && existsSync(filePath)) {
   const currentState = readCrState(filePath);

-  if (currentState === 'in_review') {
+  // Gate 3: human approval required — in_review → approved blocked
+  if (currentState === CR_STATES.IN_REVIEW) {
     const newContent = extractWriteContent(input);
     if (isStateChangeToApproved(newContent)) {
-      console.error('devpace:blocked Gate 3 要求人类审批。不允许自动将 CR 状态从 in_review 变更为 approved。请等待用户明确批准。');
+      console.error('devpace:blocked Gate 3 铁律:CR 从 in_review→approved 必须由人类明确批准。ACTION: 向用户展示 review 摘要(diff 概要+验收标准对比),然后询问是否批准该变更,等待用户回复批准/approved后再修改状态。');
       process.exit(2);
     }
   }
-}
-
-// ── ADVISORY: Quality gate reminders (existing behavior) ────────────
-if (isCrFile(filePath, backlogDir) && existsSync(filePath)) {
-  const currentState = readCrState(filePath);

+  // Advisory: Quality gate reminders
   switch (currentState) {
-    case 'developing':
-      console.log("devpace:gate-reminder CR is in 'developing'. Gate 1 (code quality: lint+test+typecheck) must pass before advancing to 'verifying'.");
+    case CR_STATES.DEVELOPING:
+      console.log("devpace:gate-reminder CR 状态 developing。推进到 verifying 前须通过 Gate 1。ACTION: 执行 lint+test+typecheck,全部通过后在 CR 事件表记录 gate1_pass,再将状态改为 verifying。");
       break;
-    case 'verifying':
-      console.log("devpace:gate-reminder CR is in 'verifying'. Gate 2 (integration test + intent consistency) must pass before advancing to 'in_review'.");
+    case CR_STATES.VERIFYING:
+      console.log("devpace:gate-reminder CR 状态 verifying。推进到 in_review 前须通过 Gate 2。ACTION: 执行集成测试+意图一致性检查(对比 CR 验收标准与实际实现),通过后在事件表记录 gate2_pass,再将状态改为 in_review。");
       break;
-    case 'in_review':
-      console.log("devpace:gate-reminder CR is in 'in_review'. Gate 3 requires human approval. Do not advance without explicit user approval.");
+    case CR_STATES.IN_REVIEW:
+      console.log("devpace:gate-reminder CR 状态 in_review。Gate 3 须人类批准。ACTION: 向用户展示变更摘要(diff 概要+验收标准对比),等待用户明确说'批准'。");
       break;
   }
 }
diff --git a/hooks/pulse-counter.mjs b/hooks/pulse-counter.mjs
index c824873..f65a1b4 100755
--- a/hooks/pulse-counter.mjs
+++ b/hooks/pulse-counter.mjs
@@ -53,9 +53,10 @@ try {
 }

 // --- Stuck detection: same CR written 5+ times without state change ---
+// Throttled to every 3rd write to reduce per-write I/O (reads JSON + CR file + writes JSON).
 const filePath = extractFilePath(input);
 const backlogDir = `${devpaceDir}/backlog`;
-if (isCrFile(filePath, backlogDir)) {
+if (isCrFile(filePath, backlogDir) && count % 3 === 0) {
   const crWritesPath = `${devpaceDir}/.pulse-cr-writes`;
   let writes = {};
   try { writes = JSON.parse(readFileSync(crWritesPath, 'utf-8')); } catch { /* start fresh */ }
@@ -69,10 +70,21 @@ if (isCrFile(filePath, backlogDir)) {
     writes[crName].count++;
   }

+  // Prune stale entries — keep only CRs still in backlog (max 20 as safety cap)
+  const keys = Object.keys(writes);
+  if (keys.length > 20) {
+    for (const k of keys) {
+      if (!existsSync(`${backlogDir}/${k}.md`)) {
+        delete writes[k];
+      }
+    }
+  }
+
   try { writeFileSync(crWritesPath, JSON.stringify(writes), 'utf-8'); } catch { /* silent */ }

   if (writes[crName].count >= 5) {
     console.log(`devpace:stuck-warning ${crName} 已被写入 ${writes[crName].count} 次但状态仍为 ${currentState},建议检查是否在空转。考虑: /pace-status 查看全局状态。`);
+    console.log(`devpace:struggle-signal ${crName} 重复写入可能指示环境缺陷(Skill/procedure/Schema 不足)。CR merged 后 /pace-learn 将自动提取改进建议。`);
   }
 }

diff --git a/hooks/session-start.sh b/hooks/session-start.sh
index e6b2c0e..1fa8b88 100755
--- a/hooks/session-start.sh
+++ b/hooks/session-start.sh
@@ -6,13 +6,6 @@ STATE_FILE="${PROJECT_DIR}/.devpace/state.md"

 if [ -f "$STATE_FILE" ]; then
   echo "devpace:session-start Active project detected. Read .devpace/state.md for details."
-
-  # Check for pending learn extraction
-  LEARN_PENDING="${PROJECT_DIR}/.devpace/.learn-pending"
-  if [ -f "$LEARN_PENDING" ]; then
-    PENDING_COUNT=$(wc -l < "$LEARN_PENDING" | tr -d ' ')
-    echo "devpace:learn-pending ${PENDING_COUNT} 个 CR 已 merged 但经验尚未提取。建议执行 /pace-learn 提取经验。"
-  fi
 else
   echo "devpace:session-start No active devpace project."
 fi
diff --git a/hooks/pace-dev-scope-check.mjs b/hooks/skill/pace-dev-scope-check.mjs
similarity index 86%
rename from hooks/pace-dev-scope-check.mjs
rename to hooks/skill/pace-dev-scope-check.mjs
index 89c991e..c2056bb 100755
--- a/hooks/pace-dev-scope-check.mjs
+++ b/hooks/skill/pace-dev-scope-check.mjs
@@ -3,23 +3,23 @@
  * devpace pace-dev scope check — fast command Hook replacing LLM prompt Hook
  *
  * Replaces the slow prompt-type Hook (~15s LLM per call) with a fast
- * programmatic check (~5ms) for scope validation and Gate 3 enforcement.
+ * programmatic check (~5ms) for scope validation during /pace-dev.
  *
  * Checks:
- *   1. Gate 3: Block automated state change to approved (defense-in-depth)
- *   2. Scope validation: Is target file within the active CR's scope?
- *   3. Scope drift warning: Advisory for out-of-scope writes
+ *   1. CR file writes: always in scope during development (Gate 3 delegated to global hook)
+ *   2. .devpace/ management files: always in scope
+ *   3. Scope validation: Is target file within the active CR's scope?
+ *   4. Scope drift warning: Advisory for out-of-scope writes
  *
  * Exit codes:
  *   0 = allow (in scope or advisory warning)
- *   2 = block (Gate 3 violation)
  */

 import { readFileSync, existsSync } from 'node:fs';
 import {
-  readStdinJson, getProjectDir, extractFilePath, extractWriteContent,
-  isCrFile, isStateChangeToApproved, isDevpaceFile
-} from './lib/utils.mjs';
+  readStdinJson, getProjectDir, extractFilePath,
+  isCrFile, isDevpaceFile
+} from '../lib/utils.mjs';

 const input = await readStdinJson();
 const projectDir = getProjectDir();
@@ -35,14 +35,9 @@ if (!filePath) {
   process.exit(0);
 }

-// ── CHECK 1: Gate 3 — block automated approved state change ──────
+// ── CHECK 1: CR file writes — always in scope during development ──
+// Gate 3 (approved blocking) is enforced by global pre-tool-use.mjs — no duplication needed.
 if (isCrFile(filePath, backlogDir)) {
-  const newContent = extractWriteContent(input);
-  if (isStateChangeToApproved(newContent)) {
-    console.error('devpace:blocked Gate 3 要求人类审批。不允许自动将 CR 状态变更为 approved。');
-    process.exit(2);
-  }
-  // CR file writes are always in scope during development
   process.exit(0);
 }

diff --git a/hooks/pace-init-scope-check.mjs b/hooks/skill/pace-init-scope-check.mjs
similarity index 90%
rename from hooks/pace-init-scope-check.mjs
rename to hooks/skill/pace-init-scope-check.mjs
index f4119bd..faa9209 100755
--- a/hooks/pace-init-scope-check.mjs
+++ b/hooks/skill/pace-init-scope-check.mjs
@@ -15,7 +15,7 @@
  *   2 = block (target is outside allowed scope)
  */

-import { readStdinJson, getProjectDir, extractFilePath } from './lib/utils.mjs';
+import { readStdinJson, getProjectDir, extractFilePath, isDevpaceFile } from '../lib/utils.mjs';

 const input = await readStdinJson();
 const projectDir = getProjectDir();
@@ -35,7 +35,7 @@ const absPath = filePath.startsWith('/')
 const projRoot = projectDir.endsWith('/') ? projectDir.slice(0, -1) : projectDir;

 // Check 1: .devpace/ directory — any file underneath
-if (absPath.includes('/.devpace/') || absPath.includes('.devpace/')) {
+if (isDevpaceFile(absPath)) {
   process.exit(0);
 }

diff --git a/hooks/pace-review-scope-check.mjs b/hooks/skill/pace-review-scope-check.mjs
similarity index 98%
rename from hooks/pace-review-scope-check.mjs
rename to hooks/skill/pace-review-scope-check.mjs
index 6639110..83e275a 100755
--- a/hooks/pace-review-scope-check.mjs
+++ b/hooks/skill/pace-review-scope-check.mjs
@@ -25,7 +25,7 @@
 import {
   readStdinJson, getProjectDir, extractFilePath,
   isDevpaceFile
-} from './lib/utils.mjs';
+} from '../lib/utils.mjs';

 const input = await readStdinJson();
 const projectDir = getProjectDir();
diff --git a/hooks/subagent-stop.mjs b/hooks/subagent-stop.mjs
index d0f3143..c74b344 100755
--- a/hooks/subagent-stop.mjs
+++ b/hooks/subagent-stop.mjs
@@ -9,10 +9,9 @@
  * This is an advisory hook (exit 0) — outputs warnings for the main session to handle.
  */

-import { existsSync, readdirSync } from 'node:fs';
-import { readStdinJson, getProjectDir, readCrState } from './lib/utils.mjs';
-import { readFileSync } from 'node:fs';
+import { existsSync, readdirSync, readFileSync } from 'node:fs';
 import { join } from 'node:path';
+import { readStdinJson, getProjectDir, readCrState, isAdvanceMode, CR_STATES } from './lib/utils.mjs';

 const input = await readStdinJson();
 const projectDir = getProjectDir();
@@ -47,52 +46,49 @@ try {
   process.exit(0);
 }

-const hasActiveWork = /\*\*进行中\*\*/.test(stateContent);
+const hasActiveWork = isAdvanceMode(projectDir, stateContent);

 // Check 2: Scan backlog for CR state consistency
 try {
   const crFiles = readdirSync(backlogDir).filter(f => f.startsWith('CR-') && f.endsWith('.md'));
+  const crStates = new Map(); // crFile → state, reused by Check 3

   for (const crFile of crFiles) {
     const crPath = join(backlogDir, crFile);
-    const crState = readCrState(crPath);
-
-    // Inconsistency: state.md says "进行中" but CR is not in an active state
-    if (hasActiveWork && crState === 'created') {
-      // Not necessarily inconsistent — there might be other developing CRs
-      continue;
-    }
-
-    // Inconsistency: CR claims 'verifying' but Gate 1 checks not recorded
-    if (crState === 'verifying' && agentName === 'pace-engineer') {
-      // Check if the CR has Gate 1 evidence in events
-      try {
-        const crContent = readFileSync(crPath, 'utf-8');
-        const hasGate1Event = /Gate 1/.test(crContent);
-        if (!hasGate1Event) {
-          warnings.push(`${crFile}: 状态为 verifying 但无 Gate 1 检查记录,可能是 pace-engineer 中断导致`);
-        }
-      } catch { /* skip unreadable */ }
+    let crContent;
+    try {
+      crContent = readFileSync(crPath, 'utf-8');
+    } catch { continue; }
+
+    const crState = readCrState(crPath, crContent);
+    crStates.set(crFile, crState);
+
+    // Only pace-engineer transitions CR state through developing→verifying→in_review.
+    // pace-pm and pace-analyst are included in DEVPACE_AGENTS for Check 3 (state.md
+    // consistency) but do not trigger Gate-specific warnings.
+    if (crState === CR_STATES.VERIFYING && agentName === 'pace-engineer') {
+      if (!/Gate 1/.test(crContent)) {
+        warnings.push(`${crFile}: 状态为 verifying 但无 Gate 1 检查记录,可能是 pace-engineer 中断导致`);
+      }
     }

     // Inconsistency: CR claims 'in_review' but no review summary
-    if (crState === 'in_review' && agentName === 'pace-engineer') {
-      try {
-        const crContent = readFileSync(crPath, 'utf-8');
-        const hasGate2Event = /Gate 2/.test(crContent);
-        if (!hasGate2Event) {
-          warnings.push(`${crFile}: 状态为 in_review 但无 Gate 2 检查记录,可能是 pace-engineer 中断导致`);
-        }
-      } catch { /* skip unreadable */ }
+    if (crState === CR_STATES.IN_REVIEW && agentName === 'pace-engineer') {
+      if (!/Gate 2/.test(crContent)) {
+        warnings.push(`${crFile}: 状态为 in_review 但无 Gate 2 检查记录,可能是 pace-engineer 中断导致`);
+      }
     }
   }

   // Check 3: state.md says no active work but there are developing CRs
+  // Reuse crStates map instead of re-scanning backlog
   if (!hasActiveWork) {
-    const developingCrs = crFiles.filter(f => {
-      const state = readCrState(join(backlogDir, f));
-      return state === 'developing' || state === 'verifying';
-    });
+    const developingCrs = [];
+    for (const [file, state] of crStates) {
+      if (state === CR_STATES.DEVELOPING || state === CR_STATES.VERIFYING) {
+        developingCrs.push(file);
+      }
+    }
     if (developingCrs.length > 0) {
       warnings.push(`state.md 无进行中工作,但 ${developingCrs.join(', ')} 仍在活跃状态,建议更新 state.md`);
     }
diff --git a/hooks/sync-push.mjs b/hooks/sync-push.mjs
index ca977e3..c08c442 100755
--- a/hooks/sync-push.mjs
+++ b/hooks/sync-push.mjs
@@ -6,7 +6,7 @@
  * so ordinary edits that don't change state are silently ignored.
  *
  * - State unchanged → silent exit (no noise)
- * - State changed to merged → directive language (auto-execute)
+ * - State changed to merged → advisory suggestion (suggest sync push)
  * - State changed to other value → advisory suggestion
  *
  * Advisory only (exit 0) — never blocks workflow.
@@ -22,7 +22,7 @@ import { readFileSync, existsSync } from 'node:fs';
 import { basename } from 'node:path';
 import {
   readStdinJson, getProjectDir, isCrFile, extractFilePath, readCrState,
-  readSyncStateCache, updateSyncStateCache,
+  readSyncStateCache, updateSyncStateCache, CR_STATES,
 } from './lib/utils.mjs';

 const input = await readStdinJson();
@@ -48,8 +48,15 @@ if (!existsSync(syncMappingPath)) {
   process.exit(0);
 }

-// Read current CR state
-const newState = readCrState(filePath);
+// Read CR file once for state + external link detection
+let crContent;
+try {
+  crContent = readFileSync(filePath, 'utf-8');
+} catch {
+  process.exit(0);
+}
+
+const newState = readCrState(filePath, crContent);
 if (!newState) {
   process.exit(0);
 }
@@ -64,31 +71,26 @@ if (oldState === newState) {
   process.exit(0);
 }

-// State actually changed — update cache first
-updateSyncStateCache(projectDir, crName, newState);
+// State actually changed — update cache (pass existing cache to avoid re-read)
+updateSyncStateCache(projectDir, crName, newState, cache);

-// Check if CR has external link
-try {
-  const content = readFileSync(filePath, 'utf-8');
-  const hasExternalLink = /\*\*外部关联\*\*[::]/.test(content);
-
-  if (!hasExternalLink) {
-    process.exit(0);
-  }
-
-  // Extract external link info for the reminder
-  const linkMatch = content.match(/\*\*外部关联\*\*[::]\s*\[([^\]]+)\]\(([^)]+)\)/);
-  const linkText = linkMatch ? linkMatch[1] : '外部实体';
-
-  if (newState === 'merged') {
-    // Directive language for merged — §11 step 7 close-loop
-    console.log(`devpace:sync-push ${crName} state transition: ${oldState || '(new)'}→merged, linked to ${linkText}. Auto-execute: /pace-sync push ${crName} (§11 step 7 — close Issue + done label + completion summary)`);
-  } else {
-    // Advisory suggestion for other transitions
-    console.log(`devpace:sync-push ${crName} state transition: ${oldState || '(new)'}→${newState}, linked to ${linkText}. Consider running /pace-sync push to sync status.`);
-  }
-} catch {
-  // File read error — silent exit
+// Check if CR has external link (use already-read content)
+const hasExternalLink = /\*\*外部关联\*\*[::]/.test(crContent);
+
+if (!hasExternalLink) {
+  process.exit(0);
+}
+
+// Extract external link info for the reminder
+const linkMatch = crContent.match(/\*\*外部关联\*\*[::]\s*\[([^\]]+)\]\(([^)]+)\)/);
+const linkText = linkMatch ? linkMatch[1] : '外部实体';
+
+if (newState === CR_STATES.MERGED) {
+  // Advisory language for merged — §11 step 7 close-loop
+  console.log(`devpace:sync-push ${crName} state transition: ${oldState || '(new)'}→merged, linked to ${linkText}. Suggest: /pace-sync push ${crName} (§11 step 7 — close Issue + done label + completion summary)`);
+} else {
+  // Advisory suggestion for other transitions
+  console.log(`devpace:sync-push ${crName} state transition: ${oldState || '(new)'}→${newState}, linked to ${linkText}. Consider running /pace-sync push to sync status.`);
 }

 process.exit(0);
diff --git a/knowledge/_schema/accept-report-contract.md b/knowledge/_schema/accept-report-contract.md
index 10130c4..e1bb8f8 100644
--- a/knowledge/_schema/accept-report-contract.md
+++ b/knowledge/_schema/accept-report-contract.md
@@ -2,13 +2,13 @@

 > **职责**:定义 `/pace-test accept` 验收验证报告的输出格式契约。此文件是 pace-test(生产方)和 pace-review(消费方)之间的共享接口。
 >
-> **修改此文件时**:必须同时检查生产方(`verify-procedures.md` Step 4)和消费方(`review-procedures-gate.md` accept 消费章节)是否需要适配。
+> **修改此文件时**:必须同时检查生产方(`test-procedures-verify.md` Step 4)和消费方(`review-procedures-gate.md` accept 消费章节)是否需要适配。

 ## §0 速查卡片

 | 属性 | 值 |
 |------|-----|
-| 生产方 | `/pace-test accept`(verify-procedures.md Step 4) |
+| 生产方 | `/pace-test accept`(test-procedures-verify.md Step 4) |
 | 消费方 | `/pace-review`(review-procedures-gate.md accept 消费章节) |
 | 写入位置 | CR 文件"验证证据" section |
 | 触发标题 | `## 验收验证报告` |
diff --git a/knowledge/_schema/project-format.md b/knowledge/_schema/project-format.md
index 490cbe9..46ecc31 100644
--- a/knowledge/_schema/project-format.md
+++ b/knowledge/_schema/project-format.md
@@ -90,11 +90,28 @@ Claude 在更新 project.md 时,如果发现是桩状态(含占位文字)
 - **自主级别**:[辅助 | 标准 | 自主](默认:标准)
 ```

-| 值 | 含义 | 适用场景 |
+| 值 | 一句话含义 | 适用场景 |
 |---|------|---------|
-| 辅助 | Claude 在 Gate 失败时询问而非自动修复 | 新项目/新用户/建立信任阶段 |
+| 辅助 | Claude 在关键节点询问而非自动执行 | 新项目/新用户/建立信任阶段 |
 | 标准(默认) | Claude 自动执行+自修复,Gate 3 人类审批 | 已建立信任/标准开发流程 |
-| 自主 | 标准行为 + 简化审批条件放宽 | 高信任/批量操作/熟练用户 |
+| 自主 | 标准行为 + 审批放宽 + 通知精简 | 高信任/批量操作/熟练用户 |
+
+#### 能力边界矩阵(权威定义)
+
+Claude 根据当前自主级别,按此矩阵判断每个动作的执行方式。
+
+| 能力维度 | 辅助 | 标准(默认) | 自主 |
+|---------|------|------------|------|
+| Gate 1 失败 | 询问用户如何修复 | 自行修复并重试 | 自行修复并重试 |
+| Gate 2 失败 | 询问用户确认修复方向 | 自行补充并重试 | 自行补充并重试 |
+| Gate 3 审批 | 等待用户 | 等待用户 | 等待用户(IR-2 不可绕过) |
+| 简化审批阈值 | 不启用 | ≤3 文件 + 一次通过 | ≤5 文件 + 放宽条件 |
+| M 复杂度执行计划 | 生成(多一层引导) | 不生成 | 不生成 |
+| 意图检查点确认 | 显式等待用户确认 | 告知后继续 | 告知后继续 |
+| sync Issue 创建 | 每次询问 | 前 3 次询问,之后静默 | 自动创建 |
+| pace-next 引导 | 始终确认式 | 场景适配 | 连续 3 次后简化为内联 |
+| 变更管理确认 | 每步等待确认 | 影响分析后确认 | 低影响变更自动执行 |
+| checkpoint 通知 | 每步输出详细摘要 | 1 行进度 | 静默(仅异常时通知) |

 规则:
 - 字段不存在时默认"标准"(向后兼容)
diff --git a/knowledge/output-guide.md b/knowledge/output-guide.md
index f118ca9..32f70da 100644
--- a/knowledge/output-guide.md
+++ b/knowledge/output-guide.md
@@ -70,3 +70,20 @@ Claude:"判断依据:
 - 评估过程(逐项检查 + 规则匹配 + 经验影响)
 - 决策上下文(读取的信息 + 溯源标记)
 - 上下文感知的导航建议
+
+## 分层输出约定(SSOT)
+
+子命令输出详细度的统一定义。各 Skill 的 `*-common.md` 引用本节而非各自重复定义。
+
+| 层级 | 触发 | 内容 | 适用场景 |
+|------|------|------|---------|
+| 简要 | `--brief` | 1-3 行核心结论(关键指标、等级、下一步建议) | 自动化消费、快速确认 |
+| 标准 | (默认) | 结构化表格 + 汇总 + 建议(各规程定义的输出格式) | 日常使用 |
+| 详细 | `--detail` | 完整输出含历史对比、实施指导、扩展分析 | 深度审查、首次使用 |
+
+**通用规则**:
+- 未指定参数时使用"标准"层级(向后兼容)
+- `--brief` 输出可被其他 Skill 或 report 子命令程序化消费
+- 各子命令的"详细"内容由各规程文件定义
+- 简要行统一风格:`类型:关键指标1 · 指标2 · 下一步/趋势`
+- 各 Skill 可定义自动升级规则(特定上下文自动提升输出层级)
diff --git a/knowledge/signal-collection.md b/knowledge/signal-collection.md
index 344fc7a..0ecb824 100644
--- a/knowledge/signal-collection.md
+++ b/knowledge/signal-collection.md
@@ -45,10 +45,10 @@

 ### 脚本采集(推荐)

-信号采集脚本 `scripts/collect-signals.mjs` 实现全部 24 个信号条件的确定性评估:
+信号采集脚本 `skills/pace-next/scripts/collect-signals.mjs` 实现全部 24 个信号条件的确定性评估:

 ```
-Bash: node ${CLAUDE_PLUGIN_ROOT}/scripts/collect-signals.mjs .devpace [--role <角色>] [--cache] [--cache-read]
+Bash: node ${CLAUDE_PLUGIN_ROOT}/skills/pace-next/scripts/collect-signals.mjs .devpace [--role <角色>] [--cache] [--cache-read]
 ```

 - `--cache`:采集后自动写入 `.signal-cache`(JSON 格式)
diff --git a/requirements-dev.txt b/requirements-dev.txt
index 0ca1ab8..2b9b8bb 100644
--- a/requirements-dev.txt
+++ b/requirements-dev.txt
@@ -1,3 +1,5 @@
 # devpace 开发依赖
 pytest>=7.0,<9.0
 pyyaml>=6.0,<7.0
+claude-agent-sdk>=0.1.44; python_version>="3.10"
+anthropic>=0.50.0; python_version>="3.10"
diff --git a/skills/pace-biz/SKILL.md b/skills/pace-biz/SKILL.md
index 16ca23e..e3c712f 100644
--- a/skills/pace-biz/SKILL.md
+++ b/skills/pace-biz/SKILL.md
@@ -1,7 +1,8 @@
 ---
 description: Use when user says "业务机会", "专题", "Epic", "分解需求", "战略对齐", "业务全景", "业务规划", "需求发现", "头脑风暴", "brainstorm", "导入需求", "从文档导入", "代码分析需求", "技术债务盘点", "discover", "import", "infer", "pace-biz", or wants to create opportunities/Epics, decompose requirements, discover/import/infer features. NOT for implementation (/pace-dev), existing item changes (/pace-change), or iteration planning (/pace-plan).
-allowed-tools: AskUserQuestion, Write, Read, Edit, Glob, Bash, Grep
+allowed-tools: AskUserQuestion, Read, Write, Edit, Glob, Grep, Bash
 argument-hint: "[opportunity|epic|decompose|align|view|discover|import|infer] [EPIC-xxx|BR-xxx] <描述|路径>"
+model: sonnet
 context: fork
 agent: pace-pm
 ---
@@ -102,90 +103,8 @@ $ARGUMENTS:

 ## 输出

-### 所有子命令的通用输出原则
-
 - **渐进暴露**:默认输出简洁摘要,`--detail` 展示完整信息
 - **操作确认**:写入操作前展示变更预览,用户确认后执行
 - **追溯链**:每次创建实体时展示其在价值链中的位置

-### opportunity 输出
-
-```
-已捕获业务机会:OPP-xxx — [描述]
-来源:[类型]([详情])
-状态:评估中
-→ 下一步:/pace-biz epic OPP-xxx 评估并转化为 Epic
-```
-
-### epic 输出
-
-```
-已创建专题:EPIC-xxx — [名称]
-关联:OBJ-x([目标])← OPP-xxx(如有)
-MoS:[指标列表]
-→ 下一步:/pace-biz decompose EPIC-xxx 分解为业务需求
-```
-
-### decompose 输出
-
-```
-已分解 [EPIC-xxx|BR-xxx]:
-├── BR-001:[名称] P0
-├── BR-002:[名称] P1
-└── BR-003:[名称] P2
-→ 下一步:/pace-change add 补充 PF 或 /pace-dev 开始开发
-```
-
-### align 输出
-
-```
-战略对齐度报告:
-- OBJ 覆盖率:N/M OBJ 有 Epic 覆盖
-- 孤立实体:[列表]
-- 对齐建议:[建议]
-```
-
-### view 输出
-
-```
-业务全景:
-OPP-001(评估中)
-OPP-002 → EPIC-001(进行中)
-  ├── BR-001 → PF-001 → CR-001 🔄
-  └── BR-002 → PF-002(待开始)
-OPP-003 → EPIC-002(规划中)
-```
-
-### discover 输出
-
-```
-已从发现会话创建:
-- 1 个业务机会(OPP-xxx)
-- 1 个专题(EPIC-xxx)
-- N 个业务需求(BR-xxx ~ BR-xxx)
-- M 个产品功能(PF-xxx ~ PF-xxx)
-→ /pace-biz decompose EPIC-xxx 继续细化
-→ /pace-plan next 排入迭代
-```
-
-### import 输出
-
-```
-导入完成(来自 N 个文件):
-- 新增:X 个 BR + Y 个 PF
-- 丰富:Z 个已有实体
-- 跳过:W 个重复项
-→ /pace-biz align 检查战略对齐度
-→ /pace-plan next 排入迭代
-```
-
-### infer 输出
-
-```
-代码库推断完成:
-- 新增追踪:X 个产品功能
-- 技术债务:Y 个待处理项
-- 未实现确认:Z 个功能状态已更新
-→ /pace-biz align 检查战略对齐度
-→ /pace-dev 开始处理优先项
-```
+各子命令输出格式模板见 `biz-procedures-output.md`。 diff --git a/skills/pace-biz/biz-procedures-output.md b/skills/pace-biz/biz-procedures-output.md new file mode 100644 index 0000000..5e836ec --- /dev/null +++ b/skills/pace-biz/biz-procedures-output.md @@ -0,0 +1,85 @@ +# 业务规划域输出格式规程 + +> **职责**:定义 /pace-biz 各子命令的输出格式模板。 + +## opportunity 输出 + +``` +已捕获业务机会:OPP-xxx — [描述] +来源:[类型]([详情]) +状态:评估中 +→ 下一步:/pace-biz epic OPP-xxx 评估并转化为 Epic +``` + +## epic 输出 + +``` +已创建专题:EPIC-xxx — [名称] +关联:OBJ-x([目标])← OPP-xxx(如有) +MoS:[指标列表] +→ 下一步:/pace-biz decompose EPIC-xxx 分解为业务需求 +``` + +## decompose 输出 + +``` +已分解 [EPIC-xxx|BR-xxx]: +├── BR-001:[名称] P0 +├── BR-002:[名称] P1 +└── BR-003:[名称] P2 +→ 下一步:/pace-change add 补充 PF 或 /pace-dev 开始开发 +``` + +## align 输出 + +``` +战略对齐度报告: +- OBJ 覆盖率:N/M OBJ 有 Epic 覆盖 +- 孤立实体:[列表] +- 对齐建议:[建议] +``` + +## view 输出 + +``` +业务全景: +OPP-001(评估中) +OPP-002 → EPIC-001(进行中) + ├── BR-001 → PF-001 → CR-001 🔄 + └── BR-002 → PF-002(待开始) +OPP-003 → EPIC-002(规划中) +``` + +## discover 输出 + +``` +已从发现会话创建: +- 1 个业务机会(OPP-xxx) +- 1 个专题(EPIC-xxx) +- N 个业务需求(BR-xxx ~ BR-xxx) +- M 个产品功能(PF-xxx ~ PF-xxx) +→ /pace-biz decompose EPIC-xxx 继续细化 +→ /pace-plan next 排入迭代 +``` + +## import 输出 + +``` +导入完成(来自 N 个文件): +- 新增:X 个 BR + Y 个 PF +- 丰富:Z 个已有实体 +- 跳过:W 个重复项 +→ /pace-biz align 检查战略对齐度 +→ /pace-plan next 排入迭代 +``` + +## infer 输出 + +``` +代码库推断完成: +- 新增追踪:X 个产品功能 +- 技术债务:Y 个待处理项 +- 未实现确认:Z 个功能状态已更新 +→ /pace-biz align 检查战略对齐度 +→ /pace-dev 开始处理优先项 +``` diff --git a/skills/pace-change/SKILL.md b/skills/pace-change/SKILL.md index 29a4fc3..70fd300 100644 --- a/skills/pace-change/SKILL.md +++ b/skills/pace-change/SKILL.md @@ -1,7 +1,8 @@ --- description: Use when user says "不做了", "先不搞", "加一个", "加需求", "改一下", "改需求", "优先级调", "优先级调整", "延后", "提前", "砍掉", "插入", "新增需求", "先做这个", "恢复之前的", "恢复", "搁置", "放一放", "范围变了", "不要这个功能了", "追加", "补一个", "还需要", "改个需求", "需求变了", "停掉", "捡回来", "排到前面", "pace-change", or wants to add, pause, resume, reprioritize, modify, undo, batch change, or query change history. 
NOT for code implementation (use /pace-dev) or project initialization (use /pace-init). -allowed-tools: AskUserQuestion, Write, Read, Edit, Glob, Bash, Grep +allowed-tools: AskUserQuestion, Read, Write, Edit, Glob, Grep, Bash argument-hint: "[add|pause|resume|reprioritize|modify|batch|undo|history|apply] [#N|--last|--dry-run] <描述>" +model: sonnet context: fork agent: pace-pm --- diff --git a/skills/pace-dev/SKILL.md b/skills/pace-dev/SKILL.md index 866d247..3bd8350 100644 --- a/skills/pace-dev/SKILL.md +++ b/skills/pace-dev/SKILL.md @@ -1,7 +1,8 @@ --- description: Use when user says "开始做", "帮我改", "实现", "修复", "继续推进", "编码", "写代码", "开发", "重构", "做个", "coding", "implement", "fix", "refactor", "build", /pace-dev, or explicitly requests to start, continue, or resume coding/development work on a feature or bug fix. "帮我改" applies when the target is code, UI, or configuration — not requirements or acceptance criteria. NOT for requirement changes (use /pace-change) or code review (use /pace-review). NOT for running tests (use /pace-test). NOT for user-reported production issues (use /pace-feedback). 
-allowed-tools: AskUserQuestion, Write, Read, Edit, Glob, Bash +allowed-tools: AskUserQuestion, Read, Write, Edit, Glob, Bash argument-hint: "[<功能描述>|#|--last]" +model: sonnet context: fork agent: pace-engineer hooks: @@ -10,7 +11,7 @@ hooks: tool_name: "Write|Edit" hooks: - type: command - command: "${CLAUDE_PLUGIN_ROOT}/hooks/pace-dev-scope-check.mjs" + command: "${CLAUDE_PLUGIN_ROOT}/hooks/skill/pace-dev-scope-check.mjs" timeout: 5 --- diff --git a/skills/pace-dev/dev-procedures-gate.md b/skills/pace-dev/dev-procedures-gate.md index a056ef7..ae021de 100644 --- a/skills/pace-dev/dev-procedures-gate.md +++ b/skills/pace-dev/dev-procedures-gate.md @@ -1,6 +1,29 @@ -# Gate 通过反思规程 +# Gate 规程 -> **职责**:CR 处于 verifying / in_review 阶段时的 Gate 通过反思规则。 +> **职责**:CR 处于 verifying / in_review 阶段时的 Gate 执行裁剪和通过反思规则。 + +## Gate 1 智能裁剪(S-CR 专属) + +S 复杂度 CR(≤3 文件、≤1 目录)执行裁剪版 Gate 1,减少低价值检查。原则:小变更的修正成本低于等待成本,全量检查投入产出比不合理。 + +### 裁剪规则 + +1. **命令检查裁剪**:读取 checks.md 每项的 `sensitivity` 字段,对比 `git diff --name-only` 获取的变更文件列表: + - 有 sensitivity 且与变更文件无交集 → 跳过,标记 `⏭️ 裁剪(S-CR 变更范围外)` + - 有 sensitivity 且与变更文件有交集 → 执行 + - 无 sensitivity 字段 → 执行(保守策略) +2. **内置"需求完整性"精简**:CR 意图 section 有标题 + 验收条件即通过,不做深度复杂度-意图匹配分析 +3. **依赖短路不变**:已有的 depends-on 逻辑照常执行,与裁剪叠加生效 + +### 裁剪不适用的场景 + +- CR 复杂度 ≥ M → 全量 Gate 1 +- Gate 2 → 始终全量(意图一致性 + 对抗审查不可裁剪) +- CR 类型为 hotfix → 全量 Gate 1(紧急修复需更严格验证) + +### 输出格式 + +裁剪版 Gate 1 输出时附加裁剪摘要:`Gate 1(裁剪版):N/M 检查执行,K 项裁剪(S-CR 变更范围外)` ## Gate 通过反思 diff --git a/skills/pace-feedback/SKILL.md b/skills/pace-feedback/SKILL.md index b37c68b..91bb2c3 100644 --- a/skills/pace-feedback/SKILL.md +++ b/skills/pace-feedback/SKILL.md @@ -1,9 +1,11 @@ --- -description: Use when user reports issues, shares feedback, or receives production alerts — "用户反馈", "线上问题", "生产问题", "告警", "改进建议", "新需求", "体验问题", "功能请求", "线上bug", "运维", "事件", "incident", "故障", "P0", "P1", "严重故障", "postmortem", "事后复盘". 
-allowed-tools: AskUserQuestion, Write, Read, Edit, Glob, Bash +description: Use when user reports issues, shares feedback, or receives production alerts — "用户反馈", "线上问题", "生产问题", "告警", "改进建议", "新需求", "体验问题", "功能请求", "线上bug", "运维", "事件", "incident", "故障", "P0", "P1", "严重故障", "postmortem", "事后复盘". NOT for code implementation or development (use /pace-dev). NOT for requirement changes (use /pace-change). +allowed-tools: AskUserQuestion, Read, Write, Edit, Glob, Bash argument-hint: "[report <问题描述>] 或 [incident open/close/timeline/list] 或 [反馈描述]" model: sonnet disable-model-invocation: true +context: fork +agent: pace-engineer --- # /pace-feedback — 反馈收集与事件处理 diff --git a/skills/pace-guard/SKILL.md b/skills/pace-guard/SKILL.md index 98fe0c0..1ddc0a5 100644 --- a/skills/pace-guard/SKILL.md +++ b/skills/pace-guard/SKILL.md @@ -1,6 +1,6 @@ --- description: Use when user wants to assess risks before development, check current risk status, analyze risk trends, or says "风险/预检/预分析/guard/risk/隐患/安全检查". Also auto-invoked during advance mode intent checkpoint for L/XL CRs. NOT for /pace-dev (implementation), NOT for /pace-review (quality gate), NOT for /pace-test (testing). 
-allowed-tools: Read, Glob, Grep, Write, Edit, Bash +allowed-tools: Read, Write, Edit, Glob, Grep, Bash model: sonnet argument-hint: "[scan|monitor|trends|report|resolve] [CR编号] [--full|--brief|--detail|--batch]" context: fork @@ -11,6 +11,15 @@ agent: pace-analyst 统一管理开发全生命周期的风险:从编码前的 Pre-flight 扫描,到开发中的实时监控,再到跨迭代的趋势分析——让风险可见、可追踪、可解决。风险评估覆盖 Epic 级别(Epic 范围风险影响其下所有 BR/PF/CR)。 +## 推荐使用流程 + +``` +编码前预检: scan(L/XL CR 意图检查点自动触发) +开发中监控: monitor(pace-pulse 周期性触发) +问题解决: resolve RISK-xxx mitigated +迭代回顾: trends → report +``` + ## 子命令 | 子命令 | 用途 | 输入 | 自动触发 | diff --git a/skills/pace-guard/guard-procedures-common.md b/skills/pace-guard/guard-procedures-common.md index b4e946a..15b0a29 100644 --- a/skills/pace-guard/guard-procedures-common.md +++ b/skills/pace-guard/guard-procedures-common.md @@ -39,19 +39,11 @@ **High 风险不可绕过人类确认**——这是铁律,与 Gate 3 审批不可绕过同级别。 -## 分层输出约定(SSOT) +## 分层输出约定 -所有子命令支持三级输出详细度,通过 `--brief` / `--detail` 参数控制: +> 三级输出详细度定义见 `knowledge/output-guide.md §分层输出约定`(SSOT)。 -| 层级 | 触发 | 内容 | 适用场景 | -|------|------|------|---------| -| 简要 | `--brief` | 1 行核心结论(风险计数、等级、下一步建议) | 自动化消费、快速确认 | -| 标准 | (默认) | 结构化表格 + 汇总 + 建议(各规程定义的输出格式) | 日常使用 | -| 详细 | `--detail` | 完整输出含历史对比、缓解建议展开、全维度矩阵 | 深度审查、首次使用 | - -**自动升级规则**:各子命令根据上下文自动提升输出层级(具体规则见各子命令规程文件的"自动升级规则"表)。 - -**简要行统一风格**:`[子命令]:关键指标1 · 指标2 · 下一步/趋势` +**pace-guard 自动升级规则**:各子命令根据上下文自动提升输出层级(具体规则见各子命令规程文件的"自动升级规则"表)。 ## 新项目风险扫描兜底 diff --git a/skills/pace-guard/guard-procedures-scan.md b/skills/pace-guard/guard-procedures-scan.md index c8b384b..ed3bfa1 100644 --- a/skills/pace-guard/guard-procedures-scan.md +++ b/skills/pace-guard/guard-procedures-scan.md @@ -24,8 +24,8 @@ **优先使用脚本**(确定性正则匹配,替代 LLM 逐行模式识别): ``` -Bash: git diff main...HEAD | node ${CLAUDE_PLUGIN_ROOT}/scripts/security-scan.mjs -# 或指定 CR:node ${CLAUDE_PLUGIN_ROOT}/scripts/security-scan.mjs --cr CR-001 .devpace +Bash: git diff main...HEAD | node ${CLAUDE_SKILL_DIR}/scripts/security-scan.mjs +# 或指定 CR:node ${CLAUDE_SKILL_DIR}/scripts/security-scan.mjs 
--cr CR-001 .devpace ``` 脚本输出 JSON `{ findings[], summary: { total, high, medium }, scanned_files }`。findings 为空时跳过 Layer 2 报告。脚本不可用时降级为以下 LLM 手动扫描: diff --git a/scripts/security-scan.mjs b/skills/pace-guard/scripts/security-scan.mjs similarity index 92% rename from scripts/security-scan.mjs rename to skills/pace-guard/scripts/security-scan.mjs index 6b7b6e9..8920ca4 100755 --- a/scripts/security-scan.mjs +++ b/skills/pace-guard/scripts/security-scan.mjs @@ -3,9 +3,9 @@ * OWASP security pattern scanner for git diff output. * * Usage: - * git diff HEAD~1 | node scripts/security-scan.mjs - * node scripts/security-scan.mjs --cr CR-001 - * node scripts/security-scan.mjs --files src/auth.js,src/db.js + * git diff HEAD~1 | node skills/pace-guard/scripts/security-scan.mjs + * node skills/pace-guard/scripts/security-scan.mjs --cr CR-001 + * node skills/pace-guard/scripts/security-scan.mjs --files src/auth.js,src/db.js * * Scans new/modified lines for 6 OWASP risk categories. * Output: JSON { findings[], summary: { total, high, medium }, scanned_files } @@ -15,7 +15,7 @@ import { readFileSync, existsSync } from 'node:fs'; import { join } from 'node:path'; -import { execSync } from 'node:child_process'; +import { execFileSync } from 'node:child_process'; import { createInterface } from 'node:readline'; // ── OWASP Pattern Registry ────────────────────────────────────────── @@ -66,7 +66,7 @@ const PATTERNS = [ // A01: Broken Access Control { category: 'A01', name: 'Path Traversal', severity: 'Medium', patterns: [ - /\.\.\/|\.\.\\|\.\.[/\\]/, + /(?:readFile|readFileSync|open|fopen|createReadStream|access|stat)\s*\(.*\.\.\//i, /(?:readFile|readFileSync|open|fopen)\s*\(.*(?:req|input|param|body|query)/i, ] }, @@ -201,7 +201,11 @@ function getCrDiff(id, devDir) { const branchMatch = content.match(/\*\*分支\*\*[::]\s*(.+)/); if (!branchMatch) return null; const branch = branchMatch[1].trim(); - return execSync(`git diff main...${branch} 2>/dev/null || git diff HEAD~5`, { encoding: 
'utf-8', timeout: 15000 }); + try { + return execFileSync('git', ['diff', `main...${branch}`], { encoding: 'utf-8', timeout: 15000 }); + } catch { + return null; // Branch not found or diff failed — do not fallback to unrelated diff + } } catch { return null; } } diff --git a/skills/pace-init/SKILL.md b/skills/pace-init/SKILL.md index 2ba8134..5efaa5c 100644 --- a/skills/pace-init/SKILL.md +++ b/skills/pace-init/SKILL.md @@ -1,6 +1,6 @@ --- description: Use when user says "初始化", "pace-init", "开始追踪", "初始化研发管理", "新项目", "项目管理", "set up devpace", "健康检查 devpace", "重置 devpace", "预览初始化", or wants to set up, verify, or reset project development tracking. NOT for current progress overview (use /pace-status) or starting development (use /pace-dev). -allowed-tools: AskUserQuestion, Write, Read, Edit, Glob, Bash +allowed-tools: AskUserQuestion, Read, Write, Edit, Glob, Bash argument-hint: "[项目名称] [full] [--from <路径>...] [--import-insights <路径>] [--verify [--fix]] [--dry-run] [--reset [--keep-insights]] [--export-template] [--from-template <路径>] [--interactive] [--lite]" model: sonnet disable-model-invocation: true @@ -10,7 +10,7 @@ hooks: tool_name: "Write|Edit" hooks: - type: command - command: "${CLAUDE_PLUGIN_ROOT}/hooks/pace-init-scope-check.mjs" + command: "${CLAUDE_PLUGIN_ROOT}/hooks/skill/pace-init-scope-check.mjs" timeout: 5 --- diff --git a/skills/pace-init/init-procedures-verify.md b/skills/pace-init/init-procedures-verify.md index 4271ee7..85598ba 100644 --- a/skills/pace-init/init-procedures-verify.md +++ b/skills/pace-init/init-procedures-verify.md @@ -15,7 +15,7 @@ **优先使用脚本**(确定性校验,比逐文件手动检查更快更可靠): ``` -Bash: node ${CLAUDE_PLUGIN_ROOT}/scripts/validate-schema.mjs .devpace +Bash: node ${CLAUDE_SKILL_DIR}/scripts/validate-schema.mjs .devpace ``` 脚本覆盖 state/project/CR/PF/BR 五种文件类型的结构化校验,输出 JSON `{ valid, total, errors, warnings, results[] }`。脚本通过后,仅需对脚本未覆盖的文件类型(rules/、iterations/、releases/、integrations/、metrics/、CLAUDE.md)进行手动校验。 diff --git 
a/scripts/validate-schema.mjs b/skills/pace-init/scripts/validate-schema.mjs similarity index 91% rename from scripts/validate-schema.mjs rename to skills/pace-init/scripts/validate-schema.mjs index a2e8d95..51db233 100755 --- a/scripts/validate-schema.mjs +++ b/skills/pace-init/scripts/validate-schema.mjs @@ -3,9 +3,9 @@ * Schema validation engine for .devpace/ files. * * Usage: - * node scripts/validate-schema.mjs # validate all known files - * node scripts/validate-schema.mjs --type cr # validate all CR files - * node scripts/validate-schema.mjs --file # validate a specific file + * node skills/pace-init/scripts/validate-schema.mjs # validate all known files + * node skills/pace-init/scripts/validate-schema.mjs --type cr # validate all CR files + * node skills/pace-init/scripts/validate-schema.mjs --file # validate a specific file * * Supported file types: cr, state, project, pf, br * @@ -68,7 +68,7 @@ function discoverFiles(devDir, typeFilter) { const backlog = join(devDir, 'backlog'); if (existsSync(backlog)) { for (const f of readdirSync(backlog)) { - if (/^CR-\d{3}\.md$/.test(f)) targets.push({ path: join(backlog, f), type: 'cr' }); + if (/^CR-\d{3,}\.md$/.test(f)) targets.push({ path: join(backlog, f), type: 'cr' }); } } } @@ -77,7 +77,7 @@ function discoverFiles(devDir, typeFilter) { const features = join(devDir, 'features'); if (existsSync(features)) { for (const f of readdirSync(features)) { - if (/^PF-\d{3}\.md$/.test(f)) targets.push({ path: join(features, f), type: 'pf' }); + if (/^PF-\d{3,}\.md$/.test(f)) targets.push({ path: join(features, f), type: 'pf' }); } } } @@ -86,7 +86,7 @@ function discoverFiles(devDir, typeFilter) { const reqs = join(devDir, 'requirements'); if (existsSync(reqs)) { for (const f of readdirSync(reqs)) { - if (/^BR-\d{3}\.md$/.test(f)) targets.push({ path: join(reqs, f), type: 'br' }); + if (/^BR-\d{3,}\.md$/.test(f)) targets.push({ path: join(reqs, f), type: 'br' }); } } } @@ -98,9 +98,9 @@ function detectFileType(filePath) 
{ const name = basename(filePath); if (name === 'state.md') return 'state'; if (name === 'project.md') return 'project'; - if (/^CR-\d{3}\.md$/.test(name)) return 'cr'; - if (/^PF-\d{3}\.md$/.test(name)) return 'pf'; - if (/^BR-\d{3}\.md$/.test(name)) return 'br'; + if (/^CR-\d{3,}\.md$/.test(name)) return 'cr'; + if (/^PF-\d{3,}\.md$/.test(name)) return 'pf'; + if (/^BR-\d{3,}\.md$/.test(name)) return 'br'; return null; } @@ -190,7 +190,7 @@ const BR_STATUSES = ['待开始', '进行中', '已完成', '暂停']; const RULES = { // ── CR rules ───────────────────────────────────────────────────── cr: [ - (c, r, p) => checkFileNaming(p, r, /^CR-\d{3}\.md$/, 'CR-xxx.md'), + (c, r, p) => checkFileNaming(p, r, /^CR-\d{3,}\.md$/, 'CR-xxx.md'), (c, r) => requireTitle(c, r), (c, r) => requireField(c, r, 'ID'), (c, r) => requireField(c, r, '状态'), @@ -248,8 +248,8 @@ const RULES = { (c, r) => { // ID format check: CR-xxx const match = c.match(/^- \*\*ID\*\*[::]\s*(.+)$/m); - if (match && !/^CR-\d{3}$/.test(match[1].trim())) { - r.errors.push(`Invalid ID format: "${match[1].trim()}". Expected: CR-xxx (3 digits)`); + if (match && !/^CR-\d{3,}$/.test(match[1].trim())) { + r.errors.push(`Invalid ID format: "${match[1].trim()}". 
Expected: CR-xxx (3+ digits)`); } }, ], @@ -308,7 +308,7 @@ const RULES = { }, (c, r) => { // CR emoji status consistency in tree view - const crRefs = c.matchAll(/CR-(\d{3})\s*(🔄|✅|⏳|🚀|⏸️)?/g); + const crRefs = c.matchAll(/CR-(\d{3,})\s*(🔄|✅|⏳|🚀|⏸️)?/g); for (const m of crRefs) { if (!m[2]) { r.warnings.push(`CR-${m[1]} in tree view missing status emoji`); @@ -320,12 +320,12 @@ const RULES = { // ── PF rules ───────────────────────────────────────────────────── pf: [ - (c, r, p) => checkFileNaming(p, r, /^PF-\d{3}\.md$/, 'PF-xxx.md'), + (c, r, p) => checkFileNaming(p, r, /^PF-\d{3,}\.md$/, 'PF-xxx.md'), (c, r) => requireTitle(c, r), (c, r) => { // Title should contain PF-xxx const titleMatch = c.match(/^# (.+)$/m); - if (titleMatch && !/PF-\d{3}/.test(titleMatch[1])) { + if (titleMatch && !/PF-\d{3,}/.test(titleMatch[1])) { r.warnings.push('Title does not contain PF-xxx identifier'); } }, @@ -342,12 +342,12 @@ const RULES = { // ── BR rules ───────────────────────────────────────────────────── br: [ - (c, r, p) => checkFileNaming(p, r, /^BR-\d{3}\.md$/, 'BR-xxx.md'), + (c, r, p) => checkFileNaming(p, r, /^BR-\d{3,}\.md$/, 'BR-xxx.md'), (c, r) => requireTitle(c, r), (c, r) => { // Title should contain BR-xxx const titleMatch = c.match(/^# (.+)$/m); - if (titleMatch && !/BR-\d{3}/.test(titleMatch[1])) { + if (titleMatch && !/BR-\d{3,}/.test(titleMatch[1])) { r.warnings.push('Title does not contain BR-xxx identifier'); } }, diff --git a/skills/pace-learn/SKILL.md b/skills/pace-learn/SKILL.md index 174ea67..f82c92d 100644 --- a/skills/pace-learn/SKILL.md +++ b/skills/pace-learn/SKILL.md @@ -1,11 +1,11 @@ --- -description: Use when user says "/pace-learn" for knowledge base management, or auto-invoked after CR merge, gate failure recovery, or human rejection. +description: Use when user says "/pace-learn", "经验", "知识库", "pattern", "lessons learned", "学到了什么", or auto-invoked after CR merge, gate failure recovery, or human rejection. 
allowed-tools: Read, Write, Edit, Glob, Grep model: sonnet argument-hint: "[note|list|stats|export] [参数]" --- -# pace-learn — 经验积累与知识管理 +# /pace-learn — 经验积累与知识管理 devpace 的学习引擎。双模式运行: @@ -31,6 +31,7 @@ devpace 的学习引擎。双模式运行: | CR merged | `merged` | 成功模式 | 可复用的检查项、高效的工作路径 | | Gate fail | `gate1_fail` / `gate2_fail` | 失败教训 | 检查项阈值调整、Claude 盲区识别 | | 人类打回 | `rejected` | 理解差距 | 意图理解偏差模式、审查标准校准 | +| 挣扎信号 | `struggle` | 环境缺陷识别 | Skill/procedures/Schema 改进建议 | 触发源由 `hooks/post-cr-update.mjs` 检测并输出 `devpace:learn-trigger` 提醒。 diff --git a/skills/pace-learn/learn-procedures.md b/skills/pace-learn/learn-procedures.md index 3467a5e..1a87a72 100644 --- a/skills/pace-learn/learn-procedures.md +++ b/skills/pace-learn/learn-procedures.md @@ -37,6 +37,7 @@ | 风险解决记录 | `.devpace/risks/` 中与该 CR 关联的风险文件 | 风险预测准确性、缓解措施有效性 | | diff 统计 | `git diff --stat` 输出 | 复杂度校准——实际变更规模 vs 预估 | | 变更管理记录 | iterations/current.md 变更记录表中与该 CR 相关的条目 | 变更模式、范围蔓延信号 | +| 挣扎信号 | 事件表中 gate1_fail/gate2_fail 次数、pulse-counter stuck 检测记录、自修复循环次数 | 环境缺陷定位——哪个 Skill/procedure/Schema 导致了困难 | ### 提取规则 @@ -48,6 +49,33 @@ - L/XL 复杂度(>7 checkpoint)→ 提炼最多 3 个 pattern(多维度提取) 4. 
每个 pattern 必须有明确的证据支撑(不可凭感觉提炼) +### 挣扎信号提取(struggle 触发) + +CR merged 时,如果满足以下任一条件,附加挣扎信号提取(与成功模式提取叠加执行): +- Gate 1 自修复循环 ≥ 3 次(事件表中 gate1_fail 计数) +- 同一 CR 文件写入 ≥ 5 次且状态未变(事件表中 stuck-warning 或 pulse-counter 记录) +- Gate 2 对抗审查发现 ≥ 3 个问题(事件表备注) + +**提取方向**(与其他触发源不同): +- 不提取"代码怎么改",提取"环境哪里不足" +- pattern 类型标记为 `harness-improvement` +- 描述格式:`[Skill/procedure/Schema 名称] 在 [场景] 下导致 [困难类型],建议 [改进方向]` + +示例: +``` +标题:Gate 1 lint 修复循环过多 +类型:harness-improvement +标签:[gate, lint, efficiency] +描述:dev-procedures-gate 未指导 Claude 在首次 lint 前执行 auto-fix 命令,导致连续 3 次自修复 +建议:在 Gate 1 流程中增加"先执行 auto-fix 再跑 lint"的步骤 +证据:CR-005 事件表 gate1_fail ×3,均为 lint 相关 +``` + +**规则**: +- 延迟提取:仅在 CR merged 后回顾性提取,不在挣扎发生时干扰工作 +- 最多 1 个 harness-improvement pattern/CR(聚焦最显著的环境缺陷) +- 与成功模式 pattern 独立计数(不占自适应提取的 1-3 个名额) + ## Step 3:对比与积累(统一写入管道) 此步骤是 insights.md 的**唯一写入路径**。处理两类输入: diff --git a/skills/pace-next/next-procedures.md b/skills/pace-next/next-procedures.md index ef71f4e..6401e3a 100644 --- a/skills/pace-next/next-procedures.md +++ b/skills/pace-next/next-procedures.md @@ -20,7 +20,7 @@ **优先使用脚本**(确定性评估 24 个信号条件,替代 LLM 逐文件 Glob+Grep): ``` -Bash: node ${CLAUDE_PLUGIN_ROOT}/scripts/collect-signals.mjs .devpace [--role <角色>] [--cache-read] +Bash: node ${CLAUDE_SKILL_DIR}/scripts/collect-signals.mjs .devpace [--role <角色>] [--cache-read] ``` 脚本输出 JSON `{ triggered[], top_signal, role, cr_summary }`。`--cache-read` 自动检查 5 分钟缓存。脚本通过后直接跳到 Step 3,使用 `triggered` 数组和 `top_signal` 做优先级决策。 diff --git a/scripts/collect-signals.mjs b/skills/pace-next/scripts/collect-signals.mjs similarity index 92% rename from scripts/collect-signals.mjs rename to skills/pace-next/scripts/collect-signals.mjs index 8acc639..3ebca9a 100755 --- a/scripts/collect-signals.mjs +++ b/skills/pace-next/scripts/collect-signals.mjs @@ -2,23 +2,24 @@ /** * Signal collection engine for devpace. 
* - * Evaluates 24 signal conditions (S1-S24) from 11 data sources, + * Evaluates signal conditions (S1-S22, S24-S25; S23 reserved) from 11 data sources, * replaces LLM-driven Glob+Grep+reasoning with deterministic checks. * * Usage: - * node scripts/collect-signals.mjs - * node scripts/collect-signals.mjs --role pm - * node scripts/collect-signals.mjs --cache - * node scripts/collect-signals.mjs --cache-read + * node skills/pace-next/scripts/collect-signals.mjs + * node skills/pace-next/scripts/collect-signals.mjs --role pm + * node skills/pace-next/scripts/collect-signals.mjs --cache + * node skills/pace-next/scripts/collect-signals.mjs --cache-read * * Output: JSON { triggered[], top_signal, role, cr_summary, timestamp } * - * Dependencies: Node.js only. Reuses extract-cr-metadata.mjs for CR scanning. + * Dependencies: Node.js only. Reuses skills/scripts/extract-cr-metadata.mjs for CR scanning. */ import { readFileSync, writeFileSync, readdirSync, existsSync, statSync } from 'node:fs'; import { join } from 'node:path'; import { execFileSync } from 'node:child_process'; +import { fileURLToPath } from 'node:url'; // ── Constants ──────────────────────────────────────────────────────── const CACHE_TTL_MS = 5 * 60 * 1000; // 5 minutes @@ -40,7 +41,7 @@ const SIGNAL_GROUP_MAP = { S9: 'strategic', S10: 'strategic', S11: 'strategic', S12: 'strategic', S13: 'growth', S14: 'growth', S15: 'growth', S16: 'growth', S17: 'growth', S18: 'growth', S19: 'growth', S20: 'idle', - S21: 'growth', S22: 'growth', S24: 'growth', S25: 'blocking', + S21: 'growth', S22: 'growth', /* S23: reserved */ S24: 'growth', S25: 'blocking', }; // ── CLI ────────────────────────────────────────────────────────────── @@ -152,9 +153,9 @@ function collectData(devDir) { function loadCrMetadata(devDir) { try { - const scriptDir = new URL('.', import.meta.url).pathname; + const scriptDir = fileURLToPath(new URL('.', import.meta.url)); const output = execFileSync( - 'node', [join(scriptDir, 
'extract-cr-metadata.mjs'), devDir], + 'node', [join(scriptDir, '..', '..', 'scripts', 'extract-cr-metadata.mjs'), devDir], { encoding: 'utf-8', timeout: 10000 } ); return JSON.parse(output); @@ -168,19 +169,35 @@ function manualScanCrs(devDir) { const backlog = join(devDir, 'backlog'); if (!existsSync(backlog)) return []; const crs = []; - for (const f of readdirSync(backlog).filter(f => /^CR-\d{3}\.md$/.test(f))) { + for (const f of readdirSync(backlog).filter(f => /^CR-\d{3,}\.md$/.test(f))) { try { const content = readFileSync(join(backlog, f), 'utf-8'); const status = extractField(content, '状态') || ''; const type = extractField(content, '类型') || 'feature'; const pf = extractField(content, '产品功能') || ''; const blocked = extractField(content, '阻塞') || ''; - crs.push({ id: f.replace('.md', ''), title: extractTitle(content), status, type, breaking: false, pf, blocked }); + const events = extractEventsBasic(content); + crs.push({ id: f.replace('.md', ''), title: extractTitle(content), status, type, breaking: false, pf, blocked, events }); } catch { /* skip unreadable */ } } return crs; } +function extractEventsBasic(content) { + const events = []; + const tableMatch = content.match(/## 事件\s*\n+\|[^\n]+\n\|[-|\s]+\n([\s\S]*?)(?=\n## |$)/); + if (!tableMatch) return events; + const rows = tableMatch[1].trim().split('\n'); + for (const row of rows) { + if (!row.startsWith('|')) continue; + const cells = row.split('|').map(c => c.trim()).filter(Boolean); + if (cells.length >= 3) { + events.push({ date: cells[0], type: cells[1], event: cells[1], actor: cells[2] || '', note: cells[3] || '' }); + } + } + return events; +} + function scanReleases(devDir) { const dir = join(devDir, 'releases'); if (!existsSync(dir)) return []; @@ -244,7 +261,7 @@ function readProjectMeta(devDir) { try { const content = readFileSync(p, 'utf-8'); // Count PFs in tree view - const pfMatches = content.matchAll(/PF-\d{3}/g); + const pfMatches = content.matchAll(/PF-\d{3,}/g); const pfIds = new 
Set(); for (const m of pfMatches) pfIds.add(m[0]); meta.pfCount = pfIds.size; @@ -252,7 +269,7 @@ function readProjectMeta(devDir) { // PFs without CRs: PF lines that don't have CR-xxx nearby const lines = content.split('\n'); for (const line of lines) { - const pfMatch = line.match(/PF-\d{3}/); + const pfMatch = line.match(/PF-\d{3,}/); if (pfMatch && !line.includes('CR-')) { meta.pfWithoutCr++; } @@ -297,7 +314,7 @@ function scanEpics(devDir) { try { const content = readFileSync(join(dir, f), 'utf-8'); const status = extractField(content, '状态') || ''; - const hasBrs = /BR-\d{3}/.test(content); + const hasBrs = /BR-\d{3,}/.test(content); epics.push({ id: f.replace('.md', ''), status, hasBrs, title: extractTitle(content) }); } catch { /* skip */ } } @@ -355,7 +372,7 @@ function evaluateSignals(data, devDir) { if (pausedCrs.length > 0) { const resumable = pausedCrs.filter(cr => { if (!cr.blocked) return true; // No blocking reason recorded → consider resumable - const blockerRef = cr.blocked.match(/CR-\d{3}/)?.[0]; + const blockerRef = cr.blocked.match(/CR-\d{3,}/)?.[0]; if (!blockerRef) return true; // Non-CR blocking reason → can't determine programmatically, include const blocker = crs.find(c => c.id === blockerRef); return blocker && (blocker.status === 'merged' || blocker.status === 'released'); @@ -478,7 +495,7 @@ function evaluateSignals(data, devDir) { // S21: 跨 CR 依赖阻塞(CR-B 非 merged/developing 超 3 天) for (const cr of crs) { if (cr.blocked) { - const blockedBy = cr.blocked.match(/CR-\d{3}/)?.[0]; + const blockedBy = cr.blocked.match(/CR-\d{3,}/)?.[0]; if (blockedBy) { const blocker = crs.find(c => c.id === blockedBy); if (blocker && blocker.status !== 'merged' && blocker.status !== 'released' && blocker.status !== 'developing') { @@ -541,6 +558,13 @@ function applyRoleReorder(triggered, role) { const promotions = ROLE_PROMOTIONS[role].promote; const groupOrder = [...SIGNAL_GROUPS, 'idle']; + // Mark promoted signals before sorting (avoid side effects inside 
comparator) + for (const s of triggered) { + if (promotions.includes(s.id) && s.group !== 'blocking') { + s.role_promoted = true; + } + } + return [...triggered].sort((a, b) => { let aGroupIdx = groupOrder.indexOf(a.group); let bGroupIdx = groupOrder.indexOf(b.group); @@ -550,10 +574,8 @@ function applyRoleReorder(triggered, role) { if (b.group === 'blocking') bGroupIdx = 0; // Promote: move up by 1 group level - const aPromoted = promotions.includes(a.id) && a.group !== 'blocking'; - const bPromoted = promotions.includes(b.id) && b.group !== 'blocking'; - if (aPromoted) { aGroupIdx = Math.max(0, aGroupIdx - 1); a.role_promoted = true; } - if (bPromoted) { bGroupIdx = Math.max(0, bGroupIdx - 1); b.role_promoted = true; } + if (a.role_promoted) aGroupIdx = Math.max(0, aGroupIdx - 1); + if (b.role_promoted) bGroupIdx = Math.max(0, bGroupIdx - 1); return aGroupIdx - bGroupIdx; }); diff --git a/skills/pace-plan/SKILL.md b/skills/pace-plan/SKILL.md index 02d469c..dac64ce 100644 --- a/skills/pace-plan/SKILL.md +++ b/skills/pace-plan/SKILL.md @@ -1,7 +1,8 @@ --- description: Use when user says "规划迭代", "下个迭代做什么", "迭代规划", "计划", "排期", "安排", "sprint", "pace-plan", "调整迭代范围", "迭代调整", "迭代健康", or at iteration boundary when planning next iteration scope. NOT for PF-level requirement changes (use /pace-change). -allowed-tools: AskUserQuestion, Write, Read, Edit, Glob +allowed-tools: AskUserQuestion, Read, Write, Edit, Glob argument-hint: "[next|close|adjust|health]" +model: sonnet context: fork agent: pace-pm --- diff --git a/skills/pace-pulse/SKILL.md b/skills/pace-pulse/SKILL.md index 4bbcdd1..ab705c7 100644 --- a/skills/pace-pulse/SKILL.md +++ b/skills/pace-pulse/SKILL.md @@ -1,7 +1,7 @@ --- -description: Auto-invoked during advance mode after 5 checkpoints or 30+ minutes on same CR, at session start/end, or when rhythm anomalies are detected. 
+description: Auto-invoked during advance mode after extended work on same CR, at session start/end, or when rhythm anomalies are detected. user-invocable: false -allowed-tools: Read, Glob, Write +allowed-tools: Read, Write, Glob model: haiku --- @@ -25,7 +25,7 @@ Claude 自动调用的健康度检查 Skill,不暴露给用户。在推进模 | 触发场景 | 加载文件 | |---------|---------| | 脉搏检查(§10,每 5 checkpoint) | `pulse-procedures-core.md` | -| 会话开始(§1) | `pulse-procedures-session-start.md` + `pulse-procedures-snooze.md` | +| 会话开始(§1) | `pulse-procedures-session-start.md` + `pulse-procedures-snooze.md` + `pulse-procedures-gc.md` | | 会话结束(§6) | `pulse-procedures-session-end.md` | | CR merged 后 Snooze 检测(§11) | `pulse-procedures-snooze.md` | diff --git a/skills/pace-pulse/pulse-procedures-gc.md b/skills/pace-pulse/pulse-procedures-gc.md new file mode 100644 index 0000000..79d91ef --- /dev/null +++ b/skills/pace-pulse/pulse-procedures-gc.md @@ -0,0 +1,72 @@ +# pace-pulse GC 模式规程 + +> **职责**:定义会话开始时的项目基础设施健康度扫描(Garbage Collection)。由 `SKILL.md` 路由表指定加载。 + +## 与核心脉搏检查的关系 + +核心脉搏检查(`pulse-procedures-core.md`)面向**运行时研发节奏**(CR 滞留、Gate 失败、迭代进度)。GC 模式面向**项目基础设施健康度**——文档陈旧、Schema 漂移、孤立 CR 等渐进积累的问题。两者互补。 + +## 设计原则 + +借鉴 Harness Engineering 的 Garbage Collection 理念:Agent 复制仓库中已有的模式——包括次优模式,导致渐进漂移。周期性扫描是对抗熵增的机制。 + +## 触发时机 + +会话开始时(与 session-start 信号评估同步执行)。不在推进模式中执行,避免增加 overhead。 + +## GC 扫描项 + +### 1. 文档陈旧度 + +**检测方式**:对 `.devpace/` 下关键文件执行 `git log -1 --format=%cr`,获取最后修改距今时间。 + +**扫描文件**: +- `project.md`(项目定义) +- `context.md`(技术约定) +- `rules/workflow.md`(工作流规则) + +**阈值**:最后修改 > 30 天 + +**建议**:`"🧹 项目文档 [文件名] 已超 30 天未更新,建议检查是否需要刷新"` + +**规则**: +- `.devpace/` 不存在时跳过 +- 文件不存在时跳过该文件(不是所有项目都有 context.md) +- 多个文件同时过期时合并为 1 条建议(列出文件名) + +### 2. 
Schema 结构漂移 + +**检测方式**:读取 `backlog/` 中最近修改的 3 个 CR 文件,检查是否包含 `knowledge/_schema/cr-format.md` §0 速查卡片中的必含章节。 + +**必含章节**(最小检查集): +- 元信息行:`**ID**`、`**状态**` +- `## 意图` 章节 +- `## 事件` 章节 + +**阈值**:3 个 CR 中有 ≥ 2 个缺失上述任一必含章节 + +**建议**:`"🧹 最近 CR 存在 Schema 不一致(缺失 [章节名]),建议检查 CR 创建流程或 Schema 定义"` + +**规则**: +- backlog/ 不存在或 CR 数 < 3 时跳过 +- 仅检查结构性缺失,不做语义分析(model: haiku 执行) + +### 3. 孤立 CR 检测 + +**检测方式**:扫描 `backlog/` 中状态为 developing 或 verifying 的 CR,读取事件表最后一条事件的时间戳,计算距今天数。 + +**阈值**:最后事件 > 7 天 + +**建议**:`"🧹 CR-xxx 已滞留 [N] 天未推进,建议 /pace-change pause 暂停或继续推进"` + +**规则**: +- paused 状态 CR 不检测(已是暂停态) +- created 状态 CR 不检测(尚未开始) +- 多个孤立 CR 合并为 1 条建议(列出 CR 编号) + +## 输出规则 + +- **独立配额**:GC 建议每会话最多 1 条,不占用核心 pulse 的 ≤3 条配额 +- **最低优先级**:当 session-start 有更高优先级信号(如 Review 积压、风险积压)时,GC 建议降级为精简列表中的一项 +- **前缀**:`🧹`(区别于核心 pulse 的 `💡` 和正向反馈的 `✓`) +- **3 项扫描均无异常时**:完全静默,不输出 diff --git a/skills/pace-pulse/pulse-procedures-session-start.md b/skills/pace-pulse/pulse-procedures-session-start.md index 436a552..a265987 100644 --- a/skills/pace-pulse/pulse-procedures-session-start.md +++ b/skills/pace-pulse/pulse-procedures-session-start.md @@ -42,6 +42,7 @@ | 9 | dashboard.md 最近更新 > 14 天 + MoS 有未勾选项 | "距上次回顾已超 2 周——`/pace-retro`" | | 10 | Snooze 条目触发条件满足(CR 事件表/迭代变更记录) | "之前延后的变更触发条件已满足(详情见 `pulse-procedures-snooze.md`)" | | 11 | 用户对话含运维关键词 + pace-feedback 未在本会话使用 | "检测到生产问题描述——`/pace-feedback report`" | +| 12 | GC 扫描发现异常(文档陈旧/Schema 漂移/孤立 CR) | 详见 `pulse-procedures-gc.md`(独立配额,最多 1 条) | ## 缓存写入 diff --git a/skills/pace-release/SKILL.md b/skills/pace-release/SKILL.md index 1f5c9a6..a6a7e58 100644 --- a/skills/pace-release/SKILL.md +++ b/skills/pace-release/SKILL.md @@ -1,6 +1,6 @@ --- -description: Use when user says "发布", "部署", "上线", "release", "pace-release", or wants to create, deploy, or close a release. -allowed-tools: AskUserQuestion, Write, Read, Edit, Glob, Bash +description: Use when user says "发布", "部署", "上线", "release", "pace-release", or wants to create, deploy, or close a release. 
NOT for CI/CD pipeline management (use /pace-sync). NOT for code implementation (use /pace-dev). +allowed-tools: AskUserQuestion, Read, Write, Edit, Glob, Bash argument-hint: "[create|deploy|verify|close|full|status|status history|changelog|version|tag|notes --role biz|ops|pm|branch|rollback]" model: sonnet disable-model-invocation: true @@ -14,6 +14,16 @@ agent: pace-engineer 管理 Release 生命周期:收集候选变更 → 创建 Release → 追踪部署 → 验证 → 关闭。支持 Changelog、版本 bump、Git Tag、GitHub Release、Release Notes 和发布分支管理。 +## 推荐使用流程 + +``` +标准发布: create → deploy → verify → close +快速查看: status(或无参数启动引导向导) +单步操作: changelog / version / tag / notes(独立使用) +回滚处理: rollback → 创建 hotfix CR → 新 create +历史回顾: status history +``` + ## 输入 $ARGUMENTS: diff --git a/skills/pace-release/release-procedures-version.md b/skills/pace-release/release-procedures-version.md index db7229d..8ed3420 100644 --- a/skills/pace-release/release-procedures-version.md +++ b/skills/pace-release/release-procedures-version.md @@ -22,7 +22,7 @@ 1. 执行版本推断脚本: ``` - Bash: node ${CLAUDE_PLUGIN_ROOT}/scripts/infer-version-bump.mjs .devpace [当前版本号] + Bash: node ${CLAUDE_SKILL_DIR}/scripts/infer-version-bump.mjs .devpace [当前版本号] ``` - 脚本自动扫描 merged 且未关联 Release 的 CR,检测 breaking/feature/defect 信号 - 输出 JSON:`{ current, suggested, bump_type, reasoning[], candidates[] }` diff --git a/scripts/infer-version-bump.mjs b/skills/pace-release/scripts/infer-version-bump.mjs similarity index 81% rename from scripts/infer-version-bump.mjs rename to skills/pace-release/scripts/infer-version-bump.mjs index 77cca5f..fa2c7c5 100755 --- a/scripts/infer-version-bump.mjs +++ b/skills/pace-release/scripts/infer-version-bump.mjs @@ -3,7 +3,7 @@ * Infer semantic version bump from CR metadata. * * Usage: - * node scripts/infer-version-bump.mjs [current-version] + * node skills/pace-release/scripts/infer-version-bump.mjs [current-version] * * Reads merged CRs not yet in a Release, analyzes for breaking/feature/defect, * outputs JSON with suggested version bump. 
@@ -26,11 +26,13 @@ import { readFileSync, existsSync } from 'node:fs'; import { join } from 'node:path'; import { execFileSync } from 'node:child_process'; +import { fileURLToPath } from 'node:url'; // ── Parse CLI args ─────────────────────────────────────────────────── const args = process.argv.slice(2); -const devpaceDir = args[0]; -const explicitVersion = args[1]; +const positional = args.filter(a => !a.startsWith('--')); +const devpaceDir = positional[0]; +const explicitVersion = positional[1]; if (!devpaceDir) { console.error('Usage: node infer-version-bump.mjs [current-version]'); @@ -38,17 +40,25 @@ if (!devpaceDir) { } // ── Step 1: Get candidate CRs (merged, no release) ────────────────── -const scriptDir = new URL('.', import.meta.url).pathname; +const scriptDir = fileURLToPath(new URL('.', import.meta.url)); let candidates; try { const output = execFileSync( 'node', - [join(scriptDir, 'extract-cr-metadata.mjs'), devpaceDir, '--status', 'merged', '--no-release'], + [join(scriptDir, '..', '..', 'scripts', 'extract-cr-metadata.mjs'), devpaceDir, '--status', 'merged', '--no-release'], { encoding: 'utf-8', timeout: 10000 } ); candidates = JSON.parse(output); } catch (err) { - console.error(`Error extracting CR metadata: ${err.message}`); + const errorOutput = { + current: null, + suggested: null, + bump_type: null, + reasoning: [`Error extracting CR metadata: ${err.message}`], + candidates: [], + error: err.message, + }; + console.log(JSON.stringify(errorOutput, null, 2)); process.exit(1); } @@ -135,8 +145,16 @@ function readCurrentVersion(devDir) { const versionFilePath = pathMatch[1].trim(); - // Determine project root (parent of .devpace) - const projectRoot = join(devDir, '..'); + // Determine project root: check for .git or fallback to parent of devpace dir + let projectRoot = join(devDir, '..'); + // Walk up to find .git (more reliable than assuming parent) + let candidate = projectRoot; + for (let i = 0; i < 5; i++) { + if (existsSync(join(candidate, 
'.git'))) { projectRoot = candidate; break; } + const parent = join(candidate, '..'); + if (parent === candidate) break; + candidate = parent; + } const absVersionPath = join(projectRoot, versionFilePath); if (!existsSync(absVersionPath)) return null; @@ -169,7 +187,9 @@ function readCurrentVersion(devDir) { * Bump a semver version by the given type. */ function bumpVersion(version, type) { - const parts = version.split('.').map(Number); + // Strip pre-release suffix (e.g., 1.0.0-beta.1 → 1.0.0) + const coreVersion = version.replace(/-.*$/, ''); + const parts = coreVersion.split('.').map(Number); if (parts.length !== 3 || parts.some(isNaN)) return null; switch (type) { diff --git a/skills/pace-retro/SKILL.md b/skills/pace-retro/SKILL.md index 8009c1d..6b71b68 100644 --- a/skills/pace-retro/SKILL.md +++ b/skills/pace-retro/SKILL.md @@ -1,7 +1,8 @@ --- -description: Use when user says "回顾", "复盘", "度量", "retro", "总结", "数据分析", "DORA", "质量报告", "交付效率", "度量报告", "趋势", "中期检查", "对比", "预测", "forecast", "能按时交付吗", "交付概率", "瓶颈", "pace-retro", or at iteration end when reviewing progress and metrics. +description: Use when user says "回顾", "复盘", "度量", "retro", "总结", "数据分析", "DORA", "质量报告", "交付效率", "度量报告", "趋势", "中期检查", "对比", "预测", "forecast", "能按时交付吗", "交付概率", "瓶颈", "pace-retro", or at iteration end when reviewing progress and metrics. NOT for next-step recommendations (use /pace-next). NOT for current status overview (use /pace-status). 
allowed-tools: Read, Write, Edit, Glob, Bash argument-hint: "[update|focus <维度>|compare|history|mid|accept|forecast]" +model: sonnet context: fork agent: pace-analyst --- diff --git a/skills/pace-retro/retro-procedures-update.md b/skills/pace-retro/retro-procedures-update.md index 306238f..2628a28 100644 --- a/skills/pace-retro/retro-procedures-update.md +++ b/skills/pace-retro/retro-procedures-update.md @@ -6,7 +6,7 @@ - **触发**:`/pace-retro update` - **流程**:Step 1 数据收集(优先使用脚本,见下方)→ Step 2 更新 dashboard.md → 输出变化反馈 -- **脚本采集**:`Bash: node ${CLAUDE_PLUGIN_ROOT}/scripts/compute-metrics.mjs .devpace`——输出 JSON 含 8 核心指标,直接用于 Step 2 dashboard 更新。脚本不可用时降级为共享规程(`retro-procedures-common.md`) +- **脚本采集**:`Bash: node ${CLAUDE_SKILL_DIR}/scripts/compute-metrics.mjs .devpace`——输出 JSON 含 8 核心指标,直接用于 Step 2 dashboard 更新。脚本不可用时降级为共享规程(`retro-procedures-common.md`) - **不生成报告**:不执行 Step 3-6 - **历史快照**:更新时追加到"度量趋势"表,不覆盖旧值 diff --git a/scripts/compute-metrics.mjs b/skills/pace-retro/scripts/compute-metrics.mjs similarity index 94% rename from scripts/compute-metrics.mjs rename to skills/pace-retro/scripts/compute-metrics.mjs index 037e821..754bff9 100755 --- a/scripts/compute-metrics.mjs +++ b/skills/pace-retro/scripts/compute-metrics.mjs @@ -5,18 +5,19 @@ * Computes 8 core indicators + forecast from .devpace/ data. * * Usage: - * node scripts/compute-metrics.mjs # all metrics - * node scripts/compute-metrics.mjs --scope iteration # iteration only - * node scripts/compute-metrics.mjs --scope forecast # forecast only + * node skills/pace-retro/scripts/compute-metrics.mjs # all metrics + * node skills/pace-retro/scripts/compute-metrics.mjs --scope iteration # iteration only + * node skills/pace-retro/scripts/compute-metrics.mjs --scope forecast # forecast only * * Output: JSON { metrics: {...}, forecast?: {...} } * - * Dependencies: Node.js only. Reuses extract-cr-metadata.mjs. + * Dependencies: Node.js only. Reuses skills/scripts/extract-cr-metadata.mjs. 
*/ import { readFileSync, readdirSync, existsSync, statSync } from 'node:fs'; import { join } from 'node:path'; import { execFileSync } from 'node:child_process'; +import { fileURLToPath } from 'node:url'; // ── CLI ────────────────────────────────────────────────────────────── const args = process.argv.slice(2); @@ -206,10 +207,13 @@ function computeForecast(crs, iter, devDir, metrics) { function loadCrs(devDir) { try { - const scriptDir = new URL('.', import.meta.url).pathname; - const output = execFileSync('node', [join(scriptDir, 'extract-cr-metadata.mjs'), devDir], { encoding: 'utf-8', timeout: 10000 }); + const scriptDir = fileURLToPath(new URL('.', import.meta.url)); + const output = execFileSync('node', [join(scriptDir, '..', '..', 'scripts', 'extract-cr-metadata.mjs'), devDir], { encoding: 'utf-8', timeout: 10000 }); return JSON.parse(output); - } catch { return []; } + } catch (err) { + console.error(`Warning: extract-cr-metadata failed, metrics will be incomplete: ${err.message}`); + return []; + } } function readIteration(devDir) { diff --git a/skills/pace-review/SKILL.md b/skills/pace-review/SKILL.md index 0500e9a..322deac 100644 --- a/skills/pace-review/SKILL.md +++ b/skills/pace-review/SKILL.md @@ -1,6 +1,6 @@ --- description: Use when user says "review", "审核", "帮我看看", "代码审查", "提交审核", "Gate 2", "提交审批", "pace-review", or when a change request reaches in_review state. NOT for running tests or acceptance verification (use /pace-test). 
-allowed-tools: Read, Write, Edit, Glob, Bash, AskUserQuestion +allowed-tools: AskUserQuestion, Read, Write, Edit, Glob, Bash argument-hint: "[<关键词>]" model: opus context: fork @@ -11,7 +11,7 @@ hooks: tool_name: "Write|Edit" hooks: - type: command - command: "${CLAUDE_PLUGIN_ROOT}/hooks/pace-review-scope-check.mjs" + command: "${CLAUDE_PLUGIN_ROOT}/hooks/skill/pace-review-scope-check.mjs" timeout: 5 --- diff --git a/skills/pace-role/SKILL.md b/skills/pace-role/SKILL.md index e92402f..047b71f 100644 --- a/skills/pace-role/SKILL.md +++ b/skills/pace-role/SKILL.md @@ -1,6 +1,6 @@ --- -description: Use when user wants to switch output perspective (视角切换), says "切换角色/视角", "以XX视角", "pace-role", "作为产品经理", "作为运维", "换个角度看", or wants to view project from a different role perspective. -allowed-tools: Read, Glob, Write +description: Use when user wants to switch output perspective (视角切换), says "切换角色/视角", "以XX视角", "pace-role", "作为产品经理", "作为运维", "换个角度看", or wants to view project from a different role perspective. NOT for project status overview (use /pace-status). NOT for understanding devpace concepts (use /pace-theory). +allowed-tools: Read, Write, Glob argument-hint: "[biz 业务视角|pm 产品视角|dev 开发视角|tester 测试视角|ops 运维视角|auto 自动推断|compare 多视角快照]" model: haiku --- @@ -36,6 +36,8 @@ $ARGUMENTS: 角色关注维度权威定义及跨 Skill 适配原则见 `role-procedures-dimensions.md`。 +> **隐式依赖**:`role-procedures-inference.md` 不在路由表中直接路由,但由 `rules/devpace-rules.md` §10 在运行时加载(自动推断模式的关键词映射权威源)。 + ## 输出 - 角色切换:确认信息(1-3 行,含相关性评估摘要) diff --git a/skills/pace-sync/SKILL.md b/skills/pace-sync/SKILL.md index e52bb53..80d80e0 100644 --- a/skills/pace-sync/SKILL.md +++ b/skills/pace-sync/SKILL.md @@ -1,7 +1,7 @@ --- -description: "Use when user wants to sync devpace state with external tools (GitHub/Linear/Jira), says '同步/sync/push/pull/关联 Issue/配置同步/setup/解除关联/unlink/创建 Issue/create/同步状态/status/CI/构建/build/pipeline/workflow/GitHub Actions', or /pace-sync. 
NOT for internal devpace state changes (use /pace-dev) or release operations (use /pace-release)" +description: Use when user wants to sync devpace state with external tools (GitHub/Linear/Jira), says "同步", "sync", "push", "pull", "关联 Issue", "配置同步", "setup", "解除关联", "unlink", "创建 Issue", "create", "同步状态", "status", "CI", "构建", "build", "pipeline", "workflow", "GitHub Actions", or /pace-sync. NOT for internal devpace state changes (use /pace-dev) or release operations (use /pace-release). argument-hint: "[子命令] [参数]" -allowed-tools: Read, Write, Edit, Glob, Grep, Bash, AskUserQuestion +allowed-tools: AskUserQuestion, Read, Write, Edit, Glob, Grep, Bash model: sonnet --- diff --git a/skills/pace-test/SKILL.md b/skills/pace-test/SKILL.md index 74d028f..a41b23c 100644 --- a/skills/pace-test/SKILL.md +++ b/skills/pace-test/SKILL.md @@ -1,6 +1,6 @@ --- -description: Use when user says "跑测试", "测试覆盖", "验证一下", "验收", "回归", "影响分析", "test", "verify", "accept", "coverage", "测试策略", /pace-test, or when test results, coverage gaps, or acceptance readiness are discussed. -allowed-tools: AskUserQuestion, Write, Read, Edit, Glob, Grep, Bash +description: Use when user says "跑测试", "测试覆盖", "验证一下", "验收", "回归", "影响分析", "test", "verify", "accept", "coverage", "测试策略", /pace-test, or when test results, coverage gaps, or acceptance readiness are discussed. NOT for code implementation (use /pace-dev). NOT for code review or approval (use /pace-review). +allowed-tools: AskUserQuestion, Read, Write, Edit, Glob, Grep, Bash argument-hint: "[accept|strategy|coverage|impact|report|generate|...] 
[目标]" model: sonnet context: fork @@ -20,7 +20,7 @@ agent: pace-engineer ### accept 的定位 -Gate 2 仅二元判定整体一致性。accept 提供精细能力:逐条验收标准附证据、三级判定(✅/⚠️/❌)、测试预言审查断言实质性、弱覆盖自动降级策略。不做 accept 也能过 Gate 2,但做了的 CR 在 Gate 3 有更充分的证据支撑(详见 verify-procedures.md)。 +Gate 2 仅二元判定整体一致性。accept 提供精细能力:逐条验收标准附证据、三级判定(✅/⚠️/❌)、测试预言审查断言实质性、弱覆盖自动降级策略。不做 accept 也能过 Gate 2,但做了的 CR 在 Gate 3 有更充分的证据支撑(详见 test-procedures-verify.md)。 ## 输入 @@ -66,7 +66,7 @@ $ARGUMENTS: | 参数 | 流程 | 详细规程 | |------|------|---------| | (空) | Layer 1 基础执行 | `test-procedures-core.md` §1 | -| `accept`(旧名 `verify`) | Layer 3 AI 验收验证 | `verify-procedures.md` | +| `accept`(旧名 `verify`) | Layer 3 AI 验收验证 | `test-procedures-verify.md` | | `generate`(旧名 `gen`) | 测试用例生成 | `test-procedures-generate.md`(自包含) | | `strategy` | 测试策略生成 | `test-procedures-strategy-gen.md` | | `coverage` | 需求覆盖分析 | `test-procedures-coverage.md` | diff --git a/skills/pace-test/test-procedures-common.md b/skills/pace-test/test-procedures-common.md index 89614e0..d1e4ebb 100644 --- a/skills/pace-test/test-procedures-common.md +++ b/skills/pace-test/test-procedures-common.md @@ -36,21 +36,9 @@ 3. 检测到多种技术栈 → 返回全部(项目可能是多语言) 4. 
无可识别技术栈 → 标注"待定",不阻断后续流程 -## 分层输出约定(SSOT) +## 分层输出约定 -所有子命令支持三级输出详细度,通过 `--brief` / `--detail` 参数控制: - -| 层级 | 触发 | 内容 | 适用场景 | -|------|------|------|---------| -| 摘要 | `--brief` | 1-3 行核心结论(通过率、风险等级、下一步建议) | 自动化消费、快速确认 | -| 标准 | (默认) | 结构化表格 + 汇总 + 建议(当前各规程定义的输出格式) | 日常使用 | -| 详细 | `--detail` | 完整输出含实施指导、历史对比、扩展分析 | 深度审查、首次使用 | - -**规则**: -- 未指定参数时使用"标准"层级(向后兼容,当前行为不变) -- `--brief` 输出可被其他 Skill 或 report 子命令程序化消费 -- 各子命令的"详细"内容由各规程文件定义 -- 子命令摘要行(`--brief`)统一风格:`类型:关键指标1 · 指标2 · 下一步/趋势` +> 三级输出详细度定义见 `knowledge/output-guide.md §分层输出约定`(SSOT)。本 Skill 无额外自动升级规则。 ## 智能推荐(SSOT) diff --git a/skills/pace-test/test-procedures-dryrun.md b/skills/pace-test/test-procedures-dryrun.md index 4d41355..570dd10 100644 --- a/skills/pace-test/test-procedures-dryrun.md +++ b/skills/pace-test/test-procedures-dryrun.md @@ -26,7 +26,7 @@ - 命令检查:实际执行 bash 命令,记录结果 - 意图检查:Claude 按规则判定,输出结论 - 对抗审查(Gate 2):执行对抗审查,输出发现 - - 浏览器验收(Gate 2,前端项目 + Playwright MCP 可用时):按 `verify-procedures.md` L1+ 流程执行,标注"🖥️ 浏览器验收" + - 浏览器验收(Gate 2,前端项目 + Playwright MCP 可用时):按 `test-procedures-verify.md` L1+ 流程执行,标注"🖥️ 浏览器验收" - **不触发 CR 状态转换**——仅输出结果 4. **生成模拟报告** diff --git a/skills/pace-test/test-procedures-flaky.md b/skills/pace-test/test-procedures-flaky.md index a7a28b1..1f2edb6 100644 --- a/skills/pace-test/test-procedures-flaky.md +++ b/skills/pace-test/test-procedures-flaky.md @@ -32,17 +32,18 @@ - 不稳定测试导致的 Gate 1 误拦截次数(从 CR 事件表的重试记录推断) - 受影响的 PF 范围 5. **生成建议** -6. **结果持久化**: - - 将发现的不稳定测试 pattern 和维护问题追加到 `.devpace/metrics/insights.md` - - 格式遵循 insights.md 的 pattern 格式(category: 质量保障),示例: +6. **结果持久化**(通过 pace-learn 统一写入管道): + - 将发现的不稳定测试 pattern 和维护问题构造为学习请求: ``` - ### [日期] 不稳定测试:[检查项名称] - **观察**:[失败率、模式描述] - **规律**:[识别的不稳定模式] - **证据**:[受影响 CR 列表、执行历史] - **建议**:[修复方向] + 请求来源:pace-test flaky + 事件类型:flaky-detection + 建议类型:防御 + 建议标签:质量保障 + 描述:[模式描述] + 证据:[受影响 CR 列表、执行历史] ``` - - 无发现时不写入 + - 交给 pace-learn Step 3 统一写入管道处理(去重 + 置信度初始化 + 格式合规) + - 无发现时不生成学习请求 7. 
**回写 test-strategy.md**(flaky→strategy 闭环): - 如果 `.devpace/rules/test-strategy.md` 存在且 Step 2 识别到不稳定测试: - 将不稳定测试对应的验收条件状态从 `✅ 已有` 降级为 `⚠️ 不稳定` diff --git a/skills/pace-test/verify-procedures.md b/skills/pace-test/test-procedures-verify.md similarity index 100% rename from skills/pace-test/verify-procedures.md rename to skills/pace-test/test-procedures-verify.md diff --git a/skills/pace-theory/SKILL.md b/skills/pace-theory/SKILL.md index 61dc628..e978675 100644 --- a/skills/pace-theory/SKILL.md +++ b/skills/pace-theory/SKILL.md @@ -1,5 +1,5 @@ --- -description: Use when user asks "为什么", "怎么理解", "概念", "理论", "方法论", "BizDevOps", "原理", "什么是 BR", "什么是 PF", "CR 是什么意思", "价值链", "状态机原理", "追溯", "闭环", "度量", "MoS", "成效指标", "设计决策", "pace-theory", or wants to understand devpace concepts, behavior rationale, or methodology. +description: Use when user asks "为什么", "怎么理解", "概念", "理论", "方法论", "BizDevOps", "原理", "什么是 BR", "什么是 PF", "CR 是什么意思", "价值链", "状态机原理", "追溯", "闭环", "度量", "MoS", "成效指标", "设计决策", "pace-theory", or wants to understand devpace concepts, behavior rationale, or methodology. NOT for specific CR decision audit trail (use /pace-trace). NOT for code implementation (use /pace-dev). 
allowed-tools: Read, Glob, Grep argument-hint: "[model|objects|spaces|rules|trace|topic|metrics|loops|change|mapping|decisions|vs-devops|sdd|why|all|<关键词>]" model: haiku diff --git a/skills/pace-trace/SKILL.md b/skills/pace-trace/SKILL.md index 5713120..ecc560f 100644 --- a/skills/pace-trace/SKILL.md +++ b/skills/pace-trace/SKILL.md @@ -1,6 +1,6 @@ --- description: Use when user asks "why did devpace decide X", "追溯", "为什么这样做", "决策记录", "决策原因", "架构决策", "ADR", "技术选型", wants to see AI decision trail or manage architecture decisions, or says /pace-trace [CR] [gate/decision/arch] -allowed-tools: Read, Glob, Grep, Write, Edit, AskUserQuestion +allowed-tools: AskUserQuestion, Read, Write, Edit, Glob, Grep argument-hint: "[CR 名称或编号] [gate1|gate2|gate3|intent|change|risk|autonomy|timeline|arch]" model: haiku --- diff --git a/scripts/extract-cr-metadata.mjs b/skills/scripts/extract-cr-metadata.mjs similarity index 89% rename from scripts/extract-cr-metadata.mjs rename to skills/scripts/extract-cr-metadata.mjs index fead02b..cafc7b7 100755 --- a/scripts/extract-cr-metadata.mjs +++ b/skills/scripts/extract-cr-metadata.mjs @@ -3,10 +3,10 @@ * Extract structured metadata from CR markdown files. * * Usage: - * node scripts/extract-cr-metadata.mjs - * node scripts/extract-cr-metadata.mjs --status merged - * node scripts/extract-cr-metadata.mjs --status merged --no-release - * node scripts/extract-cr-metadata.mjs --id CR-001 + * node skills/scripts/extract-cr-metadata.mjs + * node skills/scripts/extract-cr-metadata.mjs --status merged + * node skills/scripts/extract-cr-metadata.mjs --status merged --no-release + * node skills/scripts/extract-cr-metadata.mjs --id CR-001 * * Output: JSON array of CR metadata objects to stdout. 
* @@ -33,7 +33,7 @@ const filterId = getFlagValue(args, '--id'); const backlogDir = join(devpaceDir, 'backlog'); let crFiles; try { - crFiles = readdirSync(backlogDir).filter(f => /^CR-\d{3}\.md$/.test(f)).sort(); + crFiles = readdirSync(backlogDir).filter(f => /^CR-\d{3,}\.md$/.test(f)).sort(); } catch { console.error(`Error: Cannot read ${backlogDir}`); process.exit(1); @@ -43,7 +43,13 @@ const results = []; for (const fileName of crFiles) { const filePath = join(backlogDir, fileName); - const content = readFileSync(filePath, 'utf-8'); + let content; + try { + content = readFileSync(filePath, 'utf-8'); + } catch (err) { + console.error(`Warning: cannot read ${filePath}: ${err.message}`); + continue; + } const meta = parseCrContent(content, fileName); // Apply filters diff --git a/tests/__pycache__/__init__.cpython-313.pyc b/tests/__pycache__/__init__.cpython-313.pyc deleted file mode 100644 index c5e69f3..0000000 Binary files a/tests/__pycache__/__init__.cpython-313.pyc and /dev/null differ diff --git a/tests/conftest.py b/tests/conftest.py index e38a4e8..f99e899 100644 --- a/tests/conftest.py +++ b/tests/conftest.py @@ -1,8 +1,9 @@ """Shared fixtures and constants for devpace test suite.""" +import re from pathlib import Path -import pytest +import yaml # ── Paths ────────────────────────────────────────────────────────────────── DEVPACE_ROOT = Path(__file__).resolve().parent.parent # devpace/ @@ -10,6 +11,11 @@ PRODUCT_DIRS = ["rules", "skills", "knowledge", ".claude-plugin"] DEV_DIRS = [".claude", "docs"] +SKILLS_ROOT = DEVPACE_ROOT / "skills" +SCHEMA_DIR = DEVPACE_ROOT / "knowledge" / "_schema" +TEMPLATE_DIR = DEVPACE_ROOT / "skills" / "pace-init" / "templates" +RULES_FILE = DEVPACE_ROOT / "rules" / "devpace-rules.md" + # ── Skill / Schema / Template inventories ────────────────────────────────── SKILL_NAMES = [ "pace-biz", @@ -25,9 +31,9 @@ "pace-release", "pace-retro", "pace-review", - "pace-sync", "pace-role", "pace-status", + "pace-sync", "pace-test", 
"pace-theory", "pace-trace", @@ -129,6 +135,25 @@ ("deployed", "rolled_back"), ] +# ── Shared utilities ───────────────────────────────────────────────────── + +def parse_frontmatter(path): + """Extract YAML frontmatter; returns None if missing or malformed.""" + text = path.read_text(encoding="utf-8") + if not text.startswith("---"): + return None + end = text.find("---", 3) + if end == -1: + return None + return yaml.safe_load(text[3:end]) + + +def headings(text): + """Extract markdown headings as (level, title) tuples.""" + return [(len(m.group(1)), m.group(2).strip()) + for m in re.finditer(r'^(#{1,6})\s+(.+)$', text, re.MULTILINE)] + + # ── Workspace exclusion ────────────────────────────────────────────────── # skill-creator evaluation workspaces (*-workspace/) are gitignored but may # exist on disk. Tests scanning product-layer directories must skip them. @@ -137,58 +162,16 @@ def _is_workspace_path(p: Path) -> bool: """Return True if any ancestor directory name ends with '-workspace'.""" return any(part.endswith("-workspace") for part in p.parts) -# ── Fixtures ─────────────────────────────────────────────────────────────── - -@pytest.fixture -def devpace_root(): - """Return the devpace project root as a Path.""" - return DEVPACE_ROOT - -@pytest.fixture -def product_md_files(): - """Yield all .md files under product-layer directories.""" +def product_md_files(exclude_workspace=True): + """Collect all .md files under product-layer directories.""" files = [] for d in PRODUCT_DIRS: dirpath = DEVPACE_ROOT / d if dirpath.is_dir(): - files.extend(dirpath.rglob("*.md")) + for f in dirpath.rglob("*.md"): + if exclude_workspace and _is_workspace_path(f): + continue + files.append(f) return files - -@pytest.fixture -def skill_dirs(): - """Return list of (name, path) tuples for each skill directory.""" - skills_root = DEVPACE_ROOT / "skills" - return [ - (name, skills_root / name) - for name in SKILL_NAMES - if (skills_root / name).is_dir() - ] - - -@pytest.fixture 
-def template_dir(): - """Return the templates directory path.""" - return DEVPACE_ROOT / "skills" / "pace-init" / "templates" - - -@pytest.fixture -def schema_dir(): - """Return the schema directory path.""" - return DEVPACE_ROOT / "knowledge" / "_schema" - - -@pytest.fixture -def eval_dir(): - """Return the evaluation directory path.""" - return EVAL_DIR - - -@pytest.fixture -def eval_skill_dirs(): - """Return list of (name, path) tuples for each Skill's eval directory.""" - return [ - (name, EVAL_DIR / name) - for name in SKILL_NAMES - ] diff --git a/tests/hooks/_test-helpers.mjs b/tests/hooks/_test-helpers.mjs new file mode 100644 index 0000000..f44cc19 --- /dev/null +++ b/tests/hooks/_test-helpers.mjs @@ -0,0 +1,96 @@ +/** + * Shared test helpers for hook integration tests. + * Eliminates duplication of runHook / createTmpProject / cleanupDir across test files. + */ +import { spawn } from 'node:child_process'; +import { writeFileSync, mkdirSync, rmSync, existsSync } from 'node:fs'; +import { join } from 'node:path'; +import { tmpdir } from 'node:os'; +import { fileURLToPath } from 'node:url'; +import { dirname } from 'node:path'; + +/** + * Resolve the absolute path to a hook script relative to the calling test file. + * @param {string} importMetaUrl - import.meta.url of the calling file + * @param {string} hookRelPath - relative path from hooks/ dir, e.g. 'post-cr-update.mjs' + */ +export function resolveHookScript(importMetaUrl, hookRelPath) { + const callerDir = dirname(fileURLToPath(importMetaUrl)); + return join(callerDir, '..', '..', 'hooks', hookRelPath); +} + +/** + * Create a temporary project directory with .devpace/ structure. 
+ * @param {string} prefix - prefix for the tmp dir name + * @param {object} [options] + * @param {string[]} [options.subdirs] - subdirs to create under .devpace/ (default: ['.devpace', 'backlog']) + */ +export function createTmpProject(prefix, { subdirs } = {}) { + const dir = join(tmpdir(), `devpace-${prefix}-${Date.now()}-${Math.random().toString(36).slice(2)}`); + const dirs = subdirs || ['.devpace', 'backlog']; + // Build the deepest path: join all subdirs under .devpace if 'backlog' is present, + // otherwise just create .devpace with the specified subdirs + if (dirs.includes('backlog')) { + mkdirSync(join(dir, '.devpace', 'backlog'), { recursive: true }); + } else { + mkdirSync(join(dir, '.devpace'), { recursive: true }); + } + // Create any additional subdirs + for (const sub of dirs) { + if (sub !== '.devpace' && sub !== 'backlog') { + mkdirSync(join(dir, '.devpace', sub), { recursive: true }); + } + } + return dir; +} + +/** + * Remove a temporary directory. + */ +export function cleanupDir(dir) { + if (existsSync(dir)) { + rmSync(dir, { recursive: true, force: true }); + } +} + +/** + * Run a hook script as a subprocess with JSON on stdin. 
+ * @param {string} hookScript - absolute path to the hook script + * @param {object} stdinJson - JSON object to pass on stdin + * @param {string} projectDir - CLAUDE_PROJECT_DIR value + * @returns {Promise<{exitCode: number, stdout: string, stderr: string}>} + */ +export function runHook(hookScript, stdinJson, projectDir) { + return new Promise((resolve) => { + const child = spawn('node', [hookScript], { + env: { ...process.env, CLAUDE_PROJECT_DIR: projectDir }, + stdio: ['pipe', 'pipe', 'pipe'], + }); + let stdout = ''; + let stderr = ''; + child.stdout.on('data', (data) => { stdout += data.toString(); }); + child.stderr.on('data', (data) => { stderr += data.toString(); }); + child.on('close', (code) => { + resolve({ exitCode: code, stdout: stdout.trim(), stderr: stderr.trim() }); + }); + child.stdin.write(JSON.stringify(stdinJson)); + child.stdin.end(); + }); +} + +/** + * Write a CR file into .devpace/backlog/. + * @returns {string} the full path to the CR file + */ +export function writeCr(projectDir, crId, content) { + const crPath = join(projectDir, '.devpace', 'backlog', `CR-${crId}.md`); + writeFileSync(crPath, content); + return crPath; +} + +/** + * Write state.md into .devpace/. 
+ */ +export function writeState(projectDir, content) { + writeFileSync(join(projectDir, '.devpace', 'state.md'), content); +} diff --git a/tests/hooks/test_intent_detect.mjs b/tests/hooks/test_intent_detect.mjs index dc665e0..6311c42 100644 --- a/tests/hooks/test_intent_detect.mjs +++ b/tests/hooks/test_intent_detect.mjs @@ -4,46 +4,17 @@ */ import { describe, it, beforeEach, afterEach } from 'node:test'; import assert from 'node:assert/strict'; -import { spawn } from 'node:child_process'; -import { writeFileSync, mkdirSync, rmSync, existsSync } from 'node:fs'; +import { mkdirSync } from 'node:fs'; import { join } from 'node:path'; import { tmpdir } from 'node:os'; -import { fileURLToPath } from 'node:url'; -import { dirname } from 'node:path'; +import { + resolveHookScript, createTmpProject, cleanupDir, runHook as _runHook, writeState, +} from './_test-helpers.mjs'; -const __filename = fileURLToPath(import.meta.url); -const __dirname = dirname(__filename); -const HOOK_SCRIPT = join(__dirname, '..', '..', 'hooks', 'intent-detect.mjs'); - -function createTmpProject() { - const dir = join(tmpdir(), `devpace-intent-test-${Date.now()}-${Math.random().toString(36).slice(2)}`); - mkdirSync(join(dir, '.devpace'), { recursive: true }); - writeFileSync(join(dir, '.devpace', 'state.md'), '> 目标:测试\n'); - return dir; -} - -function cleanupDir(dir) { - if (existsSync(dir)) { - rmSync(dir, { recursive: true, force: true }); - } -} +const HOOK_SCRIPT = resolveHookScript(import.meta.url, 'intent-detect.mjs'); function runHook(stdinJson, projectDir) { - return new Promise((resolve) => { - const child = spawn('node', [HOOK_SCRIPT], { - env: { ...process.env, CLAUDE_PROJECT_DIR: projectDir }, - stdio: ['pipe', 'pipe', 'pipe'], - }); - let stdout = ''; - let stderr = ''; - child.stdout.on('data', (data) => { stdout += data.toString(); }); - child.stderr.on('data', (data) => { stderr += data.toString(); }); - child.on('close', (code) => { - resolve({ exitCode: code, stdout: 
stdout.trim(), stderr: stderr.trim() }); - }); - child.stdin.write(JSON.stringify(stdinJson)); - child.stdin.end(); - }); + return _runHook(HOOK_SCRIPT, stdinJson, projectDir); } describe('intent-detect: no .devpace', () => { @@ -61,7 +32,8 @@ describe('intent-detect: change trigger words', () => { let projectDir; beforeEach(() => { - projectDir = createTmpProject(); + projectDir = createTmpProject('intent-test', { subdirs: ['.devpace'] }); + writeState(projectDir, '> 目标:测试\n'); }); afterEach(() => { @@ -71,7 +43,7 @@ describe('intent-detect: change trigger words', () => { const triggerWords = [ '不做了', '先不搞', '加一个', '改一下', '优先级', '延后', '提前', '砍掉', - '插入', '新增需求', '先做这个', '恢复之前' + '插入', '新增需求', '先做这个', '恢复之前', ]; for (const word of triggerWords) { @@ -98,4 +70,16 @@ describe('intent-detect: change trigger words', () => { const result = await runHook({ content: '不做了这个任务' }, projectDir); assert.equal(result.exitCode, 0, 'Intent detect should never block (exit 2)'); }); + + it('skips detection when technical context words present', async () => { + const result = await runHook({ content: '恢复之前的 git stash' }, projectDir); + assert.equal(result.exitCode, 0); + assert.equal(result.stdout, '', 'Should not trigger when tech context detected'); + }); + + it('skips detection for code formatting requests', async () => { + const result = await runHook({ content: '帮我格式化一下代码缩进' }, projectDir); + assert.equal(result.exitCode, 0); + assert.equal(result.stdout, '', 'Should not trigger for code formatting'); + }); }); diff --git a/tests/hooks/test_pace_dev_scope_check.mjs b/tests/hooks/test_pace_dev_scope_check.mjs index 3ca40a4..a09da9a 100644 --- a/tests/hooks/test_pace_dev_scope_check.mjs +++ b/tests/hooks/test_pace_dev_scope_check.mjs @@ -1,74 +1,27 @@ /** - * Integration tests for hooks/pace-dev-scope-check.mjs + * Integration tests for hooks/skill/pace-dev-scope-check.mjs * Tests the hook by spawning it as a subprocess with simulated stdin JSON. 
* Run: node --test tests/hooks/test_pace_dev_scope_check.mjs */ import { describe, it, beforeEach, afterEach } from 'node:test'; import assert from 'node:assert/strict'; -import { spawn } from 'node:child_process'; -import { writeFileSync, mkdirSync, rmSync, existsSync } from 'node:fs'; +import { writeFileSync, mkdirSync } from 'node:fs'; import { join } from 'node:path'; import { tmpdir } from 'node:os'; -import { fileURLToPath } from 'node:url'; -import { dirname } from 'node:path'; +import { + resolveHookScript, createTmpProject, cleanupDir, + runHook as _sharedRunHook, writeCr, writeState, +} from './_test-helpers.mjs'; -const __filename = fileURLToPath(import.meta.url); -const __dirname = dirname(__filename); -const HOOK_SCRIPT = join(__dirname, '..', '..', 'hooks', 'pace-dev-scope-check.mjs'); - -// ── Test helpers ──────────────────────────────────────────────────── - -function createTmpProject() { - const dir = join(tmpdir(), `devpace-scope-test-${Date.now()}-${Math.random().toString(36).slice(2)}`); - mkdirSync(join(dir, '.devpace', 'backlog'), { recursive: true }); - return dir; -} - -function cleanupDir(dir) { - if (existsSync(dir)) { - rmSync(dir, { recursive: true, force: true }); - } -} +const HOOK_SCRIPT = resolveHookScript(import.meta.url, join('skill', 'pace-dev-scope-check.mjs')); function runHook(stdinJson, projectDir) { - return new Promise((resolve) => { - const child = spawn('node', [HOOK_SCRIPT], { - env: { ...process.env, CLAUDE_PROJECT_DIR: projectDir }, - stdio: ['pipe', 'pipe', 'pipe'], - }); - - let stdout = ''; - let stderr = ''; - - child.stdout.on('data', (data) => { stdout += data.toString(); }); - child.stderr.on('data', (data) => { stderr += data.toString(); }); - - child.on('close', (code) => { - resolve({ exitCode: code, stdout: stdout.trim(), stderr: stderr.trim() }); - }); - - child.stdin.write(JSON.stringify(stdinJson)); - child.stdin.end(); - }); -} - -/** - * Create a CR file with scope-relevant content. 
- */ -function writeCr(projectDir, crId, content) { - const crPath = join(projectDir, '.devpace', 'backlog', `CR-${crId}.md`); - writeFileSync(crPath, content); - return crPath; + return _sharedRunHook(HOOK_SCRIPT, stdinJson, projectDir); } -/** - * Write state.md with an active CR reference. - */ -function writeState(projectDir, crId) { - writeFileSync( - join(projectDir, '.devpace', 'state.md'), - `> 目标:测试项目\n\n- **进行中**:实现功能 → CR-${crId}\n\n下一步:继续\n` - ); +/** Write state.md with an active CR reference. */ +function writeActiveState(projectDir, crId) { + writeState(projectDir, `> 目标:测试项目\n\n- **进行中**:实现功能 → CR-${crId}\n\n下一步:继续\n`); } // ── Tests: No .devpace → exit 0 ──────────────────────────────────── @@ -88,7 +41,7 @@ describe('pace-dev-scope-check: no .devpace', () => { describe('pace-dev-scope-check: no file_path', () => { let projectDir; - beforeEach(() => { projectDir = createTmpProject(); }); + beforeEach(() => { projectDir = createTmpProject('scope-test'); }); afterEach(() => { cleanupDir(projectDir); }); it('exits 0 when tool input has no file_path', async () => { @@ -102,15 +55,15 @@ describe('pace-dev-scope-check: no file_path', () => { }); }); -// ── Tests: Gate 3 — block automated approved state change ────────── +// ── Tests: CR writes — Gate 3 delegated to global hook ─────────────── -describe('pace-dev-scope-check: Gate 3 enforcement', () => { +describe('pace-dev-scope-check: CR writes (Gate 3 delegated to global hook)', () => { let projectDir; - beforeEach(() => { projectDir = createTmpProject(); }); + beforeEach(() => { projectDir = createTmpProject('scope-test'); }); afterEach(() => { cleanupDir(projectDir); }); - it('blocks state change to approved on CR file (exit 2)', async () => { + it('allows all CR writes including approved state (Gate 3 handled by global hook)', async () => { const crPath = writeCr(projectDir, '001', '# CR-001\n\n- **状态**:in_review\n'); const input = { tool_input: { @@ -119,21 +72,7 @@ describe('pace-dev-scope-check: 
Gate 3 enforcement', () => { } }; const result = await runHook(input, projectDir); - assert.equal(result.exitCode, 2, `Expected exit 2 (Gate 3 block) but got ${result.exitCode}`); - assert.ok(result.stderr.includes('Gate 3'), 'Should mention Gate 3'); - }); - - it('blocks Edit with new_string changing to approved (exit 2)', async () => { - const crPath = writeCr(projectDir, '002', '# CR-002\n\n- **状态**:in_review\n'); - const input = { - tool_input: { - file_path: crPath, - old_string: '- **状态**:in_review', - new_string: '- **状态**:approved' - } - }; - const result = await runHook(input, projectDir); - assert.equal(result.exitCode, 2); + assert.equal(result.exitCode, 0, 'Gate 3 is no longer enforced here — delegated to global pre-tool-use.mjs'); }); it('allows non-approved state change on CR file (exit 0)', async () => { @@ -166,7 +105,7 @@ describe('pace-dev-scope-check: Gate 3 enforcement', () => { describe('pace-dev-scope-check: .devpace fast-path', () => { let projectDir; - beforeEach(() => { projectDir = createTmpProject(); }); + beforeEach(() => { projectDir = createTmpProject('scope-test'); }); afterEach(() => { cleanupDir(projectDir); }); it('allows write to .devpace/state.md', async () => { @@ -197,7 +136,7 @@ describe('pace-dev-scope-check: .devpace fast-path', () => { describe('pace-dev-scope-check: no active CR', () => { let projectDir; - beforeEach(() => { projectDir = createTmpProject(); }); + beforeEach(() => { projectDir = createTmpProject('scope-test'); }); afterEach(() => { cleanupDir(projectDir); }); it('allows file write when no state.md exists', async () => { @@ -213,10 +152,7 @@ describe('pace-dev-scope-check: no active CR', () => { }); it('allows file write when state.md has no active CR', async () => { - writeFileSync( - join(projectDir, '.devpace', 'state.md'), - '> 目标:测试\n\n- 当前工作:(无)\n' - ); + writeState(projectDir, '> 目标:测试\n\n- 当前工作:(无)\n'); const input = { tool_input: { file_path: join(projectDir, 'src', 'main.js'), @@ -234,8 +170,8 @@ 
describe('pace-dev-scope-check: scope validation', () => { let projectDir; beforeEach(() => { - projectDir = createTmpProject(); - writeState(projectDir, '010'); + projectDir = createTmpProject('scope-test'); + writeActiveState(projectDir, '010'); }); afterEach(() => { cleanupDir(projectDir); }); @@ -300,9 +236,6 @@ describe('pace-dev-scope-check: scope validation', () => { }); it('warns for different file in same directory (startsWith needs prefix match)', async () => { - // matchesScope's same-directory check uses startsWith, which requires - // the target path to begin with the pattern's directory prefix. - // With absolute tmp paths vs relative patterns, this won't match. writeCr(projectDir, '010', '# CR-010\n\n- **状态**:developing\n\n## 执行计划\n\n**src/auth/login.js**:实现登录逻辑\n' ); @@ -338,14 +271,11 @@ describe('pace-dev-scope-check: scope validation', () => { describe('pace-dev-scope-check: degradation', () => { let projectDir; - beforeEach(() => { projectDir = createTmpProject(); }); + beforeEach(() => { projectDir = createTmpProject('scope-test'); }); afterEach(() => { cleanupDir(projectDir); }); it('allows when state.md references non-existent CR', async () => { - writeFileSync( - join(projectDir, '.devpace', 'state.md'), - '> 目标:测试\n\n- **进行中**:实现功能 → CR-999\n' - ); + writeState(projectDir, '> 目标:测试\n\n- **进行中**:实现功能 → CR-999\n'); // CR-999.md does not exist const input = { tool_input: { diff --git a/tests/hooks/test_post_cr_update.mjs b/tests/hooks/test_post_cr_update.mjs index bc73c10..611dc5f 100644 --- a/tests/hooks/test_post_cr_update.mjs +++ b/tests/hooks/test_post_cr_update.mjs @@ -4,45 +4,17 @@ */ import { describe, it, beforeEach, afterEach } from 'node:test'; import assert from 'node:assert/strict'; -import { spawn } from 'node:child_process'; -import { writeFileSync, mkdirSync, rmSync, existsSync } from 'node:fs'; +import { writeFileSync, mkdirSync } from 'node:fs'; import { join } from 'node:path'; import { tmpdir } from 'node:os'; -import { 
fileURLToPath } from 'node:url'; -import { dirname } from 'node:path'; +import { + resolveHookScript, createTmpProject, cleanupDir, runHook as _runHook, +} from './_test-helpers.mjs'; -const __filename = fileURLToPath(import.meta.url); -const __dirname = dirname(__filename); -const HOOK_SCRIPT = join(__dirname, '..', '..', 'hooks', 'post-cr-update.mjs'); - -function createTmpProject() { - const dir = join(tmpdir(), `devpace-post-test-${Date.now()}-${Math.random().toString(36).slice(2)}`); - mkdirSync(join(dir, '.devpace', 'backlog'), { recursive: true }); - return dir; -} - -function cleanupDir(dir) { - if (existsSync(dir)) { - rmSync(dir, { recursive: true, force: true }); - } -} +const HOOK_SCRIPT = resolveHookScript(import.meta.url, 'post-cr-update.mjs'); function runHook(stdinJson, projectDir) { - return new Promise((resolve) => { - const child = spawn('node', [HOOK_SCRIPT], { - env: { ...process.env, CLAUDE_PROJECT_DIR: projectDir }, - stdio: ['pipe', 'pipe', 'pipe'], - }); - let stdout = ''; - let stderr = ''; - child.stdout.on('data', (data) => { stdout += data.toString(); }); - child.stderr.on('data', (data) => { stderr += data.toString(); }); - child.on('close', (code) => { - resolve({ exitCode: code, stdout: stdout.trim(), stderr: stderr.trim() }); - }); - child.stdin.write(JSON.stringify(stdinJson)); - child.stdin.end(); - }); + return _runHook(HOOK_SCRIPT, stdinJson, projectDir); } describe('post-cr-update: no .devpace', () => { @@ -59,7 +31,7 @@ describe('post-cr-update: merged CR detection', () => { let projectDir; beforeEach(() => { - projectDir = createTmpProject(); + projectDir = createTmpProject('post-test'); }); afterEach(() => { diff --git a/tests/hooks/test_post_tool_failure.mjs b/tests/hooks/test_post_tool_failure.mjs index 2d48437..346ec71 100644 --- a/tests/hooks/test_post_tool_failure.mjs +++ b/tests/hooks/test_post_tool_failure.mjs @@ -5,59 +5,29 @@ */ import { describe, it, beforeEach, afterEach } from 'node:test'; import assert from 
'node:assert/strict'; -import { spawn } from 'node:child_process'; -import { writeFileSync, mkdirSync, rmSync, existsSync } from 'node:fs'; +import { writeFileSync, mkdirSync, rmSync } from 'node:fs'; import { join } from 'node:path'; import { tmpdir } from 'node:os'; -import { fileURLToPath } from 'node:url'; -import { dirname } from 'node:path'; +import { + resolveHookScript, cleanupDir, runHook as _runHook, writeState, +} from './_test-helpers.mjs'; -const __filename = fileURLToPath(import.meta.url); -const __dirname = dirname(__filename); -const HOOK_SCRIPT = join(__dirname, '..', '..', 'hooks', 'post-tool-failure.mjs'); +const HOOK_SCRIPT = resolveHookScript(import.meta.url, 'post-tool-failure.mjs'); -// ── Test helpers ──────────────────────────────────────────────────── +function runHook(stdinJson, projectDir) { + return _runHook(HOOK_SCRIPT, stdinJson, projectDir); +} +/** Local variant: creates project with optional advance-mode state.md. */ function createTmpProject(advanceMode) { const dir = join(tmpdir(), `devpace-failure-test-${Date.now()}-${Math.random().toString(36).slice(2)}`); mkdirSync(join(dir, '.devpace', 'backlog'), { recursive: true }); if (advanceMode) { - writeFileSync( - join(dir, '.devpace', 'state.md'), - '> 目标:测试\n\n- **进行中**:开发中 → CR-001\n\n下一步:继续\n' - ); + writeState(dir, '> 目标:测试\n\n- **进行中**:开发中 → CR-001\n\n下一步:继续\n'); } return dir; } -function cleanupDir(dir) { - if (existsSync(dir)) { - rmSync(dir, { recursive: true, force: true }); - } -} - -function runHook(stdinJson, projectDir) { - return new Promise((resolve) => { - const child = spawn('node', [HOOK_SCRIPT], { - env: { ...process.env, CLAUDE_PROJECT_DIR: projectDir }, - stdio: ['pipe', 'pipe', 'pipe'], - }); - - let stdout = ''; - let stderr = ''; - - child.stdout.on('data', (data) => { stdout += data.toString(); }); - child.stderr.on('data', (data) => { stderr += data.toString(); }); - - child.on('close', (code) => { - resolve({ exitCode: code, stdout: stdout.trim(), stderr: 
stderr.trim() }); - }); - - child.stdin.write(JSON.stringify(stdinJson)); - child.stdin.end(); - }); -} - // ── Tests: No .devpace → silent exit ─────────────────────────────── describe('post-tool-failure: no .devpace', () => { @@ -113,7 +83,7 @@ describe('post-tool-failure: advance mode + CR file', () => { assert.equal(result.exitCode, 0); assert.ok(result.stdout.includes('tool-failure'), 'Should include tool-failure prefix'); assert.ok(result.stdout.includes('CR'), 'Should mention CR'); - assert.ok(result.stdout.includes('consistency'), 'Should mention consistency check'); + assert.ok(result.stdout.includes('ACTION'), 'Should include remediation ACTION'); }); }); diff --git a/tests/hooks/test_pre_tool_use.mjs b/tests/hooks/test_pre_tool_use.mjs index 5714260..441f894 100644 --- a/tests/hooks/test_pre_tool_use.mjs +++ b/tests/hooks/test_pre_tool_use.mjs @@ -5,55 +5,17 @@ */ import { describe, it, beforeEach, afterEach } from 'node:test'; import assert from 'node:assert/strict'; -import { spawn } from 'node:child_process'; -import { writeFileSync, mkdirSync, rmSync, existsSync } from 'node:fs'; +import { writeFileSync, mkdirSync } from 'node:fs'; import { join } from 'node:path'; import { tmpdir } from 'node:os'; -import { fileURLToPath } from 'node:url'; -import { dirname } from 'node:path'; +import { + resolveHookScript, createTmpProject, cleanupDir, runHook as _runHook, +} from './_test-helpers.mjs'; -const __filename = fileURLToPath(import.meta.url); -const __dirname = dirname(__filename); -const HOOK_SCRIPT = join(__dirname, '..', '..', 'hooks', 'pre-tool-use.mjs'); +const HOOK_SCRIPT = resolveHookScript(import.meta.url, 'pre-tool-use.mjs'); -// ── Test helpers ──────────────────────────────────────────────────── - -function createTmpProject() { - const dir = join(tmpdir(), `devpace-hook-test-${Date.now()}-${Math.random().toString(36).slice(2)}`); - mkdirSync(join(dir, '.devpace', 'backlog'), { recursive: true }); - return dir; -} - -function cleanupDir(dir) { 
- if (existsSync(dir)) { - rmSync(dir, { recursive: true, force: true }); - } -} - -/** - * Run the pre-tool-use hook with given stdin JSON and env. - * Returns { exitCode, stdout, stderr }. - */ function runHook(stdinJson, projectDir) { - return new Promise((resolve) => { - const child = spawn('node', [HOOK_SCRIPT], { - env: { ...process.env, CLAUDE_PROJECT_DIR: projectDir }, - stdio: ['pipe', 'pipe', 'pipe'], - }); - - let stdout = ''; - let stderr = ''; - - child.stdout.on('data', (data) => { stdout += data.toString(); }); - child.stderr.on('data', (data) => { stderr += data.toString(); }); - - child.on('close', (code) => { - resolve({ exitCode: code, stdout: stdout.trim(), stderr: stderr.trim() }); - }); - - child.stdin.write(JSON.stringify(stdinJson)); - child.stdin.end(); - }); + return _runHook(HOOK_SCRIPT, stdinJson, projectDir); } // ── Tests: No .devpace → exit 0 ──────────────────────────────────── @@ -74,7 +36,7 @@ describe('pre-tool-use: explore mode enforcement', () => { let projectDir; beforeEach(() => { - projectDir = createTmpProject(); + projectDir = createTmpProject('hook-test'); // state.md with NO active work → explore mode writeFileSync( join(projectDir, '.devpace', 'state.md'), @@ -98,17 +60,55 @@ describe('pre-tool-use: explore mode enforcement', () => { assert.ok(result.stderr.includes('devpace:blocked'), 'Should output blocked message'); }); - it('blocks write to .devpace/backlog/CR-001.md in explore mode (exit 2)', async () => { + it('allows write to CR file in explore mode when no state escalation', async () => { + const crPath = join(projectDir, '.devpace', 'backlog', 'CR-001.md'); + writeFileSync(crPath, '# CR-001\n\n- **状态**:created\n'); + const input = { + tool_input: { + file_path: crPath, + content: '# CR-001\n\n- **状态**:created\n- **优先级**:high\n' + } + }; + const result = await runHook(input, projectDir); + assert.equal(result.exitCode, 0, `Expected exit 0 (management Skill writes allowed) but got ${result.exitCode}`); + }); + + 
it('blocks CR state escalation to developing in explore mode (exit 2)', async () => { const crPath = join(projectDir, '.devpace', 'backlog', 'CR-001.md'); writeFileSync(crPath, '# CR-001\n\n- **状态**:created\n'); const input = { tool_input: { file_path: crPath, - content: '# CR-001 modified' + content: '# CR-001\n\n- **状态**:developing\n' + } + }; + const result = await runHook(input, projectDir); + assert.equal(result.exitCode, 2, `Expected exit 2 (state escalation blocked) but got ${result.exitCode}`); + assert.ok(result.stderr.includes('devpace:blocked'), 'Should output blocked message'); + }); + + it('allows new CR creation in explore mode (file does not exist)', async () => { + const crPath = join(projectDir, '.devpace', 'backlog', 'CR-NEW.md'); + // File does not exist — new CR creation by pace-change + const input = { + tool_input: { + file_path: crPath, + content: '# CR-NEW\n\n- **状态**:created\n' + } + }; + const result = await runHook(input, projectDir); + assert.equal(result.exitCode, 0, `Expected exit 0 (new CR creation allowed) but got ${result.exitCode}`); + }); + + it('allows write to .devpace/project.md in explore mode', async () => { + const input = { + tool_input: { + file_path: join(projectDir, '.devpace', 'project.md'), + content: '# Project\n' } }; const result = await runHook(input, projectDir); - assert.equal(result.exitCode, 2, `Expected exit 2 but got ${result.exitCode}`); + assert.equal(result.exitCode, 0, 'Management files like project.md should be allowed in explore mode'); }); it('allows write to .devpace/rules/ in explore mode (config files)', async () => { @@ -152,7 +152,7 @@ describe('pre-tool-use: advance mode allows .devpace writes', () => { let projectDir; beforeEach(() => { - projectDir = createTmpProject(); + projectDir = createTmpProject('hook-test'); // state.md with active work → advance mode writeFileSync( join(projectDir, '.devpace', 'state.md'), @@ -195,7 +195,7 @@ describe('pre-tool-use: Gate 3 enforcement', () => { let 
projectDir; beforeEach(() => { - projectDir = createTmpProject(); + projectDir = createTmpProject('hook-test'); // Advance mode active writeFileSync( join(projectDir, '.devpace', 'state.md'), @@ -254,7 +254,7 @@ describe('pre-tool-use: advisory gate reminders', () => { let projectDir; beforeEach(() => { - projectDir = createTmpProject(); + projectDir = createTmpProject('hook-test'); writeFileSync( join(projectDir, '.devpace', 'state.md'), '> 目标:测试项目\n\n- **进行中**:开发中\n\n下一步:继续\n' diff --git a/tests/hooks/test_pulse_counter.mjs b/tests/hooks/test_pulse_counter.mjs index 020092e..3ed7e98 100644 --- a/tests/hooks/test_pulse_counter.mjs +++ b/tests/hooks/test_pulse_counter.mjs @@ -4,45 +4,17 @@ */ import { describe, it, beforeEach, afterEach } from 'node:test'; import assert from 'node:assert/strict'; -import { spawn } from 'node:child_process'; -import { writeFileSync, readFileSync, mkdirSync, rmSync, existsSync } from 'node:fs'; +import { writeFileSync, readFileSync, mkdirSync, existsSync } from 'node:fs'; import { join } from 'node:path'; import { tmpdir } from 'node:os'; -import { fileURLToPath } from 'node:url'; -import { dirname } from 'node:path'; +import { + resolveHookScript, createTmpProject, cleanupDir, runHook as _runHook, +} from './_test-helpers.mjs'; -const __filename = fileURLToPath(import.meta.url); -const __dirname = dirname(__filename); -const HOOK_SCRIPT = join(__dirname, '..', '..', 'hooks', 'pulse-counter.mjs'); - -function createTmpProject() { - const dir = join(tmpdir(), `devpace-pulse-test-${Date.now()}-${Math.random().toString(36).slice(2)}`); - mkdirSync(join(dir, '.devpace'), { recursive: true }); - return dir; -} - -function cleanupDir(dir) { - if (existsSync(dir)) { - rmSync(dir, { recursive: true, force: true }); - } -} +const HOOK_SCRIPT = resolveHookScript(import.meta.url, 'pulse-counter.mjs'); function runHook(stdinJson, projectDir) { - return new Promise((resolve) => { - const child = spawn('node', [HOOK_SCRIPT], { - env: { 
...process.env, CLAUDE_PROJECT_DIR: projectDir }, - stdio: ['pipe', 'pipe', 'pipe'], - }); - let stdout = ''; - let stderr = ''; - child.stdout.on('data', (data) => { stdout += data.toString(); }); - child.stderr.on('data', (data) => { stderr += data.toString(); }); - child.on('close', (code) => { - resolve({ exitCode: code, stdout: stdout.trim(), stderr: stderr.trim() }); - }); - child.stdin.write(JSON.stringify(stdinJson)); - child.stdin.end(); - }); + return _runHook(HOOK_SCRIPT, stdinJson, projectDir); } describe('pulse-counter: no .devpace', () => { @@ -59,7 +31,7 @@ describe('pulse-counter: counting and reminders', () => { let projectDir; beforeEach(() => { - projectDir = createTmpProject(); + projectDir = createTmpProject('pulse-test', { subdirs: ['.devpace'] }); }); afterEach(() => { cleanupDir(projectDir); }); @@ -86,7 +58,7 @@ describe('pulse-counter: counting and reminders', () => { const result = await runHook({}, projectDir); assert.equal(result.exitCode, 0); assert.equal(readFileSync(counterPath, 'utf-8'), '10'); - assert.ok(result.stdout.includes('devpace:pulse-reminder'), 'Should output pulse reminder at 10'); + assert.ok(result.stdout.includes('devpace:write-volume'), 'Should output pulse reminder at 10'); }); it('outputs pulse reminder at count 20', async () => { @@ -94,7 +66,7 @@ describe('pulse-counter: counting and reminders', () => { writeFileSync(counterPath, '19'); const result = await runHook({}, projectDir); assert.equal(result.exitCode, 0); - assert.ok(result.stdout.includes('devpace:pulse-reminder'), 'Should output pulse reminder at 20'); + assert.ok(result.stdout.includes('devpace:write-volume'), 'Should output pulse reminder at 20'); }); it('no reminder at non-10 counts', async () => { diff --git a/tests/hooks/test_subagent_stop.mjs b/tests/hooks/test_subagent_stop.mjs index f5606a8..d56bf0b 100644 --- a/tests/hooks/test_subagent_stop.mjs +++ b/tests/hooks/test_subagent_stop.mjs @@ -4,45 +4,17 @@ */ import { describe, it, beforeEach, 
afterEach } from 'node:test'; import assert from 'node:assert/strict'; -import { spawn } from 'node:child_process'; -import { writeFileSync, mkdirSync, rmSync, existsSync } from 'node:fs'; +import { writeFileSync, mkdirSync } from 'node:fs'; import { join } from 'node:path'; import { tmpdir } from 'node:os'; -import { fileURLToPath } from 'node:url'; -import { dirname } from 'node:path'; +import { + resolveHookScript, createTmpProject, cleanupDir, runHook as _runHook, +} from './_test-helpers.mjs'; -const __filename = fileURLToPath(import.meta.url); -const __dirname = dirname(__filename); -const HOOK_SCRIPT = join(__dirname, '..', '..', 'hooks', 'subagent-stop.mjs'); - -function createTmpProject() { - const dir = join(tmpdir(), `devpace-subagent-test-${Date.now()}-${Math.random().toString(36).slice(2)}`); - mkdirSync(join(dir, '.devpace', 'backlog'), { recursive: true }); - return dir; -} - -function cleanupDir(dir) { - if (existsSync(dir)) { - rmSync(dir, { recursive: true, force: true }); - } -} +const HOOK_SCRIPT = resolveHookScript(import.meta.url, 'subagent-stop.mjs'); function runHook(stdinJson, projectDir) { - return new Promise((resolve) => { - const child = spawn('node', [HOOK_SCRIPT], { - env: { ...process.env, CLAUDE_PROJECT_DIR: projectDir }, - stdio: ['pipe', 'pipe', 'pipe'], - }); - let stdout = ''; - let stderr = ''; - child.stdout.on('data', (data) => { stdout += data.toString(); }); - child.stderr.on('data', (data) => { stderr += data.toString(); }); - child.on('close', (code) => { - resolve({ exitCode: code, stdout: stdout.trim(), stderr: stderr.trim() }); - }); - child.stdin.write(JSON.stringify(stdinJson)); - child.stdin.end(); - }); + return _runHook(HOOK_SCRIPT, stdinJson, projectDir); } describe('subagent-stop: no .devpace', () => { @@ -59,7 +31,7 @@ describe('subagent-stop: non-devpace agents', () => { let projectDir; beforeEach(() => { - projectDir = createTmpProject(); + projectDir = createTmpProject('subagent-test'); 
writeFileSync(join(projectDir, '.devpace', 'state.md'), '> 目标:测试\n'); }); @@ -76,7 +48,7 @@ describe('subagent-stop: consistency checks', () => { let projectDir; beforeEach(() => { - projectDir = createTmpProject(); + projectDir = createTmpProject('subagent-test'); }); afterEach(() => { cleanupDir(projectDir); }); diff --git a/tests/hooks/test_sync_push.mjs b/tests/hooks/test_sync_push.mjs index 557419d..cece4f3 100644 --- a/tests/hooks/test_sync_push.mjs +++ b/tests/hooks/test_sync_push.mjs @@ -5,59 +5,20 @@ */ import { describe, it, beforeEach, afterEach } from 'node:test'; import assert from 'node:assert/strict'; -import { spawn } from 'node:child_process'; -import { writeFileSync, mkdirSync, rmSync, existsSync } from 'node:fs'; +import { writeFileSync, mkdirSync, rmSync } from 'node:fs'; import { join } from 'node:path'; import { tmpdir } from 'node:os'; -import { fileURLToPath } from 'node:url'; -import { dirname } from 'node:path'; +import { + resolveHookScript, createTmpProject, cleanupDir, runHook as _runHook, writeCr, +} from './_test-helpers.mjs'; -const __filename = fileURLToPath(import.meta.url); -const __dirname = dirname(__filename); -const HOOK_SCRIPT = join(__dirname, '..', '..', 'hooks', 'sync-push.mjs'); - -// ── Test helpers ──────────────────────────────────────────────────── - -function createTmpProject() { - const dir = join(tmpdir(), `devpace-sync-test-${Date.now()}-${Math.random().toString(36).slice(2)}`); - mkdirSync(join(dir, '.devpace', 'backlog'), { recursive: true }); - mkdirSync(join(dir, '.devpace', 'integrations'), { recursive: true }); - return dir; -} - -function cleanupDir(dir) { - if (existsSync(dir)) { - rmSync(dir, { recursive: true, force: true }); - } -} +const HOOK_SCRIPT = resolveHookScript(import.meta.url, 'sync-push.mjs'); function runHook(stdinJson, projectDir) { - return new Promise((resolve) => { - const child = spawn('node', [HOOK_SCRIPT], { - env: { ...process.env, CLAUDE_PROJECT_DIR: projectDir }, - stdio: ['pipe', 
'pipe', 'pipe'], - }); - - let stdout = ''; - let stderr = ''; - - child.stdout.on('data', (data) => { stdout += data.toString(); }); - child.stderr.on('data', (data) => { stderr += data.toString(); }); - - child.on('close', (code) => { - resolve({ exitCode: code, stdout: stdout.trim(), stderr: stderr.trim() }); - }); - - child.stdin.write(JSON.stringify(stdinJson)); - child.stdin.end(); - }); + return _runHook(HOOK_SCRIPT, stdinJson, projectDir); } -function writeCr(projectDir, crId, content) { - const crPath = join(projectDir, '.devpace', 'backlog', `CR-${crId}.md`); - writeFileSync(crPath, content); - return crPath; -} +// ── Local helpers (sync-push specific) ────────────────────────────── function writeSyncMapping(projectDir, content) { writeFileSync( @@ -90,7 +51,7 @@ describe('sync-push: no .devpace', () => { describe('sync-push: non-CR file', () => { let projectDir; - beforeEach(() => { projectDir = createTmpProject(); }); + beforeEach(() => { projectDir = createTmpProject('sync-test', { subdirs: ['backlog', 'integrations'] }); }); afterEach(() => { cleanupDir(projectDir); }); it('exits 0 silently for non-CR file path', async () => { @@ -118,7 +79,7 @@ describe('sync-push: no sync-mapping', () => { let projectDir; beforeEach(() => { - projectDir = createTmpProject(); + projectDir = createTmpProject('sync-test', { subdirs: ['backlog', 'integrations'] }); // Remove default integrations dir to simulate no sync-mapping rmSync(join(projectDir, '.devpace', 'integrations'), { recursive: true, force: true }); }); @@ -138,7 +99,7 @@ describe('sync-push: cache hit (state unchanged)', () => { let projectDir; beforeEach(() => { - projectDir = createTmpProject(); + projectDir = createTmpProject('sync-test', { subdirs: ['backlog', 'integrations'] }); writeSyncMapping(projectDir); }); afterEach(() => { cleanupDir(projectDir); }); @@ -160,7 +121,7 @@ describe('sync-push: merged transition with external link', () => { let projectDir; beforeEach(() => { - projectDir = 
createTmpProject(); + projectDir = createTmpProject('sync-test', { subdirs: ['backlog', 'integrations'] }); writeSyncMapping(projectDir); }); afterEach(() => { cleanupDir(projectDir); }); @@ -175,7 +136,7 @@ describe('sync-push: merged transition with external link', () => { assert.ok(result.stdout.includes('sync-push'), 'Should include sync-push prefix'); assert.ok(result.stdout.includes('merged'), 'Should mention merged state'); assert.ok(result.stdout.includes('Issue #123'), 'Should include external link text'); - assert.ok(result.stdout.includes('Auto-execute'), 'Should use directive language for merged'); + assert.ok(result.stdout.includes('Suggest'), 'Should use advisory language for merged'); }); it('outputs directive for new CR (no cached state) transitioning to merged', async () => { @@ -196,7 +157,7 @@ describe('sync-push: other state transitions with external link', () => { let projectDir; beforeEach(() => { - projectDir = createTmpProject(); + projectDir = createTmpProject('sync-test', { subdirs: ['backlog', 'integrations'] }); writeSyncMapping(projectDir); }); afterEach(() => { cleanupDir(projectDir); }); @@ -220,7 +181,7 @@ describe('sync-push: no external link', () => { let projectDir; beforeEach(() => { - projectDir = createTmpProject(); + projectDir = createTmpProject('sync-test', { subdirs: ['backlog', 'integrations'] }); writeSyncMapping(projectDir); }); afterEach(() => { cleanupDir(projectDir); }); @@ -242,7 +203,7 @@ describe('sync-push: no state field in CR', () => { let projectDir; beforeEach(() => { - projectDir = createTmpProject(); + projectDir = createTmpProject('sync-test', { subdirs: ['backlog', 'integrations'] }); writeSyncMapping(projectDir); }); afterEach(() => { cleanupDir(projectDir); }); diff --git a/tests/hooks/test_utils.mjs b/tests/hooks/test_utils.mjs index 8b0fb8e..eb37735 100644 --- a/tests/hooks/test_utils.mjs +++ b/tests/hooks/test_utils.mjs @@ -4,9 +4,10 @@ */ import { describe, it } from 'node:test'; import assert from 
'node:assert/strict'; -import { writeFileSync, mkdirSync, rmSync, existsSync } from 'node:fs'; +import { writeFileSync, mkdirSync } from 'node:fs'; import { join } from 'node:path'; import { tmpdir } from 'node:os'; +import { cleanupDir } from './_test-helpers.mjs'; import { getProjectDir, @@ -17,6 +18,7 @@ import { isAdvanceMode, extractWriteContent, isStateChangeToApproved, + isStateEscalation, readSyncStateCache, updateSyncStateCache } from '../../hooks/lib/utils.mjs'; @@ -29,12 +31,6 @@ function createTmpDir() { return dir; } -function cleanupDir(dir) { - if (existsSync(dir)) { - rmSync(dir, { recursive: true, force: true }); - } -} - // ── extractFilePath ───────────────────────────────────────────────── describe('extractFilePath', () => { @@ -223,6 +219,45 @@ describe('isStateChangeToApproved', () => { }); }); +// ── isStateEscalation ───────────────────────────────────────────── + +describe('isStateEscalation', () => { + it('detects developing as escalation', () => { + assert.equal(isStateEscalation('- **状态**:developing'), true); + }); + + it('detects verifying as escalation', () => { + assert.equal(isStateEscalation('- **状态**:verifying'), true); + }); + + it('detects in_review as escalation', () => { + assert.equal(isStateEscalation('- **状态**:in_review'), true); + }); + + it('does not flag created (management Skill state)', () => { + assert.equal(isStateEscalation('- **状态**:created'), false); + }); + + it('does not flag paused (management Skill state)', () => { + assert.equal(isStateEscalation('- **状态**:paused'), false); + }); + + it('does not flag approved/merged', () => { + assert.equal(isStateEscalation('- **状态**:approved'), false); + assert.equal(isStateEscalation('- **状态**:merged'), false); + }); + + it('returns false for empty/null', () => { + assert.equal(isStateEscalation(''), false); + assert.equal(isStateEscalation(null), false); + assert.equal(isStateEscalation(undefined), false); + }); + + it('works with ASCII colon', () => { + 
assert.equal(isStateEscalation('- **状态**: developing'), true); + }); +}); + // ── readSyncStateCache ────────────────────────────────────────────── describe('readSyncStateCache', () => { diff --git a/tests/integration/test_plugin_loading.sh b/tests/integration/test_plugin_loading.sh index 75b36a0..4d1b144 100755 --- a/tests/integration/test_plugin_loading.sh +++ b/tests/integration/test_plugin_loading.sh @@ -43,15 +43,18 @@ else echo -e "${GREEN}PASS${NC}" fi -# ── TC-PL-02: All 14 skills discovered ──────────────────────────────── -echo -n "TC-PL-02: All 13 skills discovered... " +# ── TC-PL-02: All 19 skills discovered ──────────────────────────────── +echo -n "TC-PL-02: All 19 skills discovered... " EXPECTED_SKILLS=( + "pace-biz" "pace-change" "pace-dev" "pace-feedback" + "pace-guard" "pace-init" "pace-learn" + "pace-next" "pace-plan" "pace-pulse" "pace-release" @@ -59,7 +62,10 @@ EXPECTED_SKILLS=( "pace-review" "pace-role" "pace-status" + "pace-sync" + "pace-test" "pace-theory" + "pace-trace" ) # Check that skill directories with SKILL.md exist @@ -71,7 +77,7 @@ for skill in "${EXPECTED_SKILLS[@]}"; do done if [ ${#MISSING_SKILLS[@]} -eq 0 ]; then - echo -e "${GREEN}PASS${NC} (14/14 skills have SKILL.md)" + echo -e "${GREEN}PASS${NC} (19/19 skills have SKILL.md)" else echo -e "${RED}FAIL${NC}" echo " Missing skills: ${MISSING_SKILLS[*]}" diff --git a/tests/static/test_agent_skill_tools.py b/tests/static/test_agent_skill_tools.py index 9ee5fc7..d0ba1eb 100644 --- a/tests/static/test_agent_skill_tools.py +++ b/tests/static/test_agent_skill_tools.py @@ -1,16 +1,7 @@ """TC-AST: Agent-Skill tool consistency for context:fork skills.""" import pytest import yaml -from tests.conftest import DEVPACE_ROOT, SKILL_NAMES - - -def _parse_frontmatter(path): - """Extract YAML frontmatter from a markdown file.""" - text = path.read_text(encoding="utf-8") - if not text.startswith("---"): - return None - end = text.index("---", 3) - return yaml.safe_load(text[3:end]) +from 
tests.conftest import DEVPACE_ROOT, SKILL_NAMES, parse_frontmatter def _parse_tools_list(raw): @@ -34,7 +25,7 @@ def _forked_skill_agent_pairs(): skill_md = skills_root / name / "SKILL.md" if not skill_md.exists(): continue - fm = _parse_frontmatter(skill_md) + fm = parse_frontmatter(skill_md) if fm is None: continue if fm.get("context") != "fork" or "agent" not in fm: @@ -85,7 +76,7 @@ def test_tc_ast_02_agent_has_tools( """TC-AST-02: The referenced Agent must declare a tools field.""" if not agent_path.exists(): pytest.skip(f"Agent file {agent_path.name} missing (covered by TC-AST-01)") - fm = _parse_frontmatter(agent_path) + fm = parse_frontmatter(agent_path) assert fm is not None, f"Agent {agent_name} has no frontmatter" assert "tools" in fm, ( f"Agent '{agent_name}' (used by Skill '{skill_name}') " @@ -109,8 +100,8 @@ def test_tc_ast_03_agent_tools_superset_of_skill( if not agent_path.exists(): pytest.skip(f"Agent file {agent_path.name} missing (covered by TC-AST-01)") - skill_fm = _parse_frontmatter(skill_path) - agent_fm = _parse_frontmatter(agent_path) + skill_fm = parse_frontmatter(skill_path) + agent_fm = parse_frontmatter(agent_path) if skill_fm is None or "allowed-tools" not in skill_fm: pytest.skip(f"Skill '{skill_name}' has no allowed-tools") diff --git a/tests/static/test_cross_references.py b/tests/static/test_cross_references.py index 502c896..cb49bd5 100644 --- a/tests/static/test_cross_references.py +++ b/tests/static/test_cross_references.py @@ -1,7 +1,7 @@ """TC-CR: Cross-reference integrity between product-layer files.""" import re import pytest -from tests.conftest import DEVPACE_ROOT, PRODUCT_DIRS, SKILL_NAMES +from tests.conftest import DEVPACE_ROOT, PRODUCT_DIRS, SKILL_NAMES, product_md_files LINK_RE = re.compile(r'\[([^\]]*)\]\(([^)]+)\)') FENCE_RE = re.compile(r'```[^\n]*\n.*?```', re.DOTALL) @@ -15,21 +15,12 @@ def _strip_code(content: str) -> str: return content -def _product_md_files(): - files = [] - for d in PRODUCT_DIRS: - dirpath 
= DEVPACE_ROOT / d - if dirpath.is_dir(): - files.extend(dirpath.rglob("*.md")) - return files - - @pytest.mark.static class TestCrossReferences: def test_tc_cr_01_internal_links_valid(self): """TC-CR-01: Markdown internal links point to existing files.""" broken = [] - for f in _product_md_files(): + for f in product_md_files(): content = f.read_text(encoding="utf-8") # Strip code blocks — example links should not be validated content = _strip_code(content) @@ -85,6 +76,38 @@ def test_tc_cr_04_init_template_refs_exist(self): templates = list(template_dir.glob("*.md")) assert len(templates) >= 7, f"Expected ≥7 templates, found {len(templates)}" + def test_tc_cr_05_claude_md_template_synced_with_rules(self): + """TC-CR-05: claude-md-devpace.md template contains key content or delegates to rules.""" + template = DEVPACE_ROOT / "skills" / "pace-init" / "templates" / "claude-md-devpace.md" + rules = DEVPACE_ROOT / "rules" / "devpace-rules.md" + if not template.exists() or not rules.exists(): + pytest.skip("Template or rules file not found") + template_content = template.read_text(encoding="utf-8") + missing = [] + # Template must either contain key concepts directly OR delegate to rules + delegates_to_rules = "devpace-rules.md" in template_content + if not delegates_to_rules: + # §2 dual mode: explore vs advance + if "探索" not in template_content or "推进" not in template_content: + missing.append("§2 双模式(探索/推进)关键词缺失") + # §9 change management trigger words + change_triggers = ["不做了", "加一个", "改一下"] + if not any(t in template_content for t in change_triggers): + missing.append("§9 变更管理触发词缺失(至少需包含一个:不做了/加一个/改一下)") + # Session end summary + if "3-5" not in template_content and "3-5" not in template_content.replace("–", "-"): + missing.append("会话结束 3-5 行摘要规则缺失") + # state.md reference always required (either inline or in file table) + if "state.md" not in template_content: + missing.append("会话开始读 state.md 规则缺失") + # .devpace/ reference always required + if ".devpace/" not in 
template_content: + missing.append(".devpace/ 文件参考缺失") + assert not missing, ( + f"claude-md-devpace.md template is out of sync with rules:\n" + + "\n".join(f" - {m}" for m in missing) + ) + def test_tc_cr_06_rules_section_refs_valid(self): """TC-CR-06: §N cross-references within rules file point to existing sections.""" rules = DEVPACE_ROOT / "rules" / "devpace-rules.md" @@ -112,7 +135,7 @@ def test_tc_cr_07_detail_refs_exist(self): """TC-CR-07: '详见' backtick references point to existing files.""" broken = [] ref_pattern = re.compile(r'(?:详见|见)\s+`([a-zA-Z0-9_/.-]+\.md)`') - for f in _product_md_files(): + for f in product_md_files(): content = f.read_text(encoding="utf-8") for m in ref_pattern.finditer(content): ref_path = m.group(1) @@ -171,34 +194,75 @@ def test_tc_cr_11_br_schema_refs_valid(self): assert "PF-" in content or "PF" in content, "br-format.md missing PF reference" assert "project.md" in content, "br-format.md missing project.md reference" - def test_tc_cr_05_claude_md_template_synced_with_rules(self): - """TC-CR-05: claude-md-devpace.md template contains key content or delegates to rules.""" - template = DEVPACE_ROOT / "skills" / "pace-init" / "templates" / "claude-md-devpace.md" - rules = DEVPACE_ROOT / "rules" / "devpace-rules.md" - if not template.exists() or not rules.exists(): - pytest.skip("Template or rules file not found") - template_content = template.read_text(encoding="utf-8") - missing = [] - # Template must either contain key concepts directly OR delegate to rules - delegates_to_rules = "devpace-rules.md" in template_content - if not delegates_to_rules: - # §2 dual mode: explore vs advance - if "探索" not in template_content or "推进" not in template_content: - missing.append("§2 双模式(探索/推进)关键词缺失") - # §9 change management trigger words - change_triggers = ["不做了", "加一个", "改一下"] - if not any(t in template_content for t in change_triggers): - missing.append("§9 变更管理触发词缺失(至少需包含一个:不做了/加一个/改一下)") - # Session end summary - if "3-5" not in 
template_content and "3-5" not in template_content.replace("–", "-"): - missing.append("会话结束 3-5 行摘要规则缺失") - # state.md reference always required (either inline or in file table) - if "state.md" not in template_content: - missing.append("会话开始读 state.md 规则缺失") - # .devpace/ reference always required - if ".devpace/" not in template_content: - missing.append(".devpace/ 文件参考缺失") + def test_tc_cr_12_no_orphan_procedures(self): + """TC-CR-12: Every procedures file is reachable from SKILL.md or sibling procedures. + + TC-CR-03 checks forward (SKILL.md refs → file exists). + This checks reverse (file exists → is referenced somewhere). + Handles abbreviated references (e.g. "status.md" for "release-procedures-status.md") + when SKILL.md declares a prefix convention. + """ + proc_pattern = re.compile(r"[a-z]+-procedures?[-\w]*\.md") + orphans = [] + for name in SKILL_NAMES: + skill_dir = DEVPACE_ROOT / "skills" / name + skill_md = skill_dir / "SKILL.md" + if not skill_md.exists(): + continue + # Collect all text from SKILL.md + all procedures + all_text = "" + for md_file in [skill_md] + list(skill_dir.glob("*-procedures*.md")): + all_text += md_file.read_text(encoding="utf-8") + "\n" + full_refs = set(proc_pattern.findall(all_text)) + # Also collect short .md refs for prefix-abbreviated routing tables + short_refs = set(re.findall(r"(?= 5, ( + f"pace-change routing table references too few procedures: {proc_refs}" + ) + missing = [p for p in proc_refs if not (skill_dir / p).exists()] + assert not missing, ( + f"pace-change routing table references missing files: {missing}" ) diff --git a/tests/static/test_eval_modules.py b/tests/static/test_eval_modules.py new file mode 100644 index 0000000..c11d16f --- /dev/null +++ b/tests/static/test_eval_modules.py @@ -0,0 +1,396 @@ +"""TC-EVAL: Unit tests for eval package modules. + +Tests skill_io, results, regress, baseline, and loop utilities +without requiring Agent SDK or API calls. 
+""" +import json +import pytest +import tempfile +from pathlib import Path +from unittest.mock import patch + +from tests.conftest import DEVPACE_ROOT + + +# --------------------------------------------------------------------------- +# Fixtures +# --------------------------------------------------------------------------- + +@pytest.fixture +def tmp_skill(tmp_path): + """Create a temporary skill directory with a SKILL.md.""" + skill_dir = tmp_path / "test-skill" + skill_dir.mkdir() + (skill_dir / "SKILL.md").write_text( + "---\n" + "description: Use when user says \"test\" or \"check\"\n" + "allowed-tools: Read, Write\n" + "---\n" + "\n" + "# /test-skill\n" + "Test skill body.\n" + ) + return skill_dir + + +@pytest.fixture +def tmp_skill_multiline(tmp_path): + """Create a skill with multi-line folded description.""" + skill_dir = tmp_path / "multi-skill" + skill_dir.mkdir() + (skill_dir / "SKILL.md").write_text( + "---\n" + "description: >\n" + " Use when user says \"build\" or \"create\"\n" + " or wants to start a new project.\n" + " NOT for existing project changes.\n" + "allowed-tools: Read\n" + "---\n" + "\n" + "# /multi-skill\n" + ) + return skill_dir + + +@pytest.fixture +def tmp_results_dir(tmp_path): + """Create a temporary results directory structure.""" + rdir = tmp_path / "results" + rdir.mkdir() + (rdir / "history").mkdir() + (rdir / "loop").mkdir() + return rdir + + +# --------------------------------------------------------------------------- +# TC-EVAL-IO: skill_io module +# --------------------------------------------------------------------------- + +@pytest.mark.static +class TestSkillIO: + def test_read_description_inline(self, tmp_skill): + """TC-EVAL-IO-01: Read inline description.""" + from eval.skill_io import read_description + desc = read_description(tmp_skill) + assert desc == 'Use when user says "test" or "check"' + + def test_read_description_multiline(self, tmp_skill_multiline): + """TC-EVAL-IO-02: Read multi-line folded 
description.""" + from eval.skill_io import read_description + desc = read_description(tmp_skill_multiline) + assert "build" in desc + assert "create" in desc + assert "NOT for existing" in desc + + def test_description_hash_deterministic(self, tmp_skill): + """TC-EVAL-IO-03: Hash is deterministic for same description.""" + from eval.skill_io import description_hash + h1 = description_hash(tmp_skill) + h2 = description_hash(tmp_skill) + assert h1 == h2 + assert len(h1) == 16 + + def test_replace_description_inline(self, tmp_skill): + """TC-EVAL-IO-04: Replace inline description.""" + from eval.skill_io import read_description, replace_description + skill_md = tmp_skill / "SKILL.md" + original = replace_description(skill_md, "New description here") + assert "test" in original # original content returned + new_desc = read_description(tmp_skill) + assert new_desc == "New description here" + + def test_replace_description_long(self, tmp_skill): + """TC-EVAL-IO-05: Long description uses folded block scalar.""" + from eval.skill_io import replace_description + skill_md = tmp_skill / "SKILL.md" + long_desc = "Use when " + " ".join(f"word{i}" for i in range(50)) + replace_description(skill_md, long_desc) + content = skill_md.read_text() + assert "description: >" in content + + def test_replace_preserves_other_fields(self, tmp_skill): + """TC-EVAL-IO-06: Replacement preserves other frontmatter fields.""" + from eval.skill_io import replace_description + skill_md = tmp_skill / "SKILL.md" + replace_description(skill_md, "New desc") + content = skill_md.read_text() + assert "allowed-tools: Read, Write" in content + + def test_read_skill_md(self, tmp_skill): + """TC-EVAL-IO-07: Read full SKILL.md content.""" + from eval.skill_io import read_skill_md + content = read_skill_md(tmp_skill) + assert "# /test-skill" in content + + def test_read_description_no_description(self, tmp_path): + """TC-EVAL-IO-08: Returns empty string if no description field.""" + skill_dir = tmp_path / 
"no-desc" + skill_dir.mkdir() + (skill_dir / "SKILL.md").write_text("---\nallowed-tools: Read\n---\n") + from eval.skill_io import read_description + assert read_description(skill_dir) == "" + + +# --------------------------------------------------------------------------- +# TC-EVAL-RES: results module +# --------------------------------------------------------------------------- + +@pytest.mark.static +class TestResults: + def test_build_metadata_basic(self): + """TC-EVAL-RES-01: Build metadata with defaults.""" + from eval.results import build_metadata + meta = build_metadata(model="test-model", duration_seconds=12.345) + assert meta["model"] == "test-model" + assert meta["sdk_options"]["max_turns"] == 5 + assert meta["environment"]["eval_version"] == "0.2.0" + assert meta["duration_seconds"] == 12.3 + + def test_build_metadata_no_model(self): + """TC-EVAL-RES-02: Build metadata without model.""" + from eval.results import build_metadata + meta = build_metadata() + assert "model" not in meta + assert "sdk_options" in meta + + def test_eval_score(self): + """TC-EVAL-RES-03: Score computation.""" + from eval.results import eval_score + assert eval_score({"summary": {"total": 10, "passed": 8}}) == 0.8 + assert eval_score({"summary": {"total": 0, "passed": 0}}) == 0.0 + assert eval_score({"summary": {"total": 5, "passed": 5}}) == 1.0 + assert eval_score({}) == 0.0 + + def test_save_and_load(self, tmp_path): + """TC-EVAL-RES-04: Save and load results round-trip.""" + from eval.results import save_trigger_results, load_results, EVAL_DATA_DIR + # Patch EVAL_DATA_DIR temporarily + with patch("eval.results.EVAL_DATA_DIR", tmp_path): + with patch("eval.results.SKILLS_DIR", tmp_path): + # Create fake skill for hash + fake_skill = tmp_path / "fake" + fake_skill.mkdir() + (fake_skill / "SKILL.md").write_text("---\ndescription: test\n---\n") + with patch("eval.results.description_hash", return_value="abc123"): + raw = { + "summary": {"total": 3, "passed": 2, "failed": 1}, + 
"results": [ + {"query": "q1", "should_trigger": True, "pass": True, "runs": 1}, + {"query": "q2", "should_trigger": True, "pass": False, "runs": 1}, + {"query": "q3", "should_trigger": False, "pass": True, "runs": 1}, + ], + } + path = save_trigger_results("fake", raw) + assert path.exists() + data = json.loads(path.read_text()) + assert data["skill"] == "fake" + assert data["summary"]["passed"] == 2 + assert len(data["false_negatives"]) == 1 + assert len(data["false_positives"]) == 0 + + +# --------------------------------------------------------------------------- +# TC-EVAL-REG: regress module +# --------------------------------------------------------------------------- + +@pytest.mark.static +class TestRegress: + def test_compute_metrics_no_regression(self): + """TC-EVAL-REG-01: No regression when latest >= baseline.""" + from eval.regress import _compute_metrics + baseline = { + "summary": {"total": 10, "passed": 8}, + "positive": {"total": 7, "passed": 5}, + "false_negatives": [{"id": 0, "query": "q1"}, {"id": 1, "query": "q2"}], + "false_positives": [], + } + latest = { + "summary": {"total": 10, "passed": 9}, + "positive": {"total": 7, "passed": 6}, + "false_negatives": [{"id": 0, "query": "q1"}], + "false_positives": [], + } + metrics = _compute_metrics(baseline, latest) + assert metrics["overall_pass_rate_drop"] < 0 # improved + assert metrics["false_negative_increase"] == -1 # fewer FN + + def test_compute_metrics_regression(self): + """TC-EVAL-REG-02: Detect regression when latest < baseline.""" + from eval.regress import _compute_metrics + baseline = { + "summary": {"total": 10, "passed": 9}, + "positive": {"total": 7, "passed": 7}, + "false_negatives": [], + "false_positives": [], + } + latest = { + "summary": {"total": 10, "passed": 6}, + "positive": {"total": 7, "passed": 4}, + "false_negatives": [{"id": i, "query": f"q{i}"} for i in range(3)], + "false_positives": [{"id": 0, "query": "fp1"}], + } + metrics = _compute_metrics(baseline, latest) + 
assert metrics["overall_pass_rate_drop"] > 0.15 # FAILURE level + assert metrics["false_negative_increase"] == 3 + assert metrics["false_positive_increase"] == 1 + + def test_classify_thresholds(self): + """TC-EVAL-REG-03: Classification matches expected thresholds.""" + from eval.regress import _classify + assert _classify("overall_pass_rate_drop", 0.03) == "OK" + assert _classify("overall_pass_rate_drop", 0.07) == "WARNING" + assert _classify("overall_pass_rate_drop", 0.20) == "FAILURE" + assert _classify("false_positive_increase", 0) == "OK" + assert _classify("false_positive_increase", 1) == "FAILURE" + + def test_sibling_skills(self): + """TC-EVAL-REG-04: Sibling skill lookup works.""" + from eval.regress import get_sibling_skills + siblings = get_sibling_skills("pace-dev") + assert "pace-change" in siblings + assert get_sibling_skills("nonexistent") == [] + + +# --------------------------------------------------------------------------- +# TC-EVAL-BASE: baseline module +# --------------------------------------------------------------------------- + +@pytest.mark.static +class TestBaseline: + def test_save_baseline(self, tmp_path): + """TC-EVAL-BASE-01: Save copies latest to baseline.""" + from eval.baseline import save_baseline + rdir = tmp_path / "test-skill" / "results" + rdir.mkdir(parents=True) + (rdir / "latest.json").write_text('{"summary":{"total":5,"passed":4}}') + with patch("eval.baseline.EVAL_DATA_DIR", tmp_path): + with patch("eval.baseline.DEVPACE_ROOT", tmp_path.parent): + rc = save_baseline("test-skill") + assert rc == 0 + assert (rdir / "baseline.json").exists() + assert json.loads((rdir / "baseline.json").read_text())["summary"]["passed"] == 4 + + def test_save_baseline_no_latest(self, tmp_path): + """TC-EVAL-BASE-02: Save fails without latest.json.""" + from eval.baseline import save_baseline + rdir = tmp_path / "test-skill" / "results" + rdir.mkdir(parents=True) + with patch("eval.baseline.EVAL_DATA_DIR", tmp_path): + rc = 
save_baseline("test-skill") + assert rc == 1 + + +# --------------------------------------------------------------------------- +# TC-EVAL-SPLIT: train/test split +# --------------------------------------------------------------------------- + +@pytest.mark.static +class TestTrainTestSplit: + def test_split_proportions(self): + """TC-EVAL-SPLIT-01: Split maintains approximate holdout ratio.""" + from eval.loop import _split_train_test + eval_set = [ + {"query": f"pos{i}", "should_trigger": True} for i in range(20) + ] + [ + {"query": f"neg{i}", "should_trigger": False} for i in range(10) + ] + train, test = _split_train_test(eval_set, holdout=0.3, seed=42) + assert len(train) + len(test) == 30 + assert 8 <= len(test) <= 10 # ~30% of 30 + + def test_split_preserves_all_queries(self): + """TC-EVAL-SPLIT-02: All queries appear in exactly one set.""" + from eval.loop import _split_train_test + eval_set = [{"query": f"q{i}", "should_trigger": i < 5} for i in range(10)] + train, test = _split_train_test(eval_set, holdout=0.3, seed=42) + all_queries = {e["query"] for e in train} | {e["query"] for e in test} + assert all_queries == {f"q{i}" for i in range(10)} + + def test_split_deterministic_with_seed(self): + """TC-EVAL-SPLIT-03: Same seed produces same split.""" + from eval.loop import _split_train_test + eval_set = [{"query": f"q{i}", "should_trigger": True} for i in range(20)] + t1, s1 = _split_train_test(eval_set, holdout=0.3, seed=123) + t2, s2 = _split_train_test(eval_set, holdout=0.3, seed=123) + assert [e["query"] for e in t1] == [e["query"] for e in t2] + assert [e["query"] for e in s1] == [e["query"] for e in s2] + + def test_split_both_sets_have_pos_and_neg(self): + """TC-EVAL-SPLIT-04: Both sets have positive and negative queries.""" + from eval.loop import _split_train_test + eval_set = [ + {"query": f"pos{i}", "should_trigger": True} for i in range(10) + ] + [ + {"query": f"neg{i}", "should_trigger": False} for i in range(10) + ] + train, test = 
_split_train_test(eval_set, holdout=0.3, seed=42) + train_pos = sum(1 for e in train if e["should_trigger"]) + train_neg = sum(1 for e in train if not e["should_trigger"]) + test_pos = sum(1 for e in test if e["should_trigger"]) + test_neg = sum(1 for e in test if not e["should_trigger"]) + assert train_pos > 0 and train_neg > 0 + assert test_pos > 0 and test_neg > 0 + + +# --------------------------------------------------------------------------- +# TC-EVAL-TRIGGER: trigger module (unit-level, no SDK calls) +# --------------------------------------------------------------------------- + +@pytest.mark.static +class TestTriggerUtils: + def test_wilson_interval_basic(self): + """TC-EVAL-TRIG-01: Wilson interval for known proportions.""" + from eval.trigger import _wilson_interval + lo, hi = _wilson_interval(5, 10) + assert 0.2 < lo < 0.5 + assert 0.5 < hi < 0.8 + + def test_wilson_interval_zero(self): + """TC-EVAL-TRIG-02: Wilson interval for zero total.""" + from eval.trigger import _wilson_interval + assert _wilson_interval(0, 0) == (0.0, 0.0) + + def test_wilson_interval_perfect(self): + """TC-EVAL-TRIG-03: Wilson interval for perfect rate.""" + from eval.trigger import _wilson_interval + lo, hi = _wilson_interval(10, 10) + assert lo > 0.6 + assert hi == 1.0 or hi > 0.95 + + +# --------------------------------------------------------------------------- +# TC-EVAL-CLI: CLI module +# --------------------------------------------------------------------------- + +@pytest.mark.static +class TestCLI: + def test_parser_has_all_commands(self): + """TC-EVAL-CLI-01: Parser has all expected subcommands.""" + from eval.cli import build_parser + parser = build_parser() + # Check by parsing known subcommands + args = parser.parse_args(["trigger", "--skill", "test"]) + assert args.command == "trigger" + assert args.skill == "test" + assert args.max_turns == 5 # default + + def test_parser_loop_requires_model(self): + """TC-EVAL-CLI-02: Loop subcommand requires --model.""" + 
from eval.cli import build_parser + parser = build_parser() + with pytest.raises(SystemExit): + parser.parse_args(["loop", "--skill", "test"]) + + def test_parser_changed_default_base(self): + """TC-EVAL-CLI-03: Changed subcommand has default base ref.""" + from eval.cli import build_parser + parser = build_parser() + args = parser.parse_args(["changed"]) + assert args.base == "origin/main" + + def test_parser_loop_holdout(self): + """TC-EVAL-CLI-04: Loop subcommand accepts --holdout.""" + from eval.cli import build_parser + parser = build_parser() + args = parser.parse_args(["loop", "--skill", "t", "--model", "m", "--holdout", "0.2"]) + assert args.holdout == 0.2 diff --git a/tests/static/test_frontmatter.py b/tests/static/test_frontmatter.py index 49ae2bf..1936e47 100644 --- a/tests/static/test_frontmatter.py +++ b/tests/static/test_frontmatter.py @@ -1,15 +1,7 @@ """TC-FM: SKILL.md frontmatter validation.""" import pytest import yaml -from tests.conftest import DEVPACE_ROOT, SKILL_NAMES, LEGAL_SKILL_FIELDS, LEGAL_MODEL_VALUES, LEGAL_TOOL_NAMES - -def _parse_frontmatter(path): - """Extract YAML frontmatter from a markdown file.""" - text = path.read_text(encoding="utf-8") - if not text.startswith("---"): - return None - end = text.index("---", 3) - return yaml.safe_load(text[3:end]) +from tests.conftest import DEVPACE_ROOT, SKILL_NAMES, LEGAL_SKILL_FIELDS, LEGAL_MODEL_VALUES, LEGAL_TOOL_NAMES, parse_frontmatter def _skill_md_files(): skills_root = DEVPACE_ROOT / "skills" @@ -31,7 +23,7 @@ def test_tc_fm_01_has_frontmatter(self, name, path): @pytest.mark.parametrize("name,path", _skill_md_files(), ids=[n for n, _ in _skill_md_files()]) def test_tc_fm_02_legal_fields_only(self, name, path): """TC-FM-02: Frontmatter uses only legal fields.""" - fm = _parse_frontmatter(path) + fm = parse_frontmatter(path) if fm is None: pytest.skip(f"{name} has no frontmatter") illegal = set(fm.keys()) - LEGAL_SKILL_FIELDS @@ -40,13 +32,13 @@ def test_tc_fm_02_legal_fields_only(self, 
name, path): @pytest.mark.parametrize("name,path", _skill_md_files(), ids=[n for n, _ in _skill_md_files()]) def test_tc_fm_03_description_required(self, name, path): """TC-FM-03: description field must exist.""" - fm = _parse_frontmatter(path) + fm = parse_frontmatter(path) assert fm and "description" in fm, f"{name} SKILL.md missing 'description' in frontmatter" @pytest.mark.parametrize("name,path", _skill_md_files(), ids=[n for n, _ in _skill_md_files()]) def test_tc_fm_04_allowed_tools_valid(self, name, path): """TC-FM-04: allowed-tools values are recognized tool names.""" - fm = _parse_frontmatter(path) + fm = parse_frontmatter(path) if fm is None or "allowed-tools" not in fm: pytest.skip(f"{name} has no allowed-tools") tools = [t.strip() for t in fm["allowed-tools"].split(",")] @@ -56,7 +48,7 @@ def test_tc_fm_04_allowed_tools_valid(self, name, path): @pytest.mark.parametrize("name,path", _skill_md_files(), ids=[n for n, _ in _skill_md_files()]) def test_tc_fm_05_model_valid(self, name, path): """TC-FM-05: model field (if present) is sonnet/opus/haiku.""" - fm = _parse_frontmatter(path) + fm = parse_frontmatter(path) if fm is None or "model" not in fm: pytest.skip(f"{name} has no model field") assert fm["model"] in LEGAL_MODEL_VALUES, f"{name} has invalid model: {fm['model']}" @@ -86,7 +78,7 @@ def test_tc_fm_07_file_reading_skills_have_allowed_tools(self, name, path): reads_files = any(kw in body for kw in self._FILE_READ_INDICATORS) if not reads_files: pytest.skip(f"{name} does not appear to read files") - fm = _parse_frontmatter(path) + fm = parse_frontmatter(path) assert fm and "allowed-tools" in fm, ( f"{name} reads files but has no allowed-tools declared" ) @@ -105,7 +97,7 @@ def test_tc_fm_08_argument_hint_present(self, name, path): has_arguments = "$ARGUMENTS" in body or "$0" in body or "$1" in body if not has_arguments: pytest.skip(f"{name} does not use $ARGUMENTS") - fm = _parse_frontmatter(path) + fm = parse_frontmatter(path) if not fm or 
"argument-hint" not in fm: import warnings warnings.warn( @@ -118,7 +110,7 @@ def test_tc_fm_08_argument_hint_present(self, name, path): @pytest.mark.parametrize("name,path", _skill_md_files(), ids=[n for n, _ in _skill_md_files()]) def test_tc_fm_09_hook_matcher_tools_in_allowed_tools(self, name, path): """TC-FM-09: Hook matcher tool_name entries must be a subset of allowed-tools.""" - fm = _parse_frontmatter(path) + fm = parse_frontmatter(path) if fm is None or "hooks" not in fm or "allowed-tools" not in fm: pytest.skip(f"{name} has no hooks or no allowed-tools") allowed = {t.strip() for t in fm["allowed-tools"].split(",")} diff --git a/tests/static/test_hooks.py b/tests/static/test_hooks.py index 43d6bdc..9a0f283 100644 --- a/tests/static/test_hooks.py +++ b/tests/static/test_hooks.py @@ -4,11 +4,16 @@ import stat import pytest -from tests.conftest import DEVPACE_ROOT, CR_STATES +from tests.conftest import DEVPACE_ROOT, CR_STATES, parse_frontmatter HOOKS_DIR = DEVPACE_ROOT / "hooks" HOOKS_JSON = HOOKS_DIR / "hooks.json" + +def _load_hooks_json(): + """Load and parse hooks.json.""" + return json.loads(HOOKS_JSON.read_text(encoding="utf-8")) + # Valid hook event names (case-sensitive per Claude Code spec) VALID_HOOK_EVENTS = { "PreToolUse", @@ -26,7 +31,9 @@ } EXPECTED_SCRIPTS_SH = ["session-start.sh", "session-stop.sh", "pre-compact.sh", "session-end.sh"] -EXPECTED_SCRIPTS_MJS = ["pre-tool-use.mjs", "post-cr-update.mjs", "intent-detect.mjs", "subagent-stop.mjs", "pulse-counter.mjs", "post-tool-failure.mjs", "sync-push.mjs", "pace-dev-scope-check.mjs"] +EXPECTED_SCRIPTS_MJS = ["pre-tool-use.mjs", "post-cr-update.mjs", "intent-detect.mjs", "subagent-stop.mjs", "pulse-counter.mjs", "post-tool-failure.mjs", "sync-push.mjs", "post-schema-check.mjs"] +SKILL_HOOKS_DIR = HOOKS_DIR / "skill" +EXPECTED_SKILL_SCRIPTS = ["pace-dev-scope-check.mjs", "pace-init-scope-check.mjs", "pace-review-scope-check.mjs"] EXPECTED_SCRIPTS = EXPECTED_SCRIPTS_SH + EXPECTED_SCRIPTS_MJS @@ 
-41,7 +48,7 @@ def test_tc_hk_01_hooks_json_valid(self): def test_tc_hk_02_event_names_case_correct(self): """TC-HK-02: Hook event names use correct casing.""" - data = json.loads(HOOKS_JSON.read_text(encoding="utf-8")) + data = _load_hooks_json() for event_name in data["hooks"]: assert event_name in VALID_HOOK_EVENTS, ( f"Invalid hook event name '{event_name}'. " @@ -84,6 +91,39 @@ def test_tc_hk_05_scripts_have_shebang(self): no_shebang.append(script) assert not no_shebang, f"Scripts missing shebang: {no_shebang}" + def test_tc_hk_03b_skill_scripts_exist(self): + """TC-HK-03b: All expected skill-level hook scripts exist in hooks/skill/.""" + missing = [] + for script in EXPECTED_SKILL_SCRIPTS: + if not (SKILL_HOOKS_DIR / script).exists(): + missing.append(script) + assert not missing, f"Missing skill hook scripts: {missing}" + + def test_tc_hk_04b_skill_scripts_executable(self): + """TC-HK-04b: Skill hook scripts have execute permission.""" + not_executable = [] + for script in EXPECTED_SKILL_SCRIPTS: + path = SKILL_HOOKS_DIR / script + if path.exists(): + mode = path.stat().st_mode + if not (mode & stat.S_IXUSR): + not_executable.append(script) + assert not not_executable, ( + f"Skill scripts lack execute permission: {not_executable}. " + f"Fix with: chmod +x hooks/skill/