From 07bedbbe29003a9630cdfbc8b8e12b7ae6afb3cb Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Mon, 30 Mar 2026 12:33:29 +0000 Subject: [PATCH 1/3] Initial plan From f09c2d84dc49e3d3dcd1069a6c53722fc906ad54 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Mon, 30 Mar 2026 12:45:51 +0000 Subject: [PATCH 2/3] =?UTF-8?q?feat:=20task=20decomposition=20research=20?= =?UTF-8?q?=E2=80=94=20workspace=20detection,=20planner=20context,=20workf?= =?UTF-8?q?low=20templates?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Agent-Logs-Url: https://github.com/huberp/agentloop/sessions/3a8fc459-0fd4-4ab9-8e11-a59e4d76960b Co-authored-by: huberp <4027454+huberp@users.noreply.github.com> --- issues/2.md | 208 ++++++++++++++++++ src/__tests__/builtin-agent-profiles.test.ts | 4 +- src/__tests__/builtin-skills.test.ts | 4 +- .../fixtures/workspace-cargo/Cargo.toml | 4 + .../workspace-cmake-presets/CMakeLists.txt | 0 .../workspace-cmake-presets/CMakePresets.json | 1 + .../fixtures/workspace-cmake/CMakeLists.txt | 0 .../workspace-gradle-kotlin/build.gradle.kts | 1 + .../fixtures/workspace-gradle/build.gradle | 1 + .../fixtures/workspace-maven/pom.xml | 1 + src/__tests__/workspace.test.ts | 160 ++++++++++++++ src/agents/builtin/build-verify.agent.json | 12 + src/agents/builtin/test-runner.agent.json | 12 + src/skills/builtin/build-verify.skill.md | 43 ++++ src/skills/builtin/cmake-workflow.skill.md | 82 +++++++ src/subagents/planner.ts | 14 +- src/workspace.ts | 124 ++++++++++- 17 files changed, 664 insertions(+), 7 deletions(-) create mode 100644 issues/2.md create mode 100644 src/__tests__/fixtures/workspace-cargo/Cargo.toml create mode 100644 src/__tests__/fixtures/workspace-cmake-presets/CMakeLists.txt create mode 100644 src/__tests__/fixtures/workspace-cmake-presets/CMakePresets.json create mode 100644 
src/__tests__/fixtures/workspace-cmake/CMakeLists.txt create mode 100644 src/__tests__/fixtures/workspace-gradle-kotlin/build.gradle.kts create mode 100644 src/__tests__/fixtures/workspace-gradle/build.gradle create mode 100644 src/__tests__/fixtures/workspace-maven/pom.xml create mode 100644 src/agents/builtin/build-verify.agent.json create mode 100644 src/agents/builtin/test-runner.agent.json create mode 100644 src/skills/builtin/build-verify.skill.md create mode 100644 src/skills/builtin/cmake-workflow.skill.md diff --git a/issues/2.md b/issues/2.md new file mode 100644 index 00000000..402e5406 --- /dev/null +++ b/issues/2.md @@ -0,0 +1,208 @@ +## Research: Task Planner and Task Decomposition for Coding Agents + +### 1. Problem Statement + +Modern AI coding agents need to tackle tasks that span multiple steps, require diverse tools, and benefit from specialised domain knowledge at each step. The key challenge is **task decomposition**: how should a high-level goal (e.g. "add input validation to all POST handlers") be broken into concrete, executable steps that an agent can carry out reliably? + +A related challenge is **workflow templates**: many coding workflows (build, test, lint, release) have a fixed *shape* but vary in their concrete commands depending on the workspace. Can these patterns be captured once and reused across projects? + +--- + +### 2. 
Baseline: The `plan-and-run` Loop in agentloop + +agentloop already ships a layered planning architecture: + +| Component | Location | Role | +|---|---|---| +| `generatePlan` | `src/subagents/planner.ts` | LLM-powered decomposition of a goal into `PlanStep[]` | +| `refinePlan` | `src/subagents/planner.ts` | Corrects a plan that references unknown tools | +| `validatePlan` | `src/subagents/planner.ts` | Checks that all tool names in the plan are registered | +| `executePlan` | `src/orchestrator.ts` | Runs steps in sequence; supports `retry/skip/abort` on failure and checkpoint/resume | +| `plan` tool | `src/tools/plan.ts` | Exposes plan generation to the agent loop as a callable tool | +| `plan-and-run` tool | `src/tools/plan-and-run.ts` | Combines generation + execution in one tool call | + +The planner runs as a **tool-free subagent**: it receives workspace context and a list of available tools, then returns a JSON plan. The orchestrator dispatches each step to a `runSubagent` call (or `SubagentManager` for complex steps) with an iteration budget derived from `estimatedComplexity`. + +**Key insight already present in agentloop**: each `PlanStep` carries an optional `agentProfile` field. The orchestrator activates the named profile (model, temperature, tool subset) for that step. This enables per-step specialisation without a separate orchestration framework. + +--- + +### 3. What Other Frameworks Do + +#### 3.1 LangGraph (LangChain) +- Models agent behaviour as a **directed graph** of nodes (LLM calls, tool calls) with conditional edges. +- Supports cycles (retry loops), parallel fan-out/fan-in, and human-in-the-loop interrupts. +- Templates are *graph patterns* stored as reusable subgraphs. +- Complexity: full graph authoring required for every new workflow shape. + +#### 3.2 AutoGen (Microsoft) +- Multi-agent conversation: a **Planner** agent, a **Coder** agent, and an **Executor** agent exchange messages until the task is done. 
+- Task decomposition happens in natural language — the Planner emits step descriptions that the Coder implements. +- Workflow templates are **system prompts** for each role, often provided in a configuration YAML. +- Strength: easy to add domain-expert agents. Weakness: conversation history grows rapidly; quality depends on message-passing discipline. + +#### 3.3 CrewAI +- Defines **Crew** (team of agents), **Agents** (role + backstory + tools), and **Tasks** (description + expected output + dependencies). +- Supports sequential and hierarchical execution; tasks can pass their output as context to dependent tasks. +- Workflow templates are *Crew + Task YAML configurations* that can be parameterised and re-instantiated for different inputs. +- Strong alignment with the "workflow template" concept in this issue: a `BuildVerifyCrew` YAML is a reusable template instantiated per workspace. + +#### 3.4 OpenAI Assistants + Structured Outputs +- Persistent thread context allows multi-turn tasks without re-injecting history. +- `run_step` objects provide a built-in audit trail of each tool call and its output. +- Templates are **Assistant instructions** (system prompt) combined with few-shot examples in the thread. +- Limitation: tied to the OpenAI API; no local model support. + +#### 3.5 Copilot Coding Agent (GitHub Copilot) +The example in this issue shows a subtask with: +```json +{ + "name": "build-verify", + "agent_type": "task", + "description": "Build the plugin to verify changes", + "prompt": "...\nSteps:\n1. Run: sudo bash scripts/install-linux-deps.sh\n2. Run: git submodule update --init --recursive\n3. Run: cmake --preset linux-release\n4. Run: cmake --build --preset linux-build -j2\n\nReport whether the build succeeded or failed..." +} +``` + +Key observations: +- The template name (`build-verify`) is **stable and reusable** across tasks. 
+- The concrete steps (cmake preset names, script paths) are **workspace-specific** and were derived from workspace knowledge. +- Instantiation happens once, at workspace-setup time — not re-derived on every task. +- This is equivalent to agentloop's `agentProfile` + `skill` combination, but the steps are baked into the prompt string rather than generated by a planner. + +--- + +### 4. Template Taxonomy for Coding Agents + +Based on the above analysis, coding workflow templates fall into three categories: + +#### Category A — Build Lifecycle Templates +Fixed structure, workspace-specific commands: + +| Template | Shape | Workspace-specific parts | +|---|---|---| +| `build-verify` | configure → compile → report | preset name, script paths, parallelism flag | +| `clean-build` | clean → configure → compile | build directory, preset/profile | +| `release-package` | build → test → package → sign | packaging format, signing key | + +#### Category B — Quality Gate Templates +Fixed checklist, tool-specific commands: + +| Template | Shape | Workspace-specific parts | +|---|---|---| +| `test-and-fix` | run tests → parse failures → locate code → fix → re-run | test runner command, test output format | +| `lint-and-format` | run linter → parse output → apply fixes → re-verify | linter binary, fix flags | +| `security-scan` | run scanner → parse findings → generate report | scanner CLI, severity threshold | + +#### Category C — Development Workflow Templates +Higher-level patterns: + +| Template | Shape | Notes | +|---|---|---| +| `feature-branch` | branch → implement → test → PR | Uses git tools + planner | +| `dependency-update` | audit → update → test → commit | Integrates vulnerability check | +| `hotfix` | branch from tag → apply fix → test → backport | Requires git-log, cherry-pick | + +--- + +### 5. 
How agentloop Can Implement Workflow Templates + +agentloop's existing primitives map cleanly onto the template concept: + +#### 5.1 Templates as Agent Profiles + Skills (recommended) + +A workflow template = **agent profile** (what tools, model, iteration budget) + **skill** (domain knowledge, step sequence, error heuristics). + +Example — `build-verify` profile (`src/agents/builtin/build-verify.agent.json`): +```json +{ + "name": "build-verify", + "description": "Build verification agent — compiles the workspace and reports success or failure", + "temperature": 0.1, + "skills": ["build-verify"], + "tools": ["shell", "file-read", "file-list"], + "maxIterations": 10 +} +``` + +The paired `build-verify` skill (`src/skills/builtin/build-verify.skill.md`) injects: +- Step sequence (identify build system → install deps → configure → compile → report) +- Error triage heuristics (linker errors, missing headers, stale cache) +- Parallelism flags per build tool + +The planner can then annotate a step with `"agentProfile": "build-verify"` and the orchestrator will activate the matching profile for that step — automatically binding the right skill, tool subset, and temperature. + +#### 5.2 Templates as Planner Context (workspace-aware instantiation) + +The planner prompt includes `workspaceInfo` fields including the detected lifecycle commands (`buildCommand`, `testCommand`, `lintCommand`). This allows the planner to produce **concrete, workspace-specific steps** in one shot: + +``` +Workspace: language=cmake, packageManager=cmake, + build="cmake --preset linux-release && cmake --build --preset linux-build", + test="ctest --preset linux-test" +``` + +The planner output then directly embeds the correct commands rather than using a generic placeholder. 
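As a rough illustration of the workspace line shown in 5.2, the sketch below formats lifecycle commands the same way the `buildPlannerTask` change in this patch does, skipping any command that is empty or missing. The names `WorkspaceSummary` and `formatWorkspaceLine` are hypothetical and exist only for this example; the real `WorkspaceInfo` has more fields.

```typescript
// Hypothetical, trimmed-down shape: only the fields this sketch needs.
interface WorkspaceSummary {
  language: string;
  packageManager: string;
  buildCommand?: string;
  testCommand?: string;
  lintCommand?: string;
}

// Builds the "Workspace: ..." line; empty or missing commands are left out.
function formatWorkspaceLine(info: WorkspaceSummary): string {
  const parts = [
    `language=${info.language}`,
    `packageManager=${info.packageManager}`,
  ];
  if (info.buildCommand) parts.push(`build="${info.buildCommand}"`);
  if (info.testCommand) parts.push(`test="${info.testCommand}"`);
  if (info.lintCommand) parts.push(`lint="${info.lintCommand}"`);
  return `Workspace: ${parts.join(", ")}`;
}

const workspaceLine = formatWorkspaceLine({
  language: "cmake",
  packageManager: "cmake",
  buildCommand: "cmake --preset linux-release && cmake --build --preset linux-build",
  testCommand: "ctest --preset linux-test",
  lintCommand: "", // an empty lint command is omitted from the line
});
console.log(workspaceLine);
```

Skipping empty commands keeps the prompt free of `lint=""` noise for build systems such as CMake, where the analyzer in this patch reports no lint command.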
+ +#### 5.3 Template Instantiation: Agent vs Static + +| Approach | When to use | Trade-offs | +|---|---|---| +| **Planner-time instantiation** (current) | Novel tasks, unknown workspaces | Flexible, adapts to workspace; requires LLM call | +| **Profile+skill pre-configuration** (new) | Recurring workflows (CI, build-verify) | Fast, deterministic, version-controlled; less adaptive | +| **Hybrid** (recommended) | Plan overall task, but use pre-defined profiles per step | Best of both worlds | + +The hybrid approach is already supported: the planner annotates `agentProfile` on steps, and the orchestrator activates the profile. Adding skills that encode the step sequence means the profile-activated agent "knows" the right procedure without the planner having to enumerate every sub-step. + +--- + +### 6. Concrete Example: CMake Build-Verify Flow + +**Goal**: "Build the plugin to verify changes compile correctly" + +**Planner output** (with workspace context `build="cmake --preset linux-release && cmake --build --preset linux-build -j2"`): + +```json +{ + "steps": [ + { + "description": "Install Linux build dependencies", + "toolsNeeded": ["shell"], + "estimatedComplexity": "low", + "agentProfile": "devops" + }, + { + "description": "Update git submodules", + "toolsNeeded": ["shell"], + "estimatedComplexity": "low", + "agentProfile": "devops" + }, + { + "description": "Build the project using cmake --preset linux-release && cmake --build --preset linux-build -j2 and report compiler output", + "toolsNeeded": ["shell"], + "estimatedComplexity": "medium", + "agentProfile": "build-verify" + } + ] +} +``` + +The `build-verify` agent profile activates the `build-verify` skill, which provides the step sequence and error triage guidance. The concrete commands come from `workspaceInfo.buildCommand`, injected into the planner prompt. + +--- + +### 7. 
Recommendations and Gaps Addressed + +| Gap | Solution implemented | +|---|---| +| Planner didn't know lifecycle commands | ✅ `buildPlannerTask` now includes `build`, `test`, `lint` commands from `WorkspaceInfo` | +| Only Node/Python/Go workspace detection | ✅ Added CMake, Rust/Cargo, Gradle, Maven analyzers in `workspace.ts` | +| No build-workflow agent profile | ✅ `build-verify.agent.json` and `test-runner.agent.json` added | +| No build-workflow skill | ✅ `build-verify.skill.md` and `cmake-workflow.skill.md` added | + +### 8. Remaining Open Questions + +1. **Template registry**: Should templates be discoverable at runtime (e.g. `list-templates` tool) so the planner can reference them by name? The current profile registry partially serves this role. +2. **Workspace-once vs task-every-time**: For expensive workspace analysis (submodule init, dependency install), should a "workspace setup" template run once at session start and cache results? This aligns with CrewAI's `before_kickoff` hook concept. +3. **Multi-repo / monorepo**: `analyzeWorkspace` currently detects one build system per root. Monorepos with mixed build systems (e.g. a CMake C++ library + a Node.js frontend) need a recursive scan. +4. **Template versioning**: When the workspace changes (new preset, renamed script), how are baked templates kept in sync? A solution is to keep commands in `WorkspaceInfo` (auto-detected) rather than hard-coding them in profile prompts. 
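To make open question 3 concrete, the sketch below models the first-match-wins indicator chain that `analyzeWorkspace` uses in this patch, as a pure function over root file names. `INDICATORS` and `pickAnalyzer` are hypothetical names for this example only, and the earlier Node and Python branches of the real chain are omitted. It is a simplified model, not the actual implementation, which checks the filesystem directly.

```typescript
// Ordered indicator table: earlier entries win.
// Each entry maps indicator files to the analyzer that would handle them.
const INDICATORS: ReadonlyArray<readonly [readonly string[], string]> = [
  [["go.mod"], "analyzeGo"],
  [["Cargo.toml"], "analyzeCargo"],
  [["CMakeLists.txt"], "analyzeCMake"],
  [["build.gradle", "build.gradle.kts"], "analyzeGradle"],
  [["pom.xml"], "analyzeMaven"],
];

// Returns the first analyzer whose indicator file is present, or undefined.
function pickAnalyzer(rootFiles: ReadonlySet<string>): string | undefined {
  for (const [files, analyzer] of INDICATORS) {
    if (files.some((f) => rootFiles.has(f))) return analyzer;
  }
  return undefined;
}

// A root holding both Cargo.toml and CMakeLists.txt resolves to one analyzer only.
const mixedRoot = pickAnalyzer(new Set(["Cargo.toml", "CMakeLists.txt"]));
console.log(mixedRoot);
```

Because the loop returns on the first hit, a mixed root never reaches the later analyzers. A recursive or multi-result scan would need to collect every match instead of returning early.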
diff --git a/src/__tests__/builtin-agent-profiles.test.ts b/src/__tests__/builtin-agent-profiles.test.ts index ed5fa0d7..1078a492 100644 --- a/src/__tests__/builtin-agent-profiles.test.ts +++ b/src/__tests__/builtin-agent-profiles.test.ts @@ -24,8 +24,8 @@ beforeAll(async () => { }); describe("builtin agent profiles", () => { - it("loads exactly 5 builtin profiles", () => { - expect(registry.list()).toHaveLength(5); + it("loads exactly 7 builtin profiles", () => { + expect(registry.list()).toHaveLength(7); }); it("coder profile has name === 'coder' and model === 'gpt-4o'", () => { diff --git a/src/__tests__/builtin-skills.test.ts b/src/__tests__/builtin-skills.test.ts index 2d6e48d7..b2107d10 100644 --- a/src/__tests__/builtin-skills.test.ts +++ b/src/__tests__/builtin-skills.test.ts @@ -20,9 +20,11 @@ describe("built-in skill library", () => { "test-writer", "git-workflow", "security-auditor", + "build-verify", + "cmake-workflow", ]; - it("loads all 5 built-in skills", () => { + it("loads all 7 built-in skills", () => { const names = registry.list().map((s) => s.name); for (const name of BUILTIN_NAMES) { expect(names).toContain(name); diff --git a/src/__tests__/fixtures/workspace-cargo/Cargo.toml b/src/__tests__/fixtures/workspace-cargo/Cargo.toml new file mode 100644 index 00000000..965b5937 --- /dev/null +++ b/src/__tests__/fixtures/workspace-cargo/Cargo.toml @@ -0,0 +1,4 @@ +[package] +name = "my-app" +version = "0.1.0" +edition = "2021" diff --git a/src/__tests__/fixtures/workspace-cmake-presets/CMakeLists.txt b/src/__tests__/fixtures/workspace-cmake-presets/CMakeLists.txt new file mode 100644 index 00000000..e69de29b diff --git a/src/__tests__/fixtures/workspace-cmake-presets/CMakePresets.json b/src/__tests__/fixtures/workspace-cmake-presets/CMakePresets.json new file mode 100644 index 00000000..62e6c2f0 --- /dev/null +++ b/src/__tests__/fixtures/workspace-cmake-presets/CMakePresets.json @@ -0,0 +1 @@ 
+{"version":3,"cmakeMinimumRequired":{"major":3,"minor":21},"configurePresets":[{"name":"default","binaryDir":"build"}],"buildPresets":[{"name":"default","configurePreset":"default"}],"testPresets":[{"name":"default","configurePreset":"default"}]} diff --git a/src/__tests__/fixtures/workspace-cmake/CMakeLists.txt b/src/__tests__/fixtures/workspace-cmake/CMakeLists.txt new file mode 100644 index 00000000..e69de29b diff --git a/src/__tests__/fixtures/workspace-gradle-kotlin/build.gradle.kts b/src/__tests__/fixtures/workspace-gradle-kotlin/build.gradle.kts new file mode 100644 index 00000000..5b1dae2a --- /dev/null +++ b/src/__tests__/fixtures/workspace-gradle-kotlin/build.gradle.kts @@ -0,0 +1 @@ +plugins { kotlin("jvm") version "1.9.0" } diff --git a/src/__tests__/fixtures/workspace-gradle/build.gradle b/src/__tests__/fixtures/workspace-gradle/build.gradle new file mode 100644 index 00000000..b95276ac --- /dev/null +++ b/src/__tests__/fixtures/workspace-gradle/build.gradle @@ -0,0 +1 @@ +plugins { id("java") } diff --git a/src/__tests__/fixtures/workspace-maven/pom.xml b/src/__tests__/fixtures/workspace-maven/pom.xml new file mode 100644 index 00000000..12ac61cc --- /dev/null +++ b/src/__tests__/fixtures/workspace-maven/pom.xml @@ -0,0 +1 @@ +<project><modelVersion>4.0.0</modelVersion><groupId>com.example</groupId><artifactId>my-app</artifactId><version>1.0</version></project> diff --git a/src/__tests__/workspace.test.ts b/src/__tests__/workspace.test.ts index 718b6169..df50e200 100644 --- a/src/__tests__/workspace.test.ts +++ b/src/__tests__/workspace.test.ts @@ -118,3 +118,163 @@ describe("analyzeWorkspace — git detection", () => { expect(info.gitInitialized).toBe(true); }); }); + +describe("analyzeWorkspace — Rust/Cargo project", () => { + const root = path.join(fixturesDir, "workspace-cargo"); + + let info: WorkspaceInfo; + beforeAll(async () => { + info = await analyzeWorkspace(root); + }); + + it("detects language as 'rust'", () => { + expect(info.language).toBe("rust"); + }); + + it("uses 'cargo' as the package manager", () => { +
expect(info.packageManager).toBe("cargo"); + }); + + it("defaults the build command to 'cargo build'", () => { + expect(info.buildCommand).toBe("cargo build"); + }); + + it("defaults the test command to 'cargo test'", () => { + expect(info.testCommand).toBe("cargo test"); + }); + + it("defaults the lint command to 'cargo clippy'", () => { + expect(info.lintCommand).toBe("cargo clippy"); + }); + + it("reports hasTests as true when a tests/ directory exists", () => { + expect(info.hasTests).toBe(true); + }); +}); + +describe("analyzeWorkspace — CMake project (no presets)", () => { + const root = path.join(fixturesDir, "workspace-cmake"); + + let info: WorkspaceInfo; + beforeAll(async () => { + info = await analyzeWorkspace(root); + }); + + it("detects language as 'cmake'", () => { + expect(info.language).toBe("cmake"); + }); + + it("uses 'cmake' as the package manager", () => { + expect(info.packageManager).toBe("cmake"); + }); + + it("uses classic out-of-source build command when no presets file is present", () => { + expect(info.buildCommand).toBe("cmake -S . 
-B build && cmake --build build"); + }); + + it("defaults the test command to ctest", () => { + expect(info.testCommand).toBe("ctest --output-on-failure"); + }); + + it("reports hasTests as true when a tests/ directory exists", () => { + expect(info.hasTests).toBe(true); + }); +}); + +describe("analyzeWorkspace — CMake project (with CMakePresets.json)", () => { + const root = path.join(fixturesDir, "workspace-cmake-presets"); + + let info: WorkspaceInfo; + beforeAll(async () => { + info = await analyzeWorkspace(root); + }); + + it("detects language as 'cmake'", () => { + expect(info.language).toBe("cmake"); + }); + + it("uses preset-based build command when CMakePresets.json is present", () => { + expect(info.buildCommand).toBe( + "cmake --preset default && cmake --build --preset default" + ); + }); + + it("uses preset-based test command when CMakePresets.json is present", () => { + expect(info.testCommand).toBe("ctest --preset default"); + }); +}); + +describe("analyzeWorkspace — Gradle (Java) project", () => { + const root = path.join(fixturesDir, "workspace-gradle"); + + let info: WorkspaceInfo; + beforeAll(async () => { + info = await analyzeWorkspace(root); + }); + + it("detects language as 'java'", () => { + expect(info.language).toBe("java"); + }); + + it("uses 'gradle' as the package manager", () => { + expect(info.packageManager).toBe("gradle"); + }); + + it("uses 'gradle build' as the build command (no gradlew wrapper)", () => { + expect(info.buildCommand).toBe("gradle build"); + }); + + it("uses 'gradle test' as the test command", () => { + expect(info.testCommand).toBe("gradle test"); + }); + + it("reports hasTests as true when src/test exists", () => { + expect(info.hasTests).toBe(true); + }); +}); + +describe("analyzeWorkspace — Gradle (Kotlin DSL) project", () => { + const root = path.join(fixturesDir, "workspace-gradle-kotlin"); + + let info: WorkspaceInfo; + beforeAll(async () => { + info = await analyzeWorkspace(root); + }); + + it("detects 
language as 'kotlin'", () => { + expect(info.language).toBe("kotlin"); + }); + + it("uses 'gradle' as the package manager", () => { + expect(info.packageManager).toBe("gradle"); + }); +}); + +describe("analyzeWorkspace — Maven project", () => { + const root = path.join(fixturesDir, "workspace-maven"); + + let info: WorkspaceInfo; + beforeAll(async () => { + info = await analyzeWorkspace(root); + }); + + it("detects language as 'java'", () => { + expect(info.language).toBe("java"); + }); + + it("uses 'maven' as the package manager", () => { + expect(info.packageManager).toBe("maven"); + }); + + it("uses 'mvn package -DskipTests' as the build command (no wrapper)", () => { + expect(info.buildCommand).toBe("mvn package -DskipTests"); + }); + + it("uses 'mvn test' as the test command", () => { + expect(info.testCommand).toBe("mvn test"); + }); + + it("reports hasTests as true when src/test exists", () => { + expect(info.hasTests).toBe(true); + }); +}); diff --git a/src/agents/builtin/build-verify.agent.json b/src/agents/builtin/build-verify.agent.json new file mode 100644 index 00000000..6fec92e5 --- /dev/null +++ b/src/agents/builtin/build-verify.agent.json @@ -0,0 +1,12 @@ +{ + "name": "build-verify", + "description": "Build verification agent — compiles the workspace and reports success or failure with compiler diagnostics", + "version": "1.0.0", + "temperature": 0.1, + "skills": ["build-verify"], + "tools": ["shell", "file-read", "file-list"], + "maxIterations": 10, + "constraints": { + "requireConfirmation": [] + } +} diff --git a/src/agents/builtin/test-runner.agent.json b/src/agents/builtin/test-runner.agent.json new file mode 100644 index 00000000..64a992ea --- /dev/null +++ b/src/agents/builtin/test-runner.agent.json @@ -0,0 +1,12 @@ +{ + "name": "test-runner", + "description": "Test execution agent — runs the project test suite, reports failures, and suggests targeted fixes", + "version": "1.0.0", + "temperature": 0.2, + "skills": ["test-writer"], + "tools": 
["shell", "file-read", "file-write", "file-edit", "file-list", "code-search"], + "maxIterations": 20, + "constraints": { + "requireConfirmation": [] + } +} diff --git a/src/skills/builtin/build-verify.skill.md b/src/skills/builtin/build-verify.skill.md new file mode 100644 index 00000000..699a76f5 --- /dev/null +++ b/src/skills/builtin/build-verify.skill.md @@ -0,0 +1,43 @@ +--- +name: build-verify +description: Workflow guidance for verifying that a project compiles and links correctly +version: 1.0.0 +slot: section +--- + +## Build Verification Workflow + +The goal of this workflow is to confirm the project compiles cleanly and to surface any errors with actionable context. + +### Step sequence + +1. **Identify the build system** — inspect the workspace root for `CMakeLists.txt`, `Cargo.toml`, `package.json`, `build.gradle`, or `pom.xml` to determine which build tool to invoke. +2. **Install / update dependencies** — run the dependency installation step *before* building: + - CMake: `git submodule update --init --recursive` (if submodules present) + - Node: `npm ci` or `yarn install --frozen-lockfile` + - Rust: `cargo fetch` + - Gradle: `./gradlew dependencies` (optional) +3. **Configure the build** (if required): + - CMake: `cmake -S . -B build [-DCMAKE_BUILD_TYPE=Release]` or `cmake --preset <preset>` + - Gradle: no separate configure step +4. **Compile**: + - CMake: `cmake --build build [--parallel $(nproc)]` or `cmake --build --preset <preset>` + - Node: `npm run build` + - Rust: `cargo build [--release]` + - Gradle: `./gradlew assemble` (compile only, no tests) + - Maven: `mvn package -DskipTests` +5. **Report** — emit a structured summary: overall status (success/failure), number of errors and warnings, and the first 20 lines of compiler output for failures. + +### Error triage heuristics + +- **Linker errors** (`undefined reference`, `unresolved symbol`): check `CMakeLists.txt` for missing `target_link_libraries` entries; for Gradle check `dependencies` block.
+- **Missing headers / imports**: confirm that all required packages are declared in the manifest and that dependency installation succeeded in step 2. +- **Type / compilation errors** in generated code: regenerate protobuf, Thrift, or OpenAPI sources before building. +- **Out-of-date build cache**: perform a clean build (`rm -rf build && cmake …` or `cargo clean && cargo build`) to rule out stale artifacts. + +### Parallel build flag + +When invoking multi-core builds, pass a parallelism flag to keep wall-clock time low: +- CMake/Ninja: `--parallel $(nproc)` or `-j$(nproc)` +- Maven: `-T 1C` (one thread per CPU core) +- Gradle: `--parallel` diff --git a/src/skills/builtin/cmake-workflow.skill.md b/src/skills/builtin/cmake-workflow.skill.md new file mode 100644 index 00000000..856f0366 --- /dev/null +++ b/src/skills/builtin/cmake-workflow.skill.md @@ -0,0 +1,82 @@ +--- +name: cmake-workflow +description: CMake-specific build, test, and packaging patterns including preset-based workflows +version: 1.0.0 +slot: section +--- + +## CMake Workflow Guidelines + +### Project layout conventions + +- Source lives in `src/`; headers in `include/`; tests in `tests/` or `test/`. +- Out-of-source builds go in `build/` (excluded from version control via `.gitignore`). +- `CMakeLists.txt` at the repository root is the entry point; each subdirectory may have its own `CMakeLists.txt`. + +### Preset-based workflow (preferred when `CMakePresets.json` exists) + +```bash +# Configure +cmake --preset <preset> # e.g. linux-release, debug, ci + +# Build +cmake --build --preset <preset> [--parallel $(nproc)] + +# Test +ctest --preset <preset> [--output-on-failure] +``` + +List available presets: +```bash +cmake --list-presets # configure presets +cmake --build --list-presets # build presets +ctest --list-presets # test presets +``` + +### Classic out-of-source workflow (no presets) + +```bash +# Configure (Release build, Ninja generator recommended) +cmake -S . 
-B build -G Ninja -DCMAKE_BUILD_TYPE=Release + +# Build (parallel) +cmake --build build --parallel $(nproc) + +# Test +cd build && ctest --output-on-failure +``` + +### Dependency management + +- **Submodules**: always run `git submodule update --init --recursive` before configuring. +- **find_package**: ensure system libraries are installed (e.g. `sudo apt install libssl-dev`). +- **FetchContent / CPM.cmake**: dependencies are downloaded during configure; verify internet access or a local cache is available. +- **vcpkg / Conan**: run `vcpkg install` or `conan install .` before `cmake -S . -B build`. + +### Install-step dependencies pattern + +When a project ships a dependency-installation script (e.g. `scripts/install-linux-deps.sh`), run it *before* the CMake configure step: + +```bash +sudo bash scripts/install-linux-deps.sh +git submodule update --init --recursive +cmake --preset <preset> +cmake --build --preset <preset> --parallel $(nproc) +``` + +### Common CMake variables + +| Variable | Purpose | +|---|---| +| `CMAKE_BUILD_TYPE` | `Debug`, `Release`, `RelWithDebInfo`, `MinSizeRel` | +| `CMAKE_INSTALL_PREFIX` | Install destination (default `/usr/local`) | +| `CMAKE_TOOLCHAIN_FILE` | Cross-compile or vcpkg toolchain | +| `BUILD_SHARED_LIBS` | `ON` to build shared libraries by default | +| `CMAKE_EXPORT_COMPILE_COMMANDS` | `ON` to generate `compile_commands.json` for tooling | + +### Diagnosing build failures + +1. Check the **configure step** output first — missing dependencies abort here. +2. Look for the **first** error in compiler output; subsequent errors are often cascading. +3. Enable verbose output to see exact compiler flags: `cmake --build build --verbose` or `VERBOSE=1 make`. +4. Use `--fresh` flag to force a clean reconfigure: `cmake --fresh --preset <preset>`.
diff --git a/src/subagents/planner.ts b/src/subagents/planner.ts index 82c77be1..624c6c56 100644 --- a/src/subagents/planner.ts +++ b/src/subagents/planner.ts @@ -83,8 +83,18 @@ function buildPlannerTask( let result = `Task: ${task}\n` + `Workspace: language=${workspaceInfo.language}, framework=${workspaceInfo.framework}, ` + - `packageManager=${workspaceInfo.packageManager}, gitInitialized=${workspaceInfo.gitInitialized}\n` + - `Available tools: ${toolList}`; + `packageManager=${workspaceInfo.packageManager}, gitInitialized=${workspaceInfo.gitInitialized}`; + + // Include lifecycle commands so the planner can generate concrete, workspace-specific steps + const lifecycleLines: string[] = []; + if (workspaceInfo.buildCommand) lifecycleLines.push(`build="${workspaceInfo.buildCommand}"`); + if (workspaceInfo.testCommand) lifecycleLines.push(`test="${workspaceInfo.testCommand}"`); + if (workspaceInfo.lintCommand) lifecycleLines.push(`lint="${workspaceInfo.lintCommand}"`); + if (lifecycleLines.length > 0) { + result += `, ${lifecycleLines.join(", ")}`; + } + + result += `\nAvailable tools: ${toolList}`; if (availableProfiles && availableProfiles.length > 0) { const profileList = availableProfiles.map((p) => `${p.name}: ${p.description}`).join("; "); result += `\nAvailable agent profiles: ${profileList}`; diff --git a/src/workspace.ts b/src/workspace.ts index 84b19524..9ce29008 100644 --- a/src/workspace.ts +++ b/src/workspace.ts @@ -3,11 +3,11 @@ import * as path from "path"; /** Structured information about the project workspace. */ export interface WorkspaceInfo { - /** Primary language detected: 'node', 'python', 'go', or 'unknown'. */ + /** Primary language detected: 'node', 'python', 'go', 'rust', 'cmake', 'java', 'kotlin', or 'unknown'. */ language: string; /** Framework detected from dependencies (e.g. 'react', 'django'), or 'none'. */ framework: string; - /** Package manager inferred from lock files or language (e.g. 'npm', 'pip'). 
*/ + /** Package manager inferred from lock files or language (e.g. 'npm', 'pip', 'cargo', 'gradle'). */ packageManager: string; /** True if a test directory or test script was found. */ hasTests: boolean; @@ -168,6 +168,115 @@ async function analyzeGo(rootPath: string): Promise<Partial<WorkspaceInfo>> { return info; } +/** + * Analyse a Rust/Cargo workspace. + * Applies Cargo default commands (overridable via Makefile targets) and checks for a `tests/` directory. + */ +async function analyzeCargo(rootPath: string): Promise<Partial<WorkspaceInfo>> { + const info: Partial<WorkspaceInfo> = { + language: "rust", + packageManager: "cargo", + testCommand: "cargo test", + lintCommand: "cargo clippy", + buildCommand: "cargo build", + }; + + // Override defaults with Makefile targets when available + const make = await parseMakefileTargets(rootPath); + if (make["test"]) info.testCommand = make["test"]; + if (make["lint"]) info.lintCommand = make["lint"]; + if (make["build"]) info.buildCommand = make["build"]; + + // Consider tests present if a tests/ or src/tests/ directory exists (inline #[cfg(test)] modules are not scanned) + info.hasTests = + (await exists(path.join(rootPath, "tests"))) || + (await exists(path.join(rootPath, "src", "tests"))); + + return info; +} + +/** + * Analyse a CMake workspace. + * Does not parse CMakeLists.txt; suggests cmake preset commands + * when a CMakePresets.json file is present. + */ +async function analyzeCMake(rootPath: string): Promise<Partial<WorkspaceInfo>> { + const info: Partial<WorkspaceInfo> = { + language: "cmake", + packageManager: "cmake", + testCommand: "ctest --output-on-failure", + lintCommand: "", + buildCommand: "cmake --build build", + }; + + // When CMakePresets.json is present, recommend the preset-based workflow + if (await exists(path.join(rootPath, "CMakePresets.json"))) { + info.buildCommand = "cmake --preset default && cmake --build --preset default"; + info.testCommand = "ctest --preset default"; + } else { + // Classic out-of-source build pattern + info.buildCommand = "cmake -S . 
-B build && cmake --build build"; + } + + // Override with Makefile targets when available (common for CMake super-builds) + const make = await parseMakefileTargets(rootPath); + if (make["test"]) info.testCommand = make["test"]; + if (make["build"]) info.buildCommand = make["build"]; + + // Detect tests by presence of a CTestTestfile, tests/ directory, or test subdirectory + info.hasTests = + (await exists(path.join(rootPath, "CTestTestfile.cmake"))) || + (await exists(path.join(rootPath, "tests"))) || + (await exists(path.join(rootPath, "test"))); + + return info; +} + +/** + * Analyse a Gradle (Java/Kotlin/Android) workspace. + */ +async function analyzeGradle(rootPath: string): Promise<Partial<WorkspaceInfo>> { + // Prefer ./gradlew wrapper when present + const gradleCmd = (await exists(path.join(rootPath, "gradlew"))) ? "./gradlew" : "gradle"; + + const info: Partial<WorkspaceInfo> = { + language: "java", + packageManager: "gradle", + testCommand: `${gradleCmd} test`, + lintCommand: `${gradleCmd} check`, + buildCommand: `${gradleCmd} build`, + }; + + // Check for Kotlin DSL (build.gradle.kts) to refine the language label + if (await exists(path.join(rootPath, "build.gradle.kts"))) { + info.language = "kotlin"; + } + + info.hasTests = (await exists(path.join(rootPath, "src", "test"))); + + return info; +} + +/** + * Analyse a Maven (Java) workspace. + */ +async function analyzeMaven(rootPath: string): Promise<Partial<WorkspaceInfo>> { + // Prefer ./mvnw wrapper when present + const mvnCmd = (await exists(path.join(rootPath, "mvnw"))) ? "./mvnw" : "mvn"; + + const info: Partial<WorkspaceInfo> = { + language: "java", + packageManager: "maven", + testCommand: `${mvnCmd} test`, + lintCommand: `${mvnCmd} verify`, + buildCommand: `${mvnCmd} package -DskipTests`, + }; + + info.hasTests = (await exists(path.join(rootPath, "src", "test"))); + + return info; +} + /** * Analyse the workspace rooted at `rootPath` and return a `WorkspaceInfo` * object. 
Language is detected via well-known indicator files; commands are @@ -201,6 +310,17 @@ export async function analyzeWorkspace(rootPath: string): Promise langInfo = await analyzePython(rootPath); } else if (await exists(path.join(rootPath, "go.mod"))) { langInfo = await analyzeGo(rootPath); + } else if (await exists(path.join(rootPath, "Cargo.toml"))) { + langInfo = await analyzeCargo(rootPath); + } else if (await exists(path.join(rootPath, "CMakeLists.txt"))) { + langInfo = await analyzeCMake(rootPath); + } else if ( + (await exists(path.join(rootPath, "build.gradle"))) || + (await exists(path.join(rootPath, "build.gradle.kts"))) + ) { + langInfo = await analyzeGradle(rootPath); + } else if (await exists(path.join(rootPath, "pom.xml"))) { + langInfo = await analyzeMaven(rootPath); } return { ...base, ...langInfo }; From 3e615c933263fb61eaf8f8d9917aeede284cd30c Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Mon, 30 Mar 2026 13:25:37 +0000 Subject: [PATCH 3/3] =?UTF-8?q?refactor:=20rewrite=20research=20doc=20to?= =?UTF-8?q?=20focus=20on=20dynamic=20intent=E2=86=92action=20transformatio?= =?UTF-8?q?n,=20remove=20cmake-specific=20skill?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Agent-Logs-Url: https://github.com/huberp/agentloop/sessions/d37e4f03-c06f-4113-a6da-f608dbd6b3f4 Co-authored-by: huberp <4027454+huberp@users.noreply.github.com> --- issues/2.md | 400 +++++++++++------- src/__tests__/builtin-skills.test.ts | 3 +- .../fixtures/workspace-cargo/tests/.keep | 0 .../fixtures/workspace-cmake/tests/.keep | 0 .../workspace-gradle-kotlin/src/test/.keep | 0 .../fixtures/workspace-gradle/src/test/.keep | 0 .../fixtures/workspace-maven/src/test/.keep | 0 src/skills/builtin/cmake-workflow.skill.md | 82 ---- 8 files changed, 246 insertions(+), 239 deletions(-) create mode 100644 src/__tests__/fixtures/workspace-cargo/tests/.keep create mode 100644 
src/__tests__/fixtures/workspace-cmake/tests/.keep create mode 100644 src/__tests__/fixtures/workspace-gradle-kotlin/src/test/.keep create mode 100644 src/__tests__/fixtures/workspace-gradle/src/test/.keep create mode 100644 src/__tests__/fixtures/workspace-maven/src/test/.keep delete mode 100644 src/skills/builtin/cmake-workflow.skill.md diff --git a/issues/2.md b/issues/2.md index 402e5406..67177cac 100644 --- a/issues/2.md +++ b/issues/2.md @@ -1,208 +1,298 @@ -## Research: Task Planner and Task Decomposition for Coding Agents +## Research: Intent-to-Action Transformation — How a Generic Workflow Step Becomes Concrete -### 1. Problem Statement +### 1. The Core Problem -Modern AI coding agents need to tackle tasks that span multiple steps, require diverse tools, and benefit from specialised domain knowledge at each step. The key challenge is **task decomposition**: how should a high-level goal (e.g. "add input validation to all POST handlers") be broken into concrete, executable steps that an agent can carry out reliably? +A coding agent receives a generic intent such as **"verify-build"**. This is a template name +that means "compile the project and confirm whether the build succeeds or fails". But the +_concrete steps_ vary entirely by workspace: -A related challenge is **workflow templates**: many coding workflows (build, test, lint, release) have a fixed *shape* but vary in their concrete commands depending on the workspace. Can these patterns be captured once and reused across projects? +- For a CMake project with presets: `cmake --preset linux-release && cmake --build --preset linux-build -j2` +- For a Node.js project: `npm ci && npm run build` +- For a Rust project: `cargo build` +- For a Gradle project: `./gradlew assemble` ---- +The question is: **what is the correct point in the machinery to perform this transformation, +and which components are responsible for deriving the concrete steps?** -### 2. 
Baseline: The `plan-and-run` Loop in agentloop +--- -agentloop already ships a layered planning architecture: +### 2. What Must NOT Happen — No Hardcoded Instantiation -| Component | Location | Role | -|---|---|---| -| `generatePlan` | `src/subagents/planner.ts` | LLM-powered decomposition of a goal into `PlanStep[]` | -| `refinePlan` | `src/subagents/planner.ts` | Corrects a plan that references unknown tools | -| `validatePlan` | `src/subagents/planner.ts` | Checks that all tool names in the plan are registered | -| `executePlan` | `src/orchestrator.ts` | Runs steps in sequence; supports `retry/skip/abort` on failure and checkpoint/resume | -| `plan` tool | `src/tools/plan.ts` | Exposes plan generation to the agent loop as a callable tool | -| `plan-and-run` tool | `src/tools/plan-and-run.ts` | Combines generation + execution in one tool call | +The transformation must not be done by pre-wiring cmake commands (or any other build-system +commands) into static configuration files. A hardcoded solution: -The planner runs as a **tool-free subagent**: it receives workspace context and a list of available tools, then returns a JSON plan. The orchestrator dispatches each step to a `runSubagent` call (or `SubagentManager` for complex steps) with an iteration budget derived from `estimatedComplexity`. +- Cannot adapt when a project changes its build system or adds presets. +- Does not scale across projects or workspaces. +- Defeats the purpose of a coding agent that is supposed to _reason_ about its environment. -**Key insight already present in agentloop**: each `PlanStep` carries an optional `agentProfile` field. The orchestrator activates the named profile (model, temperature, tool subset) for that step. This enables per-step specialisation without a separate orchestration framework. +The transformation must be agent-driven and workspace-aware at runtime. --- -### 3. 
What Other Frameworks Do - -#### 3.1 LangGraph (LangChain) -- Models agent behaviour as a **directed graph** of nodes (LLM calls, tool calls) with conditional edges. -- Supports cycles (retry loops), parallel fan-out/fan-in, and human-in-the-loop interrupts. -- Templates are *graph patterns* stored as reusable subgraphs. -- Complexity: full graph authoring required for every new workflow shape. - -#### 3.2 AutoGen (Microsoft) -- Multi-agent conversation: a **Planner** agent, a **Coder** agent, and an **Executor** agent exchange messages until the task is done. -- Task decomposition happens in natural language — the Planner emits step descriptions that the Coder implements. -- Workflow templates are **system prompts** for each role, often provided in a configuration YAML. -- Strength: easy to add domain-expert agents. Weakness: conversation history grows rapidly; quality depends on message-passing discipline. - -#### 3.3 CrewAI -- Defines **Crew** (team of agents), **Agents** (role + backstory + tools), and **Tasks** (description + expected output + dependencies). -- Supports sequential and hierarchical execution; tasks can pass their output as context to dependent tasks. -- Workflow templates are *Crew + Task YAML configurations* that can be parameterised and re-instantiated for different inputs. -- Strong alignment with the "workflow template" concept in this issue: a `BuildVerifyCrew` YAML is a reusable template instantiated per workspace. - -#### 3.4 OpenAI Assistants + Structured Outputs -- Persistent thread context allows multi-turn tasks without re-injecting history. -- `run_step` objects provide a built-in audit trail of each tool call and its output. -- Templates are **Assistant instructions** (system prompt) combined with few-shot examples in the thread. -- Limitation: tied to the OpenAI API; no local model support. 
- -#### 3.5 Copilot Coding Agent (GitHub Copilot) -The example in this issue shows a subtask with: -```json -{ - "name": "build-verify", - "agent_type": "task", - "description": "Build the plugin to verify changes", - "prompt": "...\nSteps:\n1. Run: sudo bash scripts/install-linux-deps.sh\n2. Run: git submodule update --init --recursive\n3. Run: cmake --preset linux-release\n4. Run: cmake --build --preset linux-build -j2\n\nReport whether the build succeeded or failed..." -} +### 3. Current agentloop Machinery and the Transformation Points + +agentloop already has the primitives for this transformation. The pipeline is: + +``` +Generic intent: "verify-build" + │ + ▼ +[1] analyzeWorkspace() ← detects build system, extracts lifecycle commands + │ WorkspaceInfo { language="cmake", buildCommand="cmake --preset…", … } + ▼ +[2] generatePlan() / planner ← LLM reasons: intent + workspace → concrete PlanStep[] + │ PlanStep { description="Run cmake --preset linux-release …", + │ toolsNeeded=["shell"], agentProfile="build-verify" } + ▼ +[3] executePlan() / orchestrator ← activates per-step agent profile, runs subagent + │ build-verify profile → shell tool, low temperature, build-verify skill + ▼ +[4] StepResult { output="…compiler output…", status="success"|"failed" } ``` -Key observations: -- The template name (`build-verify`) is **stable and reusable** across tasks. -- The concrete steps (cmake preset names, script paths) are **workspace-specific** and were derived from workspace knowledge. -- Instantiation happens once, at workspace-setup time — not re-derived on every task. -- This is equivalent to agentloop's `agentProfile` + `skill` combination, but the steps are baked into the prompt string rather than generated by a planner. +#### Step [1] — `analyzeWorkspace()` in `src/workspace.ts` ---- +This is the **workspace probe**. It inspects the repository root for indicator files +(`CMakeLists.txt`, `Cargo.toml`, `package.json`, `build.gradle`, `pom.xml`, etc.) 
and extracts +the concrete lifecycle commands (`buildCommand`, `testCommand`, `lintCommand`). The result is a +`WorkspaceInfo` object — the source of truth for what commands the workspace actually uses. -### 4. Template Taxonomy for Coding Agents +This is the right place for build-system detection: once, per session, before planning. + +#### Step [2] — `generatePlan()` / `buildPlannerTask()` in `src/subagents/planner.ts` + +The planner LLM receives the generic intent plus the `WorkspaceInfo` context (including +detected lifecycle commands) and reasons about what concrete steps to produce. This is the +**intent-to-steps transformation**: + +``` +Task: verify-build +Workspace: language=cmake, packageManager=cmake, + build="cmake --preset linux-release && cmake --build --preset linux-build", + test="ctest --preset default" +Available tools: shell, file-read, file-list +``` -Based on the above analysis, coding workflow templates fall into three categories: +The LLM returns a plan with concrete step descriptions drawn from the workspace context. No +static configuration is needed — the planner derives the steps dynamically from what +`analyzeWorkspace()` found. -#### Category A — Build Lifecycle Templates -Fixed structure, workspace-specific commands: +The planner can also annotate each step with an `agentProfile`, directing the orchestrator to +activate a specialised agent (e.g. `build-verify`) for that step. -| Template | Shape | Workspace-specific parts | -|---|---|---| -| `build-verify` | configure → compile → report | preset name, script paths, parallelism flag | -| `clean-build` | clean → configure → compile | build directory, preset/profile | -| `release-package` | build → test → package → sign | packaging format, signing key | +#### Step [3] — `executePlan()` / `runStep()` in `src/orchestrator.ts` -#### Category B — Quality Gate Templates -Fixed checklist, tool-specific commands: +The orchestrator executes each step as a `runSubagent` call. 
When a step carries an +`agentProfile` annotation, `activateProfile()` loads the profile (tools, model, temperature, +skills). The agent then has both the **concrete step description** (from the planner) and the +**domain guidance** (from its skill) to execute reliably. -| Template | Shape | Workspace-specific parts | -|---|---|---| -| `test-and-fix` | run tests → parse failures → locate code → fix → re-run | test runner command, test output format | -| `lint-and-format` | run linter → parse output → apply fixes → re-verify | linter binary, fix flags | -| `security-scan` | run scanner → parse findings → generate report | scanner CLI, severity threshold | +--- -#### Category C — Development Workflow Templates -Higher-level patterns: +### 4. Which Interaction Patterns from the Baseline Research Are Essential -| Template | Shape | Notes | -|---|---|---| -| `feature-branch` | branch → implement → test → PR | Uses git tools + planner | -| `dependency-update` | audit → update → test → commit | Integrates vulnerability check | -| `hotfix` | branch from tag → apply fix → test → backport | Requires git-log, cherry-pick | +The baseline branch (`copilot/research-agent-fws`) identified eight gaps in agentloop. For +intent-to-action transformation, three are directly essential: --- -### 5. How agentloop Can Implement Workflow Templates +#### 4.1 Plan-Execute-Verify Loop (Baseline Issue 3) — **Critical** -agentloop's existing primitives map cleanly onto the template concept: +**Why it matters for "verify-build"**: The word "verify" in the intent means the agent must +_confirm_ that the build succeeded — not merely that the build process ran without throwing an +exception. Today, `executePlan()` marks a step as `status: "success"` as soon as the subagent +returns without throwing. A build that silently failed (zero exit code but wrong output, a +`make` that skipped targets, a test that passed vacuously) is indistinguishable from a correct +build. 
-#### 5.1 Templates as Agent Profiles + Skills (recommended) +**The missing piece**: A `VerificationAgent` (proposed in Issue 3) runs after each step and +produces a structured `VerificationResult { passed, reasoning, issues[] }`. For a build step, +the verifier checks: "Does the output contain evidence of a successful compilation? Are there +error messages? Is the binary present?" -A workflow template = **agent profile** (what tools, model, iteration budget) + **skill** (domain knowledge, step sequence, error heuristics). +**Dynamic replanning on failure**: When the verifier flags the build as failed, the system +calls `refinePlan()` with the verifier's feedback (e.g., "missing dependency X"). The +orchestrator replaces the remaining steps with a revised plan that installs the dependency and +retries the build. This is the essential self-correction loop for "verify-build". -Example — `build-verify` profile (`src/agents/builtin/build-verify.agent.json`): -```json -{ - "name": "build-verify", - "description": "Build verification agent — compiles the workspace and reports success or failure", - "temperature": 0.1, - "skills": ["build-verify"], - "tools": ["shell", "file-read", "file-list"], - "maxIterations": 10 -} +**Interaction pattern** (from Issue 3): ``` +executePlan() + └─ for each step: + ├─ runStep() ← executes the build command + ├─ verifyStep() ← checks build output for success/failure + │ ├─ pass → next step + │ └─ fail → refinePlan(feedback) → re-execute + └─ checkpoint.save() +``` + +**Without this pattern**, a "verify-build" intent can only execute the build — it cannot +actually verify the outcome. 
-The paired `build-verify` skill (`src/skills/builtin/build-verify.skill.md`) injects: -- Step sequence (identify build system → install deps → configure → compile → report) -- Error triage heuristics (linker errors, missing headers, stale cache) -- Parallelism flags per build tool +--- -The planner can then annotate a step with `"agentProfile": "build-verify"` and the orchestrator will activate the matching profile for that step — automatically binding the right skill, tool subset, and temperature. +#### 4.2 Dynamic Task Decomposition (Baseline Issue 4) — **Important** -#### 5.2 Templates as Planner Context (workspace-aware instantiation) +**Why it matters**: The "verify-build" intent may require sub-steps that cannot be known at +planning time. For example: +- The planner produces a step "run the build" +- During execution, the build fails with "submodules not initialised" +- The agent needs to inject a sub-step "git submodule update --init --recursive" _before_ + retrying the build -The planner prompt includes `workspaceInfo` fields including the detected lifecycle commands (`buildCommand`, `testCommand`, `lintCommand`). This allows the planner to produce **concrete, workspace-specific steps** in one shot: +This is addressed by **Dynamic Task Decomposition** (Issue 4, section 4): a complex step can +call a `decompose_task` tool at runtime to inject new sub-steps immediately after the current +step. The orchestrator's `executePlan()` maintains a mutable steps list and inserts the new +steps in-place. +**Interaction pattern**: ``` -Workspace: language=cmake, packageManager=cmake, - build="cmake --preset linux-release && cmake --build --preset linux-build", - test="ctest --preset linux-test" +executePlan() + ├─ steps = [... mutable list ...] + └─ step i: "Run build" + └─ subagent calls decompose_task({newSteps: [ + { description: "Init submodules", … }, + { description: "Re-run build", … } + ]}) + → steps[i+1..] 
= [init-submodules, re-run-build, ...original-remaining-steps] ``` -The planner output then directly embeds the correct commands rather than using a generic placeholder. +**Without this pattern**, intent-to-action transformation is only as good as the planner's +initial plan. When the environment deviates from expectations (missing deps, wrong tool +version, first-time setup required), the agent has no way to adapt mid-execution. -#### 5.3 Template Instantiation: Agent vs Static +--- -| Approach | When to use | Trade-offs | -|---|---|---| -| **Planner-time instantiation** (current) | Novel tasks, unknown workspaces | Flexible, adapts to workspace; requires LLM call | -| **Profile+skill pre-configuration** (new) | Recurring workflows (CI, build-verify) | Fast, deterministic, version-controlled; less adaptive | -| **Hybrid** (recommended) | Plan overall task, but use pre-defined profiles per step | Best of both worlds | +#### 4.3 Hierarchical Delegation (Baseline Issue 4) — **Architectural** -The hybrid approach is already supported: the planner annotates `agentProfile` on steps, and the orchestrator activates the profile. Adding skills that encode the step sequence means the profile-activated agent "knows" the right procedure without the planner having to enumerate every sub-step. +**Why it matters**: At a higher level of organisation, a _coordinator agent_ can receive the +"verify-build" intent and delegate workspace analysis and step instantiation to a child agent. +This is the **Hierarchical pattern** from Issue 4. 
+ +**Interaction pattern**: +``` +Coordinator receives: "verify-build" + └─ calls delegate_subagent("workspace-analyst") + └─ workspace-analyst: reads workspace, returns WorkspaceInfo + recommended steps + └─ coordinator constructs a plan from the returned recommendations + └─ calls delegate_subagent("build-verify") with concrete step descriptions + └─ build-verify agent: executes build, returns structured result + └─ coordinator synthesises final report +``` + +This pattern separates concerns cleanly: the coordinator holds the intent, the workspace +analyst provides grounding, and the build-verify agent executes. Today's planner partially +plays the coordinator role, but it cannot delegate to a workspace analyst because it is a +tool-free subagent that only outputs JSON. Hierarchical delegation would allow the planner to +_actively probe_ the workspace via tool calls rather than relying on pre-computed +`WorkspaceInfo`. --- -### 6. Concrete Example: CMake Build-Verify Flow - -**Goal**: "Build the plugin to verify changes compile correctly" - -**Planner output** (with workspace context `build="cmake --preset linux-release && cmake --build --preset linux-build -j2"`): - -```json -{ - "steps": [ - { - "description": "Install Linux build dependencies", - "toolsNeeded": ["shell"], - "estimatedComplexity": "low", - "agentProfile": "devops" - }, - { - "description": "Update git submodules", - "toolsNeeded": ["shell"], - "estimatedComplexity": "low", - "agentProfile": "devops" - }, - { - "description": "Build the project using cmake --preset linux-release && cmake --build --preset linux-build -j2 and report compiler output", - "toolsNeeded": ["shell"], - "estimatedComplexity": "medium", - "agentProfile": "build-verify" - } - ] -} -``` +#### 4.4 Toolbox Refiner (Baseline Issue 5) — **Supporting** -The `build-verify` agent profile activates the `build-verify` skill, which provides the step sequence and error triage guidance. 
The concrete commands come from `workspaceInfo.buildCommand`, injected into the planner prompt. +**Why it matters**: The build-verify agent only needs `shell`, `file-read`, and `file-list`. +Exposing all 16+ registered tools dilutes the agent's focus and wastes context budget. The +**Toolbox Refiner** (Issue 5) dynamically narrows the exposed tool set per invocation based on +the step's declared `toolsNeeded` list and the task description. + +This is already partially addressed by the profile-based `tools[]` list in agent profiles. +The Toolbox Refiner would make this dynamic (keyword or embedding matching) rather than +requiring a manually-maintained allowlist per profile. --- -### 7. Recommendations and Gaps Addressed +### 5. The Role of Templates in the Dynamic System + +Templates (agent profiles + skills) play a supporting role — they are **not** the source of +concrete steps. Their actual function is: -| Gap | Solution implemented | +| Template element | Role | |---|---| -| Planner didn't know lifecycle commands | ✅ `buildPlannerTask` now includes `build`, `test`, `lint` commands from `WorkspaceInfo` | -| Only Node/Python/Go workspace detection | ✅ Added CMake, Rust/Cargo, Gradle, Maven analyzers in `workspace.ts` | -| No build-workflow agent profile | ✅ `build-verify.agent.json` and `test-runner.agent.json` added | -| No build-workflow skill | ✅ `build-verify.skill.md` and `cmake-workflow.skill.md` added | +| Agent profile (`tools`, `temperature`, `maxIterations`) | Shapes the execution environment for a step | +| Skill (`promptFragment`) | Provides domain guidance to the agent running the step — what to look for, what errors mean, how to report | + +The **concrete steps** always come from the planner, which derives them from: +1. The generic intent ("verify-build") +2. The workspace context (`WorkspaceInfo` from `analyzeWorkspace()`) +3. 
The available agent profiles (the planner can annotate `agentProfile` per step) + +A `build-verify` profile + skill gives the executing agent the knowledge to: +- Identify which build system is in use (from the workspace `language` field) +- Interpret compiler output (error triage heuristics in the skill) +- Produce a structured success/failure report + +But the specific commands to run come from the workspace analysis, injected into the planner +context at planning time. + +--- + +### 6. Recommended Interaction Pattern: Full "verify-build" Flow -### 8. Remaining Open Questions +Combining the above, the complete agent-driven "verify-build" flow using agentloop components: -1. **Template registry**: Should templates be discoverable at runtime (e.g. `list-templates` tool) so the planner can reference them by name? The current profile registry partially serves this role. -2. **Workspace-once vs task-every-time**: For expensive workspace analysis (submodule init, dependency install), should a "workspace setup" template run once at session start and cache results? This aligns with CrewAI's `before_kickoff` hook concept. -3. **Multi-repo / monorepo**: `analyzeWorkspace` currently detects one build system per root. Monorepos with mixed build systems (e.g. a CMake C++ library + a Node.js frontend) need a recursive scan. -4. **Template versioning**: When the workspace changes (new preset, renamed script), how are baked templates kept in sync? A solution is to keep commands in `WorkspaceInfo` (auto-detected) rather than hard-coding them in profile prompts. 
+``` +User: "verify-build" + │ + ▼ +[A] analyzeWorkspace(rootPath) + → WorkspaceInfo { buildCommand="cmake --preset …", language="cmake", … } + │ + ▼ +[B] generatePlan("verify-build", workspaceInfo, registry, profileRegistry) + → Plan { + steps: [ + { description: "Run: cmake --preset …", + toolsNeeded: ["shell"], agentProfile: "build-verify" }, + { description: "Report build result", + toolsNeeded: [], agentProfile: "build-verify" } + ] + } + │ + ▼ +[C] executePlan(plan, registry, { verificationEnabled: true, task: "verify-build", workspaceInfo }) + │ + ├─ step 0: runStep() → shell("cmake --preset …") → output + │ verifyStep() → VerificationResult { passed, reasoning, issues } + │ └─ fail? → refinePlan(feedback) → re-execute + │ + └─ step 1: runStep() → agent synthesises report from step 0 output + verifyStep() → confirm report contains success/failure conclusion + │ + ▼ +ExecutionResult { stepResults, success, verificationResults } +``` + +The key properties of this flow: +- **Generic intent, concrete execution**: "verify-build" is never mapped to cmake commands in + config — the planner derives them from workspace analysis. +- **Self-correcting**: the PEV loop (Issue 3) catches silent failures and replans. +- **Extensible**: adding support for a new build system requires only updating + `analyzeWorkspace()` — no profile or template changes needed. +- **Composable**: the same flow applies to "run-tests", "lint", or any other lifecycle intent. + +--- + +### 7. Gap Summary Relative to Baseline Research + +| Baseline Issue | Pattern | Essential for "verify-build"? 
| Current status | +|---|---|---|---| +| Issue 3 | Plan-Execute-Verify loop | ✅ Critical — without it, "verify" is just "run" | ❌ Not yet implemented | +| Issue 3 | Dynamic replanning on verification failure | ✅ Critical — enables self-correction | ❌ Not yet implemented | +| Issue 4 | Dynamic task decomposition | ✅ Important — handles mid-execution surprises | ❌ Not yet implemented | +| Issue 4 | Hierarchical delegation | 🔶 Architectural — enables active workspace probing | ❌ Not yet implemented | +| Issue 5 | Toolbox Refiner | 🔶 Supporting — reduces noise in build agent | ❌ Not yet implemented | +| Issue 2 | Persistent memory | 🔶 Optional — cache workspace analysis across sessions | ❌ Not yet implemented | + +### 8. What Has Been Improved in This PR + +| Change | Effect | +|---|---| +| `analyzeWorkspace()` now detects CMake, Cargo, Gradle, Maven | Workspace analysis returns concrete lifecycle commands for more build systems | +| `buildPlannerTask()` includes `buildCommand`/`testCommand`/`lintCommand` | Planner receives concrete command strings → produces workspace-specific plan steps without hardcoding | +| `build-verify` and `test-runner` agent profiles | Execution environment for build/test steps — define which tools and temperature are appropriate | +| `build-verify` skill | Domain guidance injected into the build agent — how to identify the build system, interpret output, triage errors | + +These improvements advance Step [1] (workspace analysis) and Step [2] (planner context) of the +transformation pipeline. Verifying the outcome of Step [3] (execution) and the `StepResult` of Step [4], and replanning +on failure, require the Plan-Execute-Verify implementation from Issue 3. 
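
The planner-context change listed in the table above can be illustrated with a small, self-contained sketch. It is simplified from the `buildPlannerTask()` hunk in this patch: `WorkspaceSummary` and `formatWorkspaceLine` are hypothetical names used only here, and the `framework` and `gitInitialized` fields are omitted for brevity.

```typescript
// Hypothetical subset of WorkspaceInfo, limited to the fields this sketch uses.
interface WorkspaceSummary {
  language: string;
  packageManager: string;
  buildCommand?: string;
  testCommand?: string;
  lintCommand?: string;
}

// Builds the "Workspace: ..." line of the planner prompt. Lifecycle commands
// are appended only when they are non-empty, so an analyzer that sets
// lintCommand to "" contributes no lint entry.
function formatWorkspaceLine(ws: WorkspaceSummary): string {
  const parts: string[] = [
    `language=${ws.language}`,
    `packageManager=${ws.packageManager}`,
  ];
  if (ws.buildCommand) parts.push(`build="${ws.buildCommand}"`);
  if (ws.testCommand) parts.push(`test="${ws.testCommand}"`);
  if (ws.lintCommand) parts.push(`lint="${ws.lintCommand}"`);
  return `Workspace: ${parts.join(", ")}`;
}

// Example input resembling what a CMake workspace without presets might yield.
const cmakeLine = formatWorkspaceLine({
  language: "cmake",
  packageManager: "cmake",
  buildCommand: "cmake -S . -B build && cmake --build build",
  testCommand: "ctest --output-on-failure",
  lintCommand: "",
});
```

Because empty commands are skipped, the planner never sees a misleading `lint=""` entry and can treat the absence of a key as "no command detected".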
diff --git a/src/__tests__/builtin-skills.test.ts b/src/__tests__/builtin-skills.test.ts index b2107d10..3c96455d 100644 --- a/src/__tests__/builtin-skills.test.ts +++ b/src/__tests__/builtin-skills.test.ts @@ -21,10 +21,9 @@ describe("built-in skill library", () => { "git-workflow", "security-auditor", "build-verify", - "cmake-workflow", ]; - it("loads all 7 built-in skills", () => { + it("loads all 6 built-in skills", () => { const names = registry.list().map((s) => s.name); for (const name of BUILTIN_NAMES) { expect(names).toContain(name); diff --git a/src/__tests__/fixtures/workspace-cargo/tests/.keep b/src/__tests__/fixtures/workspace-cargo/tests/.keep new file mode 100644 index 00000000..e69de29b diff --git a/src/__tests__/fixtures/workspace-cmake/tests/.keep b/src/__tests__/fixtures/workspace-cmake/tests/.keep new file mode 100644 index 00000000..e69de29b diff --git a/src/__tests__/fixtures/workspace-gradle-kotlin/src/test/.keep b/src/__tests__/fixtures/workspace-gradle-kotlin/src/test/.keep new file mode 100644 index 00000000..e69de29b diff --git a/src/__tests__/fixtures/workspace-gradle/src/test/.keep b/src/__tests__/fixtures/workspace-gradle/src/test/.keep new file mode 100644 index 00000000..e69de29b diff --git a/src/__tests__/fixtures/workspace-maven/src/test/.keep b/src/__tests__/fixtures/workspace-maven/src/test/.keep new file mode 100644 index 00000000..e69de29b diff --git a/src/skills/builtin/cmake-workflow.skill.md b/src/skills/builtin/cmake-workflow.skill.md deleted file mode 100644 index 856f0366..00000000 --- a/src/skills/builtin/cmake-workflow.skill.md +++ /dev/null @@ -1,82 +0,0 @@ ---- -name: cmake-workflow -description: CMake-specific build, test, and packaging patterns including preset-based workflows -version: 1.0.0 -slot: section ---- - -## CMake Workflow Guidelines - -### Project layout conventions - -- Source lives in `src/`; headers in `include/`; tests in `tests/` or `test/`. 
-- Out-of-source builds go in `build/` (excluded from version control via `.gitignore`). -- `CMakeLists.txt` at the repository root is the entry point; each subdirectory may have its own `CMakeLists.txt`. - -### Preset-based workflow (preferred when `CMakePresets.json` exists) - -```bash -# Configure -cmake --preset # e.g. linux-release, debug, ci - -# Build -cmake --build --preset [--parallel $(nproc)] - -# Test -ctest --preset [--output-on-failure] -``` - -List available presets: -```bash -cmake --list-presets # configure presets -cmake --build --list-presets # build presets -ctest --list-presets # test presets -``` - -### Classic out-of-source workflow (no presets) - -```bash -# Configure (Release build, Ninja generator recommended) -cmake -S . -B build -G Ninja -DCMAKE_BUILD_TYPE=Release - -# Build (parallel) -cmake --build build --parallel $(nproc) - -# Test -cd build && ctest --output-on-failure -``` - -### Dependency management - -- **Submodules**: always run `git submodule update --init --recursive` before configuring. -- **find_package**: ensure system libraries are installed (e.g. `sudo apt install libssl-dev`). -- **FetchContent / CPM.cmake**: dependencies are downloaded during configure; verify internet access or a local cache is available. -- **vcpkg / Conan**: run `vcpkg install` or `conan install .` before `cmake -S . -B build`. - -### Install-step dependencies pattern - -When a project ships a dependency-installation script (e.g. 
`scripts/install-linux-deps.sh`), run it *before* the CMake configure step: - -```bash -sudo bash scripts/install-linux-deps.sh -git submodule update --init --recursive -cmake --preset -cmake --build --preset --parallel $(nproc) -``` - -### Common CMake variables - -| Variable | Purpose | -|---|---| -| `CMAKE_BUILD_TYPE` | `Debug`, `Release`, `RelWithDebInfo`, `MinSizeRel` | -| `CMAKE_INSTALL_PREFIX` | Install destination (default `/usr/local`) | -| `CMAKE_TOOLCHAIN_FILE` | Cross-compile or vcpkg toolchain | -| `BUILD_SHARED_LIBS` | `ON` to build shared libraries by default | -| `CMAKE_EXPORT_COMPILE_COMMANDS` | `ON` to generate `compile_commands.json` for tooling | - -### Diagnosing build failures - -1. Check the **configure step** output first — missing dependencies abort here. -2. Look for the **first** error in compiler output; subsequent errors are often cascading. -3. Enable verbose output to see exact compiler flags: `cmake --build build --verbose` or `VERBOSE=1 make`. -4. Use `--fresh` flag to force a clean reconfigure: `cmake --fresh --preset `.