Merged
19 commits
6c3bb5e
Add stop_reason field to plan_status response (Proposal 114-I1)
neoneye Mar 11, 2026
812858c
Show "stopped" instead of "failed" for user-initiated stops in frontend
neoneye Mar 11, 2026
cb8de68
Update proposal 114 with I1 implementation notes
neoneye Mar 11, 2026
57d362d
Suppress redundant progress message for user-stopped plans
neoneye Mar 11, 2026
5ca6fc9
Show "stopped" in Flask Admin for user-stopped plans
neoneye Mar 11, 2026
73a57c3
Add dedicated PlanState.stopped enum value (Proposal 114-I1, Option A)
neoneye Mar 11, 2026
9865c5c
Fix stale Option B references in proposal 114
neoneye Mar 11, 2026
393ea1f
Update 114-I1 status in promising-directions.md
neoneye Mar 11, 2026
fa21bca
Fix stopped-state migration to use correct PostgreSQL enum type name
neoneye Mar 11, 2026
91c7366
Fix worker to transition user-stopped plans to PlanState.stopped
neoneye Mar 11, 2026
ddc99c2
Handle stopped status in progress polling iframe
neoneye Mar 11, 2026
3ee23e9
Fix whitespace in plan status text on plan_iframe
neoneye Mar 11, 2026
53b9b37
Remove outdated "Don't close this page" message from progress iframe
neoneye Mar 11, 2026
627ec34
Fix missing space between Status: and state text in plan_iframe
neoneye Mar 11, 2026
7af4b71
Fix spacing between Status: label and state text in plan_iframe
neoneye Mar 11, 2026
53c5195
Update docs to reflect PlanState.stopped across all references
neoneye Mar 11, 2026
0233e8c
Update proposal 114 I1 with implementation details
neoneye Mar 11, 2026
7c6ad53
Incorporate v2 agent perception feedback into proposal 114
neoneye Mar 11, 2026
6dd0ad0
Add 114-I10 (silent partial failures) to promising-directions.md
neoneye Mar 11, 2026
1 change: 1 addition & 0 deletions database_api/model_planitem.py
@@ -37,6 +37,7 @@ class PlanState(enum.Enum):
processing = 2
completed = 3
failed = 4
stopped = 5


class PlanItem(db.Model):
1 change: 1 addition & 0 deletions docs/mcp/autonomous_agent_guide.md
@@ -68,6 +68,7 @@ Poll every 5 minutes. State transitions:
- `processing` → Pipeline is running
- `completed` → Report is ready
- `failed` → Terminal error (use `plan_resume` or `plan_retry`)
- `stopped` → User called `plan_stop` (use `plan_resume` to continue or `plan_retry` to restart)
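
The state list above can be turned into a small polling decision helper. The following is a minimal sketch, not part of PlanExe: the `Action` enum and `next_action` function are hypothetical names introduced here for illustration, and only the five state strings come from the guide.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical client-side actions; not part of the PlanExe API."""
    KEEP_POLLING = "keep_polling"
    DOWNLOAD = "download"
    RESUME_OR_RETRY = "resume_or_retry"


def next_action(state: str) -> Action:
    """Map a plan_status state string to what a polling agent should do next."""
    if state in ("pending", "processing"):
        return Action.KEEP_POLLING
    if state == "completed":
        return Action.DOWNLOAD
    if state in ("failed", "stopped"):
        # Both states accept plan_resume or plan_retry.
        return Action.RESUME_OR_RETRY
    raise ValueError(f"unknown plan state: {state!r}")
```

Because `stopped` is user-initiated, an agent may prefer to confirm with the user before resuming, rather than treating it the same way as an error.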

### Step 6: Handle failures

26 changes: 15 additions & 11 deletions docs/mcp/planexe_mcp_interface.md
@@ -15,7 +15,7 @@ The plan is a **project plan**: a DAG of steps (Luigi pipeline stages) that prod
Implementors should expose the following to agents so they understand what PlanExe does:

- **What:** PlanExe turns a plain-English goal into a strategic project-plan draft (20+ sections) in ~10–20 min. Sections include executive summary, interactive Gantt charts, investor pitch, SWOT, governance, team profiles, work breakdown, scenario comparison, expert criticism, and adversarial sections (premortem, self-audit, premise attacks) that stress-test the plan. The output is a draft to refine, not an executable or final document, but it surfaces hard questions the prompter may not have considered.
- **Required interaction order:** Call `example_plans` (optional) and `example_prompts` first. Optional before `plan_create`: call `model_profiles` to inspect profile guidance and available models in each profile. Then complete a non-tool step: formulate a detailed prompt as flowing prose (not structured markdown), typically ~300-800 words, using the examples as a baseline; include objective, scope, constraints, timeline, stakeholders, budget/resources, and success criteria; get user approval. Only after approval, call `plan_create`. Then poll `plan_status` (about every 5 minutes); use `plan_download` (mcp_local helper) or `plan_file_info` (mcp_cloud tool) when complete (`pending`/`processing` = keep polling, `completed` = download now, `failed` = terminal). If a plan fails before completing all steps, call `plan_resume` to continue from where it left off without discarding completed work. Use `plan_retry` for a full restart (plan must be in failed state). Both accept the failed `plan_id` and optional `model_profile` (default `baseline`). To stop, call `plan_stop` with the `plan_id` from `plan_create`.
- **Required interaction order:** Call `example_plans` (optional) and `example_prompts` first. Optional before `plan_create`: call `model_profiles` to inspect profile guidance and available models in each profile. Then complete a non-tool step: formulate a detailed prompt as flowing prose (not structured markdown), typically ~300-800 words, using the examples as a baseline; include objective, scope, constraints, timeline, stakeholders, budget/resources, and success criteria; get user approval. Only after approval, call `plan_create`. Then poll `plan_status` (about every 5 minutes); use `plan_download` (mcp_local helper) or `plan_file_info` (mcp_cloud tool) when complete (`pending`/`processing` = keep polling, `completed` = download now, `failed` = terminal error, `stopped` = user called plan_stop). If a plan fails or is stopped before completing all steps, call `plan_resume` to continue from where it left off without discarding completed work. Use `plan_retry` for a full restart (plan must be in failed or stopped state). Both accept the `plan_id` and optional `model_profile` (default `baseline`). To stop, call `plan_stop` with the `plan_id` from `plan_create`.
- **Output:** Self-contained interactive HTML report (~700KB) with collapsible sections and interactive Gantt charts; open it in a browser. The zip contains the intermediary pipeline files (md, json, csv) that fed the report.

### 1.3 Scope of this document
@@ -71,10 +71,10 @@ The interface is designed to support:

The MCP specification defines two different mechanisms:

- **MCP tools** (e.g. plan_create, plan_status, plan_stop, plan_retry, plan_resume): the server exposes named tools; the client calls them and receives a response. PlanExe's interface is **tool-based**: the agent calls plan_create → receives plan_id → polls plan_status → optionally calls plan_resume or plan_retry on failed → uses plan_file_info (and optionally plan_download via mcp_local). This document specifies those tools.
- **MCP tools** (e.g. plan_create, plan_status, plan_stop, plan_retry, plan_resume): the server exposes named tools; the client calls them and receives a response. PlanExe's interface is **tool-based**: the agent calls plan_create → receives plan_id → polls plan_status → optionally calls plan_resume or plan_retry on failed/stopped → uses plan_file_info (and optionally plan_download via mcp_local). This document specifies those tools.
- **MCP tasks protocol** ("Run as task" in some UIs): a separate mechanism where the client can run a tool "as a task" using RPC methods such as tasks/run, tasks/get, tasks/result, tasks/cancel, tasks/list, so the tool runs in the background and the client polls for results.

PlanExe **does not** use or advertise the MCP tasks protocol. Implementors and clients should use the **tools only**. Do not enable "Run as task" for PlanExe; many clients (e.g. Cursor) and the Python MCP SDK do not support the tasks protocol properly. Intended flow: optionally call `example_plans`; call `example_prompts`; optionally call `model_profiles`; perform the non-tool prompt drafting/approval step; call `plan_create`; poll `plan_status`; if failed call `plan_resume` to continue or `plan_retry` for a full restart (optional); then call `plan_file_info` (or `plan_download` via mcp_local) when completed.
PlanExe **does not** use or advertise the MCP tasks protocol. Implementors and clients should use the **tools only**. Do not enable "Run as task" for PlanExe; many clients (e.g. Cursor) and the Python MCP SDK do not support the tasks protocol properly. Intended flow: optionally call `example_plans`; call `example_prompts`; optionally call `model_profiles`; perform the non-tool prompt drafting/approval step; call `plan_create`; poll `plan_status`; if failed or stopped call `plan_resume` to continue or `plan_retry` for a full restart (optional); then call `plan_file_info` (or `plan_download` via mcp_local) when completed.

---

@@ -99,7 +99,7 @@ A single execution attempt inside a plan.

**Key properties**

- state: pending | processing | completed | failed
- state: pending | processing | completed | failed | stopped
- progress_percentage: computed progress percentage (float)
- started_at, ended_at

@@ -137,13 +137,16 @@ The public MCP `state` field is aligned with `PlanItem.state`:
- processing (picked up by a worker)
- completed
- failed
- stopped (user called `plan_stop`)

### 5.2 Allowed transitions

- pending → processing when picked up by a worker
- processing → completed via normal success
- processing → failed via error
- processing → stopped via `plan_stop`
- failed → pending when `plan_retry` or `plan_resume` is accepted
- stopped → pending when `plan_retry` or `plan_resume` is accepted
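
The allowed transitions listed above can be expressed as a lookup table. This is a minimal sketch under stated assumptions: `ALLOWED_TRANSITIONS` and `can_transition` are hypothetical names, not PlanExe code, and the table treats any transition not listed in 5.2 as disallowed.

```python
# Hypothetical client-side model of the transitions in section 5.2.
# Keys are current states; values are the states reachable from them.
ALLOWED_TRANSITIONS = {
    "pending": {"processing"},                           # picked up by a worker
    "processing": {"completed", "failed", "stopped"},    # success, error, plan_stop
    "completed": set(),                                  # no listed way out
    "failed": {"pending"},                               # plan_retry or plan_resume
    "stopped": {"pending"},                              # plan_retry or plan_resume
}


def can_transition(current: str, target: str) -> bool:
    """Return True if current -> target appears in the allowed-transition list."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```

A table like this keeps the rule in one place, so adding a state such as `stopped` means touching one dictionary rather than scattered conditionals.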

### 5.3 Invalid transitions

@@ -332,10 +335,11 @@ Returns plan status and progress. Used for progress bars and UI states. **Pollin
- `processing`: picked up by a worker and in progress. Keep polling.
- `completed`: terminal success. Download artifacts now.
- `failed`: terminal error. Do not keep polling for completion.
- `stopped`: user called `plan_stop`. Consider `plan_resume` to continue or `plan_retry` to restart.

**Terminal states**

- `completed`, `failed`
- `completed`, `failed`, `stopped`

**Response**

@@ -441,7 +445,7 @@ Retries a plan that is currently in `failed` state.

Resume a failed plan without discarding completed intermediary files. Plan generation restarts from the first incomplete step, skipping all steps that already produced output files.

Use `plan_resume` when `plan_status` shows `failed` and plan generation was interrupted before completing all steps (network drop, timeout, `plan_stop`, worker crash). For a full restart or to change `model_profile`, use `plan_retry` instead.
Use `plan_resume` when `plan_status` shows `failed` or `stopped` and plan generation was interrupted before completing all steps (network drop, timeout, `plan_stop`, worker crash). For a full restart or to change `model_profile`, use `plan_retry` instead.

**Request**

@@ -566,7 +570,7 @@ Recommended practice for MCP clients:
Additional semantics:

- Every `plan_create` call creates a new independent plan with a new `plan_id`.
- `plan_retry` and `plan_resume` reuse the existing failed `plan_id` (they do not create a new plan id).
- `plan_retry` and `plan_resume` reuse the existing failed or stopped `plan_id` (they do not create a new plan id).
- The server does not deduplicate “same prompt” requests into a single shared plan.
- Keep your own plan registry/client state if you run multiple plans concurrently.

@@ -610,8 +614,8 @@ Cloud/core tool codes:
- `INVALID_TOOL`: unknown MCP tool name.
- `INTERNAL_ERROR`: uncaught server error.
- `PLAN_NOT_FOUND`: plan_id not found.
- `PLAN_NOT_FAILED`: plan_retry called for a plan that is not in failed state.
- `PLAN_NOT_RESUMABLE`: plan_resume called for a plan that is not in failed state.
- `PLAN_NOT_FAILED`: plan_retry called for a plan that is not in failed or stopped state.
- `PLAN_NOT_RESUMABLE`: plan_resume called for a plan that is not in failed or stopped state.
- `PIPELINE_VERSION_MISMATCH`: plan_resume snapshot was created by a different pipeline version; use plan_retry instead.
- `INVALID_USER_API_KEY`: provided user_api_key is invalid.
- `USER_API_KEY_REQUIRED`: deployment requires user_api_key for plan_create.
@@ -636,8 +640,8 @@ Local proxy specific codes:
- `USER_API_KEY_REQUIRED`
- `INSUFFICIENT_CREDITS`
- `INVALID_TOOL`
- For `PLAN_NOT_FAILED`: call `plan_retry` only after `plan_status.state == failed`.
- For `PLAN_NOT_RESUMABLE`: call `plan_resume` only after `plan_status.state == failed`.
- For `PLAN_NOT_FAILED`: call `plan_retry` only after `plan_status.state` is `failed` or `stopped`.
- For `PLAN_NOT_RESUMABLE`: call `plan_resume` only after `plan_status.state` is `failed` or `stopped`.
- For `PIPELINE_VERSION_MISMATCH`: the snapshot is incompatible with the current pipeline; use `plan_retry` for a clean restart.
- For `PLAN_NOT_FOUND`: verify plan_id source and stop polling that id.
- For `generation_failed`: treat as terminal failure and surface plan progress_message to user.
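
A client can avoid the `PLAN_NOT_FAILED` and `PLAN_NOT_RESUMABLE` errors by checking the last known state before calling. The following is a minimal sketch: `RECOVERABLE_STATES` and `precheck` are hypothetical names, and the mapping only mirrors the error-code descriptions in the updated lines above; it does not call any server.

```python
from typing import Optional

# States in which plan_retry and plan_resume are accepted after this change.
RECOVERABLE_STATES = frozenset({"failed", "stopped"})

# Error code the docs associate with calling each tool in the wrong state.
_WRONG_STATE_CODE = {
    "plan_retry": "PLAN_NOT_FAILED",
    "plan_resume": "PLAN_NOT_RESUMABLE",
}


def precheck(tool: str, state: str) -> Optional[str]:
    """Return the error code expected for this call, or None if it should be accepted."""
    if tool not in _WRONG_STATE_CODE:
        raise ValueError(f"precheck only covers plan_retry/plan_resume, got {tool!r}")
    if state in RECOVERABLE_STATES:
        return None
    return _WRONG_STATE_CODE[tool]
```

This is only a local guard; the server remains the source of truth, and `plan_resume` can still fail with `PIPELINE_VERSION_MISMATCH` even when the state check passes.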
10 changes: 6 additions & 4 deletions docs/proposals/111-promising-directions.md
@@ -24,9 +24,10 @@ Agents need PlanExe runs to complete reliably without human intervention. A fail
| **103** | Pipeline Hardening for Local Models | Fix silent truncation and context-window overflows. Critical for agents running local models where failures are subtle |
| **113** | LLM Error Traceability | ✅ **Implemented (PR #237)**. `LLMChatError` replaces generic `ValueError` across 38 call sites. Root cause preserved for error classification; `error_id` UUID enables log-to-metrics cross-referencing. Agents can programmatically diagnose failures |
| **101** | Luigi Resume Enhancements | Webhook hooks on task completion/failure; agents can subscribe to events instead of polling |
| **114-I1** | Stopped vs Failed State | `plan_stop` and worker crashes both produce `failed`; agents can't distinguish user-initiated stops from actual errors. Add `stop_reason` field or a new `stopped` state |
| **114-I2** | Failure Diagnostics in `plan_status` | When a plan fails, no `failure_reason`, `failed_step`, or `last_error` is returned. Biggest observability gap: agents can only say "it failed" without explaining why. Extends #113 to the MCP consumer surface |
| **114-I1** | Stopped vs Failed State | ✅ **Implemented**. Dedicated `PlanState.stopped` enum value: `plan_stop` transitions to `stopped`, not `failed`. Agents can now distinguish user-initiated stops from actual errors. `plan_retry` and `plan_resume` accept both `failed` and `stopped` |
| **114-I2** | Failure Diagnostics in `plan_status` | When a plan fails, no `failure_reason`, `failed_step`, `last_error`, or `recoverable` flag is returned. Biggest observability gap: agents can only say "it failed" without explaining why or recommending resume vs retry. Extends #113 to the MCP consumer surface |
| **114-I7** | Stalled-Plan Detection | No `last_progress_at` or `last_llm_call_at` timestamps. Agents can't distinguish "slow step" from "stuck worker". Complements #87 §8 |
| **114-I10** | Silent Partial Failures in Completed Plans | A plan can reach `completed` with empty or stub-quality sections (e.g. 2/8 experts responding). No `quality_summary` in `plan_status`; agents can't tell if `completed` means "all sections produced quality output" or just "all steps ran." Trust gap for autonomous workflows |

---

@@ -107,9 +108,10 @@ Phase 1: Reliable foundation (nearly complete)
├─ #110 Usage metrics ✅ (PR #219, #236, #237)
├─ #113 Error traceability ✅ (PR #237)
├─ #58 Prompt boost ⚙️ (open PR #222)
├─ #114-I1 Stopped vs failed state ← next priority
├─ #114-I1 Stopped vs failed state ✅
├─ #114-I2 Failure diagnostics in plan_status ← next priority (biggest gap)
└─ #114-I7 Stalled-plan detection
├─ #114-I7 Stalled-plan detection
└─ #114-I10 Silent partial failures in completed plans

Phase 2: Agent-native interface (next)
β”œβ”€ #86 Remove agent friction points βœ… (PR #223)