PlanExeOrg · neoneye · Mar 11, 2026
diff --git a/database_api/model_planitem.py b/database_api/model_planitem.py
@@ -37,6 +37,7 @@ class PlanState(enum.Enum):
     processing = 2
     completed = 3
     failed = 4
+    stopped = 5
 
 
 class PlanItem(db.Model):

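A minimal sketch of how calling code might use the new enum member. It assumes `pending = 1`, since that line falls outside the hunk above. `RESTARTABLE_STATES` and `can_restart` are hypothetical names for illustration and are not part of this change.

```python
import enum


class PlanState(enum.Enum):
    pending = 1      # assumed: this member is outside the hunk shown above
    processing = 2
    completed = 3
    failed = 4
    stopped = 5      # added by this change


# Hypothetical helper set: states from which plan_retry / plan_resume
# are accepted, per the documentation changes in this diff.
RESTARTABLE_STATES = frozenset({PlanState.failed, PlanState.stopped})


def can_restart(state: PlanState) -> bool:
    """Return True if the plan may be retried or resumed from this state."""
    return state in RESTARTABLE_STATES
```

Keeping the restartable states in one set means a future state would only need to be added in one place.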
diff --git a/docs/mcp/autonomous_agent_guide.md b/docs/mcp/autonomous_agent_guide.md
@@ -68,6 +68,7 @@ Poll every 5 minutes. State transitions:
 - `processing` → Pipeline is running
 - `completed` → Report is ready
 - `failed` → Terminal error (use `plan_resume` or `plan_retry`)
+- `stopped` → User called `plan_stop` (use `plan_resume` to continue or `plan_retry` to restart)
 
 ### Step 6: Handle failures
 

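The state list in the guide above could drive an agent's polling decision roughly as follows. This is a sketch only: `next_action` and its return labels are hypothetical, and the state strings are taken from the documented `plan_status` values.

```python
def next_action(state: str) -> str:
    """Map a plan_status state to what a polling agent might do next.

    The return labels are illustrative, not part of the PlanExe API.
    """
    if state in ("pending", "processing"):
        return "poll"              # keep polling (about every 5 minutes)
    if state == "completed":
        return "download"          # report is ready
    if state in ("failed", "stopped"):
        return "resume_or_retry"   # plan_resume to continue, plan_retry to restart
    raise ValueError(f"unknown plan state: {state!r}")
```

Raising on an unknown state, rather than continuing to poll, makes a newly introduced state visible to the agent instead of hiding it in an endless loop.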
diff --git a/docs/mcp/planexe_mcp_interface.md b/docs/mcp/planexe_mcp_interface.md
@@ -15,7 +15,7 @@ The plan is a **project plan**: a DAG of steps (Luigi pipeline stages) that prod
 Implementors should expose the following to agents so they understand what PlanExe does:
 
 - **What:** PlanExe turns a plain-English goal into a strategic project-plan draft (20+ sections) in ~10–20 min. Sections include executive summary, interactive Gantt charts, investor pitch, SWOT, governance, team profiles, work breakdown, scenario comparison, expert criticism, and adversarial sections (premortem, self-audit, premise attacks) that stress-test the plan. The output is a draft to refine, not an executable or final document — but it surfaces hard questions the prompter may not have considered.
-- **Required interaction order:** Call `example_plans` (optional) and `example_prompts` first. Optional before `plan_create`: call `model_profiles` to inspect profile guidance and available models in each profile. Then complete a non-tool step: formulate a detailed prompt as flowing prose (not structured markdown), typically ~300-800 words, using the examples as a baseline; include objective, scope, constraints, timeline, stakeholders, budget/resources, and success criteria; get user approval. Only after approval, call `plan_create`. Then poll `plan_status` (about every 5 minutes); use `plan_download` (mcp_local helper) or `plan_file_info` (mcp_cloud tool) when complete (`pending`/`processing` = keep polling, `completed` = download now, `failed` = terminal). If a plan fails before completing all steps, call `plan_resume` to continue from where it left off without discarding completed work. Use `plan_retry` for a full restart (plan must be in failed state). Both accept the failed `plan_id` and optional `model_profile` (default `baseline`). To stop, call `plan_stop` with the `plan_id` from `plan_create`.
+- **Required interaction order:** Call `example_plans` (optional) and `example_prompts` first. Optional before `plan_create`: call `model_profiles` to inspect profile guidance and available models in each profile. Then complete a non-tool step: formulate a detailed prompt as flowing prose (not structured markdown), typically ~300-800 words, using the examples as a baseline; include objective, scope, constraints, timeline, stakeholders, budget/resources, and success criteria; get user approval. Only after approval, call `plan_create`. Then poll `plan_status` (about every 5 minutes); use `plan_download` (mcp_local helper) or `plan_file_info` (mcp_cloud tool) when complete (`pending`/`processing` = keep polling, `completed` = download now, `failed` = terminal error, `stopped` = user called plan_stop). If a plan fails or is stopped before completing all steps, call `plan_resume` to continue from where it left off without discarding completed work. Use `plan_retry` for a full restart (plan must be in failed or stopped state). Both accept the `plan_id` and optional `model_profile` (default `baseline`). To stop, call `plan_stop` with the `plan_id` from `plan_create`.
 - **Output:** Self-contained interactive HTML report (~700KB) with collapsible sections and interactive Gantt charts — open in a browser. The zip contains the intermediary pipeline files (md, json, csv) that fed the report.
 
 ### 1.3 Scope of this document
@@ -71,10 +71,10 @@ The interface is designed to support:
 
 The MCP specification defines two different mechanisms:
 
-- **MCP tools** (e.g. plan_create, plan_status, plan_stop, plan_retry, plan_resume): the server exposes named tools; the client calls them and receives a response. PlanExe's interface is **tool-based**: the agent calls plan_create → receives plan_id → polls plan_status → optionally calls plan_resume or plan_retry on failed → uses plan_file_info (and optionally plan_download via mcp_local). This document specifies those tools.
+- **MCP tools** (e.g. plan_create, plan_status, plan_stop, plan_retry, plan_resume): the server exposes named tools; the client calls them and receives a response. PlanExe's interface is **tool-based**: the agent calls plan_create → receives plan_id → polls plan_status → optionally calls plan_resume or plan_retry on failed/stopped → uses plan_file_info (and optionally plan_download via mcp_local). This document specifies those tools.
 - **MCP tasks protocol** ("Run as task" in some UIs): a separate mechanism where the client can run a tool "as a task" using RPC methods such as tasks/run, tasks/get, tasks/result, tasks/cancel, tasks/list, so the tool runs in the background and the client polls for results.
 
-PlanExe **does not** use or advertise the MCP tasks protocol. Implementors and clients should use the **tools only**. Do not enable "Run as task" for PlanExe; many clients (e.g. Cursor) and the Python MCP SDK do not support the tasks protocol properly. Intended flow: optionally call `example_plans`; call `example_prompts`; optionally call `model_profiles`; perform the non-tool prompt drafting/approval step; call `plan_create`; poll `plan_status`; if failed call `plan_resume` to continue or `plan_retry` for a full restart (optional); then call `plan_file_info` (or `plan_download` via mcp_local) when completed.
+PlanExe **does not** use or advertise the MCP tasks protocol. Implementors and clients should use the **tools only**. Do not enable "Run as task" for PlanExe; many clients (e.g. Cursor) and the Python MCP SDK do not support the tasks protocol properly. Intended flow: optionally call `example_plans`; call `example_prompts`; optionally call `model_profiles`; perform the non-tool prompt drafting/approval step; call `plan_create`; poll `plan_status`; if failed or stopped call `plan_resume` to continue or `plan_retry` for a full restart (optional); then call `plan_file_info` (or `plan_download` via mcp_local) when completed.
 
 ---
 
@@ -99,7 +99,7 @@ A single execution attempt inside a plan.
 
 **Key properties**
 
-- state: pending | processing | completed | failed
+- state: pending | processing | completed | failed | stopped
 - progress_percentage: computed progress percentage (float)
 - started_at, ended_at
 
@@ -137,13 +137,16 @@ The public MCP `state` field is aligned with `PlanItem.state`:
 - processing (picked up by a worker)
 - completed
 - failed
+- stopped (user called `plan_stop`)
 
 ### 5.2 Allowed transitions
 
 - pending → processing when picked up by a worker
 - processing → completed via normal success
 - processing → failed via error
+- processing → stopped via `plan_stop`
 - failed → pending when `plan_retry` or `plan_resume` is accepted
+- stopped → pending when `plan_retry` or `plan_resume` is accepted
 
 ### 5.3 Invalid transitions
 
@@ -332,10 +335,11 @@ Returns plan status and progress. Used for progress bars and UI states. **Pollin
 - `processing`: picked up by a worker and in progress. Keep polling.
 - `completed`: terminal success. Download artifacts now.
 - `failed`: terminal error. Do not keep polling for completion.
+- `stopped`: user called `plan_stop`. Consider `plan_resume` to continue or `plan_retry` to restart.
 
 **Terminal states**
 
-- `completed`, `failed`
+- `completed`, `failed`, `stopped`
 
 **Response**
 
@@ -441,7 +445,7 @@ Retries a plan that is currently in `failed` state.
 
 Resume a failed plan without discarding completed intermediary files. Plan generation restarts from the first incomplete step, skipping all steps that already produced output files.
 
-Use `plan_resume` when `plan_status` shows `failed` and plan generation was interrupted before completing all steps (network drop, timeout, `plan_stop`, worker crash). For a full restart or to change `model_profile`, use `plan_retry` instead.
+Use `plan_resume` when `plan_status` shows `failed` or `stopped` and plan generation was interrupted before completing all steps (network drop, timeout, `plan_stop`, worker crash). For a full restart or to change `model_profile`, use `plan_retry` instead.
 
 **Request**
 
@@ -566,7 +570,7 @@ Recommended practice for MCP clients:
 Additional semantics:
 
 - Every `plan_create` call creates a new independent plan with a new `plan_id`.
-- `plan_retry` and `plan_resume` reuse the existing failed `plan_id` (they do not create a new plan id).
+- `plan_retry` and `plan_resume` reuse the existing failed or stopped `plan_id` (they do not create a new plan id).
 - The server does not deduplicate “same prompt” requests into a single shared plan.
 - Keep your own plan registry/client state if you run multiple plans concurrently.
 
@@ -610,8 +614,8 @@ Cloud/core tool codes:
 - `INVALID_TOOL`: unknown MCP tool name.
 - `INTERNAL_ERROR`: uncaught server error.
 - `PLAN_NOT_FOUND`: plan_id not found.
-- `PLAN_NOT_FAILED`: plan_retry called for a plan that is not in failed state.
-- `PLAN_NOT_RESUMABLE`: plan_resume called for a plan that is not in failed state.
+- `PLAN_NOT_FAILED`: plan_retry called for a plan that is not in failed or stopped state.
+- `PLAN_NOT_RESUMABLE`: plan_resume called for a plan that is not in failed or stopped state.
 - `PIPELINE_VERSION_MISMATCH`: plan_resume snapshot was created by a different pipeline version; use plan_retry instead.
 - `INVALID_USER_API_KEY`: provided user_api_key is invalid.
 - `USER_API_KEY_REQUIRED`: deployment requires user_api_key for plan_create.
@@ -636,8 +640,8 @@ Local proxy specific codes:
   - `USER_API_KEY_REQUIRED`
   - `INSUFFICIENT_CREDITS`
   - `INVALID_TOOL`
-- For `PLAN_NOT_FAILED`: call `plan_retry` only after `plan_status.state == failed`.
-- For `PLAN_NOT_RESUMABLE`: call `plan_resume` only after `plan_status.state == failed`.
+- For `PLAN_NOT_FAILED`: call `plan_retry` only after `plan_status.state` is `failed` or `stopped`.
+- For `PLAN_NOT_RESUMABLE`: call `plan_resume` only after `plan_status.state` is `failed` or `stopped`.
 - For `PIPELINE_VERSION_MISMATCH`: the snapshot is incompatible with the current pipeline; use `plan_retry` for a clean restart.
 - For `PLAN_NOT_FOUND`: verify plan_id source and stop polling that id.
 - For `generation_failed`: treat as terminal failure and surface plan progress_message to user.

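The transitions in section 5.2 and the updated error codes can be summarised as a small lookup. This is an illustrative sketch, not the server implementation: `ALLOWED_TRANSITIONS`, `is_allowed`, and `expected_error` are hypothetical names, and only the state names and error codes come from the documentation in this diff.

```python
from typing import Optional

# (from_state, to_state) pairs listed under "5.2 Allowed transitions".
ALLOWED_TRANSITIONS = frozenset({
    ("pending", "processing"),    # picked up by a worker
    ("processing", "completed"),  # normal success
    ("processing", "failed"),     # error
    ("processing", "stopped"),    # plan_stop
    ("failed", "pending"),        # plan_retry or plan_resume accepted
    ("stopped", "pending"),       # plan_retry or plan_resume accepted
})

# Error code documented for each tool when the plan is not failed/stopped.
_ERROR_BY_TOOL = {
    "plan_retry": "PLAN_NOT_FAILED",
    "plan_resume": "PLAN_NOT_RESUMABLE",
}


def is_allowed(src: str, dst: str) -> bool:
    """Return True if src -> dst appears in the documented transition list."""
    return (src, dst) in ALLOWED_TRANSITIONS


def expected_error(tool: str, state: str) -> Optional[str]:
    """Return the documented error code for calling `tool` in `state`, or None."""
    if state in ("failed", "stopped"):
        return None
    return _ERROR_BY_TOOL[tool]
```

A client can run a check like `expected_error` before calling `plan_retry` or `plan_resume`, so it avoids a round trip that the documentation says will be rejected.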
diff --git a/docs/proposals/111-promising-directions.md b/docs/proposals/111-promising-directions.md
@@ -24,9 +24,10 @@ Agents need PlanExe runs to complete reliably without human intervention. A fail
 | **103** | Pipeline Hardening for Local Models | Fix silent truncation and context-window overflows. Critical for agents running local models where failures are subtle |
 | **113** | LLM Error Traceability | ✅ **Implemented (PR #237)**. `LLMChatError` replaces generic `ValueError` across 38 call sites. Root cause preserved for error classification; `error_id` UUID enables log-to-metrics cross-referencing. Agents can programmatically diagnose failures |
 | **101** | Luigi Resume Enhancements | Webhook hooks on task completion/failure — agents can subscribe to events instead of polling |
-| **114-I1** | Stopped vs Failed State | `plan_stop` and worker crashes both produce `failed` — agents can't distinguish user-initiated stops from actual errors. Add `stop_reason` field or a new `stopped` state |
-| **114-I2** | Failure Diagnostics in `plan_status` | When a plan fails, no `failure_reason`, `failed_step`, or `last_error` is returned. Biggest observability gap — agents can only say "it failed" without explaining why. Extends #113 to the MCP consumer surface |
+| **114-I1** | Stopped vs Failed State | ✅ **Implemented**. Dedicated `PlanState.stopped` enum value — `plan_stop` transitions to `stopped`, not `failed`. Agents can now distinguish user-initiated stops from actual errors. `plan_retry` and `plan_resume` accept both `failed` and `stopped` |
+| **114-I2** | Failure Diagnostics in `plan_status` | When a plan fails, no `failure_reason`, `failed_step`, `last_error`, or `recoverable` flag is returned. Biggest observability gap — agents can only say "it failed" without explaining why or recommending resume vs retry. Extends #113 to the MCP consumer surface |
 | **114-I7** | Stalled-Plan Detection | No `last_progress_at` or `last_llm_call_at` timestamps. Agents can't distinguish "slow step" from "stuck worker". Complements #87 §8 |
+| **114-I10** | Silent Partial Failures in Completed Plans | A plan can reach `completed` with empty or stub-quality sections (e.g. 2/8 experts responding). No `quality_summary` in `plan_status` — agents can't tell if `completed` means "all sections produced quality output" or just "all steps ran." Trust gap for autonomous workflows |
 
 ---
 
@@ -107,9 +108,10 @@ Phase 1: Reliable foundation         (nearly complete)
   ├─ #110 Usage metrics ✅ (PR #219, #236, #237)
   ├─ #113 Error traceability ✅ (PR #237)
   ├─ #58  Prompt boost ⚙️ (open PR #222)
-  ├─ #114-I1 Stopped vs failed state        ← next priority
+  ├─ #114-I1 Stopped vs failed state ✅
   ├─ #114-I2 Failure diagnostics in plan_status  ← next priority (biggest gap)
-  └─ #114-I7 Stalled-plan detection
+  ├─ #114-I7 Stalled-plan detection
+  └─ #114-I10 Silent partial failures in completed plans
 
 Phase 2: Agent-native interface       (next)
   ├─ #86  Remove agent friction points ✅ (PR #223)