Sub-Agent task_complete Bug: Race Condition Causes Response Loss
Summary
When sub-agents (spawned via the task tool) are instructed to use the task_complete tool, their responses are not captured by the parent agent, resulting in "General-purpose agent did not produce a response." This is a race condition in the sub-agent framework that affects all Claude models (Opus 4.6, Sonnet 4, Sonnet 4.5, Haiku 4.5), not just Opus 4.6.
Root Cause
The Copilot CLI sub-agent framework extracts responses from the sub-agent's text output stream, not from the task_complete summary. When a sub-agent calls task_complete, it terminates the response stream. This creates a race condition:
Happy path (works):
1. Sub-agent emits a text response
2. Sub-agent calls task_complete
3. Parent captures the text before the stream terminates
4. ✅ Response returned successfully

Bug path (fails):
1. Sub-agent calls task_complete as its first/only action
2. Response stream terminates immediately
3. Parent receives an empty/null response
4. ❌ "General-purpose agent did not produce a response"
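The race can be modeled in a few lines of Python. This is a hedged sketch, not the actual Copilot CLI implementation: `run_sub_agent` and the event-tuple format are invented for illustration, but they capture the ordering problem described above.

```python
def run_sub_agent(events):
    """Simulate a parent agent draining a sub-agent's event stream.

    `events` is an ordered list of ("text", payload) or
    ("task_complete", summary) tuples. The parent only captures "text"
    events; "task_complete" terminates the stream immediately, and the
    summary it carries is discarded.
    """
    captured = []
    for kind, payload in events:
        if kind == "task_complete":
            break  # stream terminates; the summary is NOT captured
        captured.append(payload)
    return "".join(captured) or None


# Happy path: text is emitted before task_complete ends the stream.
ok = run_sub_agent([("text", "Found 12 files."), ("task_complete", "done")])
assert ok == "Found 12 files."

# Bug path: task_complete is the first action, so the parent gets nothing,
# which surfaces as "General-purpose agent did not produce a response".
assert run_sub_agent([("task_complete", "Found 12 files.")]) is None
```

Whether a given run lands on the happy or bug path depends only on whether the model emits text before calling task_complete, which is why the failure looks non-deterministic.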
Evidence
Experimental Results
Spawned sub-agents with identical tasks, varying only whether they're told to use task_complete:
| Model | With task_complete | Without task_complete |
|---|---|---|
| claude-opus-4.6 | 6/13 success (46%) | 2/2 success (100%) |
| claude-sonnet-4.5 | 0/1 success (0%) | — |
| claude-sonnet-4 | 0/1 success (0%) | 1/1 success |
| claude-haiku-4.5 | 0/1 success (0%) | — |
| claude-opus-4.5 | 1/1 success | — |
| gpt-5.1 | 1/1 success | — |
Key findings:
- Bug affects all Claude models as sub-agents, not just Opus 4.6
- Failure is non-deterministic (46% success rate for Opus 4.6)
- GPT models appear unaffected
- Without task_complete instruction: 100% success rate
Non-Deterministic Behavior
Prompt intensity matters:
- Emphatic prompts ("CRITICAL MUST use task_complete ONLY") → 100% failure
- Moderate prompts ("Do X and use task_complete") → ~46% success for Opus 4.6
Why? The more the prompt emphasizes task_complete, the more likely the model calls it before or instead of emitting text, triggering the race condition.
Reproduction
Guaranteed Failure (100%)

```python
task(
    agent_type="general-purpose",
    prompt="""
    Your ONLY job is to call task_complete with summary "Test".
    CRITICAL: Use task_complete tool. Do not write any other response.
    """,
    mode="sync",
    model="claude-opus-4.6"
)
# Result: "General-purpose agent did not produce a response"
```

Non-Deterministic Failure (~54%)
```python
task(
    agent_type="general-purpose",
    prompt="""
    Count files in scripts/ directory.
    IMPORTANT: You MUST use task_complete when done.
    """,
    mode="sync",
    model="claude-opus-4.6"
)
# Result: Sometimes works, sometimes "did not produce a response"
```

Guaranteed Success (100%)
```python
task(
    agent_type="general-purpose",
    prompt="""
    Count files in scripts/ directory.
    DO NOT USE task_complete. Return your findings directly.
    """,
    mode="sync",
    model="claude-opus-4.6"
)
# Result: Always returns a response
```

Impact
Affected:
- All Claude model sub-agents (Opus 4.6, Sonnet 4/4.5, Haiku 4.5)
- Both sync and background modes
- Parent agents cannot reliably use Claude sub-agents if prompts mention task_complete
Not affected:
- GPT models (gpt-5.1, gpt-5.2-codex)
- Claude Opus 4.5 (appears more reliable but not tested extensively)
- Direct agent usage (non-sub-agents)
Workaround
For sub-agent prompts: Never instruct sub-agents to use task_complete. The parent framework already handles completion signaling. Instruct them to return findings directly:
```python
task(
    agent_type="general-purpose",
    prompt="""
    Analyze X and report findings.
    DO NOT USE task_complete tool. Return your response text directly.
    """,
    mode="sync",
    model="claude-opus-4.6"
)
```

Success rate: 100% with this workaround.
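To apply the workaround consistently, a caller can append the instruction to every sub-agent prompt. The helper below is hypothetical (`WORKAROUND_SUFFIX` and `sub_agent_prompt` are not part of the Copilot CLI); it just centralizes the wording so no prompt accidentally mentions task_complete without the suppression note.

```python
# Hypothetical helper: append the workaround instruction to every
# sub-agent prompt so the sub-agent never calls task_complete.
WORKAROUND_SUFFIX = (
    "\nDO NOT USE the task_complete tool. Return your response text directly."
)

def sub_agent_prompt(task_description: str) -> str:
    """Return the task description with the workaround suffix appended."""
    return task_description.rstrip() + WORKAROUND_SUFFIX


prompt = sub_agent_prompt("Count files in scripts/ directory.")
assert prompt.endswith("Return your response text directly.")
```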
Recommended Fix
The framework should:
- Buffer text output before allowing task_complete to terminate the stream
- Ensure response serialization completes before terminating the agent context
- Consider making task_complete a soft signal rather than an immediate termination
- Or: auto-suppress task_complete for sub-agents, since the parent framework already tracks completion
The fundamental issue is that task_complete terminates execution before ensuring the response text is captured, creating an avoidable race condition.
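As a sketch of the soft-signal option (assuming the same illustrative event-stream model as above, not the real framework API): instead of discarding everything when task_complete arrives, the parent records the summary and falls back to it when no text was captured.

```python
def run_sub_agent_fixed(events):
    """Drain a sub-agent's event stream, treating task_complete as a
    soft signal: its summary is kept as a fallback response instead of
    being discarded when the stream terminates.

    `events` is an ordered list of ("text", payload) or
    ("task_complete", summary) tuples (illustrative format).
    """
    captured = []
    summary = None
    for kind, payload in events:
        if kind == "task_complete":
            summary = payload  # keep the summary instead of dropping it
            break
        captured.append(payload)
    # Prefer emitted text; fall back to the task_complete summary.
    return "".join(captured) or summary


# The former bug path now returns the summary instead of nothing.
assert run_sub_agent_fixed([("task_complete", "Found 12 files.")]) == "Found 12 files."
```

With this change, a sub-agent that calls task_complete first still produces a response, and one that emits text first behaves exactly as before.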
Environment
- Copilot CLI version: 0.0.405
- OS: macOS (Darwin)
- Affected models: All Claude models (tested: Opus 4.6, Sonnet 4, Sonnet 4.5, Haiku 4.5)
- Working models: GPT-5.1, GPT-5.2-Codex
Related
This issue was initially reported as Opus 4.6-specific, but debugging revealed it affects all Claude models when used as sub-agents. The non-deterministic nature made it appear model-specific initially.