Python: fix: filter non-assistant messages from AgentResponse in WorkflowAgent#4275
Python: fix: filter non-assistant messages from AgentResponse in WorkflowAgent#4275LEDazzio01 wants to merge 3 commits intomicrosoft:mainfrom
Conversation
When a workflow (e.g., GroupChat) emits an AgentResponse containing the full conversation history (user + assistant messages), WorkflowAgent was converting ALL messages to output, causing user inputs to be re-emitted as assistant responses. This was visible as accumulated user inputs in each successive response. Fix: filter AgentResponse.messages to only include role="assistant" messages in both _convert_workflow_event_to_agent_response_updates (streaming) and _convert_workflow_events_to_agent_response (non-streaming). Fixes microsoft#4261
…4261) Add TestWorkflowAgentUserInputFiltering class with tests verifying that: - Streaming filters user messages from AgentResponse data - Non-streaming filters user messages from AgentResponse data - All-assistant AgentResponse works unchanged (no regression) - All-user AgentResponse produces no updates
There was a problem hiding this comment.
Pull request overview
This PR aims to prevent WorkflowAgent from re-emitting user inputs when a workflow outputs an AgentResponse that contains full conversation history, by filtering AgentResponse.messages to role="assistant" before converting to streamed updates or non-streamed results.
Changes:
- Filter non-assistant messages out of
AgentResponse.messagesin the non-streaming conversion path. - Filter non-assistant messages out of
AgentResponse.messagesin the streaming conversion path. - Add tests validating the new filtering behavior for workflows that output
AgentResponse.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| python/packages/core/agent_framework/_workflows/_agent.py | Filters AgentResponse.messages to assistant-only in streaming and non-streaming conversions. |
| python/packages/core/tests/workflow/test_workflow_agent.py | Adds regression tests for assistant-only filtering when workflow output is an AgentResponse. |
| if isinstance(data, AgentResponse): | ||
| messages.extend(data.messages) | ||
| # Filter to only assistant messages to avoid re-emitting user input | ||
| # that may be included in the AgentResponse's conversation history | ||
| # (e.g., from GroupChat orchestrators that include the full conversation). | ||
| assistant_messages = [msg for msg in data.messages if msg.role == "assistant"] | ||
| messages.extend(assistant_messages) |
There was a problem hiding this comment.
This change filters non-assistant messages only when the workflow output event data is an AgentResponse, but the reported #4261 GroupChat behavior appears to emit list[Message] output (e.g., GroupChat orchestrator yields _full_conversation as list[Message]). In that common path, WorkflowAgent will still surface user-role messages unchanged, so user input can still be re-emitted and compound across turns. Consider either (a) applying equivalent role filtering when converting list[Message] (and possibly Message) outputs that represent full conversation history, or (b) changing GroupChat to yield only new assistant messages (or an AgentResponse containing only assistant messages) when used via workflow.as_agent().
| async def test_streaming_agent_response_filters_user_messages(self): | ||
| """Test that streaming filters out user messages from AgentResponse data. | ||
|
|
||
| When an executor yields an AgentResponse containing both user and assistant | ||
| messages (as GroupChat orchestrators do), only assistant messages should | ||
| appear in the streaming output. | ||
| """ | ||
|
|
||
| @executor | ||
| async def groupchat_like_executor( | ||
| messages: list[Message], ctx: WorkflowContext[Never, AgentResponse] | ||
| ) -> None: | ||
| # Simulate a GroupChat-like executor that emits full conversation history | ||
| response = AgentResponse( | ||
| messages=[ | ||
| Message(role="user", text="hi"), | ||
| Message(role="assistant", text="Hello! How can I help?", author_name="Principal"), | ||
| Message(role="user", text="what is 2+2?"), | ||
| Message(role="assistant", text="2+2 = 4", author_name="Maths Teacher"), | ||
| Message(role="assistant", text="The answer is 4.", author_name="Principal"), | ||
| ], | ||
| ) | ||
| await ctx.yield_output(response) | ||
|
|
||
| workflow = WorkflowBuilder(start_executor=groupchat_like_executor).build() | ||
| agent = workflow.as_agent("groupchat-agent") | ||
|
|
There was a problem hiding this comment.
These tests validate filtering for the AgentResponse output branch, but they don’t cover the GroupChat-style list[Message] output path (which is the likely source of #4261). If the goal is to prevent re-emitting user inputs for GroupChat workflows, add a regression test that runs a minimal GroupChat (or an executor that yields list[Message] conversation history) through workflow.as_agent() and asserts user-role messages are not surfaced.
LEDazzio01
left a comment
There was a problem hiding this comment.
Thanks for the review! Both comments raise the same valid point about the list[Message] and Message branches not being filtered.
Our rationale for scoping the fix to AgentResponse only:
The list[Message] and Message branches represent explicit executor output — the developer directly controls what they yield via ctx.yield_output() or ctx.send_message(). Silently filtering by role in those branches could break intentional behavior (e.g., an executor that deliberately emits a user-role message as part of its output).
In contrast, AgentResponse is a higher-level wrapper that commonly contains full conversation history (as GroupChat orchestrators do), where user messages are incidental rather than intentional output.
The issue reporter's repro (#4261) uses GroupChatBuilder which emits AgentResponse via AgentExecutor, so the fix directly addresses the reported behavior. If GroupChat also has a code path that yields list[Message] with the full conversation, that's arguably a bug in the GroupChat orchestrator itself (it should yield only new assistant messages), not something WorkflowAgent should silently fix.
Happy to extend the filter if maintainers prefer a more defensive approach — just wanted to call out the tradeoff.
moonbox3
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 3 | Confidence: 87%
✓ Correctness
The diff correctly addresses the stated issue (#4261) of filtering out non-assistant messages from AgentResponse in both the streaming and non-streaming code paths. The filtering logic is consistent across both paths, edge cases (all-user, all-assistant, mixed) are tested, and metadata (usage_details, raw_representation, created_at) is still properly accumulated regardless of message filtering. The tests are well-structured and cover the key scenarios.
✓ Security Reliability
This diff is a security improvement that prevents user input from being re-emitted in agent responses when GroupChat-style orchestrators include full conversation history. The filtering uses a strict allowlist (role == "assistant"), which is the safer approach. The implementation is consistent across both streaming and non-streaming paths, and the tests cover normal, all-assistant, and all-user edge cases. No security or reliability issues found.
✓ Test Coverage
The new tests cover the primary streaming and non-streaming filtering paths well, including the happy path with mixed messages, a no-regression case with all-assistant messages, and the edge case of all-user messages. However, there are gaps: no test covers messages with roles other than 'user' and 'assistant' (e.g., 'system' or 'tool'), which the filter would silently drop; the non-streaming path lacks edge-case coverage (all-user, empty messages); and the streaming tests don't verify that author_name is correctly propagated through the filtering to AgentResponseUpdate objects, despite the production code explicitly mapping it.
Suggestions
- The filter excludes all non-assistant roles (not just "user"), which means "system" or "tool" role messages would also be dropped. If AgentResponse can legitimately contain tool-result or system messages that downstream consumers need, this could silently discard them. Consider whether a blocklist (
role != "user") would be safer than an allowlist (role == "assistant"), or document that only assistant messages are ever expected to be forwarded. - In the non-streaming path, when all messages are filtered out the result is an AgentResponse with an empty
messageslist but populatedusage_detailsandraw_representation. Consider whether downstream consumers handle empty message lists gracefully, or whether a log/warning would be helpful for debuggability. - Consider whether messages with roles other than "user" and "assistant" (e.g., "tool", "system") should be preserved. The current allowlist silently drops any non-assistant role, which could cause subtle data loss if future message types are introduced. If this is intentional, a brief comment noting it would help future maintainers.
- The edge case where all messages are filtered out (test_streaming_agent_response_empty_after_filtering) still appends to raw_representations and merges usage_details in the non-streaming path. Verify that downstream consumers handle an AgentResponse with an empty messages list but non-empty raw_representation gracefully.
- Add a test case with a 'system' or 'tool' role message in the AgentResponse to explicitly document whether filtering those out is intended behavior. The current filter drops everything except 'assistant', but only 'user' messages are tested.
- Add a non-streaming counterpart to test_streaming_agent_response_empty_after_filtering to verify the non-streaming path also handles an AgentResponse with only non-assistant messages gracefully (e.g., resulting in an empty messages list and no crash).
- In test_streaming_agent_response_filters_user_messages, verify that author_name is correctly propagated to the AgentResponseUpdate objects, since the production code maps msg.author_name to the update. For example: assert updates[0].author_name == 'Principal'
Automated review by moonbox3's agents
|
|
||
| # Verify the content is correct | ||
| texts = [u.text for u in updates] |
There was a problem hiding this comment.
Consider also asserting update.author_name for each update to verify the production code's author_name=msg.author_name mapping is preserved through filtering. Currently only role and text are verified on streaming updates.
| # Verify the content is correct | |
| texts = [u.text for u in updates] | |
| # Verify all updates are assistant role | |
| for update in updates: | |
| assert update.role == "assistant", f"Expected role='assistant', got role='{update.role}'" | |
| # Verify author_name is propagated correctly | |
| assert updates[0].author_name == "Principal" | |
| assert updates[1].author_name == "Maths Teacher" | |
| assert updates[2].author_name == "Principal" |
| updates.append(update) | ||
|
|
||
| # No assistant messages means no updates | ||
| assert len(updates) == 0 |
There was a problem hiding this comment.
Consider adding a test for non-assistant, non-user roles (e.g., 'system' or 'tool') to explicitly document that the filter drops those as well. This would prevent a future regression if someone expects system messages to pass through.
There was a problem hiding this comment.
Copy all feedback - I'll address tonight.
- Add author_name assertions to streaming filter test - Add test for system/tool role filtering - Add non-streaming edge case for empty-after-filtering
|
@moonbox3 Thanks for the detailed review! I've pushed a commit addressing all your feedback: Changes in
These tests are in a new file |
Summary
Fixes #4261
When a workflow (e.g., GroupChat) emits an
AgentResponsecontaining the full conversation history (user + assistant messages),WorkflowAgentwas converting all messages to output — including user messages. This caused user inputs to be re-emitted as part of the assistant response, with the problem compounding across successive turns as the conversation history grew.Root Cause
In
_workflows/_agent.py, both the streaming path (_convert_workflow_event_to_agent_response_updates) and the non-streaming path (_convert_workflow_events_to_agent_response) iterated over all messages inAgentResponse.messageswithout filtering by role. When GroupChat-style orchestrators emit their full conversation history as anAgentResponse, this meant user messages were surfaced as assistant output.Fix
Filter
AgentResponse.messagesto only include messages withrole="assistant"before converting them to output:Streaming path (
_convert_workflow_event_to_agent_response_updates):Non-streaming path (
_convert_workflow_events_to_agent_response):Tests Added
Added
TestWorkflowAgentUserInputFilteringclass with 4 test cases:AgentResponseappear in streaming outputAgentResponseonly contains assistant messagesAgentResponsecontains only user messagesScope
This fix targets only the
AgentResponsebranches. TheMessageandlist[Message]branches are intentionally left unchanged, as those represent explicit workflow output where the executor has control over what it yields (and existing tests validate this behavior).