feat(qwen-agent): add instrumentation for Qwen-Agent #154

sipercai wants to merge 2 commits into alibaba:main
Conversation
Change-Id: I5ab30c79fea5e6f2070ef182da505084092632b2 Co-developed-by: Cursor <noreply@cursor.com>
Change-Id: I6e70c5771f05d4915546bf6c2d4b96dc50582ce7 Co-developed-by: Cursor <noreply@cursor.com>
Pull request overview
Adds a new LoongSuite OpenTelemetry instrumentation package for the Qwen-Agent framework, integrating ExtendedTelemetryHandler to emit GenAI semantic-convention spans for agent runs, LLM chat calls, ReAct steps, and tool execution, along with CI/tox wiring and test coverage (unit + VCR-based integration).
Changes:
- Introduces `loongsuite-instrumentation-qwen-agent` with wrappers around `Agent.run`, `BaseChatModel.chat`, `Agent._call_llm`, and `Agent._call_tool`.
- Adds unit tests validating span creation/hierarchy and VCR-backed integration tests with recorded HTTP cassettes.
- Wires the new package into `tox-loongsuite.ini` and GitHub Actions test/lint workflows.
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tox-loongsuite.ini | Adds tox envs/deps/commands for qwen-agent tests + lint. |
| .github/workflows/loongsuite_test_0.yml | Adds CI test jobs for qwen-agent across py39–py313 (oldest/latest). |
| .github/workflows/loongsuite_lint_0.yml | Adds CI lint job for qwen-agent package. |
| instrumentation-loongsuite/loongsuite-instrumentation-qwen-agent/pyproject.toml | New package metadata, deps, and opentelemetry_instrumentor entry point. |
| instrumentation-loongsuite/loongsuite-instrumentation-qwen-agent/src/opentelemetry/instrumentation/qwen_agent/init.py | Instrumentor implementation wiring wrapt wrappers + uninstrumentation. |
| instrumentation-loongsuite/loongsuite-instrumentation-qwen-agent/src/opentelemetry/instrumentation/qwen_agent/patch.py | Core wrappers emitting spans for agent/LLM/tools/ReAct. |
| instrumentation-loongsuite/loongsuite-instrumentation-qwen-agent/src/opentelemetry/instrumentation/qwen_agent/utils.py | Message/tool/provider conversion helpers for GenAI types. |
| instrumentation-loongsuite/loongsuite-instrumentation-qwen-agent/tests/* | Unit tests, VCR integration tests, conftest, and VCR cassettes. |
| instrumentation-loongsuite/loongsuite-instrumentation-qwen-agent/README.md | Usage/config docs for the new instrumentation. |
| instrumentation-loongsuite/loongsuite-instrumentation-qwen-agent/CHANGELOG.md | Changelog entry for initial release. |
```python
_close_active_react_step(handler)
handler.stop_invoke_agent(invocation)
```
wrap_agent_run starts the invoke_agent span but only calls stop_invoke_agent after the wrapped generator is fully exhausted. If a caller stops iterating early (breaks out, generator is GC’d, etc.), the span (and any open react_step) will remain open and context vars won’t be finalized correctly. Consider wrapping the iteration in a try/finally that always closes any active react_step and stops/fails the invoke_agent span when the generator is closed early (including GeneratorExit).
Suggested change:

```python
except GeneratorExit as e:
    # Generator was closed early (e.g., consumer stopped iterating).
    # Ensure any open react_step and the invoke_agent span are finalized.
    _close_active_react_step(handler)
    handler.fail_invoke_agent(
        invocation, Error(message=str(e), type=type(e))
    )
    raise
```
```python
def _wrap_streaming_llm_response(
    response_iter: Iterator, invocation: Any, handler: ExtendedTelemetryHandler
) -> Iterator:
    """Wrap a streaming LLM response iterator to capture output on completion."""
    try:
        last_response = None
        first_token = True
        for response in response_iter:
            if first_token:
                invocation.monotonic_first_token_s = timeit.default_timer()
                first_token = False
            last_response = response
            yield response

        if last_response:
            invocation.output_messages = (
                convert_qwen_messages_to_output_messages(last_response)
            )
            invocation.response_model_name = invocation.request_model
            invocation.finish_reasons = ["stop"]

            # Check for function calls
            for msg in last_response:
                fc = (
                    msg.function_call
                    if hasattr(msg, "function_call")
                    else msg.get("function_call")
                    if isinstance(msg, dict)
                    else None
                )
                if fc:
                    invocation.finish_reasons = ["tool_calls"]
                    break

        handler.stop_llm(invocation)
```
_wrap_streaming_llm_response only calls handler.stop_llm(invocation) after the streaming iterator is fully consumed. If the user doesn’t exhaust the stream (common with streaming UIs) and the iterator is closed early, the LLM span will remain open and never be ended/failed. This should use a try/finally around the for loop (and handle GeneratorExit) to ensure stop_llm (or fail_llm) is always called.
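A minimal sketch of that pattern follows; the `stop_llm`/`fail_llm` handler methods are taken from the surrounding diff, while the output-capture details are elided:

```python
from typing import Any, Iterator


def wrap_streaming_response(
    response_iter: Iterator, invocation: Any, handler: Any
) -> Iterator:
    """Yield responses, guaranteeing the LLM span is always finalized."""
    try:
        for response in response_iter:
            yield response
    except GeneratorExit:
        # Consumer closed the generator before exhausting the stream.
        handler.stop_llm(invocation)
        raise
    except Exception as exc:
        # Upstream iterator failed: record the error on the span.
        handler.fail_llm(invocation, exc)
        raise
    else:
        # Stream fully consumed: finalize normally.
        handler.stop_llm(invocation)
```

Whichever consumption path the caller takes — full iteration, early `close()`, or an upstream exception — exactly one of `stop_llm`/`fail_llm` runs before the generator finishes.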
```python
# Handle function role (tool response)
if role == "function" and content:
    text = _extract_content_text(content)
    parts.append(
        ToolCallResponse(
            id=name or "",
```
convert_qwen_messages_to_input_messages treats tool results only when role == "function", but qwen-agent/DashScope tool messages use role == "tool" (see the VCR cassettes in this PR). As written, tool responses will be recorded as plain Text parts, losing GenAI tool response semantics. Update conversion to recognize role == "tool" (and/or both), and populate ToolCallResponse.id from the tool call id (e.g., msg.id or msg.extra.function_id) rather than the tool name.
Suggested change:

```diff
-# Handle function role (tool response)
-if role == "function" and content:
-    text = _extract_content_text(content)
-    parts.append(
-        ToolCallResponse(
-            id=name or "",
+# Handle function/tool role (tool response)
+if role in ("function", "tool") and content:
+    text = _extract_content_text(content)
+    # Prefer a stable tool call id from the message when available.
+    tool_call_id: str = ""
+    # Try direct id attribute/key.
+    if hasattr(msg, "id"):
+        tool_call_id = getattr(msg, "id", "") or ""
+    elif isinstance(msg, dict):
+        tool_call_id = msg.get("id") or ""
+    # Fallback to extra.function_id if present.
+    if not tool_call_id:
+        extra = getattr(msg, "extra", None)
+        if extra is None and isinstance(msg, dict):
+            extra = msg.get("extra")
+        if extra is not None:
+            if isinstance(extra, dict):
+                tool_call_id = extra.get("function_id") or ""
+            else:
+                tool_call_id = getattr(extra, "function_id", "") or ""
+    # Final fallback: use the tool name (existing behavior) or empty.
+    if not tool_call_id:
+        tool_call_id = name or ""
+    parts.append(
+        ToolCallResponse(
+            id=tool_call_id,
```
```json
        }
      }
    ]
  },
  {
    "role": "tool",
    "content": "An error occurred when calling tool `get_current_weather_test`:\nAttributeError: 'str' object has no attribute 'get'\nTraceback:\n  File \"/Users/sipercai/miniforge3/lib/python3.12/site-packages/qwen_agent/agent.py\", line 192, in _call_tool\n    tool_result = tool.call(tool_args, **kwargs)\n    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/Users/sipercai/project/pyins/team-work/loongsuite-python-agent/instrumentation-loongsuite/loongsuite-instrumentation-qwen-agent/tests/test_real_api.py\", line 204, in call\n    city = params.get(\"city\", \"unknown\")\n    ^^^^^^^^^^\n",
    "name": "get_current_weather_test",
    "extra": {
      "function_id": "call_355b7a9ccc024c71a743ab"
    },
    "id": "call_355b7a9ccc024c71a743ab"
  }
```
The VCR cassette contains a full local traceback with absolute paths under a developer’s home directory (e.g., /Users/.../instrumentation-loongsuite/.../test_real_api.py) embedded in the recorded request body. This is sensitive/host-specific data and will also make the cassette brittle if the tool output changes. Re-record the cassette after making the tool return a deterministic success value (or scrub tool error content/tracebacks before recording) so no local paths/tracebacks are checked into the repo.
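One hedged way to do the scrubbing is a vcrpy `before_record_request` hook; the regexes and the function name below are illustrative, not the PR's code:

```python
import re


def scrub_request_body(request):
    """Redact tracebacks and absolute home-directory paths from a recorded
    request body before vcrpy writes it to a cassette (sketch only)."""
    body = request.body
    if body is None:
        return request
    text = body.decode("utf-8", errors="replace") if isinstance(body, bytes) else body
    # Redact everything after a "Traceback:" marker up to the next quote.
    text = re.sub(r"Traceback:[^\"]*", "Traceback: <redacted>", text)
    # Replace absolute /Users/... paths with a placeholder.
    text = re.sub(r"/Users/[^\s\"']+", "<path>", text)
    request.body = text.encode("utf-8") if isinstance(body, bytes) else text
    return request


# Wiring it up (illustrative):
# my_vcr = vcr.VCR(before_record_request=scrub_request_body)
```

Making the test tool return a deterministic success value, as the comment suggests, remains the cleaner fix; scrubbing is a safety net so stray tracebacks never reach the repo.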
```python
def call(self, params, **kwargs):
    if isinstance(params, str):
        try:
            params = json.loads(params)
        except Exception:
            params = {"city": params}
    city = (
        params.get("city", "unknown")
        if isinstance(params, dict)
        else "unknown"
    )
    return (
        f"The weather in {city} is sunny and 22 degrees Celsius."
    )
```
GetCurrentWeatherTool.call() now returns a normal success string for string/dict params, but the recorded VCR cassette test_qwen_agent_with_tool_call.yaml still contains a tool error payload + traceback (it was recorded when this tool raised AttributeError: 'str' object has no attribute 'get'). With VCR replay, the tool output is part of the next request body, so this change will cause cassette mismatch and flaky/failing tests. Re-record/update the cassette to match the current deterministic tool behavior (and avoid recording tracebacks/local paths).
```shell
pip install ./instrumentation-loongsuite/loongsuite-instrumentation-qwen-agent

# Required: GenAI utilities used by the instrumentation
pip install ./util/opentelemetry-util-genai
```
The `opentelemetry-util-genai` step should be done before installing `loongsuite-instrumentation-qwen-agent`.
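In README terms, the corrected order would read something like this (paths taken from the snippet above):

```shell
# Install the GenAI utilities first -- the instrumentation depends on them
pip install ./util/opentelemetry-util-genai
pip install ./instrumentation-loongsuite/loongsuite-instrumentation-qwen-agent
```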
```toml
license = "Apache-2.0"
requires-python = ">=3.9"
authors = [
    { name = "LoongSuite Python Agent Authors", email = "qp467389@alibaba-inc.com" },
```
Please fix the authors information; we need your information and the OpenTelemetry authors' entry here.
```toml
qwen_agent = "opentelemetry.instrumentation.qwen_agent:QwenAgentInstrumentor"

[project.urls]
Homepage = "https://github.com/alibaba/loongsuite-python-agent"
```
Suggested change:

```diff
-Homepage = "https://github.com/alibaba/loongsuite-python-agent"
+Homepage = "https://github.com/alibaba/loongsuite-python-agent/tree/main/instrumentation-loongsuite/loongsuite-instrumentation-qwen-agent"
```
```toml
[project]
name = "loongsuite-instrumentation-qwen-agent"
dynamic = ["version"]
description = "OpenTelemetry Qwen-Agent Instrumentation"
```
Suggested change:

```diff
-description = "OpenTelemetry Qwen-Agent Instrumentation"
+description = "LoongSuite Qwen-Agent Instrumentation"
```
```markdown
### Added

- Initial release of `loongsuite-instrumentation-qwen-agent`
```
The format of the changelog file should follow the established conventions. Please refer to:
https://github.com/alibaba/loongsuite-python-agent/blob/main/instrumentation-loongsuite/loongsuite-instrumentation-dashscope/CHANGELOG.md
```python
# See the License for the specific language governing permissions and
# limitations under the License.

__version__ = "0.1.0"
```
Suggested change:

```diff
-__version__ = "0.1.0"
+__version__ = "0.4.0.dev"
```
```python
    return invocation


def create_agent_invocation(
```
These functions could be defined as internal ones.
```python
    hasattr(agent_instance, "system_message")
    and agent_instance.system_message
):
    invocation.system_instruction = [
```
Could you check whether the system message obtained here conforms to the definition of `gen_ai.system_instructions`?
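For reference, a hedged sketch of what conformance could look like, assuming `gen_ai.system_instructions` expects a list of typed content parts rather than a bare string (verify the part shape against the semconv definition before relying on it):

```python
def normalize_system_instructions(system_message):
    """Coerce qwen-agent's system_message (str or list) into a list of
    text content parts; the {"type": "text", ...} shape is an assumption."""
    if isinstance(system_message, str):
        return [{"type": "text", "content": system_message}]
    if isinstance(system_message, list):
        return [
            part if isinstance(part, dict)
            else {"type": "text", "content": str(part)}
            for part in system_message
        ]
    return []
```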
```python
# Map qwen-agent model_type to provider name
_MODEL_TYPE_PROVIDER_MAP = {
```
Please double-check the provider mapping. The provider name depends on the vendor providing model service, not the developer of the model.
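To make the distinction concrete, a provider map keyed on the hosting service rather than the model family might look like this; the `model_type` keys and provider strings below are illustrative assumptions, not the PR's actual values:

```python
# Provider = the vendor serving the model (per gen_ai.provider.name),
# not the model's developer. All keys/values are illustrative placeholders.
_MODEL_TYPE_PROVIDER_MAP = {
    "qwen_dashscope": "dashscope",    # Qwen model served via DashScope
    "qwenvl_dashscope": "dashscope",
    "oai": "openai",                  # OpenAI-compatible endpoint
}


def provider_for(model_type: str) -> str:
    """Resolve a model_type to its serving provider, defaulting to unknown."""
    return _MODEL_TYPE_PROVIDER_MAP.get(model_type, "unknown")
```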
```python
try:
    wrap_function_wrapper(
        module=_LLM_MODULE,
        name="BaseChatModel.chat",
```
If this is an abstract class, it may lead to duplicate spans reporting in Proxy/Wrapper scenarios. Other wrapper methods should also pay attention to this issue.
You can refer to #149 (comment) for a similar comment.
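One way to guard against double reporting when a wrapped base-class method is re-entered through a subclass or proxy is a per-instance in-progress flag. The sketch below uses hypothetical names, and the `start_span`/`end_span` callables stand in for the real handler calls:

```python
# Hypothetical guard attribute; any unlikely-to-collide name works.
_IN_CHAT_FLAG = "_qwen_agent_chat_in_progress"


def make_chat_wrapper(start_span, end_span):
    """Build a wrapt-style wrapper that emits at most one span per
    logical chat call, even if subclasses delegate back to the base."""
    def wrapper(wrapped, instance, args, kwargs):
        if getattr(instance, _IN_CHAT_FLAG, False):
            # Re-entrant call (e.g. a proxy delegating to the real model):
            # the outermost invocation already opened a span.
            return wrapped(*args, **kwargs)
        setattr(instance, _IN_CHAT_FLAG, True)
        try:
            start_span()
            return wrapped(*args, **kwargs)
        finally:
            end_span()
            setattr(instance, _IN_CHAT_FLAG, False)
    return wrapper
```

This is instance-scoped, so two different model objects still get their own spans; only nested calls on the same instance are collapsed.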
Description
This PR adds loongsuite-instrumentation-qwen-agent: OpenTelemetry instrumentation for Qwen-Agent using ExtendedTelemetryHandler from opentelemetry-util-genai. It emits GenAI semantic-convention traces for agent runs, LLM calls, ReAct steps, and tool execution (invoke_agent, chat, react step, execute_tool), registers an opentelemetry_instrumentor entry point, and includes unit tests plus VCR-based integration tests.
Fixes #11
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
pip install -e ".[test]" under instrumentation-loongsuite/loongsuite-instrumentation-qwen-agent, then pytest on that package’s tests/
From repo root: tox -c tox-loongsuite.ini -e py312-test-loongsuite-instrumentation-qwen-agent-oldest, py312-test-loongsuite-instrumentation-qwen-agent-latest, lint-loongsuite-instrumentation-qwen-agent
Does This PR Require a Core Repo Change?
Checklist:
See contributing.md for styleguide, changelog guidelines, and more.