Conversation
📝 Walkthrough

This pull request refactors HuggingFace and Transformers LLM/VLM tool-calling functionality by extracting shared utilities into dedicated modules, removing unnecessary dependencies from the example project configuration, and adding comprehensive test coverage and example scripts for tool-calling workflows.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
```python
model_id = "Qwen/Qwen2.5-1.5B-Instruct"
logger.info(f"Loading model: {model_id}")

llm = TransformersLLM(
```
Why not show this as an example with the Agent class?
For consistency with our examples as well
Actionable comments posted: 8
🧹 Nitpick comments (2)
plugins/huggingface/tests/test_hf_tool_calling.py (2)
139-164: Prefer a tiny fake LLM over `MagicMock`/`AsyncMock` in these loop tests. The contract under test is small, and a fake object with real methods would catch signature drift that these loose mocks will not.
As per coding guidelines, "Never mock in tests; use pytest for testing."
Also applies to: 210-223
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/tests/test_hf_tool_calling.py` around lines 139 - 164, The test uses MagicMock/AsyncMock for llm (including llm._dedup_and_execute and llm._sanitize_tool_output) which can hide signature drift; replace this with a tiny fake LLM class that implements async def _dedup_and_execute(self, ...) returning the two planned side-effect tuples and def _sanitize_tool_output(self, v) returning str(v) so the test exercises real method signatures; update both occurrences (lines around the current llm setup and the similar block at 210-223) to instantiate the fake LLM instead of MagicMock/AsyncMock.
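The suggested fake can be sketched as a tiny standalone class. `FakeLLM`, its constructor argument, and the returned triple shape are illustrative assumptions about the test's contract, not the plugin's actual API:

```python
import asyncio

# Hypothetical stand-in for the mocked llm: real methods with real
# signatures, so signature drift fails the test instead of passing silently.
class FakeLLM:
    def __init__(self, planned_triples):
        # Each planned entry: (tool_call_dict, result, error)
        self.planned_triples = planned_triples
        self.calls = []

    async def _dedup_and_execute(self, tool_calls):
        # Record the argument so the test can assert on the real call.
        self.calls.append(tool_calls)
        return self.planned_triples

    def _sanitize_tool_output(self, value):
        return str(value)

llm = FakeLLM([({"id": "c1", "name": "get_weather"}, "sunny", None)])
triples = asyncio.run(llm._dedup_and_execute([{"name": "get_weather"}]))
assert triples[0][1] == "sunny"
assert llm._sanitize_tool_output({"ok": True}) == "{'ok': True}"
```

Unlike `AsyncMock`, calling `_dedup_and_execute` with the wrong arity here raises a `TypeError` immediately.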
6-14: Avoid importing private helper modules directly from tests. These imports make internal module names part of the test contract, so a helper move or rename becomes a test-breaking API change. Please re-export the supported helpers or colocate these tests with the package-private modules instead of importing `_foo` modules from the external test package.
As per coding guidelines, "Never import from private modules (`_foo`) outside of the package's own `__init__.py`. Use the public re-export instead."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/tests/test_hf_tool_calling.py` around lines 6 - 14, Tests are importing private helpers (_hf_tool_calling, _tool_call_loop) directly; instead either re-export the required symbols (accumulate_tool_call_chunk, convert_tools_to_hf_format, extract_tool_calls_from_hf_response, finalize_pending_tool_calls, run_tool_call_loop) from the package's public API (e.g., add them to vision_agents.plugins.huggingface.__init__ or another public module) and update the test imports to use that public path, or move the tests into the package-private module directory so they can import the private modules locally; ensure all references in tests use the public names rather than `_hf_tool_calling`/`_tool_call_loop`.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f40713a7-b283-4df3-8695-abd2a2d4a477
⛔ Files ignored due to path filters (1)
`plugins/huggingface/examples/transformers/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (12)
- plugins/huggingface/examples/transformers/pyproject.toml
- plugins/huggingface/examples/transformers/transformers_tool_calling_example.py
- plugins/huggingface/tests/test_hf_tool_calling.py
- plugins/huggingface/tests/test_huggingface.py
- plugins/huggingface/tests/test_transformers_llm.py
- plugins/huggingface/tests/test_transformers_vlm.py
- plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py
- plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py
- plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py
- plugins/huggingface/vision_agents/plugins/huggingface/huggingface_vlm.py
- plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py
- plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py
💤 Files with no reviewable changes (1)
- plugins/huggingface/examples/transformers/pyproject.toml
```python
request_kwargs: dict[str, Any] = {
    "messages": messages,
    "model": model_id,
    "stream": stream,
}
if tools:
    request_kwargs["tools"] = tools
```
Propagate caller kwargs into both HuggingFace completion requests.
The function accepts **kwargs but neither the initial request at lines 116–122 nor the follow-up request at lines 328–335 spread these kwargs into request_kwargs. This silently drops any caller-provided options like temperature, max_tokens, or provider-specific flags. Note that line 330 extracts only the model key from kwargs, confirming the intent to use them.
Suggested fix
```diff
 request_kwargs: dict[str, Any] = {
+    **kwargs,
     "messages": messages,
     "model": model_id,
     "stream": stream,
 }
```
Apply the same change at lines 328–335 in `_generate_followup`:
```diff
 request_kwargs: dict[str, Any] = {
+    **kwargs,
     "messages": current_messages,
     "model": kwargs.get("model", model_id),
     "stream": True,
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py`
around lines 116 - 123, The caller-provided kwargs are not being forwarded into
the HuggingFace request parameter dicts; update the request building in both
places where request_kwargs is constructed (the initial request in the function
and inside _generate_followup) to merge the incoming **kwargs into
request_kwargs (e.g., request_kwargs.update(kwargs) or by including **kwargs
when building the dict) so options like temperature/max_tokens/provider flags
are preserved; ensure you still explicitly set/override required keys like
"model" from kwargs when needed and keep the existing keys ("messages",
"stream", "tools") intact.
```python
for tc, res, err in triples:
    cid = tc.get("id")
    if not cid:
        continue

    assistant_tool_calls.append(
        {
            "id": cid,
            "type": "function",
            "function": {
                "name": tc["name"],
                "arguments": json.dumps(tc.get("arguments_json", {})),
            },
        }
    )
    tool_results.append(
        {
            "role": "tool",
            "tool_call_id": cid,
            "content": llm._sanitize_tool_output(
                err if err is not None else res
            ),
        }
    )

if not tool_results:
    return llm_response
```
Don't discard executed tool calls when the provider omits an id.
`NormalizedToolCallItem.id` is optional, but this loop skips every triple whose `id` is falsy. In that case the tool still runs, yet no assistant/tool messages are appended and the loop returns the previous empty response instead of requesting the follow-up.
🧩 Suggested fix
```diff
-for tc, res, err in triples:
-    cid = tc.get("id")
-    if not cid:
-        continue
+for call_index, (tc, res, err) in enumerate(triples):
+    cid = tc.get("id") or f"tool_call_{round_num}_{call_index}"
```
+ cid = tc.get("id") or f"tool_call_{round_num}_{call_index}"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py`
around lines 95 - 121, The loop currently skips triples with a falsey
NormalizedToolCallItem.id so executed tools are dropped; instead, when cid is
missing generate a stable fallback id (e.g., using the loop index or uuid4) and
use that fallback for both assistant_tool_calls and tool_results so every
executed tool produces messages; remove the early continue and ensure the code
still uses tc["name"] and tc.get("arguments_json", {}) for the
assistant_tool_calls and llm._sanitize_tool_output(err if err is not None else
res) for tool_results, preserving the existing structures and returning
llm_response only after processing all triples.
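The fallback-id idea in isolation (names like `round_num` follow the review text; the triple shape is an assumption for illustration):

```python
def assign_call_ids(triples, round_num):
    # Synthesize a deterministic id when the provider omits one, so the
    # executed call still produces assistant/tool messages.
    with_ids = []
    for call_index, (tc, res, err) in enumerate(triples):
        cid = tc.get("id") or f"tool_call_{round_num}_{call_index}"
        with_ids.append((cid, tc, res, err))
    return with_ids

triples = [
    ({"id": "abc", "name": "a"}, "r1", None),
    ({"name": "b"}, "r2", None),  # provider omitted the id
]
ids = [cid for cid, *_ in assign_call_ids(triples, 0)]
assert ids == ["abc", "tool_call_0_1"]
```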
```python
effective_model = kwargs.pop("model", self.model)
return await create_hf_response(
    self,
    self._client,
    effective_model,
    PLUGIN_NAME,
    messages,
    tools_param,
    stream,
    **kwargs,
)
```
Preserve caller kwargs when routing through create_hf_response().
The new helper currently builds its HF request payloads from messages, model, stream, and tools only, so this refactor drops every other create_response() kwarg on both the initial request and the tool-call follow-up path. Calls that relied on temperature, max_tokens, stop, or similar options will silently stop taking effect.
🧩 Suggested follow-up in `plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py`
```diff
-    request_kwargs: dict[str, Any] = {
-        "messages": messages,
-        "model": model_id,
-        "stream": stream,
-    }
+    request_kwargs: dict[str, Any] = dict(kwargs)
+    request_kwargs.update(
+        {"messages": messages, "model": model_id, "stream": stream}
+    )
```
Apply the same merge pattern in the follow-up request builder as well so tool-call rounds honor the same generation settings.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py`
around lines 136 - 145, The refactor dropped caller-supplied generation kwargs
when routing through create_hf_response; update the call site in create_response
(where effective_model is computed) to pass through the remaining kwargs (e.g.,
temperature, max_tokens, stop) by merging kwargs into the payload passed to
create_hf_response instead of only sending messages/model/stream/tools; also
apply the same merge pattern in the tool-call follow-up builder in
_hf_tool_calling.py so the tool-call requests (in the function that builds
follow-up HF request payloads) include the original caller kwargs as well.
```diff
-        try:
-            response = await self._client.chat.completions.create(
-                messages=messages,
-                model=self.model,
-                stream=True,
-            )
-        except Exception as e:
-            logger.exception(f'Failed to get a response from the model "{self.model}"')
-            self.events.send(
-                events.LLMErrorEvent(
-                    plugin_name=PLUGIN_NAME,
-                    error_message=str(e),
-                    event_data=e,
-                )
-            )
-            self.events.send(
-                VLMErrorEvent(
-                    plugin_name=PLUGIN_NAME,
-                    inference_id=inference_id,
-                    error=e,
-                    context="api_request",
-                )
-            )
+        response = await self.create_response(messages=messages)
+        latency_ms = (time.perf_counter() - request_start_time) * 1000
+
+        if response.exception is not None:
+            self.events.send(
+                VLMErrorEvent(
+                    plugin_name=PLUGIN_NAME,
+                    inference_id=inference_id,
+                    error=response.exception,
+                    context="generation",
+                )
+            )
+        else:
+            self.events.send(
+                VLMInferenceCompletedEvent(
+                    plugin_name=PLUGIN_NAME,
+                    inference_id=inference_id,
+                    model=self.model,
+                    text=response.text,
+                    latency_ms=latency_ms,
+                    frames_processed=frames_count,
+                )
+            )
```
Tool-call follow-up failures are currently reported as successful VLM completions.
This branch trusts response.exception, but the shared follow-up error path returns LLMResponseEvent(original=None, text="") without preserving the exception. A failed follow-up after tool execution will therefore fall into the success branch and emit VLMInferenceCompletedEvent with empty text.
🧩 Suggested follow-up in `plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py`
```diff
-        return LLMResponseEvent(original=None, text=""), []
+        return LLMResponseEvent(original=None, text="", exception=e), []
```
Verify each finding against the current code and only fix it if needed.
In `@plugins/huggingface/vision_agents/plugins/huggingface/huggingface_vlm.py`
around lines 143 - 165, The current success/failed branching trusts
response.exception but follow-up tool-call failures clear the exception and
return empty text, causing false VLMInferenceCompletedEvent emissions; fix by
ensuring follow-up errors propagate the exception (update the follow-up path in
_hf_tool_calling to set LLMResponseEvent.exception with the underlying error)
and make the caller here (create_response / this response handling block) treat
empty responses as failures too (change the conditional to check if
response.exception is not None OR response.text is falsy, and emit VLMErrorEvent
in that case; otherwise emit VLMInferenceCompletedEvent).
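The caller-side check the prompt describes, in isolation. `ResponseLike` is a stand-in for `LLMResponseEvent`, assumed to expose `text` and `exception`:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ResponseLike:
    text: str
    exception: Optional[Exception] = None

def is_failed(response: ResponseLike) -> bool:
    # An exception OR an empty body counts as failure, so a follow-up
    # that swallowed its error no longer looks like a success.
    return response.exception is not None or not response.text

assert is_failed(ResponseLike(text=""))
assert is_failed(ResponseLike(text="", exception=RuntimeError("boom")))
assert not is_failed(ResponseLike(text="ok"))
```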
```diff
 if tools_param and result.text:
-    tool_calls = self._extract_tool_calls_from_text(result.text)
+    tool_calls = extract_tool_calls_from_text(result.text)
     if tool_calls:
-        return await self._handle_tool_calls(
-            tool_calls, messages, tools_param, kwargs
-        )
+        return await self._handle_tool_calls(tool_calls, messages, kwargs)
```
Don't re-enter create_response() from inside run_tool_call_loop.
Line 545 calls create_response() again, but Line 341 already routes parsed tool text back into _handle_tool_calls. A second tool round therefore nests a fresh run_tool_call_loop, effectively resetting the round cap, and it also drops the caller's current max_new_tokens / sampling settings because only kwargs are threaded through here.
Also applies to: 542-549
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py`
around lines 340 - 344, Detected re-entry into create_response() from inside
run_tool_call_loop via _handle_tool_calls which restarts the tool-call loop and
loses caller sampling/kwargs; instead, change the flow so that when
extract_tool_calls_from_text(result.text) yields tool_calls, you do NOT call
create_response() again. Modify _handle_tool_calls (and any helpers it uses) to
return a structured continuation (e.g., updated messages and any tool results or
a flag) back to the current run_tool_call_loop so the existing loop resumes
handling tool calls and preserves the original kwargs and max_new_tokens; remove
any calls to create_response() from within _handle_tool_calls/run_tool_call_loop
and ensure _handle_tool_calls updates messages/state and returns control to the
caller for the next iteration.
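The non-recursive shape in miniature: the handler only extends the conversation and returns it, while a single loop owns the round cap and the generation kwargs. Everything here (`MAX_ROUNDS`, the toy `generate`, the `CALL` marker) is illustrative, not the plugin's real API:

```python
MAX_ROUNDS = 3

def extract_calls(text):
    # Toy detector: a reply starting with "CALL " is a tool call.
    return [text[5:]] if text.startswith("CALL ") else []

def run_tool_call_loop(generate, handle_tool_calls, messages, **gen_kwargs):
    text = generate(messages, **gen_kwargs)
    for _ in range(MAX_ROUNDS):
        tool_calls = extract_calls(text)
        if not tool_calls:
            break
        # The handler never re-enters this loop or create_response();
        # it just returns the extended message list.
        messages = handle_tool_calls(tool_calls, messages)
        text = generate(messages, **gen_kwargs)
    return text

def generate(messages, **gen_kwargs):
    return "CALL get_time" if len(messages) == 1 else "It is noon."

def handle_tool_calls(tool_calls, messages):
    return messages + [{"role": "tool", "content": f"ran {tool_calls[0]}"}]

answer = run_tool_call_loop(
    generate, handle_tool_calls, [{"role": "user", "content": "time?"}],
    max_new_tokens=64,  # survives every round because only the loop generates
)
assert answer == "It is noon."
```

Because the handler returns control instead of recursing, the round cap cannot be reset by a second tool round.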
```python
frames_snapshot = list(self._frame_buffer)
image_count = min(len(frames_snapshot), self._max_frames)

messages = self._build_messages()
image_content: list[dict[str, Any]] = [
    {"type": "image"} for _ in range(image_count)
]
image_content.append({"type": "text", "text": text or "Describe what you see."})
messages.append({"role": "user", "content": image_content})
```
simple_response() now sends the current user turn twice.
Line 383 replays the conversation, which already contains the just-added plain-text user message, and Lines 225-229 append the same text again inside the multimodal payload. When frames_snapshot is empty this becomes a literal duplicate prompt on every call.
💡 Suggested fix
```diff
         messages = self._build_messages()
+        if participant is None and messages and messages[-1]["role"] == "user":
+            messages.pop()
+
         image_content: list[dict[str, Any]] = [
             {"type": "image"} for _ in range(image_count)
         ]
```
Also applies to: 383-390
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py`
around lines 221 - 229, The current simple_response flow appends the user's
plain-text turn twice by adding it into image_content and then replaying the
conversation via _build_messages(); to fix, ensure the user text is only added
once: either stop appending the plain-text entry to image_content when the last
built message is already the same user turn, or add the multimodal image_content
as the single user message and avoid replaying/duplicating that same user text
in _build_messages(); locate the multimodal assembly in simple_response
(variables frames_snapshot, image_content, messages and method _build_messages)
and implement the conditional dedupe so the user text is not duplicated when
frames_snapshot is empty.
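The conditional dedupe in isolation, assuming plain-text history entries store their text directly under `content` (an assumption about the message shape, not the plugin's real code):

```python
def assemble_messages(history, text, image_count):
    messages = list(history)
    # Drop the trailing user turn if it is the same text we are about to
    # re-send inside the multimodal payload.
    if (
        messages
        and messages[-1].get("role") == "user"
        and messages[-1].get("content") == text
    ):
        messages.pop()
    image_content = [{"type": "image"} for _ in range(image_count)]
    image_content.append({"type": "text", "text": text or "Describe what you see."})
    messages.append({"role": "user", "content": image_content})
    return messages

msgs = assemble_messages(
    [{"role": "user", "content": "what is this?"}], "what is this?", 0
)
assert len(msgs) == 1  # duplicate plain-text turn removed
assert msgs[0]["content"][-1]["text"] == "what is this?"
```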
```python
try:
    inputs = await asyncio.to_thread(
        self._build_processor_inputs, messages, frames, tools_param
    )
```
Clear the tool flag when template fallback removes tools.
When _build_processor_inputs() retries apply_chat_template() without the tool schema (line 440), the model generates output without seeing any tools. However, line 374 still checks if tools_param and output_text: assuming tools were provided to the model. This can cause JSON-looking answers to be misclassified as tool calls even though no schema was ever shown.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py`
around lines 302 - 305, The retry path in _build_processor_inputs that calls
apply_chat_template() without the tool schema must also clear the tools flag so
downstream logic doesn't treat the output as if tools were provided; update
_build_processor_inputs (where it retries without tools) to set tools_param =
None (or a clear boolean like tools_present=False) when it falls back, and
ensure the caller code that checks "if tools_param and output_text:" (or the
variable output_text) uses that updated flag so JSON-like model outputs aren't
misclassified as tool calls when no schema was shown.
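The flag-sync idea in miniature. `apply_template` stands in for the processor's `apply_chat_template`, and the `TypeError` fallback trigger is an assumption for illustration:

```python
def build_inputs(apply_template, messages, tools):
    try:
        return apply_template(messages, tools=tools), bool(tools)
    except TypeError:
        # Template rejected the tool schema: retry without it AND clear
        # the flag so JSON-like output is not parsed as a tool call.
        return apply_template(messages, tools=None), False

def flaky_template(messages, tools=None):
    if tools is not None:
        raise TypeError("template does not accept tools")
    return "<prompt>"

prompt, tools_present = build_inputs(flaky_template, [], [{"name": "t"}])
assert prompt == "<prompt>"
assert tools_present is False
```

Downstream code then checks `tools_present` instead of the original `tools_param`, so the fallback path and the parsing path can never disagree.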
```python
if tools_param and output_text:
    tool_calls = extract_tool_calls_from_text(output_text)
    if tool_calls:
        return await self._handle_tool_calls(
            tool_calls, messages, frames, kwargs
        )
```
Use a non-recursive follow-up generator inside the shared tool loop.
Line 490 calls create_response() from inside run_tool_call_loop, but Line 374 already auto-dispatches _handle_tool_calls when tool text is detected. That can spin up a new loop for each round, and it also loses the caller's current max_new_tokens / sampling settings because only kwargs are preserved here.
Also applies to: 487-494
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py`
around lines 374 - 379, The auto-dispatch currently returns await
self._handle_tool_calls(...) when tool calls are detected, which causes
recursive loops and loses the caller's sampling settings (e.g., max_new_tokens);
change this to use the shared non-recursive follow-up generator instead of
awaiting _handle_tool_calls directly: in the branch that detects tool_calls
(where extract_tool_calls_from_text is used) return or yield a follow-up from
the central run_tool_call_loop (or a new helper like
_enqueue_tool_calls_followup) that calls the tool-handling logic non-recursively
and forwards the original kwargs (preserving max_new_tokens and sampling
parameters) so run_tool_call_loop/create_response are not re-entered
recursively; also remove any direct create_response() calls from inside
run_tool_call_loop and refactor _handle_tool_calls to be invoked by the
follow-up orchestrator rather than awaited inline.