Conversation
📝 Walkthrough

This pull request refactors HuggingFace and Transformers LLM/VLM tool-calling functionality by extracting shared utilities into dedicated modules, removing unnecessary dependencies from the example project configuration, and adding comprehensive test coverage and example scripts for tool-calling workflows.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
```python
model_id = "Qwen/Qwen2.5-1.5B-Instruct"
logger.info(f"Loading model: {model_id}")

llm = TransformersLLM(
```
Why not show this as an example with the Agent class?
For consistency with our examples as well
Actionable comments posted: 8
🧹 Nitpick comments (2)
plugins/huggingface/tests/test_hf_tool_calling.py (2)
139-164: Prefer a tiny fake LLM over `MagicMock`/`AsyncMock` in these loop tests. The contract under test is small, and a fake object with real methods would catch signature drift that these loose mocks will not.
As per coding guidelines, "Never mock in tests; use pytest for testing."
Also applies to: 210-223
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/tests/test_hf_tool_calling.py` around lines 139 - 164, The test uses MagicMock/AsyncMock for llm (including llm._dedup_and_execute and llm._sanitize_tool_output) which can hide signature drift; replace this with a tiny fake LLM class that implements async def _dedup_and_execute(self, ...) returning the two planned side-effect tuples and def _sanitize_tool_output(self, v) returning str(v) so the test exercises real method signatures; update both occurrences (lines around the current llm setup and the similar block at 210-223) to instantiate the fake LLM instead of MagicMock/AsyncMock.
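The suggested fake can be sketched as a tiny standalone class. `FakeLLM`, its constructor argument, and the returned triple shape are illustrative assumptions about the test's contract, not the plugin's actual API:

```python
import asyncio

# Hypothetical stand-in for the mocked llm: real methods with real
# signatures, so signature drift fails the test instead of passing silently.
class FakeLLM:
    def __init__(self, planned_triples):
        # Each planned entry: (tool_call_dict, result, error)
        self.planned_triples = planned_triples
        self.calls = []

    async def _dedup_and_execute(self, tool_calls):
        # Record the argument so the test can assert on the real call.
        self.calls.append(tool_calls)
        return self.planned_triples

    def _sanitize_tool_output(self, value):
        return str(value)

llm = FakeLLM([({"id": "c1", "name": "get_weather"}, "sunny", None)])
triples = asyncio.run(llm._dedup_and_execute([{"name": "get_weather"}]))
assert triples[0][1] == "sunny"
assert llm._sanitize_tool_output({"ok": True}) == "{'ok': True}"
```

Unlike `AsyncMock`, calling `_dedup_and_execute` with the wrong arity here raises a `TypeError` immediately.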
6-14: Avoid importing private helper modules directly from tests. These imports make internal module names part of the test contract, so a helper move or rename becomes a test-breaking API change. Please re-export the supported helpers or colocate these tests with the package-private modules instead of importing `_foo` modules from the external test package.
As per coding guidelines, "Never import from private modules (`_foo`) outside of the package's own `__init__.py`. Use the public re-export instead."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/tests/test_hf_tool_calling.py` around lines 6 - 14, Tests are importing private helpers (_hf_tool_calling, _tool_call_loop) directly; instead either re-export the required symbols (accumulate_tool_call_chunk, convert_tools_to_hf_format, extract_tool_calls_from_hf_response, finalize_pending_tool_calls, run_tool_call_loop) from the package's public API (e.g., add them to vision_agents.plugins.huggingface.__init__ or another public module) and update the test imports to use that public path, or move the tests into the package-private module directory so they can import the private modules locally; ensure all references in tests use the public names rather than `_hf_tool_calling`/`_tool_call_loop`.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f40713a7-b283-4df3-8695-abd2a2d4a477
⛔ Files ignored due to path filters (1)
`plugins/huggingface/examples/transformers/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (12)
- plugins/huggingface/examples/transformers/pyproject.toml
- plugins/huggingface/examples/transformers/transformers_tool_calling_example.py
- plugins/huggingface/tests/test_hf_tool_calling.py
- plugins/huggingface/tests/test_huggingface.py
- plugins/huggingface/tests/test_transformers_llm.py
- plugins/huggingface/tests/test_transformers_vlm.py
- plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py
- plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py
- plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py
- plugins/huggingface/vision_agents/plugins/huggingface/huggingface_vlm.py
- plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py
- plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py
💤 Files with no reviewable changes (1)
- plugins/huggingface/examples/transformers/pyproject.toml
```python
request_kwargs: dict[str, Any] = {
    "messages": messages,
    "model": model_id,
    "stream": stream,
}
if tools:
    request_kwargs["tools"] = tools
```
Propagate caller kwargs into both HuggingFace completion requests.
The function accepts **kwargs but neither the initial request at lines 116–122 nor the follow-up request at lines 328–335 spread these kwargs into request_kwargs. This silently drops any caller-provided options like temperature, max_tokens, or provider-specific flags. Note that line 330 extracts only the model key from kwargs, confirming the intent to use them.
Suggested fix
```diff
 request_kwargs: dict[str, Any] = {
+    **kwargs,
     "messages": messages,
     "model": model_id,
     "stream": stream,
 }
```
Apply the same change at lines 328–335 in `_generate_followup`:
```diff
 request_kwargs: dict[str, Any] = {
+    **kwargs,
     "messages": current_messages,
     "model": kwargs.get("model", model_id),
     "stream": True,
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py`
around lines 116 - 123, The caller-provided kwargs are not being forwarded into
the HuggingFace request parameter dicts; update the request building in both
places where request_kwargs is constructed (the initial request in the function
and inside _generate_followup) to merge the incoming **kwargs into
request_kwargs (e.g., request_kwargs.update(kwargs) or by including **kwargs
when building the dict) so options like temperature/max_tokens/provider flags
are preserved; ensure you still explicitly set/override required keys like
"model" from kwargs when needed and keep the existing keys ("messages",
"stream", "tools") intact.
```python
for tc, res, err in triples:
    cid = tc.get("id")
    if not cid:
        continue

    assistant_tool_calls.append(
        {
            "id": cid,
            "type": "function",
            "function": {
                "name": tc["name"],
                "arguments": json.dumps(tc.get("arguments_json", {})),
            },
        }
    )
    tool_results.append(
        {
            "role": "tool",
            "tool_call_id": cid,
            "content": llm._sanitize_tool_output(
                err if err is not None else res
            ),
        }
    )

if not tool_results:
    return llm_response
```
Don't discard executed tool calls when the provider omits an id.
`NormalizedToolCallItem.id` is optional, but this loop skips every triple whose `id` is falsy. In that case the tool still runs, yet no assistant/tool messages are appended and the loop returns the previous empty response instead of requesting the follow-up.
🧩 Suggested fix
```diff
-for tc, res, err in triples:
-    cid = tc.get("id")
-    if not cid:
-        continue
+for call_index, (tc, res, err) in enumerate(triples):
+    cid = tc.get("id") or f"tool_call_{round_num}_{call_index}"
```
+ cid = tc.get("id") or f"tool_call_{round_num}_{call_index}"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/huggingface/vision_agents/plugins/huggingface/_tool_call_loop.py`
around lines 95 - 121, The loop currently skips triples with a falsey
NormalizedToolCallItem.id so executed tools are dropped; instead, when cid is
missing generate a stable fallback id (e.g., using the loop index or uuid4) and
use that fallback for both assistant_tool_calls and tool_results so every
executed tool produces messages; remove the early continue and ensure the code
still uses tc["name"] and tc.get("arguments_json", {}) for the
assistant_tool_calls and llm._sanitize_tool_output(err if err is not None else
res) for tool_results, preserving the existing structures and returning
llm_response only after processing all triples.
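The fallback-id idea in isolation (names like `round_num` follow the review text; the triple shape is an assumption for illustration):

```python
def assign_call_ids(triples, round_num):
    # Synthesize a deterministic id when the provider omits one, so the
    # executed call still produces assistant/tool messages.
    with_ids = []
    for call_index, (tc, res, err) in enumerate(triples):
        cid = tc.get("id") or f"tool_call_{round_num}_{call_index}"
        with_ids.append((cid, tc, res, err))
    return with_ids

triples = [
    ({"id": "abc", "name": "a"}, "r1", None),
    ({"name": "b"}, "r2", None),  # provider omitted the id
]
ids = [cid for cid, *_ in assign_call_ids(triples, 0)]
assert ids == ["abc", "tool_call_0_1"]
```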
```python
effective_model = kwargs.pop("model", self.model)
return await create_hf_response(
    self,
    self._client,
    effective_model,
    PLUGIN_NAME,
    messages,
    tools_param,
    stream,
    **kwargs,
)
```
Preserve caller kwargs when routing through create_hf_response().
The new helper currently builds its HF request payloads from messages, model, stream, and tools only, so this refactor drops every other create_response() kwarg on both the initial request and the tool-call follow-up path. Calls that relied on temperature, max_tokens, stop, or similar options will silently stop taking effect.
🧩 Suggested follow-up in `plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py`
```diff
-    request_kwargs: dict[str, Any] = {
-        "messages": messages,
-        "model": model_id,
-        "stream": stream,
-    }
+    request_kwargs: dict[str, Any] = dict(kwargs)
+    request_kwargs.update(
+        {"messages": messages, "model": model_id, "stream": stream}
+    )
```
Apply the same merge pattern in the follow-up request builder as well so tool-call rounds honor the same generation settings.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/huggingface/vision_agents/plugins/huggingface/huggingface_llm.py`
around lines 136 - 145, The refactor dropped caller-supplied generation kwargs
when routing through create_hf_response; update the call site in create_response
(where effective_model is computed) to pass through the remaining kwargs (e.g.,
temperature, max_tokens, stop) by merging kwargs into the payload passed to
create_hf_response instead of only sending messages/model/stream/tools; also
apply the same merge pattern in the tool-call follow-up builder in
_hf_tool_calling.py so the tool-call requests (in the function that builds
follow-up HF request payloads) include the original caller kwargs as well.
```diff
-        try:
-            response = await self._client.chat.completions.create(
-                messages=messages,
-                model=self.model,
-                stream=True,
-            )
-        except Exception as e:
-            logger.exception(f'Failed to get a response from the model "{self.model}"')
-            self.events.send(
-                events.LLMErrorEvent(
-                    plugin_name=PLUGIN_NAME,
-                    error_message=str(e),
-                    event_data=e,
-                )
-            )
-            self.events.send(
-                VLMErrorEvent(
-                    plugin_name=PLUGIN_NAME,
-                    inference_id=inference_id,
-                    error=e,
-                    context="api_request",
-                )
-            )
+        response = await self.create_response(messages=messages)
+        latency_ms = (time.perf_counter() - request_start_time) * 1000
+
+        if response.exception is not None:
+            self.events.send(
+                VLMErrorEvent(
+                    plugin_name=PLUGIN_NAME,
+                    inference_id=inference_id,
+                    error=response.exception,
+                    context="generation",
+                )
+            )
+        else:
+            self.events.send(
+                VLMInferenceCompletedEvent(
+                    plugin_name=PLUGIN_NAME,
+                    inference_id=inference_id,
+                    model=self.model,
+                    text=response.text,
+                    latency_ms=latency_ms,
+                    frames_processed=frames_count,
+                )
+            )
```
Tool-call follow-up failures are currently reported as successful VLM completions.
This branch trusts response.exception, but the shared follow-up error path returns LLMResponseEvent(original=None, text="") without preserving the exception. A failed follow-up after tool execution will therefore fall into the success branch and emit VLMInferenceCompletedEvent with empty text.
🧩 Suggested follow-up in `plugins/huggingface/vision_agents/plugins/huggingface/_hf_tool_calling.py`
```diff
-        return LLMResponseEvent(original=None, text=""), []
+        return LLMResponseEvent(original=None, text="", exception=e), []
```
Verify each finding against the current code and only fix it if needed.
In `@plugins/huggingface/vision_agents/plugins/huggingface/huggingface_vlm.py`
around lines 143 - 165, The current success/failed branching trusts
response.exception but follow-up tool-call failures clear the exception and
return empty text, causing false VLMInferenceCompletedEvent emissions; fix by
ensuring follow-up errors propagate the exception (update the follow-up path in
_hf_tool_calling to set LLMResponseEvent.exception with the underlying error)
and make the caller here (create_response / this response handling block) treat
empty responses as failures too (change the conditional to check if
response.exception is not None OR response.text is falsy, and emit VLMErrorEvent
in that case; otherwise emit VLMInferenceCompletedEvent).
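The caller-side check the prompt describes, in isolation. `ResponseLike` is a stand-in for `LLMResponseEvent`, assumed to expose `text` and `exception`:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ResponseLike:
    text: str
    exception: Optional[Exception] = None

def is_failed(response: ResponseLike) -> bool:
    # An exception OR an empty body counts as failure, so a follow-up
    # that swallowed its error no longer looks like a success.
    return response.exception is not None or not response.text

assert is_failed(ResponseLike(text=""))
assert is_failed(ResponseLike(text="", exception=RuntimeError("boom")))
assert not is_failed(ResponseLike(text="ok"))
```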
```diff
 if tools_param and result.text:
-    tool_calls = self._extract_tool_calls_from_text(result.text)
+    tool_calls = extract_tool_calls_from_text(result.text)
     if tool_calls:
-        return await self._handle_tool_calls(
-            tool_calls, messages, tools_param, kwargs
-        )
+        return await self._handle_tool_calls(tool_calls, messages, kwargs)
```
Don't re-enter create_response() from inside run_tool_call_loop.
Line 545 calls create_response() again, but Line 341 already routes parsed tool text back into _handle_tool_calls. A second tool round therefore nests a fresh run_tool_call_loop, effectively resetting the round cap, and it also drops the caller's current max_new_tokens / sampling settings because only kwargs are threaded through here.
Also applies to: 542-549
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py`
around lines 340 - 344, Detected re-entry into create_response() from inside
run_tool_call_loop via _handle_tool_calls which restarts the tool-call loop and
loses caller sampling/kwargs; instead, change the flow so that when
extract_tool_calls_from_text(result.text) yields tool_calls, you do NOT call
create_response() again. Modify _handle_tool_calls (and any helpers it uses) to
return a structured continuation (e.g., updated messages and any tool results or
a flag) back to the current run_tool_call_loop so the existing loop resumes
handling tool calls and preserves the original kwargs and max_new_tokens; remove
any calls to create_response() from within _handle_tool_calls/run_tool_call_loop
and ensure _handle_tool_calls updates messages/state and returns control to the
caller for the next iteration.
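The non-recursive shape in miniature: the handler only extends the conversation and returns it, while a single loop owns the round cap and the generation kwargs. Everything here (`MAX_ROUNDS`, the toy `generate`, the `CALL` marker) is illustrative, not the plugin's real API:

```python
MAX_ROUNDS = 3

def extract_calls(text):
    # Toy detector: a reply starting with "CALL " is a tool call.
    return [text[5:]] if text.startswith("CALL ") else []

def run_tool_call_loop(generate, handle_tool_calls, messages, **gen_kwargs):
    text = generate(messages, **gen_kwargs)
    for _ in range(MAX_ROUNDS):
        tool_calls = extract_calls(text)
        if not tool_calls:
            break
        # The handler never re-enters this loop or create_response();
        # it just returns the extended message list.
        messages = handle_tool_calls(tool_calls, messages)
        text = generate(messages, **gen_kwargs)
    return text

def generate(messages, **gen_kwargs):
    return "CALL get_time" if len(messages) == 1 else "It is noon."

def handle_tool_calls(tool_calls, messages):
    return messages + [{"role": "tool", "content": f"ran {tool_calls[0]}"}]

answer = run_tool_call_loop(
    generate, handle_tool_calls, [{"role": "user", "content": "time?"}],
    max_new_tokens=64,  # survives every round because only the loop generates
)
assert answer == "It is noon."
```

Because the handler returns control instead of recursing, the round cap cannot be reset by a second tool round.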
```python
frames_snapshot = list(self._frame_buffer)
image_count = min(len(frames_snapshot), self._max_frames)

messages = self._build_messages()
image_content: list[dict[str, Any]] = [
    {"type": "image"} for _ in range(image_count)
]
image_content.append({"type": "text", "text": text or "Describe what you see."})
messages.append({"role": "user", "content": image_content})
```
simple_response() now sends the current user turn twice.
Line 383 replays the conversation, which already contains the just-added plain-text user message, and Lines 225-229 append the same text again inside the multimodal payload. When frames_snapshot is empty this becomes a literal duplicate prompt on every call.
💡 Suggested fix
```diff
         messages = self._build_messages()
+        if participant is None and messages and messages[-1]["role"] == "user":
+            messages.pop()
+
         image_content: list[dict[str, Any]] = [
             {"type": "image"} for _ in range(image_count)
         ]
```
Also applies to: 383-390
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py`
around lines 221 - 229, The current simple_response flow appends the user's
plain-text turn twice by adding it into image_content and then replaying the
conversation via _build_messages(); to fix, ensure the user text is only added
once: either stop appending the plain-text entry to image_content when the last
built message is already the same user turn, or add the multimodal image_content
as the single user message and avoid replaying/duplicating that same user text
in _build_messages(); locate the multimodal assembly in simple_response
(variables frames_snapshot, image_content, messages and method _build_messages)
and implement the conditional dedupe so the user text is not duplicated when
frames_snapshot is empty.
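The conditional dedupe in isolation, assuming plain-text history entries store their text directly under `content` (an assumption about the message shape, not the plugin's real code):

```python
def assemble_messages(history, text, image_count):
    messages = list(history)
    # Drop the trailing user turn if it is the same text we are about to
    # re-send inside the multimodal payload.
    if (
        messages
        and messages[-1].get("role") == "user"
        and messages[-1].get("content") == text
    ):
        messages.pop()
    image_content = [{"type": "image"} for _ in range(image_count)]
    image_content.append({"type": "text", "text": text or "Describe what you see."})
    messages.append({"role": "user", "content": image_content})
    return messages

msgs = assemble_messages(
    [{"role": "user", "content": "what is this?"}], "what is this?", 0
)
assert len(msgs) == 1  # duplicate plain-text turn removed
assert msgs[0]["content"][-1]["text"] == "what is this?"
```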
```python
try:
    inputs = await asyncio.to_thread(
        self._build_processor_inputs, messages, frames, tools_param
    )
```
Clear the tool flag when template fallback removes tools.
When _build_processor_inputs() retries apply_chat_template() without the tool schema (line 440), the model generates output without seeing any tools. However, line 374 still checks if tools_param and output_text: assuming tools were provided to the model. This can cause JSON-looking answers to be misclassified as tool calls even though no schema was ever shown.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py`
around lines 302 - 305, The retry path in _build_processor_inputs that calls
apply_chat_template() without the tool schema must also clear the tools flag so
downstream logic doesn't treat the output as if tools were provided; update
_build_processor_inputs (where it retries without tools) to set tools_param =
None (or a clear boolean like tools_present=False) when it falls back, and
ensure the caller code that checks "if tools_param and output_text:" (or the
variable output_text) uses that updated flag so JSON-like model outputs aren't
misclassified as tool calls when no schema was shown.
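The flag-sync idea in miniature. `apply_template` stands in for the processor's `apply_chat_template`, and the `TypeError` fallback trigger is an assumption for illustration:

```python
def build_inputs(apply_template, messages, tools):
    try:
        return apply_template(messages, tools=tools), bool(tools)
    except TypeError:
        # Template rejected the tool schema: retry without it AND clear
        # the flag so JSON-like output is not parsed as a tool call.
        return apply_template(messages, tools=None), False

def flaky_template(messages, tools=None):
    if tools is not None:
        raise TypeError("template does not accept tools")
    return "<prompt>"

prompt, tools_present = build_inputs(flaky_template, [], [{"name": "t"}])
assert prompt == "<prompt>"
assert tools_present is False
```

Downstream code then checks `tools_present` instead of the original `tools_param`, so the fallback path and the parsing path can never disagree.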
```python
if tools_param and output_text:
    tool_calls = extract_tool_calls_from_text(output_text)
    if tool_calls:
        return await self._handle_tool_calls(
            tool_calls, messages, frames, kwargs
        )
```
Use a non-recursive follow-up generator inside the shared tool loop.
Line 490 calls create_response() from inside run_tool_call_loop, but Line 374 already auto-dispatches _handle_tool_calls when tool text is detected. That can spin up a new loop for each round, and it also loses the caller's current max_new_tokens / sampling settings because only kwargs are preserved here.
Also applies to: 487-494
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_vlm.py`
around lines 374 - 379, The auto-dispatch currently returns await
self._handle_tool_calls(...) when tool calls are detected, which causes
recursive loops and loses the caller's sampling settings (e.g., max_new_tokens);
change this to use the shared non-recursive follow-up generator instead of
awaiting _handle_tool_calls directly: in the branch that detects tool_calls
(where extract_tool_calls_from_text is used) return or yield a follow-up from
the central run_tool_call_loop (or a new helper like
_enqueue_tool_calls_followup) that calls the tool-handling logic non-recursively
and forwards the original kwargs (preserving max_new_tokens and sampling
parameters) so run_tool_call_loop/create_response are not re-entered
recursively; also remove any direct create_response() calls from inside
run_tool_call_loop and refactor _handle_tool_calls to be invoked by the
follow-up orchestrator rather than awaited inline.