Conversation
New plugin that registers desktop automation tools (click, double_click, type_text, key_press, scroll, mouse_move, open_path) on any LLM via the standard FunctionRegistry. Backed by pyautogui and designed to work with Realtime models that receive screen-share frames. Made-with: Cursor
**Note: Reviews paused.** It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews, or the checkboxes below for quick actions:
📝 Walkthrough

Adds a Computer Use plugin and example: a configurable grid overlay on shared video frames, LLM tool registration for grid-targeted desktop actions, pyautogui-backed action implementations, tests, docs, and workspace/pyproject updates to include the plugin and example.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Agent as Agent
    participant LLM as LLM
    participant Grid as Grid
    participant Processor as GridOverlayProcessor
    participant Track as VideoTrack
    participant PyAutoGUI as PyAutoGUI
    Agent->>LLM: register computer-use tools (cols, rows)
    LLM->>Grid: instantiate/configure grid
    Agent->>Processor: process_video(track, shared_forwarder?)
    Track->>Processor: deliver raw video frame
    Processor->>Grid: draw_overlay(frame)
    Grid-->>Processor: annotated frame
    Processor->>Track: enqueue annotated frame
    Agent->>LLM: request action (e.g., click cell="C2")
    LLM-->>Agent: selected tool + params
    Agent->>PyAutoGUI: execute action at computed screen coords
    PyAutoGUI-->>Agent: result/confirmation
```
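The "computed screen coords" step at the end of this flow is simple arithmetic. Below is a minimal sketch of how a cell label plus sub-cell position could resolve to pixels; the names `cell_to_coords` and `SUB_CELL_OFFSETS` are illustrative, not the plugin's actual API.

```python
# Illustrative cell-reference math for a labeled grid (default 15x15).
# Column letters map to x, row numbers to y, with fractional sub-cell offsets.
SUB_CELL_OFFSETS = {
    "center": (0.5, 0.5),
    "top-left": (0.25, 0.25),
    "top-right": (0.75, 0.25),
    "bottom-left": (0.25, 0.75),
    "bottom-right": (0.75, 0.75),
}

def cell_to_coords(cell: str, screen_w: int, screen_h: int,
                   cols: int = 15, rows: int = 15,
                   position: str = "center") -> tuple[int, int]:
    """Convert a cell label like "C2" into absolute screen coordinates."""
    col = ord(cell[0].upper()) - ord("A")  # "C" -> column index 2
    row = int(cell[1:]) - 1                # "2" -> row index 1
    if not (0 <= col < cols and 0 <= row < rows):
        raise ValueError(f"cell {cell!r} outside {cols}x{rows} grid")
    fx, fy = SUB_CELL_OFFSETS[position]
    # Each cell spans screen_w/cols pixels; the offset picks a point inside it.
    x = int((col + fx) * screen_w / cols)
    y = int((row + fy) * screen_h / rows)
    return x, y
```

Keeping this math in one place is also why sharing a single `Grid` between tools and overlay (the later `grid=` parameter) matters: both must agree on `cols`/`rows` or clicks land in the wrong cell.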
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
…istration
- Replace x/y virtual coordinates with cell-based targeting (e.g. "H8")
- Add configurable Grid class with cols/rows params (default 15x15)
- Add GridOverlayProcessor to draw labeled grid on screen share frames
- Add sub-cell positioning (top-left, center, bottom-right, etc.)
- Replace ComputerUseToolkit class with plain `register(llm)` function
- Add computer use example with Gemini Realtime + grid overlay
- Update README with new API and grid documentation

Made-with: Cursor
Force-pushed a91015c to 37048f7
Actionable comments posted: 8
🧹 Nitpick comments (1)
plugins/computer_use/tests/test_computer_use.py (1)
7-11: Test the public package surface, not `_actions` directly.

This couples the tests to a private module path and turns internal refactors into test breakage. If these helpers are meant to be supported, re-export them from `vision_agents.plugins.computer_use`; otherwise, exercise them through `register()`/`FunctionRegistry` instead.

As per coding guidelines, "Never import from private modules (`_foo`) outside of the package's own `__init__.py`. Use the public re-export instead."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/computer_use/tests/test_computer_use.py` around lines 7 - 11, Tests are importing private helpers (key_press, make_grid_actions, type_text) from vision_agents.plugins.computer_use._actions; replace those direct imports by using the package's public surface—either import the re-exported symbols from vision_agents.plugins.computer_use (if the package __init__ exposes key_press/make_grid_actions/type_text) or exercise the behavior via the public registration API (call register() / retrieve implementations through FunctionRegistry) so tests target the public API rather than the _actions module.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/10_computer_use_example/instructions.md`:
- Line 5: The Critical Rule's tool list is missing the double_click tool; update
the sentence that currently enumerates click, mouse_move, type_text, key_press,
scroll, open_path to also include double_click and add a brief example (e.g.,
"If the user says 'double-click on X', call the `double_click` tool") so it
matches the plugin README and the implemented toolkit; modify the wording around
the rule and any example lines that reference clicking to include `double_click`
(refer to the sentence listing the tools and examples like "If the user says
'click on X'").
In `@examples/10_computer_use_example/README.md`:
- Around line 33-38: Update the fenced code block for the `.env` example so it
includes a language specifier (e.g., change the opening backticks to ```dotenv)
to ensure proper rendering and linting; locate the `.env` example code block in
README.md (the block showing GOOGLE_API_KEY, STREAM_API_KEY, STREAM_API_SECRET)
and add the language label to the opening fence.
- Around line 50-58: Update the "Available actions" table to match the plugin's
actual cell-based API: replace coordinate-based signatures with the real
function signatures such as click(cell, position, button), double_click(cell,
position, button), type_text(text) (or type_text(cell, text) if the plugin
targets a cell), key_press(keys) (unchanged), scroll(cell, direction, clicks,
position), mouse_move(cell, position), and open_path(path); ensure each table
row uses the exact function names and parameter order used by the plugin (e.g.,
click, double_click, type_text, key_press, scroll, mouse_move, open_path) so the
README aligns with the plugin README and instructions.md.
In `@plugins/computer_use/tests/test_computer_use.py`:
- Around line 7-11: The test module currently imports desktop automation helpers
(key_press, make_grid_actions, type_text) at collection time which triggers real
mouse/keyboard actions; change this by making the tests explicitly opt-in to a
real GUI session: remove those top-level imports and instead import key_press,
make_grid_actions and type_text inside the individual tests that need them (or
call pytest.importorskip for the plugin), and add a pytest marker or a pre-check
(e.g., `@pytest.mark.gui` or a fixture like require_display) that calls
pytest.skip when no GUI/display is available so tests only run when a real GUI
is explicitly requested.
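The opt-in guard the prompt describes can be sketched as follows. The helper names (`display_available`, `require_display`) are assumptions for illustration, not the project's existing fixtures:

```python
import os
import sys

import pytest

def display_available() -> bool:
    """Best-effort check for a usable GUI session (illustrative helper)."""
    if sys.platform.startswith("linux"):
        # Headless Linux CI typically has neither variable set.
        return bool(os.environ.get("DISPLAY") or os.environ.get("WAYLAND_DISPLAY"))
    return True  # macOS/Windows interactive sessions normally have a display

@pytest.fixture
def require_display():
    """Skip GUI-dependent tests unless a real desktop session is present."""
    if not display_available():
        pytest.skip("no GUI display; run these tests in a real desktop session")
```

A test that needs the real helpers would then take `require_display` as an argument and import `key_press`/`type_text` inside its body, so collection alone never touches the mouse or keyboard.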
In `@plugins/computer_use/vision_agents/plugins/computer_use/_actions.py`:
- Around line 176-197: The open_path function should validate the provided path
before spawning the subprocess: ensure the path is absolute (use os.path.isabs
or pathlib.Path.is_absolute), verify it exists (os.path.exists or Path.exists)
and is a file or directory as expected (Path.is_file/Path.is_dir), and
optionally resolve symlinks (os.path.realpath or Path.resolve()) to normalize
it; if validation fails, return a clear error string (e.g., "Invalid path: must
be absolute and exist: {path}") instead of invoking the OS opener. Keep these
checks at the top of open_path, and only build/execute the cmd and
create_subprocess_exec after the path passes validation. Ensure you reference
open_path when making changes.
- Around line 153-160: The logger currently emits raw typed text in type_text
and full filesystem paths in other logger.debug calls, which can leak secrets;
create and use a small redaction helper (e.g., redact_text(value: str) and
redact_path(path: str)) that returns a short preview (first N chars) plus a
"<redacted>" marker or masks the middle, then replace direct uses of text/path
in logger.debug calls (e.g., the logger.debug in type_text and any
logger.debug/logger.error calls that print paths around lines 198-203) to log
redact_text(text) or redact_path(path) instead; apply the same helper
consistently across the module.
- Around line 16-17: The module currently sets pyautogui.FAILSAFE and
pyautogui.PAUSE at import time — remove those top-level assignments so importing
_actions no longer flips the process-wide failsafe; instead, set pyautogui.PAUSE
and (only if absolutely necessary) pyautogui.FAILSAFE inside an explicit runtime
initializer or at the start of the action functions that perform automation
(e.g., inside the functions in this module that call pyautogui), or expose a
configuration/init function (e.g., init_pyautogui) the plugin activation path
can call; ensure tests that import _actions do not change global failsafe state.
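One shape the suggested initializer could take is sketched below; `init_pyautogui` is an assumed name, and the plugin may prefer setting these inside each action function instead:

```python
def init_pyautogui(pause: float = 0.05, failsafe: bool = True) -> None:
    """Configure pyautogui explicitly at activation time, not on import.

    Importing the module that defines this function has no side effects;
    the pyautogui import happens only when the plugin is actually activated.
    """
    import pyautogui  # lazy: loading _actions alone must not flip global state

    pyautogui.PAUSE = pause        # delay between successive automation calls
    pyautogui.FAILSAFE = failsafe  # keep the move-to-corner abort enabled by default
```

The plugin's activation path (e.g. `register(llm)`) would call this once, and tests that merely import the module observe no change to process-wide pyautogui settings.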
In `@plugins/computer_use/vision_agents/plugins/computer_use/_processor.py`:
- Around line 44-48: The published QueuedVideoTrack returned by
publish_video_track currently can remain waiting after shutdown; update the
shutdown flow to explicitly terminate the track so downstream consumers stop
waiting: add a shutdown handler that calls a termination method on the track
(e.g., a stop/terminate/end_of_stream method on self._video_track) as part of
setting self._shutdown True, and ensure QueuedVideoTrack exposes and uses that
method to inject an end-of-stream frame/stop the async frame generator (so
close() alone is not relied on). Keep publish_video_track returning the same
track but ensure shutdown triggers the track's termination logic so consumers
receive EOF and stop waiting.
---
Nitpick comments:
In `@plugins/computer_use/tests/test_computer_use.py`:
- Around line 7-11: Tests are importing private helpers (key_press,
make_grid_actions, type_text) from vision_agents.plugins.computer_use._actions;
replace those direct imports by using the package's public surface—either import
the re-exported symbols from vision_agents.plugins.computer_use (if the package
__init__ exposes key_press/make_grid_actions/type_text) or exercise the behavior
via the public registration API (call register() / retrieve implementations
through FunctionRegistry) so tests target the public API rather than the
_actions module.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b84127f4-edf7-4595-896a-02ce62d8ef8f
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (15)
- examples/10_computer_use_example/.gitignore
- examples/10_computer_use_example/README.md
- examples/10_computer_use_example/computer_use_example.py
- examples/10_computer_use_example/instructions.md
- examples/10_computer_use_example/pyproject.toml
- plugins/computer_use/README.md
- plugins/computer_use/pyproject.toml
- plugins/computer_use/tests/__init__.py
- plugins/computer_use/tests/test_computer_use.py
- plugins/computer_use/vision_agents/plugins/computer_use/__init__.py
- plugins/computer_use/vision_agents/plugins/computer_use/_actions.py
- plugins/computer_use/vision_agents/plugins/computer_use/_grid.py
- plugins/computer_use/vision_agents/plugins/computer_use/_processor.py
- plugins/computer_use/vision_agents/plugins/computer_use/_toolkit.py
- pyproject.toml
plugins/computer_use/vision_agents/plugins/computer_use/_actions.py
```python
async def type_text(text: str) -> str:
    """Type a string of text into the currently focused element.

    Args:
        text: The text to type.
    """
    await _run_sync(pyautogui.write, text, interval=0.03)
    logger.debug("type_text(%r)", text[:80])
```
Redact raw text and path values from logs.
Line 160 logs the typed payload and Lines 200-203 log the full filesystem path. Both can contain secrets or PII, so normal tool use will leak sensitive data into debug/error logs.
Suggested fix

```diff
 async def type_text(text: str) -> str:
 @@
-    logger.debug("type_text(%r)", text[:80])
+    logger.debug("type_text(len=%d)", len(text))
 @@
-    logger.error("open_path(%r) failed: %s", path, err)
+    logger.error("open_path failed: %s", err)
 @@
-    logger.debug("open_path(%r)", path)
+    logger.debug("open_path() succeeded")
```

Also applies to: 198-203
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/computer_use/vision_agents/plugins/computer_use/_actions.py` around
lines 153 - 160, The logger currently emits raw typed text in type_text and full
filesystem paths in other logger.debug calls, which can leak secrets; create and
use a small redaction helper (e.g., redact_text(value: str) and
redact_path(path: str)) that returns a short preview (first N chars) plus a
"<redacted>" marker or masks the middle, then replace direct uses of text/path
in logger.debug calls (e.g., the logger.debug in type_text and any
logger.debug/logger.error calls that print paths around lines 198-203) to log
redact_text(text) or redact_path(path) instead; apply the same helper
consistently across the module.
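A minimal sketch of the redaction helpers the prompt asks for is below; the names, preview length, and output format are assumptions, not the plugin's existing API:

```python
import os.path

def redact_text(value: str, preview: int = 4) -> str:
    """Return a short preview plus a marker instead of the raw payload."""
    if len(value) <= preview:
        return "<redacted>"
    return f"{value[:preview]}…<redacted len={len(value)}>"

def redact_path(path: str) -> str:
    """Keep only the final component so full directory trees never hit the logs."""
    return f"…/{os.path.basename(path)}"
```

Logging calls would then use `logger.debug("type_text(%s)", redact_text(text))` and `logger.error("open_path(%s) failed: %s", redact_path(path), err)` so length and shape survive for debugging while the payload does not.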
```python
async def open_path(path: str) -> str:
    """Open a file or folder using the OS default handler.

    Args:
        path: Absolute path to the file or folder to open.
    """
    system = platform.system()
    if system == "Darwin":
        cmd = ["open", path]
    elif system == "Linux":
        cmd = ["xdg-open", path]
    elif system == "Windows":
        cmd = ["explorer", path]
    else:
        return f"Unsupported platform: {system}"

    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
    )
    _, stderr = await proc.communicate()
```
Validate path before invoking the OS opener.
The docstring and tool description say this action accepts an absolute path, but there is no check before the subprocess is spawned. A relative value will resolve against the agent's working directory, which makes the tool non-deterministic and easy for the model to mis-target.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/computer_use/vision_agents/plugins/computer_use/_actions.py` around
lines 176 - 197, The open_path function should validate the provided path before
spawning the subprocess: ensure the path is absolute (use os.path.isabs or
pathlib.Path.is_absolute), verify it exists (os.path.exists or Path.exists) and
is a file or directory as expected (Path.is_file/Path.is_dir), and optionally
resolve symlinks (os.path.realpath or Path.resolve()) to normalize it; if
validation fails, return a clear error string (e.g., "Invalid path: must be
absolute and exist: {path}") instead of invoking the OS opener. Keep these
checks at the top of open_path, and only build/execute the cmd and
create_subprocess_exec after the path passes validation. Ensure you reference
open_path when making changes.
```python
        self._video_track = QueuedVideoTrack()
        self._shutdown = False

    def publish_video_track(self) -> aiortc.VideoStreamTrack:
        return self._video_track
```
🧩 Analysis chain
🏁 Script executed:

```shell
# First, locate the file mentioned in the review
find . -type f -name "_processor.py" | grep -i computer_use
```

Repository: GetStream/Vision-Agents
Length of output: 138

🏁 Script executed:

```shell
# Get the full content of the processor file to understand the shutdown logic
cat -n plugins/computer_use/vision_agents/plugins/computer_use/_processor.py
```

Repository: GetStream/Vision-Agents
Length of output: 3837

🏁 Script executed:

```shell
# Search for the QueuedVideoTrack definition and usage
rg "QueuedVideoTrack" --type py -B 2 -A 5
```

Repository: GetStream/Vision-Agents
Length of output: 27213

🏁 Script executed:

```shell
# Check the QueuedVideoTrack implementation and stop() method
cat -n agents-core/vision_agents/core/utils/video_track.py | head -100
```

Repository: GetStream/Vision-Agents
Length of output: 3687

🏁 Script executed:

```shell
# Look for the recv() method in QueuedVideoTrack to understand behavior after stop
rg "async def recv" agents-core/vision_agents/core/utils/video_track.py -A 10
```

Repository: GetStream/Vision-Agents
Length of output: 981

🏁 Script executed:

```shell
# Check if any other similar processors implement track.stop() in their close methods
rg "async def close" --type py -A 3 | grep -A 3 "_video_track.stop"
```

Repository: GetStream/Vision-Agents
Length of output: 408

🏁 Script executed:

```shell
# Verify the line numbers mentioned in "Also applies to: 94-96"
sed -n '88,96p' plugins/computer_use/vision_agents/plugins/computer_use/_processor.py
```

Repository: GetStream/Vision-Agents
Length of output: 436
Stop the published track during shutdown.
Line 48 returns a long-lived QueuedVideoTrack, but close() only detaches the frame handler. After shutdown, downstream consumers can keep waiting on a track that will never produce another frame.
Suggested fix

```diff
 async def close(self) -> None:
     self._shutdown = True
     await self.stop_processing()
+    self._video_track.stop()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/computer_use/vision_agents/plugins/computer_use/_processor.py` around
lines 44 - 48, The published QueuedVideoTrack returned by publish_video_track
currently can remain waiting after shutdown; update the shutdown flow to
explicitly terminate the track so downstream consumers stop waiting: add a
shutdown handler that calls a termination method on the track (e.g., a
stop/terminate/end_of_stream method on self._video_track) as part of setting
self._shutdown True, and ensure QueuedVideoTrack exposes and uses that method to
inject an end-of-stream frame/stop the async frame generator (so close() alone
is not relied on). Keep publish_video_track returning the same track but ensure
shutdown triggers the track's termination logic so consumers receive EOF and
stop waiting.
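The sentinel/EOF pattern the prompt describes can be shown with a dependency-free sketch. The class and method names below are illustrative: the real `QueuedVideoTrack` lives in agents-core, and an aiortc track would raise `MediaStreamError` rather than `EOFError`:

```python
import asyncio

_EOF = object()  # sentinel injected into the queue on stop

class QueuedTrack:
    """Toy queue-backed track showing how stop() unblocks pending recv()."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()
        self._stopped = False

    async def recv(self):
        frame = await self._queue.get()
        if frame is _EOF:
            # aiortc tracks would raise MediaStreamError here instead.
            raise EOFError("track stopped")
        return frame

    def stop(self) -> None:
        if not self._stopped:
            self._stopped = True
            self._queue.put_nowait(_EOF)  # wake any consumer blocked on recv()
```

The key point is that `close()` setting a flag is not enough: a consumer already awaiting the queue never re-checks the flag, so termination must flow through the queue itself.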
Made-with: Cursor
… to instructions Made-with: Cursor
Actionable comments posted: 2
♻️ Duplicate comments (2)
examples/10_computer_use_example/README.md (1)
33-38: ⚠️ Potential issue | 🟡 Minor

Use the `dotenv` language identifier for the `.env` code block.

The code block at line 34 shows `.env` file content but uses `bash` as the language identifier. Since this is environment variable syntax (not bash commands), use `dotenv`, `env`, or `properties` for accurate syntax highlighting and linting compliance.

📝 Suggested fix

````diff
 3. Set up your `.env`:
-   ```bash
+   ```dotenv
    GOOGLE_API_KEY=your_google_key
    STREAM_API_KEY=your_stream_key
    STREAM_API_SECRET=your_stream_secret
    ```
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/10_computer_use_example/README.md` around lines 33 - 38, Update the fenced code block that currently uses "bash" for the .env snippet to use an environment-file language identifier (e.g., "dotenv" or "env") so syntax highlighting/linting is correct; locate the fenced block containing the three keys (GOOGLE_API_KEY, STREAM_API_KEY, STREAM_API_SECRET) and replace the opening backticks language tag from "bash" to "dotenv".

examples/10_computer_use_example/instructions.md (1)
5-5: ⚠️ Potential issue | 🟡 Minor

Add an explicit `double_click` example.

The tool list now includes `double_click`, but the behavioral examples still only teach `click` and `mouse_move`. Adding a "double-click on X" example here will make the prompt more likely to select the dedicated tool.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/10_computer_use_example/instructions.md` at line 5, Add an explicit double-click example sentence alongside the existing click/mouse_move examples so the prompt demonstrates calling the double_click tool; specifically, add a line such as "If the user says 'double-click on X', call the `double_click` tool (do not just describe it)" and ensure the surrounding guidance still enforces using tool functions like `click`, `double_click`, `mouse_move`, `type_text`, `key_press`, `scroll`, and `open_path` rather than describing actions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/10_computer_use_example/instructions.md`:
- Line 18: The example "Use keyboard shortcuts." currently uses macOS-only
examples (cmd+tab, cmd+space, Spotlight) which will be wrong on other OSes;
update the instruction around the "Use keyboard shortcuts." line to either (A)
explicitly scope the guidance to macOS by adding a note like "On macOS: use cmd
shortcuts (e.g. cmd+c, cmd+tab, cmd+space to open Spotlight)" or (B) make it
OS-neutral by referencing the `key_press` action and using generic modifier
placeholders (e.g. "use modifier+key (Ctrl/⌘/Alt) such as copy: Ctrl/⌘+C, app
switch: Alt+Tab") and remove Spotlight-specific references so the guidance
applies across platforms.
- Line 17: Update the guideline that currently advises "Prefer open_path for
files and folders" to specify that open_path should only be used with absolute
filesystem paths (reference the open_path tool name in the rule), and change the
agent behavior: when the user supplies only a name or relative identifier the
agent must either ask for the absolute path or use alternative tools/strategies
(e.g., file browser/navigation actions) rather than calling open_path with an
invalid argument; ensure the rule text makes clear the contract is absolute-path
based and instructs prompting for the absolute path when missing.
---
Duplicate comments:
In `@examples/10_computer_use_example/instructions.md`:
- Line 5: Add an explicit double-click example sentence alongside the existing
click/mouse_move examples so the prompt demonstrates calling the double_click
tool; specifically, add a line such as "If the user says 'double-click on X',
call the `double_click` tool (do not just describe it)" and ensure the
surrounding guidance still enforces using tool functions like `click`,
`double_click`, `mouse_move`, `type_text`, `key_press`, `scroll`, and
`open_path` rather than describing actions.
In `@examples/10_computer_use_example/README.md`:
- Around line 33-38: Update the fenced code block that currently uses "bash" for
the .env snippet to use an environment-file language identifier (e.g., "dotenv"
or "env") so syntax highlighting/linting is correct; locate the fenced block
containing the three keys (GOOGLE_API_KEY, STREAM_API_KEY, STREAM_API_SECRET)
and replace the opening backticks language tag from "bash" to "dotenv".
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 08c0f006-90e5-4695-bf4d-28ff211a0407
📒 Files selected for processing (2)
examples/10_computer_use_example/README.mdexamples/10_computer_use_example/instructions.md
|
|
||
1. **Always use tools.** When asked to perform an action, call the tool immediately. Say briefly what you'll do, then call the tool.
2. **Use cell references.** Look at the grid labels on screen and pass the `cell` parameter (e.g. "C2") for coordinate-based tools.
3. **Prefer open_path for files and folders.** If the user asks to open something by name or path, use `open_path` instead of trying to find and double-click an icon.
Restrict open_path guidance to absolute paths.
This rule currently tells the model to use open_path even when the user only provides a file/folder name, but the tool contract is absolute-path based. That will push the agent toward invalid calls instead of asking for the path or navigating another way.
Suggested wording

```diff
-3. **Prefer open_path for files and folders.** If the user asks to open something by name or path, use `open_path` instead of trying to find and double-click an icon.
+3. **Prefer open_path for files and folders.** If the user provides an absolute file or folder path, use `open_path`. If they only provide a name, ask for the path or locate it through the UI instead of guessing.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/10_computer_use_example/instructions.md` at line 17, Update the
guideline that currently advises "Prefer open_path for files and folders" to
specify that open_path should only be used with absolute filesystem paths
(reference the open_path tool name in the rule), and change the agent behavior:
when the user supplies only a name or relative identifier the agent must either
ask for the absolute path or use alternative tools/strategies (e.g., file
browser/navigation actions) rather than calling open_path with an invalid
argument; ensure the rule text makes clear the contract is absolute-path based
and instructs prompting for the absolute path when missing.
1. **Always use tools.** When asked to perform an action, call the tool immediately. Say briefly what you'll do, then call the tool.
2. **Use cell references.** Look at the grid labels on screen and pass the `cell` parameter (e.g. "C2") for coordinate-based tools.
3. **Prefer open_path for files and folders.** If the user asks to open something by name or path, use `open_path` instead of trying to find and double-click an icon.
4. **Use keyboard shortcuts.** When possible, prefer `key_press` over clicking through menus (e.g. `cmd+c` to copy, `cmd+tab` to switch apps, `cmd+space` to open Spotlight).
Avoid macOS-only shortcut examples in a generic prompt.
cmd+tab, cmd+space, and Spotlight are macOS-specific, so this guidance will produce wrong actions on other desktops unless the example is explicitly macOS-only. Either scope the example to macOS up front or make the shortcut advice OS-neutral.
Suggested wording

```diff
-4. **Use keyboard shortcuts.** When possible, prefer `key_press` over clicking through menus (e.g. `cmd+c` to copy, `cmd+tab` to switch apps, `cmd+space` to open Spotlight).
+4. **Use keyboard shortcuts.** When possible, prefer `key_press` over clicking through menus, using shortcuts appropriate for the current OS (e.g. `cmd+c` on macOS or `ctrl+c` on Windows/Linux).
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/10_computer_use_example/instructions.md` at line 18, The example
"Use keyboard shortcuts." currently uses macOS-only examples (cmd+tab,
cmd+space, Spotlight) which will be wrong on other OSes; update the instruction
around the "Use keyboard shortcuts." line to either (A) explicitly scope the
guidance to macOS by adding a note like "On macOS: use cmd shortcuts (e.g.
cmd+c, cmd+tab, cmd+space to open Spotlight)" or (B) make it OS-neutral by
referencing the `key_press` action and using generic modifier placeholders (e.g.
"use modifier+key (Ctrl/⌘/Alt) such as copy: Ctrl/⌘+C, app switch: Alt+Tab") and
remove Spotlight-specific references so the guidance applies across platforms.
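If the instructions go the OS-neutral route, the prompt text can stay generic while the code picks the modifier. A tiny illustrative helper (the name `primary_modifier` and the pyautogui key strings are assumptions based on pyautogui's documented key names):

```python
import platform

def primary_modifier() -> str:
    """Pick the platform's primary shortcut modifier for pyautogui hotkeys.

    pyautogui uses "command" for the macOS ⌘ key and "ctrl" elsewhere.
    """
    return "command" if platform.system() == "Darwin" else "ctrl"
```

A copy shortcut then becomes `pyautogui.hotkey(primary_modifier(), "c")` regardless of the desktop the agent runs on.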
Both now accept an optional `grid=` parameter so tools and overlay share a single source of truth for grid dimensions. Made-with: Cursor
This reverts commit 9980adc.
Actionable comments posted: 1
🧹 Nitpick comments (1)
plugins/computer_use/tests/test_computer_use.py (1)
101-102: Prefer public API over private attribute access.

Line 101 reaches into `_functions` directly. Consider deriving registered names from the public `get_tool_schemas()` method for consistency with the rest of the test class.

♻️ Suggested refactor

```diff
-    registered = set(llm.function_registry._functions.keys())
+    schemas = llm.function_registry.get_tool_schemas()
+    registered = {s["name"] for s in schemas}
     assert registered == EXPECTED_TOOLS
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/computer_use/tests/test_computer_use.py` around lines 101 - 102, The test is accessing the private attribute _functions on llm.function_registry; instead, call the public get_tool_schemas() on function_registry, extract the tool names from its returned schemas (e.g., map schema["name"] or equivalent key) to build the registered set, and assert that equals EXPECTED_TOOLS; update the assertion that currently uses registered = set(llm.function_registry._functions.keys()) to derive registered from llm.function_registry.get_tool_schemas() so the test uses the public API (keep EXPECTED_TOOLS unchanged).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/computer_use/vision_agents/plugins/computer_use/_processor.py`:
- Around line 91-93: Replace the bare except in the draw_overlay call with a
targeted exception tuple: catch the likely errors raised by PIL and av (e.g.,
AttributeError, TypeError, ValueError, OSError, PIL.UnidentifiedImageError,
av.AVError) and bind the exception to a variable so you can log it via
logger.exception when falling back to annotated = frame; update imports to
reference PIL.UnidentifiedImageError and av.AVError so the except clause refers
to concrete exception classes instead of Exception.
---
Nitpick comments:
In `@plugins/computer_use/tests/test_computer_use.py`:
- Around line 101-102: The test is accessing the private attribute _functions on
llm.function_registry; instead, call the public get_tool_schemas() on
function_registry, extract the tool names from its returned schemas (e.g., map
schema["name"] or equivalent key) to build the registered set, and assert that
equals EXPECTED_TOOLS; update the assertion that currently uses registered =
set(llm.function_registry._functions.keys()) to derive registered from
llm.function_registry.get_tool_schemas() so the test uses the public API (keep
EXPECTED_TOOLS unchanged).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: eeb7291a-42fc-4a8e-8dc4-10eab919f130
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (5)
- examples/10_computer_use_example/computer_use_example.py
- plugins/computer_use/README.md
- plugins/computer_use/tests/test_computer_use.py
- plugins/computer_use/vision_agents/plugins/computer_use/_processor.py
- plugins/computer_use/vision_agents/plugins/computer_use/_toolkit.py
🚧 Files skipped from review as they are similar to previous changes (1)
- plugins/computer_use/README.md
```python
        except Exception:
            logger.exception("draw_overlay failed, forwarding original frame")
            annotated = frame
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Check what exceptions draw_overlay can raise
ast-grep --pattern $'def draw_overlay($_, $_):
  $$$
'
```

Repository: GetStream/Vision-Agents
Length of output: 49

🏁 Script executed:

```shell
# Search for draw_overlay method definition
rg "def draw_overlay" --type py -A 5
```

Repository: GetStream/Vision-Agents
Length of output: 707

🏁 Script executed:

```shell
# Examine the _processor.py file around the exception handler
fd "_processor.py" --type f -x cat -n {} \; | head -100
```

Repository: GetStream/Vision-Agents
Length of output: 4560

🏁 Script executed:

```shell
# Search for what grid object is and its draw_overlay method
rg "self\._grid\s*=" --type py -B 2 -A 2
```

Repository: GetStream/Vision-Agents
Length of output: 612

🏁 Script executed:

```shell
# Get the full draw_overlay method from _grid.py
fd "_grid.py" --type f | xargs cat -n | head -200
```

Repository: GetStream/Vision-Agents
Length of output: 5666

🏁 Script executed:

```shell
# Search for exceptions that could be raised in image processing operations
rg "raise|except" plugins/computer_use/vision_agents/plugins/computer_use/_grid.py
```

Repository: GetStream/Vision-Agents
Length of output: 439

🏁 Script executed:

```shell
# Look at the full context of the try-except in _processor.py around line 91
cat -n plugins/computer_use/vision_agents/plugins/computer_use/_processor.py | sed -n '85,98p'
```

Repository: GetStream/Vision-Agents
Length of output: 697
Catch specific exceptions instead of bare `Exception`.

The code violates the coding guideline that prohibits `except Exception`. Since `draw_overlay` relies on PIL and av library calls that can raise various exceptions (AttributeError, TypeError, ValueError, OSError, etc.), you should catch specific exception types rather than a catch-all.

Suggested fix

```diff
     try:
         annotated = self._grid.draw_overlay(frame)
-    except Exception:
+    except (AttributeError, TypeError, ValueError, OSError):
         logger.exception("draw_overlay failed, forwarding original frame")
         annotated = frame
```

Adjust the exception tuple based on what PIL and av actually raise for the operations in draw_overlay.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/computer_use/vision_agents/plugins/computer_use/_processor.py` around
lines 91 - 93, Replace the bare except in the draw_overlay call with a targeted
exception tuple: catch the likely errors raised by PIL and av (e.g.,
AttributeError, TypeError, ValueError, OSError, PIL.UnidentifiedImageError,
av.AVError) and bind the exception to a variable so you can log it via
logger.exception when falling back to annotated = frame; update imports to
reference PIL.UnidentifiedImageError and av.AVError so the except clause refers
to concrete exception classes instead of Exception.
- A `computer-use` plugin that lets any VLM control the user's desktop via screen share. Tools include `click`, `double_click`, `type_text`, `key_press`, `scroll`, `mouse_move`, and `open_path`, all backed by PyAutoGUI.
- A `GridOverlayProcessor` that draws a labeled grid (default 15x15, columns A-O, rows 1-15) on screen share frames, so the model can target UI elements by cell reference (e.g. `cell="H8"`) with optional sub-cell positioning (`position="top-right"`). Grid dimensions are customizable via `cols`/`rows` params.
- A `computer_use.register(llm)` function for tools and `computer_use.GridOverlayProcessor(fps=2)` for the processor.
- An example (`examples/10_computer_use_example/`) using Gemini Realtime with the grid overlay.

Summary by CodeRabbit
New Features
Documentation
Tests
Chores