feat: computer use plugin #411

Open
d3xvn wants to merge 8 commits into main from
feat/computer-use-plugin

Conversation

Contributor

@d3xvn d3xvn commented Mar 11, 2026

  • Adds a new computer-use plugin that lets any VLM control the user's desktop via screen share. Tools include click, double_click, type_text, key_press, scroll, mouse_move, and open_path, all backed by PyAutoGUI.
  • Includes a GridOverlayProcessor that draws a labeled grid (default 15x15, columns A-O, rows 1-15) on screen share frames, so the model can target UI elements by cell reference (e.g. cell="H8") with optional sub-cell positioning (position="top-right"). Grid dimensions are customizable via cols/rows params.
  • Registration follows existing codebase patterns — a plain computer_use.register(llm) function for tools and computer_use.GridOverlayProcessor(fps=2) for the processor.
  • Adds a full example (examples/10_computer_use_example/) using Gemini Realtime with the grid overlay.
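To make the cell-targeting idea concrete, here is an illustrative reimplementation of the cell-to-coordinate mapping described above. This is a sketch only: the real logic lives in the plugin's `_grid.py` and may differ in rounding, sub-cell offsets, and virtual-coordinate handling.

```python
def cell_to_point(
    cell: str,
    position: str = "center",
    cols: int = 15,
    rows: int = 15,
    width: int = 1920,
    height: int = 1080,
) -> tuple[int, int]:
    """Map a grid cell label like "H8" to pixel coordinates.

    Columns are letters (A = first column), rows are 1-based numbers.
    `position` optionally nudges the point toward a sub-cell region,
    e.g. "top-right" or "bottom-left".
    """
    col = ord(cell[0].upper()) - ord("A")   # "H" -> 7
    row = int(cell[1:]) - 1                 # "8" -> 7
    if not (0 <= col < cols and 0 <= row < rows):
        raise ValueError(f"cell {cell!r} is outside the {cols}x{rows} grid")
    fx = fy = 0.5  # default: centre of the cell
    for part in position.split("-"):
        if part == "left":
            fx = 0.25
        elif part == "right":
            fx = 0.75
        elif part == "top":
            fy = 0.25
        elif part == "bottom":
            fy = 0.75
    return int((col + fx) * width / cols), int((row + fy) * height / rows)
```

On a 15x15 grid over a 1920x1080 screen, cell "H8" at `position="center"` lands on the middle of the screen, which matches the intuition that H8 is the centre cell.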

Summary by CodeRabbit

  • New Features

    • Added Computer Use plugin for desktop automation (click, double-click, type, key press, scroll, move, open)
    • Grid-based overlay and cell-targeting for precise on-screen actions
    • Desktop assistant example demonstrating real-time screen control and call workflow
  • Documentation

    • Plugin README and example guide with setup, prerequisites, actions, and safety notes
  • Tests

    • Comprehensive tests for grid behavior, tool schemas, and action execution
  • Chores

    • Ignored example .env files; workspace/project config updated to include the new plugin

New plugin that registers desktop automation tools (click, double_click,
type_text, key_press, scroll, mouse_move, open_path) on any LLM via the
standard FunctionRegistry. Backed by pyautogui and designed to work with
Realtime models that receive screen-share frames.
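The "plain `register(llm)` function" pattern described above can be sketched like this. `ToyRegistry` is a hypothetical stand-in for the FunctionRegistry in agents-core, whose real API may differ; the action bodies are stubs rather than the pyautogui-backed implementations.

```python
from typing import Any, Callable, Dict


class ToyRegistry:
    """Hypothetical stand-in for the codebase's FunctionRegistry."""

    def __init__(self) -> None:
        self.functions: Dict[str, Callable[..., Any]] = {}

    def register_function(self, fn: Callable[..., Any]) -> None:
        self.functions[fn.__name__] = fn


def click(cell: str, position: str = "center", button: str = "left") -> str:
    # The real tool drives pyautogui; stubbed here for illustration.
    return f"click {cell} ({position}, {button})"


def open_path(path: str) -> str:
    return f"open {path}"


def register(registry: ToyRegistry) -> None:
    """Plain module-level register() function, mirroring the
    computer_use.register(llm) pattern the PR describes."""
    for fn in (click, open_path):
        registry.register_function(fn)
```

The appeal of this shape is that the plugin exposes no stateful toolkit object: registration is a one-line call and each tool is an ordinary function the registry can introspect for its schema.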

Made-with: Cursor

coderabbitai bot commented Mar 11, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a Computer Use plugin and example: a configurable grid overlay on shared video frames, LLM tool registration for grid-targeted desktop actions, pyautogui-backed action implementations, tests, docs, and workspace/pyproject updates to include the plugin and example.

Changes

  • Example Application: examples/10_computer_use_example/{.gitignore, README.md, computer_use_example.py, instructions.md, pyproject.toml}. Adds a runnable desktop-assistant example: README, instructions, CLI runner, .env ignore entry, and project metadata with local workspace sources.
  • Plugin Core Implementation: plugins/computer_use/vision_agents/plugins/computer_use/{_grid.py, _processor.py, _actions.py, _toolkit.py, __init__.py}. Implements Grid (parsing, cell->virtual coords, draw_overlay), GridOverlayProcessor (frame overlay and forwarding), pyautogui-backed async actions and helpers, LLM registration of tools, and package exports.
  • Plugin Packaging & Docs: plugins/computer_use/{README.md, pyproject.toml}. Adds the plugin README and pyproject configuration (build metadata, deps, workspace integration, packaging).
  • Tests: plugins/computer_use/tests/test_computer_use.py. Adds unit/integration tests for Grid validation/labels, tool registration and schemas, and action integration expectations.
  • Workspace Integration: pyproject.toml, agents-core/pyproject.toml. Updates the workspace and agents-core pyproject entries to include the new computer-use plugin as a workspace source and dependency.

Sequence Diagram

sequenceDiagram
    participant Agent as Agent
    participant LLM as LLM
    participant Grid as Grid
    participant Processor as GridOverlayProcessor
    participant Track as VideoTrack
    participant PyAutoGUI as PyAutoGUI

    Agent->>LLM: register computer-use tools (cols, rows)
    LLM->>Grid: instantiate/configure grid
    Agent->>Processor: process_video(track, shared_forwarder?)
    Track->>Processor: deliver raw video frame
    Processor->>Grid: draw_overlay(frame)
    Grid-->>Processor: annotated frame
    Processor->>Track: enqueue annotated frame
    Agent->>LLM: request action (e.g., click cell="C2")
    LLM-->>Agent: selected tool + params
    Agent->>PyAutoGUI: execute action at computed screen coords
    PyAutoGUI-->>Agent: result/confirmation

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I chart the grid in ink that is a heat,
small letters like teeth along the glass,
a hand unmoving learns to be a lever, click,
the cursor penetrates the soft and empty room,
and silence types the answer to itself.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 28.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): the title "feat: computer use plugin" clearly and concisely summarizes the main change, adding a new computer-use plugin for desktop control via a VLM.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


…istration

- Replace x/y virtual coordinates with cell-based targeting (e.g. "H8")
- Add configurable Grid class with cols/rows params (default 15x15)
- Add GridOverlayProcessor to draw labeled grid on screen share frames
- Add sub-cell positioning (top-left, center, bottom-right, etc.)
- Replace ComputerUseToolkit class with plain `register(llm)` function
- Add computer use example with Gemini Realtime + grid overlay
- Update README with new API and grid documentation

Made-with: Cursor
@d3xvn d3xvn force-pushed the feat/computer-use-plugin branch from a91015c to 37048f7 Compare March 11, 2026 11:49
@d3xvn d3xvn marked this pull request as ready for review March 11, 2026 11:50

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

🧹 Nitpick comments (1)
plugins/computer_use/tests/test_computer_use.py (1)

7-11: Test the public package surface, not _actions directly.

This couples the tests to a private module path and turns internal refactors into test breakage. If these helpers are meant to be supported, re-export them from vision_agents.plugins.computer_use; otherwise, exercise them through register() / FunctionRegistry instead.

As per coding guidelines, "Never import from private modules (_foo) outside of the package's own __init__.py. Use the public re-export instead."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/computer_use/tests/test_computer_use.py` around lines 7 - 11, Tests
are importing private helpers (key_press, make_grid_actions, type_text) from
vision_agents.plugins.computer_use._actions; replace those direct imports by
using the package's public surface—either import the re-exported symbols from
vision_agents.plugins.computer_use (if the package __init__ exposes
key_press/make_grid_actions/type_text) or exercise the behavior via the public
registration API (call register() / retrieve implementations through
FunctionRegistry) so tests target the public API rather than the _actions
module.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/10_computer_use_example/instructions.md`:
- Line 5: The Critical Rule's tool list is missing the double_click tool; update
the sentence that currently enumerates click, mouse_move, type_text, key_press,
scroll, open_path to also include double_click and add a brief example (e.g.,
"If the user says 'double-click on X', call the `double_click` tool") so it
matches the plugin README and the implemented toolkit; modify the wording around
the rule and any example lines that reference clicking to include `double_click`
(refer to the sentence listing the tools and examples like "If the user says
'click on X'").

In `@examples/10_computer_use_example/README.md`:
- Around line 33-38: Update the fenced code block for the `.env` example so it
includes a language specifier (e.g., change the opening backticks to ```dotenv)
to ensure proper rendering and linting; locate the `.env` example code block in
README.md (the block showing GOOGLE_API_KEY, STREAM_API_KEY, STREAM_API_SECRET)
and add the language label to the opening fence.
- Around line 50-58: Update the "Available actions" table to match the plugin's
actual cell-based API: replace coordinate-based signatures with the real
function signatures such as click(cell, position, button), double_click(cell,
position, button), type_text(text) (or type_text(cell, text) if the plugin
targets a cell), key_press(keys) (unchanged), scroll(cell, direction, clicks,
position), mouse_move(cell, position), and open_path(path); ensure each table
row uses the exact function names and parameter order used by the plugin (e.g.,
click, double_click, type_text, key_press, scroll, mouse_move, open_path) so the
README aligns with the plugin README and instructions.md.

In `@plugins/computer_use/tests/test_computer_use.py`:
- Around line 7-11: The test module currently imports desktop automation helpers
(key_press, make_grid_actions, type_text) at collection time which triggers real
mouse/keyboard actions; change this by making the tests explicitly opt-in to a
real GUI session: remove those top-level imports and instead import key_press,
make_grid_actions and type_text inside the individual tests that need them (or
call pytest.importorskip for the plugin), and add a pytest marker or a pre-check
(e.g., `@pytest.mark.gui` or a fixture like require_display) that calls
pytest.skip when no GUI/display is available so tests only run when a real GUI
is explicitly requested.
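A minimal sketch of the opt-in pattern the prompt describes might look like the following. The environment-variable heuristic and the fixture name `require_display` are assumptions, not the project's actual conventions.

```python
import os
import sys


def display_available() -> bool:
    """Heuristic check for a usable GUI display (sketch only; real
    detection may need more care, e.g. headless CI on macOS)."""
    if sys.platform in ("win32", "darwin"):
        return True  # assume a desktop session on these platforms
    # On Linux/BSD, an X11 or Wayland display must be reachable.
    return bool(os.environ.get("DISPLAY") or os.environ.get("WAYLAND_DISPLAY"))


# The opt-in fixture could then be (requires pytest; names hypothetical):
#
#   import pytest
#
#   @pytest.fixture
#   def require_display():
#       if not display_available():
#           pytest.skip("no GUI display available")
#
#   def test_type_text_real_gui(require_display):
#       # deferred import so collection never touches the real GUI
#       from vision_agents.plugins.computer_use import type_text
#       ...
```

The key property is that importing the test module stays side-effect free: the GUI-touching imports only happen inside tests that have already passed the display check.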

In `@plugins/computer_use/vision_agents/plugins/computer_use/_actions.py`:
- Around line 176-197: The open_path function should validate the provided path
before spawning the subprocess: ensure the path is absolute (use os.path.isabs
or pathlib.Path.is_absolute), verify it exists (os.path.exists or Path.exists)
and is a file or directory as expected (Path.is_file/Path.is_dir), and
optionally resolve symlinks (os.path.realpath or Path.resolve()) to normalize
it; if validation fails, return a clear error string (e.g., "Invalid path: must
be absolute and exist: {path}") instead of invoking the OS opener. Keep these
checks at the top of open_path, and only build/execute the cmd and
create_subprocess_exec after the path passes validation. Ensure you reference
open_path when making changes.
- Around line 153-160: The logger currently emits raw typed text in type_text
and full filesystem paths in other logger.debug calls, which can leak secrets;
create and use a small redaction helper (e.g., redact_text(value: str) and
redact_path(path: str)) that returns a short preview (first N chars) plus a
"<redacted>" marker or masks the middle, then replace direct uses of text/path
in logger.debug calls (e.g., the logger.debug in type_text and any
logger.debug/logger.error calls that print paths around lines 198-203) to log
redact_text(text) or redact_path(path) instead; apply the same helper
consistently across the module.
- Around line 16-17: The module currently sets pyautogui.FAILSAFE and
pyautogui.PAUSE at import time — remove those top-level assignments so importing
_actions no longer flips the process-wide failsafe; instead, set pyautogui.PAUSE
and (only if absolutely necessary) pyautogui.FAILSAFE inside an explicit runtime
initializer or at the start of the action functions that perform automation
(e.g., inside the functions in this module that call pyautogui), or expose a
configuration/init function (e.g., init_pyautogui) the plugin activation path
can call; ensure tests that import _actions do not change global failsafe state.
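One possible shape for the explicit initializer suggested above is sketched below. The `gui` parameter is an assumption added purely so the helper can be exercised without a real display; the actual plugin would likely just call `import pyautogui` inside the function.

```python
from typing import Any


def init_pyautogui(pause: float = 0.05, failsafe: bool = True,
                   gui: Any = None) -> None:
    """Configure pyautogui at activation time rather than at import time.

    Sketch only: keeping the import and the assignments inside this
    function means importing the actions module no longer mutates
    process-wide pyautogui state; the plugin's activation path opts in.
    """
    if gui is None:
        import pyautogui  # deferred import keeps module import side-effect free
        gui = pyautogui
    gui.PAUSE = pause
    gui.FAILSAFE = failsafe
```

Tests can then pass a dummy object for `gui` and assert the settings are applied, without ever importing pyautogui.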

In `@plugins/computer_use/vision_agents/plugins/computer_use/_processor.py`:
- Around line 44-48: The published QueuedVideoTrack returned by
publish_video_track currently can remain waiting after shutdown; update the
shutdown flow to explicitly terminate the track so downstream consumers stop
waiting: add a shutdown handler that calls a termination method on the track
(e.g., a stop/terminate/end_of_stream method on self._video_track) as part of
setting self._shutdown True, and ensure QueuedVideoTrack exposes and uses that
method to inject an end-of-stream frame/stop the async frame generator (so
close() alone is not relied on). Keep publish_video_track returning the same
track but ensure shutdown triggers the track's termination logic so consumers
receive EOF and stop waiting.

---

Nitpick comments:
In `@plugins/computer_use/tests/test_computer_use.py`:
- Around line 7-11: Tests are importing private helpers (key_press,
make_grid_actions, type_text) from vision_agents.plugins.computer_use._actions;
replace those direct imports by using the package's public surface—either import
the re-exported symbols from vision_agents.plugins.computer_use (if the package
__init__ exposes key_press/make_grid_actions/type_text) or exercise the behavior
via the public registration API (call register() / retrieve implementations
through FunctionRegistry) so tests target the public API rather than the
_actions module.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b84127f4-edf7-4595-896a-02ce62d8ef8f

📥 Commits

Reviewing files that changed from the base of the PR and between 6522e3b and 37048f7.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (15)
  • examples/10_computer_use_example/.gitignore
  • examples/10_computer_use_example/README.md
  • examples/10_computer_use_example/computer_use_example.py
  • examples/10_computer_use_example/instructions.md
  • examples/10_computer_use_example/pyproject.toml
  • plugins/computer_use/README.md
  • plugins/computer_use/pyproject.toml
  • plugins/computer_use/tests/__init__.py
  • plugins/computer_use/tests/test_computer_use.py
  • plugins/computer_use/vision_agents/plugins/computer_use/__init__.py
  • plugins/computer_use/vision_agents/plugins/computer_use/_actions.py
  • plugins/computer_use/vision_agents/plugins/computer_use/_grid.py
  • plugins/computer_use/vision_agents/plugins/computer_use/_processor.py
  • plugins/computer_use/vision_agents/plugins/computer_use/_toolkit.py
  • pyproject.toml

Comment on lines +153 to +160
async def type_text(text: str) -> str:
    """Type a string of text into the currently focused element.

    Args:
        text: The text to type.
    """
    await _run_sync(pyautogui.write, text, interval=0.03)
    logger.debug("type_text(%r)", text[:80])

⚠️ Potential issue | 🟠 Major

Redact raw text and path values from logs.

Line 160 logs the typed payload and Lines 200-203 log the full filesystem path. Both can contain secrets or PII, so normal tool use will leak sensitive data into debug/error logs.

Suggested fix
 async def type_text(text: str) -> str:
@@
-    logger.debug("type_text(%r)", text[:80])
+    logger.debug("type_text(len=%d)", len(text))
@@
-        logger.error("open_path(%r) failed: %s", path, err)
+        logger.error("open_path failed: %s", err)
@@
-    logger.debug("open_path(%r)", path)
+    logger.debug("open_path() succeeded")

Also applies to: 198-203
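A redaction helper along the lines the prompt suggests could look like this. The helper names, preview length, and masking format are all illustrative, not part of the PR.

```python
import os


def redact_text(value: str, preview: int = 4) -> str:
    """Log-safe preview: keep at most `preview` leading characters,
    then replace the rest with a length marker."""
    if not value:
        return "<redacted len=0>"
    return f"{value[:preview]}<redacted len={len(value)}>"


def redact_path(path: str) -> str:
    """Keep only the final path component; parent directories can
    identify the user or reveal directory structure."""
    return f".../{os.path.basename(path)}"
```

With these in place, `logger.debug("type_text(%s)", redact_text(text))` preserves enough signal for debugging while keeping the full payload out of log files.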

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/computer_use/vision_agents/plugins/computer_use/_actions.py` around
lines 153 - 160, The logger currently emits raw typed text in type_text and full
filesystem paths in other logger.debug calls, which can leak secrets; create and
use a small redaction helper (e.g., redact_text(value: str) and
redact_path(path: str)) that returns a short preview (first N chars) plus a
"<redacted>" marker or masks the middle, then replace direct uses of text/path
in logger.debug calls (e.g., the logger.debug in type_text and any
logger.debug/logger.error calls that print paths around lines 198-203) to log
redact_text(text) or redact_path(path) instead; apply the same helper
consistently across the module.

Comment on lines +176 to +197
async def open_path(path: str) -> str:
    """Open a file or folder using the OS default handler.

    Args:
        path: Absolute path to the file or folder to open.
    """
    system = platform.system()
    if system == "Darwin":
        cmd = ["open", path]
    elif system == "Linux":
        cmd = ["xdg-open", path]
    elif system == "Windows":
        cmd = ["explorer", path]
    else:
        return f"Unsupported platform: {system}"

    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
    )
    _, stderr = await proc.communicate()

⚠️ Potential issue | 🟠 Major

Validate path before invoking the OS opener.

The docstring and tool description say this action accepts an absolute path, but there is no check before the subprocess is spawned. A relative value will resolve against the agent's working directory, which makes the tool non-deterministic and easy for the model to mis-target.
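The pre-flight check could be factored out as below. This is a sketch of the suggestion, not the PR's code; `open_path` would call it before building the subprocess command and return the error string directly if one comes back.

```python
from pathlib import Path
from typing import Optional


def validate_open_path(path: str) -> Optional[str]:
    """Return an error message if `path` is unusable, else None."""
    p = Path(path)
    if not p.is_absolute():
        return f"Invalid path: must be absolute: {path}"
    if not p.exists():
        return f"Invalid path: does not exist: {path}"
    return None
```

Rejecting relative paths up front makes the tool deterministic: resolution no longer depends on whatever working directory the agent process happens to have.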

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/computer_use/vision_agents/plugins/computer_use/_actions.py` around
lines 176 - 197, The open_path function should validate the provided path before
spawning the subprocess: ensure the path is absolute (use os.path.isabs or
pathlib.Path.is_absolute), verify it exists (os.path.exists or Path.exists) and
is a file or directory as expected (Path.is_file/Path.is_dir), and optionally
resolve symlinks (os.path.realpath or Path.resolve()) to normalize it; if
validation fails, return a clear error string (e.g., "Invalid path: must be
absolute and exist: {path}") instead of invoking the OS opener. Keep these
checks at the top of open_path, and only build/execute the cmd and
create_subprocess_exec after the path passes validation. Ensure you reference
open_path when making changes.

Comment on lines +44 to +48
        self._video_track = QueuedVideoTrack()
        self._shutdown = False

    def publish_video_track(self) -> aiortc.VideoStreamTrack:
        return self._video_track

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, locate the file mentioned in the review
find . -type f -name "_processor.py" | grep -i computer_use

Repository: GetStream/Vision-Agents

Length of output: 138


🏁 Script executed:

# Get the full content of the processor file to understand the shutdown logic
cat -n plugins/computer_use/vision_agents/plugins/computer_use/_processor.py

Repository: GetStream/Vision-Agents

Length of output: 3837


🏁 Script executed:

# Search for the QueuedVideoTrack definition and usage
rg "QueuedVideoTrack" --type py -B 2 -A 5

Repository: GetStream/Vision-Agents

Length of output: 27213


🏁 Script executed:

# Check the QueuedVideoTrack implementation and stop() method
cat -n agents-core/vision_agents/core/utils/video_track.py | head -100

Repository: GetStream/Vision-Agents

Length of output: 3687


🏁 Script executed:

# Look for the recv() method in QueuedVideoTrack to understand behavior after stop
rg "async def recv" agents-core/vision_agents/core/utils/video_track.py -A 10

Repository: GetStream/Vision-Agents

Length of output: 981


🏁 Script executed:

# Check if any other similar processors implement track.stop() in their close methods
rg "async def close" --type py -A 3 | grep -A 3 "_video_track.stop"

Repository: GetStream/Vision-Agents

Length of output: 408


🏁 Script executed:

# Verify the line numbers mentioned in "Also applies to: 94-96"
sed -n '88,96p' plugins/computer_use/vision_agents/plugins/computer_use/_processor.py

Repository: GetStream/Vision-Agents

Length of output: 436


Stop the published track during shutdown.

Line 48 returns a long-lived QueuedVideoTrack, but close() only detaches the frame handler. After shutdown, downstream consumers can keep waiting on a track that will never produce another frame.

Suggested fix
 async def close(self) -> None:
     self._shutdown = True
     await self.stop_processing()
+    self._video_track.stop()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/computer_use/vision_agents/plugins/computer_use/_processor.py` around
lines 44 - 48, The published QueuedVideoTrack returned by publish_video_track
currently can remain waiting after shutdown; update the shutdown flow to
explicitly terminate the track so downstream consumers stop waiting: add a
shutdown handler that calls a termination method on the track (e.g., a
stop/terminate/end_of_stream method on self._video_track) as part of setting
self._shutdown True, and ensure QueuedVideoTrack exposes and uses that method to
inject an end-of-stream frame/stop the async frame generator (so close() alone
is not relied on). Keep publish_video_track returning the same track but ensure
shutdown triggers the track's termination logic so consumers receive EOF and
stop waiting.
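A toy model of the shutdown behaviour being asked for is sketched below. The real QueuedVideoTrack lives in agents-core and differs; this only demonstrates the pattern of injecting a sentinel so a consumer blocked on the queue wakes up instead of waiting forever.

```python
import asyncio


class StoppableFrameTrack:
    """Illustrative queue-backed track: stop() injects an end-of-stream
    sentinel so a consumer blocked in recv() is released."""

    _EOF = object()

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()
        self._stopped = False

    def put_frame(self, frame) -> None:
        if not self._stopped:
            self._queue.put_nowait(frame)

    def stop(self) -> None:
        self._stopped = True
        self._queue.put_nowait(self._EOF)  # unblock any pending recv()

    async def recv(self):
        frame = await self._queue.get()
        if frame is self._EOF:
            # Signal end-of-stream; the real track would raise whatever
            # its consumers expect (e.g. a media stream error).
            raise ConnectionResetError("track stopped")
        return frame
```

Without the sentinel, `await queue.get()` would simply never return after shutdown, which is exactly the hang the review comment describes.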


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
examples/10_computer_use_example/README.md (1)

33-38: ⚠️ Potential issue | 🟡 Minor

Use dotenv language identifier for the .env code block.

The code block at line 34 shows .env file content but uses bash as the language identifier. Since this is environment variable syntax (not bash commands), use dotenv, env, or properties for accurate syntax highlighting and linting compliance.

📝 Suggested fix
 3. Set up your `.env`:
-   ```bash
+   ```dotenv
    GOOGLE_API_KEY=your_google_key
    STREAM_API_KEY=your_stream_key
    STREAM_API_SECRET=your_stream_secret
    ```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/10_computer_use_example/README.md` around lines 33 - 38, Update the
fenced code block that currently uses "bash" for the .env snippet to use an
environment-file language identifier (e.g., "dotenv" or "env") so syntax
highlighting/linting is correct; locate the fenced block containing the three
keys (GOOGLE_API_KEY, STREAM_API_KEY, STREAM_API_SECRET) and replace the opening
backticks language tag from "bash" to "dotenv".
examples/10_computer_use_example/instructions.md (1)

5-5: ⚠️ Potential issue | 🟡 Minor

Add an explicit double_click example.

The tool list now includes double_click, but the behavioral examples still only teach click and mouse_move. Adding a "double-click on X" example here will make the prompt more likely to select the dedicated tool.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/10_computer_use_example/instructions.md` at line 5, Add an explicit
double-click example sentence alongside the existing click/mouse_move examples
so the prompt demonstrates calling the double_click tool; specifically, add a
line such as "If the user says 'double-click on X', call the `double_click` tool
(do not just describe it)" and ensure the surrounding guidance still enforces
using tool functions like `click`, `double_click`, `mouse_move`, `type_text`,
`key_press`, `scroll`, and `open_path` rather than describing actions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/10_computer_use_example/instructions.md`:
- Line 18: The example "Use keyboard shortcuts." currently uses macOS-only
examples (cmd+tab, cmd+space, Spotlight) which will be wrong on other OSes;
update the instruction around the "Use keyboard shortcuts." line to either (A)
explicitly scope the guidance to macOS by adding a note like "On macOS: use cmd
shortcuts (e.g. cmd+c, cmd+tab, cmd+space to open Spotlight)" or (B) make it
OS-neutral by referencing the `key_press` action and using generic modifier
placeholders (e.g. "use modifier+key (Ctrl/⌘/Alt) such as copy: Ctrl/⌘+C, app
switch: Alt+Tab") and remove Spotlight-specific references so the guidance
applies across platforms.
- Line 17: Update the guideline that currently advises "Prefer open_path for
files and folders" to specify that open_path should only be used with absolute
filesystem paths (reference the open_path tool name in the rule), and change the
agent behavior: when the user supplies only a name or relative identifier the
agent must either ask for the absolute path or use alternative tools/strategies
(e.g., file browser/navigation actions) rather than calling open_path with an
invalid argument; ensure the rule text makes clear the contract is absolute-path
based and instructs prompting for the absolute path when missing.

---

Duplicate comments:
In `@examples/10_computer_use_example/instructions.md`:
- Line 5: Add an explicit double-click example sentence alongside the existing
click/mouse_move examples so the prompt demonstrates calling the double_click
tool; specifically, add a line such as "If the user says 'double-click on X',
call the `double_click` tool (do not just describe it)" and ensure the
surrounding guidance still enforces using tool functions like `click`,
`double_click`, `mouse_move`, `type_text`, `key_press`, `scroll`, and
`open_path` rather than describing actions.

In `@examples/10_computer_use_example/README.md`:
- Around line 33-38: Update the fenced code block that currently uses "bash" for
the .env snippet to use an environment-file language identifier (e.g., "dotenv"
or "env") so syntax highlighting/linting is correct; locate the fenced block
containing the three keys (GOOGLE_API_KEY, STREAM_API_KEY, STREAM_API_SECRET)
and replace the opening backticks language tag from "bash" to "dotenv".

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 08c0f006-90e5-4695-bf4d-28ff211a0407

📥 Commits

Reviewing files that changed from the base of the PR and between 5c3cd84 and 64fcc62.

📒 Files selected for processing (2)
  • examples/10_computer_use_example/README.md
  • examples/10_computer_use_example/instructions.md


1. **Always use tools.** When asked to perform an action, call the tool immediately. Say briefly what you'll do, then call the tool.
2. **Use cell references.** Look at the grid labels on screen and pass the `cell` parameter (e.g. "C2") for coordinate-based tools.
3. **Prefer open_path for files and folders.** If the user asks to open something by name or path, use `open_path` instead of trying to find and double-click an icon.

⚠️ Potential issue | 🟠 Major

Restrict open_path guidance to absolute paths.

This rule currently tells the model to use open_path even when the user only provides a file/folder name, but the tool contract is absolute-path based. That will push the agent toward invalid calls instead of asking for the path or navigating another way.

Suggested wording
-3. **Prefer open_path for files and folders.** If the user asks to open something by name or path, use `open_path` instead of trying to find and double-click an icon.
+3. **Prefer open_path for files and folders.** If the user provides an absolute file or folder path, use `open_path`. If they only provide a name, ask for the path or locate it through the UI instead of guessing.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
3. **Prefer open_path for files and folders.** If the user asks to open something by name or path, use `open_path` instead of trying to find and double-click an icon.
3. **Prefer open_path for files and folders.** If the user provides an absolute file or folder path, use `open_path`. If they only provide a name, ask for the path or locate it through the UI instead of guessing.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/10_computer_use_example/instructions.md` at line 17, Update the
guideline that currently advises "Prefer open_path for files and folders" to
specify that open_path should only be used with absolute filesystem paths
(reference the open_path tool name in the rule), and change the agent behavior:
when the user supplies only a name or relative identifier the agent must either
ask for the absolute path or use alternative tools/strategies (e.g., file
browser/navigation actions) rather than calling open_path with an invalid
argument; ensure the rule text makes clear the contract is absolute-path based
and instructs prompting for the absolute path when missing.

1. **Always use tools.** When asked to perform an action, call the tool immediately. Say briefly what you'll do, then call the tool.
2. **Use cell references.** Look at the grid labels on screen and pass the `cell` parameter (e.g. "C2") for coordinate-based tools.
3. **Prefer open_path for files and folders.** If the user asks to open something by name or path, use `open_path` instead of trying to find and double-click an icon.
4. **Use keyboard shortcuts.** When possible, prefer `key_press` over clicking through menus (e.g. `cmd+c` to copy, `cmd+tab` to switch apps, `cmd+space` to open Spotlight).

⚠️ Potential issue | 🟠 Major

Avoid macOS-only shortcut examples in a generic prompt.

cmd+tab, cmd+space, and Spotlight are macOS-specific, so this guidance will produce wrong actions on other desktops unless the example is explicitly macOS-only. Either scope the example to macOS up front or make the shortcut advice OS-neutral.

Suggested wording
-4. **Use keyboard shortcuts.** When possible, prefer `key_press` over clicking through menus (e.g. `cmd+c` to copy, `cmd+tab` to switch apps, `cmd+space` to open Spotlight).
+4. **Use keyboard shortcuts.** When possible, prefer `key_press` over clicking through menus, using shortcuts appropriate for the current OS (e.g. `cmd+c` on macOS or `ctrl+c` on Windows/Linux).
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-4. **Use keyboard shortcuts.** When possible, prefer `key_press` over clicking through menus (e.g. `cmd+c` to copy, `cmd+tab` to switch apps, `cmd+space` to open Spotlight).
+4. **Use keyboard shortcuts.** When possible, prefer `key_press` over clicking through menus, using shortcuts appropriate for the current OS (e.g. `cmd+c` on macOS or `ctrl+c` on Windows/Linux).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/10_computer_use_example/instructions.md` at line 18: the example
"Use keyboard shortcuts." currently uses macOS-only examples (cmd+tab,
cmd+space, Spotlight) which will be wrong on other OSes; update the instruction
around the "Use keyboard shortcuts." line to either (A) explicitly scope the
guidance to macOS by adding a note like "On macOS: use cmd shortcuts (e.g.
cmd+c, cmd+tab, cmd+space to open Spotlight)" or (B) make it OS-neutral by
referencing the `key_press` action and using generic modifier placeholders (e.g.
"use modifier+key (Ctrl/⌘/Alt) such as copy: Ctrl/⌘+C, app switch: Alt+Tab") and
remove Spotlight-specific references so the guidance applies across platforms.
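One way to keep the prompt OS-neutral is to resolve the modifier at runtime and hand the result to `key_press` — a hedged sketch; the helper names are illustrative and the modifier table covers only the common cases:

```python
import sys

def primary_modifier() -> str:
    """Return the platform's primary shortcut modifier key name."""
    return "cmd" if sys.platform == "darwin" else "ctrl"

def shortcut(key: str) -> str:
    """Build an OS-appropriate shortcut string, e.g. 'ctrl+c' or 'cmd+c'."""
    return f"{primary_modifier()}+{key}"
```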

d3xvn added 3 commits March 11, 2026 13:24
Both now accept an optional `grid=` parameter so tools and overlay
share a single source of truth for grid dimensions.

Made-with: Cursor
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
plugins/computer_use/tests/test_computer_use.py (1)

101-102: Prefer public API over private attribute access.

Line 101 reaches into _functions directly. Consider deriving registered names from the public get_tool_schemas() method for consistency with the rest of the test class.

♻️ Suggested refactor
-        registered = set(llm.function_registry._functions.keys())
+        schemas = llm.function_registry.get_tool_schemas()
+        registered = {s["name"] for s in schemas}
         assert registered == EXPECTED_TOOLS
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/computer_use/tests/test_computer_use.py` around lines 101-102: the
test is accessing the private attribute _functions on llm.function_registry;
instead, call the public get_tool_schemas() on function_registry, extract the
tool names from its returned schemas (e.g., map schema["name"] or equivalent
key) to build the registered set, and assert that equals EXPECTED_TOOLS; update
the assertion that currently uses registered =
set(llm.function_registry._functions.keys()) to derive registered from
llm.function_registry.get_tool_schemas() so the test uses the public API (keep
EXPECTED_TOOLS unchanged).
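The pattern this refactor suggests — deriving registered names from the public schema output rather than private registry state — can be sketched in isolation; the `s["name"]` schema shape is assumed from the diff, and `EXPECTED_TOOLS` mirrors the tool list in the PR description:

```python
def registered_tool_names(schemas: list[dict]) -> set[str]:
    """Collect tool names from a list of tool-schema dicts, as returned
    by a public get_tool_schemas()-style method."""
    return {s["name"] for s in schemas}

EXPECTED_TOOLS = {
    "click", "double_click", "type_text", "key_press",
    "scroll", "mouse_move", "open_path",
}
```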
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/computer_use/vision_agents/plugins/computer_use/_processor.py`:
- Around line 91-93: Replace the bare except in the draw_overlay call with a
targeted exception tuple: catch the likely errors raised by PIL and av (e.g.,
AttributeError, TypeError, ValueError, OSError, PIL.UnidentifiedImageError,
av.AVError) and bind the exception to a variable so you can log it via
logger.exception when falling back to annotated = frame; update imports to
reference PIL.UnidentifiedImageError and av.AVError so the except clause refers
to concrete exception classes instead of Exception.

---

Nitpick comments:
In `@plugins/computer_use/tests/test_computer_use.py`:
- Around line 101-102: The test is accessing the private attribute _functions on
llm.function_registry; instead, call the public get_tool_schemas() on
function_registry, extract the tool names from its returned schemas (e.g., map
schema["name"] or equivalent key) to build the registered set, and assert that
equals EXPECTED_TOOLS; update the assertion that currently uses registered =
set(llm.function_registry._functions.keys()) to derive registered from
llm.function_registry.get_tool_schemas() so the test uses the public API (keep
EXPECTED_TOOLS unchanged).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: eeb7291a-42fc-4a8e-8dc4-10eab919f130

📥 Commits

Reviewing files that changed from the base of the PR and between 64fcc62 and b681299.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • examples/10_computer_use_example/computer_use_example.py
  • plugins/computer_use/README.md
  • plugins/computer_use/tests/test_computer_use.py
  • plugins/computer_use/vision_agents/plugins/computer_use/_processor.py
  • plugins/computer_use/vision_agents/plugins/computer_use/_toolkit.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugins/computer_use/README.md

Comment on lines +91 to +93
        except Exception:
            logger.exception("draw_overlay failed, forwarding original frame")
            annotated = frame

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check what exceptions draw_overlay can raise
ast-grep --pattern $'def draw_overlay($_, $_):
  $$$
'

Repository: GetStream/Vision-Agents

Length of output: 49


🏁 Script executed:

# Search for draw_overlay method definition
rg "def draw_overlay" --type py -A 5

Repository: GetStream/Vision-Agents

Length of output: 707


🏁 Script executed:

# Examine the _processor.py file around the exception handler
fd "_processor.py" --type f -x cat -n {} \; | head -100

Repository: GetStream/Vision-Agents

Length of output: 4560


🏁 Script executed:

# Search for what grid object is and its draw_overlay method
rg "self\._grid\s*=" --type py -B 2 -A 2

Repository: GetStream/Vision-Agents

Length of output: 612


🏁 Script executed:

# Get the full draw_overlay method from _grid.py
fd "_grid.py" --type f | xargs cat -n | head -200

Repository: GetStream/Vision-Agents

Length of output: 5666


🏁 Script executed:

# Search for exceptions that could be raised in image processing operations
rg "raise|except" plugins/computer_use/vision_agents/plugins/computer_use/_grid.py

Repository: GetStream/Vision-Agents

Length of output: 439


🏁 Script executed:

# Look at the full context of the try-except in _processor.py around line 91
cat -n plugins/computer_use/vision_agents/plugins/computer_use/_processor.py | sed -n '85,98p'

Repository: GetStream/Vision-Agents

Length of output: 697


Catch specific exceptions instead of bare Exception.

The code violates the coding guideline that prohibits except Exception. Since draw_overlay relies on PIL and av library calls that can raise various exceptions (AttributeError, TypeError, ValueError, OSError, etc.), you should catch specific exception types rather than a catch-all.

Suggested fix
         try:
             annotated = self._grid.draw_overlay(frame)
-        except Exception:
+        except (AttributeError, TypeError, ValueError, OSError):
             logger.exception("draw_overlay failed, forwarding original frame")
             annotated = frame

Adjust the exception tuple based on what PIL and av actually raise for the operations in draw_overlay.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/computer_use/vision_agents/plugins/computer_use/_processor.py` around
lines 91-93: replace the bare except in the draw_overlay call with a targeted
exception tuple: catch the likely errors raised by PIL and av (e.g.,
AttributeError, TypeError, ValueError, OSError, PIL.UnidentifiedImageError,
av.AVError) and bind the exception to a variable so you can log it via
logger.exception when falling back to annotated = frame; update imports to
reference PIL.UnidentifiedImageError and av.AVError so the except clause refers
to concrete exception classes instead of Exception.
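The fallback pattern the review asks for — catching a targeted exception tuple and logging the traceback before forwarding the original frame — can be sketched independently of PIL and av; the exact tuple would need to match what `draw_overlay` actually raises:

```python
import logging

logger = logging.getLogger("computer_use.overlay")

def annotate_or_passthrough(draw_overlay, frame):
    """Try to draw the grid overlay; on known failure modes, log the
    traceback and forward the unmodified frame instead of crashing."""
    try:
        return draw_overlay(frame)
    except (AttributeError, TypeError, ValueError, OSError):
        logger.exception("draw_overlay failed, forwarding original frame")
        return frame
```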
