Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
9dcba1f
feat: add open-operator — AI agent browser automation
jackwener Mar 30, 2026
827ef6d
fix(agent): fallback to JS injection when CDP click/type fails
jackwener Mar 30, 2026
c919694
feat(agent): rich trace capture + LLM-powered TS skill generation
jackwener Mar 30, 2026
4046ddf
fix(agent): syntax validation for generated TS adapters
jackwener Mar 30, 2026
934212d
fix(agent): security and robustness fixes for skill generation
jackwener Mar 30, 2026
760ee8f
feat(agent): major capability upgrade — planning, loop detection, pro…
jackwener Mar 30, 2026
1d52406
fix(agent): review fixes — AX tree, loop detection, compaction, injec…
jackwener Mar 30, 2026
ea14094
fix(agent): package imports, review fixes, and cleanup
jackwener Mar 31, 2026
916b4a3
fix(agent): critical security + important robustness fixes
jackwener Mar 31, 2026
20d4bbb
fix(agent): scroll element into view before click/type
jackwener Mar 31, 2026
b9c7516
fix(extension): use about:blank instead of data: URI for blank tabs
jackwener Mar 31, 2026
0df5650
fix(agent): final review fixes — cost tracking, macOS, focus verifica…
jackwener Mar 31, 2026
9d0b789
fix(agent): use scrollIntoView nearest instead of center
jackwener Mar 31, 2026
c62fcb1
feat(agent): autocomplete protocol, compaction guard, final response,…
jackwener Mar 31, 2026
234f705
docs: add AutoResearch design spec for operate optimization
jackwener Mar 31, 2026
b356bf0
feat: add AutoResearch framework for operate optimization
jackwener Mar 31, 2026
f5892a9
autoresearch: 18/20 — click reliability + eval parser robustness
jackwener Mar 31, 2026
6031118
autoresearch: expand task set to 59 tasks (49 train + 10 test, 8 cate…
jackwener Mar 31, 2026
c784622
autoresearch: 52/59 baseline — fix 3 task definitions (nav-multi-step…
jackwener Mar 31, 2026
5d4d5b2
autoresearch: 53/59 — task fixes, test set 10/10 perfect score
jackwener Mar 31, 2026
24e0ee7
fix(extension): retry debugger attach up to 3 times with delay
jackwener Mar 31, 2026
4825cc6
fix(extension): two-layer retry for extension interference
jackwener Mar 31, 2026
5a98e65
fix: PR hygiene, docs, retry scoping, and cleanup
jackwener Mar 31, 2026
126d204
feat(agent): multi-provider LLM support (Anthropic + OpenAI) + doctor…
jackwener Mar 31, 2026
9dc17c5
refactor: remove legacy env var fallbacks, use OPENCLI_* only
jackwener Mar 31, 2026
e131e35
refactor(agent): split LLM config into OPENCLI_PROVIDER + OPENCLI_MODEL
jackwener Mar 31, 2026
0fa0efd
fix(agent): detect HTML response from misconfigured API proxy
jackwener Mar 31, 2026
33335ab
docs: update OPERATE.md, README, and SKILL.md for multi-provider config
jackwener Mar 31, 2026
6f9df35
fix: remove stale --model flag and ANTHROPIC_API_KEY references from …
jackwener Mar 31, 2026
cb510bd
fix: sync package-lock.json with openai dependency (CI was failing)
jackwener Mar 31, 2026
c24d168
docs: add OPENCLI_BASE_URL to Quick Start configuration
jackwener Mar 31, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -22,3 +22,5 @@ docs/.vitepress/cache

# Database files
*.db
autoresearch/results/
extension/dist/
140 changes: 140 additions & 0 deletions OPERATE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,140 @@
# opencli operate — AI Browser Automation

`opencli operate` lets an AI agent autonomously control your browser to complete tasks described in natural language. It reuses your existing Chrome login sessions, so no passwords needed.

## Quick Start

```bash
# 1. Configure LLM provider
export OPENCLI_PROVIDER=anthropic # or openai
export OPENCLI_MODEL=sonnet # alias or full model ID
export OPENCLI_API_KEY=sk-ant-... # your API key
export OPENCLI_BASE_URL=https://... # optional: API proxy (add /v1 for OpenAI proxies)

# 2. Run
opencli operate "go to Hacker News and extract the top 5 stories"
opencli operate --url https://github.com/trending "extract the top 3 trending repos"
opencli operate -v "search for flights from NYC to LA on Google Flights"
```

## How It Works

```
You describe a task in natural language
→ Agent observes the page (DOM snapshot)
→ LLM decides what to do (click, type, scroll, extract...)
→ Actions execute in your browser
→ Agent observes the result
→ Repeat until done
```

The agent uses your existing Chrome browser session through the OpenCLI extension, so it has access to all your logged-in accounts (Twitter, GitHub, Gmail, etc.) without needing passwords.

## Options

| Option | Default | Description |
|--------|---------|-------------|
| `--url <url>` | — | Starting URL (agent navigates if omitted) |
| `--max-steps <n>` | 50 | Maximum agent steps before timeout |
| `--screenshot` | false | Include screenshots in LLM context (more accurate but more expensive) |
| `--record` | false | Record action trace for debugging |
| `--save-as <site/name>` | — | Save successful operation as reusable CLI skill |
| `-v, --verbose` | false | Show step-by-step reasoning |

## Configuration

### Environment Variables

```bash
# Required
export OPENCLI_PROVIDER=anthropic # Provider: anthropic or openai
export OPENCLI_API_KEY=sk-ant-... # API key for your provider

# Optional
export OPENCLI_MODEL=sonnet # Model alias or full ID (default: sonnet)
export OPENCLI_BASE_URL=https://... # API proxy URL (must include /v1 for OpenAI proxies)
```

### Model Aliases

| Provider | Aliases | Default |
|----------|---------|---------|
| anthropic | `sonnet`, `opus`, `haiku` | sonnet |
| openai | `gpt-5.4`, `gpt-4.1`, `gpt-4o`, `o3`, `o4-mini` | gpt-4o |

You can also use full model IDs (e.g., `claude-sonnet-4-20250514`, `gpt-5.4`).

### Verify Configuration

```bash
opencli doctor # Shows LLM provider, model, and connectivity status
```

### Chrome Extension

The OpenCLI browser extension must be installed and connected. Run `opencli doctor` to check.

## Save as Skill

After a successful operation, save it as a reusable CLI command that runs **without AI**:

```bash
# First run: AI agent completes the task
opencli operate --save-as hn/top "get the top 5 Hacker News stories" --url https://news.ycombinator.com

# Future runs: deterministic, no LLM needed
opencli hn top
```

The `--save-as` flag analyzes the agent's actions and captured network requests, then uses the LLM to generate an optimized TypeScript adapter. If the agent discovered an API during execution, the generated skill will call the API directly instead of replaying UI actions.

## Cost Estimate

Each `operate` run costs approximately **$0.01–$0.50** depending on task complexity:

| Task Type | Typical Steps | Estimated Cost |
|-----------|--------------|----------------|
| Simple extract (page title) | 1–2 | $0.01 |
| Search + extract | 3–6 | $0.05–0.15 |
| Form filling | 3–8 | $0.05–0.20 |
| Multi-step navigation | 5–10 | $0.10–0.50 |

Using `--save-as` adds one additional LLM call ($0.05–0.20) for skill generation.

## Troubleshooting

### "OPENCLI_API_KEY is not set"
Configure your LLM provider:
```bash
export OPENCLI_PROVIDER=anthropic
export OPENCLI_API_KEY=sk-ant-...
```

### "LLM returned HTML instead of JSON"
Your `OPENCLI_BASE_URL` is pointing to the proxy's dashboard, not its API endpoint. Add `/v1`:
```bash
export OPENCLI_BASE_URL='https://your-proxy.com/v1'
```

### "Extension not connected"
Run `opencli doctor` to diagnose. Make sure the OpenCLI extension is installed and enabled in Chrome.

### "attach failed: Cannot access a chrome-extension:// URL"
Another Chrome extension (usually 1Password or a debugger extension) is interfering. The agent retries automatically (up to 5 times for operate commands), but if it persists, temporarily disable the conflicting extension.

### "LLM returned empty response"
Your API proxy may be truncating responses, or the model name may not be supported by your proxy. Check `OPENCLI_MODEL` and `OPENCLI_BASE_URL`.

### Agent fills wrong fields or misses content below the fold
The agent scrolls elements into view before interacting, but complex pages with many dynamic elements can sometimes cause issues. Try running with `-v` to see what the agent sees and does.

## AutoResearch (Experimental)

OpenCLI includes an AutoResearch framework that automatically optimizes the agent's performance:

```bash
# Run automated optimization (requires Claude Code)
./autoresearch/run.sh
```

This uses Claude Code to iteratively modify the agent's code, evaluate against a test suite of 59 tasks, and commit only improvements. See `docs/superpowers/specs/2026-03-31-autoresearch-operate-design.md` for details.
25 changes: 25 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,31 @@ opencli hackernews top --limit 5 # Public API, no browser needed
opencli bilibili hot --limit 5 # Browser command (requires Extension)
```

### 4. AI Agent (New!)

Let an AI agent operate your browser with natural language. Supports Anthropic and OpenAI:

```bash
# Configure (one-time)
export OPENCLI_PROVIDER=anthropic # or openai
export OPENCLI_MODEL=sonnet # model alias
export OPENCLI_API_KEY=sk-ant-... # your API key
export OPENCLI_BASE_URL=https://... # optional: API proxy

# Run
opencli operate "go to Hacker News and extract the top 5 stories"
opencli operate --url https://github.com/trending "extract top 3 trending repos"
```

Save successful operations as reusable commands (no AI needed for replay):

```bash
opencli operate --save-as hn/top "get top 5 HN stories" --url https://news.ycombinator.com
opencli hn top # Runs without AI from now on
```

See [OPERATE.md](./OPERATE.md) for full documentation, configuration, and troubleshooting.

### Update

```bash
Expand Down
12 changes: 12 additions & 0 deletions SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -554,6 +554,18 @@ opencli record <url> --out .opencli/record/x # 自定义输出目录
# .opencli/record/<site>/captured.json ← 原始捕获数据(带 url/method/body)
# .opencli/record/<site>/candidates/*.yaml ← 高置信度候选适配器(score ≥ 8,有 array 结果)

# Operate: AI agent autonomously controls the browser to complete tasks
# Supports Anthropic (Claude) and OpenAI (GPT) models
# Requires: OPENCLI_PROVIDER, OPENCLI_API_KEY, optionally OPENCLI_MODEL, OPENCLI_BASE_URL
opencli operate "go to HN and extract the top 5 stories"
opencli operate --url https://github.com/trending "extract top 3 repos"
opencli operate -v "fill the form with test data" # verbose: see each step
opencli operate --save-as hn/top "get top HN stories" # save as reusable skill
opencli operate --screenshot "describe this page layout" # include screenshots for LLM
opencli operate --max-steps 20 "quick task" # limit step count
# After --save-as, the skill runs without AI:
# opencli hn top

# Strategy Cascade: auto-probe PUBLIC → COOKIE → HEADER
opencli cascade <api-url>

Expand Down
1 change: 1 addition & 0 deletions autoresearch/baseline.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
53/59
Loading
Loading