Adding "winapp ui" for UI automation#380

Merged
nmetulev merged 49 commits into main from nmetulev/ui-command
Apr 8, 2026

@nmetulev nmetulev commented Apr 2, 2026

Description

Adds a new winapp ui command group that enables UI automation for Windows apps using Microsoft UI Automation (UIA). This allows
developers and AI agents to programmatically discover, inspect, and interact with UI elements across any Windows application.

Subcommands

| Command | Description |
| --- | --- |
| ui list-windows | List visible windows for an app |
| ui status | Connect to an app and print session info |
| ui inspect | Dump the UI tree (or ancestors/subtree) |
| ui search | Find elements by text query |
| ui invoke | Activate an element via UIA patterns (Invoke, Toggle, Select, ExpandCollapse) |
| ui click | Click via mouse input at screen coordinates (fallback for non-invocable controls) |
| ui focus | Move keyboard focus to an element |
| ui set-value | Set text/value on editable controls |
| ui get-property | Read UIA property values from an element |
| ui get-focused | Report the currently focused element |
| ui screenshot | Save a PNG of a window or element |
| ui scroll | Scroll a container (up/down/left/right/top/bottom) |
| ui scroll-into-view | Scroll a target element into view |
| ui wait-for | Wait for element appearance, disappearance, or property change |

Key design decisions

  • Semantic slug selectors — inspect and search output stable, shell-safe element slugs (e.g., btn-close-d1a2) that can be passed
    directly to other commands
  • Flexible app targeting — Target apps by process name, window title, PID, or HWND (--window for stability)
  • Invokable ancestor fallback — search and invoke can surface/use a parent element that supports invoke when the matched child
    does not
  • Mouse input fallback — click uses real mouse injection (SetCursorPos + SendInput) for controls that don't expose UIA invoke
    patterns
  • Testable architecture — UIA and session services are behind interfaces with fakes for unit testing
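
The slug format above (a type prefix, a readable name, and a short suffix, as in btn-close-d1a2) can be illustrated with a minimal sketch. The PR text does not show how SlugGenerator derives its slugs, so everything here is an assumption: the prefix table, the `runtime_path` input, and the use of a truncated SHA-256 digest as the suffix are all hypothetical.

```python
import hashlib
import re

# Hypothetical control-type prefixes; the real mapping is not shown in this PR.
PREFIXES = {"Button": "btn", "Edit": "txt", "CheckBox": "chk"}


def make_slug(control_type: str, name: str, runtime_path: str) -> str:
    """Build a shell-safe slug shaped like 'btn-close-d1a2' (illustrative only)."""
    prefix = PREFIXES.get(control_type, "el")
    # Keep only lowercase letters and digits so the slug needs no shell quoting.
    words = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-") or "unnamed"
    # Assumption: a short hash of some stable path separates same-named elements.
    digest = hashlib.sha256(runtime_path.encode("utf-8")).hexdigest()[:4]
    return f"{prefix}-{words}-{digest}"


def looks_like_slug(selector: str) -> bool:
    """Guess whether a selector is a slug rather than a free-text query."""
    return re.fullmatch(r"[a-z]+(-[a-z0-9]+)+-[0-9a-f]{4}", selector) is not None
```

Because the suffix is derived from the inputs rather than from a counter, the same element yields the same slug on every run, which is the property that makes a slug reusable across commands.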

Example usage

winapp ui inspect --app notepad
winapp ui search "Save" --app myapp
winapp ui invoke btn-save-d1a2 --app myapp
winapp ui screenshot --app myapp --output screen.png
winapp ui wait-for btn-done-f3c1 --app myapp --timeout 10

Type of Change

  • ✨ New feature

Checklist

  • New tests added for new functionality (if applicable)
  • Tested locally on Windows
  • Main README.md updated (if applicable)
  • docs/usage.md updated (if CLI commands changed)
  • Language-specific guides updated (if applicable)
  • Sample projects updated to reflect changes (if applicable)
  • Agent skill templates updated in docs/fragments/skills/ (if CLI commands/workflows changed)

nmetulev and others added 11 commits April 2, 2026 15:25
…374)

## Description

- Add `winapp ui click` command that uses `SendInput` mouse simulation to click UI elements
- Supports `--double` (double-click) and `--right` (right-click) options
- Solves the problem where `ui invoke` fails on controls that don't support `InvokePattern` (e.g., column headers, list items)

## Usage Example
Single click on a control that doesn't support InvokePattern

winapp ui click btn-column1-a3f2 -a myapp               # single click by slug
winapp ui click "Column1" -a myapp                      # single click by text search
winapp ui click btn-column1-a3f2 -a myapp --double      # double-click
winapp ui click btn-column1-a3f2 -a myapp --right       # right-click

## Type of Change

- ✨ New feature

## Checklist

- [x] Tested locally on Windows
- [x] [docs/usage.md](../docs/usage.md) updated (if CLI commands changed)

Copilot AI left a comment

Pull request overview

Adds a new winapp ui command group to the WinApp CLI (and the npm wrapper/docs) for cross-app UI automation via Microsoft UI Automation (UIA), including inspection, search, invocation, screenshots, scrolling, focus, and waiting.

Changes:

  • Introduces winapp ui commands + shared options, registers them in DI/command routing, and adds UIA-backed services/models/helpers.
  • Adds npm API wrappers (uiClick, uiInspect, uiSearch, etc.) and expands documentation (usage, dedicated UI automation doc, skill templates/schema).
  • Adds unit tests with fake UI services for the new command handlers.

Reviewed changes

Copilot reviewed 46 out of 46 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| src/winapp-npm/src/winapp-commands.ts | Adds npm wrapper functions/options for winapp ui subcommands. |
| src/winapp-CLI/WinApp.Cli/WinApp.Cli.csproj | Enables app manifest + unsafe blocks for UIA/Win32 interop. |
| src/winapp-CLI/WinApp.Cli/Services/UiSessionService.cs | Implements app/window resolution to create UI automation sessions. |
| src/winapp-CLI/WinApp.Cli/Services/SlugGenerator.cs | Adds semantic slug generation/parsing for stable element selectors. |
| src/winapp-CLI/WinApp.Cli/Services/SelectorService.cs | Parses selector strings into either slug or query. |
| src/winapp-CLI/WinApp.Cli/Services/IUiSessionService.cs | Defines UI session service contract. |
| src/winapp-CLI/WinApp.Cli/Services/IUiAutomationService.cs | Defines UIA operations surface (inspect/search/invoke/etc.). |
| src/winapp-CLI/WinApp.Cli/Services/ISelectorService.cs | Defines selector parsing contract. |
| src/winapp-CLI/WinApp.Cli/NativeMethods.txt | Adds CsWin32 UIA + Win32 input/screenshot APIs. |
| src/winapp-CLI/WinApp.Cli/Models/UiSessionInfo.cs | Adds session model for PID/HWND targeting. |
| src/winapp-CLI/WinApp.Cli/Models/UiElement.cs | Adds UI element model including selector + pattern state fields. |
| src/winapp-CLI/WinApp.Cli/Models/SelectorExpression.cs | Adds selector representation (slug vs query). |
| src/winapp-CLI/WinApp.Cli/Helpers/UiJsonContext.cs | Adds source-generated JSON context + --json output models. |
| src/winapp-CLI/WinApp.Cli/Helpers/MouseInput.cs | Adds SendInput-based mouse click helper for ui click. |
| src/winapp-CLI/WinApp.Cli/Helpers/HostBuilderExtensions.cs | Registers UI services and commands in DI + command pipeline. |
| src/winapp-CLI/WinApp.Cli/Commands/WinAppRootCommand.cs | Adds ui to root subcommands + help grouping. |
| src/winapp-CLI/WinApp.Cli/Commands/UiWaitForCommand.cs | Adds ui wait-for command. |
| src/winapp-CLI/WinApp.Cli/Commands/UiStatusCommand.cs | Adds ui status command. |
| src/winapp-CLI/WinApp.Cli/Commands/UiSetValueCommand.cs | Adds ui set-value command. |
| src/winapp-CLI/WinApp.Cli/Commands/UiSearchCommand.cs | Adds ui search command. |
| src/winapp-CLI/WinApp.Cli/Commands/UiScrollIntoViewCommand.cs | Adds ui scroll-into-view command. |
| src/winapp-CLI/WinApp.Cli/Commands/UiScrollCommand.cs | Adds ui scroll command. |
| src/winapp-CLI/WinApp.Cli/Commands/UiScreenshotCommand.cs | Adds ui screenshot command + PNG encoding path. |
| src/winapp-CLI/WinApp.Cli/Commands/UiListWindowsCommand.cs | Adds ui list-windows command. |
| src/winapp-CLI/WinApp.Cli/Commands/UiInvokeCommand.cs | Adds ui invoke command. |
| src/winapp-CLI/WinApp.Cli/Commands/UiInspectCommand.cs | Adds ui inspect command. |
| src/winapp-CLI/WinApp.Cli/Commands/UiGetPropertyCommand.cs | Adds ui get-property command. |
| src/winapp-CLI/WinApp.Cli/Commands/UiGetFocusedCommand.cs | Adds ui get-focused command. |
| src/winapp-CLI/WinApp.Cli/Commands/UiFocusCommand.cs | Adds ui focus command. |
| src/winapp-CLI/WinApp.Cli/Commands/UiCommand.cs | Adds ui command group and wires subcommands. |
| src/winapp-CLI/WinApp.Cli/Commands/UiClickCommand.cs | Adds ui click command using mouse simulation. |
| src/winapp-CLI/WinApp.Cli/Commands/SharedUiOptions.cs | Adds shared options/arguments used across ui subcommands. |
| src/winapp-CLI/WinApp.Cli/app.manifest | Adds PerMonitorV2 DPI awareness for accurate coordinates/screenshot behavior. |
| src/winapp-CLI/WinApp.Cli.Tests/UiCommandTests.cs | Adds handler-level tests for ui subcommands using fakes. |
| src/winapp-CLI/WinApp.Cli.Tests/SelectorServiceTests.cs | Adds tests for slug-vs-query selector parsing. |
| src/winapp-CLI/WinApp.Cli.Tests/FakeUiServices.cs | Adds fake UIA + session services for unit tests. |
| scripts/generate-llm-docs.ps1 | Adds ui-automation skill mapping + generation inputs. |
| llms.txt | Documents new ui-automation skill presence. |
| docs/usage.md | Adds ui section to general CLI usage docs. |
| docs/ui-automation.md | Adds full UI automation documentation page. |
| docs/npm-usage.md | Documents new npm wrapper APIs/options. |
| docs/fragments/skills/winapp-cli/ui-automation.md | Adds hand-authored skill template content for UI automation. |
| docs/cli-schema.json | Adds CLI schema entries for the new ui command group. |
| .github/plugin/skills/winapp-cli/ui-automation/SKILL.md | Adds generated Copilot skill doc for UI automation. |
| .github/plugin/agents/winapp.agent.md | Expands agent scope/docs to include UI automation workflows. |

Comment thread src/winapp-CLI/WinApp.Cli/Commands/SharedUiOptions.cs
Comment thread src/winapp-CLI/WinApp.Cli/Services/SlugGenerator.cs
Comment thread src/winapp-CLI/WinApp.Cli/Services/SlugGenerator.cs
Comment thread src/winapp-CLI/WinApp.Cli/Services/ISelectorService.cs
Comment thread docs/ui-automation.md Outdated
Comment thread docs/usage.md Outdated
Comment thread src/winapp-CLI/WinApp.Cli/Commands/UiWaitForCommand.cs
Comment thread src/winapp-CLI/WinApp.Cli/Commands/UiListWindowsCommand.cs Outdated
Comment thread src/winapp-CLI/WinApp.Cli/Services/IUiSessionService.cs
Comment thread docs/usage.md
github-actions bot commented Apr 2, 2026

Build Metrics Report

Binary Sizes

| Artifact | Baseline | Current | Delta |
| --- | --- | --- | --- |
| CLI (ARM64) | 28.46 MB | 29.22 MB | 📈 +773.5 KB (+2.65%) |
| CLI (x64) | 28.92 MB | 29.67 MB | 📈 +766.5 KB (+2.59%) |
| MSIX (ARM64) | 12.05 MB | 12.31 MB | 📈 +266.9 KB (+2.16%) |
| MSIX (x64) | 12.81 MB | 13.09 MB | 📈 +287.5 KB (+2.19%) |
| NPM Package | 25.07 MB | 25.63 MB | 📈 +566.0 KB (+2.20%) |
| NuGet Package | 25.16 MB | 25.70 MB | 📈 +554.8 KB (+2.15%) |

Test Results

690 passed out of 690 tests in 327.3s (+28 tests, -10.6s vs. baseline)

Test Coverage

20.9% line coverage, 37% branch coverage · ⚠️ -10.0% vs. baseline

CLI Startup Time

39ms median (x64, winapp --version) · ✅ +9ms vs. baseline


Updated 2026-04-08 06:20:35 UTC · commit f5b9dc6 · workflow run

nmetulev and others added 15 commits April 2, 2026 16:04
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…dance

Phase 1 of UI automation improvements based on agent feedback:

- set-value: remove --text flag, make value a positional argument
  New syntax: winapp ui set-value <selector> <value> -a <app>
  Eliminates the #1 syntax confusion (8+ agent trials wasted)

- Create Helpers/UiErrors.cs with standardized error templates
  applied consistently across all 14 Ui*Command.cs files:
  MissingApp, MissingSelector, ElementNotFound, StaleElement, GenericError

- Error messages now recommend AutomationId for stable targeting

- Update all docs: ui-automation.md, usage.md, npm-usage.md,
  cli-schema.json (regenerated), SKILL.md (regenerated),
  agent.md, skill fragment template

- Update npm wrapper: text property -> value property

- Update tests for new positional syntax

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
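
For callers that script the CLI, the new positional syntax from the commit above (`winapp ui set-value <selector> <value> -a <app>`) could be assembled as shown in this small sketch. The helper name `set_value_argv` is hypothetical and is not part of the npm wrapper or the CLI.

```python
from typing import List


def set_value_argv(selector: str, value: str, app: str) -> List[str]:
    """Argument list for: winapp ui set-value <selector> <value> -a <app>."""
    # The value is the second positional argument; the old --text flag is gone.
    return ["winapp", "ui", "set-value", selector, value, "-a", app]
```

Passing the list to `subprocess.run(argv, check=True)` avoids shell quoting issues when the value contains spaces.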
Promote unique AutomationIds to selectors in inspect/search output:
- After building tree, FindAll(TrueCondition) checks AutomationId
  uniqueness across the full UIA tree
- Unique AutomationIds replace generated slugs as selectors
- Adds exact AutomationId match in FindSingleElementAsync for
  faster, unambiguous resolution

Improve inspect output clarity:
- Wrap selectors in [brackets] for visual distinction
- Add 2-line header explaining format and selector types
- Skip quoted name when it equals the selector (no redundancy)
- Apply brackets consistently in inspect, search, get-focused

Update docs: selectors section rewritten to explain AutomationId
vs slug selectors, new inspect output format documented.
Regenerate cli-schema.json and SKILL.md files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
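
The promotion rule in the commit above (an AutomationId becomes the selector only when it is unique across the tree, otherwise the generated slug is kept) might look roughly like this. The `Element` type and its field names are illustrative assumptions, not the CLI's UiElement model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Element:
    automation_id: Optional[str]  # None or "" when the control sets no AutomationId
    slug: str                     # generated fallback selector


def assign_selectors(elements: List[Element]) -> List[str]:
    """Use an AutomationId as the selector only when it is unique in the tree."""
    counts = Counter(e.automation_id for e in elements if e.automation_id)
    return [
        e.automation_id if e.automation_id and counts[e.automation_id] == 1 else e.slug
        for e in elements
    ]
```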
Replace brackets with colored output for visual clarity:
- Selector: bold cyan (most important, what to copy)
- Name: green (display text)
- Value: yellow (editable content)
- State/bounds: gray (secondary metadata)

Move legend from header to footer, merged with element count:
  'Found 10 elements (depth 3). Use the first word as selector,
   e.g.: winapp ui invoke TabView -a terminal'

Footer dynamically picks first interactive element for the example.
Drop brackets to avoid agents copying them as part of the selector.
Apply same color scheme to inspect, search, and get-focused output.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Spectre.Console uses [[ and ]] to escape literal brackets in markup,
not backslash. Fixes 'Could not find color or style' crash when
elements have [collapsed], [disabled], [scroll:v] etc. state markers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
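
The doubling rule from the commit above is easy to get wrong, so here is a minimal sketch of it in Python. It only illustrates the escaping convention (`[` becomes `[[`, `]` becomes `]]`); the function name is hypothetical and this is not the CLI's code.

```python
def escape_markup(text: str) -> str:
    """Double square brackets so they render literally in Spectre-style markup."""
    return text.replace("[", "[[").replace("]", "]]")
```

With this rule a state marker such as `[collapsed]` is emitted as `[[collapsed]]` and is no longer parsed as a style name.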
When --app matches multiple windows, auto-select the best window
and proceed with a warning instead of throwing a blocking error.
This eliminates a wasted tool call for agents.

Selection heuristic:
- Prefer the foreground window (GetForegroundWindow)
- Fall back to the largest window (GetWindowRect area)

Warning output shows all windows with metadata:
- Label: window/popup/dialog (from Win32 class name)
- Size: from GetWindowRect
- Foreground marker
- Owner HWND: from GetWindow(GW_OWNER) — works across processes,
  linking WinUI 3 file pickers (PickerHost.exe) to parent app
- Win32 class name in brackets (debug info)

Also update list-windows command to show same metadata per window.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
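
The selection heuristic above (foreground window first, largest window otherwise) can be sketched as follows. This assumes the candidate list and the foreground HWND have already been gathered, for example via GetForegroundWindow and GetWindowRect as the commit describes. The `Window` type is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Window:
    hwnd: int
    width: int
    height: int


def pick_window(windows: List[Window], foreground_hwnd: Optional[int]) -> Window:
    """Prefer the foreground window; otherwise fall back to the largest by area."""
    if not windows:
        raise ValueError("no candidate windows")
    for w in windows:
        if w.hwnd == foreground_hwnd:
            return w
    return max(windows, key=lambda w: w.width * w.height)
```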
When --app matches multiple windows, auto-select the best window
and proceed with a warning instead of throwing a blocking error.
This eliminates a wasted tool call for agents.

Selection heuristic:
- Prefer the foreground window (GetForegroundWindow)
- Fall back to the largest window (GetWindowRect area)

Warning output shows all windows with colored metadata:
- HWND in cyan, selected line bold with (selected) label
- Label: window/popup/dialog (from Win32 class name)
- Size: from GetWindowRect
- Foreground marker in green
- Owner HWND: from GetWindow(GW_OWNER) — works across processes,
  linking WinUI 3 file pickers (PickerHost.exe) to parent app
- Win32 class name in brackets (debug info)

Also update list-windows command to show same colored metadata.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace manual bracket escaping with Spectre's built-in
Markup.Escape() which handles all special characters.
Fixes broken output when element names, values, or window
titles contain characters that Spectre interprets as markup.

Applied to: UiInspectCommand, UiSearchCommand, UiGetFocusedCommand,
UiListWindowsCommand, UiSessionService.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
nmetulev and others added 23 commits April 6, 2026 22:15
Element values from UIA can contain carriage returns, newlines,
and tabs (e.g., Notepad's Document element value='terminal\r').
These break Spectre MarkupLine rendering mid-line.

Replace \r\n, \r, \n with ↵ and \t with → for display.
Applied to both inspect and search output via EscapeMarkup.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
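
The replacement order matters in the sanitization described above: handling `\r\n` before the single characters keeps a Windows line ending from turning into two symbols. A minimal sketch, with a hypothetical function name:

```python
def sanitize_for_display(value: str) -> str:
    """Swap line breaks and tabs for visible symbols so output stays on one line."""
    # Replace "\r\n" first so a Windows line ending becomes one symbol, not two.
    for newline in ("\r\n", "\r", "\n"):
        value = value.replace(newline, "↵")
    return value.replace("\t", "→")
```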
Remove the WinUI 3 content-pane drill-down heuristic from
GetRootElement. The heuristic picked the largest child Pane
as the root, which skipped menu bars, title bars, status bars,
and other chrome elements. For Notepad, this showed only 2
elements (editor pane + document) instead of the full tree.

Now ElementFromHandle returns the window element directly,
giving the complete UI tree including menus and chrome.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When screenshot targets an app with multiple windows (e.g., app +
open dialog), capture each window to a separate file:
  screenshot.png — main window (largest)
  screenshot-dialog-1.png — dialog/popup

Cross-process dialog detection via GetWindow(GW_OWNER) finds file
pickers in PickerHost.exe and other owned windows.

Single window behavior unchanged. Direct -w targeting unchanged.
Element cropping (selector arg) uses single-window path.

JSON output returns array of UiScreenshotResult when multiple
windows are captured.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
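
The file naming shown above (screenshot.png for the main window, screenshot-dialog-1.png for a dialog) could be derived from the requested output path as in this sketch. The commit only shows the `-dialog-1` name, so the numbering of further dialogs is an assumption, as is the function name.

```python
from pathlib import PurePath
from typing import List


def screenshot_paths(output: str, window_count: int) -> List[str]:
    """First name is the main window; the rest get a '-dialog-N' suffix."""
    path = PurePath(output)
    names = [str(path)]
    for n in range(1, window_count):
        # Assumption: additional windows are numbered 1, 2, ... in capture order.
        names.append(str(path.with_name(f"{path.stem}-dialog-{n}{path.suffix}")))
    return names
```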
set-value, focus, scroll-into-view, and get-property were logging
the internal sequential ID (e0, e1) instead of the selector
(AutomationId or slug). Now uses element.Selector ?? element.Id
consistently, matching invoke and click commands.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
New command: winapp ui get-text <selector> -a <app>

Reads text from elements using 3-tier fallback:
1. TextPattern.DocumentRange.GetText(-1) — full text from
   RichEditBox/Document controls (not accessible via ValuePattern)
2. ValuePattern.CurrentValue — simple text from TextBox/ComboBox
3. element.Name — display text from labels/static elements

This fills a real gap: RichEditBox controls expose text only via
TextPattern, which get-property doesn't query. Agents previously
had no way to read rich text content.

Registered in UiCommand, HostBuilderExtensions, UiJsonContext.
Added GetTextAsync to IUiAutomationService + implementation.
Added IUIAutomationTextPattern/IUIAutomationTextRange to CsWin32.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
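
The 3-tier fallback above can be modeled without any UIA calls. In this sketch `None` stands for "pattern not supported by the element"; whether the real GetTextAsync also falls through on empty strings is not stated in the commit, so that behavior is left out. The `ElementText` type is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ElementText:
    text_pattern: Optional[str] = None   # TextPattern.DocumentRange.GetText(-1)
    value_pattern: Optional[str] = None  # ValuePattern.CurrentValue
    name: str = ""                       # element.Name


def read_text(el: ElementText) -> str:
    """Return the first available source: TextPattern, then ValuePattern, then Name."""
    if el.text_pattern is not None:
        return el.text_pattern
    if el.value_pattern is not None:
        return el.value_pattern
    return el.name
```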
Docs:
- Add get-text command to ui-automation.md, usage.md, skill fragment
- Update screenshot section with multi-window dialog example
- Add get-text to skill command map in generate-llm-docs.ps1
- Add ui click to skill command map (was missing)
- Regenerate cli-schema.json and SKILL.md

Tests:
- Add GetText_ReturnsText test (JSON output)
- Add GetText_WithoutSelector_ReturnsError test
- Total: 18 tests (was 16)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Property values from UIA (e.g., Document text content) can contain
newlines and carriage returns that break the single-line property
listing. Replace control chars with visual symbols (↵ for newlines,
→ for tabs), matching the sanitization already in inspect output.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Rename the command from 'get-text' to 'get-value' to pair
naturally with 'set-value'. Broader meaning covers text content,
slider values, and any element's current value.

Same 3-tier fallback: TextPattern → ValuePattern → Name.

Updated: command file, UiCommand, HostBuilderExtensions,
UiJsonContext, tests, all docs, skill command map, regenerated
cli-schema.json and SKILL.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- samples/winui-app: WinUI 3 sample app with testable controls
  (counter button, text input, checkbox, submit flow) and
  AutomationProperties for UI automation
- scripts/test-e2e-winui-ui.ps1: E2E test script that exercises
  all winapp ui commands (35 tests, ~6s) using --json assertions
  and wait-for property value checks. Validates screenshots are
  non-empty.
- .github/workflows/build-package.yml: Added e2e-test-ui job
- Fix get-property --json: Changed UiPropertyResult.Properties from
  Dictionary<string, object?> to Dictionary<string, string?> since
  source-gen JSON serialization can't handle object? in NativeAOT

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
On CI the first-run marker doesn't exist, so the banner fires on
the first command. Run --version first to create the marker file,
ensuring subsequent --json commands get clean stdout.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- TreatUnmatchedTokensAsErrors=true on root command: unknown options
  like --text now error instead of being silently ignored
- search returns exit code 1 when no matches found (was always 0)
- scroll command: added --json support with UiScrollResult model,
  making all 14 ui commands support --json

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
On CI, the first run may produce extra output (banner, error JSON)
before the actual launch JSON. Use regex to extract the JSON object
containing ProcessId instead of assuming clean single-object stdout.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
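
The extraction described above can be sketched like this: scan noisy stdout for JSON objects and keep the first one that has a ProcessId key. The E2E script itself is PowerShell and its actual regex is not shown, so this is only an illustration. It assumes the launch object is flat (no nested braces), and the other field names in the usage below are made up.

```python
import json
import re
from typing import Optional


def extract_launch_json(stdout: str) -> Optional[dict]:
    """Find the first flat JSON object in noisy output that has a ProcessId key."""
    # Assumes no nested braces; nested output would need a real JSON scanner.
    for match in re.finditer(r"\{[^{}]*\}", stdout):
        try:
            candidate = json.loads(match.group(0))
        except json.JSONDecodeError:
            continue
        if "ProcessId" in candidate:
            return candidate
    return None
```

For example, given a banner line, an error object, and then `{"ProcessId": 1234}`, the function skips the first two and returns the last.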
Logs --version warmup output, first-run marker status, and raw
stdout/stderr from winapp run --detach --json line by line.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. Push-Location to sample dir before winapp run so that
   FetchDotNetPackageListAsync can find the .csproj and resolve
   the WinAppSDK runtime package for installation.
2. Remove duplicate PrintJson in RunCommand error path —
   StatusService.ExecuteWithStatusAsync already writes JSON error
   output when --json is active.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
StatusService and RunCommand both write JSON errors - this is by
design since StatusService writes generic format and RunCommand
writes command-specific format. The e2e script handles multiple
JSON objects via regex extraction.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
StatusService was writing its own JsonErrorOutput when a task returned
non-zero in --json mode, then the command handler (e.g. RunCommand)
would also write its own JSON error — producing two JSON objects on
stdout.

Fix: StatusService no longer writes JSON for handled task errors
(non-zero return code). It only writes JSON for unhandled exceptions
where no command handler will get a chance to respond. Command
handlers own their JSON output schema.

690/690 unit tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CI passes a relative path like 'artifacts/cli/win-x64/winapp.exe'.
After Push-Location to the sample dir, the relative path breaks.
Resolve-Path at startup ensures it works from any directory.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
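
The same pitfall exists in any scripting language: a relative path must be made absolute before the working directory changes. A Python analogue of the Resolve-Path fix above, with a hypothetical function name:

```python
import os
from pathlib import Path


def run_from(cli_path: str, sample_dir: str) -> Path:
    """Make the CLI path absolute, then switch to the sample directory."""
    # Resolve against the starting directory first...
    cli = Path(os.path.abspath(cli_path))
    # ...so the path still points at the CLI after the working directory changes.
    os.chdir(sample_dir)
    return cli
```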
@nmetulev nmetulev merged commit e6868f3 into main Apr 8, 2026
12 checks passed
@nmetulev nmetulev deleted the nmetulev/ui-command branch April 8, 2026 18:23