peekaboo see
peekaboo see captures the current macOS UI, extracts accessibility metadata, and (optionally) saves annotated screenshots. CLI and agent flows rely on these UI maps to find element IDs (elem_123), bounds, labels, and snapshot IDs.
# Capture frontmost window, print JSON, and save an annotated PNG
peekaboo see --json --annotate --path /tmp/see.png
# Target a specific app or window title
peekaboo see --app "Google Chrome" --window-title "Login"
#When to use
- Before issuing
click/typecommands so you have stable element IDs. - When debugging automation failures—
--jsonincludes raw bounds, labels, and snapshot IDs. - To snapshot UI regressions (pass
--annotate+--path).
#Key options
| Flag | Description | |||
|---|---|---|---|---|
--app, --window-title, --pid | Limit capture to a known app/window/process. | |||
| `--mode screen | window | frontmost | multi` | Override the auto target picker. area is intentionally rejected because see does not expose rectangle coordinates; use peekaboo image --mode area --region x,y,width,height for raw region screenshots. |
--annotate | Overlay element bounds/IDs on the output image. | |||
--path <file> | Save the screenshot/annotation to disk. | |||
--json | Emit structured metadata (recommended for scripting). | |||
--menubar | Capture menu bar popovers via window list + OCR (useful for status-item settings panels). When --app is set, the app name is used as an OCR hint for popover selection. | |||
--timeout-seconds <seconds> | Increase overall timeout for large/complex windows (defaults to 20s, or 60s with --analyze). | |||
--no-web-focus | Skip the automatic web-content focus retry (useful if the page reacts badly to synthetic clicks). |
Note: --app menubar captures only the menu bar strip; --menubar attempts to find the active popover and OCR its text.
#Automatic web focus fallback (Nov 2025)
Modern browsers sometimes keep keyboard focus in the omnibox, which means embedded login forms (Instagram, Facebook, etc.) never expose their AXTextField nodes to accessibility clients. Starting November 2025:
peekaboo seeperforms a normal accessibility traversal.- If zero text fields are detected, the command locates the dominant
AXWebArea(or equivalent) inside the target window and performs a syntheticAXPress. - The traversal runs one more time. If the web view exposes its inputs after gaining focus, they now appear in the JSON output.
This fallback only runs inside the resolved window (it won’t hop between windows) and logs a debug entry when it fires. If you need to disable it for a specialized flow, run see inside a different window or manually focus the desired element first.
#JSON output primer
When --json is supplied, the CLI prints:
snapshot_id– reference for subsequentclick --snapshot …andtype --snapshot ….ui_map– path to the persisted snapshot file (~/.peekaboo/snapshots/<id>/snapshot.json).ui_elements– flattened list of actionable nodes (buttons, text fields, links, etc.).interactable_count,element_count,capture_mode, and performance metadata for debugging.- Each
ui_elements[n]entry now mirrors the raw AX metadata we capture—title,label,description,role_description,help,identifier, and the keyboard shortcut if one exists. That makes Chrome toolbar icons (which frequently hide their name inAXDescription) searchable without relying on coordinates. - GLM vision model analysis responses are converted from the model's 0-1000 bounding box coordinate space into screenshot pixel coordinates before they are printed, so follow-up
click --coordscalls can use returned box centers directly.
Use jq or any JSON parser to find elements:
peekaboo see --app "Safari" --json \
| jq '.data.ui_elements[] | select(.label | test("Sign in"; "i"))'
# Toolbar buttons that only expose AXDescription:
peekaboo see --app "Google Chrome" --json \
| jq '.data.ui_elements[] | select((.description // "") | test("Wingman"; "i"))'
#Troubleshooting tips
- If the CLI reports blind typing, re-run
seewith--app <Name>so we can autofocus the app before typing. - Missing text fields after the fallback usually means the page is shielding its inputs from AX entirely; in that case rely on the Browser MCP DOM or image-based hit tests.
- For repeatable local tests, run
RUN_LOCAL_TESTS=true swift test --filter SeeCommandPlaygroundTeststo exercise the Playground fixtures mentioned indocs/research/interaction-debugging.md. - Rapid repeated
seecalls for the same window reuse a short-lived AX cache (~1.5s); wait a beat if you need a fully fresh traversal.
#Smart label placement (--annotate)
- The
SmartLabelPlacergenerates external label candidates (above/below/sides/corners) for each element, filters out overlaps/out-of-bounds positions, then scores remaining spots viaAcceleratedTextDetector.scoreRegionForLabelPlacementto prefer calm regions. Internal placements are a last-resort fallback. - Edge-aware scoring samples a padded rectangle (6 px halo, clamped to the image) so the chosen region stays clean once text is drawn; above/below placements get slight bonuses to reduce sideways clutter.
- Preferred orientations nudge horizontally tight elements toward vertical labels when scores tie.
- Tests:
Apps/CLI/Tests/CoreCLITests/SmartLabelPlacerTests.swift(run withswift test --package-path Apps/CLI --filter SmartLabelPlacerTests). - Manual validation:
peekaboo see --app Playground --annotate --path /tmp/see.png --jsonthen inspect the annotated PNG; if labels cover dense UI, capture the repro and adjust padding/scoring before committing.