Automation

Peekaboo's automation surface is small but covers the whole macOS UI graph. Each command is documented separately under commands/; this page is the map.

#Targeting model

Every input command accepts one of three target shapes:

  • Element ID: --id E12 (from peekaboo see); the most reliable.
  • Label / role / app: positional query text such as peekaboo click "Send" --app Mail; resolved via the AX tree.
  • Coordinates: --coords 480,120; target-relative when paired with --app, --pid, or --window-*, global otherwise. Add --global-coords to force screen coordinates with a target.

Prefer IDs when you can capture them, labels when you can't, and coordinates only as a last resort. The agent and MCP tooling default to the first two.
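
The preference order above can be sketched as a small shell helper. This is only an illustration: click_cmd is a hypothetical function, not part of Peekaboo, and it prints the command it would choose instead of running it. It uses only the target flags mentioned on this page (--id, a positional label with --app, and --coords).

```shell
#!/bin/sh
# Hypothetical helper (not part of Peekaboo): print the `peekaboo click`
# command for the most reliable target that is available.
# Usage: click_cmd APP [ELEMENT_ID] [LABEL] [COORDS]
click_cmd() {
  app=$1 id=$2 label=$3 coords=$4
  if [ -n "$id" ]; then
    # 1. Element ID captured from `peekaboo see`: the most reliable.
    printf 'peekaboo click --id %s\n' "$id"
  elif [ -n "$label" ]; then
    # 2. Label query, resolved via the AX tree.
    printf 'peekaboo click "%s" --app %s\n' "$label" "$app"
  elif [ -n "$coords" ]; then
    # 3. Coordinates as a last resort; target-relative because --app is set.
    printf 'peekaboo click --coords %s --app %s\n' "$coords" "$app"
  else
    echo "click_cmd: no target given" >&2
    return 1
  fi
}
```

For example, click_cmd Mail "" Send 480,120 prints the label form, because no element ID was supplied and a label outranks coordinates.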

#Delivery modes

Peekaboo has two input delivery modes:

  • Background (default when a target process is known) posts process-targeted input without activating the app. click, type, press, hotkey, and paste use this mode when you pass --app, --pid, --window-id, or a snapshot with process metadata.
  • Foreground focuses the target first, then sends normal/global input to the active key window or mouse focus. Add --foreground when an app ignores background input, when a text field only accepts key-window input, or when you want focus/Space switching to be part of the action.

Focus flags such as --space-switch, --bring-to-current-space, and --no-auto-focus belong to foreground delivery; using them implies --foreground. Background element/query clicks can complete through Accessibility alone. Keyboard input, coordinate clicks, and synthetic click fallback require Event Synthesizing for the sender shown by peekaboo permissions status; request it with peekaboo permissions request-event-synthesizing.

Examples:

# Background: target Safari without activating it
peekaboo hotkey cmd,l --app Safari
peekaboo type "github.com/openclaw/Peekaboo" --app Safari --return

# Foreground: activate/focus first for apps that require a key window
peekaboo hotkey cmd,l --app Safari --foreground --space-switch
peekaboo type "github.com/openclaw/Peekaboo" --app Safari --return --foreground
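
A script can try background delivery first and fall back to foreground only when needed. The sketch below rests on one assumption not confirmed on this page: that peekaboo exits non-zero when it cannot deliver input. send_input and PEEKABOO_BIN are hypothetical names introduced for illustration. Input that an app silently ignores will not be caught this way; verify with peekaboo see if that matters.

```shell
#!/bin/sh
# PEEKABOO_BIN is a hypothetical override so the helper can be pointed at a stub.
PEEKABOO_BIN=${PEEKABOO_BIN:-peekaboo}

# Hypothetical helper: run an input command in background mode first,
# then retry once with --foreground if the first attempt reports failure.
# Assumption: peekaboo signals a failed delivery with a non-zero exit status.
# Usage: send_input SUBCOMMAND ARGS...
send_input() {
  if "$PEEKABOO_BIN" "$@"; then
    return 0
  fi
  echo "background delivery failed; retrying with --foreground" >&2
  "$PEEKABOO_BIN" "$@" --foreground
}
```

With this in place, send_input hotkey cmd,l --app Safari steals focus only when the background attempt reports an error.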

#Input primitives

Command         Use it for
click           mouse clicks, double/triple, right/middle, hold
type            typing strings into targeted fields
press           individual key presses (return, escape, arrows, etc.)
hotkey          shortcut combos, including background apps
scroll          wheel scrolling at a point or on a target
drag            press, move, release: files, sliders, selections
swipe           trackpad-style multi-finger gestures
move            warp the mouse without clicking
set-value       write to text fields without typing
perform-action  trigger any AX action (AXPress, AXShowMenu, …)
sleep           wait between steps with deterministic timing

For UX parity with humans (jitter, easing, dwell), see human-typing.md and human-mouse-move.md.

#Surfaces

Surface                  Command     Notes
App lifecycle            app         launch, quit, focus, hide
Windows                  window      move, resize, focus, minimize, fullscreen
Spaces & Stage Manager   space       enumerate and switch Spaces
Menus                    menu        walk app menus by path
Menu bar / status items  menubar.md  extra-fiddly popovers
Dialogs                  dialog      sheets, alerts, save panels
Dock                     dock        inspect/click dock items
Clipboard                clipboard   read/write pasteboard contents
Open files / URLs        open        with focus controls
Visual feedback          visualizer  overlay so a human can follow what the agent is doing

#Recipe: click a button by label

# 1. Inspect first to find a stable label.
peekaboo see --app Safari --annotate --path safari.png

# 2. Click it.
peekaboo click "Reload" --app Safari

#Recipe: a small flow

peekaboo window focus --app "Notes"
peekaboo hotkey cmd+n
peekaboo type "Standup notes\n\n- Shipped Peekaboo docs\n- Reviewed PR #42\n"
peekaboo hotkey cmd+s

Three commands, four lines. The agent does the same thing under the hood; it just plans the sequence for you.

#Resilience tips

  • Re-run peekaboo see whenever an element is unreachable. The AX tree refreshes after focus changes, so capture again if a click fails.
  • Use focus and application-resolving for tricky cases (multiple windows, helper apps, processes that hide on activation).
  • Separate the steps of risky sequences with peekaboo sleep 0.2; humans don't fire ten clicks in a single frame, and neither should you.
  • Prefer background delivery for routine app-specific input so automations do not steal focus.
  • Add --foreground only when an app needs a focused key window, Space switch, or foreground mouse event.
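
The first and third tips combine naturally into a retry loop: click, and on failure re-capture with peekaboo see, pause, and try again. The following is a minimal sketch, assuming that peekaboo click exits non-zero when the element cannot be reached; click_with_retry and PEEKABOO_BIN are hypothetical names, not part of Peekaboo.

```shell
#!/bin/sh
# PEEKABOO_BIN is a hypothetical override so the helper can be pointed at a stub.
PEEKABOO_BIN=${PEEKABOO_BIN:-peekaboo}

# Hypothetical helper: click a label, re-capturing the UI between attempts.
# Assumption: `peekaboo click` exits non-zero when the click fails.
# Usage: click_with_retry APP LABEL [ATTEMPTS]   (ATTEMPTS defaults to 3)
click_with_retry() {
  app=$1 label=$2 attempts=${3:-3}
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$PEEKABOO_BIN" click "$label" --app "$app"; then
      return 0
    fi
    # The AX tree may have changed: capture again, then pause briefly.
    "$PEEKABOO_BIN" see --app "$app" >/dev/null 2>&1 || true
    "$PEEKABOO_BIN" sleep 0.2 || true
    n=$((n + 1))
  done
  echo "click_with_retry: gave up on \"$label\" after $attempts attempts" >&2
  return 1
}
```

For example, click_with_retry Safari Reload makes up to three attempts and stays in background delivery throughout, so it never steals focus.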

#Going further

  • Agent overview — let Peekaboo plan input sequences from a goal.
  • MCP — expose all of the above to Codex, Claude Code, and Cursor.
  • Architecture — how the input pipeline routes through Bridge and Daemon.