Using OmniScout with AI agents

Drop-in prompts and skill files for Claude Code, Cursor, Codex, and Kimi to drive OmniScout.

OmniScout is built for AI agents to drive. This page collects everything you need to make Claude Code, Cursor, Codex, Kimi, or any other shell-capable agent productive with OmniScout in under a minute.

0. Install OmniScout

pip install omniscout

1. Install the skill

omniscout install --skill

This copies SKILL.md (the agent-facing instructions) plus references/operations.md into the well-known agent skill locations:

| Agent | Skill location |
| --- | --- |
| Claude Code | ~/.claude/skills/scout/SKILL.md |
| Cursor | ~/.cursor/skills-cursor/scout/SKILL.md |
| Codex | ~/.codex/skills/scout/SKILL.md |
| Antigravity | ~/.gemini/config/skills/scout/SKILL.md |

The agent picks it up automatically on next session — no manual config.
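
To confirm the install landed where an agent will look, a small check like the following can help. It is a minimal sketch: the paths are copied from the locations listed above, while the function name `installed_skills` and its `home` parameter are illustrative and not part of OmniScout.

```python
from pathlib import Path

# Paths copied from the skill locations listed above.
SKILL_LOCATIONS = {
    "Claude Code": "~/.claude/skills/scout/SKILL.md",
    "Cursor": "~/.cursor/skills-cursor/scout/SKILL.md",
    "Codex": "~/.codex/skills/scout/SKILL.md",
    "Antigravity": "~/.gemini/config/skills/scout/SKILL.md",
}


def installed_skills(locations=SKILL_LOCATIONS, home=None):
    """Return {agent: bool} telling whether each SKILL.md exists on disk.

    `home` is a hypothetical override used for testing; by default the
    current user's home directory is used.
    """
    base = Path(home) if home is not None else Path.home()
    result = {}
    for agent, raw in locations.items():
        # Drop the leading "~/" and resolve against the chosen home directory.
        relative = raw[2:] if raw.startswith("~/") else raw
        result[agent] = (base / relative).is_file()
    return result
```

Calling `installed_skills()` after `omniscout install --skill` should report `True` for each agent whose skill directory was written.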

You can also just paste the contents of docs/agent-skill.md into any agent's system prompt if it doesn't have a skill mechanism.

2. Start the daemon (once)

omniscout daemon start

It's idempotent — running it again is a no-op. The daemon survives between agent sessions. Verify with omniscout daemon status.
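
For agents that wrap the CLI from Python, the start-then-verify step might look like the sketch below. It assumes that `omniscout daemon status` exits with code 0 when the daemon is running, which this page does not state; the `run` parameter is a hypothetical injection point so the helper can be exercised without the CLI installed.

```python
import subprocess


def ensure_daemon(run=subprocess.run):
    """Start the daemon, then report whether `daemon status` exits cleanly."""
    # `daemon start` is described as idempotent, so calling it
    # unconditionally is safe even when the daemon is already up.
    run(["omniscout", "daemon", "start"], check=True)
    # Assumption: a zero exit code from `daemon status` means "running".
    status = run(["omniscout", "daemon", "status"], check=False)
    return status.returncode == 0
```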

3. Drop a prompt

These prompts assume the skill is installed. Pick the closest match to your task; the agent will translate it into the right sequence of omniscout commands.

Ask an AI to build a knowledge graph

Build a knowledge graph for "Cursor" with OmniScout. Run
`omniscout graph "Cursor" --json --data`, show the tree, and cite sources.
If I give a website URL, use site-only mode (`omniscout graph "cursor.com"` or
`-w cursor.com`).

Ask an AI to research a question

Research "state of open-source browser-using AI agents in 2026" using
OmniScout. Run `omniscout research` with at least 8 results, then summarize the
findings in 3-5 bullet points with citations to source URLs.

Ask an AI to navigate and click

Use OmniScout to open https://news.ycombinator.com, snapshot the page, and
click the top story link. Then take a screenshot saved to /tmp/hn-top.png
and tell me the page title of the article that loaded.

Ask an AI to log in once and reuse

Help me set up a logged-in profile for GitHub via OmniScout. Run
`omniscout browser login https://github.com/login --profile work` to open a
headful window. Wait for me to authenticate in the browser, then I'll
tell you "done". After that, prove the profile works by running
`omniscout browser navigate https://github.com/settings --profile work` and
showing me the resulting page title.

Ask an AI to fill a form and submit

Use OmniScout to:
1. Open https://duckduckgo.com
2. Snapshot the page
3. Fill the search textbox with "local first AI agents"
4. Press Enter
5. Wait for the results to load (wait --idle)
6. Snapshot again and list the title of the first 5 results.

Ask an AI to capture network traffic

Use OmniScout to investigate what tracking calls a site makes. Start network
capture, navigate to https://example.com, scroll to the bottom, stop
capture, then list any captured requests whose URL matches
"analytics|google|facebook|stripe". For the matching ones, run `network
detail` and tell me the response status code.
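
The filtering step in this prompt is a plain regular-expression search over request URLs. A minimal sketch, assuming the agent has already pulled the captured URLs into a list of strings (the shape of the capture output is not documented here, and `matching_urls` is an illustrative name):

```python
import re

# Pattern taken verbatim from the prompt above.
TRACKER_PATTERN = re.compile(r"analytics|google|facebook|stripe")


def matching_urls(urls, pattern=TRACKER_PATTERN):
    """Keep only the URLs in which the pattern appears anywhere."""
    return [url for url in urls if pattern.search(url)]


# Hypothetical captured URLs, for illustration only.
captured = [
    "https://example.com/app.js",
    "https://www.google-analytics.com/collect",
    "https://js.stripe.com/v3/",
]
print(matching_urls(captured))
```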

Ask an AI to handle a CAPTCHA

Open https://site-with-captcha.example using OmniScout. If you detect a
CAPTCHA, run `omniscout browser captcha --detect-only` and tell me the
type. Then run `omniscout browser captcha` (no solver flag) so the tab
flips headful and pauses. I'll solve it manually; you continue once
detection comes back clean.

The default --solver none is the only fully local-first mode. The third-party solvers (2captcha, capsolver) send the sitekey and page URL to those services, so they cost money and that data leaves your machine. Opt in explicitly.

4. Multi-step agent loop (the killer use case)

The most useful agent loops combine several of the above in a single task. Here's a complete worked example you can hand to Claude Code verbatim:

You have OmniScout available via the `omniscout` CLI. Carry out this task end
to end:

1. Start the daemon if it isn't running.
2. Create a profile "research" if it doesn't exist.
3. Use OmniScout to research "vector databases benchmark 2026" with at least
   10 results. Save the report JSON to /tmp/research.json.
4. From the report, pick the 3 source URLs with the highest passage
   scores. For each one:
     a. Open it in OmniScout (profile=research).
     b. Take a screenshot to /tmp/<i>.png.
     c. Use `omniscout extract <url>` to get the clean Markdown.
     d. Save the markdown to /tmp/<i>.md.
5. Tell me the 3 source URLs and the path to each screenshot/markdown
   pair. Then summarize the *common* themes across the 3 articles in 5
   bullet points, citing which article each theme came from.
6. Close all OmniScout sessions when done.

The agent will run something like:

# Idempotent: a no-op if the daemon is already running.
omniscout daemon start
omniscout profile create research
OMNISCOUT_JSON=1 omniscout research "vector databases benchmark 2026" --results 10 > /tmp/research.json
# Takes the first three source URLs in report order. Ranking by passage
# score depends on the score field the report exposes.
URLS=$(jq -r '.sources[] | .url' /tmp/research.json | head -3)
i=1
for url in $URLS; do
  omniscout browser navigate "$url" --profile research --session research
  omniscout browser screenshot --session research --out "/tmp/$i.png"
  omniscout extract "$url" > "/tmp/$i.md"
  i=$((i+1))
done
omniscout browser close --all
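
Step 4 of the task asks for the three sources with the highest passage scores, which needs a sort rather than taking the first three entries. The sketch below shows one way to do that in Python. It relies on the `sources` and `url` fields used by the `jq` filter above; the `score` key is an assumption, so check the actual report JSON and pass the real field name as `score_key`.

```python
import json


def top_source_urls(report, n=3, score_key="score"):
    """Return the URLs of the n highest-scoring sources in a research report.

    `score_key` is a hypothetical field name; adjust it to the real schema.
    """
    sources = report.get("sources", [])
    # Sources without a score sort last; sorted() is stable, so ties keep
    # their original report order.
    ranked = sorted(
        sources,
        key=lambda source: source.get(score_key, float("-inf")),
        reverse=True,
    )
    return [source["url"] for source in ranked[:n]]


# Hypothetical report, shaped only to illustrate the ranking.
sample = json.loads("""
{"sources": [
  {"url": "https://a.example", "score": 0.41},
  {"url": "https://b.example", "score": 0.93},
  {"url": "https://c.example", "score": 0.67},
  {"url": "https://d.example", "score": 0.12}
]}
""")
print(top_source_urls(sample))
```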

5. JSON contract for agents

Every command supports --json (or the environment variable OMNISCOUT_JSON=1). Output is deterministic, and logs go to stderr so stdout stays parseable. Agents should:

  1. Set OMNISCOUT_JSON=1 once at the start of a session.
  2. Pipe stdout through jq for structured access.
  3. Treat any response with ok: false as a structured failure rather than a crash; use error_kind to decide whether to retry, re-snapshot, or surface it to the user.

OMNISCOUT_JSON=1 omniscout browser snapshot 2>/dev/null | jq '.refs[] | select(.role == "button")'
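
The same filter can be done in Python when an agent already has the command's stdout as a string. This is a sketch that mirrors the `jq` expression above and the `ok` / `error_kind` contract. The sample response is hypothetical, and the `ref` field inside each entry is an assumed name.

```python
import json


def button_refs(stdout_text):
    """Parse one JSON response and return the refs whose role is "button"."""
    response = json.loads(stdout_text)
    if response.get("ok") is False:
        # Raise the stable error_kind so callers can branch on it.
        raise RuntimeError(response.get("error_kind", "unknown"))
    return [ref for ref in response.get("refs", []) if ref.get("role") == "button"]


# Hypothetical snapshot output, for illustration only.
sample_stdout = (
    '{"ok": true, "refs": ['
    '{"ref": "@e1", "role": "button"}, '
    '{"ref": "@e2", "role": "link"}]}'
)
print(button_refs(sample_stdout))
```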

Two fields you should always pay attention to:

| Field | Where | What it means |
| --- | --- | --- |
| snapshot_generation | every response's data | Monotonic per session. If it changed since your last snapshot, re-snapshot before re-using @eN refs. |
| action_id | every response (top level) | Stable hex ID for this exact invocation. Use it to call omniscout daemon trace <action_id> or omniscout daemon replay <action_id>. |
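
The `snapshot_generation` rule reduces to a single comparison. A minimal sketch, assuming the value sits at `data.snapshot_generation` as described above; `needs_resnapshot` is an illustrative helper, not an OmniScout API.

```python
def needs_resnapshot(last_generation, response):
    """Return True when cached @eN refs should be discarded.

    `last_generation` is the value the most recent snapshot returned,
    or None if no snapshot has been taken yet.
    """
    current = response.get("data", {}).get("snapshot_generation")
    if current is None:
        # Assumption: a response without the field gives nothing to compare.
        return False
    return current != last_generation
```

An agent would store the generation from each `snapshot` response and call this helper on every later response before re-using a cached ref.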

6. Trace, replay, watch (the agent debugging loop)

# What did OmniScout just do?
omniscout daemon trace -n 5 --session demo

# Re-run a single call by ID (skips interactive verbs like login).
omniscout daemon replay 8f3a7c9e1b2d4e5f

# Re-run every replayable action for a session in the last minute.
omniscout daemon replay --session demo --since 60

# Tail live events (use --json-lines for machine-readable output).
omniscout daemon watch

When an agent fails halfway through a task, a prompt like "replay the last 30 seconds against session demo" tends to recover state faster than asking the agent to remember each step itself.
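
A wrapper can build the two replay forms shown above as argument lists, ready for `subprocess.run`. This sketch uses only the flags that appear in this section. The function name and parameters are illustrative, and `--since` is assumed to take seconds, based on the "last minute" comment next to `--since 60`.

```python
def replay_argv(action_id=None, session=None, since_seconds=None):
    """Build the argv for `omniscout daemon replay` in either documented form."""
    if action_id is not None:
        # Replay a single call by its action_id.
        return ["omniscout", "daemon", "replay", action_id]
    if session is None or since_seconds is None:
        raise ValueError("give an action_id, or both session and since_seconds")
    # Replay every replayable action for a session within the time window.
    return [
        "omniscout", "daemon", "replay",
        "--session", session,
        "--since", str(since_seconds),
    ]
```

For the prompt quoted above, `replay_argv(session="demo", since_seconds=30)` yields the command an agent would run.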

7. Stable error_kind values agents can branch on

| error_kind | Meaning | Recommended action |
| --- | --- | --- |
| timeout | Operation exceeded its budget | Retry with --timeout-ms larger, or wait --idle first |
| no_such_session | Session doesn't exist | Re-navigate to recreate it |
| no_such_ref | @eN ref expired or page changed | Re-run snapshot and use the new refs |
| backend_unavailable | Extension backend not connected | Drop --backend extension to fall back to Playwright |
| invalid_args | Bad CLI arguments | Surface to user; don't retry blindly |
| internal | Unhandled daemon error | Check omniscout daemon logs -n 100 |
| requires_user | CAPTCHA / login needs human | Tell the user what to do and pause |
| unsupported | Backend can't do this verb | Switch to Playwright (e.g. pdf, upload) |
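
One way for an agent to branch on these values is a lookup table with a conservative default. In the sketch below the `error_kind` keys come from the list above, while the action labels on the right are hypothetical names an agent framework might dispatch on.

```python
# Keys are the documented error_kind values; the action labels are
# illustrative and not defined by OmniScout.
NEXT_STEP = {
    "timeout": "retry_with_longer_timeout",
    "no_such_session": "renavigate",
    "no_such_ref": "resnapshot",
    "backend_unavailable": "fall_back_to_playwright",
    "invalid_args": "surface_to_user",
    "internal": "check_daemon_logs",
    "requires_user": "pause_for_user",
    "unsupported": "fall_back_to_playwright",
}


def next_step(response):
    """Pick a recovery action for a response, or "continue" on success."""
    if response.get("ok", True):
        return "continue"
    # Unknown kinds are surfaced rather than retried blindly.
    return NEXT_STEP.get(response.get("error_kind"), "surface_to_user")
```

Defaulting unknown values to `surface_to_user` keeps the agent from looping on an error it does not understand.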

8. Skill template for any new agent

If you're integrating OmniScout into an agent without a skill mechanism, paste this into the agent's system prompt:

You have access to OmniScout via the `omniscout` CLI. OmniScout drives a browser
locally — it can navigate, click, fill, scroll, take screenshots, capture
network traffic, detect CAPTCHAs, and run multi-step research pipelines.

Always:
- Set OMNISCOUT_JSON=1 for structured output.
- Run `omniscout browser snapshot --refs-only` to find elements; use the
  returned @eN refs for click/fill rather than guessing CSS selectors.
- After EVERY response, check data.snapshot_generation. If it differs
  from the value your last `snapshot` returned, re-snapshot before re-
  using cached @eN refs.
- Save the top-level `action_id` from each response — you can replay or
  trace by ID later if something goes sideways.
- Close sessions with `omniscout browser close --all` at the end of a task.
- Screenshots go to disk; read them via your file-read tool, never embed
  base64 in your reply.

Available verbs:
navigate, snapshot, click, fill, scroll, key, hover, upload, screenshot,
pdf, eval, wait, tab list|close|switch, network start|stop|list|detail,
login, captcha, close.

Diagnostics: omniscout daemon trace, replay, watch, status, logs.

Run `omniscout <verb> --help` if unsure of arguments.
