Using OmniScout with AI agents
OmniScout is built for AI agents to drive. This page collects everything you need to make Claude Code, Cursor, Codex, Kimi, or any other shell-capable agent productive with OmniScout in under a minute.
0. Install OmniScout
pip install omniscout
1. Install the skill
omniscout install --skill
This copies SKILL.md (the agent-facing instructions) plus
references/operations.md into the well-known agent skill locations:
| Agent | Skill location |
|---|---|
| Claude Code | ~/.claude/skills/scout/SKILL.md |
| Cursor | ~/.cursor/skills-cursor/scout/SKILL.md |
| Codex | ~/.codex/skills/scout/SKILL.md |
| Antigravity | ~/.gemini/config/skills/scout/SKILL.md |
The agent picks it up automatically on next session — no manual config.
2. Start the daemon (once)
omniscout daemon start
It's idempotent — running it again is a no-op. The daemon survives between
agent sessions. Verify with omniscout daemon status.
3. Drop a prompt
These prompts assume the skill is installed. Pick the closest match to your
task; the agent will translate it into the right sequence of omniscout commands (graph, research, extract, browser ...).
Ask an AI to build a knowledge graph
Build a knowledge graph for "Cursor" with OmniScout. Run
`omniscout graph "Cursor" --json --data`, show the tree, and cite sources.
If I give a website URL, use site-only mode (`omniscout graph "cursor.com"` or
`-w cursor.com`).
@omniscout Map "Cursor" as a knowledge graph: company, founders, competitors,
pricing, features. Use `omniscout graph` with JSON output. For a specific
site, pass the URL or `--website`.
Use `omniscout graph "<entity>" --json` to produce a structured knowledge
graph tree. Add `--data` for sources. Use a URL entity or `--website` when
the user points at one site only.
Ask an AI to research a question
Research "state of open-source browser-using AI agents in 2026" using
OmniScout. Run `omniscout research` with at least 8 results, then summarize the
findings in 3-5 bullet points with citations to source URLs.
@omniscout Use OmniScout to research the question: "what's the practical
difference between Browserbase, Browser-Use, and Stagehand?"
Compare them in a table covering pricing model, runtime, element
addressing strategy, and best-fit use case.
You have access to OmniScout via the `omniscout` CLI. Research "FedRAMP-ready
LLM hosting providers" using `omniscout research --results 10 --json`,
extract the top 5 sources, and write a concise report to /tmp/report.md.
You have the `omniscout` CLI available. Use it to research a topic the user
gives you. Use `omniscout research "<topic>" --json` and summarize the
returned `summary` plus the top 3 passages.
Topic: <YOUR TOPIC HERE>
Ask an AI to navigate and click
Use OmniScout to open https://news.ycombinator.com, snapshot the page, and
click the top story link. Then take a screenshot saved to /tmp/hn-top.png
and tell me the page title of the article that loaded.
@omniscout Open the GitHub trending page in OmniScout, snapshot the page, click
the first repo link, and read me the README using `omniscout extract` on
the resulting URL.
Use the OmniScout CLI to:
1. Navigate to https://hn.algolia.com
2. Fill the search box with "browser agents"
3. Click search
4. Take a screenshot to /tmp/results.png
5. Use `omniscout browser snapshot --refs-only` and list the first 5 link
refs you see.
Ask an AI to log in once and reuse
Help me set up a logged-in profile for GitHub via OmniScout. Run
`omniscout browser login https://github.com/login --profile work` to open a
headful window. Wait for me to authenticate in the browser, then I'll
tell you "done". After that, prove the profile works by running
`omniscout browser navigate https://github.com/settings --profile work` and
showing me the resulting page title.
@omniscout I want to scrape my private GitHub starred repos. First create a
profile via `omniscout profile create work`. Then run
`omniscout browser login https://github.com/login --profile work`. Pause
for me to log in. Then navigate to https://github.com/?tab=stars and
snapshot the page. Loop over the @eN refs whose role is "link" and
extract their titles.
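The "loop over the @eN refs whose role is link" step can be sketched in Python. This is a minimal sketch, not OmniScout's documented schema: it assumes the snapshot JSON has a `refs` list whose items carry `role` (as the jq example on this page suggests), and the `ref` and `name` keys are hypothetical names chosen for illustration.

```python
import json


def link_refs(snapshot_json: str) -> list[tuple[str, str]]:
    """Return (ref, name) pairs for every ref whose role is "link"."""
    snapshot = json.loads(snapshot_json)
    pairs = []
    for item in snapshot.get("refs", []):
        if item.get("role") == "link":
            # "ref" and "name" are assumed key names; verify against real output.
            pairs.append((item.get("ref", ""), item.get("name", "")))
    return pairs


# Hand-written sample standing in for real snapshot output.
sample = json.dumps({"refs": [
    {"ref": "@e1", "role": "link", "name": "octocat/hello-world"},
    {"ref": "@e2", "role": "button", "name": "Star"},
    {"ref": "@e3", "role": "link", "name": "torvalds/linux"},
]})
print(link_refs(sample))
```

Filtering on `role` before touching any ref keeps the agent from clicking buttons or inputs while it is only meant to read link titles.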
Ask an AI to fill a form and submit
Use OmniScout to:
1. Open https://duckduckgo.com
2. Snapshot the page
3. Fill the search textbox with "local first AI agents"
4. Press Enter
5. Wait for the results to load (wait --idle)
6. Snapshot again and list the title of the first 5 results.
@omniscout Open https://google.com/forms/example, fill the visible form
fields in order with these values: name="Test User",
email="test@example.com", message="Hello from OmniScout". Then take a
screenshot and tell me which fields you filled.
Ask an AI to capture network traffic
Use OmniScout to investigate what tracking calls a site makes. Start network
capture, navigate to https://example.com, scroll to the bottom, stop
capture, then list any captured requests whose URL matches
"analytics|google|facebook|stripe". For the matching ones, run `network
detail` and tell me the response status code.
Profile the network behavior of https://vercel.com/pricing using OmniScout:
- `omniscout browser network start`
- `omniscout browser navigate https://vercel.com/pricing`
- `omniscout browser scroll down --amount 10`
- `omniscout browser network stop`
- `omniscout browser network list --filter "stripe|payment"`
Return the result as JSON.
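If the agent post-processes the captured request list itself, the filtering step might look like the following. It is a sketch under assumptions: the record shape (`id`, `url`, `status`) is hypothetical, and treating the filter string as a regular expression is inferred only from the `"stripe|payment"` alternation used in the prompt.

```python
import re


def matching_requests(requests: list[dict], pattern: str) -> list[dict]:
    """Keep captured requests whose URL matches the regex (case-insensitive)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [r for r in requests if rx.search(r.get("url", ""))]


# Hand-written sample; real capture output may use different field names.
captured = [
    {"id": "r1", "url": "https://example.com/", "status": 200},
    {"id": "r2", "url": "https://js.stripe.com/v3/", "status": 200},
    {"id": "r3", "url": "https://www.google-analytics.com/collect", "status": 204},
]
hits = matching_requests(captured, "analytics|google|facebook|stripe")
print([h["id"] for h in hits])
```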
Ask an AI to handle a CAPTCHA
Open https://site-with-captcha.example using OmniScout. If you detect a
CAPTCHA, run `omniscout browser captcha --detect-only` and tell me the
type. Then run `omniscout browser captcha` (no solver flag) so the tab
flips headful and pauses. I'll solve it manually; you continue once
detection comes back clean.
Open https://site-with-captcha.example using OmniScout. If there's a CAPTCHA,
solve it via 2Captcha — `TWOCAPTCHA_API_KEY` is set in my env. Run
`omniscout browser captcha --solver 2captcha` and continue once it returns
solved=true.
--solver none is the only mode that's truly local-first. The
third-party solvers (2captcha, capsolver) send the sitekey and page URL
to those services, so data leaves your machine and each solve costs money.
Opt in explicitly.
4. Multi-step agent loop (the killer use case)
Most useful agent loops combine all of the above in a single task. Here's a complete worked example you can hand to Claude Code verbatim:
You have OmniScout available via the `omniscout` CLI. Carry out this task end
to end:
1. Start the daemon if it isn't running.
2. Create a profile "research" if it doesn't exist.
3. Use OmniScout to research "vector databases benchmark 2026" with at least
10 results. Save the report JSON to /tmp/research.json.
4. From the report, pick the 3 source URLs with the highest passage
scores. For each one:
a. Open it in OmniScout (profile=research).
b. Take a screenshot to /tmp/<i>.png.
c. Use `omniscout extract <url>` to get the clean Markdown.
d. Save the markdown to /tmp/<i>.md.
5. Tell me the 3 source URLs and the path to each screenshot/markdown
pair. Then summarize the *common* themes across the 3 articles in 5
bullet points, citing which article each theme came from.
6. Close all OmniScout sessions when done.
The agent will run something like:
omniscout daemon start   # idempotent: a no-op if the daemon is already running
omniscout profile create research
OMNISCOUT_JSON=1 omniscout research "vector databases benchmark 2026" --results 10 > /tmp/research.json
# Takes the first three sources as listed; ranking by passage score depends on the report's JSON fields.
URLS=$(jq -r '.sources[].url' /tmp/research.json | head -n 3)
i=1
for url in $URLS; do
  omniscout browser navigate "$url" --profile research --session research
  omniscout browser screenshot --session research --out "/tmp/$i.png"
  omniscout extract "$url" > "/tmp/$i.md"
  i=$((i+1))
done
omniscout browser close --all
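The shell loop above takes the first three sources in listed order, while the task asks for the three with the highest passage scores. A ranking step could be sketched as follows. The report layout is an assumption: a top-level `passages` list whose items have `url` and `score` is a hypothetical shape, so inspect /tmp/research.json and adjust the key names before using this.

```python
def top_source_urls(report: dict, n: int = 3) -> list[str]:
    """Rank source URLs by their best passage score, highest first."""
    best: dict[str, float] = {}
    for passage in report.get("passages", []):
        # "url" and "score" are assumed key names, not confirmed by the docs.
        url, score = passage["url"], float(passage["score"])
        best[url] = max(score, best.get(url, float("-inf")))
    ranked = sorted(best, key=lambda u: best[u], reverse=True)
    return ranked[:n]


# Hand-written sample standing in for a real research report.
report = {"passages": [
    {"url": "https://a.example", "score": 0.41},
    {"url": "https://b.example", "score": 0.93},
    {"url": "https://a.example", "score": 0.88},
    {"url": "https://c.example", "score": 0.52},
    {"url": "https://d.example", "score": 0.67},
]}
print(top_source_urls(report))
```

Using each URL's best passage rather than the sum avoids favoring long pages that simply contribute many mediocre passages; that is a design choice of this sketch, not something the source prescribes.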
5. JSON contract for agents
Every command supports --json (or env OMNISCOUT_JSON=1). Output is
deterministic, with logs separated to stderr. Agents should:
- Set OMNISCOUT_JSON=1 once at the start of a session.
- Pipe stdout through jq for structured access.
- Treat any response with ok: false as recoverable; use error_kind to decide whether to retry, re-snapshot, or surface to the user.
OMNISCOUT_JSON=1 omniscout browser snapshot 2>/dev/null | jq '.refs[] | select(.role == "button")'
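For agents that call the CLI from Python rather than a shell, the same contract can be wrapped in a small helper. This is a sketch, not part of OmniScout: `run_omniscout` and `failure_kind` are hypothetical names, the `runner` parameter exists only so the example runs without the CLI installed, and the envelope keys are limited to those this page mentions (`ok`, `error_kind`, `action_id`).

```python
import json
import os
import subprocess
from types import SimpleNamespace


def run_omniscout(args, runner=subprocess.run):
    """Run `omniscout <args>` with OMNISCOUT_JSON=1 and parse stdout as JSON."""
    env = dict(os.environ, OMNISCOUT_JSON="1")
    proc = runner(
        ["omniscout", *args],
        env=env,
        stdout=subprocess.PIPE,     # structured result
        stderr=subprocess.DEVNULL,  # logs are written to stderr; drop them here
        text=True,
    )
    return json.loads(proc.stdout)


def failure_kind(response):
    """Return error_kind for an explicit `ok: false` response, else None."""
    if response.get("ok") is not False:
        return None
    # "unknown" is a local placeholder, not an OmniScout error_kind value.
    return response.get("error_kind") or "unknown"


def fake_runner(cmd, **kwargs):
    # Stand-in for the real CLI so the sketch is runnable anywhere.
    payload = {"ok": False, "error_kind": "timeout", "action_id": "8f3a7c9e1b2d4e5f"}
    return SimpleNamespace(stdout=json.dumps(payload))


resp = run_omniscout(["browser", "snapshot"], runner=fake_runner)
print(failure_kind(resp), resp["action_id"])
```

In real use, drop the `runner=fake_runner` argument so the helper shells out to the installed CLI.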
Two fields you should always pay attention to:
| Field | Where | What it means |
|---|---|---|
| snapshot_generation | every response's data | Monotonic per session. If it changed since your last snapshot, re-snapshot before re-using @eN refs. |
| action_id | every response (top level) | Stable hex ID for this exact invocation. Use it to call omniscout daemon trace <action_id> or omniscout daemon replay <action_id>. |
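The snapshot_generation rule lends itself to a small bookkeeping object on the agent side. The sketch below assumes only what the table states, namely that the value lives under `data` in every response; the `RefCache` class and its method names are hypothetical.

```python
class RefCache:
    """Remember which snapshot_generation the cached @eN refs belong to."""

    def __init__(self):
        self.generation = None  # no snapshot taken yet

    def store_snapshot(self, response):
        # Call with the response of a `snapshot` command.
        self.generation = response["data"]["snapshot_generation"]

    def is_stale(self, response):
        # Any later response reports the session's current generation.
        current = response["data"]["snapshot_generation"]
        return self.generation is None or current != self.generation


cache = RefCache()
cache.store_snapshot({"data": {"snapshot_generation": 4}})
print(cache.is_stale({"data": {"snapshot_generation": 4}}))  # False
print(cache.is_stale({"data": {"snapshot_generation": 5}}))  # True
```

When `is_stale` returns True, the agent should run a fresh snapshot and call `store_snapshot` again before acting on any cached ref.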
6. Trace, replay, watch (the agent debugging loop)
# What did OmniScout just do?
omniscout daemon trace -n 5 --session demo
# Re-run a single call by ID (skips interactive verbs like login).
omniscout daemon replay 8f3a7c9e1b2d4e5f
# Re-run every replayable action for a session in the last minute.
omniscout daemon replay --session demo --since 60
# Tail live events (use --json-lines for machine-readable output).
omniscout daemon watch
When an agent fails halfway through a task, an agentic prompt like
"replay the last 30 seconds against session demo" tends to recover
state faster than asking the agent to remember each step itself.
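An agent harness can turn a request such as "replay the last 30 seconds against session demo" into a command line with a helper like the one below. It is a sketch: `replay_argv` is a hypothetical name, it uses only the `replay` forms shown above, and rejecting an ID combined with `--session` or `--since` is a choice made here because the source does not say whether the CLI accepts that combination.

```python
def replay_argv(action_id=None, session=None, since=None):
    """Build the argv for `omniscout daemon replay` in one of its two forms."""
    if action_id is not None:
        if session is not None or since is not None:
            raise ValueError("pass either action_id or session/since, not both")
        return ["omniscout", "daemon", "replay", action_id]
    if session is None:
        raise ValueError("need an action_id or a session")
    argv = ["omniscout", "daemon", "replay", "--session", session]
    if since is not None:
        # --since is given in seconds ("--since 60" covers the last minute).
        argv += ["--since", str(int(since))]
    return argv


print(replay_argv(session="demo", since=30))
print(replay_argv(action_id="8f3a7c9e1b2d4e5f"))
```

Returning a list instead of a string lets the caller pass it straight to `subprocess.run` without shell quoting concerns.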
7. Stable error_kind values agents can branch on
| error_kind | Meaning | Recommended action |
|---|---|---|
| timeout | Operation exceeded its budget | Retry with a larger --timeout-ms, or run wait --idle first |
| no_such_session | Session doesn't exist | Re-navigate to recreate it |
| no_such_ref | @eN ref expired or page changed | Re-run snapshot and use the new refs |
| backend_unavailable | Extension backend not connected | Drop --backend extension to fall back to Playwright |
| invalid_args | Bad CLI arguments | Surface to user; don't retry blindly |
| internal | Unhandled daemon error | Check omniscout daemon logs -n 100 |
| requires_user | CAPTCHA / login needs human | Tell the user what to do and pause |
| unsupported | Backend can't do this verb | Switch to Playwright (e.g. pdf, upload) |
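The table maps naturally onto a lookup that an agent harness can branch on. The sketch below uses the error_kind values listed above; the action labels on the right (`retry_longer`, `resnapshot`, and so on) are hypothetical names for the harness's own handlers, and sending unknown kinds to the user is a conservative default chosen here.

```python
# error_kind -> harness action, following the table's recommendations.
RECOVERY = {
    "timeout": "retry_longer",
    "no_such_session": "renavigate",
    "no_such_ref": "resnapshot",
    "backend_unavailable": "drop_extension_backend",
    "invalid_args": "surface_to_user",
    "internal": "check_daemon_logs",
    "requires_user": "pause_for_user",
    "unsupported": "switch_to_playwright",
}


def plan_recovery(response: dict) -> str:
    """Pick the next harness action for one parsed OmniScout response."""
    if response.get("ok") is not False:
        return "continue"
    # Unknown or missing kinds are surfaced rather than retried blindly.
    return RECOVERY.get(response.get("error_kind"), "surface_to_user")


print(plan_recovery({"ok": False, "error_kind": "no_such_ref"}))
print(plan_recovery({"ok": True}))
```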
8. Skill template for any new agent
If you're integrating OmniScout into an agent without a skill mechanism, paste this into the agent's system prompt:
You have access to OmniScout via the `omniscout` CLI. OmniScout drives a browser
locally — it can navigate, click, fill, scroll, take screenshots, capture
network traffic, detect CAPTCHAs, and run multi-step research pipelines.
Always:
- Set OMNISCOUT_JSON=1 for structured output.
- Run `omniscout browser snapshot --refs-only` to find elements; use the
returned @eN refs for click/fill rather than guessing CSS selectors.
- After EVERY response, check data.snapshot_generation. If it differs
from the value your last `snapshot` returned, re-snapshot before re-
using cached @eN refs.
- Save the top-level `action_id` from each response — you can replay or
trace by ID later if something goes sideways.
- Close sessions with `omniscout browser close --all` at the end of a task.
- Screenshots go to disk; read them via your file-read tool, never embed
base64 in your reply.
Available verbs:
navigate, snapshot, click, fill, scroll, key, hover, upload, screenshot,
pdf, eval, wait, tab list|close|switch, network start|stop|list|detail,
login, captcha, close.
Diagnostics: omniscout daemon trace, replay, watch, status, logs.
Run `omniscout <verb> --help` if unsure of arguments.