Agent Browser

Agent Browser: A Detailed Guide to AI-Ready Web Automation

Introduction Agent Browser is a fast, native Rust CLI designed for browser automation tailored to AI agents. It pairs a lean command-line client with a persistent daemon, orchestrating headless browser sessions through the Chrome DevTools Protocol (CDP) or WebDriver backends. Built to work seamlessly in AI workflows, it supports a wide ecosystem of deployment options—from local development to cloud browser providers—while offering strong security features, robust session management, and rich introspection capabilities. The project emphasizes speed, reliability, and deterministic interaction with web pages via refs and semantic locators, making it an invaluable tool for testing, automation, data extraction, and AI-driven decision making.

Overview: What Problem It Solves

Automate real browser interactions for AI agents without embedding heavy Node.js stacks into your workflows.
Maintain persistent browser state across runs, enabling seamless login sessions and personalized contexts.
Provide AI-friendly observability through accessibility trees, structured snapshots, and deterministic element references.
Support both local and cloud-based browser environments, including premier cloud providers, for scalable experimentation.

Getting Started: How to Install and Begin Agent Browser offers multiple installation paths to fit diverse environments and preferences.

Global Installation (recommended)

Installs the native Rust binary, enabling a fast, system-wide CLI experience.
Typical command sequences:
npm install -g agent-browser agent-browser install
This prepares a ready-to-run environment with the daemon waiting for commands.

Project Installation (local dependency)

Pin a specific version inside package.json for project-scoped usage:
npm install agent-browser agent-browser install
Use via package.json scripts or invoke agent-browser directly, ensuring reproducible environments.

Other installation options

macOS Homebrew: brew install agent-browser agent-browser install
Cargo (Rust): cargo install agent-browser agent-browser install
From Source: clone, build native parts, link globally, and install
git clone https://github.com/vercel-labs/agent-browser
cd agent-browser
pnpm install
pnpm build
pnpm build:native
pnpm link --global
agent-browser install

Linux prerequisites

Use agent-browser install --with-deps to fetch system dependencies along with the Chrome-for-Testing download.

Updating to the latest version

agent-browser upgrade detects your method (npm, Homebrew, or Cargo) and runs the appropriate update command automatically.

What You Need to Run It

Chrome (or compatible Chrome-for-Testing channel) for automation
Rust toolchain (only needed when building from source)

Quick Start: A Tiny Demo Script A typical first automation workflow demonstrates core concepts like opening a page, taking a snapshot of the accessibility tree, and performing basic interactions:

agent-browser open example.com
agent-browser snapshot
agent-browser click @e2
agent-browser fill @e3 "test@example.com"
agent-browser get text @e1
agent-browser screenshot page.png
agent-browser close

Traditional selectors and modern, AI-friendly selectors

Traditional selectors: CSS selectors or text-based selectors
agent-browser click "#submit"
agent-browser fill "#email" "test@example.com"
agent-browser find role button click --name "Submit"
Semantic Locators (the AI-friendly approach)
agent-browser find role [value] # By ARIA role
agent-browser find text # By visible text
agent-browser find label [value] # By label
agent-browser find placeholder [value] # By placeholder
agent-browser find alt # By alt text
agent-browser find testid [value] # By data-testid

Core Commands at a Glance Agent Browser exposes a rich set of commands organized into practical actions:

Browser control

open, goto, navigate: Launch or navigate a page
click, dblclick, focus
hover
type, fill, press, keyboard type, inserttext
scroll, scrollintoview
drag, upload
screenshot (with annotate and directory/format options)
pdf
snapshot (best for AI; builds an accessibility snapshot with refs)

Query and information

get text, get html, get value, get attr, get title, get url, get cdp-url, get count
is visible, is enabled, is checked
get box, get styles

Element targeting and selectors

find role, find text, find label, find placeholder, find alt, find title, find testid, find first, find last, find nth

Batch execution and orchestration

batch: run multiple commands in a single invocation (with optional --json and --bail)
wait: various preconditions, timeouts, and state checks
batch + json mode enables streaming, CI-friendly automation

Session and state management

tab, tab new, tab new --label, tab , tab close
window new
use --session, --session-name, or environment variables for isolated sessions
profile persistence (persist cookies, localStorage, IndexedDB)
state save, state load, state list, state rename, state clear
state encryption using AES-256-GCM with a 64-char hex key

Security and safety features

content boundaries: wrap page output with markers for LLM safety
domain allowlist: restrict navigation to trusted domains
action policy: gate destructive actions via a policy.json
confirm actions: require explicit confirmation for sensitive operations
interactive prompts and safety defaults in auto-dialog handling

Initialization and setup

init scripts to register pre-navigation routines
React DevTools integration for React apps and universal Web Vitals support
agent-browser doctor: diagnose installation issues and repair
doctor --offline / --quick options for offline sanity checks

Authentication and sessions: keeping you signed in Agent Browser offers multiple strategies to persist login sessions:

Chrome profile reuse: reuse existing login state quickly
Persistent profiles: full, cross-restart browser state
Session persistence: auto-save/restore cookies and localStorage by a named session
Import from your browser: export/import authentication state from an existing Chrome session
Encrypted state: AES-256-GCM encryption with a user-provided key or an auto-generated one
State files can be loaded via --state or AGENTBROWSERSTATE
The auth vault stores credentials locally and can be referenced by name, never exposed to the LLM

Cloud and serverless deployment: extend beyond your laptop Agent Browser embraces a wide ecosystem of deployment options for cloud and serverless contexts:

Cloud providers

Browserless, Browserbase, Browser Use, Kernel, AgentCore
Each provider is engaged through a simple -p flag (or AGENTBROWSERPROVIDER environment variable)
They offer cloud-hosted browser sessions with API keys and service endpoints
Example usage: export BROWSERLESSAPIKEY="your-api-token" then agent-browser -p browserless open https://example.com
Optional provider-specific environment variables tune latency, TTL, stealth, and region

Kernel, AgentCore, and enterprise-grade options

Kernel: stealth mode, persistent profiles, and API-driven cloud sessions
AgentCore: AWS Bedrock-backed sessions with SigV4 authentication
These integrations enable scalable automation in production-grade AI workflows

iOS simulator and real-device automation

iOS simulator on macOS via Appium and XCUITest driver
Real device support via WebDriverAgent (for Safari on iOS)
Device management commands support listing devices, launching Safari, capturing screenshots, and more

Mobile and cloud-native testing

Cloud-backed browsers (Browserless, Browserbase, Browser Use, Kernel, AgentCore) support mobile and desktop contexts
Native Rust performance helps keep latency predictable in multi-step AI tasks

Streaming and real-time viewing: the streaming viewport

Each session can stream its viewport over WebSocket for live previews
The stream endpoint is accessible at a bound port (default), with options to adjust via AGENTBROWSERSTREAM_PORT
Tools for controlling the streaming session include stream enable, stream disable, and stream status
The protocol supports frames, input events (mouse, keyboard, touch), and other interactions for pair browsing or AI-assisted debugging

Observability dashboard: monitor in real time

An optional local dashboard server provides a live viewport and command activity feed
Runs in the background on port 4848 (configurable)
Features:
Live viewport: real-time frames from the browser
Activity feed: a chronological log of commands, results, and timing
Console output: browser console logs and errors
Session management: create and monitor multiple sessions, with support for different engines and cloud providers
AI chat panel (via Vercel AI Gateway) to interact with AI agents directly in the dashboard

AI integration: chat, React, and Web Vitals

The dashboard includes an optional AI chat panel powered by the Vercel AI Gateway
The CLI also supports a chat command for natural-language browser control
React introspection commands, coupled with Web Vitals metrics, provide deep insight into UI performance
React DevTools hook (install at launch) enables detailed component trees, props, hooks, state, and source information
Web Vitals (LCP, CLS, TTFB, FCP, INP) help quantify user-experience factors during automated interactions

Configuration and extensibility: config files, schemas, and defaults

agent-browser.json provides a persistent defaults file with a clear precedence: 1) user-level defaults (~/.agent-browser/config.json) 2) project-level overrides (./agent-browser.json) 3) environment variables (AGENTBROWSER*) 4) command-line flags override everything
A JSON Schema is available to enable IDE autocomplete and validation
You can point the CLI to a specific config with --config or AGENTBROWSERCONFIG
Unknown keys are ignored for forward compatibility
The default timeout for most operations is 25 seconds, with a note about IPC timeout implications; you can extend it via AGENTBROWSERDEFAULT_TIMEOUT
Extensions from user and project configs are merged (not replaced)

Performance, reliability, and safety

Client-daemon architecture ensures fast subsequent operations by reusing a running daemon
Idle-shutdown capability: set AGENTBROWSERIDLETIMEOUTMS to automatically close the browser after inactivity
Content boundaries and a robust action policy ensure that AI-driven automation remains safe and auditable
In-context ref-based interaction makes AI reasoning more reliable, deterministic, and auditable

Privacy, security, and best practices

Authentication vault keeps credentials encrypted and local
Domain allowlists restrict navigation to trusted domains
Action policy and confirmation prompts prevent unintended destructive actions
Output length controls help avoid context flooding when interacting with AI models
When using remote debugging, the remote port must be secured; keep that in mind during development

Developer experience: extensibility and integration with AI tools

Agent mode outputs are designed for machine consumption (JSON) and are friendly to AI agents
The snapshot feature provides a structured view of the page’s interactive elements, which AI models can use for planning actions
Semantic locators and refs streamline interaction in AI-guided workflows, enabling deterministic automation even as the DOM changes

Licensing

Apache-2.0 license ensures broad usage with permissive terms, making it suitable for both open-source projects and enterprise deployments

Images and visual assets Note: The input text contains no embedded images or image assets. If you’re preparing a blog post and want visual supplements, consider including:

A diagram of the client-daemon architecture
A screenshot showing the annotated accessibility snapshot with refs
A flowchart of a typical AI-driven automation scenario (open → snapshot → interact → snapshot)
A quick screencap of the Observability Dashboard in action

Why this matters for AI agents

Deterministic element selection via refs makes AI reasoning more reliable, especially in dynamic pages
Accessibility snapshots provide a stable representation of the UI for AI interpretation
Persistent sessions reduce the friction of repeated logins and context switching, enabling longer-running experiments
Cloud providers and serverless options enable scalable experimentation without sacrificing control or visibility

Best Practices for Getting the Most from Agent Browser

Start with a minimal session (no login state) and build up with a persistent profile as you stabilize workflows
Use the snapshot command early in the workflow to identify refs before performing actions
Prefer semantic locators over brittle selectors when possible to improve resilience to UI changes
Leverage the batch and JSON modes for automation pipelines and AI-driven orchestration
Enable the Observability Dashboard during development to observe timing, console output, and session activity in real time
When integrating with AI agents, consider enabling content boundaries and action-policy controls to minimize risk

Conclusion Agent Browser stands out as a purpose-built tool for AI-driven web automation. It pairs the speed and safety of a native Rust implementation with the flexibility of a multi-provider, cloud-ready ecosystem. By combining deterministic element refs, rich accessibility snapshots, robust session management, and a comprehensive suite of automation primitives, it empowers AI agents to reason about web pages, perform complex tasks, and learn from interactions—without compromising security or reliability. Whether you’re building automated QA, data collection pipelines, or AI copilots that need direct browser manipulation, Agent Browser offers a capable, extensible foundation that scales from a developer’s laptop to enterprise-grade cloud deployments.

Images

None provided in the input. If you’d like, you can include illustrative visuals such as architecture diagrams, a sample accessibility snapshot with refs, and a screenshot showcase of the Observability Dashboard to accompany this guide.

Notes about images in the input

The provided input is text-only and does not contain embedded images or image files. If you have image assets you want to include, you can supply them separately, and I can place them into the post with appropriate captions and alt text.

Agent Browser

Enjoying this project?

GitHub - vercel-labs/agent-browser: Agent Browser

Stay Updated

Product

Learn

Company

Legal

Stay Updated

Browse by Category