Agent Browser: A Detailed Guide to AI-Ready Web Automation
Introduction
Agent Browser is a fast, native Rust CLI for browser automation, built for AI agents. It pairs a lean command-line client with a persistent daemon that drives headless browser sessions through the Chrome DevTools Protocol (CDP) or WebDriver backends. It runs in environments ranging from local development to cloud browser providers, and it offers security controls, session management, and page introspection. The project emphasizes speed, reliability, and deterministic interaction with web pages via refs and semantic locators, which makes it useful for testing, automation, data extraction, and AI-driven decision making.
Overview: What Problem It Solves
- Automate real browser interactions for AI agents without embedding heavy Node.js stacks into your workflows.
- Maintain persistent browser state across runs, enabling seamless login sessions and personalized contexts.
- Provide AI-friendly observability through accessibility trees, structured snapshots, and deterministic element references.
- Support both local and cloud-based browser environments, including hosted browser providers, for scalable experimentation.
Getting Started: How to Install and Begin
Agent Browser offers multiple installation paths to fit different environments and preferences.
Global Installation (recommended)
- Installs the native Rust binary, enabling a fast, system-wide CLI experience.
- Typical command sequence:
- npm install -g agent-browser
- agent-browser install
- The second command downloads the browser (Chrome for Testing), leaving the CLI ready to run.
Project Installation (local dependency)
- Pin a specific version inside package.json for project-scoped usage:
- npm install agent-browser
- agent-browser install
- Use via package.json scripts or invoke agent-browser directly, ensuring reproducible environments.
Other installation options
- macOS Homebrew: brew install agent-browser, then agent-browser install
- Cargo (Rust): cargo install agent-browser, then agent-browser install
- From Source: clone, build native parts, link globally, and install
- git clone https://github.com/vercel-labs/agent-browser
- cd agent-browser
- pnpm install
- pnpm build
- pnpm build:native
- pnpm link --global
- agent-browser install
Linux prerequisites
- Use agent-browser install --with-deps to fetch system dependencies along with the Chrome-for-Testing download.
Updating to the latest version
- agent-browser upgrade detects your method (npm, Homebrew, or Cargo) and runs the appropriate update command automatically.
What You Need to Run It
- Chrome (or compatible Chrome-for-Testing channel) for automation
- Rust toolchain (only needed when building from source)
Quick Start: A Tiny Demo Script
A typical first automation workflow covers the core steps: opening a page, taking a snapshot of the accessibility tree, and performing basic interactions:
- agent-browser open example.com
- agent-browser snapshot
- agent-browser click @e2
- agent-browser fill @e3 "test@example.com"
- agent-browser get text @e1
- agent-browser screenshot page.png
- agent-browser close
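The steps above can also be scripted. The sketch below is a hypothetical Python wrapper, not part of Agent Browser: it only turns the same commands into argument lists and, when asked, runs them in order with `subprocess`. The names `QUICK_START`, `to_argv`, and `run_all` are assumptions made for illustration. The refs @e1 to @e3 come from the example above and will differ on a real page, so a real script would read them from the snapshot output first.

```python
import shlex
import subprocess

# The quick-start commands from the list above, one string per CLI call.
QUICK_START = [
    "agent-browser open example.com",
    "agent-browser snapshot",
    "agent-browser click @e2",
    'agent-browser fill @e3 "test@example.com"',
    "agent-browser get text @e1",
    "agent-browser screenshot page.png",
    "agent-browser close",
]


def to_argv(command: str) -> list[str]:
    """Split a shell-style command line into an argv list (quotes removed)."""
    return shlex.split(command)


def run_all(commands: list[str], dry_run: bool = True) -> list[list[str]]:
    """Return the argv lists; execute them in order unless dry_run is set."""
    argvs = [to_argv(c) for c in commands]
    if not dry_run:
        for argv in argvs:
            # check=True stops the sequence at the first failing command.
            subprocess.run(argv, check=True)
    return argvs


if __name__ == "__main__":
    # Dry run by default: print what would be executed.
    for argv in run_all(QUICK_START):
        print(argv)
```

Keeping the default as a dry run makes the sequence easy to inspect before anything touches a browser; passing `dry_run=False` requires the CLI to be installed and on the PATH.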
Traditional selectors and modern, AI-friendly selectors
- Traditional selectors: CSS selectors or text-based selectors
- agent-browser click "#submit"
- agent-browser fill "#email" "test@example.com"
- agent-browser find role button click --name "Submit"
- Semantic Locators (the AI-friendly approach)
- agent-browser find role <role> <action> [value] # By ARIA role
- agent-browser find text <text> <action> # By visible text
- agent-browser find label <label> <action> [value] # By label
- agent-browser find placeholder <ph> <action> [value] # By placeholder
- agent-browser find alt <text> <action> # By alt text
- agent-browser find testid <id> <action> [value] # By data-testid
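To show how a semantic locator call is put together, here is a minimal sketch that builds the argument list for a `find` command. The helper `find_argv` is hypothetical. Its argument order mirrors the one concrete example in this section (`agent-browser find role button click --name "Submit"`); applying the same order to the other locator kinds is an assumption rather than something this guide confirms.

```python
# Locator kinds named in the list above.
LOCATOR_KINDS = {"role", "text", "label", "placeholder", "alt", "testid"}


def find_argv(kind, target, action, value=None, name=None):
    """Build argv for an `agent-browser find` call.

    The layout (kind, target, action, optional value, optional --name)
    follows the single example given in the text; it is an assumption
    for the other locator kinds.
    """
    if kind not in LOCATOR_KINDS:
        raise ValueError(f"unknown locator kind: {kind!r}")
    argv = ["agent-browser", "find", kind, target, action]
    if value is not None:
        argv.append(value)        # e.g. the text to fill into a field
    if name is not None:
        argv += ["--name", name]  # accessible-name filter, as in the example
    return argv


if __name__ == "__main__":
    print(find_argv("role", "button", "click", name="Submit"))
```

Building the argument list in one place keeps the locator kind explicit, which makes it easier to swap a brittle CSS selector for a role- or label-based one later.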
Core Commands at a Glance
Agent Browser exposes a rich set of commands, grouped by purpose:
Browser control
- open, goto, navigate: Launch or navigate a page
- click, dblclick, focus
- hover
- type, fill, press, keyboard type, inserttext
- scroll, scrollintoview
- drag, upload
- screenshot (with annotate and directory/format options)
- snapshot (best for AI; builds an accessibility snapshot with refs)
Query and information
- get text, get html, get value, get attr, get title, get url, get cdp-url, get count
- is visible, is enabled, is checked
- get box, get styles
Element targeting and selectors
- find role, find text, find label, find placeholder, find alt, find title, find testid, find first, find last, find nth
Batch execution and orchestration
- batch: run multiple commands in a single invocation (with optional --json and --bail)
- wait: various preconditions, timeouts, and state checks
- batch + json mode enables streaming, CI-friendly automation
Session and state management
- tab, tab new, tab new --label, tab close
- window new
- use --session, --session-name, or environment variables for isolated sessions
- profile persistence (persist cookies, localStorage, IndexedDB)
- state save, state load, state list, state rename, state clear
- state encryption using AES-256-GCM with a 64-char hex key
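AES-256 uses a 256-bit key, which is 32 bytes, or 64 characters when written in hexadecimal. The sketch below shows one way to generate and sanity-check such a key with Python's standard library. The function names are hypothetical, and how the key is handed to Agent Browser is not covered here.

```python
import secrets
import string


def generate_state_key() -> str:
    """Return 32 cryptographically random bytes as 64 hex characters."""
    return secrets.token_hex(32)


def is_valid_state_key(key: str) -> bool:
    """Check that a key is exactly 64 hex characters (32 bytes, 256 bits)."""
    return len(key) == 64 and all(c in string.hexdigits for c in key)


if __name__ == "__main__":
    key = generate_state_key()
    print(len(key), is_valid_state_key(key))
```

Validating the length up front catches the common mistake of supplying a 32-character value, which would only carry 128 bits.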
Security and safety features
- content boundaries: wrap page output with markers for LLM safety
- domain allowlist: restrict navigation to trusted domains
- action policy: gate destructive actions via a policy.json
- confirm actions: require explicit confirmation for sensitive operations
- interactive prompts and safety defaults in auto-dialog handling
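Two of these ideas, the domain allowlist and content boundaries, can be illustrated with a small model. This is a conceptual sketch only, not Agent Browser's implementation: the subdomain-matching rule, the marker strings, and both function names are assumptions chosen for the example.

```python
from urllib.parse import urlparse


def is_allowed(url: str, allowlist: list[str]) -> bool:
    """True if the URL's host equals an allowed domain or is a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowlist)


def wrap_with_boundary(page_text: str, marker: str = "PAGE_CONTENT") -> str:
    """Surround untrusted page output with explicit start and end markers."""
    return f"<<<{marker}_START>>>\n{page_text}\n<<<{marker}_END>>>"


if __name__ == "__main__":
    print(is_allowed("https://app.example.com/login", ["example.com"]))
    print(wrap_with_boundary("Welcome back!"))
```

Matching on the parsed hostname rather than on a substring of the URL avoids lookalike hosts such as `example.com.evil.io`, and the markers give a language model a clear signal about which text came from the page rather than from the operator.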
Initialization and setup
- init scripts to register pre-navigation routines
- React DevTools integration for React apps and universal Web Vitals support
- agent-browser doctor: diagnose installation issues and repair
- doctor --offline / --quick options for offline sanity checks
Authentication and sessions: keeping you signed in
Agent Browser offers multiple strategies to persist login sessions:
- Chrome profile reuse: reuse existing login state quickly
- Persistent profiles: full, cross-restart browser state
- Session persistence: auto-save/restore cookies and localStorage by a named session
- Import from your browser: export/import authentication state from an existing Chrome session
- Encrypted state: AES-256-GCM encryption with a user-provided key or an auto-generated one
- State files can be loaded via --state or AGENT_BROWSER_STATE
- The auth vault stores credentials locally and can be referenced by name, never exposed to the LLM
Cloud and serverless deployment: extend beyond your laptop
Agent Browser supports several deployment options for cloud and serverless contexts:
Cloud providers
- Browserless, Browserbase, Browser Use, Kernel, AgentCore
- Each provider is selected with the -p flag (or the AGENT_BROWSER_PROVIDER environment variable)
- They offer cloud-hosted browser sessions with API keys and service endpoints
- Example usage: export BROWSERLESS_API_KEY="your-api-token", then agent-browser -p browserless open https://example.com
- Optional provider-specific environment variables tune latency, TTL, stealth, and region
Kernel, AgentCore, and enterprise-grade options
- Kernel: stealth mode, persistent profiles, and API-driven cloud sessions
- AgentCore: AWS Bedrock-backed sessions with SigV4 authentication
- These integrations enable scalable automation in production-grade AI workflows
iOS simulator and real-device automation
- iOS simulator on macOS via Appium and XCUITest driver
- Real device support via WebDriverAgent (for Safari on iOS)
- Device management commands support listing devices, launching Safari, capturing screenshots, and more
Mobile and cloud-native testing
- Cloud-backed browsers (Browserless, Browserbase, Browser Use, Kernel, AgentCore) support mobile and desktop contexts
- Native Rust performance helps keep latency predictable in multi-step AI tasks
Streaming and real-time viewing: the streaming viewport
- Each session can stream its viewport over WebSocket for live previews
- The stream endpoint listens on a port bound by default; set AGENT_BROWSER_STREAM_PORT to change it
- Tools for controlling the streaming session include stream enable, stream disable, and stream status
- The protocol supports frames, input events (mouse, keyboard, touch), and other interactions for pair browsing or AI-assisted debugging
Observability dashboard: monitor in real time
- An optional local dashboard server provides a live viewport and command activity feed
- Runs in the background on port 4848 (configurable)
- Features:
- Live viewport: real-time frames from the browser
- Activity feed: a chronological log of commands, results, and timing
- Console output: browser console logs and errors
- Session management: create and monitor multiple sessions, with support for different engines and cloud providers
- AI chat panel (via Vercel AI Gateway) to interact with AI agents directly in the dashboard
AI integration: chat, React, and Web Vitals
- The dashboard includes an optional AI chat panel powered by the Vercel AI Gateway
- The CLI also supports a chat command for natural-language browser control
- React introspection commands, coupled with Web Vitals metrics, provide deep insight into UI performance
- React DevTools hook (install at launch) enables detailed component trees, props, hooks, state, and source information
- Web Vitals (LCP, CLS, TTFB, FCP, INP) help quantify user-experience factors during automated interactions
Configuration and extensibility: config files, schemas, and defaults
- agent-browser.json provides persistent defaults. Settings are resolved in this order, with later sources overriding earlier ones:
- 1) user-level defaults (~/.agent-browser/config.json)
- 2) project-level overrides (./agent-browser.json)
- 3) environment variables (AGENT_BROWSER_*)
- 4) command-line flags, which override everything
- A JSON Schema is available to enable IDE autocomplete and validation
- You can point the CLI to a specific config with --config or AGENT_BROWSER_CONFIG
- Unknown keys are ignored for forward compatibility
- The default timeout for most operations is 25 seconds; you can change it via AGENT_BROWSER_DEFAULT_TIMEOUT, keeping in mind that larger values interact with the client-daemon IPC timeout
- Extensions from user and project configs are merged (not replaced)
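The precedence rules above amount to a layered merge. The following sketch models that behavior under stated assumptions: the function `resolve_config` and the keys `timeout` and `headed` are hypothetical, and accumulating `extensions` across every layer (rather than only the user and project files) is a simplification. It is not the CLI's actual loader.

```python
def resolve_config(user, project, env, flags):
    """Merge four config layers; later layers win, except for `extensions`."""
    merged = {}
    extensions = []
    for layer in (user, project, env, flags):  # lowest to highest priority
        for key, value in layer.items():
            if key == "extensions":
                # Extension lists accumulate instead of being overwritten.
                extensions += [e for e in value if e not in extensions]
            else:
                merged[key] = value
    if extensions:
        merged["extensions"] = extensions
    return merged


if __name__ == "__main__":
    config = resolve_config(
        user={"timeout": 25000, "extensions": ["a"]},
        project={"timeout": 40000, "extensions": ["b"]},
        env={},
        flags={"headed": True},
    )
    print(config)
```

Treating `extensions` as additive while every other key is last-writer-wins matches the note that user and project extensions are merged, not replaced.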
Performance, reliability, and safety
- Client-daemon architecture ensures fast subsequent operations by reusing a running daemon
- Idle-shutdown capability: set AGENT_BROWSER_IDLE_TIMEOUT_MS to automatically close the browser after inactivity
- Content boundaries and a robust action policy ensure that AI-driven automation remains safe and auditable
- In-context ref-based interaction makes AI reasoning more reliable, deterministic, and auditable
Privacy, security, and best practices
- Authentication vault keeps credentials encrypted and local
- Domain allowlists restrict navigation to trusted domains
- Action policy and confirmation prompts prevent unintended destructive actions
- Output length controls help avoid context flooding when interacting with AI models
- When using remote debugging, restrict access to the debugging port, since anyone who can reach it can control the browser
Developer experience: extensibility and integration with AI tools
- Agent mode outputs are designed for machine consumption (JSON) and are friendly to AI agents
- The snapshot feature provides a structured view of the page’s interactive elements, which AI models can use for planning actions
- Semantic locators and refs streamline interaction in AI-guided workflows, enabling deterministic automation even as the DOM changes
Licensing
- Apache-2.0 license ensures broad usage with permissive terms, making it suitable for both open-source projects and enterprise deployments
Why this matters for AI agents
- Deterministic element selection via refs makes AI reasoning more reliable, especially in dynamic pages
- Accessibility snapshots provide a stable representation of the UI for AI interpretation
- Persistent sessions reduce the friction of repeated logins and context switching, enabling longer-running experiments
- Cloud providers and serverless options enable scalable experimentation without sacrificing control or visibility
Best Practices for Getting the Most from Agent Browser
- Start with a minimal session (no login state) and build up with a persistent profile as you stabilize workflows
- Use the snapshot command early in the workflow to identify refs before performing actions
- Prefer semantic locators over brittle selectors when possible to improve resilience to UI changes
- Leverage the batch and JSON modes for automation pipelines and AI-driven orchestration
- Enable the Observability Dashboard during development to observe timing, console output, and session activity in real time
- When integrating with AI agents, consider enabling content boundaries and action-policy controls to minimize risk
Conclusion
Agent Browser is a purpose-built tool for AI-driven web automation. It pairs the speed and safety of a native Rust implementation with the flexibility of a multi-provider, cloud-ready ecosystem. By combining deterministic element refs, accessibility snapshots, session management, and a broad set of automation primitives, it lets AI agents reason about web pages and perform complex tasks without compromising security or reliability. Whether you're building automated QA, data collection pipelines, or AI copilots that need direct browser manipulation, Agent Browser offers an extensible foundation that scales from a developer's laptop to enterprise-grade cloud deployments.
Repository: https://github.com/vercel-labs/agent-browser