Open‑source alternative to Perplexity Comet and director.ai

Vision + Automation

BrowserPilot: Tell your browser what to do. It actually does it.

An AI-powered, vision-first browser that navigates any website, handles anti-bot walls, solves CAPTCHAs, and delivers data in your format — PDF, CSV, or JSON.

Playwright · Google Gemini Vision · FastAPI · Proxy Rotation · CAPTCHA Solving

See it in action

A quick demo showing navigation, extraction, and export flow.

Powerful by design, resilient in practice

From vision-first interaction to anti-bot evasion and universal export formats, every piece is battle-tested for the real web.

It Actually Sees Websites

A vision model reads page layouts the way a human does, so it stays resilient to redesigns and anti-bot tricks.

Anti‑Bot Ninja

Detects Cloudflare and rate limits, rotates proxies, solves CAPTCHAs, and adapts on the fly.

Agentic Control

Clicks, types, scrolls, navigates. Like a tireless intern who never makes mistakes.

Recovery & Resilience

Restarts browsers, swaps identities, and continues tasks even when sites fight back.

Any Website

Amazon, LinkedIn, random blogs, even weird edge cases. If a human can do it in a browser, the agent can too.

Live Streaming

Watch the browser session in real time. Step in to click or type if needed.

JSON

Clean, structured output with timestamps and metadata.

CSV

Perfectly shaped tables for analysis and reporting.

PDF

Beautifully rendered documents for sharing or archiving.

Real examples you can run today

Speak in plain English. The agent figures out the rest — reliably.

Just Getting Started
Go to Hacker News and save the top stories as JSON
Navigation · Extraction · JSON

Shopping for Data
Search Amazon for wireless headphones under $100 and export the results to CSV
E‑commerce · Filtering · CSV

Social Media Intel
Go to LinkedIn, find AI engineers in San Francisco, and save their profiles
Auth · Infinite Scroll · Profiles

The Wild West
Visit this random e‑commerce site and grab all the product prices
Generalization · Resilience · Pricing

The cool technical stuff

Built for resilience and clarity — from dynamic vision to robust proxy rotation.

Smart Format Detection

Say “save as PDF / CSV / JSON” — output is shaped perfectly with metadata.
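
As a rough illustration of how a format request might be picked out of a plain-English task, here is a minimal sketch. The function name `detect_format` and the JSON fallback are assumptions made for this example, not BrowserPilot's actual implementation.

```python
import re

# Illustrative only: match the three export formats named on this page.
_FORMAT_PATTERN = re.compile(r"\b(pdf|csv|json)\b", re.IGNORECASE)


def detect_format(prompt: str, default: str = "json") -> str:
    """Return the first export format mentioned in the prompt, else the default."""
    match = _FORMAT_PATTERN.search(prompt)
    return match.group(1).lower() if match else default


if __name__ == "__main__":
    print(detect_format("Go to Hacker News and save the top stories as JSON"))
    print(detect_format("Search Amazon for headphones and export the results to CSV"))
```

A keyword match like this is the simplest possible approach; a production agent would more likely let the language model decide the format as part of task planning.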

Anti‑Bot Mode

Spots challenges early, switches proxies, solves CAPTCHAs, and backs off gracefully.
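
The detect-then-back-off idea can be sketched as follows. The marker strings, the status-code rules, and the function names are assumptions chosen for illustration; real challenge pages vary, and this is not the project's own detection logic.

```python
import random

# Assumed, illustrative markers that often appear on challenge pages.
CHALLENGE_MARKERS = ("just a moment", "verify you are human", "captcha")


def classify_response(status: int, body: str) -> str:
    """Classify a fetched page as 'rate_limited', 'challenge', 'blocked', or 'ok'."""
    if status == 429:
        return "rate_limited"
    text = body.lower()
    if any(marker in text for marker in CHALLENGE_MARKERS):
        return "challenge"
    if status == 403:
        return "blocked"
    return "ok"


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


if __name__ == "__main__":
    print(classify_response(200, "<title>Just a moment...</title>"))
    print(round(backoff_delay(attempt=3), 2))
```

A caller would typically rotate to a different proxy on "challenge" or "blocked", and sleep for `backoff_delay(attempt)` seconds on "rate_limited" before retrying.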

Proxy Management

Health tracking, performance‑based selection, site‑specific block lists.
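
One way to combine health tracking, performance-based selection, and per-site block lists is sketched below. `ProxyStats` and `ProxyPool` are hypothetical names for this example, and success rate is only one of several reasonable selection metrics.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class ProxyStats:
    server: str
    successes: int = 0
    failures: int = 0
    blocked_sites: Set[str] = field(default_factory=set)

    @property
    def success_rate(self) -> float:
        total = self.successes + self.failures
        # Untested proxies get an optimistic score so they are tried at least once.
        return self.successes / total if total else 1.0


class ProxyPool:
    def __init__(self, servers: Iterable[str]) -> None:
        self._proxies: Dict[str, ProxyStats] = {s: ProxyStats(s) for s in servers}

    def pick(self, site: str) -> Optional[str]:
        """Return the best-performing proxy not blocked on this site, or None."""
        candidates = [p for p in self._proxies.values() if site not in p.blocked_sites]
        if not candidates:
            return None
        return max(candidates, key=lambda p: p.success_rate).server

    def report(self, server: str, site: str, ok: bool, blocked: bool = False) -> None:
        """Record the outcome of one request made through a proxy."""
        stats = self._proxies[server]
        if ok:
            stats.successes += 1
        else:
            stats.failures += 1
        if blocked:
            stats.blocked_sites.add(site)


if __name__ == "__main__":
    pool = ProxyPool(["http://proxy1.example.com:8080", "http://proxy2.example.com:8080"])
    first = pool.pick("example.org")
    pool.report(first, "example.org", ok=False, blocked=True)
    print(pool.pick("example.org"))
```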

Universal Extractor

AI‑powered extraction that organizes content with timestamps and structure.

Vision Integration

Understands dynamic layouts, anti‑bot patterns, and interaction targets.
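
To make the idea of "interaction targets" concrete, the sketch below assumes the vision model is prompted to reply with a small JSON object such as `{"action": "click", "x": 120, "y": 340}`, and validates that reply before anything touches the browser. The schema, the `Action` class, and `parse_action` are all hypothetical and are not taken from the BrowserPilot codebase.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Assumed action vocabulary for this example.
ALLOWED_ACTIONS = {"click", "type", "scroll", "navigate", "done"}


@dataclass
class Action:
    kind: str
    x: Optional[int] = None
    y: Optional[int] = None
    text: Optional[str] = None
    url: Optional[str] = None


def parse_action(model_reply: str) -> Action:
    """Parse and validate one JSON action proposed by the model."""
    data = json.loads(model_reply)
    if not isinstance(data, dict):
        raise ValueError("model reply must be a JSON object")
    kind = data.get("action")
    if kind not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {kind!r}")
    if kind == "click" and not all(isinstance(data.get(k), int) for k in ("x", "y")):
        raise ValueError("click needs integer x and y")
    if kind == "type" and not isinstance(data.get("text"), str):
        raise ValueError("type needs a text string")
    if kind == "navigate" and not isinstance(data.get("url"), str):
        raise ValueError("navigate needs a url string")
    return Action(kind, data.get("x"), data.get("y"), data.get("text"), data.get("url"))


if __name__ == "__main__":
    print(parse_action('{"action": "click", "x": 120, "y": 340}'))
```

Validating the reply first means a malformed or unexpected model output raises an error instead of triggering an unintended click or navigation.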

Meaningful Dashboard

Live sessions, proxy performance, and token spend: metrics that actually help.

Built with Open Source + Gemini

Designed for low-cost maintenance: open-source building blocks on the backend with a vision model for robust, layout‑agnostic interaction. Minimal glue, maximum leverage.

Playwright
FastAPI
Google Gemini (Vision)
Python
Uvicorn
Next.js + shadcn/ui
Tailwind CSS
Proxies & Rotations

Open Source Core

Browser automation, backend framework, and UI all rely on permissive OSS libraries.

Vision‑first Control

Gemini looks at pages like a human, so layout changes don't break your flows.

Lower Ops Cost

Fewer brittle selectors, less maintenance. Proxies and anti‑bot handled centrally.

How BrowserPilot compares

A practical look at scraping and automation tradeoffs. The goal: resilient automation with minimal upkeep.

Feature | Traditional scraping | director.ai | Perplexity Comet | BrowserPilot
Open Source
Natural-language tasks
Vision-based interaction
Anti-bot detection
Proxy rotation / identity
manual
Scrape protected sites
difficult
limited
Export formats (PDF/CSV/JSON)
manual
Live session streaming
Self-hostable

Notes: Feature mapping is based on our understanding of public information and typical usage. Availability and capabilities may change.

Vision: an open‑source Comet‑style browser focused on scraping

We're building BrowserPilot as the open‑source alternative to Comet‑like browsers — with a special emphasis on reliable scraping. Instead of brittle selectors and manual parsing, describe the task in natural language and let the agent handle the rest.

Natural language first

Say what you want — “scrape prices from this site and export to CSV” — and the agent executes.

Handles protected sites

Anti‑bot detection, CAPTCHA solving, and proxy rotation keep tasks running while masking your IP address.

Data formats that fit

Export as PDF, CSV, or JSON. We add timestamps and metadata automatically.
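
A minimal sketch of what "timestamps and metadata" could look like for JSON and CSV output, using only the standard library. The envelope field names (`source_url`, `extracted_at`, `record_count`) are assumptions for illustration; the real output schema may differ.

```python
import csv
import io
import json
from datetime import datetime, timezone


def wrap_with_metadata(records, source_url):
    """Wrap extracted records in an envelope with a UTC timestamp and basic metadata."""
    return {
        "metadata": {
            "source_url": source_url,
            "extracted_at": datetime.now(timezone.utc).isoformat(),
            "record_count": len(records),
        },
        "data": records,
    }


def to_csv(records):
    """Render a list of dicts as CSV text; missing fields become empty cells."""
    if not records:
        return ""
    # Union of keys in first-seen order, so ragged records still share one header.
    fieldnames = list(dict.fromkeys(key for record in records for key in record))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [{"title": "A", "price": 10}, {"title": "B"}]
    print(json.dumps(wrap_with_metadata(rows, "https://example.com"), indent=2))
    print(to_csv(rows))
```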

Vision‑driven robustness

Gemini sees the page like a human, so redesigns break less often than CSS/XPath scrapers.

From task to workflow

Stream the session live, step in when needed, and compose repeatable workflows.

Open source, lower cost

Built on OSS libraries and a pay‑as‑you‑go model for low maintenance burden.

Getting started — it's actually easy

From cloning to running your first task in minutes. Configure proxies if you plan to scrape heavily.

Requirements
python --version  # 3.8 or newer
Clone the project
git clone https://github.com/ai-naymul/AI-Agent-Scraper.git
cd AI-Agent-Scraper
Install dependencies
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv && source .venv/bin/activate  # uv pip expects a virtual environment by default
uv pip install -r requirements.txt
Add your secrets
# .env (gitignored)
GOOGLE_API_KEY=your_actual_api_key_here
SCRAPER_PROXIES=[{"server":"http://proxy1:port","username":"user","password":"pass"}]
Run the server
python -m uvicorn backend.main:app --reload
# Open http://localhost:8000
Proxy configuration (optional)
{
  "SCRAPER_PROXIES": [
    { "server": "http://proxy1.example.com:8080", "username": "user1", "password": "pass1", "location": "US" },
    { "server": "http://proxy2.example.com:8080", "username": "user2", "password": "pass2", "location": "EU" }
  ]
}
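
For readers wiring this up themselves, the sketch below shows one way the `SCRAPER_PROXIES` value could be parsed into proxy settings. The helper name `load_proxies` is hypothetical, and how the backend actually reads this variable is not shown on this page. The sketch accepts both shapes used above (a bare JSON list, or an object with a `SCRAPER_PROXIES` key) and keeps only the `server`, `username`, and `password` keys, which are among the fields Playwright's `proxy` launch option accepts.

```python
import json
import os

# Keys kept for each proxy entry; extras such as "location" are dropped here.
PROXY_KEYS = ("server", "username", "password")


def load_proxies(raw=None):
    """Parse SCRAPER_PROXIES (JSON text) into a list of proxy setting dicts."""
    if raw is None:
        raw = os.environ.get("SCRAPER_PROXIES", "[]")
    parsed = json.loads(raw)
    if isinstance(parsed, dict):
        parsed = parsed.get("SCRAPER_PROXIES", [])
    proxies = []
    for entry in parsed:
        if "server" not in entry:
            raise ValueError("each proxy entry needs a 'server' field")
        proxies.append({key: entry[key] for key in PROXY_KEYS if key in entry})
    return proxies


if __name__ == "__main__":
    example = '[{"server": "http://proxy1.example.com:8080", "username": "user1", "password": "pass1", "location": "US"}]'
    print(load_proxies(example))
```

Each resulting dict could then be passed as the `proxy=` argument when launching a Playwright browser, for example `playwright.chromium.launch(proxy=proxies[0])`.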