Open‑source alternative to Perplexity Comet and director.ai

Vision + Automation

BrowserPilot: Tell your browser what to do. It actually does it.

An AI-powered, vision-first browser that navigates any website, handles anti-bot walls, solves CAPTCHAs, and delivers data in your format — PDF, CSV, or JSON.

Playwright · Google Gemini Vision · FastAPI · Proxy Rotation · CAPTCHA Solving

See it in action

A quick demo showing navigation, extraction, and export flow.

Powerful by design, resilient in practice

From vision-first interaction to anti-bot evasion and universal export formats, every piece is battle-tested for the real web.

It Actually Sees Websites

Vision model understands layouts like a human — resilient to redesigns and anti-bot tricks.

Anti‑Bot Ninja

Detects Cloudflare and rate limits, rotates proxies, solves CAPTCHAs, and adapts on the fly.

Agentic Control

Clicks, types, scrolls, and navigates on its own, like a tireless intern.

Recovery & Resilience

Restarts browsers, swaps identities, and continues tasks even when sites fight back.

Any Website

Amazon, LinkedIn, random blogs, even weird edge cases. If a human can do it, so can the agent.

Live Streaming

Watch the browser session in real time. Step in to click or type if needed.

JSON

Clean, structured output with timestamps and metadata.

CSV

Perfectly shaped tables for analysis and reporting.

PDF

Beautifully rendered documents for sharing or archiving.
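
As a rough illustration of what "structured output with timestamps and metadata" could look like, the sketch below wraps extracted rows in a JSON envelope and renders the same rows as CSV. The function names (`to_json`, `to_csv`) and the envelope fields (`exported_at`, `source_url`, `row_count`) are assumptions made for this example, not BrowserPilot's actual schema.

```python
import csv
import io
import json
from datetime import datetime, timezone


def to_json(rows, source_url):
    """Wrap extracted rows in an envelope with a timestamp and basic metadata."""
    envelope = {
        "exported_at": datetime.now(timezone.utc).isoformat(),  # hypothetical field
        "source_url": source_url,                               # hypothetical field
        "row_count": len(rows),
        "rows": rows,
    }
    return json.dumps(envelope, indent=2, ensure_ascii=False)


def to_csv(rows):
    """Render rows (a list of dicts) as CSV text; missing keys become empty cells."""
    # Collect column names in first-seen order so the header is stable.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"title": "Example story", "points": 120},
    {"title": "Another story", "points": 87, "comments": 14},
]
print(to_json(rows, "https://news.ycombinator.com"))
print(to_csv(rows))
```

Rows that lack a column simply get an empty cell, so uneven extractions still produce a rectangular table.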

Real examples you can run today

Speak in plain English. The agent figures out the rest — reliably.

Just Getting Started

“Go to Hacker News and save the top stories as JSON”

Navigation · Extraction · JSON

Shopping for Data

“Search Amazon for wireless headphones under $100 and export the results to CSV”

E‑commerce · Filtering · CSV

Social Media Intel

“Go to LinkedIn, find AI engineers in San Francisco, and save their profiles”

Auth · Infinite Scroll · Profiles

The Wild West

“Visit this random e‑commerce site and grab all the product prices”

Generalization · Resilience · Pricing

The cool technical stuff

Built for resilience and clarity — from dynamic vision to robust proxy rotation.

Smart Format Detection

Say “save as PDF”, “save as CSV”, or “save as JSON” and the output comes back in that format, with metadata attached.
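
One simple way to pick the export format out of a plain-English task is a keyword scan. The sketch below is only that: a heuristic written for illustration. The name `detect_format`, the JSON default, and the "last mention wins" rule are all assumptions, and the real project may well leave this decision to the language model.

```python
import re

# Hypothetical keyword pattern covering the three formats named on this page.
FORMAT_PATTERN = re.compile(r"\b(pdf|csv|json)\b", re.IGNORECASE)


def detect_format(prompt, default="json"):
    """Return the last export format mentioned in the prompt, or the default.

    The last mention wins because the target format usually comes at the end
    of a request ("turn this JSON feed into a PDF").
    """
    matches = FORMAT_PATTERN.findall(prompt)
    return matches[-1].lower() if matches else default


print(detect_format("Go to Hacker News and save the top stories as JSON"))
print(detect_format("Search Amazon for wireless headphones and export the results to CSV"))
```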

Anti‑Bot Mode

Spots challenges early, switches proxies, solves CAPTCHAs, and backs off gracefully.
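
To make "spots challenges early and backs off gracefully" concrete, here is a minimal sketch of a text-based challenge check plus exponential backoff with jitter. The marker phrases, status codes, and function names are illustrative assumptions; actual challenge pages vary widely, and a vision model can catch cases this kind of string matching misses.

```python
import random

# Hypothetical marker phrases; real challenge pages differ from site to site.
CHALLENGE_MARKERS = ("just a moment", "verify you are human", "captcha")


def looks_like_challenge(status_code, body_text):
    """Heuristic: a forbidden or rate-limited status, or a known challenge phrase."""
    if status_code in (403, 429):
        return True
    lowered = body_text.lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)


def backoff_delays(max_attempts, base=1.0, cap=30.0, rng=random.random):
    """Yield one delay per attempt: exponential growth, capped, with full jitter."""
    for attempt in range(max_attempts):
        ceiling = min(cap, base * (2 ** attempt))
        # rng() is in [0, 1), so each delay is a random fraction of the ceiling.
        yield rng() * ceiling
```

A caller would sleep for each yielded delay between retries, typically switching proxy before the next attempt. The jitter keeps parallel sessions from retrying in lockstep.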

Proxy Management

Health tracking, performance‑based selection, site‑specific block lists.
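
The three ideas above (health tracking, performance-based selection, site-specific block lists) can be sketched with a small in-memory pool. `ProxyStats` and `ProxyPool` are hypothetical names for this example, and success rate is just one possible health signal; the project's own manager may score proxies differently.

```python
from dataclasses import dataclass, field


@dataclass
class ProxyStats:
    server: str
    successes: int = 0
    failures: int = 0
    blocked_sites: set = field(default_factory=set)

    @property
    def success_rate(self):
        total = self.successes + self.failures
        # Untried proxies get an optimistic score so they are used at least once.
        return self.successes / total if total else 1.0


class ProxyPool:
    def __init__(self, servers):
        self._stats = {server: ProxyStats(server) for server in servers}

    def record(self, server, ok, site=None, blocked=False):
        """Update health counters; remember the site if this proxy was blocked there."""
        stats = self._stats[server]
        if ok:
            stats.successes += 1
        else:
            stats.failures += 1
        if blocked and site:
            stats.blocked_sites.add(site)

    def pick(self, site):
        """Return the healthiest proxy not blocked on this site, or None."""
        candidates = [s for s in self._stats.values() if site not in s.blocked_sites]
        if not candidates:
            return None
        return max(candidates, key=lambda s: s.success_rate).server
```

Keeping the block list per site means a proxy that is banned on one domain stays available for every other domain.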

Universal Extractor

AI‑powered extraction that organizes content with timestamps and structure.

Vision Integration

Understands dynamic layouts, anti‑bot patterns, and interaction targets.

Meaningful Dashboard

Live sessions, proxy performance, and token spend in one view that actually helps.

Built with Open Source + Gemini

Designed for low-cost maintenance: open-source building blocks on the backend with a vision model for robust, layout‑agnostic interaction. Minimal glue, maximum leverage.

Playwright

FastAPI

Google Gemini (Vision)

Python

Uvicorn

Next.js + shadcn/ui

Tailwind CSS

Proxies & Rotations

Open Source Core

Browser automation, backend framework, and UI all rely on permissive OSS libraries.

Vision‑first Control

Gemini looks at pages like a human, so layout changes don't break your flows.

Lower Ops Cost

Fewer brittle selectors, less maintenance. Proxies and anti‑bot handled centrally.

How BrowserPilot compares

A practical look at scraping and automation tradeoffs. The goal: resilient automation with minimal upkeep.

Feature	Traditional scraping	Perplexity Comet
Open Source
Natural-language tasks
Vision-based interaction
Anti-bot detection
Proxy rotation / identity	manual
Scrape protected sites	difficult	limited
Export formats (PDF/CSV/JSON)	manual
Live session streaming
Self-hostable

Notes: Feature mapping is based on our understanding of public information and typical usage. Availability and capabilities may change.

Vision: an open‑source Comet‑style browser focused on scraping

We're building BrowserPilot as the open‑source alternative to Comet‑like browsers — with a special emphasis on reliable scraping. Instead of brittle selectors and manual parsing, describe the task in natural language and let the agent handle the rest.

Natural language first

Say what you want — “scrape prices from this site and export to CSV” — and the agent executes.

Handles protected sites

Anti‑bot detection, CAPTCHA solving, and proxy rotation keep tasks running and help preserve anonymity.

Data formats that fit

Export as PDF, CSV, or JSON. We add timestamps and metadata automatically.

Vision‑driven robustness

Gemini sees the page like a human, so redesigns break less often than CSS/XPath scrapers.

From task to workflow

Stream the session live, step in when needed, and compose repeatable workflows.

Open source, lower cost

Built on OSS libraries and a pay‑as‑you‑go model for low maintenance burden.

Getting started — it's actually easy

From cloning to running your first task in minutes. Configure proxies if you plan to scrape heavily.

Requirements

python --version  # 3.8 or newer

Clone the project

git clone https://github.com/ai-naymul/AI-Agent-Scraper.git
cd AI-Agent-Scraper  # git names the directory after the repository

Install dependencies

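curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv  # creates .venv; uv pip needs a virtual environment (or --system)
source .venv/bin/activate
uv pip install -r requirements.txt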

Add your secrets

# .env (gitignored)
GOOGLE_API_KEY=your_actual_api_key_here
SCRAPER_PROXIES=[{"server":"http://proxy1:port","username":"user","password":"pass"}]

Run the server

python -m uvicorn backend.main:app --reload
# Open http://localhost:8000

Proxy configuration (optional)

{
  "SCRAPER_PROXIES": [
    { "server": "http://proxy1.example.com:8080", "username": "user1", "password": "pass1", "location": "US" },
    { "server": "http://proxy2.example.com:8080", "username": "user2", "password": "pass2", "location": "EU" }
  ]
}
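
For a sense of how this configuration might be consumed, the sketch below reads `SCRAPER_PROXIES` from the environment and reduces each entry to the `server`, `username`, and `password` keys, which is the shape Playwright's `proxy` option accepts. The helper name `load_proxies` is an assumption, as is the choice to accept both the bare list from `.env` and the wrapped object shown above; the project's own loader may behave differently.

```python
import json
import os


def load_proxies(env_var="SCRAPER_PROXIES"):
    """Parse the proxy list from the environment; return [] when unset or empty."""
    raw = os.environ.get(env_var, "").strip()
    if not raw:
        return []
    parsed = json.loads(raw)
    # Accept a bare list (as in .env) or an object keyed by the variable name.
    if isinstance(parsed, dict):
        parsed = parsed.get(env_var, [])
    proxies = []
    for entry in parsed:
        if "server" not in entry:
            raise ValueError(f"proxy entry is missing 'server': {entry!r}")
        # Keep the server and credentials; extra keys such as "location" are dropped.
        proxies.append(
            {key: entry[key] for key in ("server", "username", "password") if key in entry}
        )
    return proxies


if __name__ == "__main__":
    for proxy in load_proxies():
        print(proxy["server"])
```

Each returned dict could then be passed as the `proxy` argument when launching a browser or creating a context in Playwright.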