Meet Agent Browser: An AI-First CLI for Browser Automation
If you've ever tried to automate browser tasks with AI, you know the drill: cobble together Puppeteer, some LLM prompt engineering, and a prayer that the selector doesn't break. Vercel Labs just dropped something that skips most of that pain.
Agent Browser is an open-source CLI tool that uses an LLM (like GPT-4) to interpret natural language instructions and control a real browser behind the scenes. It's more than a thin wrapper around Selenium or Playwright: it's a purpose-built automation agent that can navigate, click, fill forms, extract data, and even take screenshots, all from a single command.
What It Does
You give Agent Browser a task in plain English. Something like:
"Go to Hacker News, find the top 5 posts by points, and save their titles and links to a CSV."
The tool spins up a headless Chromium instance, plans the steps, executes them (clicking, scrolling, waiting), and returns the result. The magic is that you only describe what you want; the tool decides how to get it done.
Under the hood, it uses an LLM to break down your request into browser actions, then executes them with Playwright. It handles errors, retries, and even explains its reasoning as it goes.
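To make that plan-then-execute loop concrete, here's a minimal Python sketch. It is not Agent Browser's actual code: the Action type, the plan() stub (standing in for the LLM call), the handlers, and the retry count are all hypothetical, and no real browser is involved. It only illustrates the shape of the idea: turn a task into steps, run each step, retry on failure, and log everything along the way.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    """One browser step, e.g. Action("click", "login button"). Hypothetical type."""
    kind: str
    target: str


@dataclass
class RunLog:
    """Human-readable trace of everything the agent tried."""
    lines: List[str] = field(default_factory=list)

    def note(self, message: str) -> None:
        self.lines.append(message)


def plan(task: str) -> List[Action]:
    """Hypothetical planner stub; a real agent would ask an LLM to produce these steps."""
    return [
        Action("goto", "https://news.ycombinator.com"),
        Action("extract", "top 3 story titles"),
    ]


def execute(
    actions: List[Action],
    handlers: Dict[str, Callable[[str], None]],
    log: RunLog,
    max_attempts: int = 3,
) -> bool:
    """Run each action in order, retrying a failed step up to max_attempts times."""
    for action in actions:
        handler = handlers[action.kind]
        for attempt in range(1, max_attempts + 1):
            log.note(f"{action.kind} {action.target} (attempt {attempt})")
            try:
                handler(action.target)
                break  # step succeeded, move on to the next action
            except RuntimeError as err:
                log.note(f"  failed: {err}")
        else:
            # Every attempt raised, so stop the whole run.
            log.note(f"giving up on {action.kind} {action.target}")
            return False
    return True


if __name__ == "__main__":
    attempts = {"goto": 0}

    def flaky_goto(target: str) -> None:
        # Simulate a page that times out once before loading.
        attempts["goto"] += 1
        if attempts["goto"] == 1:
            raise RuntimeError("navigation timed out")

    log = RunLog()
    execute(plan("top 3 stories"), {"goto": flaky_goto, "extract": lambda t: None}, log)
    print("\n".join(log.lines))
```

In a real agent the handlers would call into a browser driver and the planner would likely re-plan after a failure rather than blindly retrying the same step, but the log-every-attempt structure is what makes it possible to see where a run went off track.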
Why It's Cool
No selectors, no scripts. You don't write XPath or CSS selectors. You don't handle pagination or loading states. The model figures that out.
It actually works for real tasks. I tested it on a few scraping jobs that would normally require a custom script. It handled login forms, infinite scroll, and modal popups without failing dramatically.
Open source and CLI-first. This isn't a SaaS product. You install it with npm, and it runs locally. Your API keys, your browser instance, your data.
Transparent execution. It prints every action it's taking. If it clicks the wrong thing, you see exactly where it went off track. That's rare in AI tools.
Use cases:
- Quick data scraping without writing code
- Automating repetitive browser tasks (filling forms, checking dashboards)
- Testing web apps with natural language instructions
- Building demos or prototypes that need real browser interaction
How to Try It
You need Node.js 18+ and an API key for OpenAI (or for any compatible LLM provider).
# Install globally
npm install -g @agent-browser/cli
# Or run directly
npx @agent-browser/cli
Then run a task:
agent-browser "Go to https://news.ycombinator.com and tell me the top 3 stories"
It'll ask for your API key on first run. You can also set OPENAI_API_KEY as an environment variable.
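If you want to call the CLI from a script instead of typing the command by hand, something like the Python sketch below could work. It assumes only what's shown above: an agent-browser executable on your PATH that takes the task as its single argument and reads OPENAI_API_KEY from the environment. The helper names are mine, and whether the result lands on stdout is an assumption, so check the README before relying on it.

```python
import os
import shutil
import subprocess
from typing import Dict, List, Mapping, Optional, Tuple


def build_invocation(
    task: str,
    api_key: Optional[str] = None,
    base_env: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for one agent-browser run. Hypothetical helper."""
    env = dict(os.environ if base_env is None else base_env)
    if api_key is not None:
        env["OPENAI_API_KEY"] = api_key
    # Fail early: an unattended script can't answer the interactive key prompt.
    if not env.get("OPENAI_API_KEY"):
        raise ValueError("OPENAI_API_KEY is not set")
    return ["agent-browser", task], env


def run_task(task: str) -> str:
    """Run the CLI once and return whatever it wrote to stdout (assumed output channel)."""
    if shutil.which("agent-browser") is None:
        raise FileNotFoundError("agent-browser is not on PATH")
    argv, env = build_invocation(task)
    done = subprocess.run(argv, env=env, capture_output=True, text=True, check=True)
    return done.stdout


if __name__ == "__main__":
    print(run_task("Go to https://news.ycombinator.com and tell me the top 3 stories"))
```

Passing the task as a list element rather than through a shell string means quotes and spaces in the instruction don't need escaping.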
For more control, check the README on GitHub for custom models, headful mode (to watch the browser), and output formats.
Final Thoughts
I wouldn't replace all your scraping infrastructure with this just yet. It's still early, and you're paying per API call. But for one-off tasks, prototyping, or situations where writing a full script feels like overkill, this is surprisingly useful.
The approach of "describe what you want, get what you asked for" is the direction browser automation should have gone years ago. Agent Browser is a solid first step.
If you're curious, the repo is open source and the code is clean enough to learn from. Definitely worth a star.
Repository: https://github.com/vercel-labs/agent-browser