Browser Harness: Let LLMs Navigate and Complete Any Web Task
You've probably tried asking an LLM to "go book a flight" or "fill out that form." The response is usually a step-by-step guide you have to follow yourself. It's frustrating because you wanted the AI to do the thing, not tell you how.
That's where Browser Harness comes in. It's a single tool that gives an LLM the ability to actually control a browser, click buttons, type into fields, and complete tasks from start to finish. It's like giving your AI a pair of hands for the web.
What It Does
Browser Harness is exactly what it sounds like: a harness that connects an LLM to a real browser instance. The AI can see what's on the page (via screenshots or DOM snapshots), decide what to do next, and then execute actions like clicking, scrolling, or typing. The result is an autonomous agent that can navigate complex multi-step workflows.
It's built on top of Playwright, so you get reliable browser automation without the fluff.
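To make the see, decide, act cycle concrete, here is a minimal sketch of such a loop. It is not the project's code: createFakeBrowser, fakeDecide, and runTask are hypothetical names, the "browser" is an in-memory stand-in rather than Playwright, and the "LLM" is a hard-coded rule so the example runs without an API key.

```javascript
// Hypothetical sketch of an observe-decide-act loop; none of these names
// come from the Browser Harness codebase.

// Stand-in for a real browser: a tiny in-memory "page" with one text field
// and one submit button.
function createFakeBrowser() {
  const state = { url: "https://example.test/search", query: "", submitted: false };
  return {
    // Observe: return a copy of what the agent can "see".
    snapshot() {
      return { ...state };
    },
    // Act: apply a single action to the page state.
    perform(action) {
      if (action.type === "type") {
        state.query = action.text;
      } else if (action.type === "click" && action.target === "submit") {
        state.submitted = true;
      } else {
        throw new Error(`Unsupported action: ${JSON.stringify(action)}`);
      }
    },
  };
}

// Stand-in for an LLM call: same shape (task + observation in, next action
// out), but driven by fixed rules instead of a model.
async function fakeDecide(task, observation) {
  if (observation.query === "") return { type: "type", text: task };
  if (!observation.submitted) return { type: "click", target: "submit" };
  return { type: "done" };
}

// The loop itself: observe, decide, act, until the policy says "done" or the
// step budget runs out.
async function runTask(task, browser, decide, maxSteps = 10) {
  const history = [];
  for (let step = 0; step < maxSteps; step++) {
    const observation = browser.snapshot();
    const action = await decide(task, observation);
    history.push(action);
    if (action.type === "done") return { completed: true, history };
    browser.perform(action);
  }
  return { completed: false, history };
}
```

The step budget (maxSteps) is a design choice worth keeping in any real version: an agent that never reaches "done" should stop rather than click forever.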
Why It's Cool
Two things stand out about this project:
- It works with any LLM. You're not locked into a specific model. You can use GPT-4, Claude, Llama, or whatever you're already running. The harness handles the browser control part, and your LLM handles the reasoning.
- It's task aware. Instead of just giving an LLM a screenshot and hoping it figures out your intent, Browser Harness lets you define the task upfront. The AI keeps the goal in mind, tracks its progress, and adapts when it hits roadblocks like CAPTCHAs or unexpected popups.
Developers can use this to automate QA testing, scrape data from dynamic sites, or build personal assistants that actually do things online.
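One way to picture both points together is a harness that treats the model as nothing more than a function from prompt to text, and restates the goal plus the steps taken so far on every turn. The sketch below assumes exactly that; buildPrompt, nextAction, the JSON reply format, and the two fake models are all made up for illustration and say nothing about how Browser Harness is actually implemented.

```javascript
// Hypothetical sketch: none of these names come from Browser Harness.

// Build a prompt that restates the goal and the steps taken so far, so the
// model keeps the task in view on every turn.
function buildPrompt(task, observation) {
  const steps = task.steps.length
    ? task.steps.map((s, i) => `${i + 1}. ${s}`).join("\n")
    : "(none yet)";
  return [
    `Goal: ${task.goal}`,
    `Steps so far:\n${steps}`,
    `Current page: ${observation}`,
    'Reply with JSON: {"action": "...", "note": "..."}',
  ].join("\n\n");
}

// Ask any model for the next action. `complete` is any function of the form
// async (prompt) => string, which is what makes the harness model-agnostic.
async function nextAction(task, observation, complete) {
  const reply = await complete(buildPrompt(task, observation));
  let parsed;
  try {
    parsed = JSON.parse(reply);
  } catch {
    // Models sometimes answer in plain prose; report that as a retryable miss
    // instead of crashing, and leave the recorded progress untouched.
    return { action: "retry", note: "unparseable reply" };
  }
  task.steps.push(`${parsed.action}: ${parsed.note ?? ""}`.trim());
  return parsed;
}

// Two interchangeable stand-ins for real model clients.
const modelA = async () => '{"action": "click", "note": "open search form"}';
const modelB = async (prompt) =>
  prompt.includes("open search form")
    ? '{"action": "type", "note": "enter destination"}'
    : "Sorry, I cannot help with that.";
```

Swapping modelA for modelB mid-task works here because the progress lives in the task object, not inside either model. A real client for GPT-4, Claude, or a local Llama would slot in the same way once wrapped as a prompt-to-text function.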
How to Try It
Getting started is straightforward:
- Clone the repo: git clone https://github.com/browser-use/browser-harness
- Install dependencies: cd browser-harness && npm install (or your preferred package manager)
- Set your LLM API key in a .env file
- Run a task: node index.js --task "Find the cheapest flights from NYC to London next Friday"
There's also a command line tool and a basic API if you want to integrate it into your own projects. Check the README for more examples.
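If you're curious what an entry point like that has to do before any browser opens, here is a rough sketch of reading the task and the API key. The variable name LLM_API_KEY and both helper functions are assumptions for illustration; check the README for the names the project really expects.

```javascript
// Hypothetical sketch: the LLM_API_KEY name and this parsing logic are
// assumptions, not taken from the Browser Harness source.

// Pull the value that follows --task out of an argv-style array.
function readTask(argv) {
  const i = argv.indexOf("--task");
  if (i === -1 || i + 1 >= argv.length) {
    throw new Error('Missing --task "<description>"');
  }
  return argv[i + 1];
}

// Combine the task with the API key from the environment, failing early
// with a clear message if either one is absent.
function loadConfig(argv, env) {
  const apiKey = env.LLM_API_KEY;
  if (!apiKey) {
    throw new Error("LLM_API_KEY is not set (check your .env file)");
  }
  return { task: readTask(argv), apiKey };
}
```

In a real script this would be called as loadConfig(process.argv.slice(2), process.env), after the .env file has been loaded by something like the dotenv package or Node's --env-file flag (available from Node 20.6).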
Final Thoughts
Browser Harness solves a real pain point. Instead of building custom browser automations for every new task, you can just describe what you want in natural language and let the LLM figure out the steps. It's not magic; it's just clever engineering.
For developers, this means less time writing brittle selectors and more time focusing on the actual logic. If you've been wanting to experiment with AI agents without reinventing the wheel, this is a solid place to start.
Repository: https://github.com/browser-use/browser-harness