Web Voyager is an AI agent that autonomously navigates and interacts with web pages in a browser driven by Playwright. It uses GPT-4o (customizable) to interpret web content and decide how to interact with the page to accomplish user-defined tasks.
This version embeds the agent as a chat window inside the browser, so it can be used as a personal assistant while browsing.
- Autonomous web navigation and interaction
- Visual element recognition and labeling
- Task-oriented decision making
- HTML report generation with screenshots
- Python 3.7+
- Playwright
- LangChain
- OpenAI API key
- Set up the environment variables OPENAI_API_KEY and LANGCHAIN_API_KEY, and enable tracing:
  LANGCHAIN_TRACING_V2=true
  LANGCHAIN_API_KEY=<your-api-key>
- Start the script browse.py. A Playwright browser will open with a chat window where you can ask for assistance while browsing the web.
- To customize the profile picture, replace the "me.jpeg" image.
- View the generated web_voyager_results.html for a step-by-step breakdown of the agent's actions and screenshots.
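
Before starting browse.py, it can help to confirm that the environment variables from the setup step are present. The following is a minimal sketch, not part of the project: the helper name `missing_env_vars` is hypothetical, and it only assumes that OPENAI_API_KEY and LANGCHAIN_API_KEY must be non-empty.

```python
import os

# Variable names taken from the setup step above.
REQUIRED_VARS = ("OPENAI_API_KEY", "LANGCHAIN_API_KEY")


def missing_env_vars(environ=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; `environ` defaults to the
    process environment but accepts any mapping, which makes it easy to test.
    """
    if environ is None:
        environ = os.environ
    return [name for name in required if not environ.get(name)]


# Example usage (left commented so that importing this file has no side effects):
# missing = missing_env_vars()
# if missing:
#     raise SystemExit("Missing environment variables: " + ", ".join(missing))
```

Running the check first gives a clear error message instead of a failed API call in the middle of a browsing session.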
- The agent takes a screenshot of the current page
- It annotates interactive elements with bounding boxes
- The model (GPT-4o by default) analyzes the page content and decides on the next action
- The agent performs the action (click, type, scroll, etc.)
- This process repeats until the task is completed
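
The steps above can be sketched as a simple observe, decide, act loop. This is an illustrative outline rather than the project's actual code: `Action`, `run_agent`, and the three callables are hypothetical names, and the `max_steps` cap is an added assumption to keep a session bounded. In a real setup, `observe` would take and annotate a screenshot, `decide` would call the model, and `act` would drive the browser.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class Action:
    """One decision from the model (hypothetical structure)."""
    kind: str                      # e.g. "click", "type", "scroll", or "done"
    target: Optional[int] = None   # label number of an annotated element
    text: Optional[str] = None     # text to type, if any


def run_agent(
    task: str,
    observe: Callable[[], Any],
    decide: Callable[[str, Any, List[Action]], Action],
    act: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Repeat observe -> decide -> act until the model says "done".

    Returns the list of actions that were performed.
    """
    history: List[Action] = []
    for _ in range(max_steps):
        observation = observe()                       # screenshot + bounding boxes
        action = decide(task, observation, history)   # model picks the next action
        if action.kind == "done":
            break
        act(action)                                   # click, type, scroll, etc.
        history.append(action)
    return history
```

Passing the three steps in as functions keeps the loop independent of any particular browser or model client, so it can be exercised with stubs.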
- Requires a valid OpenAI API key with access to the configured model (GPT-4o by default)
- Performance may vary depending on the complexity of the web pages and tasks
- Extended sessions can incur significant API costs due to frequent image processing and model calls
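
To get a feel for how session length drives cost, here is a rough back-of-the-envelope sketch. It assumes one model call per step, with one screenshot each, and counts input tokens only. The function name and every parameter are hypothetical, and the numbers in the usage comment are placeholders, not real token counts or prices; check your provider's current pricing before relying on any figure.

```python
def estimate_session_cost(
    steps: int,
    image_tokens_per_step: int,
    text_tokens_per_step: int,
    usd_per_million_tokens: float,
) -> float:
    """Estimate input-token cost for a session (illustrative only).

    Assumes one model call per step and ignores output tokens and any
    growth of the prompt over time, so treat the result as a lower bound.
    """
    total_tokens = steps * (image_tokens_per_step + text_tokens_per_step)
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Placeholder values, chosen only to show the arithmetic:
# estimate_session_cost(steps=10, image_tokens_per_step=1000,
#                       text_tokens_per_step=500, usd_per_million_tokens=2.0)
```

Because the cost scales linearly with the number of steps, capping the steps per task is the most direct way to bound spending.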