Develop a version of WebSurferAgent that uses a proper headless browser (perhaps with multimodal LLM support) #1481

afourney · 2024-01-31T02:15:09Z

Possible roadmap:

Investigate [Headless Chrome](https://developer.chrome.com/blog/headless-chrome/) or an equivalent.
Add a HeadlessChromeBrowser to complement SimpleTextBrowser in https://github.com/microsoft/autogen/blob/main/autogen/browser_utils.py
Update WebSurferAgent to accept a web_browser instance rather than a web_browser_config, and pass in either a SimpleTextBrowser or a HeadlessChromeBrowser as appropriate.
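
To make the last roadmap item concrete, here is a minimal sketch of an agent that takes a browser instance instead of a config dict. Every name in it (`AbstractBrowser`, `FakeTextBrowser`, `FakeHeadlessBrowser`, `SurferSketch`, `visit_page`) is hypothetical and chosen for illustration; none of it is the actual autogen API.

```python
from abc import ABC, abstractmethod


class AbstractBrowser(ABC):
    """Hypothetical minimal browser interface (an assumption, not autogen's API)."""

    @abstractmethod
    def visit_page(self, url: str) -> str:
        """Load `url` and return the page content as text."""


class FakeTextBrowser(AbstractBrowser):
    """Stand-in for a plain-text browser; returns a canned string."""

    def visit_page(self, url: str) -> str:
        return f"[text] {url}"


class FakeHeadlessBrowser(AbstractBrowser):
    """Stand-in for a headless-Chrome-backed browser; returns a canned string."""

    def visit_page(self, url: str) -> str:
        return f"[headless] {url}"


class SurferSketch:
    """Illustrative agent that receives a ready-made browser instance."""

    def __init__(self, browser: AbstractBrowser):
        self.browser = browser

    def open(self, url: str) -> str:
        # The agent only depends on the shared interface, not on the backend.
        return self.browser.visit_page(url)
```

With this shape, swapping backends is a one-line change at construction time, e.g. `SurferSketch(FakeHeadlessBrowser())` in place of `SurferSketch(FakeTextBrowser())`.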

Additional thoughts: We should try to take full advantage of having a browser under our control. Don't just dump the DOM to HTML for BeautifulSoup to parse (like what LangChain does). Rather, use JavaScript running privileged in the page context to query the document, extract text, interact with links, etc.
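
As a rough illustration of querying the live document instead of parsing a dumped HTML string, the sketch below runs a small JavaScript function in the page and returns the visible links. It assumes a Playwright-style `page` object exposing `evaluate(expression)`; the helper names `extract_links` and `demo` are made up for this example, and `demo` is not executed on import because it needs Playwright and a browser binary installed.

```python
# JavaScript evaluated inside the page context; returns link text and targets.
EXTRACT_LINKS_JS = """
() => Array.from(document.querySelectorAll('a[href]'))
    .map(a => ({text: a.innerText.trim(), href: a.href}))
    .filter(link => link.text.length > 0)
""".strip()


def extract_links(page):
    """Return a list of {'text': ..., 'href': ...} dicts read from the live DOM.

    `page` is assumed to behave like Playwright's sync `Page` (has `evaluate`).
    """
    return page.evaluate(EXTRACT_LINKS_JS)


def demo(url="https://example.com"):
    """Optional end-to-end run; requires `pip install playwright` and
    `playwright install chromium`. Not called automatically."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        links = extract_links(page)
        browser.close()
    return links
```

Because `a.href` is read from the DOM rather than from the raw markup, relative URLs come back already resolved, which is one small example of what a controlled browser gives you over static parsing.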

vijaykramesh · 2024-02-05T00:45:22Z

Started an initial pass at this. Right now it doesn't do much more than the SimpleTextBrowser, but since it runs in headless Chrome it can take advantage of client-side interactions. It might also be worth looking into something like https://github.com/lucgagan/auto-playwright to allow the HeadlessChromeBrowser agent to intelligently do Selenium interactions.

gee842 · 2024-02-05T08:53:12Z

I have opened this issue which might be related as it tackles a similar problem: #1538

However, one enhancement that would be extremely beneficial would be shortening the context length by selectively remembering things and increasing the useful-information-density produced by the web browsing agent.

Perhaps a library like this might come in handy:

https://github.com/buriy/python-readability

afourney · 2024-02-05T19:56:06Z

> I have opened this issue which might be related as it tackles a similar problem: #1538
>
> However, one enhancement that would be extremely beneficial would be shortening the context length by selectively remembering things and increasing the useful-information-density produced by the web browsing agent.
>
> Perhaps a library like this might come in handy:
>
> https://github.com/buriy/python-readability

Yes, 100%. If our goal is to find a piece of information on a webpage, we don't need to keep the whole browsing history in our context window. If one page/search result was a dead-end, it's just burning tokens to keep that around when surfing to other pages.

Gagan's PR might help (if we limit the context to a few messages, #1513). But additional strategies like summarization would no doubt also be advantageous.
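
For illustration only, limiting the context to a few messages could look roughly like the sketch below. This is not what #1513 implements; the function name `trim_history`, the `keep_last` parameter, and the `role`/`content` message layout are assumptions made for the example.

```python
def trim_history(messages, keep_last=4):
    """Keep all system messages plus the `keep_last` most recent other messages.

    `messages` is assumed to be a list of dicts with at least a "role" key.
    """
    system = [m for m in messages if m.get("role") == "system"]
    rest = [m for m in messages if m.get("role") != "system"]
    # Guard against keep_last == 0, since rest[-0:] would return everything.
    recent = rest[-keep_last:] if keep_last > 0 else []
    return system + recent
```

A dead-end page visited early in the session simply falls out of the window once enough newer messages arrive; summarizing the dropped messages instead of discarding them would be a natural next step.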

FWIW, the reason SimpleTextBrowser provides virtual "viewports" is to help with this issue (by not immediately swamping the context with long pages).
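
The viewport idea can be sketched as below. This is a simplified stand-in rather than the SimpleTextBrowser code: it cuts at fixed character offsets, whereas a real implementation may prefer word or line boundaries, and the names `split_into_viewports` and `ViewportPager` are hypothetical.

```python
def split_into_viewports(text, viewport_size=1024):
    """Split page text into consecutive chunks of at most `viewport_size` characters."""
    if viewport_size <= 0:
        raise ValueError("viewport_size must be positive")
    chunks = [text[i:i + viewport_size] for i in range(0, len(text), viewport_size)]
    return chunks or [""]  # an empty page still has one (empty) viewport


class ViewportPager:
    """Expose one chunk of a long page at a time."""

    def __init__(self, text, viewport_size=1024):
        self.pages = split_into_viewports(text, viewport_size)
        self.index = 0

    @property
    def viewport(self):
        return self.pages[self.index]

    def page_down(self):
        # Clamp at the last viewport instead of running off the end.
        self.index = min(self.index + 1, len(self.pages) - 1)

    def page_up(self):
        # Clamp at the first viewport.
        self.index = max(self.index - 1, 0)
```

Only `pager.viewport` would be sent to the model on each turn, so a long page costs at most `viewport_size` characters of context per step.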

INF800 · 2024-02-29T04:04:38Z

Hi @afourney @vijaykramesh, I want to try this feature but I see that this issue is still pending.

If you guys don't mind I will try to complete this PR while experimenting with the headless browser.

You can see the latest work from main merged with vijaykramesh's branch here: https://github.com/INF800/autogen/tree/feat/headless_browser

signalprime · 2024-02-29T04:21:56Z

Hi friends, if you're in a hurry to try it out, here is the PR #1733, and on Friday I'll have time to finish the PR feedback. To my surprise, agent navigation such as clicking on links continued to work even with the graphical browser. Edge, Chrome, and Firefox are supported. It was a smaller building block in a larger pipeline I'd been working on.

edit, I forgot the tagline: "Till all are one" ~ Optimus

afourney · 2024-02-29T05:30:02Z

Hey folks. @signalprime, sorry for the delay. I've been heads down on a benchmark (GAIA, and hope to be done soon). Enhanced browsing is, I think, planned work for March.

Having said that, let me raise two points:

1. I've found that Playwright has better support than Selenium for detecting when pages are loaded and for interacting with downloads, and would probably pursue that direction -- but it does not preclude having a common interface that would support Selenium-based approaches.
2. I have refactored websurfer a little, basing it on a new mdconvert package I've been working on that converts various media files to Markdown. Available here:
https://github.com/microsoft/autogen/blob/ct_orchestrator/autogen/mdconvert.py
and here:
https://github.com/microsoft/autogen/blob/ct_orchestrator/autogen/browser_utils.py
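
For readers unfamiliar with the Playwright features mentioned above, here is a small sketch of waiting for a page to finish loading and capturing a download with the sync API. The helper names `open_and_wait` and `click_and_save_download` are hypothetical, `page` is assumed to be a Playwright `Page`, and `"networkidle"` is just one of the available `wait_until` values (`"load"` and `"domcontentloaded"` are others).

```python
def open_and_wait(page, url, timeout_ms=30_000):
    """Navigate to `url`, block until the network goes idle, and return the title."""
    page.goto(url, wait_until="networkidle", timeout=timeout_ms)
    return page.title()


def click_and_save_download(page, selector, target_path):
    """Click `selector`, wait for the download it triggers, and save it to disk."""
    # expect_download() must wrap the action that starts the download.
    with page.expect_download() as download_info:
        page.click(selector)
    download = download_info.value
    download.save_as(target_path)
    return download.suggested_filename
```

Both helpers only touch `goto`, `title`, `expect_download`, and `click`, so a Selenium-backed class offering equivalent methods could sit behind the same call sites.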

INF800 · 2024-02-29T15:40:25Z

mdconvert looks nice.

afourney · 2024-02-29T23:48:46Z

> mdconvert looks nice.

Added as a standalone PR draft here: #1825

…es) (#1929)

* Feat/headless browser (retargeted) (#1832)
* Add headless browser to the WebSurferAgent, closes #1481
* replace soup.get_text() with markdownify.MarkdownConverter().convert_soup(soup)
* import HeadlessChromeBrowser
* implicitly wait for 10s
* increase max. wait time to 99s
* fix: trim trailing whitespace
* test: fix headless tests
* better bing query search
* docs: add example 3 for headless option

---------

Co-authored-by: Vijay Ramesh

* Handle missing Selenium package.
* Added browser_chat.py example to simplify testing.
* Based browser on mdconvert. (#1847)
* Based browser on mdconvert.
* Updated web_surfer.
* Renamed HeadlessChromeBrowser to SeleniumChromeBrowser
* Added an initial POC with Playwright.
* Separated Bing search into its own utility module.
* Simple browser now uses Bing tools.
* Updated Playwright browser to inherit from SimpleTextBrowser
* Got Selenium working too.
* Renamed classes and files for consistency.
* Added more instructions.
* Initial work to support other search providers.
* Added some basic behavior when the BING_API_KEY is missing.
* Cleaned up some search results.
* Moved to using the requests.Session object. Moved Bing SERP parsing to mdconvert to be more broadly useful.
* Added backward compatibility to WebSurferAgent
* Selenium and Playwright now grab the whole DOM, not just the body, allowing the converters access to metadata.
* Fixed printing of page titles in Playwright.
* Moved installation of WebSurfer dependencies to contrib-tests.yml
* Fixing pre-commit issues.
* Reverting conversable_agent, which should not have been changed in prior commit.
* Added RequestMarkdownBrowser tests.
* Fixed a bug with Bing search, and added search test cases.
* Added tests for Bing search.
* Added tests for md_convert
* Added test files.
* Added missing pptx.
* Added more tests for WebSurfer coverage.
* Fixed guard on requests_markdown_browser test.
* Updated test coverage for mdconvert.
* Fix browser_utils tests.
* Removed image test from browser, since exiftool isn't installed on test machine.
* Disable Selenium GPU and sandbox to ensure it runs headless in Docker.
* Added option for Bing API results to be interleaved (as Bing specifies), or presented in a categorized list (Web, News, Videos), etc.
* Print more details when requests exceptions are thrown.
* Added additional documentation to markdown_search
* Added documentation to the selenium_markdown_browser.
* Added documentation to playwright_markdown_browser.py
* Added documentation to requests_markdown_browser
* Added documentation to mdconvert.py
* Updated agentchat_surfer notebook.
* Update .github/workflows/contrib-tests.yml
  Co-authored-by: Davor Runje
* Merge main. Resolve conflicts.
* Resolve pre-commit checks.
* Removed offending LFS file.
* Re-added offending LFS file.
* Fixed browser_utils tests.
* Fixed style errors.

---------

Co-authored-by: Asapanna Rakesh
Co-authored-by: Vijay Ramesh
Co-authored-by: Eric Zhu
Co-authored-by: Davor Runje

afourney mentioned this issue Jan 31, 2024

[Roadmap]: Complex Tasks Work Items (GAIA) #1369

Closed

afourney added the gaia label Jan 31, 2024

vijaykramesh added a commit to vijaykramesh/autogen that referenced this issue Feb 5, 2024

Add headless browser to the WebSurferAgent, closes microsoft#1481

ef5c9db

vijaykramesh added a commit to vijaykramesh/autogen that referenced this issue Feb 5, 2024

Add headless browser to the WebSurferAgent, closes microsoft#1481

de7915a

vijaykramesh added a commit to vijaykramesh/autogen that referenced this issue Feb 5, 2024

Add headless browser to the WebSurferAgent, closes microsoft#1481

b8e400d

vijaykramesh mentioned this issue Feb 5, 2024

Add headless browser to the WebSurferAgent #1534

Closed

3 tasks

vijaykramesh added a commit to vijaykramesh/autogen that referenced this issue Feb 6, 2024

Add headless browser to the WebSurferAgent, closes microsoft#1481

266a9ce

vijaykramesh added a commit to vijaykramesh/autogen that referenced this issue Feb 6, 2024

Add headless browser to the WebSurferAgent, closes microsoft#1481

db79a1b

vijaykramesh added a commit to vijaykramesh/autogen that referenced this issue Feb 6, 2024

Add headless browser to the WebSurferAgent, closes microsoft#1481

79e6fa6

vijaykramesh added a commit to vijaykramesh/autogen that referenced this issue Feb 6, 2024

Add headless browser to the WebSurferAgent, closes microsoft#1481

ac6f400

vijaykramesh added a commit to vijaykramesh/autogen that referenced this issue Feb 6, 2024

Add headless browser to the WebSurferAgent, closes microsoft#1481

9757c9a

vijaykramesh added a commit to vijaykramesh/autogen that referenced this issue Feb 6, 2024

Add headless browser to the WebSurferAgent, closes microsoft#1481

e91ce69

vijaykramesh added a commit to vijaykramesh/autogen that referenced this issue Feb 6, 2024

Add headless browser to the WebSurferAgent, closes microsoft#1481

c7ebe28

vijaykramesh added a commit to vijaykramesh/autogen that referenced this issue Feb 6, 2024

Add headless browser to the WebSurferAgent, closes microsoft#1481

643a738

vijaykramesh added a commit to vijaykramesh/autogen that referenced this issue Feb 6, 2024

Add headless browser to the WebSurferAgent, closes microsoft#1481

28ee5ad

vijaykramesh added a commit to vijaykramesh/autogen that referenced this issue Feb 6, 2024

Add headless browser to the WebSurferAgent, closes microsoft#1481

b4ec0de

vijaykramesh added a commit to vijaykramesh/autogen that referenced this issue Feb 6, 2024

Add headless browser to the WebSurferAgent, closes microsoft#1481

1499ae3

vijaykramesh added a commit to vijaykramesh/autogen that referenced this issue Feb 6, 2024

Add headless browser to the WebSurferAgent, closes microsoft#1481

b0ab6c1

afourney mentioned this issue Mar 9, 2024

WebSurfer Updated (Selenium, Playwright, and support for many filetypes) #1929

Merged

gagb closed this as completed Aug 27, 2024
