Fix 'get_website_content' parameter handling and improve text cleanup #103

mikahanninen · 2024-11-18T09:58:25Z

Description

The get_website_content action could be called without the user_agent parameter, or with it set to an empty dictionary, and the action handled neither case correctly.

The response from the same action could also be unnecessarily large because of extra whitespace and line breaks within the text content.
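As a rough illustration of the kind of cleanup described above, here is a minimal sketch that collapses runs of whitespace and line breaks into single spaces. The helper name clean_text is hypothetical; the actual implementation in this PR is not shown here and may differ.

```python
import re


def clean_text(text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, line breaks)
    into a single space and trim both ends.

    Hypothetical helper for illustration only.
    """
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    raw = "  Hello \n\n   world\t!  "
    print(repr(clean_text(raw)))
```

Applying a helper like this to every text property before building the response keeps the payload small without dropping any visible words.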

Improved text content cleanup for all text properties
Fixed how the user_agent parameter is handled by the get_website_content action
Set a maximum timeout of 5 seconds for waiting for networkidle; the action continues regardless of whether it is reached
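The bounded networkidle wait can be sketched as below. This assumes the action drives a Playwright page (networkidle is a Playwright load state); the helper name wait_for_network_idle is hypothetical, and the import fallback exists only so the sketch also runs where Playwright is not installed.

```python
try:
    # Playwright raises its own TimeoutError when a wait expires.
    from playwright.sync_api import TimeoutError as PlaywrightTimeoutError
except ImportError:
    # Fallback so the sketch stays importable without Playwright.
    PlaywrightTimeoutError = TimeoutError


def wait_for_network_idle(page, timeout_ms: float = 5000) -> bool:
    """Wait up to timeout_ms for the page to reach networkidle.

    Returns True if the state was reached and False on timeout.
    The caller continues with its actions in both cases.
    """
    try:
        page.wait_for_load_state("networkidle", timeout=timeout_ms)
        return True
    except PlaywrightTimeoutError:
        return False
```

Swallowing only the timeout error means pages that never go idle (for example because of long polling) no longer block the action, while other failures still surface.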

How can this be (was this) tested?

Tested with the Sema4.ai VS Code extension and with Sema4.ai Studio.

Screenshots (if needed)

Checklist:

I have bumped the version number for the Action Package / Agent
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation - README.md file
I have updated the CHANGELOG.md file in correspondence with my changes
I have added tests that prove my fix is effective or that my feature works
OAuth2: I have made sure that action has necessary scopes (works in whitelisted mode)

vlad-sema4 · 2024-11-18T10:20:09Z

actions/browsing/actions.py



 @action
-def get_website_content(url: str, user_agent: UserAgent) -> Response[WebPage]:
+def get_website_content(url: str, user_agent: UserAgent = {}) -> Response[WebPage]:


I'm guessing you need to define it as an empty dict because using None would cause the action lint to fail?

mikahanninen · 2024-11-18

Yeah, the LLM sent some calls to this action without user_agent, which broke the param handling. This was the easiest way to fix it.

mikahanninen · 2024-11-18

With this set, the model will request a random user agent.

vlad-sema4 · 2024-11-18

I'm not a huge fan of mutable defaults as function params, but I see we never modify the dict, so this should be fine.
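To make the concern about mutable defaults concrete, here is a small sketch contrasting a default that is mutated with one that is only read. All function names and the "name" key are hypothetical, and a plain dict stands in for the UserAgent type, whose definition is not shown in this thread.

```python
from typing import Optional


def collect_mutating(item: str, seen: list = []) -> list:
    """Pitfall: the default list is created once and shared across calls."""
    seen.append(item)
    return seen


def pick_user_agent(user_agent: dict = {}) -> str:
    """Safe in practice: the default dict is only read, never modified.

    A missing or empty value falls back to "random".
    """
    if not user_agent:
        return "random"
    return user_agent.get("name", "random")


def pick_user_agent_none(user_agent: Optional[dict] = None) -> str:
    """Conventional alternative that avoids a mutable default entirely."""
    return pick_user_agent(user_agent or {})


if __name__ == "__main__":
    print(collect_mutating("a"))
    print(collect_mutating("b"))  # the earlier item is still in the list
    print(pick_user_agent())
    print(pick_user_agent({"name": "firefox"}))
```

The read-only variant behaves the same on every call, which is why the empty-dict default is acceptable here as long as nobody later adds code that writes to it.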

Fix 'get_website_content' parameter handling and improve text cleanup

a22e51d

mikahanninen self-assigned this Nov 18, 2024

mikahanninen added the bug and enhancement labels Nov 18, 2024

Update CHANGELOG.md

c8a08df

mikahanninen requested review from tonnitommi, OvidiuCode and vlad-sema4 November 18, 2024 10:00

Update support.py

da68305

vlad-sema4 reviewed Nov 18, 2024

vlad-sema4 approved these changes Nov 18, 2024

mikahanninen merged commit 14ad70a into main Nov 18, 2024
6 checks passed

mikahanninen deleted the fix/browser-get-website-content branch November 18, 2024 10:27
