Fix 'get_website_content' parameter handling and improve text cleanup #103

Merged
mikahanninen merged 3 commits into main from fix/browser-get-website-content on Nov 18, 2024

Conversation

mikahanninen
Contributor

@mikahanninen mikahanninen commented Nov 18, 2024

Description

The get_website_content action could be called without the user_agent parameter, or with user_agent set to an empty dictionary. The action handled neither case correctly.

The same action's response could also be unnecessarily large because of redundant whitespace and line breaks in the text content.
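As a rough illustration of the kind of cleanup described here, the sketch below collapses runs of spaces and tabs and drops blank lines. The function name `clean_text` and the exact rules are assumptions for illustration only; the PR's actual cleanup code is not shown in this conversation.

```python
import re


def clean_text(text: str) -> str:
    """Collapse redundant whitespace and drop empty lines (hypothetical helper)."""
    # Collapse runs of spaces, tabs, and non-breaking spaces, then trim each line.
    lines = (re.sub(r"[ \t\u00a0]+", " ", line).strip() for line in text.splitlines())
    # Keep only lines that still contain text.
    return "\n".join(line for line in lines if line)
```

Applied to every text property of a scraped page, a pass like this can shrink the payload without changing the visible wording.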

  • Improved text content cleanup for all text properties
  • Fixed how the user_agent parameter is handled in the get_website_content action
  • Set a maximum timeout of 5 seconds when waiting for networkidle; the action continues regardless of whether the page reaches that state
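The "wait up to 5 seconds, then carry on" behaviour in the last bullet could look roughly like the sketch below. It assumes a Playwright-style page object exposing `wait_for_load_state`; the helper name `wait_for_network_idle` and the `timeout_errors` parameter are hypothetical, and this is not the code from the PR.

```python
def wait_for_network_idle(page, timeout_ms: int = 5000, timeout_errors=(TimeoutError,)) -> bool:
    """Best-effort wait for 'networkidle'; never raises on timeout (hypothetical helper).

    With Playwright's sync API, pass its own exception class, for example
    timeout_errors=(playwright.sync_api.TimeoutError,), since it is not the
    built-in TimeoutError.
    """
    try:
        page.wait_for_load_state("networkidle", timeout=timeout_ms)
        return True
    except timeout_errors:
        # The page never went idle within the limit; the caller continues anyway.
        return False
```

Returning a boolean instead of raising lets the caller scrape whatever has loaded so far, which matches the "continue regardless" intent.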

How was this tested?

Tested with Sema4.ai VSCode extension and with Sema4.ai Studio.

Screenshots (if needed)

Checklist:

  • I have bumped the version number for the Action Package / Agent
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation - README.md file
  • I have updated the CHANGELOG.md file in correspondence with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • OAuth2: I have made sure that action has necessary scopes (works in whitelisted mode)

@mikahanninen self-assigned this Nov 18, 2024
@mikahanninen added the bug and enhancement labels Nov 18, 2024


 @action
-def get_website_content(url: str, user_agent: UserAgent) -> Response[WebPage]:
+def get_website_content(url: str, user_agent: UserAgent = {}) -> Response[WebPage]:
Contributor

I'm guessing you need to define it as an empty dict because using None will cause the action lint to fail?

Contributor Author

Yeah the LLM sent some calls to this action without user_agent which broke the param handling. This was the easiest way to fix it.

Contributor Author

With this default set, the model will request a random user agent.

Contributor

I'm not a huge fan of having mutable defaults as function params, but I see we never modify the dict, so this should be fine.
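For readers unfamiliar with the concern, here is a minimal sketch of why a mutable default is risky only when it is mutated. Both functions, the `"name"` key, and the `"random"` fallback are hypothetical and do not come from the action's code.

```python
def append_item(item, bucket=[]):
    # Mutates the default list, which is created once and shared by every call.
    bucket.append(item)
    return bucket


def pick_user_agent(user_agent={}):
    # Only reads from the default dict, so sharing it between calls is harmless.
    return user_agent.get("name", "random")
```

Calling `append_item` twice without a `bucket` argument returns the same growing list, while `pick_user_agent` behaves identically on every call because the default is never modified.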

@mikahanninen merged commit 14ad70a into main Nov 18, 2024
6 checks passed
@mikahanninen deleted the fix/browser-get-website-content branch November 18, 2024 10:27