Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add editable URL to browser tab #3078

Closed
wants to merge 20 commits into from

Conversation

SmartManoj
Copy link
Contributor

@SmartManoj SmartManoj commented Jul 23, 2024

Solves Discord thread

image

Add functionality to make the browser tab URL editable.

  • frontend/src/components/Browser.tsx
    • Add an input field to allow editing the URL.
    • Add onChange handler to update the URL state.
    • Add onKeyDown handler to trigger the URL update.
  • frontend/src/services/browseService.ts
    • Implement a function to send data using websocket for updating the browser tab URL.
  • frontend/src/state/browserSlice.ts
    • Add a new action to update the URL state.
    • Call updateBrowserTabUrl from frontend/src/services/browseService.ts.

For more details, open the Copilot Workspace session.

Add editable URL input field to Browser component

* Add an input field in `frontend/src/components/Browser.tsx` to allow editing the URL.
* Add `handleUrlChange` and `handleUrlBlur` handlers to update the URL state.
* Implement `updateBrowserTabUrl` function in `frontend/src/services/browseService.ts` to send data using websocket for updating the browser tab URL.
* Modify `frontend/src/state/browserSlice.ts` to include a new action to update the URL state and call `updateBrowserTabUrl` function.


---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/OpenDevin/OpenDevin?shareId=b39056db-d2ea-402e-aabd-c77505aa2575).
@tobitege tobitege requested review from amanape and iFurySt July 23, 2024 15:07
Copy link
Collaborator

@xingyaoww xingyaoww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation behind this though? The browser supposedly to be controlled by the agent - and that tab is used to display the agent's action to the user. It is probably not really important for human to able to change the URL - we can't interact with the displayed screenshot anyway?

@SmartManoj
Copy link
Contributor Author

SmartManoj commented Jul 24, 2024

Like Interactive Writable Terminal #2493, Helps to tell SLM to use the active tab.
Discord thread
One can quickly check whether the browser is working.
One can check the local server once the browser is moved to the sandbox and the network host is off.


LLM Model: groq/llama3-8b-8192

Why it creates app.py? Consistently it is doing that. Not following the simple command.

image

image

Event History 1

Event History 2


If I ask browse google.com, it is doing that. After that, the previous query works.

image
image

@xingyaoww
Copy link
Collaborator

xingyaoww commented Jul 24, 2024

sounds good! Similar to interactive terminal, would the agent also knows that the user changed the URL of the browser? (e.g., you push an BrowsingObservation into eventstream and tell the agent "User changed the web browsing page to XXX; Here's the webpage YYY") If so, i think we are probably good to merge this?

@SmartManoj
Copy link
Contributor Author

SmartManoj commented Jul 24, 2024

would the agent also knows that the user changed the URL of the browser?

Yes.

image

Log:

image

LLM Msg:

image

Event history

@@ -121,6 +123,8 @@ def action_to_str(self, action: Action) -> str:
return f'{action.thought}\n<execute_browse>\n{action.inputs["task"]}\n</execute_browse>'
elif isinstance(action, MessageAction):
return action.content
elif isinstance(action, BrowseURLAction):
return f'Open {action.url} in browser'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only added to CodeAct, but I guess most complex browsing actions are actually performed by browsing agents I'm not sure if there's an easy way to do that for browsing agents as well.

Also, the message added here is just plain string (and for agent, it will recognize it as its past action), and got the BrowserOutputObservation as output for such plain string action. This will most likely confuses the agent - as it may try to issue action like "Open {action.url} in browser" and only find it will not be able to get observations.

Making the URL editable will add a lot of things to consider on the agent side, so I'd advise we be careful with it & come up with a more general solution. If we can't do that in a general way, it will be cleaner for us not to implement it rather than open ourselves to tech debt in the future.

Copy link
Contributor Author

@SmartManoj SmartManoj Jul 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reworded Opening {action.url} in browser manually

image

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both the action and the observation here are sent to the LLM with role 'user'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Copy link

@rishi8011 rishi8011 Jul 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xingyaoww @SmartManoj
can you guide me how to use browse ??

i tried to click but it is not allow me to edit URL .

@SmartManoj
Copy link
Contributor Author

image

Like this, it will solve #3034

@rbren
Copy link
Collaborator

rbren commented Jul 30, 2024

We're doing some cleanup of PRs, and are going to close this one for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants