[Agent, Eval] Fixes LLM config issue for delegation & Add eval to measure the delegation accuracy #2948

Merged
xingyaoww merged 14 commits into main from xw/web-delegate
Jul 16, 2024

Conversation

xingyaoww
Collaborator

@xingyaoww xingyaoww commented Jul 15, 2024

What is the problem that this fixes or functionality that this introduces? Does it fix any open issues?

  1. AgentDelegationAction does not re-use the LLM instance from the caller Agent, which makes it hard to accumulate cost metrics. It also means the delegate does not use the right LLM config during evaluation.
  2. In the agent controller, self.delegate is checked before the max iteration check, so the agent loop never stops while a delegate is active.
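
To illustrate the second problem, here is a minimal sketch of how check ordering affects termination. It is not the project's controller: `ToyController`, its two step methods, and `count_steps` are hypothetical names, and the sketch assumes a delegate that never finishes on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToyController:
    """Hypothetical stand-in for an agent controller (not the project's class)."""
    max_iterations: int
    iteration: int = 0
    delegate: Optional["ToyController"] = None

    def step_delegate_first(self) -> str:
        # Delegate branch returns early, so the limit below is never reached
        # while a delegate is attached.
        if self.delegate is not None:
            self.delegate.iteration += 1
            return "delegated"
        if self.iteration >= self.max_iterations:
            return "stopped"
        self.iteration += 1
        return "stepped"

    def step_limit_first(self) -> str:
        # Limit is checked on every step, including delegated ones.
        if self.iteration >= self.max_iterations:
            return "stopped"
        self.iteration += 1
        if self.delegate is not None:
            self.delegate.iteration += 1
            return "delegated"
        return "stepped"


def count_steps(step: Callable[[], str], hard_cap: int = 50) -> int:
    """Call step() until it reports 'stopped'; give up after hard_cap calls."""
    for n in range(hard_cap):
        if step() == "stopped":
            return n
    return hard_cap
```

In this toy model, a parent with `max_iterations=3` and an attached delegate runs until the safety cap under `step_delegate_first`, but stops after three steps under `step_limit_first`.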

Give a summary of what the PR does, explaining any non-trivial design decisions

  1. Add LLM as an additional argument for AgentDelegationAction so that the child agent can re-use the same instance from the parent (that particular instance won't be saved to the event stream, though).
  2. Fix the order of the if self.delegate check and the max iteration check.
  3. Add a benchmark (more like a test) to check whether the agent can successfully delegate the browsing task to the BrowsingAgent. This is needed to report CodeActAgent's browsing performance in the paper. See the README.
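
The idea behind the first item can be sketched as follows. This is an illustration only: `ToyLLM`, `ToyAgent`, the `accumulated_cost` field, and the per-call cost of 0.01 are assumptions made for the example, not the project's actual classes or pricing.

```python
from dataclasses import dataclass


@dataclass
class ToyLLM:
    """Hypothetical LLM wrapper that tracks the cost of every call."""
    model: str
    accumulated_cost: float = 0.0

    def completion(self, prompt: str, cost: float = 0.01) -> str:
        # Every call adds to the same running total.
        self.accumulated_cost += cost
        return f"[{self.model}] reply to: {prompt}"


@dataclass
class ToyAgent:
    """Hypothetical agent holding a reference to an LLM wrapper."""
    name: str
    llm: ToyLLM

    def delegate(self, child_name: str) -> "ToyAgent":
        # Hand the child the parent's LLM object itself (same config,
        # same cost counter) instead of constructing a new one.
        return ToyAgent(name=child_name, llm=self.llm)
```

Because parent and child hold the same object, cost incurred by the child shows up in the parent's total, and the child inherits whatever model config the parent was created with.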

Other references

@xingyaoww xingyaoww marked this pull request as ready for review July 15, 2024 18:48
Contributor

@neubig neubig left a comment

Basically LGTM, but just had one question to make sure that you intentionally removed the thought from the delegation task.

agenthub/codeact_agent/action_parser.py (outdated, resolved)
agenthub/codeact_agent/action_parser.py (outdated, resolved)
Contributor

@neubig neubig left a comment

LGTM then!

extra={'msg_type': 'STEP'},
)

if self.state.iteration >= self.state.max_iterations:
Collaborator

Moving this code around won't solve the real problem. The real problem is that we don't accumulate steps across parents and children, which needs to be fixed by #2296.

This code move would likely introduce a bug: you will see the parent agent die while its child is still running.
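
One way to picture accumulating steps across parents and children is a single budget object that both draw from. The sketch below is only an illustration of that idea under stated assumptions: `IterationBudget` and `run_with_delegate` are hypothetical names, and nothing here reflects how #2296 is actually implemented.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class IterationBudget:
    """Hypothetical step counter shared by a parent and its delegates."""
    max_iterations: int
    used: int = 0

    def try_consume(self) -> bool:
        # Refuse the step once the shared limit is reached.
        if self.used >= self.max_iterations:
            return False
        self.used += 1
        return True


def run_with_delegate(
    budget: IterationBudget, parent_steps: int, child_steps: int
) -> Dict[str, int]:
    """Run parent steps, then child steps, against one shared budget."""
    done = {"parent": 0, "child": 0}
    for who, wanted in (("parent", parent_steps), ("child", child_steps)):
        for _ in range(wanted):
            if not budget.try_consume():
                return done  # shared limit hit: everyone stops together
            done[who] += 1
    return done
```

With a budget of 5, a parent that takes 3 steps leaves only 2 for its child, so parent and child stop at the same global limit rather than each counting from zero.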

Collaborator

Considering the slow progress of #2296, I can take it over unless someone else volunteers.

Collaborator Author

"you will see the parent agent die while its child is still running."

On second thought, we do hope the parent agent dies when it exceeds the (global)_max_iteration. Maybe I can just move this code block back for this PR, and wait for the global_max_iteration fix.

Collaborator

Fix is coming! #2990

Collaborator Author

I think this is still not fixed yet :(

[image]

We probably need the headless_mode change from https://github.com/OpenDevin/OpenDevin/pull/2994/files#diff-37dff7c7cda815bf00598e78acaaebb91df0bf68b820bdca78a86b89f67dfc9b to fix it:

[image]

Collaborator

Right, a new problem arises after fixing the old problem lol

evaluation/browsing_delegation/README.md (outdated, resolved)
evaluation/browsing_delegation/README.md (outdated, resolved)
evaluation/browsing_delegation/run_infer.py (outdated, resolved)
@xingyaoww xingyaoww enabled auto-merge (squash) July 16, 2024 14:47
@xingyaoww xingyaoww merged commit f45a2ff into main Jul 16, 2024
@xingyaoww xingyaoww deleted the xw/web-delegate branch July 16, 2024 15:51
3 participants