[Agent, Eval] Fixes LLM config issue for delegation & Add eval to measure the delegation accuracy #2948

xingyaoww · 2024-07-15T18:48:41Z

What is the problem that this fixes or functionality that this introduces? Does it fix any open issues?

AgentDelegationAction does not re-use the LLM instance from the caller Agent, which makes it hard to accumulate cost metrics. It also won't use the right LLM config during evaluation.
In the agent controller, self.delegate is checked before the max iteration check, so the whole agent loop never stops.
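
A toy model of the second problem, for illustration only. It assumes the controller forwards each step to an active delegate and returns early, that the parent's iteration counter still advances while delegating, and that the delegate never finishes by itself. `run_loop`, `check_limit_first`, and `safety_cap` are hypothetical names, not OpenDevin code.

```python
def run_loop(check_limit_first: bool, max_iterations: int, safety_cap: int = 1000) -> int:
    """Return how many steps ran before the loop stopped (or hit safety_cap)."""
    iteration = 0
    steps = 0
    delegate_active = True  # assumption: the child never finishes on its own

    while steps < safety_cap:
        # Ordering under test: enforce the limit before looking at the delegate.
        if check_limit_first and iteration >= max_iterations:
            break

        if delegate_active:
            # Forward the step to the child and return early.
            iteration += 1
            steps += 1
            continue

        # With the delegate check first, this line is never reached
        # while a delegate is active.
        if iteration >= max_iterations:
            break

        iteration += 1
        steps += 1

    return steps
```

With the delegate check first, `run_loop(False, 5, 100)` only stops at the safety cap; with the limit check first, `run_loop(True, 5, 100)` stops after 5 steps.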

Give a summary of what the PR does, explaining any non-trivial design decisions

Add LLM as an additional argument for AgentDelegationAction so that the child agent can re-use the same instance from the parent (that particular instance won't be saved to the event stream, though).
Fix the order of the self.delegate check and the max iteration check.
Add a benchmark (more like a test) to check whether the agent can successfully delegate the browsing task to the BrowsingAgent. This is needed to report CodeActAgent's browsing performance in the paper. See the README.
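
A minimal sketch of the first item, assuming an LLM wrapper that accumulates cost on each call. `ToyLLM`, `ToyAgent`, and `share_llm` are hypothetical names used for illustration; the real AgentDelegationAction and LLM classes may look different.

```python
from dataclasses import dataclass


@dataclass
class ToyLLM:
    """Hypothetical LLM wrapper that tracks accumulated cost."""
    cost_per_call: float
    accumulated_cost: float = 0.0

    def completion(self, prompt: str) -> str:
        # Every call adds to this instance's running total.
        self.accumulated_cost += self.cost_per_call
        return f"response to: {prompt}"


@dataclass
class ToyAgent:
    """Hypothetical agent that owns a reference to an LLM."""
    name: str
    llm: ToyLLM

    def step(self, prompt: str) -> str:
        return self.llm.completion(prompt)

    def delegate(self, name: str, share_llm: bool) -> "ToyAgent":
        # Sharing passes the parent's instance; otherwise the child gets a
        # fresh one and its cost is tracked separately.
        llm = self.llm if share_llm else ToyLLM(self.llm.cost_per_call)
        return ToyAgent(name, llm)


parent = ToyAgent("parent", ToyLLM(cost_per_call=0.5))
parent.step("plan the task")
child = parent.delegate("browser", share_llm=True)
child.step("open the page")
child.step("read the page")
total_cost = parent.llm.accumulated_cost  # covers parent and child calls
```

When the instance is shared, the parent's `accumulated_cost` covers all three calls. With `share_llm=False`, the child's two calls would be recorded only on the child's own instance.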

Other references

neubig

Basically LGTM, but just had one question to make sure that you intentionally removed the thought from the delegation task.

agenthub/codeact_agent/action_parser.py

Co-authored-by: Graham Neubig <[email protected]>

neubig

LGTM then!

opendevin/controller/agent_controller.py

li-boxuan · 2024-07-16T03:12:10Z

opendevin/controller/agent_controller.py

+            extra={'msg_type': 'STEP'},
+        )
+
+        if self.state.iteration >= self.state.max_iterations:


Moving this code around won't solve the real problem. The real problem is that we don't accumulate steps across parents and children, which needs to be fixed by #2296.

This code move would likely introduce a bug: you will see the parent agent die while its child is still running.

Considering the slow progress of #2296, I can take it over unless someone else volunteers.

you will see the parent agent die while its child still running.

On second thought, we do hope the parent agent dies when it exceeds the (global)_max_iteration. Maybe I can just move this code block back for this PR, and wait for the global_max_iteration fix.
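
One way to picture accumulating steps across parents and children is a single budget object that every agent in the delegation chain draws from. This is only a sketch under that assumption; `IterationBudget` and `run_agent` are hypothetical names and are not taken from #2296 or #2990.

```python
from dataclasses import dataclass


@dataclass
class IterationBudget:
    """Hypothetical step counter shared by a parent and its delegates."""
    max_iterations: int
    used: int = 0

    def try_consume(self) -> bool:
        # Refuse the step once the shared limit is reached.
        if self.used >= self.max_iterations:
            return False
        self.used += 1
        return True


def run_agent(budget: IterationBudget, wanted_steps: int) -> int:
    """Take up to wanted_steps from the shared budget; return steps taken."""
    done = 0
    while done < wanted_steps and budget.try_consume():
        done += 1
    return done


budget = IterationBudget(max_iterations=5)
parent_before = run_agent(budget, 2)   # parent works, then delegates
child_steps = run_agent(budget, 10)    # child is cut off by the shared limit
parent_after = run_agent(budget, 1)    # nothing left for the parent
```

Because both agents draw from the same counter, the child stops once the combined total reaches the limit, and the parent cannot continue afterwards.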

Fix is coming! #2990

I think this is still not fixed yet :(

We probably need the headless_mode change from https://github.com/OpenDevin/OpenDevin/pull/2994/files#diff-37dff7c7cda815bf00598e78acaaebb91df0bf68b820bdca78a86b89f67dfc9b to fix it:

Right, a new problem arises after fixing the old problem lol


evaluation/browsing_delegation/README.md

evaluation/browsing_delegation/run_infer.py

xingyaoww added 4 commits July 15, 2024 13:32

fix json import

47e26cd

pass llm to delegation action so that sub-agent shares the same llm for cost accum purpose

81034c4

add inference script for browser delegation

7a09580

add readme

7e22766

xingyaoww marked this pull request as ready for review July 15, 2024 18:48

xingyaoww requested review from li-boxuan, frankxu2004 and neubig July 15, 2024 18:48

neubig reviewed Jul 15, 2024

Update agenthub/codeact_agent/action_parser.py

a19bae7

Co-authored-by: Graham Neubig <[email protected]>

neubig approved these changes Jul 15, 2024

revert action parser changes.

472bd8b

li-boxuan reviewed Jul 16, 2024

li-boxuan reviewed Jul 16, 2024

li-boxuan and others added 4 commits July 15, 2024 20:42

Rework --llm-config CLI arg

74c1c05

Revert "pass llm to delegation action so that sub-agent shares the same llm for cost accum purpose"

24904cd

This reverts commit 81034c4.

Merge commit '74c1c058773899cc1341ab7bba1720d27e467724' into xw/web-delegate

d53abe9

Merge branch 'main' into xw/web-delegate

5344503

li-boxuan approved these changes Jul 16, 2024

xingyaoww added 4 commits July 16, 2024 08:06

remove view summary

5af239e

update readme

1d0296e

update comment

f4c4457

update readme

5dc4d13

xingyaoww enabled auto-merge (squash) July 16, 2024 14:47

xingyaoww merged commit f45a2ff into main Jul 16, 2024

xingyaoww deleted the xw/web-delegate branch July 16, 2024 15:51
