-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New Agent, Action, Observation Abstraction and with updated Controller #105
New Agent, Action, Observation Abstraction and with updated Controller #105
Conversation
opendevin/action/base.py
Outdated
return { | ||
"action_type": self.__class__.__name__, | ||
"args": self.__dict__ | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each action (optionally) takes in a controller
and handles execution by itself.
""" | ||
|
||
_registry: Dict[str, Type['Agent']] = {} | ||
|
||
def __init__( | ||
self, | ||
instruction: str, | ||
workspace_dir: str, | ||
model_name: str, | ||
max_steps: int = 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get rid of workspace_dir
and max_steps
since these are handled by the Controller
now.
pass | ||
|
||
@abstractmethod | ||
def step(self, cmd_mgr: CommandManager) -> Event: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get rid of add_event
method, with the hope that we can put all these "added events" into State
, which is passed to agent's .step
method, with the hope that it simplifies the workflow and make the code easier for people to understand?
opendevin/agent.py
Outdated
|
||
@abstractmethod | ||
def step(self, cmd_mgr: CommandManager) -> Event: | ||
def step(self, state: State) -> Action: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"state --- agent ---> action" is a commonly used paradigm in RL i think, conforming to this convention make help people understand what's going on here easily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
step(state) makes much more sense!
@@ -1,49 +0,0 @@ | |||
from opendevin.lib.command_manager import CommandManager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved this file to a separate folder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I move this file under controller
since i realized that this command_manager is not really used in other places, rather, its invocation mostly stays in the Controller.
That is:
Agent
: an LLM that takes aState
and Spit out anAction
Controller
: responsible for puttingState
into Agent, and executing itsAction
(where command manager is involved) & put the results back intoState
opendevin/main.py
Outdated
controller = AgentController(agent, args.directory) | ||
controller = AgentController( | ||
agent, workdir=args.directory, max_iterations=args.max_iterations | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just move the directory and max_iteration arguments from AgentCls to Controller
Hey @rbren, I tweaked the abstraction a bit; hope to get your feedback on how you think about this before I move any further (converting langchains agent and codeact to conform with this). My hope is to potentially make this abstraction simpler and easier for everyone to contribute (e.g., by conforming to the |
opendevin/action/base.py
Outdated
def to_dict(self): | ||
return { | ||
"action_type": self.__class__.__name__, | ||
"args": self.__dict__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recently added a message
field for all Actions (Events in the old system). Might be worth keeping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you merge this one already though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if I had at the time, but it's in now 😄
from ..controller import AgentController | ||
|
||
@dataclass | ||
class Action: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you see this class replacing Event
? Or living alongside it?
I've been using Event
to track:
- Actions
- Action Outputs (logs, file contents, webpages)
- User messages
- Thinking in between actions
It's important for the agent to see a stream of these Events (so it can react appropriately), and for the controller to emit a stream of Events (e.g. to show them in the browser).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am actually thinking of replacing Event
with Action
:
- If we want to stream thinking between Actions to the user, you could emit NoopAction (pure text message for thinking).
- If we want to stream information to the Agent, we can do so by updating the
State
, and feed into the agent? If an action is still running, the agent would be blocked in current implementation so that we would not be able to interrupt it with Event as well?
I think we have some trade-off on the decision here: The advantage is that we don't need to really keep Action
and Event
as two seemingly similar but different things (which may confuse people?) - the downside would be the control loop might need to run more frequently, but I think it will get the job down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the downside would be the control loop might need to run more frequently, but I think it will get the job down?
Ah, so are you thinking the loop would be more like:
- step 1:
run ls
- step 2: output of ls
- step 3: read file foo.txt
- step 4: foo contents
instead of the current loop, which combines actions and their outputs into a single step:
- step 1 :
run ls
, output of ls - step 2: read file foo.txt, foo contents
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it roughly look like this:
for i in range(max_iter):
state = ...
# iter 2: state has the output of ls
action = agent.step(state)
# iter 1: action == `run ls`
# iter 2: action == "read file foo.txt"
observation = action.run(self) # self == controller
# iter 1: observation == output of ls
# iter 2: observation == "content of foo"
state = state.update(observation)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it also allows us to interrupt the control loop easier (since they run on higher frequency). Also, this frequency also corresponds to LLM's API calling frequency (one step corresponds to 1 LLM call), which could potentially be easier for calculating stats?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I like this
from typing import Mapping | ||
|
||
@dataclass | ||
class State: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should keep the Event/Action history here too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! will add
opendevin/state.py
Outdated
|
||
@dataclass | ||
class State: | ||
background_commands: Mapping[int, str] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is str
here the command that was run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - as your originally wrote i guess?
opendevin/controller/__init__.py
Outdated
# NOTE: Backgorund events are now part of the State | ||
# log_events = self.command_manager.get_background_events() | ||
# for event in log_events: | ||
# for callback in self.callbacks: | ||
# callback(event) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have background_commands
in the state, but we still need to get the logs for those commands intermittently.
opendevin/controller/__init__.py
Outdated
# TODO: Make all these Log Events into State so that we can get rid of the Event all together | ||
|
||
action: Action = self.agent.step(self.state) | ||
observation: str = action.run(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does observation
get passed back to the agent? In State
?
How does it get passed to the client? I was using callback
, but open to other options.
I do think we'll want structured data in here, like exit_code
or http_status
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! In State
. In the current implementation of the control loop, if I am reading this correctly, even if we stream multiple events to the Agent, the agent would still be blocked at the previous job before they can handle the next event.
Instead of dispatching events to agents in real-time (like the add_event
), we could potentially just maintain a list of Event/Observation history and pass it all together to the Agent when they finished with their previous action?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense!
How will the agent collate Actions and Observations? E.g. how does it know the order of:
- I did this
runcommand
action - Then I saw output X
- Then I read file foo.txt
- Then I saw output Y
- Then user told me to change tactics
- Then I thought Z
Does state keep a single array of those things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes -- that's my plan, State
just keep a single array of those histories, and pass it to the agent every turn (i.e., just like the messages
argument for OpenAI API - you have to pass in the complete history to the model for it to emit the next response).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK great! So we have an array like the above--I assume they're all the same class? Are they all Actions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
YES! Or just a list of Action and Observation (- both could be converted to string, or we could make Observation a separate class, but I'm still debating atm)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having Action and Observation in the same list might be a little tough, unless they're both subclasses of the same abstraction. Otherwise the client will have to introspect each one to figure out if they're looking at an Action or an Observation.
I do think we should keep the data as structured as possible--converting to strings will lose information and make it harder to craft good prompts.
For example, when an error
observation occurs, I might want to add a hint to my prompt, saying "your last command failed with exit code $code. We saw these logs: $logs. Can you fix it?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That make a lot of sense! I'll figure it out :) Will let know when i finished this one!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great overall! I like the idea of creating an abstract Action
class and subclassing it.
I think you're right that we can probably combine step
and add_event
. We could add the observation
history into State
to achieve this. But now the agent needs to keep track of what the newest observations are, instead of getting a stream of them in order. Any ideas on how to make that easier?
Also--how do you see user interrupts getting passed into the agent?
@rbren Thanks!! This is just an initial draft and by no means the final version -- will try to get this completed (fixed with langchains agent and codeact) before next Monday.
I think we can handle this in the control loop and can add attributes in |
opendevin/server/session.py
Outdated
"action": event.__class__.__name__, | ||
"args": event.__dict__, | ||
"message": event.message, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rbren Fix most issues related to the new backend server and should be ready for review!
But I could need help here -- what would the message generally contain? Do we have a list of different events/actions we supported with the front-end? Thanks :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm hoping we can use the Action.to_dict
method to make this cleaner. We'll need a robust way to serialize/deserialize actions and observations so we can send them back and forth.
Right now, every payload has three fields:
action
: the ID of the action (could be something passive, likeoutput
)args
: any extra data about the action. key-value pairs, but something we'll want to standardize. E.g. therun
action has acommand
arg, and theoutput
action has anoutput
arg.message
: a human-readable description of this action, to be put in the chat window
But it might make sense to split Actions and Observations with two distinct payloads. Maybe something like:
{
"action": "run",
"args": {"command": "ls"},
"message": "let's check what files are available"
}
and
{
"observation": "run",
"content": "foo.txt\nbar.txt",
"extras": {
"exit_code": 0
},
"message": "I see two text files."
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great suggestion! I totally agree! I probably need to read more about how front-end use these event - I can update the PR to do something like this:
{
"type": "observation OR action",
"name": "name_of_observation_or_action",
"arguments": {"kwargs for actions"},
"content": "TEXT CONTENT FOR OBS, null for action",
"message":
"Human readable description of the action - not necessarily need to be model generated",
"extras": {
"EXTRA KEY" : "EXTRA Value"
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is coming along really well!
The thing I'm most concerned about is serializing and deserializing these actions and observations. That will be really important for client/server communication, and for agent/LLM communication.
Having each type be its own class makes things a lot better in Python, but we'll have to work some magic to get them serialized nicely.
Things to consider:
- Being consistent with how we serialize different actions will make it easier for clients to consume them (e.g. in JavaScript)
- Being concise will help conserve token usage (e.g. see my comment on
action
vsaction_type
below)
I think you have a good start with the to_dict
method on Action
--maybe if we put some more logic in there, we can remove the conversions that are currently in the langchains agent and the server
class AgentRecallAction(ExecutableAction): | ||
query: str | ||
|
||
def run(self, controller: "AgentController") -> AgentRecallObservation: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the types supposed to be quoted here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh because AgentController
is imported only for type-checking here, so we should quote the type there (but it will show up correctly in Python IDE). If we are not import these for type-checking, it will create circular import that breaks the Syntax check i think.
def run(self, controller: "AgentController") -> "Observation": | ||
raise NotImplementedError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is run
a method on the base action, rather than on ExecutableAction
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh good point - i should move this down!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh i realize the reason i did that is for python's static analysis to infer the correct .run method in our control loop, when we are expecting Action
type as outputs of the agent (instead of ExecutableAction
. But it shouldn't hurt? - .run
should not be called for a NotExecutableAction
anyway.
elif isinstance(info, CmdKillAction): | ||
d = {"action": "kill", "args": {"id": info.id}} | ||
elif isinstance(info, BrowseURLAction): | ||
d = {"action": "browse", "args": {"url": info.url}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This Action <-> JSON conversion is going to be very important, so it's probably something we want to abstract.
For one, the server and client are going to need to send Actions and Observations back and forth, and will need to serialize them. Ditto for agent <-> LLM interactions.
Could d = info.to_dict()
simplify this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so! But I feel we should probably prioritize getting this abstraction merged, then start a separate PR to re-write the inner workings of langchains
agent completely to adopt this Action
and Observation
(and the serialization via to_dict
) -- otherwise it will be pretty challenge for me to keep chasing with all the new commits keeping pushing to main
everyday hahah
opendevin/controller/__init__.py
Outdated
self.max_iterations = max_iterations | ||
self.workdir = workdir | ||
self.command_manager = CommandManager(workdir) | ||
self.state_updated_info: List[Action | Observation] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a thought: what if it's a List[(Action, Observation)]
?
We'd need to add a NullObservation
for actions that aren't executable. But it makes the list a lot more predictable, and it makes it very clear which observation goes with which action.
It'd also help us avoid all the isinstance
logic 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess one issue we are having right now is that user can actually send over their action
to the control loop, which might make Tuple[Action, Observation]
challenge. Should agent always treat that as an observation? Or do we always enforce that the only thing that comes from the front-end can only be Observation
(which will simplify things by a ton!!). I think this is what Devin is doing - they only allow the agent to do all the work, and the user can just sit back and watch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm...interesting question.
I'd actually say that the user input should always be an Action
. Messages could be something like the think
action we currently have. Maybe hint
?
But I imagine the user could also run
a command to help the agent (e.g. maybe it's struggling to figure out apt-get install npm
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW there's some video with Devin struggling with a task, and the user saying "wait, don't do that, do [something else]".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as an aside, I think we'll want to have an API endpoint that returns the history, so the user can see everything happening (at least for debugging purposes).
Giving the items in the history a consistent format will help a bunch there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a research perspective, I guess allowing the user to execute a command in Devin's execution environment naturally makes the agent's environment change from static to dynamic, which might pose a bunch of challenges in both implementations and research. To the best of my knowledge, most agent research today mostly assumes a static environment: very few agent benchmarks are tailored for temporal changes, let alone a dynamic environment.
So I think maybe it will be beneficial for us to focus on the dynamic environment first (user just watch and talk, but do not execute actions themselves). Maybe after we get everything work smoothly and our agents can solve SWE-Bench issues, then we can start studying this problem with dynamic environment, how do y'all think?
PLUS: i think it will be very nice to keep track of history using Controller, we can starts a new issue to call for help after this PR is merged.
opendevin/action/base.py
Outdated
raise NotImplementedError | ||
|
||
def to_dict(self): | ||
return {"action_type": self.__class__.__name__, "args": self.__dict__} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand why you made this action_type
instead of action
, but it might be easier if we just call it action
--it saves a token if the dict gets passed to an LLM, and is a little easier to read.
return exit_code, logs.decode('utf-8') | ||
|
||
def execute_in_background(self, cmd: str) -> None: | ||
self.log_time = time.time() | ||
result = self.container.exec_run(['/bin/bash', '-c', cmd], socket=True, workdir="/workspace") | ||
result = self.container.exec_run(['su', 'devin', '-c', cmd], socket=True, workdir="/workspace") | ||
self.log_generator = result.output # socket.SocketIO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 we'll probably want to rework this at some point. I've noticed it often has issues installing software.
opendevin/server/session.py
Outdated
"action": event.__class__.__name__, | ||
"args": event.__dict__, | ||
"message": event.message, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm hoping we can use the Action.to_dict
method to make this cleaner. We'll need a robust way to serialize/deserialize actions and observations so we can send them back and forth.
Right now, every payload has three fields:
action
: the ID of the action (could be something passive, likeoutput
)args
: any extra data about the action. key-value pairs, but something we'll want to standardize. E.g. therun
action has acommand
arg, and theoutput
action has anoutput
arg.message
: a human-readable description of this action, to be put in the chat window
But it might make sense to split Actions and Observations with two distinct payloads. Maybe something like:
{
"action": "run",
"args": {"command": "ls"},
"message": "let's check what files are available"
}
and
{
"observation": "run",
"content": "foo.txt\nbar.txt",
"extras": {
"exit_code": 0
},
"message": "I see two text files."
}
…w-new-abstraction
@rbren Can confirm the latest code now works on the minimal example for both However, there could be something that could break other things (e.g., front-end). I'd hope we can get this merged soon and start a series of issues to keep track of new issues that may come with the new abstraction (and save me temporarily from trying to solve merge conflicts every day haha), for example:
|
Co-authored-by: Robert Brennan <[email protected]>
…ll-Hands-AI#105) * rearrange workspace_dir and max_step as arguments to controller * remove unused output * abstract each action into dataclass * move actions * fix action import * move cmd manager and change method to private * move controller * rename action folder * add state * a draft of Controller & new agent abstraction * add agent actions * remove controller file * add observation to perform a refractor on langchains agent * revert to make this compatible via translation * fix typo and translate error * add error to observation * index thought as dict * refractor controller * fix circular dependency caused by type hint * add runnable attribute to agent * add mixin to denote executable * change baseclass * make file read/write action compatible w/ docker directory * remove event * fix some merge issue * fix sandbox w/ permission issue * cleanup history abstraction since langchains agent is not really using it * tweak to make langchains agent working * make all actions return observation * fix missing import * add echo action for agent * add error code to cmd output obs * make cmd manager returns cmd output obs * fix codeact agent to make it work * fix all ruff issue * fix mypy * add import agenthub back * add message for Action attribute (migrate from previous event) * fix typo * fix instruction setting * fix instruction setting * attempt to fix session * ruff fix * add .to_dict method for base and observation * add message for recall * try to simplify the state_updated_info with tuple of action and obs * update_info to Tuple[Action, Observation] * make codeact agent and langchains compatible with Tuple[Action, Observation] * fix ruff * fix ruff * change to base path to fix minimal langchains agent * add NullAction to potentially handle for chat scenario * Update opendevin/controller/command_manager.py Co-authored-by: Robert Brennan <[email protected]> * fix event args * set the default workspace to "workspace" * make directory relative (so it does not show up to agent in File*Action) * fix typo * await to yield for sending observation * fix message format --------- Co-authored-by: Robert Brennan <[email protected]>
No description provided.