New Agent, Action, Observation Abstraction and with updated Controller #105

xingyaoww · 2024-03-23T05:46:46Z

No description provided.

xingyaoww · 2024-03-23T05:47:32Z

opendevin/action/base.py

+        return {
+            "action_type": self.__class__.__name__,
+            "args": self.__dict__
+        }


Each action (optionally) takes in a controller and handles execution by itself.

xingyaoww · 2024-03-23T05:48:18Z

opendevin/agent.py

    """

    _registry: Dict[str, Type['Agent']] = {}

    def __init__(
        self,
        instruction: str,
-        workspace_dir: str,
-        model_name: str,
-        max_steps: int = 100


I get rid of workspace_dir and max_steps since these are handled by the Controller now.

xingyaoww · 2024-03-23T05:49:26Z

opendevin/agent.py

-        pass
-
-    @abstractmethod
-    def step(self, cmd_mgr: CommandManager) -> Event:


I get rid of add_event method, with the hope that we can put all these "added events" into State, which is passed to agent's .step method, with the hope that it simplifies the workflow and make the code easier for people to understand?

xingyaoww · 2024-03-23T05:51:00Z

opendevin/agent.py

-
-    @abstractmethod
-    def step(self, cmd_mgr: CommandManager) -> Event:
+    def step(self, state: State) -> Action:


"state --- agent ---> action" is a commonly used paradigm in RL i think, conforming to this convention make help people understand what's going on here easily.

step(state) makes much more sense!

xingyaoww · 2024-03-23T05:51:21Z

opendevin/controller.py

@@ -1,49 +0,0 @@
-from opendevin.lib.command_manager import CommandManager


I moved this file to a separate folder.

xingyaoww · 2024-03-23T05:53:21Z

opendevin/controller/command_manager.py

I move this file under controller since i realized that this command_manager is not really used in other places, rather, its invocation mostly stays in the Controller.

That is:

Agent: an LLM that takes a State and Spit out an Action

Controller: responsible for putting State into Agent, and executing its Action (where command manager is involved) & put the results back into State

xingyaoww · 2024-03-23T05:53:53Z

opendevin/main.py

-    controller = AgentController(agent, args.directory)
+    controller = AgentController(
+        agent, workdir=args.directory, max_iterations=args.max_iterations
+    )


Just move the directory and max_iteration arguments from AgentCls to Controller

xingyaoww · 2024-03-23T05:56:28Z

Hey @rbren, I tweaked the abstraction a bit; hope to get your feedback on how you think about this before I move any further (converting langchains agent and codeact to conform with this). My hope is to potentially make this abstraction simpler and easier for everyone to contribute (e.g., by conforming to the state-agent->action convention).

rbren · 2024-03-23T13:02:00Z

opendevin/action/base.py

+    def to_dict(self):
+        return {
+            "action_type": self.__class__.__name__,
+            "args": self.__dict__


I recently added a message field for all Actions (Events in the old system). Might be worth keeping

Did you merge this one already though?

Not sure if I had at the time, but it's in now 😄

https://github.com/OpenDevin/OpenDevin/blob/4aa24eb41d51392b17bb4a2df50554fd73f89834/opendevin/lib/event.py#L7

rbren · 2024-03-23T13:07:16Z

opendevin/action/base.py

+from ..controller import AgentController
+
+@dataclass
+class Action:


Do you see this class replacing Event? Or living alongside it?

I've been using Event to track:

Actions

Action Outputs (logs, file contents, webpages)

User messages

Thinking in between actions

It's important for the agent to see a stream of these Events (so it can react appropriately), and for the controller to emit a stream of Events (e.g. to show them in the browser).

I am actually thinking of replacing Event with Action:

If we want to stream thinking between Actions to the user, you could emit NoopAction (pure text message for thinking).

If we want to stream information to the Agent, we can do so by updating the State, and feed into the agent? If an action is still running, the agent would be blocked in current implementation so that we would not be able to interrupt it with Event as well?

I think we have some trade-off on the decision here: The advantage is that we don't need to really keep Action and Event as two seemingly similar but different things (which may confuse people?) - the downside would be the control loop might need to run more frequently, but I think it will get the job down?

the downside would be the control loop might need to run more frequently, but I think it will get the job down?

Ah, so are you thinking the loop would be more like:

step 1: run ls

step 2: output of ls

step 3: read file foo.txt

step 4: foo contents

instead of the current loop, which combines actions and their outputs into a single step:

step 1 : run ls, output of ls

step 2: read file foo.txt, foo contents

Yes, it roughly look like this:

for i in range(max_iter): state = ... # iter 2: state has the output of ls action = agent.step(state) # iter 1: action == `run ls` # iter 2: action == "read file foo.txt" observation = action.run(self) # self == controller # iter 1: observation == output of ls # iter 2: observation == "content of foo" state = state.update(observation)

it also allows us to interrupt the control loop easier (since they run on higher frequency). Also, this frequency also corresponds to LLM's API calling frequency (one step corresponds to 1 LLM call), which could potentially be easier for calculating stats?

👍 I like this

rbren · 2024-03-23T13:08:32Z

opendevin/state.py

+from typing import Mapping
+
+@dataclass
+class State:


I wonder if we should keep the Event/Action history here too

Good catch! will add

rbren · 2024-03-23T13:08:48Z

opendevin/state.py

+
+@dataclass
+class State:
+    background_commands: Mapping[int, str]


Is str here the command that was run?

Yes - as your originally wrote i guess?

rbren · 2024-03-23T13:09:50Z

opendevin/controller/__init__.py

+            # NOTE: Backgorund events are now part of the State
+            # log_events = self.command_manager.get_background_events()
+            # for event in log_events:
+            #     for callback in self.callbacks:
+            #         callback(event)


You have background_commands in the state, but we still need to get the logs for those commands intermittently.

rbren · 2024-03-23T13:11:49Z

opendevin/controller/__init__.py

+            # TODO: Make all these Log Events into State so that we can get rid of the Event all together
+
+            action: Action = self.agent.step(self.state)
+            observation: str = action.run(self)


How does observation get passed back to the agent? In State?

How does it get passed to the client? I was using callback, but open to other options.

I do think we'll want structured data in here, like exit_code or http_status

Yes! In State. In the current implementation of the control loop, if I am reading this correctly, even if we stream multiple events to the Agent, the agent would still be blocked at the previous job before they can handle the next event.
Instead of dispatching events to agents in real-time (like the add_event), we could potentially just maintain a list of Event/Observation history and pass it all together to the Agent when they finished with their previous action?

Makes sense!

How will the agent collate Actions and Observations? E.g. how does it know the order of:

I did this runcommand action

Then I saw output X

Then I read file foo.txt

Then I saw output Y

Then user told me to change tactics

Then I thought Z

Does state keep a single array of those things?

Yes -- that's my plan, State just keep a single array of those histories, and pass it to the agent every turn (i.e., just like the messages argument for OpenAI API - you have to pass in the complete history to the model for it to emit the next response).

OK great! So we have an array like the above--I assume they're all the same class? Are they all Actions?

YES! Or just a list of Action and Observation (- both could be converted to string, or we could make Observation a separate class, but I'm still debating atm)

Having Action and Observation in the same list might be a little tough, unless they're both subclasses of the same abstraction. Otherwise the client will have to introspect each one to figure out if they're looking at an Action or an Observation.

I do think we should keep the data as structured as possible--converting to strings will lose information and make it harder to craft good prompts.

For example, when an error observation occurs, I might want to add a hint to my prompt, saying "your last command failed with exit code $code. We saw these logs: $logs. Can you fix it?"

That make a lot of sense! I'll figure it out :) Will let know when i finished this one!

rbren

This looks great overall! I like the idea of creating an abstract Action class and subclassing it.

I think you're right that we can probably combine step and add_event. We could add the observation history into State to achieve this. But now the agent needs to keep track of what the newest observations are, instead of getting a stream of them in order. Any ideas on how to make that easier?

Also--how do you see user interrupts getting passed into the agent?

xingyaoww · 2024-03-23T13:22:41Z

@rbren Thanks!! This is just an initial draft and by no means the final version -- will try to get this completed (fixed with langchains agent and codeact) before next Monday.

We could add the observation history into State to achieve this. But now the agent needs to keep track of what the newest observations are,

I think we can handle this in the control loop and can add attributes in State something like observations, which is a list presenting Observations in chronological order?

xingyaoww · 2024-03-24T16:05:05Z

opendevin/server/session.py

+                "action": event.__class__.__name__,
+                "args": event.__dict__,
+                "message": event.message,
+            }


@rbren Fix most issues related to the new backend server and should be ready for review!
But I could need help here -- what would the message generally contain? Do we have a list of different events/actions we supported with the front-end? Thanks :)

I'm hoping we can use the Action.to_dict method to make this cleaner. We'll need a robust way to serialize/deserialize actions and observations so we can send them back and forth.

Right now, every payload has three fields:

action: the ID of the action (could be something passive, like output)

args: any extra data about the action. key-value pairs, but something we'll want to standardize. E.g. the run action has a command arg, and the output action has an output arg.

message: a human-readable description of this action, to be put in the chat window

But it might make sense to split Actions and Observations with two distinct payloads. Maybe something like:

{ "action": "run", "args": {"command": "ls"}, "message": "let's check what files are available" }

and

{ "observation": "run", "content": "foo.txt\nbar.txt", "extras": { "exit_code": 0 }, "message": "I see two text files." }

Great suggestion! I totally agree! I probably need to read more about how front-end use these event - I can update the PR to do something like this:

{ "type": "observation OR action", "name": "name_of_observation_or_action", "arguments": {"kwargs for actions"}, "content": "TEXT CONTENT FOR OBS, null for action", "message": "Human readable description of the action - not necessarily need to be model generated", "extras": { "EXTRA KEY" : "EXTRA Value" } }

rbren

This is coming along really well!

The thing I'm most concerned about is serializing and deserializing these actions and observations. That will be really important for client/server communication, and for agent/LLM communication.

Having each type be its own class makes things a lot better in Python, but we'll have to work some magic to get them serialized nicely.

Things to consider:

Being consistent with how we serialize different actions will make it easier for clients to consume them (e.g. in JavaScript)
Being concise will help conserve token usage (e.g. see my comment on action vs action_type below)

I think you have a good start with the to_dict method on Action--maybe if we put some more logic in there, we can remove the conversions that are currently in the langchains agent and the server

rbren · 2024-03-24T20:43:20Z

opendevin/action/agent.py

+class AgentRecallAction(ExecutableAction):
+    query: str
+
+    def run(self, controller: "AgentController") -> AgentRecallObservation:


Are the types supposed to be quoted here?

oh because AgentController is imported only for type-checking here, so we should quote the type there (but it will show up correctly in Python IDE). If we are not import these for type-checking, it will create circular import that breaks the Syntax check i think.

rbren · 2024-03-24T20:44:52Z

opendevin/action/base.py

+    def run(self, controller: "AgentController") -> "Observation":
+        raise NotImplementedError


Why is run a method on the base action, rather than on ExecutableAction?

oh good point - i should move this down!

oh i realize the reason i did that is for python's static analysis to infer the correct .run method in our control loop, when we are expecting Action type as outputs of the agent (instead of ExecutableAction. But it shouldn't hurt? - .run should not be called for a NotExecutableAction anyway.

rbren · 2024-03-24T20:47:34Z

agenthub/langchains_agent/__init__.py

+                elif isinstance(info, CmdKillAction):
+                    d = {"action": "kill", "args": {"id": info.id}}
+                elif isinstance(info, BrowseURLAction):
+                    d = {"action": "browse", "args": {"url": info.url}}


This Action <-> JSON conversion is going to be very important, so it's probably something we want to abstract.

For one, the server and client are going to need to send Actions and Observations back and forth, and will need to serialize them. Ditto for agent <-> LLM interactions.

Could d = info.to_dict() simplify this?

I think so! But I feel we should probably prioritize getting this abstraction merged, then start a separate PR to re-write the inner workings of langchains agent completely to adopt this Action and Observation (and the serialization via to_dict) -- otherwise it will be pretty challenge for me to keep chasing with all the new commits keeping pushing to main everyday hahah

rbren · 2024-03-24T20:51:54Z

opendevin/controller/__init__.py

+        self.max_iterations = max_iterations
+        self.workdir = workdir
+        self.command_manager = CommandManager(workdir)
+        self.state_updated_info: List[Action | Observation] = []


Here's a thought: what if it's a List[(Action, Observation)]?

We'd need to add a NullObservation for actions that aren't executable. But it makes the list a lot more predictable, and it makes it very clear which observation goes with which action.

It'd also help us avoid all the isinstance logic 😄

I guess one issue we are having right now is that user can actually send over their action to the control loop, which might make Tuple[Action, Observation] challenge. Should agent always treat that as an observation? Or do we always enforce that the only thing that comes from the front-end can only be Observation (which will simplify things by a ton!!). I think this is what Devin is doing - they only allow the agent to do all the work, and the user can just sit back and watch.

Hmm...interesting question.

I'd actually say that the user input should always be an Action. Messages could be something like the think action we currently have. Maybe hint?

But I imagine the user could also run a command to help the agent (e.g. maybe it's struggling to figure out apt-get install npm)

FWIW there's some video with Devin struggling with a task, and the user saying "wait, don't do that, do [something else]".

as an aside, I think we'll want to have an API endpoint that returns the history, so the user can see everything happening (at least for debugging purposes).

Giving the items in the history a consistent format will help a bunch there

From a research perspective, I guess allowing the user to execute a command in Devin's execution environment naturally makes the agent's environment change from static to dynamic, which might pose a bunch of challenges in both implementations and research. To the best of my knowledge, most agent research today mostly assumes a static environment: very few agent benchmarks are tailored for temporal changes, let alone a dynamic environment.

So I think maybe it will be beneficial for us to focus on the dynamic environment first (user just watch and talk, but do not execute actions themselves). Maybe after we get everything work smoothly and our agents can solve SWE-Bench issues, then we can start studying this problem with dynamic environment, how do y'all think?

PLUS: i think it will be very nice to keep track of history using Controller, we can starts a new issue to call for help after this PR is merged.

rbren · 2024-03-24T20:54:47Z

opendevin/action/base.py

+        raise NotImplementedError
+
+    def to_dict(self):
+        return {"action_type": self.__class__.__name__, "args": self.__dict__}


I understand why you made this action_type instead of action, but it might be easier if we just call it action--it saves a token if the dict gets passed to an LLM, and is a little easier to read.

rbren · 2024-03-24T20:58:13Z

opendevin/sandbox/sandbox.py

        return exit_code, logs.decode('utf-8')

    def execute_in_background(self, cmd: str) -> None:
        self.log_time = time.time()
-        result = self.container.exec_run(['/bin/bash', '-c', cmd], socket=True, workdir="/workspace")
+        result = self.container.exec_run(['su', 'devin', '-c', cmd], socket=True, workdir="/workspace")
        self.log_generator = result.output # socket.SocketIO


👍 we'll probably want to rework this at some point. I've noticed it often has issues installing software.

rbren · 2024-03-24T21:07:22Z

opendevin/server/session.py

+                "action": event.__class__.__name__,
+                "args": event.__dict__,
+                "message": event.message,
+            }


I'm hoping we can use the Action.to_dict method to make this cleaner. We'll need a robust way to serialize/deserialize actions and observations so we can send them back and forth.

Right now, every payload has three fields:

action: the ID of the action (could be something passive, like output)

args: any extra data about the action. key-value pairs, but something we'll want to standardize. E.g. the run action has a command arg, and the output action has an output arg.

message: a human-readable description of this action, to be put in the chat window

But it might make sense to split Actions and Observations with two distinct payloads. Maybe something like:

{ "action": "run", "args": {"command": "ls"}, "message": "let's check what files are available" }

and

{ "observation": "run", "content": "foo.txt\nbar.txt", "extras": { "exit_code": 0 }, "message": "I see two text files." }

…w-new-abstraction

…vation]

xingyaoww · 2024-03-25T16:13:54Z

@rbren Can confirm the latest code now works on the minimal example for both langchains and codeact agents.

However, there could be something that could break other things (e.g., front-end). I'd hope we can get this merged soon and start a series of issues to keep track of new issues that may come with the new abstraction (and save me temporarily from trying to solve merge conflicts every day haha), for example:

refractor langchains agent to this new abstraction
support parsing from end observations (e.g., user messages) by adding from_dict method for Observation (potentially for Action in the future - if we want to make the agent work with the user at the same time)
make sure we also change the message key for the front-end to support performing multi-turn chat
remove all the print statement inside individual agent implementation, and do a systematic colored action/observation printing in the controller
keep track of the entire (action, observation) history in the controller for the front-end to use when needed

opendevin/controller/command_manager.py

Co-authored-by: Robert Brennan <[email protected]>

…ll-Hands-AI#105) * rearrange workspace_dir and max_step as arguments to controller * remove unused output * abstract each action into dataclass * move actions * fix action import * move cmd manager and change method to private * move controller * rename action folder * add state * a draft of Controller & new agent abstraction * add agent actions * remove controller file * add observation to perform a refractor on langchains agent * revert to make this compatible via translation * fix typo and translate error * add error to observation * index thought as dict * refractor controller * fix circular dependency caused by type hint * add runnable attribute to agent * add mixin to denote executable * change baseclass * make file read/write action compatible w/ docker directory * remove event * fix some merge issue * fix sandbox w/ permission issue * cleanup history abstraction since langchains agent is not really using it * tweak to make langchains agent working * make all actions return observation * fix missing import * add echo action for agent * add error code to cmd output obs * make cmd manager returns cmd output obs * fix codeact agent to make it work * fix all ruff issue * fix mypy * add import agenthub back * add message for Action attribute (migrate from previous event) * fix typo * fix instruction setting * fix instruction setting * attempt to fix session * ruff fix * add .to_dict method for base and observation * add message for recall * try to simplify the state_updated_info with tuple of action and obs * update_info to Tuple[Action, Observation] * make codeact agent and langchains compatible with Tuple[Action, Observation] * fix ruff * fix ruff * change to base path to fix minimal langchains agent * add NullAction to potentially handle for chat scenario * Update opendevin/controller/command_manager.py Co-authored-by: Robert Brennan <[email protected]> * fix event args * set the default workspace to "workspace" * make directory relative (so it does not show up to agent in File*Action) * fix typo * await to yield for sending observation * fix message format --------- Co-authored-by: Robert Brennan <[email protected]>

xingyaoww added 10 commits March 23, 2024 00:03

rearrange workspace_dir and max_step as arguments to controller

d3c0ff6

remove unused output

42675f2

abstract each action into dataclass

a1fc2c4

move actions

d99ee8a

fix action import

e602bfb

move cmd manager and change method to private

ae677e8

move controller

37a8d9a

rename action folder

0027cb5

add state

6bca774

a draft of Controller & new agent abstraction

a7707c3

xingyaoww commented Mar 23, 2024

View reviewed changes

xingyaoww requested a review from rbren March 23, 2024 05:54

rbren reviewed Mar 23, 2024

View reviewed changes

xingyaoww added 3 commits March 23, 2024 10:36

add agent actions

f974b8d

remove controller file

45c948c

add observation to perform a refractor on langchains agent

0e2d969

ruff fix

932fb38

xingyaoww commented Mar 24, 2024

View reviewed changes

rbren reviewed Mar 24, 2024

View reviewed changes

xingyaoww added 10 commits March 25, 2024 10:21

Merge commit '8a64d7c9120653c37087afe925bc9322b1936bbd' into codeact-…

ac9378d

…w-new-abstraction

add .to_dict method for base and observation

e6106fa

add message for recall

e48c291

try to simplify the state_updated_info with tuple of action and obs

c184e4e

update_info to Tuple[Action, Observation]

bb36f15

make codeact agent and langchains compatible with Tuple[Action, Obser…

d3807dd

…vation]

fix ruff

787a012

fix ruff

c1e386e

change to base path to fix minimal langchains agent

9e07ec0

add NullAction to potentially handle for chat scenario

0d204e2

rbren reviewed Mar 25, 2024

View reviewed changes

opendevin/controller/command_manager.py Outdated Show resolved Hide resolved

Update opendevin/controller/command_manager.py

ff48ea0

Co-authored-by: Robert Brennan <[email protected]>

xingyaoww changed the title ~~A draft of new Agent & Action Abstraction and Their Interaction with Controller~~ New Agent, Action, Observation Abstraction and with updated Controller Mar 25, 2024

xingyaoww added 6 commits March 25, 2024 12:12

fix event args

9829de5

set the default workspace to "workspace"

1fe6557

make directory relative (so it does not show up to agent in File*Action)

9a1b087

fix typo

c5c0a0f

await to yield for sending observation

adf5edd

fix message format

6c78ae0

rbren approved these changes Mar 25, 2024

View reviewed changes

xingyaoww merged commit 82f934d into All-Hands-AI:main Mar 25, 2024

This was referenced Mar 25, 2024

Improve serialization and deserialization of Actions and Observations #148

Closed

Handle observation events on frontend #149

Closed

Update documentation for new Agent abstraction #152

Closed

xingyaoww mentioned this pull request Mar 26, 2024

Docker terminal state #124

Closed

		@@ -1,49 +0,0 @@
		from opendevin.lib.command_manager import CommandManager

		def run(self, controller: "AgentController") -> "Observation":
		raise NotImplementedError

New Agent, Action, Observation Abstraction and with updated Controller #105

New Agent, Action, Observation Abstraction and with updated Controller #105

Conversation

xingyaoww commented Mar 23, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

xingyaoww commented Mar 23, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

xingyaoww Mar 23, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

xingyaoww Mar 23, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rbren left a comment • edited Loading

Choose a reason for hiding this comment

xingyaoww commented Mar 23, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

xingyaoww Mar 25, 2024 • edited Loading

Choose a reason for hiding this comment

rbren left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

xingyaoww Mar 25, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

xingyaoww Mar 25, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

enyst Mar 25, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

xingyaoww commented Mar 25, 2024 • edited Loading

xingyaoww Mar 23, 2024 •

edited

Loading

xingyaoww Mar 23, 2024 •

edited

Loading

rbren left a comment •

edited

Loading

xingyaoww Mar 25, 2024 •

edited

Loading

xingyaoww Mar 25, 2024 •

edited

Loading

xingyaoww Mar 25, 2024 •

edited

Loading

enyst Mar 25, 2024 •

edited

Loading

xingyaoww commented Mar 25, 2024 •

edited

Loading