Merge branch 'main' into swe-bench-eval

xingyaoww · Mar 21, 2024 · commit deb2af2
2 parents 909f2de + b84463f
Showing 157 changed files with 926 additions and 262 deletions.
diff --git a/.gitignore b/.gitignore
@@ -14,7 +14,7 @@ dist/
 downloads/
 eggs/
 .eggs/
-lib/
+./lib/
 lib64/
 parts/
 sdist/
@@ -190,4 +190,4 @@ yarn-error.log*
 
 # agent
 .envrc
-agent/workspace
+/workspace
diff --git a/agent/build-and-run.sh b/agent/build-and-run.sh
diff --git a/agent/lib/actions/__init__.py b/agent/lib/actions/__init__.py
diff --git a/agent/lib/actions/kill.py b/agent/lib/actions/kill.py
diff --git a/agent/lib/actions/run.py b/agent/lib/actions/run.py
diff --git a/agent/lib/agent.py b/agent/lib/agent.py
diff --git a/agent/lib/controlloop.py b/agent/lib/controlloop.py
diff --git a/agent/main.py b/agent/main.py
diff --git a/agenthub/README.md b/agenthub/README.md
@@ -0,0 +1,57 @@
+# Agent Framework Research
+
+In this folder, there may exist multiple implementations of `Agent` that will be used by the 
+
+For example, `agenthub/langchain_agent`, `agenthub/metagpt_agent`, `agenthub/codeact_agent`, etc.
+Contributors from different backgrounds and interests can choose to contribute to any (or all!) of these directions.
+
+## Constructing an Agent
+Your agent must implement the following methods:
+
+### `step`
+```
+def step(self, cmd_mgr: CommandManager) -> Event:
+```
+`step` moves the agent forward one step towards its goal. This probably means
+sending a prompt to the LLM, then parsing the response into an action `Event`.
+
+Each Event has an `action` and a dict of `args`. Supported Events include:
+* `read` - reads the contents of a file. Arguments:
+  * `path` - the path of the file to read
+* `write` - writes the contents to a file. Arguments:
+  * `path` - the path of the file to write
+  * `contents` - the contents to write to the file
+* `run` - runs a command. Arguments:
+  * `command` - the command to run
+  * `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true.
+* `kill` - kills a background command. Arguments:
+  * `id` - the ID of the background command to kill
+* `browse` - opens a web page. Arguments:
+  * `url` - the URL to open
+* `recall` - recalls a past memory. Arguments:
+  * `query` - the query to search for
+* `think` - make a plan, set a goal, or record your thoughts. Arguments:
+  * `thought` - the thought to record
+* `finish` - if you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working.
+
+For Events like `read` and `run`, a follow-up event will be added via `add_event` with the output.
+
+### `add_event`
+```
+def add_event(self, event: Event) -> None:
+```
+`add_event` adds an event to the agent's history. This could be a user message,
+an action taken by the agent, log output, file contents, or anything else.
+
+You'll probably want to keep a history of events, and use them in your prompts
+so that the agent knows what it did recently. You may also want to keep events
+in a vector database so the agent can refer back to them.
+
+The output of `step` will automatically be passed to this method.
+
+### `search_memory`
+```
+def search_memory(self, query: str) -> List[str]:
+```
+`search_memory` should return a list of events that match the query. This will be used
+for the `recall` action.
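
Reviewer note: the three methods described in `agenthub/README.md` can be sketched as below. This is a minimal illustration, not the project's implementation. `Event` and `CommandManager` here are hypothetical stand-ins defined only so the example runs on its own, `ScriptedAgent` and `run` are invented names, and the agent replays a fixed plan instead of prompting an LLM.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Event:
    """Hypothetical stand-in for the project's Event: an action plus its args."""
    action: str
    args: Dict[str, Any] = field(default_factory=dict)


class CommandManager:
    """Hypothetical placeholder for the real command manager."""


class ScriptedAgent:
    """Toy agent that replays a fixed plan instead of calling an LLM."""

    def __init__(self, plan: List[Event]) -> None:
        self.plan = list(plan)
        self.history: List[Event] = []

    def step(self, cmd_mgr: CommandManager) -> Event:
        # A real agent would build a prompt from self.history, send it to the
        # LLM, and parse the reply into an Event. Here we take the next
        # scripted Event, and finish once the plan is exhausted.
        if self.plan:
            return self.plan.pop(0)
        return Event("finish")

    def add_event(self, event: Event) -> None:
        # Keep every event so later prompts (or recall) can use it.
        self.history.append(event)

    def search_memory(self, query: str) -> List[str]:
        # Naive case-insensitive substring match over the stored events;
        # a vector database could replace this.
        q = query.lower()
        return [repr(e) for e in self.history if q in repr(e).lower()]


def run(agent: ScriptedAgent, cmd_mgr: CommandManager, max_steps: int = 10) -> List[str]:
    """Assumed control loop: call step, feed its output back via add_event."""
    actions = []
    for _ in range(max_steps):
        event = agent.step(cmd_mgr)
        agent.add_event(event)
        actions.append(event.action)
        if event.action == "finish":
            break
    return actions
```

With a plan of one `think` event and one `write` event, `run` returns the action names in order followed by `finish`, and `search_memory` can then look up the stored `write` event by its path.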
diff --git a/agenthub/__init__.py b/agenthub/__init__.py
@@ -0,0 +1,2 @@
+from . import langchains_agent
+from . import codeact_agent
diff --git a/agenthub/codeact_agent/README.md b/agenthub/codeact_agent/README.md
@@ -0,0 +1,21 @@
+# CodeAct-based Agent Framework
+
+This folder implements the [CodeAct idea](https://arxiv.org/abs/2402.13463), which relies on the LLM to autonomously perform actions in a Bash shell. This asks more of the LLM itself: it must be capable enough to complete the whole task autonomously instead of getting stuck in an infinite loop.
+
+A minimal example can be found at [research/codeact/examples/run_flask_server_with_bash.py](./examples/run_flask_server_with_bash.py):
+
+```bash
+mkdir workspace
+PYTHONPATH=`pwd`:$PYTHONPATH python3 opendevin/main.py -d ./workspace -c CodeActAgent -t "Please write a flask app that returns 'Hello, World\!' at the root URL, then start the app on port 5000. python3 has already been installed for you."
+```
+
+
+Example: prompting `gpt-3.5-turbo-0125` to write a Flask server, install the `flask` library, and start the server.
+
+<img width="951" alt="image" src="https://github.com/OpenDevin/OpenDevin/assets/38853559/325c3115-a343-4cc5-a92b-f1e5d552a077">
+
+<img width="957" alt="image" src="https://github.com/OpenDevin/OpenDevin/assets/38853559/68ad10c1-744a-4e9d-bb29-0f163d665a0a">
+
+Most things work as expected, except that at the end the model did not stop the interaction by outputting `<execute> exit </execute>` as instructed.
+
+**TODO**: This should be fixable by either (1) including a complete in-context example like [this](https://github.com/xingyaoww/mint-bench/blob/main/mint/tasks/in_context_examples/reasoning/with_tool.txt), or (2) collecting some interaction data like this and fine-tuning a model (like [this](https://github.com/xingyaoww/code-act), a more complex route).
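
Reviewer note: the stop condition mentioned in the CodeAct README could be detected roughly as follows. This is a sketch under assumptions: the `<execute>...</execute>` tag format is taken only from the `<execute> exit </execute>` example in that README, and the function names `extract_command` and `should_stop` are hypothetical, not part of the committed agent.

```python
import re
from typing import Optional

# Assumed tag format, based only on the `<execute> exit </execute>` example.
EXECUTE_RE = re.compile(r"<execute>(.*?)</execute>", re.DOTALL)


def extract_command(llm_output: str) -> Optional[str]:
    """Return the first command wrapped in <execute> tags, or None if absent."""
    match = EXECUTE_RE.search(llm_output)
    return match.group(1).strip() if match else None


def should_stop(llm_output: str) -> bool:
    """True when the model asks to end the interaction with `exit`."""
    return extract_command(llm_output) == "exit"
```

A control loop could call `should_stop` on each model reply and break out when it returns true. A reply without any `<execute>` tag yields `None`, which is the case the README reports for `gpt-3.5-turbo-0125`.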