Update evaluation/TUTORIAL.md
Co-authored-by: Boxuan Li <[email protected]>
xingyaoww and li-boxuan authored May 18, 2024
1 parent 94f7bcc commit e552f52
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion evaluation/TUTORIAL.md
@@ -133,7 +133,7 @@ If you see something like this, you can consider adding this to your evaluation

Sandbox is a fully functioning docker container where the agent can perform all sorts of tasks, e.g., using bash, calling Python, install packages, and more. You can leave `sandbox` to `None` if you don't need to do anything special to pre-configure the `Sandbox`.

-In SWE-Bench, we need to copy the proper repository directory to the workspace and activate the right python virtual environment before the agent can start performing the task, so we actually defined a custom [`SWEBenchSSHBox`](https://github.com/OpenDevin/OpenDevin/blob/7ca560471bd262f22513f3863995d0a8e6121c07/evaluation/swe_bench/swe_env_box.py#L12-L118) that inherit from the default sandbox [`SSHBox`](https://github.com/OpenDevin/OpenDevin/blob/main/opendevin/runtime/docker/ssh_box.py#L188) and handles all these initial setup. If you need to configure the `sandbox` for your evaluation, check `SWEBenchSSHBox` for a reference of implementation.
+In SWE-Bench, we need to copy the proper repository directory to the workspace and activate the right python virtual environment before the agent can start performing the task, so we actually defined a custom [`SWEBenchSSHBox`](https://github.com/OpenDevin/OpenDevin/blob/7ca560471bd262f22513f3863995d0a8e6121c07/evaluation/swe_bench/swe_env_box.py#L12-L118) that inherit from the default sandbox [`SSHBox`](https://github.com/OpenDevin/OpenDevin/blob/7ca560471bd262f22513f3863995d0a8e6121c07/opendevin/runtime/docker/ssh_box.py#L188) and handles all these initial setup. If you need to configure the `sandbox` for your evaluation, check `SWEBenchSSHBox` for a reference of implementation.

## How to put together an evaluation script?
