forked from All-Hands-AI/OpenHands
feat: Ad/regression tests using pytest (All-Hands-AI#329)
* Remove all the unnecessary files
* Create finalize the regression testing framework and add hello world test case
* Update requirements.txt
* Update the test function to execute the generate script
1 parent fa87352, commit 7c27e59
Showing 92 changed files with 184 additions and 40,307 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,2 @@ | ||
node_modules | ||
outputs |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,29 +1,79 @@ | ||
# Regression Tests | ||
# OpenDevin - Regression Test Framework | ||
|
||
These files demonstrate how OpenDevin currently handles certain scenarios. | ||
OpenDevin is an open-source software engineering AI that can solve a variety of software engineering tasks. This repository contains the regression test framework for the OpenDevin project. | |
|
||
To add a new regression case: | ||
```bash | ||
name="hello-script" | ||
## Running the Tests | ||
|
||
# The start directory contains the initial state of the project the agent will work on | ||
# Add any files you'd like here. | ||
mkdir -p ./agent/regression/cases/$name/start | ||
To run the tests for the OpenDevin project, use the provided test runner script. Follow these steps: | |
|
||
# task.txt contains the task to be accomplished | ||
echo "write a hello world script" >> ./agent/regression/cases/$name/task.txt | ||
1. Ensure you have Python 3.6 or higher installed on your system. | ||
2. Install the required dependencies by running the following command in your terminal: | ||
``` | ||
pip install -r requirements.txt | ||
``` | ||
3. Navigate to the root directory of the project. | ||
4. Run the test suite using the test runner script with the required arguments: | ||
``` | ||
python evaluation/regression/run_tests.py --OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxx --model=gpt-4-0125-preview | ||
``` | ||
Replace `sk-xxxxxxxxxxxxxxxxxxxxxx` with your actual OpenAI API key. The default model is `gpt-4-0125-preview`, but you can specify a different model if needed. | ||
|
||
# Single out your test case using the TEST_CASE environment variable | ||
TEST_CASE=$name ./agent/regression/run.sh | ||
``` | ||
The test runner will discover and execute all the test cases in the `cases/` directory, and display the results of the test suite, including the status of each individual test case and the overall summary. | ||
|
||
## Test Case Structure | ||
|
||
The test cases for the OpenDevin project are organized in the `cases/` directory. Each test case has the following structure: | |
|
||
To add an agent to the regression tests: | |
```bash | ||
add agent pair to directory_class_pairs variable in run.sh | ||
key is the directory name in /agenthub and value is the class name | ||
``` | ||
cases/ | ||
├── hello-world/ | ||
│ ├── task.txt | ||
│ ├── outputs/ | ||
│ │ ├── langchains_agent/ | ||
│ │ │ └── workspace/ | ||
│ │ │ ├── hello_world.sh | ||
│ │ └── codeact_agent/ | ||
│ │ └── workspace/ | ||
│ │ ├── hello_world.sh | ||
│ └── test_hello_world.py | ||
├── create_web_app/ | ||
│ ├── task.txt | ||
│ ├── outputs/ | ||
│ │ ├── langchains_agent/ | ||
│ │ │ └── workspace/ | ||
│ │ │ ├── app.py | ||
│ │ │ ├── requirements.txt | ||
│ │ │ ├── static/ | ||
│ │ │ └── templates/ | ||
│ │ └── codeact_agent/ | ||
│ │ └── workspace/ | ||
│ │ ├── app.py | ||
│ │ ├── requirements.txt | ||
│ │ ├── static/ | ||
│ │ └── templates/ | ||
│ └── test_create_web_app.py | ||
└── ... | ||
``` | ||
|
||
- `task.txt`: This file contains the task description provided by the user. | ||
- `outputs/`: This directory contains the output generated by OpenDevin for each agent. | ||
- `outputs/*/workspace/`: This directory contains the actual output files generated by OpenDevin. | ||
- `test_*.py`: These are the test scripts that validate the output of OpenDevin. | ||
|
||
## Adding New Test Cases | ||
|
||
To add a new test case to the regression test framework, create a new directory under `cases/` containing a `task.txt` file and a `test_*.py` script, following the structure described above. | |
|
||
## Customizing the Test Cases | ||
|
||
The test cases can be customized by modifying the fixtures defined in the `conftest.py` file. The available fixtures are: | ||
|
||
- `test_cases_dir`: The directory containing the test cases. | ||
- `task_file`: The path to the `task.txt` file for the current test case. | ||
- `workspace_dir`: The path to the `workspace/` directory for the current test case. | ||
- `model`: The model selected for the generation. | |
- `run_test_case`: A fixture that runs OpenDevin and generates the workspace for the current test case. | ||
|
||
You can modify these fixtures to change the behavior of the test cases or add new ones as needed. | ||
|
||
To run the regression tests: | |
```bash | ||
./run.sh and enter DEBUG, OPENAI_API_KEY and Model name in the prompt. | ||
``` | ||
If you have any questions or need further assistance, feel free to reach out to the project maintainers. |
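The directory layout described in the README diff above (`cases/<case>/task.txt` and `cases/<case>/outputs/<agent>/workspace/`) can be sketched in a few lines of Python. This is only an illustration of that layout, not the contents of the commit's `conftest.py` or `run_tests.py`: the helper names `discover_cases`, `read_task`, `workspace_path`, and `missing_files` are hypothetical and do not come from the repository.

```python
from pathlib import Path


def discover_cases(cases_dir):
    """Return the sorted names of case directories that contain a task.txt."""
    root = Path(cases_dir)
    return sorted(
        p.name for p in root.iterdir()
        if p.is_dir() and (p / "task.txt").is_file()
    )


def read_task(cases_dir, case_name):
    """Return the task description stored in the case's task.txt."""
    return (Path(cases_dir) / case_name / "task.txt").read_text().strip()


def workspace_path(cases_dir, case_name, agent_name):
    """Build the path cases/<case>/outputs/<agent>/workspace (hypothetical helper)."""
    return Path(cases_dir) / case_name / "outputs" / agent_name / "workspace"


def missing_files(workspace, expected):
    """Return the expected file names that are absent from the workspace."""
    ws = Path(workspace)
    return [name for name in expected if not (ws / name).exists()]
```

A `test_*.py` script for the `hello-world` case could then, for example, assert that `missing_files(workspace, ["hello_world.sh"])` is empty for each agent's workspace. How the real test scripts validate output is not visible in this diff.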
206 changes: 0 additions & 206 deletions
evaluation/regression/cases/client-server/outputs/langchains_agent/logs.txt
This file was deleted.
23 changes: 0 additions & 23 deletions
...ation/regression/cases/client-server/outputs/langchains_agent/workspace/client/.gitignore
This file was deleted.
70 changes: 0 additions & 70 deletions
...ression/cases/client-server/outputs/langchains_agent/workspace/client/README.md
This file was deleted.