feat: Ad/regression tests using pytest (All-Hands-AI#329)

* Remove all the unnecessary files
* Create finalize the regression testing framework and add hello world test case
* Update requirements.txt
* Update the test function to execute the generate script
xingyaoww · Mar 29, 2024 · 7c27e59
1 parent fa87352
Showing 92 changed files with 184 additions and 40,307 deletions.
diff --git a/evaluation/regression/.gitignore b/evaluation/regression/.gitignore
@@ -1 +1,2 @@
 node_modules
+outputs
diff --git a/evaluation/regression/README.md b/evaluation/regression/README.md
@@ -1,29 +1,79 @@
-# Regression Tests
+# OpenDevin - Regression Test Framework
 
-These files demonstrate how OpenDevin currently handles certain scenarios.
+The OpenDevin project is an open-source software engineering AI that can solve various software engineering tasks. This repository contains the regression test framework for the OpenDevin project.
 
-To add a new regression case:
-```bash
-name="hello-script"
+## Running the Tests
 
-# The start directory contains the initial state of the project the agent will work on
-# Add any files you'd like here.
-mkdir -p ./agent/regression/cases/$name/start
+To run the tests for the OpenDevin project, use the provided test runner script. Follow these steps:
 
-# task.txt contains the task to be accomplished
-echo "write a hello world script" >> ./agent/regression/cases/$name/task.txt
+1. Ensure you have Python 3.6 or higher installed on your system.
+2. Install the required dependencies by running the following command in your terminal:
+   ```
+   pip install -r requirements.txt
+   ```
+3. Navigate to the root directory of the project.
+4. Run the test suite using the test runner script with the required arguments:
+   ```
+   python evaluation/regression/run_tests.py --OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxx --model=gpt-4-0125-preview
+   ```
+   Replace `sk-xxxxxxxxxxxxxxxxxxxxxx` with your actual OpenAI API key. The default model is `gpt-4-0125-preview`, but you can specify a different model if needed.
 
-# Single out your test case using the TEST_CASE environment variable
-TEST_CASE=$name ./agent/regression/run.sh
-```
+The test runner will discover and execute all the test cases in the `cases/` directory, and display the results of the test suite, including the status of each individual test case and the overall summary.
+
+## Test Case Structure
+
+The test cases for the OpenDevin project are organized in the `cases/` directory. Each test case has the following structure:
 
-To add agent to regreesion test:
-```bash
-add agent pair to directory_class_pairs variable in run.sh 
-key is the directory name in /agenthub and value is the class name 
 ```
+cases/
+├── hello-world/
+│   ├── task.txt
+│   ├── outputs/
+│   │   ├── langchains_agent/
+│   │   │   └── workspace/
+│   │   │       └── hello_world.sh
+│   │   └── codeact_agent/
+│   │       └── workspace/
+│   │           └── hello_world.sh
+│   └── test_hello_world.py
+├── create_web_app/
+│   ├── task.txt
+│   ├── outputs/
+│   │   ├── langchains_agent/
+│   │   │   └── workspace/
+│   │   │       ├── app.py
+│   │   │       ├── requirements.txt
+│   │   │       ├── static/
+│   │   │       └── templates/
+│   │   └── codeact_agent/
+│   │       └── workspace/
+│   │           ├── app.py
+│   │           ├── requirements.txt
+│   │           ├── static/
+│   │           └── templates/
+│   └── test_create_web_app.py
+└── ...
+```
+
+- `task.txt`: This file contains the task description provided by the user.
+- `outputs/`: This directory contains the output generated by OpenDevin for each agent.
+- `outputs/*/workspace/`: This directory contains the actual output files generated by OpenDevin.
+- `test_*.py`: These are the test scripts that validate the output of OpenDevin.
+
+## Adding New Test Cases
+
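+To add a new test case to the regression test framework, create a new directory under `cases/` that follows the structure shown above: a `task.txt` with the task description and a `test_*.py` script that validates the generated output.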
+
+## Customizing the Test Cases
+
+The test cases can be customized by modifying the fixtures defined in the `conftest.py` file. The available fixtures are:
+
+- `test_cases_dir`: The directory containing the test cases.
+- `task_file`: The path to the `task.txt` file for the current test case.
+- `workspace_dir`: The path to the `workspace/` directory for the current test case.
+- `model`: The model selected to start the generation.
+- `run_test_case`: A fixture that runs OpenDevin and generates the workspace for the current test case.
+
+You can modify these fixtures to change the behavior of the test cases or add new ones as needed.
 
-To run regresion test:
-```bash
-./run.sh and enter DEBUG, OPENAI_API_KEY and Model name in the prompt.
-``` 
+If you have any questions or need further assistance, feel free to reach out to the project maintainers.
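
Reviewer note: the README above says the runner discovers the cases under `cases/` and that each case keeps its generated files in `outputs/<agent>/workspace/`. The sketch below illustrates that layout in plain Python. It is not the code from this commit: the helper names `discover_cases` and `workspace_path` are hypothetical, and the rule that a case needs both a `task.txt` and a `test_*.py` script is an assumption read off the directory tree.

```python
from pathlib import Path


def discover_cases(cases_dir):
    """Return names of case directories holding a task.txt and a test_*.py script.

    Hypothetical helper: the discovery rule is an assumption based on the
    directory tree in the README, not the runner's actual logic.
    """
    found = []
    for case in sorted(Path(cases_dir).iterdir()):
        if not case.is_dir():
            continue
        has_task = (case / "task.txt").is_file()
        has_test = any(case.glob("test_*.py"))
        if has_task and has_test:
            found.append(case.name)
    return found


def workspace_path(cases_dir, case_name, agent):
    """Build the outputs/<agent>/workspace path shown in the README tree."""
    return Path(cases_dir) / case_name / "outputs" / agent / "workspace"
```

For example, `workspace_path("cases", "hello-world", "codeact_agent")` points at the directory where the tree above places `hello_world.sh` for that agent.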
diff --git a/evaluation/regression/cases/client-server/outputs/langchains_agent/logs.txt b/evaluation/regression/cases/client-server/outputs/langchains_agent/logs.txt
diff --git a/...ation/regression/cases/client-server/outputs/langchains_agent/workspace/client/.gitignore b/...ation/regression/cases/client-server/outputs/langchains_agent/workspace/client/.gitignore
diff --git a/...ression/cases/client-server/outputs/langchains_agent/workspace/client/README.md b/...ression/cases/client-server/outputs/langchains_agent/workspace/client/README.md