adding a script to fetch and convert devin's output for evaluation #81

Merged (5 commits, Mar 21, 2024)
7 changes: 5 additions & 2 deletions evaluation/README.md
@@ -11,5 +11,8 @@ all the preprocessing/evaluation/analysis scripts.
 
 ## Tasks
 ### SWE-bench
-- analysis
-  - devin_eval_analysis.ipynb: notebook analyzing devin's outputs
+- notebooks
+  - `devin_eval_analysis.ipynb`: notebook analyzing devin's outputs
+- src
+  - `prepare_devin_outputs_for_evaluation.py`: script that fetches and converts devin's output into the desired JSON format for evaluation
+- outputs: two JSON files under `evaluation/SWE-bench/data/` that can be used directly for evaluation
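For reference, each entry in those JSON files has the shape the script in this PR appends for every diff; the field names come from the script, while the concrete values here are only illustrative placeholders:

```python
# Illustrative entry; field names match the script, values are placeholders.
example_entry = {
    "instance_id": "owner__repo-12345",        # hypothetical SWE-bench instance id recovered from the file name
    "model_patch": "diff --git a/... b/...",   # raw diff text fetched from the results repo
    "model_name_or_path": "Devin",
}
```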
Collaborator:
Can you upload the post-processed files to our huggingface datasets, and add a curl or wget command here so people can directly download them for debugging? You can request to join if you haven't already: https://huggingface.co/OpenDevin

Contributor Author:
requested
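A hedged sketch of what that download step could look like once the files are uploaded; the dataset repo id below is a hypothetical placeholder for wherever the files land under the OpenDevin org, while the filename matches what the script writes:

```python
# Sketch only: the repo_id is hypothetical; the filename matches the script's output.
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="OpenDevin/devin-swebench-outputs",  # hypothetical dataset repo
    filename="pass_output.json",
    repo_type="dataset",
)
print(local_path)  # path to the cached JSON file, ready for evaluation/debugging
```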

62 changes: 62 additions & 0 deletions evaluation/SWE-bench/src/prepare_devin_outputs_for_evaluation.py
Collaborator:
How about we put this file in SWE-Bench/scripts?

Contributor Author:
I'm not quite sure about this. To me it seems more reasonable to keep dataset-related files in the dataset folder. @JustinLin610 @libowen2121 any thoughts on this?

Collaborator:
Ohh, I suggest we do this: `mv src/prepare_devin_outputs_for_evaluation.py scripts/prepare_devin_outputs_for_evaluation.py`

Contributor Author:
Oh, my bad, I thought we were moving it outside the evaluation folder. Will do.

@@ -0,0 +1,62 @@
'''
Script used to convert devin's output into the desired JSON format for evaluation on SWE-bench

Usage:
    python prepare_devin_outputs_for_evaluation.py

Outputs:
    two JSON files under evaluation/SWE-bench/data/
'''

import requests
import os
from tqdm import tqdm
import json

# Fetch devin's outputs and convert them into JSON files for evaluation
def get_devin_eval_output():
    repo_url = "CognitionAI/devin-swebench-results"
    folder_path = "output_diffs"

    base_url = "https://api.github.com/repos/"
    pass_api_url = f"{base_url}{repo_url}/contents/{folder_path}/pass"
    failed_api_url = f"{base_url}{repo_url}/contents/{folder_path}/fail"

    pass_files_info = []
    failed_files_info = []

    def get_files(api_url, subfolder_name, files_info):
        # List the folder via the GitHub contents API, then download each raw diff file
        response = requests.get(api_url)
        if response.status_code == 200:
            contents = response.json()
            for item in tqdm(contents):
                if item["type"] == "file":
                    file_url = f"https://raw.githubusercontent.com/{repo_url}/main/{folder_path}/{subfolder_name}/{item['name']}"
                    file_content = requests.get(file_url).text
                    instance_id = item['name'][:-9]  # drop the trailing 9-character suffix of the file name to recover the instance id
                    model_name = "Devin"  # Update with actual model name
                    files_info.append({
                        "instance_id": instance_id,
                        "model_patch": file_content,
                        "model_name_or_path": model_name
                    })

    get_files(pass_api_url, "pass", pass_files_info)
    get_files(failed_api_url, "fail", failed_files_info)

    script_dir = os.path.dirname(os.path.abspath(__file__))
    output_dir = os.path.join(script_dir, "../data/devin/")

    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    with open(os.path.join(output_dir, "pass_output.json"), "w") as pass_file:
        json.dump(pass_files_info, pass_file, indent=4)

    with open(os.path.join(output_dir, "fail_output.json"), "w") as fail_file:
        json.dump(failed_files_info, fail_file, indent=4)

Collaborator:
I'm debating whether we want to make this two separate files or just one file -- how about we merge them into one and add an additional bool field like `devin_pass`?

Contributor Author:
It only takes ~1 minute to fetch and process the files. The purpose of having two files is that you can start directly from the passed files for pilot testing. I can generate another merged file and upload it to HF.

Collaborator:
Having both options is good! Maybe we can add an argument in the script to switch that behavior; we can upload both versions to HF and let users decide which one they want to download.

Contributor Author:
sounds good!


if __name__ == '__main__':
    get_devin_eval_output()
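A minimal sketch of the merged-output idea discussed above: one JSON file in which every entry carries a boolean `devin_pass` field, with command-line flags to pick the input and output paths. The flag names and the merge helper are assumptions for illustration, not part of this PR:

```python
# Sketch of the proposed merged output; assumes the pass/fail JSON files already exist.
import argparse
import json

def merge_outputs(pass_path, fail_path, out_path):
    """Combine the pass/fail files into one list, tagging each entry with devin_pass."""
    with open(pass_path) as f:
        passed = json.load(f)
    with open(fail_path) as f:
        failed = json.load(f)
    merged = [{**entry, "devin_pass": True} for entry in passed]
    merged += [{**entry, "devin_pass": False} for entry in failed]
    with open(out_path, "w") as f:
        json.dump(merged, f, indent=4)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--pass-file", default="pass_output.json")
    parser.add_argument("--fail-file", default="fail_output.json")
    parser.add_argument("--out", default="merged_output.json")
    args = parser.parse_args()
    merge_outputs(args.pass_file, args.fail_file, args.out)
```

An evaluation run could then filter entries on `devin_pass` to reproduce the pilot-testing workflow that the two-file layout gives for free.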