adding a script to fetch and convert devin's output for evaluation #81

Jiaxin-Pei · 2024-03-21T14:13:45Z

No description provided.

xingyaoww · 2024-03-21T14:51:49Z

evaluation/SWE-bench/src/prepare_devin_outputs_for_evaluation.py

How about we move this file to SWE-Bench/scripts?

I'm not quite sure about this. To me, it seems more reasonable to keep dataset-related files in the dataset folder. @JustinLin610 @libowen2121 any thoughts on this?

Oh, I suggest we do this: mv src/prepare_devin_outputs_for_evaluation.py scripts/prepare_devin_outputs_for_evaluation.py

Oh, my bad, I thought we were moving it outside the evaluation folder. Will do.

xingyaoww · 2024-03-21T14:53:49Z

evaluation/README.md

+- src
+  - `prepare_devin_outputs_for_evaluation.py`: script fetching and converting devin's output into the desired json file for evaluation.
+    - outputs: two json files under `evaluation/SWE-bench/data/` that can be directly used for evaluation


Can you upload the post-processed file to our Hugging Face datasets, and add a curl or wget command here so people can directly download it for debugging? You can request to join if you haven't already: https://huggingface.co/OpenDevin

xingyaoww · 2024-03-21T14:55:01Z

evaluation/SWE-bench/src/prepare_devin_outputs_for_evaluation.py

+
+    with open(os.path.join(output_dir, "fail_output.json"), "w") as fail_file:
+        json.dump(failed_files_info, fail_file, indent=4)
+


I'm debating whether we want to make these two separate files or just one file. How about we merge them into one and add an additional bool field like devin_pass?

It only takes ~1 minute to fetch and process the files. The purpose of having two files is that you can start directly from the passed files for pilot testing. I can generate another merged file and upload it to HF.

Having both options sounds good! Maybe we can add an argument to the script to switch that behavior, and we can upload both versions to HF and let users decide which one they want to download.

sounds good!
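
For reference, the switch discussed above could be sketched roughly as follows. This is only an illustration under assumptions, not the code merged in this PR: the function name `write_outputs`, the `--merge` and `--output-dir` flags, and the file names `pass_output.json` and `merged_output.json` are hypothetical. Only `fail_output.json` and the proposed `devin_pass` field come from the thread.

```python
import argparse
import json
import os


def write_outputs(passed, failed, output_dir, merge=False):
    """Write results as two files, or as one merged file with a devin_pass flag.

    `passed` and `failed` are lists of dicts, one per task instance.
    All file names except fail_output.json are placeholders (assumptions).
    """
    os.makedirs(output_dir, exist_ok=True)
    if merge:
        # Copy each record and tag it with the proposed boolean field.
        merged = [dict(item, devin_pass=True) for item in passed]
        merged += [dict(item, devin_pass=False) for item in failed]
        targets = {"merged_output.json": merged}
    else:
        targets = {"pass_output.json": passed, "fail_output.json": failed}

    written = []
    for name, records in targets.items():
        path = os.path.join(output_dir, name)
        with open(path, "w") as f:
            json.dump(records, f, indent=4)
        written.append(path)
    return written


def parse_args(argv=None):
    # Hypothetical command-line flags for switching between the two layouts.
    parser = argparse.ArgumentParser()
    parser.add_argument("--output-dir", default="data")
    parser.add_argument(
        "--merge",
        action="store_true",
        help="write one merged file with a devin_pass field instead of two files",
    )
    return parser.parse_args(argv)
```

With this layout the default stays at two files, so the passed set can still be used directly for pilot testing, and passing `--merge` produces the single-file variant.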

xingyaoww

LGTM!


* a starting point for SWE-Bench evaluation with docker
* fix the swe-bench uid issue
* typo fixed
* fix conda missing issue
* move files based on new PR
* Update doc and gitignore using devin prediction file from #81
* fix typo
* add a sentence
* fix typo in path
* fix path
---------
Co-authored-by: Binyuan Hui

…ll-Hands-AI#81)
* adding code to fetch and convert devin's output for evaluation
* update README.md
* update code for fetching and processing devin's outputs
* update code for fetching and processing devin's outputs


Jiaxin-Pei added 2 commits March 21, 2024 10:10

adding code to fetch and convert devin's output for evaluation

3c1f36b

update README.md

7e95a01

Jiaxin-Pei mentioned this pull request Mar 21, 2024

[Evaluation] Convert Devin's output into SWE-Bench runnable format #80

Closed

xingyaoww reviewed Mar 21, 2024


Jiaxin-Pei and others added 3 commits March 21, 2024 13:10

Merge branch 'OpenDevin:main' into main

b509c69

update code for fetching and processing devin's outputs

b55541a

update code for fetching and processing devin's outputs

b4b6786

xingyaoww approved these changes Mar 21, 2024


xingyaoww merged commit dc88dac into All-Hands-AI:main Mar 21, 2024

xingyaoww added a commit to xingyaoww/OpenHands that referenced this pull request Mar 22, 2024

Update doc and gitignore using devin prediction file from All-Hands-A…

b7bd1d3

…I#81
