Adds the GAIA benchmark to the Testbed. This PR depends on #792 #810

afourney · 2023-11-29T22:19:16Z

Why are these changes needed?

This PR adds initial support for the GAIA benchmark to the Testbed.

Note: This PR depends on #792. Merge that one first.

Related issue number

N/A

Checks

I've included any doc changes needed for https://microsoft.github.io/autogen/. See https://microsoft.github.io/autogen/docs/Contribute#documentation to build and test documentation locally.
I've added tests (if relevant) corresponding to the changes introduced in this PR.
I've made sure all auto checks have passed.

qingyun-wu · 2023-11-30T16:23:57Z

Nice PR! I am reviewing 792 rn, and will move on to this PR later today or tmr.

codecov-commenter · 2023-11-30T17:36:41Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (73d7e92) 26.63% compared to head (08eec2b) 26.44%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #810      +/-   ##
==========================================
- Coverage   26.63%   26.44%   -0.19%     
==========================================
  Files          28       28              
  Lines        3725     3725              
  Branches      847      847              
==========================================
- Hits          992      985       -7     
- Misses       2660     2666       +6     
- Partials       73       74       +1
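
As a sanity check on the diff above, the reported percentages are consistent with coverage being hits divided by total lines (hits + misses + partials = 3725 on both sides). A minimal sketch under that assumption; the helper name `coverage_pct` is hypothetical and not part of Codecov or the Testbed:

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Return line coverage as a percentage, rounded to two decimals.

    Assumes coverage = hits / total lines, which matches the figures
    in the report above.
    """
    return round(100.0 * hits / lines, 2)


# Figures taken from the Coverage Diff table.
base = coverage_pct(992, 3725)   # main
head = coverage_pct(985, 3725)   # this PR
delta = round(head - base, 2)

print(base, head, delta)  # 26.63 26.44 -0.19
```

The line count is unchanged (3725), so the drop comes entirely from 7 fewer hits, not from new uncovered code.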

Flag	Coverage Δ
unittests	`26.44% <ø> (-0.14%)`	⬇️

qingyun-wu

LGTM! Following the practice here, should @LeoLjl move utils for AutoGPT to the utils folder? If so, @LeoLjl could you do that in a separate PR? Thank you!

LeoLjl

LGTM

samples/tools/testbed/scenarios/GAIA/Templates/BasicTwoAgents/scenario.py

LeoLjl · 2023-12-06T01:38:57Z

LGTM! Following the practice here, should @LeoLjl move utils for AutoGPT to the utils folder? If so, @LeoLjl could you do that in a separate PR? Thank you!

@qingyun-wu I'll copy the utils to the utils folder. Will do it in a separate PR.

… (microsoft#810)

* Re-added completion logging when using older versions of autogen.
* Extended scenario definitions and templating to include folders.
* Prepare collate_human_eval.py for working with group chat scenarios.
* Converted HumanEval to the folder-based approach, and added GroupChat scenarios.
* Fixed the default termination message.
* Fixed another termination condition.
* Updated compatible autogen versions.
* Added initial support for GAIA benchmark.
* Fixed a bug in executing the finalize scripts.
* Generalized the template further to support multiple folder copy operations.
* Refined GAIA support, and broke scenarios down by difficulty.
* Added some experimental scripts for computing metrics over GAIA. This is a first version, and will likely need refinement.
* Added instructions for cloning GAIA
* Updated README to fix some typos.
* Added a script to format GAIA results for the leaderboard.
* Update samples/tools/testbed/scenarios/GAIA/Templates/BasicTwoAgents/scenario.py
  Co-authored-by: LeoLjl <[email protected]>

---------

Co-authored-by: Qingyun Wu <[email protected]>
Co-authored-by: LeoLjl <[email protected]>

afourney added 17 commits November 16, 2023 15:25

Re-added completion logging when using older versions of autogen.

84897b4

Merge remote-tracking branch 'origin/testbed_add_014_logging' into testbed_folders

76ae8f5

Extended scenario definitions and templating to include folders.

014063e

Prepare collate_human_eval.py for working with group chat scenarios.

81c62c9

Converted HumanEval to the folder-based approach, and added GroupChat scenarios.

6f0a45f

Fixed the default termination message.

5e9fe60

Fixed another termination condition.

f96245a

Merge branch 'main' into testbed_folders

bf815aa

Updated compatible autogen versions.

a95c748

Merge branch 'main' into testbed_folders

b962226

Added initial support for GAIA benchmark.

46b0abc

Fixed a bug in executing the finalize scripts.

53cd5b4

Generalized the template further to support multiple folder copy operations.

2d97bb8

Merge branch 'testbed_folders' into testbed_gaia

f7b30a9

Refined GAIA support, and broke scenarios down by difficulty.

a22bea7

Added some experimental scripts for computing metrics over GAIA. This is a first version, and will likely need refinement.

b26e26d

Added instructions for cloning GAIA

4ee155b

afourney added the evaluation and proj-autogenbench (Issues related to AutoGenBench.) labels Nov 29, 2023

afourney requested review from victordibia, qingyun-wu, yiranwu0 and a team November 29, 2023 22:19

afourney self-assigned this Nov 29, 2023

afourney mentioned this pull request Nov 29, 2023

Unable to successfully run the Autogen Testbed Environment #800

Closed

afourney added 2 commits November 30, 2023 09:29

Updated README to describe how to run GAIA. Included the GAIA test scenarios in expand_gaia.py

b1118b2

Updated README to fix some typos.

66cd936

sonichi requested a review from LeoLjl December 1, 2023 15:23

sonichi requested a review from corbyrosset December 1, 2023 15:24

afourney and others added 2 commits December 1, 2023 10:08

Added a script to format GAIA results for the leaderboard.

23fa2ba

Merge branch 'main' into testbed_gaia

90101a4

qingyun-wu approved these changes Dec 5, 2023

LeoLjl approved these changes Dec 6, 2023

samples/tools/testbed/scenarios/GAIA/Templates/BasicTwoAgents/scenario.py (outdated, resolved)

qingyun-wu enabled auto-merge December 6, 2023 01:41

qingyun-wu disabled auto-merge December 6, 2023 01:41

Update samples/tools/testbed/scenarios/GAIA/Templates/BasicTwoAgents/scenario.py

08eec2b

Co-authored-by: LeoLjl <[email protected]>

qingyun-wu enabled auto-merge December 6, 2023 01:42

qingyun-wu added this pull request to the merge queue Dec 6, 2023

Merged via the queue into main with commit f8b4b42 Dec 6, 2023
16 checks passed

sonichi deleted the testbed_gaia branch December 7, 2023 00:28

afourney added the gaia label Jan 10, 2024
