Integration tests (openhands fix issue 5076) #8

Closed
enyst wants to merge 37 commits from int/openhands-fix-issue-5076

Conversation

enyst
Owner

@enyst enyst commented Nov 23, 2024

End-user friendly description of the problem this fixes or functionality that this introduces

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions


Link of any specific issues this addresses

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

Trigger by: Pull Request (integration-test label on PR #8)
Commit: a7ff9aa
Integration Tests Evaluation Report


You can download the full evaluation outputs here.

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

Trigger by: Pull Request (integration-test label on PR #8)
Commit: f1ac848
Integration Tests Evaluation Report


You can download the full evaluation outputs here.

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

Trigger by: Pull Request (integration-test label on PR #8)
Commit: 9ba684d
Integration Tests Evaluation Report
Success rate: 66.67% (4/6)

| instance_id | success | reason |
| --- | --- | --- |
| t02_add_bash_hello | True | |
| t05_simple_browsing | False | The answer is not found in any message. Total messages: 0. Messages: [] |
| t04_git_staging | True | |
| t06_github_pr_browsing | False | The answer is not found in any message. Total messages: 0. Messages: [] |
| t03_jupyter_write_file | True | |
| t01_fix_simple_typo | True | |

You can download the full evaluation outputs here.

Repository owner deleted a comment from github-actions bot Nov 25, 2024
Repository owner deleted a comment from github-actions bot Nov 25, 2024
Owner Author

enyst commented Nov 25, 2024

@openhands-agent Make the integration-runner workflow work also on schedule, a nightly schedule.
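
For context, the request above amounts to adding a `schedule` trigger next to the existing label trigger. The following is a minimal sketch, not the workflow from this PR: the workflow name, job name, cron time, and the single `echo` step are all hypothetical placeholders. Only the `integration-test` label and the `integration-runner.yml` file name come from this conversation.

```yaml
# Hypothetical sketch of .github/workflows/integration-runner.yml triggers.
# Names and the cron time are assumptions, not taken from the PR.
name: Integration Tests

on:
  pull_request:
    types: [labeled]        # runs when a label is added to a PR
  schedule:
    - cron: '30 2 * * *'    # nightly at 02:30 UTC (arbitrary example time)

jobs:
  run-integration-tests:
    # Run on the nightly schedule, or when the added label is 'integration-test'.
    if: github.event_name == 'schedule' || github.event.label.name == 'integration-test'
    runs-on: ubuntu-latest
    steps:
      - name: Show trigger
        run: echo "Triggered by ${{ github.event_name }}"
```

Scheduled workflows run against the default branch, so a nightly trigger like this would only take effect once the workflow file is merged there.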

OpenHands started fixing the pr! You can monitor the progress here.

Repository owner deleted a comment from github-actions bot Nov 25, 2024
Repository owner deleted a comment from github-actions bot Nov 25, 2024
Repository owner deleted a comment from github-actions bot Nov 25, 2024
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

Trigger by: Pull Request (integration-test label on PR #8)
Commit: de57c25
Integration Tests Report (Haiku)
Haiku LLM Test Results:
Success rate: 100.00% (6/6)

| instance_id | success | reason |
| --- | --- | --- |
| t03_jupyter_write_file | True | |
| t02_add_bash_hello | True | |
| t01_fix_simple_typo | True | |
| t05_simple_browsing | True | |
| t04_git_staging | True | |
| t06_github_pr_browsing | True | |

Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:
Success rate: 100.00% (6/6)

| instance_id | success | reason |
| --- | --- | --- |
| t03_jupyter_write_file | True | |
| t02_add_bash_hello | True | |
| t01_fix_simple_typo | True | |
| t05_simple_browsing | True | |
| t04_git_staging | True | |
| t06_github_pr_browsing | True | |

Download evaluation outputs (includes both Haiku and DeepSeek results): Download

Repository owner deleted a comment from github-actions bot Nov 25, 2024
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

Trigger by: Pull Request (integration-test label on PR #8)
Commit: 95425d3
Integration Tests Report (Haiku)
Haiku LLM Test Results:
Success rate: 100.00% (6/6)

| instance_id | success | reason |
| --- | --- | --- |
| t03_jupyter_write_file | True | |
| t02_add_bash_hello | True | |
| t01_fix_simple_typo | True | |
| t05_simple_browsing | True | |
| t06_github_pr_browsing | True | |
| t04_git_staging | True | |

Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:
Success rate: 83.33% (5/6)

| instance_id | success | reason |
| --- | --- | --- |
| t03_jupyter_write_file | True | |
| t05_simple_browsing | True | |
| t04_git_staging | True | |
| t02_add_bash_hello | True | |
| t01_fix_simple_typo | True | |
| t06_github_pr_browsing | False | The answer is not found in any message. Total messages: 4. |

Download evaluation outputs (includes both Haiku and DeepSeek results): Download

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

Trigger by: Pull Request (integration-test label on PR #8)
Commit: 95425d3
Integration Tests Report (Haiku)
Haiku LLM Test Results:
Success rate: 83.33% (5/6)

| instance_id | success | reason |
| --- | --- | --- |
| t03_jupyter_write_file | True | |
| t02_add_bash_hello | True | |
| t01_fix_simple_typo | True | |
| t05_simple_browsing | False | The answer is not found in any message. Total messages: 4. |
| t06_github_pr_browsing | True | |
| t04_git_staging | True | |

Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:
Success rate: 83.33% (5/6)

| instance_id | success | reason |
| --- | --- | --- |
| t03_jupyter_write_file | True | |
| t05_simple_browsing | True | |
| t02_add_bash_hello | True | |
| t04_git_staging | True | |
| t06_github_pr_browsing | False | The answer is not found in any message. Total messages: 2. |
| t01_fix_simple_typo | True | |

Download evaluation outputs (includes both Haiku and DeepSeek results): Download

enyst force-pushed the int/openhands-fix-issue-5076 branch from b37602a to 0c22181 on November 25, 2024 22:19
enyst added a commit that referenced this pull request Nov 25, 2024
* Fix issue All-Hands-AI#5076: Integration test github action

* Update integration-runner.yml

* Update integration-runner.yml

* update variables

* use haiku

* use base url

* fix report name

* Fix pr #8: Integration tests (openhands fix issue 5076)

* Revert "Fix pr #8: Integration tests (openhands fix issue 5076)"

This reverts commit dcd4681.

* Fix pr #8: Integration tests (openhands fix issue 5076)

* use haiku explicitly, in results too

* remove duplicate

* Update .github/workflows/integration-runner.yml

* Revert "Update .github/workflows/integration-runner.yml"

This reverts commit 7e7200e.

* funny space

* Fix pr #8: Integration tests (openhands fix issue 5076)

* artifact fix

* clean up remote runtimes

* clean up runtimes more aggressively - a bit unexpected though

* Fix pr #8: Integration tests (openhands fix issue 5076)

* fix type issue that was preventing checking results

* try with waiting time

* add eval notes

* increase timeouts

* try with CI local builds

* fix eval output

* set debug

* fix tests!

* fix outputs

* keep details in logs, not github comment

* tweak schedule

* lint-y

---------

Co-authored-by: openhands <[email protected]>
2 participants