More integration tests info #17

Closed
wants to merge 8 commits into from

Conversation

enyst
Owner

enyst commented Nov 28, 2024

End-user friendly description of the problem this fixes or functionality that this introduces

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions


Link of any specific issues this addresses

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

Trigger by: Pull Request (integration-test label on PR #17)
Commit: ab566c3
Integration Tests Report (Haiku)
Haiku LLM Test Results:
Success rate: 100.00% (6/6)

| instance_id | success | reason |
| --- | --- | --- |
| t04_git_staging | True | |
| t03_jupyter_write_file | True | |
| t01_fix_simple_typo | True | |
| t06_github_pr_browsing | True | |
| t05_simple_browsing | True | |
| t02_add_bash_hello | True | |
Total cost: USD 0.11

Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:
Success rate: 0.00% (0/6)

| instance_id | success | reason |
| --- | --- | --- |
| t06_github_pr_browsing | False | The answer is not found in any message. Total messages: 1. |
| t05_simple_browsing | False | The answer is not found in any message. Total messages: 1. |
| t04_git_staging | False | Failed to check for "nothing to commit, working tree clean": On branch master<br>No commits yet<br>Changes to be committed:<br>(use "git rm --cached ..." to unstage)<br>new file: hello.py. |
| t01_fix_simple_typo | False | File not fixed: This is a stupid typoo.<br>Really?<br>No mor typos!<br>Enjoy! |
| t03_jupyter_write_file | False | Failed to cat /workspace/test.txt: cat: /workspace/test.txt: No such file or directory. |
| t02_add_bash_hello | False | Failed to cat /workspace/hello.sh: cat: /workspace/hello.sh: No such file or directory. |
Total cost: USD 0.00
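
The `t04_git_staging` failure reason above suggests that the check looks for the string "nothing to commit, working tree clean" in the output of `git status`. A minimal sketch of such a check follows, purely as an illustration: the names `CheckResult` and `check_git_status` are hypothetical and are not taken from this repository's test code.

```python
from typing import NamedTuple


class CheckResult(NamedTuple):
    """Hypothetical result type: a pass/fail flag plus a human-readable reason."""
    success: bool
    reason: str = ""


# The marker string quoted in the failure reason of the report.
EXPECTED = "nothing to commit, working tree clean"


def check_git_status(status_output: str) -> CheckResult:
    """Pass only when the `git status` output reports a clean working tree."""
    if EXPECTED in status_output:
        return CheckResult(True)
    # Echo the actual output back so the report shows what was found instead.
    return CheckResult(
        False, f'Failed to check for "{EXPECTED}": {status_output.strip()}'
    )


# Output resembling the failed run: the file is staged but never committed.
staged_only = (
    "On branch master\n"
    "No commits yet\n"
    "Changes to be committed:\n"
    "new file: hello.py\n"
)
result = check_git_status(staged_only)
print(result.success)
print(result.reason)
```

Under this reading, a run that only stages `hello.py` without committing it fails the check, which matches the reason text in the report.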

Download evaluation outputs (includes both Haiku and DeepSeek results): Download

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

Trigger by: Pull Request (integration-test label on PR #17)
Commit: b6e0be1
Integration Tests Report (Haiku)
Haiku LLM Test Results:
Success rate: 100.00% (6/6)

Total cost: USD 0.11

| instance_id | success | reason | cost |
| --- | --- | --- | --- |
| t04_git_staging | True | | 0.013286 |
| t03_jupyter_write_file | True | | 0.024609 |
| t01_fix_simple_typo | True | | 0.026226 |
| t06_github_pr_browsing | True | | 0.031924 |
| t05_simple_browsing | True | | 0.017153 |
| t02_add_bash_hello | True | | 0 |

Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:
Success rate: 0.00% (0/6)

Total cost: USD 0.00

| instance_id | success | reason | cost |
| --- | --- | --- | --- |
| t06_github_pr_browsing | False | The answer is not found in any message. Total messages: 1. | 0 |
| t03_jupyter_write_file | False | Failed to cat /workspace/test.txt: cat: /workspace/test.txt: No such file or directory. | 0 |
| t04_git_staging | False | Failed to check for "nothing to commit, working tree clean": On branch master<br>No commits yet<br>Changes to be committed:<br>(use "git rm --cached ..." to unstage)<br>new file: hello.py. | 0 |
| t05_simple_browsing | False | The answer is not found in any message. Total messages: 1. | 0 |
| t01_fix_simple_typo | False | File not fixed: This is a stupid typoo.<br>Really?<br>No mor typos!<br>Enjoy! | 0 |
| t02_add_bash_hello | False | Failed to cat /workspace/hello.sh: cat: /workspace/hello.sh: No such file or directory. | 0 |

Download evaluation outputs (includes both Haiku and DeepSeek results): Download

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

Trigger by: Pull Request (integration-test label on PR #17)
Commit: b92af16
Integration Tests Report (Haiku)
Haiku LLM Test Results:
Success rate: 100.00% (6/6)

Total cost: USD 0.12

| instance_id | success | reason | cost |
| --- | --- | --- | --- |
| t04_git_staging | True | | 0.013284 |
| t03_jupyter_write_file | True | | 0.024609 |
| t01_fix_simple_typo | True | | 0.026226 |
| t06_github_pr_browsing | True | | 0.031864 |
| t05_simple_browsing | True | | 0.026767 |
| t02_add_bash_hello | True | | 0 |

Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:
Success rate: 83.33% (5/6)

Total cost: USD 0.02

| instance_id | success | reason | cost |
| --- | --- | --- | --- |
| t03_jupyter_write_file | True | | 0.00208754 |
| t05_simple_browsing | True | | 0.00284452 |
| t06_github_pr_browsing | False | The answer is not found in any message. Total messages: 2. | 0.00235606 |
| t04_git_staging | True | | 0.00387268 |
| t02_add_bash_hello | True | | 0.00356118 |
| t01_fix_simple_typo | True | | 0.00362362 |
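
The summary lines of this DeepSeek report can be reproduced from the per-instance rows. The sketch below is only an illustration of that arithmetic: the `summarize` helper and the tuple layout are assumptions, not the code that generated the report.

```python
# Rows copied from the DeepSeek table for commit b92af16:
# (instance_id, success, cost in USD).
rows = [
    ("t03_jupyter_write_file", True, 0.00208754),
    ("t05_simple_browsing", True, 0.00284452),
    ("t06_github_pr_browsing", False, 0.00235606),
    ("t04_git_staging", True, 0.00387268),
    ("t02_add_bash_hello", True, 0.00356118),
    ("t01_fix_simple_typo", True, 0.00362362),
]


def summarize(rows):
    """Return the two summary lines: success rate and total cost (hypothetical helper)."""
    total = len(rows)
    passed = sum(1 for _, ok, _ in rows if ok)
    # Guard against an empty run so the division cannot fail.
    rate = 100.0 * passed / total if total else 0.0
    cost = sum(c for _, _, c in rows)
    return (
        f"Success rate: {rate:.2f}% ({passed}/{total})",
        f"Total cost: USD {cost:.2f}",
    )


rate_line, cost_line = summarize(rows)
print(rate_line)  # Success rate: 83.33% (5/6)
print(cost_line)  # Total cost: USD 0.02
```

The per-instance costs sum to about USD 0.0183, which rounds to the USD 0.02 shown in the report.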

Download testing outputs (includes both Haiku and DeepSeek results): Download

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

Trigger by: Pull Request (integration-test label on PR #17)
Commit: 4b1b513
Integration Tests Report (Haiku)
Haiku LLM Test Results:
Success rate: 100.00% (6/6)

Total cost: USD 0.13

| instance_id | success | reason | cost |
| --- | --- | --- | --- |
| t04_git_staging | True | | 0.014 |
| t03_jupyter_write_file | True | | 0.025 |
| t01_fix_simple_typo | True | | 0.026 |
| t06_github_pr_browsing | True | | 0.032 |
| t05_simple_browsing | True | | 0.032 |
| t02_add_bash_hello | True | | 0 |

Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:
Success rate: 100.00% (6/6)

Total cost: USD 0.02

| instance_id | success | reason | cost |
| --- | --- | --- | --- |
| t03_jupyter_write_file | True | | 0.002 |
| t05_simple_browsing | True | | 0.003 |
| t04_git_staging | True | | 0.003 |
| t02_add_bash_hello | True | | 0.004 |
| t01_fix_simple_typo | True | | 0.004 |
| t06_github_pr_browsing | True | | 0.006 |

Download testing outputs (includes both Haiku and DeepSeek results): Download

enyst closed this Nov 29, 2024