Integration tests runs #9

Open
enyst opened this issue Nov 25, 2024 · 4 comments


enyst commented Nov 25, 2024

Summary

This is a placeholder issue for nightly integration test results.


Triggered by: Manual Trigger: test without PR
Commit: 6ddae01
Integration Tests Report (Haiku)
Haiku LLM Test Results:
Success rate: 100.00% (6/6)

| instance_id | success | reason |
| --- | --- | --- |
| t01_fix_simple_typo | True | |
| t02_add_bash_hello | True | |
| t03_jupyter_write_file | True | |
| t05_simple_browsing | True | |
| t04_git_staging | True | |
| t06_github_pr_browsing | True | |
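For readers checking the arithmetic, the success rate in these reports is consistent with the number of passing instances divided by the total, shown to two decimals. A minimal sketch, assuming results are available as a mapping from `instance_id` to a boolean; the function names `success_rate` and `format_summary` are hypothetical and not taken from the test harness:

```python
def success_rate(results):
    """Return (passed, total, percent) for a mapping of instance_id -> bool."""
    total = len(results)
    passed = sum(1 for ok in results.values() if ok)
    # Guard against an empty run so we never divide by zero.
    percent = 100.0 * passed / total if total else 0.0
    return passed, total, percent


def format_summary(results):
    """Format the summary line in the same shape the reports use."""
    passed, total, percent = success_rate(results)
    return f"Success rate: {percent:.2f}% ({passed}/{total})"


# Results copied from the Haiku table above.
haiku = {
    "t01_fix_simple_typo": True,
    "t02_add_bash_hello": True,
    "t03_jupyter_write_file": True,
    "t05_simple_browsing": True,
    "t04_git_staging": True,
    "t06_github_pr_browsing": True,
}

print(format_summary(haiku))
```

With four of six instances passing, the same function yields 66.67%, and with five of six it yields 83.33%.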

Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:
Success rate: 66.67% (4/6)

| instance_id | success | reason |
| --- | --- | --- |
| t05_simple_browsing | True | |
| t03_jupyter_write_file | True | |
| t02_add_bash_hello | True | |
| t01_fix_simple_typo | True | |
| t06_github_pr_browsing | False | The answer is not found in any message. Total messages: 4. |
| t04_git_staging | False | Failed to check for "nothing to commit, working tree clean": On branch initial-commit<br>Untracked files:<br>(use "git add ..." to include in what will be committed)<br>push.log<br>nothing added to commit but untracked files present (use "git add" to track). |
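The `t04_git_staging` failure reads as a substring check on `git status` output that an untracked `push.log` defeats. The actual test code is not shown in this issue, so the following is only a sketch under that assumption; `EXPECTED` and `working_tree_is_clean` are hypothetical names:

```python
# The phrase quoted in the failure reason above.
EXPECTED = "nothing to commit, working tree clean"


def working_tree_is_clean(status_output: str) -> bool:
    """Return True only if the exact 'clean' phrase appears in the output."""
    return EXPECTED in status_output


# The git status text reported for the failing DeepSeek run.
reported = (
    "On branch initial-commit\n"
    "Untracked files:\n"
    '(use "git add ..." to include in what will be committed)\n'
    "push.log\n"
    'nothing added to commit but untracked files present (use "git add" to track).\n'
)

# A clean tree for comparison.
clean = "On branch initial-commit\nnothing to commit, working tree clean\n"

print(working_tree_is_clean(reported))  # False
print(working_tree_is_clean(clean))     # True
```

If the untracked `push.log` is not meant to count against the agent, one option would be to inspect `git status --porcelain --untracked-files=no` instead, though this report does not settle whether that matches the test's intent.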

Download evaluation outputs (includes both Haiku and DeepSeek results): Download


Triggered by: Nightly Scheduled Run
Commit: 6ddae01
Integration Tests Report (Haiku)
Haiku LLM Test Results:
Success rate: 100.00% (6/6)

| instance_id | success | reason |
| --- | --- | --- |
| t03_jupyter_write_file | True | |
| t02_add_bash_hello | True | |
| t01_fix_simple_typo | True | |
| t05_simple_browsing | True | |
| t06_github_pr_browsing | True | |
| t04_git_staging | True | |

Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:
Success rate: 83.33% (5/6)

| instance_id | success | reason |
| --- | --- | --- |
| t03_jupyter_write_file | True | |
| t05_simple_browsing | True | |
| t04_git_staging | True | |
| t02_add_bash_hello | True | |
| t01_fix_simple_typo | True | |
| t06_github_pr_browsing | False | The answer is not found in any message. Total messages: 4. |
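The recurring `t06_github_pr_browsing` reason suggests the test scans the agent's messages for an expected answer and reports the message count when nothing matches. The real check is not shown in this issue; the sketch below only illustrates that reading, and `check_answer`, the sample messages, and the expected string are all hypothetical placeholders:

```python
def check_answer(messages, expected):
    """Return (success, reason); succeed if any message contains `expected`."""
    for message in messages:
        # Case-insensitive substring match; an assumption, not the harness's rule.
        if expected.lower() in message.lower():
            return True, ""
    reason = (
        "The answer is not found in any message. "
        f"Total messages: {len(messages)}."
    )
    return False, reason


# Placeholder conversation with four messages, none containing the answer.
messages = [
    "Opening the pull request page.",
    "The page is still loading.",
    "I could not locate the requested detail.",
    "Finishing the task.",
]

success, reason = check_answer(messages, "hypothetical-expected-answer")
print(success, reason)
```

Under this reading, a run that ends after four messages without ever stating the answer produces exactly the reason text seen in the tables.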

Download evaluation outputs (includes both Haiku and DeepSeek results): Download


Triggered by: Nightly Scheduled Run
Commit: 71fc115
Integration Tests Report (Haiku)
Haiku LLM Test Results:
Success rate: 100.00% (6/6)

| instance_id | success | reason |
| --- | --- | --- |
| t02_add_bash_hello | True | |
| t03_jupyter_write_file | True | |
| t01_fix_simple_typo | True | |
| t05_simple_browsing | True | |
| t06_github_pr_browsing | True | |
| t04_git_staging | True | |

Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:
Success rate: 83.33% (5/6)

| instance_id | success | reason |
| --- | --- | --- |
| t05_simple_browsing | True | |
| t03_jupyter_write_file | True | |
| t04_git_staging | True | |
| t02_add_bash_hello | True | |
| t01_fix_simple_typo | True | |
| t06_github_pr_browsing | False | The answer is not found in any message. Total messages: 4. |

Download evaluation outputs (includes both Haiku and DeepSeek results): Download


Triggered by: Nightly Scheduled Run
Commit: 91135d7
Integration Tests Report (Haiku)
Haiku LLM Test Results:
Success rate: 100.00% (6/6)

| instance_id | success | reason |
| --- | --- | --- |
| t03_jupyter_write_file | True | |
| t02_add_bash_hello | True | |
| t01_fix_simple_typo | True | |
| t05_simple_browsing | True | |
| t04_git_staging | True | |
| t06_github_pr_browsing | True | |

Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:
Success rate: 83.33% (5/6)

| instance_id | success | reason |
| --- | --- | --- |
| t03_jupyter_write_file | True | |
| t05_simple_browsing | True | |
| t04_git_staging | True | |
| t02_add_bash_hello | True | |
| t01_fix_simple_typo | True | |
| t06_github_pr_browsing | False | The answer is not found in any message. Total messages: 4. |

Download evaluation outputs (includes both Haiku and DeepSeek results): Download
