"unexpected EOF" causes Google Cloud workflows to fail #3991

Closed
teor2345 opened this issue Mar 28, 2022 · 7 comments · Fixed by #4198 or #4206
Labels
A-devops Area: Pipelines, CI/CD and Dockerfiles C-bug Category: This is a bug I-integration-fail Continuous integration fails, including build and test failures

Comments

@teor2345
Contributor

teor2345 commented Mar 28, 2022

Motivation

Sometimes Google Cloud CI workflows fail with:

unexpected EOF
Error: Process completed with exit code 1.

https://github.com/ZcashFoundation/zebra/runs/5728525849?check_suite_focus=true

This seems to be happening a few times a day. It might be caused by a Google Cloud change or an infrastructure bug.

Suggested Fix

If the ssh command fails, wait a few seconds, then re-launch the command. Fail if the ssh command has failed more than 5-10 times.

This needs to be fixed in every workflow that follows Google Cloud logs.
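Below is a minimal sketch of the suggested retry logic as a stand-alone Python helper. The function name `run_with_retries`, its defaults (5 attempts, a 5-second pause), and the injectable `sleep` parameter are illustrative assumptions. They are not taken from the actual Zebra workflows.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def run_with_retries(
    cmd: Sequence[str],
    max_attempts: int = 5,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `cmd`, re-launching it after a short pause if it exits non-zero.

    Hypothetical helper: returns the number of attempts that were needed,
    or raises RuntimeError once the command has failed `max_attempts` times.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    last_code: Optional[int] = None
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt  # success: stop retrying

        last_code = result.returncode
        print(f"attempt {attempt}/{max_attempts} failed with exit code {last_code}")

        # Wait a few seconds before re-launching, but not after the final attempt.
        if attempt < max_attempts:
            sleep(delay_seconds)

    raise RuntimeError(
        f"command failed {max_attempts} times; last exit code {last_code}"
    )
```

A workflow step might call it with a placeholder command such as `run_with_retries(["gcloud", "compute", "ssh", "INSTANCE", "--zone", "ZONE", "--command", "docker logs --follow CONTAINER"])`. The instance, zone, and container names here are hypothetical. Because this sketch retries on any non-zero exit code, it cannot tell a dropped ssh connection ("unexpected EOF") apart from a genuine failure reported in the followed logs. A real workflow would need some way to distinguish the two.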

@teor2345 teor2345 added C-bug Category: This is a bug A-devops Area: Pipelines, CI/CD and Dockerfiles S-needs-triage Status: A bug report needs triage P-High 🔥 I-integration-fail Continuous integration fails, including build and test failures labels Mar 28, 2022
@dconnolly
Contributor

This is still happening. I don't think disk size is the issue, because we have 100GB disks.

@teor2345
Contributor Author

I added a suggested fix to this ticket:

If the ssh command fails, wait a few seconds, then re-launch the command. Fail if the ssh command has failed more than 5-10 times.

@teor2345
Contributor Author

This seems to be happening a lot, so we might want to make it a high priority for the next sprint.

@dconnolly
Contributor

Just saw this again on a boring dependency update.

@gustavovalverde
Member

gustavovalverde commented Apr 22, 2022

I've been researching this a bit, and there's no error on the infrastructure side. I found a few scattered Rust reports of similar issues, and some of them turned out to be caused by a problem in the log parser. Maybe this is related to a change I asked @jvff to make, to enable coloring in the terminal (?). I could temporarily disable that feature and see if this still happens.

@teor2345
Contributor Author

@gustavovalverde can we try repeating the ssh command if it fails?

That might be a simple workaround until we can diagnose and fix the underlying issue.

@teor2345
Contributor Author

This is a blocker for #4155, because these errors are failing every full sync test I try to run.

@mpguerra mpguerra removed the S-needs-triage Status: A bug report needs triage label Sep 28, 2022