buildx failed with: ERROR: failed to solve: failed to push ghcr.io/finchsec/kali:latest: failed to copy: io: read/write on closed pipe #761
Comments
This has been happening a bit too often for us since last week. Found a similar issue in |
Does switching to BuildKit 0.10.6 solve the issue in the meantime?
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v2
  with:
    driver-opts: |
      image=moby/buildkit:v0.10.6
Also, is it only happening when pushing to |
Digging into the old Actions logs, I can narrow it down a bit. It first appeared in my GHA runs with moby/buildkit@01c0d67. Here is the push log: moby/buildkit@e8dac6c...01c0d67 |
Thanks for the suggestion. I will give it a try!
We are currently hitting it only at |
I split the action so docker and GitHub are separate, and I'm trying that specific version of BuildKit for GitHub. Let's see if it keeps succeeding for the next 2 or 3 weeks.
Yes. Pushing to Docker is fine. It just randomly fails with ghcr.io. It failed again at midnight (before the change mentioned above). |
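For anyone wanting to try the same split, here is a minimal sketch of what separating the two pushes could look like. It is not the reporter's actual workflow (the Configuration section of the report is empty). The job names, the Docker Hub image name `finchsec/kali`, and the secret names `DOCKERHUB_USERNAME` / `DOCKERHUB_TOKEN` are assumptions for illustration; only the `ghcr.io/finchsec/kali:latest` tag, the nightly cron, and the BuildKit pin come from this thread.

```yaml
# Sketch only: one job per registry, so a ghcr.io failure does not block Docker Hub.
name: nightly-image

on:
  schedule:
    - cron: '0 0 * * *'   # midnight every night, as described in the report

jobs:
  push-dockerhub:          # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      - name: Log in to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}   # assumed secret name
          password: ${{ secrets.DOCKERHUB_TOKEN }}      # assumed secret name
      - name: Build and push
        uses: docker/build-push-action@v3
        with:
          context: .
          push: true
          tags: finchsec/kali:latest                    # assumed Docker Hub image name

  push-ghcr:               # hypothetical job name
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write      # needed to push to ghcr.io with GITHUB_TOKEN
    steps:
      - uses: actions/checkout@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
        with:
          # BuildKit pin suggested earlier in this thread, applied to the GHCR job only
          driver-opts: |
            image=moby/buildkit:v0.10.6
      - name: Log in to GHCR
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push
        uses: docker/build-push-action@v3
        with:
          context: .
          push: true
          tags: ghcr.io/finchsec/kali:latest
```

With this layout the image is built twice, which costs runner time, but a failed GHCR push can be re-run on its own and the pin only affects the registry that shows the error.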
@tomdot-dev If you could also enable debug, that would help:
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v2
  with:
    buildkitd-flags: --debug
https://docs.docker.com/build/ci/github-actions/configure-builder/#buildkit-container-logs
Any idea @kaminipandey? |
@crazy-max I created a test image to reproduce the scenario and have hit it a couple of times already: Raw logs (debug)
Not sure if this explains the root cause, but please let me know if you think we can try something else. Also, I did a couple of runs with |
The same has happened to my deploys every other time since 3.2.0. Restarting failed jobs always helped until today, when we had to revert the changes from prod in order to fix the issue. I think this issue is pretty critical; in the meantime I'll downgrade the moby driver or the action. We push to ghcr and heroku, and heroku caused most of the issues after the 3.2.0 update, since it cannot accept images with provenance and manifests. |
Been facing a similar issue while pushing to |
I can confirm that pipelines previously failing to push to
with:
  driver-opts: |
    image=moby/buildkit:v0.10.6
to my workflow(s). |
@crazy-max: Done. |
see docker/build-push-action#761 Signed-off-by: Artur Troian <[email protected]>
Facing a similar issue when trying to push to gcr.io from |
We're having the same intermittent problem since the 12th of January. It seems to be random which images fail with this error. Re-running the jobs, sometimes a few times, makes it succeed. |
We are having this issue as well:
Here are the full logs... It only happens on ghcr.io, not on Docker Hub, and only seems to affect larger containers. We have several that take 2 to 5 minutes to build that are always fine. The ones that take 10+ minutes to build get the failures. |
Also hitting this problem when pushing to GCP Artifact Registry |
When building several images, some of them fail with an 'io: read/write on closed pipe' error and cause the whole job to fail. The issue seems to be in build-push-action: docker/build-push-action#761
This solves random build/push issues. See: docker/build-push-action#761 (comment) Signed-off-by: Blaine Gardner <[email protected]>
Thanks for all the work on buildkit and this investigation! We're still seeing this issue after we removed the pin. For a while we didn't see an issue, and we don't see the error often, but it pops up every now and then, resulting in a failed push.
After PR merge: runatlantis/atlantis@59bc9c5
Failed run: https://github.com/runatlantis/atlantis/actions/runs/5139979174/jobs/9250992369
Unpinned buildx: https://github.com/runatlantis/atlantis/blob/3468f58d1e1a46c77d6acc053aeda548e8626399/.github/workflows/atlantis-image.yml#L52 so I assume it uses v0.11.6 (current latest).
As a workaround, we may downgrade back to buildkit v0.11.2 to see if this rare failure stops occurring. If that doesn't work, then I suppose we may downgrade further back to v0.10.6 as mentioned above. It might be good to pin this dependency anyway for consistent builds and explicit dependency management. |
I can second the reappearance of the |
Agh. Which release did you start to see the issue again in? I don't think there should be anything that major between our patch releases (though I could be very wrong). I know there are some rare race conditions that I've fixed in containerd/containerd#8379, though I'm slightly surprised that these can be hit in practice. These kinds of issues can be insanely painful to debug, since it's not just buildx or containerd here, but randomness and raciness in the connection to the registry, which can sometimes be caused by a CI issue or a registry issue. |
FYI, I don't know if it is GHCR's service or tooling, but I removed all GHCR posting, just too unreliable with the broken pipe errors that result in failed builds, no such problems with docker.io. |
I'm getting this again also:
From https://github.com/NebraLtd/hm-diag/actions/runs/5222474024/jobs/9428208081?pr=608 But only for ghcr. Not docker hub |
@shawaj this is a different issue - you can see in the logs that we attempt a push 5 times, and then still fail anyways. There's really not a lot buildkit can do in this case, GHCR has just hung up on us at that point - if it's proving to be a huge issue, I'd recommend raising this directly with GitHub. This issue is really only about the |
Same here, maybe this is a lead? (Additionally one of those is multi-arch.) Shall we reopen this issue or create a new one? Edit: It seems to have vanished again, I'll report back / open new issue in case it reappears. |
@credativ-dar feel free to spin this out into a new issue. I'm not 100% sure this is related to the original cause. |
For what it's worth, I'm getting similar errors when using GitHub container caching, so it may be related to GitHub infra when running matrix operations.
|
In the
In both we see this error
|
We're seeing the same issues as well. Trying to pin back. |
I experienced similar issues in a repository of mine using the following setup:
Pinning Is this not an issue with buildx itself? I can understand if this is an issue with GitHub's tooling causing the error, but if buildx was previously able to gracefully handle the error and it now isn't, wouldn't that be some kind of regression in buildx? |
For the new people facing this issue, I was one of the original users getting it. I am not getting this error anymore, but what I did recently see is that my large, multi-arch images were failing to build altogether. The cause was that the runner was out of disk space. GitHub must have changed some things in their image, as I never faced that issue before. Anyway, maybe it's possible that the disk running out of space is causing part of this action to fail again? This was my solution. Add something like this as the first step to your job.
- name: Maximize build space
  uses: easimon/maximize-build-space@v7
  with:
    root-reserve-mb: 30720 # https://github.com/easimon/maximize-build-space#caveats
    remove-dotnet: 'true'
    remove-android: 'true'
    remove-haskell: 'true'
    remove-codeql: 'true'
    remove-docker-images: 'true'
Root reserve starts at about 24 GB, but that was not enough in my case, so I increased it to 30 GB. |
We are having this problem as well: every day a dozen PRs each have half a dozen failing jobs due to the issue discussed here. Surprised that it's been 6 months and no fix other than this workaround has been found. |
If some of you are willing to test with latest stable (not yet promoted):
I'm locking this issue as it's a bit heated and doesn't provide more context with BuildKit logs except one comment #761 (comment) which has been addressed with a fix in containerd: #761 (comment). But feel free to open a new issue with BuildKit debug logs if it still occurs, thanks! |
Troubleshooting
It's not in the short troubleshooting guide.
Behaviour
I build a docker container at midnight every night (cron) and push it to Docker and GitHub repositories.
Steps to reproduce this issue
There aren't really steps on how to reproduce the issue as it works fine, but fails intermittently when on schedule.
Expected behaviour
It should push fine to GitHub
Actual behaviour
It works fine when pushing to the repository. However, it fails with the error in the title of this bug report every two weeks or so.
Configuration
Logs
logs_84.zip
Excerpt: