
Dockerd / buildkit in a infinite loop and burning cpu #1313

Closed
bpaquet opened this issue Jan 5, 2020 · 10 comments · Fixed by #1382

bpaquet (Contributor) commented Jan 5, 2020

Hello,

While trying to use buildkit through docker build, my dockerd daemon seems to go into an infinite loop and the docker build hangs.

Version: Docker version 19.03.5, build 633a0ea838
Processes (ps axu | grep docker):

root      5822 96.6  0.9 2460340 290756 ?      Ssl  18:29  45:16 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
tc0      26084  0.2  0.2 1080756 72472 ?       Sl   18:51   0:03 docker build --build-arg ...

While looking into the trace, I found many occurrences of github.com/docker/docker/vendor/github.com/moby/buildkit/solver.(*cacheManager).filterResults:173
But I'm not a specialist in this kind of trace, so maybe I'm not reading it right.

The trace is here: out.gz (remove the .gz if needed).
The trace was extracted with curl --unix-socket /var/run/docker.sock http://./debug/pprof/trace
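For reference, a fuller capture can be taken from the same pprof endpoints (a minimal sketch, assuming dockerd runs with debug enabled on the default unix socket; the seconds parameter and the goroutine endpoint are standard Go net/http/pprof, not dockerd-specific):

curl --unix-socket /var/run/docker.sock -o trace.out "http://./debug/pprof/trace?seconds=30"         # 30-second execution trace
curl --unix-socket /var/run/docker.sock -o goroutines.txt "http://./debug/pprof/goroutine?debug=2"   # full goroutine stacks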

The CPU usage does not go down when I stop the docker build process (dockerd keeps using 100%).

@tonistiigi I already had an issue around cache and LoadWithParents (#1250, fixed by you, thanks) while building the same Dockerfile.

Let me know if you have any idea about the issue and what I can do to dig further.

tonistiigi (Member) commented:

Do you have a reproducer you can share?

bpaquet (Contributor, Author) commented Jan 8, 2020

Thanks a lot for answering :)
I can easily reproduce it on my env; it's systematic. I cannot share the Dockerfile. It's probably due to the fact that I'm launching multiple stages in parallel (outside of buildkit, using make -j8) with multiple images in --cache-from. Not a standard use, but it should work.
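Roughly, the setup looks like this (a minimal sketch, not the real Makefile; the --target name, image names, cache tags, and build arg are placeholders):

# make -j8 runs several invocations like this concurrently, one per stage
DOCKER_BUILDKIT=1 docker build \
  --target some_stage \
  --build-arg SOME_ARG=value \
  --cache-from registry.example.com/app:cache-some-stage \
  --cache-from registry.example.com/app:cache-base \
  -t app:some_stage .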

I did some tests by recompiling docker.
I confirm the code enters the filterResults function from here and never comes out.

What can I do to provide more information? I added logrus.Errorf("FilterResults: %v %v %v", id, m, ck) at the beginning of filterResults, and I see a loop: the function is called again and again with the same id. I can provide this log.

tonistiigi (Member) commented:

I can't easily see what could cause the id to repeat there, so it must be something in your data. I also can't tell without additional info whether this comes from imported or local cache. Could it be something like a duplicate copy instruction that triggers this?

bpaquet (Contributor, Author) commented Jan 8, 2020

> I can't easily see what could cause the id to repeat there, so it must be something in your data. I also can't tell without additional info whether this comes from imported or local cache. Could it be something like a duplicate copy instruction that triggers this?

I have this:

COPY --from=assets_builder /app/public/assets /app/public/assets
COPY --from=assets_builder /app/public/webpack /app/public/webpack

And yes, the problem seems to occur just after that.
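In case it helps with a reproducer, that part of the Dockerfile has roughly the following shape (a minimal sketch; the base images, the contents of the assets_builder stage, and the make invocation are placeholders, only the two COPY lines are taken from the real file):

FROM node:12 AS assets_builder
WORKDIR /app
COPY . .
RUN make assets    # placeholder for whatever produces /app/public/assets and /app/public/webpack

FROM ruby:2.6
WORKDIR /app
COPY --from=assets_builder /app/public/assets /app/public/assets
COPY --from=assets_builder /app/public/webpack /app/public/webpack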

tonistiigi (Member) commented:

That's not exactly what I had in mind, but it could still be related. Can you try to put together a reproducer based on similar snippets to ease debugging of this issue?

bpaquet (Contributor, Author) commented Jan 9, 2020

I did a lot of tests, and I confirm the problem disappears if I remove the second copy.

I'm still trying to build a simple failing case, but I haven't managed to write one yet :(

tonistiigi (Member) commented Feb 24, 2020

I think this might be the same as #1336. cc @aiordache

tonistiigi (Member) commented:

@bpaquet Can you check whether this is the same as #1336 and whether #1386 fixes your issue? If not, please post a stacktrace or a reproducer so it can be looked into further.

bpaquet (Contributor, Author) commented Mar 3, 2020 via email

tonistiigi (Member) commented:

Fixed with #1413. Please report if you still see anything like this.

thaJeztah added a commit to thaJeztah/docker that referenced this issue Jul 8, 2020
….6.4-15-gdc6afa0f)

full diff: moby/buildkit@a7d7b7f...dc6afa0

- solver: avoid recursive loop on cache-export
    - fixes moby/buildkit#1336 --export-cache option crashes buildkitd on custom frontend
    - fixes moby/buildkit#1313 Dockerd / buildkit in a infinite loop and burning cpu
    - fixes / addresses moby#41044 19.03.9 goroutine stack exceeds 1000000000-byte limit
    - fixes / addresses moby#40993 Multistage docker build fails with unexpected EOF

Signed-off-by: Sebastiaan van Stijn <[email protected]>
docker-jenkins pushed a commit to docker-archive/docker-ce that referenced this issue Jul 9, 2020
….6.4-15-gdc6afa0f)

full diff: moby/buildkit@a7d7b7f...dc6afa0

- solver: avoid recursive loop on cache-export
    - fixes moby/buildkit#1336 --export-cache option crashes buildkitd on custom frontend
    - fixes moby/buildkit#1313 Dockerd / buildkit in a infinite loop and burning cpu
    - fixes / addresses moby/moby#41044 19.03.9 goroutine stack exceeds 1000000000-byte limit
    - fixes / addresses moby/moby#40993 Multistage docker build fails with unexpected EOF

Signed-off-by: Sebastiaan van Stijn <[email protected]>
Upstream-commit: e7c2b106ec7785fcb54b1cf80258a2bea25ed020
Component: engine