
Pushing large images to Dockerhub 504 #920

Closed
dwillist opened this issue Jan 22, 2021 · 8 comments · Fixed by #923

Comments

@dwillist

I'm consistently getting 504 errors when pushing large layers to Dockerhub using this library. It seems like this has to do with the interplay between the following:

  • gzip writers do compression on demand.
  • HTTP requests send whatever data is available after a read.
  • the io.Pipe created here to read the compressed layer and upload it here matches single reads to single writes.

As a result, we send a bunch of small chunked pieces of a PATCH request, which Dockerhub does not like.
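
A minimal sketch of that interplay (not this library's actual code; the function names are made up): a gzip.Writer feeds an io.Pipe in a goroutine, and the pipe's read end is used directly as the body of the PATCH request.

package sketch

import (
	"compress/gzip"
	"io"
	"net/http"
)

// compressedBody streams src through gzip on the fly. Because io.Pipe matches
// each Read to a single Write, every chunk that gzip emits is handed to the
// reader on its own, however small it is.
func compressedBody(src io.Reader) io.ReadCloser {
	pr, pw := io.Pipe()
	go func() {
		zw := gzip.NewWriter(pw)
		_, err := io.Copy(zw, src)
		if err == nil {
			err = zw.Close()
		}
		pw.CloseWithError(err) // CloseWithError(nil) behaves like Close
	}()
	return pr
}

// patchBlob uses the pipe's read end directly as the body of a PATCH request,
// so the client sends whatever small pieces are available after each read.
func patchBlob(url string, layer io.Reader) (*http.Response, error) {
	req, err := http.NewRequest(http.MethodPatch, url, compressedBody(layer))
	if err != nil {
		return nil, err
	}
	return http.DefaultClient.Do(req)
}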

Attached is a program I can use to consistently reproduce this by running go run . <dockerhub-image-tag> <highly-compressible-file>. If I bump up the BufferSize enough, I can get these uploads to consistently pass.

ggcr-bug.tar.gz

I am using enwik8 or enwik9 as my highly compressible file; both are quite large and can be downloaded from http://mattmahoney.net/dc/textdata.

@jonjohnsonjr
Collaborator

Attached is a program I can use to consistently reproduce this.

Thank you for this. It saves me a lot of time. I'm taking a few days' vacation, but let me see if I can reproduce this next week. In the meantime, enabling verbose logging might be helpful:

logs.Debug.SetOutput(os.Stderr)

Though, I recently started omitting binary bodies, so that might not help :)
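
For reference, a minimal sketch of wiring that up, assuming logs here is this library's github.com/google/go-containerregistry/pkg/logs package:

package main

import (
	"os"

	"github.com/google/go-containerregistry/pkg/logs"
)

func main() {
	// Send the library's debug logger to stderr so its HTTP round trips are
	// printed while running the reproduction program.
	logs.Debug.SetOutput(os.Stderr)

	// ... run the push that triggers the 504 here ...
}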

@jonjohnsonjr
Collaborator

This is fascinating. I wasn't able to reproduce at first (internet is too fast, fortunately), but adding a time.Sleep(100 * time.Millisecond) in the Read was sufficient to trigger this.
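
That throttling trick, sketched as a made-up wrapper type rather than an in-place edit of the Read:

package sketch

import (
	"io"
	"time"
)

// slowReader sleeps on every Read to simulate a slow uplink, which is enough
// to drop the upload rate below whatever threshold triggers the 504.
type slowReader struct {
	r io.Reader
}

func (s slowReader) Read(p []byte) (int, error) {
	time.Sleep(100 * time.Millisecond)
	return s.r.Read(p)
}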

Docker Hub will reply with a 504 after 30 seconds, every time. I'll keep poking at it, but I'm guessing there's some threshold where they close the connection after 30 seconds if they haven't received enough data (to prevent something like a Slowloris attack). We can fix this by doing some buffering client-side (which will probably speed everything up a bit), but I'd like to understand more about this before just fixing it.

@jonjohnsonjr
Collaborator

Seems like if you're uploading at less than ~450 KB/s for 30s, the connection will get closed (at least on my machine). We can fix this by implementing a janky readahead thing in our gzip reader.

@jonjohnsonjr
Collaborator

From your example, we seem to be reading only 240 bytes at a time, which seems... bad. If I use a random source, we're reading ~32K bytes at a time.

I'll add some buffering so that gzip can keep writing to a buffer while we wait for the internet.
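
Roughly the idea, sketched (not necessarily what the eventual fix in #923 does, and the buffer size is illustrative): put a bufio.Writer between gzip and the pipe so gzip's tiny writes get batched into large pipe writes.

package sketch

import (
	"bufio"
	"compress/gzip"
	"io"
)

// bufferedCompressedBody batches gzip's many small writes in a bufio.Writer
// before they reach the pipe, so each pipe Write (and therefore each read by
// the HTTP client) carries a large chunk instead of a few hundred bytes.
func bufferedCompressedBody(src io.Reader) io.ReadCloser {
	pr, pw := io.Pipe()
	go func() {
		bw := bufio.NewWriterSize(pw, 1<<20) // 1 MiB of readahead; size is illustrative
		zw := gzip.NewWriter(bw)
		_, err := io.Copy(zw, src)
		if err == nil {
			err = zw.Close()
		}
		if err == nil {
			err = bw.Flush()
		}
		pw.CloseWithError(err)
	}()
	return pr
}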

@jonjohnsonjr
Collaborator

Thank you so much for reporting this -- looks like this was a bottleneck for more than just you!

@dwillist
Author

Hey thanks for looking into this & pushing out a fix so quickly 👍

@jonjohnsonjr
Collaborator

No problem -- this kind of bug keeps me up at night, so I'm pretty motivated to fix it 😄

Can you confirm that the PR does fix your issue?

@dwillist
Author

Yep. #923 fixes it. Many thanks!
