Downloads can be truncated and report an incorrect sha256sum #12010

Closed
b0ri5 opened this issue Aug 26, 2020 · 14 comments
Labels
P2 We'll consider working on this in future. (Assignee optional) team-ExternalDeps External dependency handling, remote repositories, WORKSPACE file. type: bug

Comments

@b0ri5

b0ri5 commented Aug 26, 2020

Description of the problem / feature request:

A CircleCI (continuous integration service) build appears to have downloaded a truncated copy of https://dl.google.com/go/go1.15.linux-amd64.tar.gz, treated the download as complete, computed its sha256sum, and reported that it differed from the expected value, which is the one currently published at https://golang.org/dl/.

Expected sha256sum: 2d75848ac606061efe52a8068d0e647b35ce487a15bb52272c427df485193602
Unexpected sha256sum: 22c0e73b372fba0186c83b1b8d6572946b3d50415a5d1eb2699695dff96dd0ec

I asked internally and someone figured out that
$ head -c 3407856 go1.15.linux-amd64.tar.gz | sha256sum
22c0e73b372fba0186c83b1b8d6572946b3d50415a5d1eb2699695dff96dd0ec

produces the unexpected hash. To find the full internal conversation, search internally for the unexpected hash.
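
The same check can be scripted. This is a minimal sketch, assuming a complete local copy of the archive is available; the helper name `sha256_of_prefix` is hypothetical and is not part of Bazel.

```python
import hashlib


def sha256_of_prefix(path, num_bytes, chunk_size=1 << 16):
    """Return the hex SHA-256 of the first num_bytes bytes of the file at path.

    Hypothetical helper; mirrors `head -c num_bytes path | sha256sum`.
    """
    digest = hashlib.sha256()
    remaining = num_bytes
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:  # the file is shorter than num_bytes
                break
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()
```

If the truncation theory holds, calling `sha256_of_prefix("go1.15.linux-amd64.tar.gz", 3407856)` on the full archive should give the unexpected hash reported above.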

Their recommendation was to audit the error-reporting code in the downloader, since it seems to have considered the request complete before the length promised by the Content-Length header was fulfilled.

My expectation is that, rather than a wrong hash, some other error should be reported if the file could not be downloaded completely.
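
To illustrate the expected behavior, here is a minimal sketch that compares the number of bytes received with the advertised Content-Length before trusting the hash. It is not Bazel's downloader code; `TruncatedDownloadError` and `read_and_hash` are hypothetical names, and `stream` is assumed to be any file-like object with a `read` method.

```python
import hashlib


class TruncatedDownloadError(IOError):
    """Hypothetical error: fewer (or more) bytes arrived than Content-Length promised."""


def read_and_hash(stream, content_length, chunk_size=1 << 16):
    """Read stream to the end and return its hex SHA-256.

    Raises TruncatedDownloadError instead of returning a hash when the byte
    count does not match content_length (None means the length is unknown).
    """
    digest = hashlib.sha256()
    received = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        received += len(chunk)
    if content_length is not None and received != content_length:
        raise TruncatedDownloadError(
            f"received {received} of {content_length} bytes"
        )
    return digest.hexdigest()
```

With a check like this, a short read would surface as a download error rather than as a checksum mismatch.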

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Audit the download code and try to write a unit test for anything that could cause a truncated file to be hashed.

What operating system are you running Bazel on?

Linux? This occurred on a CircleCI machine.

What's the output of bazel info release?

I'm not sure, probably the latest release as I was using bazelisk.

Have you found anything relevant by searching the web?

No, I couldn't find anything else matching the error.

Any other information, logs, or outputs that you want to share?

Attached the logs.
bazel-wrong-sha.txt

@oquenchil oquenchil added team-OSS Issues for the Bazel OSS team: installation, release process, Bazel packaging, website type: bug untriaged labels Aug 27, 2020
@philwo philwo added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Oct 8, 2020
@philwo
Member

philwo commented Oct 8, 2020

Seems like we just hit this on Bazel CI: https://buildkite.com/bazel/bazel-bazel/builds/14172#d7d32a9c-47f3-4d71-87cb-d2ccd872b831

(17:02:08) WARNING: Download from https://mirror.bazel.build/cdn.azul.com/zulu/bin/zulu14.28.21-ca-jdk14.0.1-win_x64.zip failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException Checksum was ec5ad62f12d4cff655ce3b56e16e02773309591c17c9e9ed94cb5cb793973e96 but wanted 9cb078b5026a900d61239c866161f0d9558ec759aa15c5b4c7e905370e868284
(17:02:08) ERROR: An error occurred during the fetch of repository 'openjdk14_windows_archive':
Traceback (most recent call last):
File "/workdir/tools/build_defs/repo/http.bzl", line 111, column 45, in _http_archive_impl
download_info = ctx.download_and_extract(
Error in download_and_extract: java.io.IOException: Error downloading [https://mirror.bazel.build/cdn.azul.com/zulu/bin/zulu14.28.21-ca-jdk14.0.1-win_x64.zip] to /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/ec321eb2cc2d0f8f91b676b6d4c66c29/external/openjdk14_windows_archive/zulu14.28.21-ca-jdk14.0.1-win_x64.zip: Checksum was ec5ad62f12d4cff655ce3b56e16e02773309591c17c9e9ed94cb5cb793973e96 but wanted 9cb078b5026a900d61239c866161f0d9558ec759aa15c5b4c7e905370e868284
(17:02:08) ERROR: /workdir/src/BUILD:743:10: //src:test_repos depends on @openjdk14_windows_archive//:WORKSPACE in repository @openjdk14_windows_archive which failed to fetch. no such package '@openjdk14_windows_archive//': java.io.IOException: Error downloading [https://mirror.bazel.build/cdn.azul.com/zulu/bin/zulu14.28.21-ca-jdk14.0.1-win_x64.zip] to /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/ec321eb2cc2d0f8f91b676b6d4c66c29/external/openjdk14_windows_archive/zulu14.28.21-ca-jdk14.0.1-win_x64.zip: Checksum was ec5ad62f12d4cff655ce3b56e16e02773309591c17c9e9ed94cb5cb793973e96 but wanted 9cb078b5026a900d61239c866161f0d9558ec759aa15c5b4c7e905370e868284

@b0ri5 Do you remember if your log looked the same?

@philwo
Member

philwo commented Oct 8, 2020

I found the log thanks to your hint "search internally for the hash"! 🕵️

That looks like the same issue. Bumping this to P1 - we should log a better error and be smarter when the download unexpectedly fails (retry?).

@philwo philwo added P1 I'll work on this now. (Assignee required) and removed P2 We'll consider working on this in future. (Assignee optional) labels Oct 8, 2020
@philwo philwo self-assigned this Nov 9, 2020
@philwo philwo added P2 We'll consider working on this in future. (Assignee optional) and removed P1 I'll work on this now. (Assignee required) labels Dec 8, 2020
@philwo philwo removed their assignment Dec 8, 2020
@philwo philwo added the team-ExternalDeps External dependency handling, remote repositories, WORKSPACE file. label Dec 8, 2020
@brentleyjones
Contributor

We've started to hit this massively today for https://dl.google.com/go/go1.16.5.linux-amd64.tar.gz.

@brentleyjones
Contributor

Our logs look like this (with a random SHA each time, since it's not completing the download):

(18:13:15) INFO: Repository go_sdk instantiated at:
/build/WORKSPACE:128:23: in <toplevel>
/root/.cache/bazel/_bazel_root/7b7747ec045ae606eb720a1222f56098/external/io_bazel_rules_go/go/private/sdk.bzl:453:28: in go_register_toolchains
/root/.cache/bazel/_bazel_root/7b7747ec045ae606eb720a1222f56098/external/io_bazel_rules_go/go/private/sdk.bzl:129:21: in go_download_sdk
Repository rule _go_download_sdk defined at:
/root/.cache/bazel/_bazel_root/7b7747ec045ae606eb720a1222f56098/external/io_bazel_rules_go/go/private/sdk.bzl:116:35: in <toplevel>
(18:13:15) WARNING: Download from https://dl.google.com/go/go1.16.5.linux-amd64.tar.gz failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException Checksum was aac827b456921bd38dfd02279464a1603be465b4512cea30d6cb669e026a2564 but wanted b12c23023b68de22f74c0524f10b753e7b08b1504cb7e417eccebdd3fae49061
(18:13:15) ERROR: An error occurred during the fetch of repository 'go_sdk':
Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/7b7747ec045ae606eb720a1222f56098/external/io_bazel_rules_go/go/private/sdk.bzl", line 100, column 16, in _go_download_sdk_impl
_remote_sdk(ctx, [url.format(filename) for url in ctx.attr.urls], ctx.attr.strip_prefix, sha256)
File "/root/.cache/bazel/_bazel_root/7b7747ec045ae606eb720a1222f56098/external/io_bazel_rules_go/go/private/sdk.bzl", line 189, column 21, in _remote_sdk
ctx.download(
Error in download: java.io.IOException: Error downloading [https://dl.google.com/go/go1.16.5.linux-amd64.tar.gz] to /root/.cache/bazel/_bazel_root/7b7747ec045ae606eb720a1222f56098/external/go_sdk/go_sdk.tar.gz: Checksum was aac827b456921bd38dfd02279464a1603be465b4512cea30d6cb669e026a2564 but wanted b12c23023b68de22f74c0524f10b753e7b08b1504cb7e417eccebdd3fae49061

@keith
Member

keith commented Aug 30, 2021

What's the retry logic in this case? Seems like for us that is often good enough.

@coeuvre
Member

coeuvre commented Aug 30, 2021

Looking at the code, the downloader retries only if the server supports Accept-Ranges: bytes. The retry logic is here.

The mentioned server does support range requests:

❯ curl --head https://dl.google.com/go/go1.16.5.linux-amd64.tar.gz
HTTP/2 200
accept-ranges: bytes
...
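
For readers unfamiliar with range requests, here is a minimal sketch of the two pieces involved: detecting the `Accept-Ranges: bytes` response header and building a `Range` request header to resume from a byte offset. It is illustrative only and does not reproduce Bazel's retry code; both function names are hypothetical.

```python
def supports_byte_ranges(headers):
    """Return True if a response header mapping advertises Accept-Ranges: bytes.

    headers maps header names to values; names are matched case-insensitively,
    since HTTP/2 responses (as in the curl output above) use lowercase names.
    """
    for name, value in headers.items():
        if name.lower() == "accept-ranges":
            units = [unit.strip().lower() for unit in value.split(",")]
            return "bytes" in units
    return False


def resume_range_header(bytes_received):
    """Build the request header asking for everything after bytes_received."""
    if bytes_received < 0:
        raise ValueError("bytes_received must be non-negative")
    return {"Range": f"bytes={bytes_received}-"}
```

A client that had already received 3407856 bytes would send `Range: bytes=3407856-` on the resumed request and expect a 206 Partial Content response.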

@brentleyjones
Contributor

Hmm, but it's not retrying for us (@keith meant that if we retry the build it will usually work).

@coeuvre
Member

coeuvre commented Sep 1, 2021

Did you hit the same error for other links as well or just this specific one?

@brentleyjones
Contributor

brentleyjones commented Sep 1, 2021

Also zulu11:

WARNING: Download from https://mirror.bazel.build/openjdk/azul-zulu11.37.17-ca-jdk11.0.6/zulu11.37.17-ca-jdk11.0.6-macosx_x64.tar.gz failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException Checksum was c26647ddd42399f514b33d75af2ed61ca0dd71b799d60f8960108eaeb71a0b1c but wanted e1fe56769f32e2aaac95e0a8f86b5a323da5af3a3b4bba73f3086391a6cc056f

Of note: both are Google-hosted.

@coeuvre
Member

coeuvre commented Sep 8, 2021

It's hard to reproduce locally but I can confirm the retry logic is enabled for those links. Not sure why retry didn't work.

What do you think about retrying the entire download in DownloadManager when the Content-Length is not fulfilled? (Instead of reconnecting and resuming from where we left off, we would use a fresh connection.)
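
The proposal can be sketched roughly as follows. This is a simplified illustration, not the DownloadManager implementation: `download_with_full_retry` is a hypothetical name, and `fetch` is an assumed callable that opens a fresh connection on every call and returns the body together with the advertised Content-Length (or None if the server sent none).

```python
def download_with_full_retry(fetch, max_attempts=3):
    """Call fetch() until the body length matches the advertised Content-Length.

    fetch() -> (body_bytes, content_length_or_None). Each call is assumed to
    start the download from scratch on a new connection (no Range resume).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        body, content_length = fetch()
        if content_length is None or len(body) == content_length:
            return body
        # Length mismatch: remember the error and start over.
        last_error = IOError(
            f"attempt {attempt}: received {len(body)} of {content_length} bytes"
        )
    raise last_error
```

Restarting from scratch costs more bandwidth than resuming, which is the trade-off between the two approaches.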

@brentleyjones
Contributor

This has gotten really bad for us: it now happens many times an hour. I'm up for whatever might allow these to not fail a build. I'm wondering if accept-ranges: bytes sometimes doesn't come through. Maybe only in that case we should use a fresh connection? A full retry is fine for us as well (though maybe behind a flag, since it could be costly for some?).

@coeuvre
Member

coeuvre commented Sep 9, 2021

The linked PR will retry on ContentLength mismatch errors. Let's see whether it helps.

@coeuvre
Member

coeuvre commented Sep 10, 2021

Please reopen if that flag doesn't help.

@jkugler

jkugler commented Oct 7, 2021

Will this be back-ported to the 4.x branch?

@philwo philwo removed the team-OSS Issues for the Bazel OSS team: installation, release process, Bazel packaging, website label Nov 29, 2021
7 participants