oci_pull fails spuriously #275

Closed
jamiees2 opened this issue Jun 19, 2023 · 6 comments
@jamiees2
Contributor

jamiees2 commented Jun 19, 2023

When using oci_pull as below:

    oci_pull(
        name = "oci_fluentd",
        # tag = "v1.16.0-debian-1.0",
        digest = "sha256:9c71b09b27421d7fe56dd9afb67138f0719d2a68d06b4ead2b1589222022fd6a",
        image = "public.ecr.aws/docker/library/fluentd",
        platforms = [
            "linux/amd64",
            "linux/arm64/v8",
        ],
    )

I get an error that the repository fetch failed.

INFO: Invocation ID: 38bad561-552e-43a7-84e5-57d92719e522
WARNING: Could not fetch the manifest. Either there was an authentication issue or trying to pull an image with OCI image media types.
Falling back to using `curl`. See https://github.com/bazelbuild/bazel/issues/17829 for the context.
WARNING: Could not fetch the manifest. Either there was an authentication issue or trying to pull an image with OCI image media types.
Falling back to using `curl`. See https://github.com/bazelbuild/bazel/issues/17829 for the context.

Repository rule oci_pull defined at:
  /private/var/tmp/_bazel_jamessigurdarson/51f04610ee38952253ec4b47d86a991d/external/rules_oci/oci/private/pull.bzl:435:27: in <toplevel>
ERROR: An error occurred during the fetch of repository 'oci_fluentd_linux_amd64':
   Traceback (most recent call last):
	File "/private/var/tmp/_bazel_jamessigurdarson/51f04610ee38952253ec4b47d86a991d/external/rules_oci/oci/private/pull.bzl", line 357, column 46, in _oci_pull_impl
		mf, mf_len = downloader.download_manifest(rctx.attr.identifier, "manifest.json")
	File "/private/var/tmp/_bazel_jamessigurdarson/51f04610ee38952253ec4b47d86a991d/external/rules_oci/oci/private/pull.bzl", line 280, column 74, in lambda
		download_manifest = lambda identifier, output: _download_manifest(rctx, state, identifier, output),
	File "/private/var/tmp/_bazel_jamessigurdarson/51f04610ee38952253ec4b47d86a991d/external/rules_oci/oci/private/pull.bzl", line 255, column 18, in _download_manifest
		_download(
	File "/private/var/tmp/_bazel_jamessigurdarson/51f04610ee38952253ec4b47d86a991d/external/rules_oci/oci/private/pull.bzl", line 223, column 23, in _download
		return download_fn(
	File "/private/var/tmp/_bazel_jamessigurdarson/51f04610ee38952253ec4b47d86a991d/external/rules_oci/oci/private/download.bzl", line 107, column 29, in _download
		cache_it = rctx.download(
Error in download: com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException: Checksum was e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 but wanted 9c71b09b27421d7fe56dd9afb67138f0719d2a68d06b4ead2b1589222022fd6a
ERROR: /Users/jamessigurdarson/sandbox/bazel/WORKSPACE:38:20: fetching oci_pull rule //external:oci_fluentd_linux_amd64: Traceback (most recent call last):
	File "/private/var/tmp/_bazel_jamessigurdarson/51f04610ee38952253ec4b47d86a991d/external/rules_oci/oci/private/pull.bzl", line 357, column 46, in _oci_pull_impl
		mf, mf_len = downloader.download_manifest(rctx.attr.identifier, "manifest.json")
	File "/private/var/tmp/_bazel_jamessigurdarson/51f04610ee38952253ec4b47d86a991d/external/rules_oci/oci/private/pull.bzl", line 280, column 74, in lambda
		download_manifest = lambda identifier, output: _download_manifest(rctx, state, identifier, output),
	File "/private/var/tmp/_bazel_jamessigurdarson/51f04610ee38952253ec4b47d86a991d/external/rules_oci/oci/private/pull.bzl", line 255, column 18, in _download_manifest
		_download(
	File "/private/var/tmp/_bazel_jamessigurdarson/51f04610ee38952253ec4b47d86a991d/external/rules_oci/oci/private/pull.bzl", line 223, column 23, in _download
		return download_fn(
	File "/private/var/tmp/_bazel_jamessigurdarson/51f04610ee38952253ec4b47d86a991d/external/rules_oci/oci/private/download.bzl", line 107, column 29, in _download
		cache_it = rctx.download(
Error in download: com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException: Checksum was e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 but wanted 9c71b09b27421d7fe56dd9afb67138f0719d2a68d06b4ead2b1589222022fd6a
ERROR: /private/var/tmp/_bazel_jamessigurdarson/51f04610ee38952253ec4b47d86a991d/external/oci_fluentd/BUILD.bazel:1:6: @oci_fluentd//:oci_fluentd depends on @oci_fluentd_linux_amd64//:oci_fluentd_linux_amd64 in repository @oci_fluentd_linux_amd64 which failed to fetch. no such package '@oci_fluentd_linux_amd64//': com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException: Checksum was e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 but wanted 9c71b09b27421d7fe56dd9afb67138f0719d2a68d06b4ead2b1589222022fd6a
ERROR: Analysis of target '//:oci_fluentd_image' failed; build aborted:
INFO: Elapsed time: 7.489s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (1 packages loaded, 10 targets configured)

This took me a while to figure out, but it turns out that this is partly because public.ecr.aws requires a www-authenticate header, so the download falls back to curl. I added debugging output to dig further into this, and found that the curl command succeeds and writes the manifest to /private/var/tmp/_bazel_jamessigurdarson/51f04610ee38952253ec4b47d86a991d/external/oci_fluentd_linux_amd64/.output/manifest.json.

The output there is:

cat /private/var/tmp/_bazel_jamessigurdarson/51f04610ee38952253ec4b47d86a991d/external/oci_fluentd_linux_amd64/.output/manifest.json
{"manifests":[{"digest":"sha256:2b62dafd30ec556da7ce1bd29373b7d25e3d7e1a577262477c20d2cec7553850","mediaType":"application\/vnd.docker.distribution.manifest.v2+json","platform":{"architecture":"amd64","os":"linux"},"size":2202},{"digest":"sha256:958bf7bdeea456fba34521b2141eae053aef8344223587c67e416468b1b20453","mediaType":"application\/vnd.docker.distribution.manifest.v2+json","platform":{"architecture":"arm","os":"linux","variant":"v5"},"size":2201},{"digest":"sha256:638dfa126c2923a8d813fc7f5bf48652bdd1ce335e5cfec4e4bcdfe9dda0899a","mediaType":"application\/vnd.docker.distribution.manifest.v2+json","platform":{"architecture":"arm","os":"linux","variant":"v7"},"size":2201},{"digest":"sha256:265290eac3a3b0e730962af326206b41548e5cdc4dc62eab0aefc61ad410dda2","mediaType":"application\/vnd.docker.distribution.manifest.v2+json","platform":{"architecture":"arm64","os":"linux","variant":"v8"},"size":2201},{"digest":"sha256:d26fdb3de1a9fba3a67d64f2188514041596558b0fcc4bb297e31c69d4b796b1","mediaType":"application\/vnd.docker.distribution.manifest.v2+json","platform":{"architecture":"386","os":"linux"},"size":2202},{"digest":"sha256:de4a254772d7baa6b6acaf231d087f91036436022712625540ffb1b151c4f22d","mediaType":"application\/vnd.docker.distribution.manifest.v2+json","platform":{"architecture":"ppc64le","os":"linux"},"size":2202},{"digest":"sha256:3ea4a6edd4c873532726c47a17be26ed3ca7027055dfe1a5d4e1b6ab3589c9ba","mediaType":"application\/vnd.docker.distribution.manifest.v2+json","platform":{"architecture":"s390x","os":"linux"},"size":2201}],"mediaType":"application\/vnd.docker.distribution.manifest.list.v2+json","schemaVersion":2}

For completeness, here is the header.txt file as well:

cat /private/var/tmp/_bazel_jamessigurdarson/51f04610ee38952253ec4b47d86a991d/external/oci_fluentd_linux_amd64/.output/header.txt
HTTP/2 200
date: Mon, 19 Jun 2023 14:07:32 GMT
content-type: application/vnd.docker.distribution.manifest.list.v2+json
content-length: 1645
docker-distribution-api-version: registry/2.0
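As a rough illustration of what the manifest list above contains, the sketch below picks the per-platform digest out of it. The `select_digest` helper is hypothetical and is not how rules_oci resolves platforms; the embedded JSON is a trimmed copy of the output above (two of the seven entries, with the `\/` escapes written as plain `/`).

```python
import json

# Trimmed copy of the manifest list from the curl output (amd64 and arm64/v8 only).
MANIFEST_LIST = json.loads("""
{
  "manifests": [
    {"digest": "sha256:2b62dafd30ec556da7ce1bd29373b7d25e3d7e1a577262477c20d2cec7553850",
     "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
     "platform": {"architecture": "amd64", "os": "linux"}, "size": 2202},
    {"digest": "sha256:265290eac3a3b0e730962af326206b41548e5cdc4dc62eab0aefc61ad410dda2",
     "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
     "platform": {"architecture": "arm64", "os": "linux", "variant": "v8"}, "size": 2201}
  ],
  "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
  "schemaVersion": 2
}
""")


def select_digest(manifest_list, platform):
    """Return the digest for an "os/arch[/variant]" string, or None.

    Hypothetical helper: matching is strict, so "linux/arm64" does not
    match an entry whose variant is "v8".
    """
    parts = platform.split("/")
    os_name, arch = parts[0], parts[1]
    variant = parts[2] if len(parts) > 2 else None
    for entry in manifest_list["manifests"]:
        p = entry["platform"]
        if (p["os"] == os_name
                and p["architecture"] == arch
                and p.get("variant") == variant):
            return entry["digest"]
    return None


if __name__ == "__main__":
    for platform in ("linux/amd64", "linux/arm64/v8"):
        print(platform, select_digest(MANIFEST_LIST, platform))
```

The point is only that the file curl wrote is a well-formed manifest list with an entry for each requested platform, so the content on disk is not what is broken.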

The curl command succeeds and fetches the manifest, but the download at

    cache_it = rctx.download(
        url = "file://{}".format(output_path),
        output = output,
        executable = executable,
        allow_fail = allow_fail,
        canonical_id = canonical_id,
        integrity = integrity,
        sha256 = sha256,
    )

gets an empty file: the checksum in the error, e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, is the sha256 of an empty string (echo -n "" | sha256sum).
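The empty-file observation is easy to confirm outside of Bazel. The sketch below is not part of rules_oci; `sha256_of_file` and `looks_like_empty_download` are hypothetical helper names used only to show the check.

```python
import hashlib

# SHA-256 of zero bytes; this is the "Checksum was ..." value in the error above.
EMPTY_SHA256 = hashlib.sha256(b"").hexdigest()


def sha256_of_file(path):
    """Hash a file in chunks so large blobs do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def looks_like_empty_download(reported_checksum):
    """True if a checksum from a Bazel error corresponds to an empty file."""
    return reported_checksum.lower() == EMPTY_SHA256


if __name__ == "__main__":
    print(EMPTY_SHA256)
```

Running `sha256_of_file` on the `.output/manifest.json` that curl wrote and comparing it with the wanted digest would show whether the file on disk is fine and only the copy seen by the download step is empty.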

This looks like a race condition to me: the curl command works as intended, but the download step gets an empty file?

What's more, when I fetch this manually with bazel fetch @oci_fluentd, it succeeds, and from then on bazel build //:oci_fluentd_image succeeds as well.

Do you have any ideas as to what caused this / what we could do to avoid this race condition?

@alexeagle
Collaborator

It is very surprising that bazel fetch gets a different result than bazel build from running the same repository rule.

Note that public.ecr.aws appears in our pull code as having a special-case authentication: https://github.com/bazel-contrib/rules_oci/blob/v1.0.0/oci/private/pull.bzl#L42

I see a related change landed five days ago - does using rules_oci at HEAD make a difference?

@jamiees2
Contributor Author

It doesn't. This was actually using HEAD, since I had forked it to add more debug logging.

@jamiees2
Contributor Author

jamiees2 commented Jun 19, 2023

When I first ran into this, I had just pushed some images to public ECR, and was therefore authenticated in Docker's config.json. That is, I had run

    aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws

I did run docker logout public.ecr.aws after the first time this failed though, and the issue still persisted. However, maybe some bad state got cached?

@jamiees2
Contributor Author

This turned out to be exactly bazelbuild/bazel#17771: we were using --experimental_remote_downloader. Either removing the remote downloader flag or also passing --experimental_remote_downloader_local_fallback fixes this.
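For anyone wanting to check whether their own setup is exposed to this, here is a minimal sketch that scans the text of a .bazelrc for the risky combination. `remote_downloader_without_fallback` is a hypothetical helper, the endpoint in the example is made up, and the parsing is deliberately naive (it ignores imports, config groups, and flags passed on the command line).

```python
DOWNLOADER_FLAG = "--experimental_remote_downloader"
FALLBACK_FLAG = "--experimental_remote_downloader_local_fallback"


def remote_downloader_without_fallback(bazelrc_text):
    """Return True if the remote downloader is set but local fallback is not.

    Naive sketch: treats every whitespace-separated token as a flag and
    strips anything after '#' as a comment.
    """
    has_downloader = False
    has_fallback = False
    for raw in bazelrc_text.splitlines():
        line = raw.split("#", 1)[0]
        for token in line.split():
            name = token.split("=", 1)[0]
            if name == DOWNLOADER_FLAG:
                has_downloader = True
            elif name == FALLBACK_FLAG:
                # A bare flag or "=true" enables it; explicit negatives do not.
                if not token.endswith(("=false", "=0", "=no")):
                    has_fallback = True
    return has_downloader and not has_fallback


if __name__ == "__main__":
    # Hypothetical endpoint, for illustration only.
    example = "build --experimental_remote_downloader=grpc://cache.example.com:9092\n"
    print(remote_downloader_without_fallback(example))
```

A `True` result only means the configuration matches the one described here; it does not prove that a given fetch failure has this cause.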

@jamiees2
Contributor Author

Full credit to @thesayyn for identifying the issue!

@plaird

plaird commented Dec 2, 2023

In case this helps others out there...

We are seeing a variant of this issue. A small number of our users see a failure because the manifest.json for an image is being pulled with the v1 schema, whereas our build is expecting the v2 schema, and so the checksums don't match and the download fails.

Error:

Error in download: java.io.IOException: Error downloading [file:/private/var/tmp/_bazel_mbenioff/67718cf62799242089faf32dd567c106/external/oci_docker_our_image_single/.output/manifest.json] to /private/var/tmp/_bazel_mbenioff/67718cf62799242089faf32dd567c106/external/oci_docker_our_image_single/manifest.json: Checksum was a9d0f83502c3b1813e9e0a179be6e1d122e90a45500af428f75841ba7e9937f4 but wanted 74715ac0845aa96b8470a121defa8f96ecbcb360281c4585aa7e1e6b5c897412

manifest.json v1 schema (got):

{
  "schemaVersion" : 1,
  ...

manifest.json v2 schema (expected):

{
   "schemaVersion": 2,
   ...

The same --experimental_remote_downloader_local_fallback=true workaround works though (so far). Unfortunately, the affected users reported that bazel clean did not solve it.

The workaround was inspired by finding this code here:
pull.bzl: Registry responded with a manifest that has schemaVersion=1.
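To tell this variant apart from the empty-file case, it can help to look at both the schema version and the checksum of the manifest that actually landed on disk. The following is a sketch only; `describe_manifest` is a hypothetical helper, not code from pull.bzl.

```python
import hashlib
import json


def describe_manifest(raw_bytes, wanted_sha256):
    """Summarize a downloaded manifest: schema version, digest, and match.

    raw_bytes must be the exact bytes on disk, since the digest is
    computed over the serialized file rather than the parsed JSON.
    """
    doc = json.loads(raw_bytes)
    actual = hashlib.sha256(raw_bytes).hexdigest()
    return {
        "schema_version": doc.get("schemaVersion"),
        "sha256": actual,
        "matches": actual == wanted_sha256,
    }


if __name__ == "__main__":
    # Stand-in for a schema v1 response; real manifests carry many more fields.
    raw = b'{"schemaVersion": 1}'
    wanted = "74715ac0845aa96b8470a121defa8f96ecbcb360281c4585aa7e1e6b5c897412"
    print(describe_manifest(raw, wanted))
```

A `schema_version` of 1 together with `matches: False` would point at the registry (or whatever sits in front of it) answering with a v1 manifest, rather than at an empty or truncated download.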
