Podman can't load some tars that docker can. #8132

Closed
matejvasek opened this issue Oct 25, 2020 · 9 comments
Labels
locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@matejvasek
Contributor

One tool I use (pack) produces tar files that docker load < a.tar accepts, but podman load < a.tar fails with the message:

  Layer tarfile . used for two different DiffID values
  Layer tarfile . used for two different DiffID values
  open /var/tmp/podman506265764/manifest.json: not a directory
  open /var/tmp/podman506265764/index.json: not a directory
Error: error pulling "": Invalid image localhost/

The problem is that the archive's manifest.json contains "empty layers" (empty strings in Layers):

[{"Config":"174bb363c5c85d471af5cabf71e1d0063bc33a3fea9b2caa7512d489a84535e7.json","Layers":["","","","","","","","","","","","","/fb712908333342fbc1daf8c1f5112318f1601f2ad30fa37facf1a41732410025.tar","/56bedd2e790a74634785a53a0b8c9046616068e8e0c3f75ad8cfcf3b493bffa0.tar","/93db97a8614fa09e787905fd1fb1975652548a5dc48ec14fa9463d17e1fc70a6.tar"],"RepoTags":["pack.local/builder/64736669626b65736878:latest"]}]

The config contains the expected number of diff_ids (15, one per Layers entry).

{"architecture":"","created":"1980-01-01T00:00:01Z","history":[{"created":"1980-01-01T00:00:01Z"},{"created":"1980-01-01T00:00:01Z"},{"created":"1980-01-01T00:00:01Z"},{"created":"1980-01-01T00:00:01Z"},{"created":"1980-01-01T00:00:01Z"},{"created":"1980-01-01T00:00:01Z"},{"created":"1980-01-01T00:00:01Z"},{"created":"1980-01-01T00:00:01Z"},{"created":"1980-01-01T00:00:01Z"},{"created":"1980-01-01T00:00:01Z"},{"created":"1980-01-01T00:00:01Z"},{"created":"1980-01-01T00:00:01Z"},{"created":"1980-01-01T00:00:01Z"},{"created":"1980-01-01T00:00:01Z"},{"created":"1980-01-01T00:00:01Z"}],"os":"","rootfs":{"type":"layers","diff_ids":["sha256:ccf04fbd6e1943f648d1c2980e96038edc02b543c597556098ab2bcaa4fd1fa8","sha256:b7b591e3443f17f9d8272b8d118b6c031ca826deb09d4b44f296ba934f1b6e57","sha256:af163f426cb3cce357311ad6a57b4b6008a7e331a61ed68a3357944dd078c76b","sha256:767c14b968437b3c055bf6034ce4028ac93ecb1c275f84d69a1fd85c595b9242","sha256:278473c074dd9fe9dd2a84f209c38920a921b773f7ebdaf3174ca044ac0d974c","sha256:5f136bfd5d4a38dd0238c962dc313383567ff9142583b7d04e41397e20a49ac6","sha256:25bce695b07044f5620a92c203aebb085cae8664d7c9b136f9f3be37957b6409","sha256:b4b70738b31ec0b1a81a832c3a6f20475a0cd3b890f674cbba7709595d44c304","sha256:26e65d1a42501c2a74eb083bff17e601b41ed91512dcb904aaa7c43af566458b","sha256:d9ccfddda7b7716eaf2a421b21ca2bd1bc05c4cbb5a3fcd03147a0b255f29951","sha256:ae9412feab77b048406805336bcae4b2640bd03b907327e129cdd7385195c01a","sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef","sha256:25bce695b07044f5620a92c203aebb085cae8664d7c9b136f9f3be37957b6409","sha256:ae9412feab77b048406805336bcae4b2640bd03b907327e129cdd7385195c01a","sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef"]},"config":{"Cmd":["/bin/bash"],"Env":["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin","container=oci","CNB_USER_ID=1000","CNB_GROUP_ID=1001","CNB_STACK_ID=com.redhat.faas.stacks.go","HOME=/projects/go-function"],"Labels":{"architecture":"x86_64","build-date":"2020-09-01T19:43:46.041620","com.redhat.build-host":"cpt-1008.osbs.prod.upshift.rdu2.redhat.com","com.redhat.component":"ubi8-container","com.redhat.license_terms":"https://www.redhat.com/en/about/red-hat-end-user-license-agreements#UBI","description":"The Universal Base Image is designed and engineered to be the base layer for all of your containerized applications, middleware and utilities. This base image is freely redistributable, but Red Hat only supports Red Hat technologies through subscriptions for Red Hat products. This image is maintained by Red Hat and updated regularly.","distribution-scope":"public","io.buildpacks.builder.metadata":"{\"description\":\"\",\"buildpacks\":[{\"id\":\"com.redhat.faas.go\",\"version\":\"0.0.1\"}],\"stack\":{\"runImage\":{\"image\":\"quay.io/boson/faas-stack-run:go-v0.4.0\",\"mirrors\":null}},\"lifecycle\":{\"version\":\"0.9.1\",\"api\":{\"buildpack\":\"0.2\",\"platform\":\"0.3\"}},\"createdBy\":{\"name\":\"Pack CLI\",\"version\":\"0.0.0\"}}","io.buildpacks.buildpack.layers":"{\"com.redhat.faas.go\":{\"0.0.1\":{\"api\":\"0.2\",\"stacks\":[{\"id\":\"com.redhat.faas.stacks.go\"}],\"layerDiffID\":\"sha256:26e65d1a42501c2a74eb083bff17e601b41ed91512dcb904aaa7c43af566458b\"}}}","io.buildpacks.buildpack.order":"[{\"group\":[{\"id\":\"com.redhat.faas.go\"}]}]","io.buildpacks.stack.id":"com.redhat.faas.stacks.go","io.buildpacks.stack.mixins":"null","io.k8s.description":"The Universal Base Image is designed and engineered to be the base layer for all of your containerized applications, middleware and utilities. This base image is freely redistributable, but Red Hat only supports Red Hat technologies through subscriptions for Red Hat products. This image is maintained by Red Hat and updated regularly.","io.k8s.display-name":"Red Hat Universal Base Image 8","io.openshift.expose-services":"","io.openshift.tags":"base rhel8","maintainer":"Red Hat, Inc.","name":"ubi8","release":"347","summary":"Provides the latest release of Red Hat Universal Base Image 8.","url":"https://access.redhat.com/containers/#/registry.access.redhat.com/ubi8/images/8.2-347","vcs-ref":"663db861f0ff7a9c526c1c169a62c14c01a32dcc","vcs-type":"git","vendor":"Red Hat, Inc.","version":"8.2"},"User":"cnb","WorkingDir":"/layers"}}

The error seems to come from validation in github.com/containers/image/v5/docker/internal/tarfile/src.go
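The kind of consistency check that trips here can be illustrated with a minimal Python sketch (not the Go code in src.go). It pairs each `Layers` entry with the `diff_ids` entry at the same index and reports any normalized path that is claimed by more than one DiffID. The helper name `find_layer_conflicts` is hypothetical, and the assumption that paths are normalized the way `posixpath.normpath` does it (turning `""` into `"."`) is an inference from the `Layer tarfile .` wording of the error, not something confirmed from the c/image source.

```python
import json
import posixpath


def find_layer_conflicts(manifest_json: str, config_json: str) -> dict[str, set[str]]:
    """Return normalized layer paths that are paired with more than one DiffID.

    Hypothetical illustration of the check; not the containers/image code.
    Only the first image in the manifest list is examined.
    """
    layers = json.loads(manifest_json)[0]["Layers"]  # manifest.json is a JSON list
    diff_ids = json.loads(config_json)["rootfs"]["diff_ids"]
    if len(layers) != len(diff_ids):
        raise ValueError(f"{len(layers)} layer paths but {len(diff_ids)} diff_ids")

    by_path: dict[str, set[str]] = {}
    for path, diff_id in zip(layers, diff_ids):
        # Assumption: paths are normalized like posixpath.normpath, which maps "" to ".".
        by_path.setdefault(posixpath.normpath(path), set()).add(diff_id)
    return {path: ids for path, ids in by_path.items() if len(ids) > 1}
```

Applied to the manifest and config above, the twelve empty entries would all collapse to `"."` with twelve different DiffIDs, which this sketch reports as a single conflicting path.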

There is another odd thing about the archive: some entries in it seem to have absolute paths, which may cause problems for Linux archive utilities, but I don't think that is what causes trouble for Podman.
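Which entries are stored with absolute paths can be listed with a short Python sketch using the standard `tarfile` module; `absolute_members` is a hypothetical helper name, and `a.tar` is the archive from the report. For comparison, GNU tar strips the leading `/` from member names by default when extracting.

```python
import tarfile


def absolute_members(archive_path: str) -> list[str]:
    """Return the names of tar members stored with a leading "/"."""
    with tarfile.open(archive_path) as tar:
        # getnames() returns names as stored in the archive, without stripping "/".
        return [name for name in tar.getnames() if name.startswith("/")]
```

For example, `absolute_members("a.tar")` would be expected to include the three `/….tar` layer paths listed in the manifest, if they are stored exactly as referenced.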

I am not sure whether this is a bug or whether Docker is just really permissive.

@matejvasek
Contributor Author

@zhangguanzhang
Collaborator

@mtrmac PTAL

@mtrmac
Collaborator

mtrmac commented Oct 26, 2020

At least reading https://github.com/moby/moby/blob/master/image/spec/v1.2.md , I can’t see anything to suggest that "" is a valid layer path.


The bottom layer, ccf04fbd6e1943f648d1c2980e96038edc02b543c597556098ab2bcaa4fd1fa8, comes from https://catalog.redhat.com/software/containers/ubi8/5c647760bed8bd28d0e38f9f?tag=8.2-347&container-tabs=gti, and inspecting it shows that this layer contains 73 MB of compressed data.

It’s just not valid to drop 73 MB of data from an image and expect it to load.

And, actually, Docker doesn't load it either:

$ docker load < …/api_load.tar799613769 
ccf04fbd6e19: Loading layer [>                                                  ]      0 B/4.096 kB
read /var/lib/docker/tmp/docker-import-887967241: is a directory

Apparently someone figured out that the Docker implementation mostly ignores the file paths if the layers are already present on the local machine, so it is possible to create archives that only contain the top layers if the destination already contains the parent layers. I guess congratulations for being clever, but that’s never been valid in the format or supported in any sense, as the unhelpful error message above suggests.

The c/image implementation works differently, and to generate a consistent internal representation, it needs to compute the sizes of all layers, and that’s impossible if most layers are just plain missing.
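A simplified Python illustration of that point: to produce a size for every layer, each path listed in `manifest.json` has to resolve to a regular file in the archive, so an empty path (normalized to `"."`) or an absent file leaves nothing to measure. This is not how c/image is structured internally; `layer_sizes`, `_key`, and the rule that `/x.tar`, `./x.tar` and `x.tar` name the same member are assumptions made for the sketch.

```python
import json
import posixpath
import tarfile


def _key(name: str) -> str:
    # Assumption: "/x.tar", "./x.tar" and "x.tar" refer to the same member.
    return posixpath.normpath(name).lstrip("/")


def layer_sizes(archive_path: str) -> dict[str, int]:
    """Map each Layers path of the first manifest entry to its tar member size.

    Simplified illustration, not the containers/image implementation: a size
    exists only if the path resolves to a regular file inside the archive.
    """
    with tarfile.open(archive_path) as tar:
        members = {_key(m.name): m for m in tar.getmembers()}
        manifest_member = members.get("manifest.json")
        if manifest_member is None or not manifest_member.isfile():
            raise ValueError("archive has no manifest.json")
        manifest = json.load(tar.extractfile(manifest_member))

    sizes: dict[str, int] = {}
    for path in manifest[0]["Layers"]:
        # "" normalizes to ".", which is never a regular layer file.
        member = members.get(_key(path))
        if member is None or not member.isfile():
            raise ValueError(f"layer {path!r} is missing or not a regular file")
        sizes[path] = member.size
    return sizes
```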


A “simple” fix is to use full docker save-formatted archives instead of this stripped-down invalid variant.

But I’d strongly recommend using an actual registry (even if it runs only briefly, in a container, with a volume mounted as the backing storage: once to copy the image into the registry and once to copy it out). A registry provides automatic, fully supported deduplication, and copying images to and from a registry does not create multiple on-disk copies of the uncompressed data just to make a copy. It’s a much more efficient way to transfer images.

@matejvasek
Contributor Author

matejvasek commented Oct 26, 2020

@mtrmac thanks for looking into this. It looks like the pack CLI uses this hack format (it pulls the base image in advance, so the layers are always present).
I created an issue for that in their repo: buildpacks/pack#925.

@vlk-charles

To make Podman accept this kind of delta archive with missing layers, the manifest.json needs to reference dummy files. These files need to be present in the archive, but they can be empty. Also, multiple layers cannot reference the same file. Podman does not import layers that are already in the repository, but it still requires them to be in the archive.

Get manifest.json out of the archive:

# tar -xf a.tar manifest.json

Edit Layers in the file to look like this:

[{"Config":"174bb363c5c85d471af5cabf71e1d0063bc33a3fea9b2caa7512d489a84535e7.json","Layers":["0","1","2","3","4","5","6","7","8","9","10","11","/fb712908333342fbc1daf8c1f5112318f1601f2ad30fa37facf1a41732410025.tar","/56bedd2e790a74634785a53a0b8c9046616068e8e0c3f75ad8cfcf3b493bffa0.tar","/93db97a8614fa09e787905fd1fb1975652548a5dc48ec14fa9463d17e1fc70a6.tar"],"RepoTags":["pack.local/builder/64736669626b65736878:latest"]}]

Then replace manifest.json in the archive with the edited copy and add the empty dummy files:

# tar --delete manifest.json -f a.tar
# touch `seq 0 11`
# tar -rf a.tar manifest.json `seq 0 11`
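The same manual steps can be scripted. This is a minimal Python sketch under the same caveat (it relies on unsupported behavior); it writes a new archive instead of deleting from the original, because Python's `tarfile` module cannot remove members in place. The function name `patch_archive`, the `empty-layer-N` naming scheme, and the assumption that the manifest is stored as a top-level `manifest.json` member are all hypothetical choices for illustration.

```python
import io
import json
import tarfile


def patch_archive(src: str, dst: str) -> list[str]:
    """Copy a docker-archive tar, replacing each empty Layers entry with a dummy file.

    Every empty entry gets its own file, since two layers must not reference
    the same file. Returns the names of the dummy files that were added.
    """
    with tarfile.open(src) as tin, tarfile.open(dst, "w") as tout:
        manifest = json.load(tin.extractfile("manifest.json"))  # assumed top-level
        existing = set(tin.getnames())
        dummies: list[str] = []
        for image in manifest:
            for i, layer in enumerate(image["Layers"]):
                if layer == "":
                    name = f"empty-layer-{len(dummies)}"  # hypothetical naming scheme
                    if name in existing:
                        raise ValueError(f"{name} already exists in {src}")
                    image["Layers"][i] = name
                    dummies.append(name)

        # Copy every member except the old manifest.json.
        for member in tin.getmembers():
            if member.name == "manifest.json":
                continue
            data = tin.extractfile(member) if member.isfile() else None
            tout.addfile(member, data)

        # Append the edited manifest and the empty placeholder files.
        payload = json.dumps(manifest).encode()
        info = tarfile.TarInfo("manifest.json")
        info.size = len(payload)
        tout.addfile(info, io.BytesIO(payload))
        for name in dummies:
            tout.addfile(tarfile.TarInfo(name))  # size defaults to 0
    return dummies
```

For the archive in this issue, `patch_archive("a.tar", "patched.tar")` would be expected to add twelve empty placeholder files, matching the twelve empty Layers entries.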

Note that this only concerns the default docker-archive format. For some reason, the oci-archive format is more forgiving: missing files in the blobs/sha256 directory are OK if they don't need to be imported, even if they are referenced by the manifest (which, in that format, is itself one of the files in blobs/sha256).

Tested on Podman 3.3.0.

@matejvasek
Contributor Author

@vlk-charles thanks a lot! I'll try that tomorrow.

@mtrmac
Collaborator

mtrmac commented Sep 16, 2021

> To make Podman accept this kind of delta archive with missing layers, the manifest.json needs to reference dummy files. These files need to be present in the archive but they can be empty.

Let me be very clear that this is not a maintained behavior. If it works for you, great; if it breaks for any reason at all (work to enable a new feature, a bug, a refactoring for purely aesthetic reasons), I’m not promising to do anything to make it work again, possibly including not accepting PRs that fix it unless they make the rest of the code better regardless of this use case.

My recommendation continues to be to use a registry, which automatically provides cross-image deduplication, parallel operation, and higher performance in general.

@vlk-charles

This definitely is unsupported behavior. I could have stressed that.

If a PR were to change the behavior to accept archives with missing layers and issue a warning in such cases (or an error if the layers are not already present on the local machine), would that be acceptable? I'm not saying I will implement it, just curious.

@mtrmac
Collaborator

mtrmac commented Sep 17, 2021

I think you’ll find that it’s not easily possible: we didn’t make any extra effort to refuse such archives; they are refused because the current implementation approach (generating an in-memory schema2 manifest) requires knowing the sizes of layers, and that requires reading the layers.

This is kind of what I was hinting at – a PR that would add special carve-outs for the generic schema2 formats to make this invalid format work is IMHO not worth it. [It would conceptually be interesting to rework this so that the docker-archive manifest format is natively supported in the core codebase, without the schema2 intermediate steps, and that might avoid the need to know the size; but it’s also quite a bit of work and churn, even just reviewing the PRs.]


To clarify a bit, I am interested in the use case. There might well be some improvement that we should do; e.g. containers/skopeo#1440 discusses some non-registry format. Right now it doesn’t seem urgent but that might be just me not knowing something.

But I’m very skeptical that docker-archive: is the right solution, almost regardless of the question. Especially if we are talking about a c/image / Podman-specific version of docker-archive:, i.e. interoperability with Docker is no longer the core requirement, using that format just doesn’t make any sense to me. Either a short-term registry or dir: or perhaps oci: is almost certainly better right now. And if we had to introduce a Podman-specific file format, it should certainly allow reading (a subset of) files in arbitrary order — at least like .zip if nothing better.

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 21, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023
4 participants