Local images ID does not match registry manifest digest #1662

Closed
deitch opened this issue Apr 21, 2016 · 33 comments

@deitch

deitch commented Apr 21, 2016

Despite having read many of the specs (in flux), I cannot get a clear picture.

If I run docker images, I get an image ID for each image. I can get a more detailed result by doing docker inspect <repo>:<tag>. E.g. I look at deitch/mysql-backup:0.7.0 and I get "Id": "sha256:c37d6f99d22771d16b09ec67d29e0f7327b018d4793415acb79d954967249f71".

If I do docker pull deitch/mysql-backup:0.7.0, or if I just use the API and GET https://registry.hub.docker.com/v2/deitch/mysql-backup/manifests/0.7.0 and look at the Docker-Content-Digest header, I get sha256:8a069edacec55b930b983a7c740b9802696cdc70acadfa249007950475f033ff.

  1. What is the relationship between the sha256 hash provided by docker images or docker inspect on the one hand, and the one provided by docker pull or the Docker-Content-Digest header on the other?
  2. How can I use this to determine if a local image is identical to the one in the registry? Without ripping deeply into the docker engine code, what algorithm can I follow to check?
@dcowden

dcowden commented Apr 21, 2016

I am also interested in this answer. It might also help to understand my use case.

Given a local image that was pulled using a tag, I would like to know whether it is up to date with the remote registry version (and more specifically, with a V2 registry). I.e., if I pull, will I get a new image or not?

I know I can get the sha256 digest by fetching /v2/myapp/manifests/mytag

and I see that when I pull that image, docker displays the hash after the pull.

I also ran across a discussion (sorry, can't find it now) in which it was said that it is intentional that the digest is not stored alongside the image when a tag is pulled.

Because it appears that the docker image ID is not the same as the digest, and because it appears to be intentional that the digest is NOT stored alongside the image (i.e., you cannot run docker inspect and find the digest somewhere), it appears that it is currently not possible to determine whether a pull will result in an updated image.

I'm interested academically in knowing how docker computes the image ID, but my main need is to understand how to know whether a pull will download a new image, without actually pulling.

One naive solution would be to capture stdout when I pull the first time and use the digest from there. The problem is that I'm not using docker directly; I'm using K8S, so I don't have access to stdout when the pull runs. I can get the image ID, but of course that's useless for comparison with a remote registry digest.

I found this answer, but it only works with a v1 registry:

http://stackoverflow.com/questions/26423515/how-to-automatically-update-your-docker-containers-if-base-images-are-updated

@dmcgowan
Collaborator

I will try and answer this question as best as I can and hopefully can get a better understanding of where we are missing documentation. Most of these answers only apply in docker 1.10+ and docker registry 2.3+ since those are the releases in which we made the image identifier content addressable and introduced schema 2 of the manifest used by the registry. When using older versions of docker or the registry the answer may require pulling an image to get the identifier used within docker.

What is the relationship between the sha256 hash provided by docker images or docker inspect on the one hand, and the one provided by docker pull or the Docker-Content-Digest header on the other?

The digest used for docker pull is the digest of the image manifest, which is stored in a registry. This digest is considered the root of a hash chain, since the manifest itself contains the hashes of the content that will be downloaded and imported into docker. See the schema 2 spec for a description of this manifest: https://docs.docker.com/registry/spec/manifest-v2-2/. The image id used within docker can be found in this manifest as config.digest. This config is the image configuration that will be used within docker. So you could say the manifest is the envelope and the image is what is inside.

The manifest digest will always be different from the image id, BUT any given manifest should always produce the same image id. Because it is a hash chain, the reverse is not guaranteed: the manifest digest may not always be the same for a given image id. In most cases the same digest is produced, but that is best effort, not a guarantee. The possible difference in manifest digest arises because we do not store gzipped blobs locally, and exporting layers may produce a different digest even though the uncompressed content remains the same. The image id itself verifies that the uncompressed content is the same; this is what we mean when we say the image id is now a content-addressable identifier.
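To make the envelope idea concrete, here is a minimal sketch of the hash chain using toy data. The config and manifest contents are made up for illustration (a real image configuration carries many more fields), and json.dumps is only a stand-in for whatever exact bytes docker and the registry actually hash.

```python
import hashlib
import json


def sha256_digest(data: bytes) -> str:
    """Return a digest string in the 'sha256:<hex>' form used by registries."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Toy image configuration (assumption: real configs contain far more fields).
config_bytes = json.dumps({"architecture": "amd64", "os": "linux"}).encode()
image_id = sha256_digest(config_bytes)

# Toy schema 2 manifest that references the config by its digest.
manifest = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
    "config": {
        "mediaType": "application/vnd.docker.container.image.v1+json",
        "size": len(config_bytes),
        "digest": image_id,
    },
    "layers": [],
}
manifest_bytes = json.dumps(manifest).encode()
manifest_digest = sha256_digest(manifest_bytes)

print("image id       :", image_id)
print("manifest digest:", manifest_digest)
```

The two values differ because they hash different documents: the image id covers the config, while the manifest digest covers the envelope that points at it.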

How can I use this to determine if a local image is identical to the one in the registry? Without ripping deeply into the docker engine code, what algorithm can I follow to check?

The best way to determine this is to compare the image id within a manifest to a local image id. This does require pulling and parsing the manifest.
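As a rough sketch of that comparison, assuming the manifest JSON has already been fetched and the local id has been read from docker inspect: the function names below are hypothetical helpers, not part of any docker tooling.

```python
import json
from typing import Optional


def image_id_from_manifest(manifest_json: str) -> Optional[str]:
    """Return config.digest from a schema 2 manifest, or None otherwise."""
    manifest = json.loads(manifest_json)
    if manifest.get("schemaVersion") != 2:
        return None  # schema 1 manifests do not carry the image id directly
    config = manifest.get("config") or {}
    return config.get("digest")


def local_image_matches(manifest_json: str, local_image_id: str) -> Optional[bool]:
    """True or False when comparable; None when the manifest has no config.digest."""
    remote_id = image_id_from_manifest(manifest_json)
    if remote_id is None:
        return None
    return remote_id == local_image_id


# Illustrative manifest only: the digest is the local id quoted earlier in
# this thread, placed into a made-up schema 2 manifest.
local_id = "sha256:c37d6f99d22771d16b09ec67d29e0f7327b018d4793415acb79d954967249f71"
sample_manifest = json.dumps({"schemaVersion": 2, "config": {"digest": local_id}})
print(local_image_matches(sample_manifest, local_id))
```

Returning None instead of False for schema 1 manifests keeps "cannot tell" separate from "definitely different".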

I also ran across a discussion (sorry, can't find it now) in which it was said that it is intentional that the digest is not stored alongside the image when a tag is pulled.

The reason we do not store the manifest identifier is that we do not store the manifest. It does not make sense to store the manifest since we are not storing the compressed blobs it references. We have taken measures to try to ensure the exact manifest is pushed (cross-repository push, caching compressed blob identifiers associated with each layer). We do store the manifest hash when pulling by digest, but this is only a read-only value used to reference the correct image id in the future without pulling; it is not exportable, since you cannot push by digest.

If you have upgraded from an older version of the registry or docker client and do not find these schema 2 manifests to be present in your registry, then repushing images should produce the new manifest. Also a disclaimer that the schema 2 manifests are currently not enabled on the public hub registry.

@dcowden

dcowden commented Apr 21, 2016

Thanks for the detailed answer! But I'm still a bit confused:

The image id used within docker can be found in this manifest as config.digest

I was unable to find this field when I looked at a manifest retrieved from a v2 API. I guess this means I'm just running an old registry? I'm using the registry:v2 image to run my registry.

Essentially, what I did in my environment was look at the docker image id, then download the manifest for that same image from my registry and search for the image id anywhere in that file. It was nowhere in there. You are saying it should be, right?

@dmcgowan
Collaborator

@dcowden if the image was pushed with 1.10+ onto a registry 2.3+, then the image id should be in there. If it is a schema 2 manifest (see the schemaVersion key) and the config.digest differs, try pulling the image and seeing whether the image id of the pulled image matches. If it does not, try re-pushing and you should get a schema 2 manifest referencing the image id.

If you are still seeing issues, it would be helpful to see the output of the docker commands as well as the manifest in question.

@dcowden

dcowden commented Apr 21, 2016

Yep, that's my problem. I checked, and I still have a version 1 schema. I'll upgrade and repush.

So I think the super-simple answer to my question is:

IF you are using a 2.3+ registry and docker 1.10+, then the manifest contains config.digest, which is used as the docker image id.
IF NOT, then the docker image ID is something else, and will not match anything in the manifest.

@dmcgowan
Collaborator

@dcowden I think that is a fair statement. A schema1 manifest should always produce the same image id, but defining the steps to produce it directly from the manifest is not straightforward.

@dcowden

dcowden commented Apr 21, 2016

Awesome! Thanks!

@deitch
Author

deitch commented Apr 22, 2016

@dmcgowan thanks for the detailed answer.

Also a disclaimer that the schema 2 manifests are currently not enabled on the public hub registry.

Well, that explains why I keep getting "schemaVersion":1 every time I pull an image manifest from https://hub.docker.com ! And I just got a marketing email from Docker that the public hub will be upgraded 1st June.

So to paraphrase what @dcowden and you said together:

  • If (registry version >= 2.3) && (image pushed by docker version >= 1.10), then manifest will have "schemaVersion":2 and field config.digest. The value of config.digest equals the ID returned by docker images.
  • If (registry version <2.3) || (image pushed by docker version < 1.10), then ????

There must be some logic that recreates it. After all, if I do docker pull on an image, it does some type of hash checking against the registry and doesn't pull if I am completely up to date.

It is always possible to dig into the docker engine source code, but I really don't want to go down that path.

@deitch
Author

deitch commented Apr 22, 2016

To add to it:

When I POST a new image (followed by the PUTs for the actual image blob), is there anything in the response that will be the image ID? Or do I need to do a new HEAD or GET for the manifest to get it?

@RichardScothern

@deitch you will have to HEAD or GET on the manifest with the correct Accept header.

@deitch
Author

deitch commented Jun 16, 2016

@RichardScothern so after a POST, you need to do a HEAD or GET on the manifest, which should match what was already available locally.

Would it make sense for me to rename this issue, "Documentation missing explaining how to reconcile local image ID with registry image ID"? At heart, it sounds like this works, it just needs some clear documentation?

@deitch
Author

deitch commented Jun 16, 2016

@dmcgowan :

Also a disclaimer that the schema 2 manifests are currently not enabled on the public hub registry.

I thought I remembered an announcement about the public docker registry moving to schemaVersion: 2 as of June 1st, but I just did some HEAD on the latest for swarm, registry, ubuntu and a few other official library versions, all are still showing schemaVersion:1.

@dmcgowan
Collaborator

@deitch schema 2 is now enabled. The images you mentioned have not been repushed since then and therefore have not had a schema 2 manifest pushed yet. We do not convert schema 1 manifests to schema 2 on the registry.

@deitch
Author

deitch commented Jun 16, 2016

Got it. Is there a library image that is pushed that I can test against?

@RichardScothern

Hi @deitch, sorry, I misled you there. When a manifest is PUT, the response includes a Location header containing a URL for the uploaded content.

@dmcgowan
Collaborator

@deitch try the registry release candidate, pushed just a few days ago: registry:2.5.0-rc.1

@deitch
Author

deitch commented Jun 17, 2016

@dmcgowan nope. Response to GET /v2/library/registry/manifests/2.5.0-rc.1 performed 1 minute ago

{
  "schemaVersion": 1,
  "name": "library/registry",
  "tag": "2.5.0-rc.1",

@deitch
Author

deitch commented Jun 17, 2016

I guess it is possible someone pushed it using an older version of docker?

@deitch
Author

deitch commented Jun 17, 2016

@RichardScothern does it matter? I would still need to do a HEAD or GET on that location, right?

@dliappis

I also confirm that trying to GET /v2/library/registry/manifests/2.5.0-rc.1 still shows schemaVersion: 1.

In fact, I also tested pushing a newly built image to a private registry:2.4 instance I am running, and that also shows schemaVersion: 1, so I am not sure what the process is for getting version 2 schemas.

docker-engine version used:

$ docker --version
Docker version 1.11.2, build b9f10c9

registry tested:

$ docker images registry:2.4
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
registry            2.4                 5d322e774cf2        13 days ago         171.5 MB

@RichardScothern

@deitch : when you put a manifest, the headers will contain the canonical digest in the registry.

If you are using curl to inspect the manifest, you will need this header (Accept: application/vnd.docker.distribution.manifest.v2+json) to prevent the registry from serving you a schema 1 manifest.
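For anyone scripting this rather than using curl, the same request can be sketched with Python's standard library. The registry host below is a placeholder, the helper name is hypothetical, and the bearer token handling is an assumption about a typical token-auth setup; only the Accept media type comes from this thread.

```python
import urllib.request

SCHEMA2_MEDIA_TYPE = "application/vnd.docker.distribution.manifest.v2+json"


def build_manifest_request(registry: str, repository: str, reference: str,
                           token: str = "") -> urllib.request.Request:
    """Build (but do not send) a GET for a manifest that asks for schema 2."""
    url = f"{registry}/v2/{repository}/manifests/{reference}"
    headers = {"Accept": SCHEMA2_MEDIA_TYPE}
    if token:
        # Assumption: the registry uses bearer-token authentication.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers, method="GET")


# registry.example.com is a placeholder host, not a real registry.
req = build_manifest_request("https://registry.example.com",
                             "library/registry", "2.5.0-rc.1")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Sending it would be a matter of passing req to urllib.request.urlopen and reading the JSON body; the request is only constructed here so the sketch runs without network access.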

@deitch
Author

deitch commented Jun 28, 2016

Oho! I will try that right now.

@deitch
Author

deitch commented Jun 28, 2016

OK, I can confirm that using the header correctly works... but there is a bug. The following works:

Accept: application/vnd.docker.distribution.manifest.v2+json

The following does not:

Accept: application/vnd.docker.distribution.manifest.v2+json, text/plain

According to the RFC (check 2616, although 7231 is more up to date, both say the same thing), multiple options are valid in the Accept header, separated by commas. Relative preference among them is expressed with optional q-values.

Docker registry is more than welcome to ignore the second, third, fourth and any other, they can even be invalid, but the above is a valid header that registry does not recognize and so falls back to v1.
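To illustrate what tolerant handling of a multi-value Accept header could look like, here is a simplified sketch. It is not the registry's actual code: the function names are hypothetical, and wildcards such as */* and media type parameters other than q are deliberately ignored.

```python
from typing import List, Optional, Tuple

V2 = "application/vnd.docker.distribution.manifest.v2+json"


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media_type, q) pairs, highest q first."""
    entries = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full weight
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(entries, key=lambda e: e[1], reverse=True)


def pick_manifest_type(header: str, supported: List[str]) -> Optional[str]:
    """Return the most preferred supported media type, or None."""
    for media_type, q in parse_accept(header):
        if q > 0 and media_type in supported:
            return media_type
    return None


print(pick_manifest_type(V2 + ", text/plain", [V2]))
```

With this approach the unknown text/plain entry is simply skipped, so the header from the failing case above would still select the schema 2 media type.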

@tobico

tobico commented Jul 29, 2016

I've been trying to work out how to do this, and hitting a lot of brick walls. Would anyone be able to share a working curl statement that returns something I can use to compare against the current image?

@dmcgowan
Collaborator

@tobico see this reply here #1490 (comment). You could curl the registry and docker api and have a digest to compare. Since Docker 1.12 the manifest digest is always being saved, which should allow determining what image is associated with a registry manifest or whether there is an updated image on the registry.
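A hedged sketch of that comparison, assuming the saved manifest digest is read from the RepoDigests field of docker inspect output and the registry digest has been obtained separately (for example from the manifest response). The helper names are hypothetical, and the sample values are the alpine ones that appear further down this thread.

```python
import json
from typing import List


def repo_digests(inspect_output: str) -> List[str]:
    """Collect the digest part of every RepoDigests entry in docker inspect JSON."""
    digests = []
    for image in json.loads(inspect_output):
        # RepoDigests may be missing or null for images never pulled or pushed.
        for entry in image.get("RepoDigests") or []:
            _, _, digest = entry.partition("@")
            if digest:
                digests.append(digest)
    return digests


def is_up_to_date(inspect_output: str, registry_digest: str) -> bool:
    """True if the registry's manifest digest is one the local image was pulled as."""
    return registry_digest in repo_digests(inspect_output)


sample_inspect = json.dumps([{
    "Id": "sha256:a187dde48cd289ac374ad8539930628314bc581a481cdb41409c9289419ddb72",
    "RepoDigests": [
        "alpine@sha256:b276d875eeed9c7d3f1cfa7edb06b22ed22b14219a7d67c52c56612330348239"
    ],
}])
print(repo_digests(sample_inspect))
```

If the digest currently served for the tag is not in that list, a pull would fetch a different manifest.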

@sandys

sandys commented Jan 9, 2018

This needs to be fixed because it has implications for stack deploy: since local hashes are not computed properly, stack deploy actually recreates each and every service, even those that haven't changed.

@stevvooe
Collaborator

stevvooe commented Mar 9, 2018

@sandys You're welcome to lobby for a fix in https://github.com/moby/moby, but we can't really fix this here.

@deitch
Author

deitch commented Dec 19, 2019

This is really funny. I was doing some work on manifests again, saw the hash differences between manifest and local, and thought I remembered someone opening an issue on it. Did a Google search and came across one opened by... me! 😆

This completely makes sense, and thanks for the help a few years ago @dmcgowan . I can compare the config digest and it matches to the one shown in docker image ls (or docker image inspect).

Do you mind explaining how the layers digests work? The digests of layers in the manifest are the digests of the compressed blobs, and that works. On the other hand, the digests of the layers in docker image inspect do not match up to those. Since docker doesn't store the compressed blobs, I can see why you might not want those, but what do the hashes in docker image inspect represent? Hashes of what?

@adamcohen

adamcohen commented Mar 27, 2020

@deitch

Do you mind explaining how the layers digests work? The digests of layers in the manifest are the digests of the compressed blobs, and that works. On the other hand, the digests of the layers in docker image inspect do not match up to those. Since docker doesn't store the compressed blobs, I can see why you might not want those, but what do the hashes in docker image inspect represent? Hashes of what?

If you're referring to the sha256 hashes in the RootFS->Layers array from docker inspect, each one is the sha256 of the layer.tar from the corresponding Docker image:

$ docker pull alpine:3.11.5

$ docker inspect alpine:3.11.5

[
  {
    "Id": "sha256:a187dde48cd289ac374ad8539930628314bc581a481cdb41409c9289419ddb72",
    "RepoTags": [
      "alpine:3.11.5"
    ],
    "RepoDigests": [
      "alpine@sha256:b276d875eeed9c7d3f1cfa7edb06b22ed22b14219a7d67c52c56612330348239"
    ],
    "Container": "fb71ddde5f6411a82eb056a9190f0cc1c80d7f77a8509ee90a2054428edb0024",
    <snip>
    "DockerVersion": "18.09.7",
    "RootFS": {
      "Type": "layers",
      "Layers": [
        "sha256:beee9f30bc1f711043e78d4a2be0668955d4b761d587d6f60c2c8dc081efb203"
      ]
    }
  }
]

$ docker save alpine:3.11.5 > alpine-3.11.5.tar

$ tar -xf alpine-3.11.5.tar

$ ls -al 
total 11496
drwxr-xr-x   7 adam  staff      224 Mar 28 01:08 .
drwxr-xr-x  42 adam  staff     1344 Mar 28 01:07 ..
drwxr-xr-x   5 adam  staff      160 Mar 24 08:19 485d7306187faf0cc9b77fc210e8def9d67b1953f0669e675877e90e6542cb6d
-rw-r--r--   1 adam  staff     1509 Mar 24 08:19 a187dde48cd289ac374ad8539930628314bc581a481cdb41409c9289419ddb72.json
-rw-r--r--   1 adam  staff      202 Jan  1  1970 manifest.json
-rw-r--r--   1 adam  staff       89 Jan  1  1970 repositories

$ shasum -a 256 485d7306187faf0cc9b77fc210e8def9d67b1953f0669e675877e90e6542cb6d/layer.tar

beee9f30bc1f711043e78d4a2be0668955d4b761d587d6f60c2c8dc081efb203  485d7306187faf0cc9b77fc210e8def9d67b1953f0669e675877e90e6542cb6d/layer.tar

As you can see from the above, once you extract the contents of the alpine-3.11.5.tar file and calculate the sha256 for the 485d7306187faf0cc9b77fc210e8def9d67b1953f0669e675877e90e6542cb6d/layer.tar file, it matches the value returned in the RootFS->Layers array from the docker inspect command.
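The difference between the two kinds of layer hash can be reproduced with a toy layer. This sketch builds a tiny tar in memory as a stand-in for layer.tar and gzips it as a stand-in for the compressed blob a registry would hold; the file name and contents are made up, and diff_id is used here as the image config's name for the uncompressed hash.

```python
import gzip
import hashlib
import io
import tarfile


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Build a tiny tar archive in memory to stand in for layer.tar.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    payload = b"hello from a layer\n"
    info = tarfile.TarInfo(name="etc/motd")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
layer_tar = buf.getvalue()

# Stand-in for the compressed blob referenced by a registry manifest.
layer_tar_gz = gzip.compress(layer_tar)

diff_id = "sha256:" + sha256_hex(layer_tar)         # hash of the uncompressed tar
blob_digest = "sha256:" + sha256_hex(layer_tar_gz)  # hash of the gzipped bytes

print("uncompressed (RootFS->Layers style):", diff_id)
print("compressed (manifest layer style)  :", blob_digest)
```

The two digests differ even though decompressing the blob yields exactly the bytes behind the first hash, which is why the values in docker inspect do not line up with the layer digests in the manifest.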

@deitch
Author

deitch commented Mar 29, 2020

Yep. I had to do some more detailed work on image contents and came across the same. Thanks @adamcohen

@infogulch

@adamcohen @deitch I'm trying to figure out how to go from the RootFS->Layers[] / shasum -a 256 xxx/layer.tar to a LayerID that can be downloaded from the registry blobs endpoint https://registry.hub.docker.com/v2/$repo/$image/blobs/$what_is_this_sha256_value.

How does the layer.tar hash relate to the final layer id / blob hash for that layer?

@ns-saggarwal

@adamcohen @deitch I'm trying to figure out how to go from the RootFS->Layers[] / shasum -a 256 xxx/layer.tar to a LayerID that can be downloaded from the registry blobs endpoint https://registry.hub.docker.com/v2/$repo/$image/blobs/$what_is_this_sha256_value.

How does the layer.tar hash relate to the final layer id / blob hash for that layer?

Hey @infogulch
I know it's an old thread, but were you able to figure this out?

@infogulch

I put a bounty on this question on Stack Overflow and got a good answer, though I still had trouble using it for my goal. Maybe you will have more luck:

https://stackoverflow.com/questions/61366738/how-are-the-docker-image-layer-ids-derived/69688979?noredirect=1
