-
Notifications
You must be signed in to change notification settings - Fork 652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
media-types: Define layer media types with and without '+gzip' #388
media-types: Define layer media types with and without '+gzip' #388
Conversation
This Looks Good To Me but, I will defer to those with stronger opinions from #332. |
581cc54
to
fd97198
Compare
hmmm. I'm more inclined for this one. It would shift around media type variable names for some. And there are no MUSTs around implementations supporting all (though that really seems like not a huge deal to support a media type difference like this). |
@@ -28,7 +28,8 @@ Changing it means creating a new derived image, instead of changing the existing | |||
A layer DiffID is a SHA256 digest over the layer's uncompressed tar archive and serialized in the descriptor digest format, e.g., `sha256:a9561eb1b190625c9adb5a9513e72c4dedafc1cb2d4c5236c9a6957ec7dfd5a9`. | |||
Layers must be packed and unpacked reproducibly to avoid changing the layer DiffID, for example by using tar-split to save the tar headers. | |||
|
|||
NOTE: the DiffID is different than the digest in the manifest list because the manifest digest is taken over the gzipped layer for `application/vnd.oci.image.layer.v1.tar+gzip` types. | |||
The difference between DiffIDs and the layer digests in the [manifest's `layers`](manifest.md#image-manifest-property-descriptions) is that the layer digest is taken over the blob regardless of compression, while the DiffID is taken after removing any compression. | |||
For an `application/vnd.oci.image.layer.tar+gzip` layer, the layer digest is taken over the `application/vnd.oci.image.layer.tar+gzip` content, while the DiffID is take over the `application/vnd.oci.image.layer.tar` content. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may not always be a given. A layer digest may be taken over the uncompressed content. The only limitation is that DiffID
is always uncompressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may not always be a given.
What may not be given? Are you just talking about the last sentence? It's qualified with “For an application/vnd.oci.image.layer.tar+gzip
layer” (although that's missing the v1
added in #347), and I think the rest of that sentence is always true in that case. Of course, there may be layers that are not application/vnd.oci.image.layer.v1.tar+gzip
…
A layer digest may be taken over the uncompressed content. The only limitation is that DiffID is always uncompressed
Isn't that what I'm saying (e.g. with “regardless of compresson” for layers and “after removing any compression” for DiffIDs)? Which wording in particular do you think is saying something different, and what do you think it's saying. Once we identify the sticky phrasing we can reword it for additional clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only one side of this statement is true: "the DiffID is take over the application/vnd.oci.image.layer.tar
content."
For this PR to make even a remote bit of sense, a layer digest may be taken over application/vnd.oci.image.layer.tar
or application/vnd.oci.image.layer.tar+gzip
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this PR to make even a remote bit of sense, a layer digest may be taken over
application/vnd.oci.image.layer.tar
orapplication/vnd.oci.image.layer.tar+gzip
.
For a generic layer yes, and those types are still not exhaustive because we only SHOULD layer media types. This sentence, on the other hand, is qualified with “For an application/vnd.oci.image.layer.tar+gzip
layer”, and in that case the layer hash MUST be taken over the application/vnd.oci.image.layer.tar+gzip
content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wking Could you please stop arguing every little point? I would like for this PR to land but you seem to just want to waste everyone's time. Please get this under control or I am going to start closing your PRs so we can focus on changes that are more likely to succeed.
A DiffID is narrow. It is part of the config, in rootfs
. A DiffID is only ever taken over the uncompressed content.
A layer digest may be taken over the uncompressed or compressed version of the content. Arguably, this is not relevant to this document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -311,11 +311,10 @@ Any given image is likely to be composed of several of these Image Filesystem Ch | |||
Certain layers, due to legal requirements, may not be regularly distributable. | |||
Typically, such layers are downloaded directly from a distributor but are never uploaded. | |||
|
|||
Layers that have these restrictions SHOULD be tagged with an alternative mediatype of `application/vnd.oci.image.layer.nondistributable.v1.tar+gzip`. | |||
Layers that have these restrictions SHOULD be tagged with an alternative media type of `application/vnd.oci.image.layer.nondistributable.v1.tar`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or +gzip
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true of many instances where I currently only talk about the uncompressed type (which is one reason why I prefer the structured syntax suffix approach from #332). With a structured syntax suffix off the table, this PR tries to stay clean by only defining the +gzip
forms in media-types.md
. For example, this section defines a nondistributable, uncompressed layer. If you're wondering what a compressed, nondistributable layer is, you look it up in the media type list, and see that it's a gzipped version of the uncompressed, nondistributable layer (linking to this section). You can also look up the uncompressed, nondistributable layer in the media-type table, and that also links to this section.
If, on the other hand, you're reading this section and think “hmm, I wonder what media type this would be if I gzip it?”, the current state of this PR requires to you to grep or remember that we define +gzip
forms for these specific types. If we want to continually remind readers that we define those +gzip
types, I'll reluctantly inject whatever boilerplate you need in most locations where we reference an uncompressed layer type; just tell me what you want that boilerplate to be. Maybe (or ['…+gzip'][media-types.md#oci-image-media-types)
almost everywhere we list an uncompressed-layer media type? Do you really want that?
- [`application/vnd.oci.image.layer.v1.tar`](layer.md) | ||
- [`application/vnd.oci.image.layer.v1.tar+gzip`](media-types.md#oci-image-media-types) | ||
- [`application/vnd.oci.image.layer.nondistributable.v1.tar`](layer.md#non-distributable-layers) | ||
- [`application/vnd.oci.image.layer.nondistributable.v1.tar+gzip`](media-types.md#oci-image-media-types) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This link doesn't look right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What doesn't look right about it? We list both the uncompressed and +gzip
layer types there, and define the +gzip
form as a gzipped version of the uncompressed layer. I don't think we need to define the +gzip
forms anywhere else, although there are other cases where it makes sense to list them as “MUST be supported”.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why wouldn't it link to layer.md
like the one above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because layer.md
has nothing to say about gzipping (which happens on top of the layer tar archive described in layer.md
):
$ git grep gzip 63ce6ddc -- layer.md
…no hits…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you link them to a random table? Link to what is contained. For all intents and purposes, a reader can infer that +gzip
is nearly the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Link to what is contained.
I'm having trouble parsing this.
For all intents and purposes, a reader can infer that
+gzip
is nearly the same.
Why infer anything when we have a normative definition of the +gzip
types in media-types.md
? Maybe you want me to move that normative description into layers.md
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you want me to move that normative description into
layers.md
?
In case this was what you were looking for, I've worked it up as a fixup commit with a387698 → b0105cb. If the changes in b0105cb look good to you, let me know and I'll squash them in. If they don't look like what you want, just tell me what to change (or file a fixup PR against this branch).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Link to layer.md
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While there will be complexities in making this a reality, I'm more in favor of this today than before. For this PR to work it needs to make it clear that a layer Please clean this up and rebase and let's see if we can make it work. |
fd97198
to
63ce6dd
Compare
On Wed, Jan 18, 2017 at 04:13:34PM -0800, Stephen Day wrote:
For this PR to work it needs to make it clear that a layer `DiffID`
is _always_ the uncompressed hash, while the digest may be
compressed or uncompressed based on the media type.
I read my currently proposed wording as clear on this point. If you
have a particular phrase that is bothering you, please point it out.
Please clean this up and rebase and let's see if we can make it work.
Rebased onto master with fd97198 → 63ce6dd. This also fixed some
missing ‘.v1’s [1] and simplified some Markdown links
(e.g. ‘[…](./media-types.md)’ → ‘[…](media-types.md)’), but didn't
make major changes to my proposed wording. Points that still need
discussion include:
* What portion of my proposed DiffID vs. layer description is unclear,
and how it should be modified for increased clarity [2]
* Whether we need +gzip references at most locations where we
currently have an uncompressed-layer type reference [3], and if so,
what that +gzip reference should look like.
* Whether the +gzip forms should have their definitions moved to a
location besides media-types.md [4].
[1]: #388 (comment)
[2]: #388 (comment)
[3]: #388 (comment)
[4]: #388 (comment)
|
On Wed, Nov 30, 2016 at 01:30:58PM -0800, Vincent Batts wrote:
And there are no MUSTs around implementations supporting all (though
that really seems like not a huge deal to support a media type
difference like this).
I'm not quite clear on what you're saying here (are you calling for
“MUST support”? Suggesting we avoid MUST support?), but we currently
REQUIRE support for the layer types we define [1], and with this PR we
will continue that [2] (although now we are defining more layer
types).
[1]: https://github.com/opencontainers/image-spec/blame/v1.0.0-rc3/manifest.md#L55
[2]: https://github.com/opencontainers/image-spec/pull/388/files#diff-bf8d9e5146f89ef0a8b9bf8cd1026901R55
|
@@ -28,7 +28,8 @@ Changing it means creating a new derived image, instead of changing the existing | |||
A layer DiffID is a SHA256 digest over the layer's uncompressed tar archive and serialized in the descriptor digest format, e.g., `sha256:a9561eb1b190625c9adb5a9513e72c4dedafc1cb2d4c5236c9a6957ec7dfd5a9`. | |||
Layers must be packed and unpacked reproducibly to avoid changing the layer DiffID, for example by using tar-split to save the tar headers. | |||
|
|||
NOTE: the DiffID is different than the layer digest because it is taken over the uncompressed gzipped layer for `application/vnd.oci.image.layer.v1.tar+gzip` types. | |||
The difference between DiffIDs and the layer digests in the [manifest's `layers`](manifest.md#image-manifest-property-descriptions) is that the layer digest is taken over the blob regardless of compression, while the DiffID is taken after removing any compression. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wking Right here: this sentence is super unclear. It is backwards. Assert the positive notion (what a DiffID is taken over), then qualify it. Clean this up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assert the positive notion (what a DiffID is taken over)…
This happens in the preceding paragraph (“A layer DiffID is…”).
…then qualify it.
This paragraph is not attempting to qualify what a DiffID is. It is just underlining the difference between what a DiffID is (defined in the previous paragraph) and what a layer digest is (defined in manifest.md
). A given uncompressed blob may have multiple compressed forms, but “the uncompressed blob associated with this potentially compressed blob” is unambiguous, so it made more sense to me to work it in the layer digest → DiffID direction.
However, if you would prefer different wording for this section, I'm happy to push a commit with whatever you want. Just tell me what it needs to be in a comment here or file a PR against my branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can probably remain as a note. It is a little odd to mention a manifest at this level and introducing this alternative without more context is jarring.
This should probably be along the lines of "Do not confuse DiffID with layer digests, often referenced in the manifest, which allow digests over compressed or uncompressed content."
On Thu, Jan 19, 2017 at 02:48:16PM -0800, Stephen Day wrote [1]: > A DiffID is narrow. It is part of the config, in `rootfs`. A DiffID > is only ever taken over the uncompressed content. > > A layer digest may be taken over the uncompressed or compressed > version of the content. I agree with both of these points, but have been unable to convince Stephen that this is what I'm trying to show with this example. > Arguably, this is not relevant to this document. This paragraph was intended as a clarifying example, but it was never normative. Since we can't agree on clear wording, we can hopefully agree that it's not needed ;). [1]: opencontainers#388 (comment) Signed-off-by: W. Trevor King <[email protected]>
@@ -28,8 +28,6 @@ Changing it means creating a new derived image, instead of changing the existing | |||
A layer DiffID is a SHA256 digest over the layer's uncompressed tar archive and serialized in the descriptor digest format, e.g., `sha256:a9561eb1b190625c9adb5a9513e72c4dedafc1cb2d4c5236c9a6957ec7dfd5a9`. | |||
Layers must be packed and unpacked reproducibly to avoid changing the layer DiffID, for example by using tar-split to save the tar headers. | |||
|
|||
NOTE: the DiffID is different than the layer digest because it is taken over the uncompressed gzipped layer for `application/vnd.oci.image.layer.v1.tar+gzip` types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not confuse DiffID with layer digests, often referenced in the manifest, which allow digests over compressed or uncompressed content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me. Should I make a commit with you as an author and your usual Signed-off-by?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this new wording no longer mentions media types at all, so we can spin it off into an orthogonal PR. Done with #515 and dropped from here.
46ec151
to
a387698
Compare
Trevor and I couldn't agree on more detailed wording [1], and this paragraph was never normative anyway. [1]: opencontainers#388 (comment) Signed-off-by: Stephen J Day <[email protected]> Signed-off-by: W. Trevor King <[email protected]>
a387698
to
a929074
Compare
Trevor and I couldn't agree on more detailed wording [1], and this paragraph was never normative anyway. [1]: opencontainers#388 (comment) Signed-off-by: Stephen J Day <[email protected]> Signed-off-by: W. Trevor King <[email protected]>
a929074
to
b0105cb
Compare
Trevor and I couldn't agree on more detailed wording [1], and this paragraph was never normative anyway. [1]: opencontainers#388 (comment) Signed-off-by: Stephen J Day <[email protected]> Signed-off-by: W. Trevor King <[email protected]>
b0105cb
to
efade57
Compare
Trevor and I couldn't agree on more detailed wording [1], and this paragraph was never normative anyway. [1]: opencontainers#388 (comment) Signed-off-by: Stephen J Day <[email protected]> Signed-off-by: W. Trevor King <[email protected]>
LGTM after merging #515. |
I'd prefer defining this as a structured syntax suffix following RFC 6839, and have filed a pull request to that effect [1]. However, the current maintainer consensus seems to be to define the compressed and uncompressed types directly without declaring a structured syntax suffix pattern [2]. I'm not clear on the reason for avoiding the structured syntax suffix, but that's the route I've taken in this commit. Now that you can choose both compressed or uncompressed media types, it is easy to clarify DiffIDs by comparing types with and without the +gzip compression. media type. It also allows you to create image-layout instances where the layers are stored uncompressed, which may be useful for cases such as: * Binary diffing between layer blobs for cheaper updates of large layers [3]. * Compressing an image-layout tarball for a smaller smaller overall tarball (by avoiding the unnecessary fragmentation of compressing the individual blob entries). [1]: opencontainers#332 [2]: opencontainers#332 (comment) [3]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-08-16.log.html#t2016-08-16T23:35:43 Signed-off-by: W. Trevor King <[email protected]>
Generated with: $ make img/media-types.png and Graphviz version 2.38.0. Signed-off-by: W. Trevor King <[email protected]>
efade57
to
35b9fae
Compare
On Thu, Jan 19, 2017 at 03:43:09PM -0800, Stephen Day wrote:
LGTM after merging #515.
|
1 similar comment
Catch up with aad7f24 (media-types: Define layer media types with and without '+gzip', 2016-09-20, opencontainers#388). Signed-off-by: W. Trevor King <[email protected]>
Catch up with aad7f24 (media-types: Define layer media types with and without '+gzip', 2016-09-20, opencontainers#388). Signed-off-by: W. Trevor King <[email protected]>
Trevor and I couldn't agree on more detailed wording [1], and this paragraph was never normative anyway. [1]: opencontainers/image-spec#388 (comment) Signed-off-by: Stephen J Day <[email protected]> Signed-off-by: W. Trevor King <[email protected]>
Trevor and I couldn't agree on more detailed wording [1], and this paragraph was never normative anyway. [1]: opencontainers/image-spec#388 (comment) Signed-off-by: Stephen J Day <[email protected]> Signed-off-by: W. Trevor King <[email protected]>
Trevor and I couldn't agree on more detailed wording [1], and this paragraph was never normative anyway. [1]: opencontainers/image-spec#388 (comment) Signed-off-by: Stephen J Day <[email protected]> Signed-off-by: W. Trevor King <[email protected]>
Trevor and I couldn't agree on more detailed wording [1], and this paragraph was never normative anyway. [1]: opencontainers/image-spec#388 (comment) Signed-off-by: Stephen J Day <[email protected]> Signed-off-by: W. Trevor King <[email protected]>
Trevor and I couldn't agree on more detailed wording [1], and this paragraph was never normative anyway. [1]: opencontainers/image-spec#388 (comment) Signed-off-by: Stephen J Day <[email protected]> Signed-off-by: W. Trevor King <[email protected]>
This is a version of #332 that doesn't define a structured syntax suffix (and just defines layer types with and without +gzip). I prefer #332, because it automatically provides compressed and uncompressed versions of any existing media type and is exactly the sort of thing that the RFC 6839 approach is useful for. But for reasons that I still don't understand, the maintainer consensus seems to be against involving structured syntax suffixes.
git diff b9173d5..581cc54
shows the current tips of this PR and #332 are largely the same, with this PR being a bit wordier in places that list media types (because it has to list both the bare and+gzip
forms) and #332 being a bit wordier where it defines the structured syntax suffix. Currently this PR introduces fewer new words than #332, but I expect that as the number of potentially-compressed media types grows this approach will become more tedious. But we may not see further potentially-compressed media types, and if we do we can always return to the structured syntax suffix approach later.