Redundant definition of layers #115

s-urbaniak · 2016-06-02T14:13:47Z

We have two definitions of layers:

Which list should we consider for unpacking/oci-runtime-bundling?

philips · 2016-06-02T18:29:59Z

I would assume 1. is the correct one and that 2. is deprecated

@vbatts @stevvooe seems like we should drop 2, right?

stevvooe · 2016-06-02T19:59:04Z

@philips I'm not sure saying we have redundant definitions of layers is correct. We have two different identifiers depending on the context. With the manifest, we have a simple, byte-stream digest. Within image config, DiffID was added to protect against hash instability introduced during compression. DiffID is used as part of the ChainID to ensure correct assembly. It is necessary if implementations prefer reassembly to reproduce artifacts over saving them in a cache.
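As a rough illustration of how DiffIDs feed into a ChainID, here is a minimal Python sketch. It assumes the recursive definition as I understand it from the image config documentation: the ChainID of the first layer is its DiffID, and each later ChainID is the digest of the previous ChainID, a space, and the next DiffID. The helper names `sha256_digest` and `chain_ids` are made up for this example.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return a digest string in the 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def chain_ids(diff_ids):
    """Compute one ChainID per layer from an ordered list of DiffIDs.

    Assumed definition:
      ChainID(L0)        = DiffID(L0)
      ChainID(L0|...|Ln) = Digest(ChainID(L0|...|Ln-1) + " " + DiffID(Ln))
    """
    chain = []
    for diff_id in diff_ids:
        if not chain:
            # The bottom layer's ChainID is simply its DiffID.
            chain.append(diff_id)
        else:
            chain.append(sha256_digest(f"{chain[-1]} {diff_id}".encode("ascii")))
    return chain


# Two stand-in DiffIDs; real ones are hashes of uncompressed layer tar streams.
layer_a = sha256_digest(b"layer a tar stream")
layer_b = sha256_digest(b"layer b tar stream")

ids_ab = chain_ids([layer_a, layer_b])
ids_ba = chain_ids([layer_b, layer_a])
```

Because each ChainID depends on everything beneath it, the same two layers applied in a different order end in a different top ChainID, which is how a mis-assembled stack would be detected.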

We can look at these as two components, numbered according to the above:

1. This is the generic content description. Collecting the content requires no knowledge about what's inside. After processing and verifying, the result should be all the elements required to assemble the image, without having to understand how to actually assemble it.
2. This is the application component. At this stage, content is verified but image assembly may need to further process the content to correctly identify it. In this case, the raw tar stream, without compression, is hashed.

The benefit of this distinction is that images can be fetched without having to understand anything about a container runtime or actual image format. It also simplifies content storage, in that we no longer need to couple storage with the content format.
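To make the two identifiers concrete, the following sketch builds one small tar stream, gzips it twice with different settings, and compares the hashes. It is only an illustration under stated assumptions: the file name `etc/motd`, the payload, and the helper names are invented, and gzip with SHA-256 is assumed as the compression and hash pair.

```python
import gzip
import hashlib
import io
import tarfile


def sha256_digest(data: bytes) -> str:
    """Return a digest string in the 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def make_layer_tar() -> bytes:
    """Build a tiny uncompressed tar stream standing in for a layer."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        payload = b"hello\n"
        info = tarfile.TarInfo(name="etc/motd")  # hypothetical file
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


layer_tar = make_layer_tar()

# The same tar stream compressed by two different "publishers".
# The gzip header timestamp and compression level both differ.
blob_a = gzip.compress(layer_tar, compresslevel=1, mtime=0)
blob_b = gzip.compress(layer_tar, compresslevel=9, mtime=1)

# Manifest-style digest: hash of the bytes exactly as stored.
digest_a = sha256_digest(blob_a)
digest_b = sha256_digest(blob_b)

# DiffID-style hash: hash of the uncompressed tar stream.
diff_id_a = sha256_digest(gzip.decompress(blob_a))
diff_id_b = sha256_digest(gzip.decompress(blob_b))

print("blob digests equal:", digest_a == digest_b)
print("DiffIDs equal:     ", diff_id_a == diff_id_b)
```

The blob digests differ even though the layer content is identical, while the hash of the uncompressed stream stays the same. That is the compression instability the DiffID is meant to sidestep.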

Most of this can be avoided if we opt to store artifacts, rather than try to reassemble. DiffID can be deprecated, if we converge it with content digest, but it imposes an architecture on implementations that may not be ideal for disk space usage.

wking · 2016-06-02T20:11:10Z

On Thu, Jun 02, 2016 at 12:59:05PM -0700, Stephen Day wrote:

> Within image config, DiffID was added to protect against hash
> instability introduced during compression.

I haven't wrapped my head around DiffID, so this may be off target.
But isn't hash-instability a reconstructor issue? And DiffID has to
be populated (and signed) by the original image author? How does
“reconstruction-instability protection” work in a verified, Merkle-DAG
context?

> Most of this can be avoided if we opt to store artifacts, rather
> than try to reassemble. DiffID can be deprecated, if we converge
> it with content digest, but it imposes an architecture on
> implementations that may not be ideal for disk space usage.

This sounds like the best approach to me. Disk space is cheap and
reconstruction is difficult. How many unpacked bundles do you expect
to have lying around at once? And as a later improvement, we can
increase blob-storage efficiency by adjusting the Merkle
representation to use smaller, more-likely-to-be-shared objects
(asymptotically approaching IPFS and its sub-file, potentially
Rabin-fingerprinted blobs ;).

stevvooe · 2016-06-02T20:29:51Z

> I haven't wrapped my head around DiffID, so this may be off target.
> But isn't hash-instability a reconstructor issue?

Ideally, but with compressed layers, there is little that can be done to ensure this. Including the DiffID just ensures that internally these IDs remain as stable as possible. This design was done to secure image identification in a manner that was least disruptive to users.

> This sounds like the best approach to me. Disk space is cheap and
> reconstruction is difficult.

👃 👈

I'd love to resolve this in OCI, but doing this in the 1.0 time frame is unrealistic. We need to be careful not to impose an artifact store on implementations, as there are cases where that is untenable or impractical.

wking · 2016-06-02T20:58:22Z

On Thu, Jun 02, 2016 at 01:29:52PM -0700, Stephen Day wrote:

> > This sounds like the best approach to me. Disk space is cheap and
> > reconstruction is difficult.
>
> I'd love to resolve this in OCI, but doing this in the 1.0 time
> frame is unrealistic. We need to be careful not to impose an
> artifact store on implementations, as there are cases where that is
> untenable or impractical.

If we drop DiffID in favor of descriptors for layers, the only
downside seems to be potentially inefficient blobs (e.g. recompressing
tar stream 12 to blob 56, when you could have recycled an existing
blob 34 compression of tar stream 12). Background discussion starting
at 1. So if we go with descriptors, I'd recommend publishers used a
blob store to avoid compression instability, but I don't think we'd
need to require it. If your publishers are pushing many copies of the
same tarball with different hashes due to recompression, and you have
a problem with that, then you need to incentivize your publishers to
be more efficient (e.g. by charging them for adding new blobs to your
registry).
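A publisher-side blob store of the kind recommended above could look roughly like the sketch below. Everything here is hypothetical: the `BlobStore` class, its in-memory dictionaries, and the choice to index blobs by the hash of the uncompressed tar stream are assumptions made for illustration, not part of any spec or tool.

```python
import gzip
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return a digest string in the 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


class BlobStore:
    """Hypothetical in-memory store that recycles compressed blobs."""

    def __init__(self):
        self.blobs = {}        # blob digest -> compressed bytes
        self.by_tar_hash = {}  # hash of uncompressed tar -> blob digest

    def publish_layer(self, tar_bytes: bytes, mtime: int) -> str:
        """Return the blob digest for a layer, compressing only if needed."""
        tar_hash = sha256_digest(tar_bytes)
        existing = self.by_tar_hash.get(tar_hash)
        if existing is not None:
            # Recycle the earlier compression instead of minting a new blob.
            return existing
        blob = gzip.compress(tar_bytes, mtime=mtime)
        digest = sha256_digest(blob)
        self.blobs[digest] = blob
        self.by_tar_hash[tar_hash] = digest
        return digest


store = BlobStore()
tar_stream = b"stand-in for an uncompressed layer tar stream"

# Publishing the same tar stream twice, at different times, yields one blob.
first = store.publish_layer(tar_stream, mtime=1)
second = store.publish_layer(tar_stream, mtime=2)
```

Without the lookup, the second call would produce a new blob whose digest differs only because of the gzip header timestamp.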

I'm not clear on the security angle (although I have a guess 2).

philips · 2016-06-03T04:43:14Z

@stevvooe So, if I were to add a clarifying point to help out @s-urbaniak and other people new to the spec: the config diffID is the hash of the un-gzipped content, while the layers digest is the hash of the gzipped content?

stevvooe · 2016-06-06T22:25:37Z

@philips That is a fair differentiation but not binding.

The digest is the hash of the unprocessed content. It can be verified without understanding tar or layers or images. The diffID must be the ungzipped hash.

philips · 2016-06-07T00:24:06Z

@stevvooe what do you mean by non-binding?

I will try and get some language together to close out this bug.

stevvooe · 2016-06-07T00:42:27Z

This is what I meant:

> The digest is the hash of the unprocessed content. It can be verified without understanding tar or layers or images. The diffID must be the ungzipped hash.

A digest reference can be verified without understanding the content, a diffID cannot.
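The asymmetry can be sketched in Python as follows. The function names are invented for this example, and gzip is assumed as the layer compression. Note that `verify_blob` never looks inside the bytes, whereas `verify_diff_id` has to know the blob is a gzipped stream before it can check anything.

```python
import gzip
import hashlib


def verify_blob(blob: bytes, expected_digest: str) -> bool:
    """Check an '<algorithm>:<hex>' digest against raw bytes.

    No knowledge of tar, layers, or images is needed.
    """
    algorithm, _, hex_value = expected_digest.partition(":")
    return hashlib.new(algorithm, blob).hexdigest() == hex_value


def verify_diff_id(blob: bytes, expected_diff_id: str) -> bool:
    """Check a diffID; requires knowing the blob is gzip-compressed."""
    uncompressed = gzip.decompress(blob)
    return verify_blob(uncompressed, expected_diff_id)


# Stand-in data; a real layer would be a tar stream.
content = b"stand-in for an uncompressed layer tar stream"
blob = gzip.compress(content, mtime=0)

digest = "sha256:" + hashlib.sha256(blob).hexdigest()
diff_id = "sha256:" + hashlib.sha256(content).hexdigest()

digest_ok = verify_blob(blob, digest)
diff_id_ok = verify_diff_id(blob, diff_id)
# Treating the diffID as if it were a plain blob digest fails.
mixed_up = verify_blob(blob, diff_id)
```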

DiffIDs and Manifest list digests were a bit confusing. Explain the difference. Fixes: opencontainers#115 Signed-off-by: Brandon Philips <[email protected]>

s-urbaniak mentioned this issue Jun 2, 2016

oci-image-tool: implement create-runtime-bundle #114

Merged

philips added component/serialization spec component/manifest spec priority/P0 labels Jun 2, 2016

philips added this to the v1.0.0-rc milestone Jun 2, 2016

philips pushed a commit to philips/image-spec that referenced this issue Jun 15, 2016

serialization: add explanation of DiffIDs

7191600

DiffIDs and Manifest list digests were a bit confusing. Explain the difference. Fixes: opencontainers#115

philips mentioned this issue Jun 15, 2016

serialization: add explanation of DiffIDs #142

Merged

philips pushed a commit to philips/image-spec that referenced this issue Jun 15, 2016

serialization: add explanation of DiffIDs

a448fa2

DiffIDs and Manifest list digests were a bit confusing. Explain the difference. Fixes: opencontainers#115 Signed-off-by: Brandon Philips <[email protected]>

philips pushed a commit to philips/image-spec that referenced this issue Jun 15, 2016

serialization: add explanation of DiffIDs

25a2cc5

DiffIDs and Manifest list digests were a bit confusing. Explain the difference. Fixes: opencontainers#115 Signed-off-by: Brandon Philips <[email protected]>

philips pushed a commit to philips/image-spec that referenced this issue Jun 17, 2016

serialization: add explanation of DiffIDs

67893d5

DiffIDs and Manifest list digests were a bit confusing. Explain the difference. Fixes: opencontainers#115 Signed-off-by: Brandon Philips <[email protected]>

stevvooe closed this as completed in #142 Jun 22, 2016

wking mentioned this issue Sep 20, 2016

Make the order more clear #330

Closed
