identity: add implementation of ChainID #486

Merged
merged 2 commits into from
Jan 17, 2017

Conversation

Contributor

@stevvooe stevvooe commented Dec 8, 2016

The specification defines an algorithm to calculate a ChainID, which
can be used to identify the result of subsequent applications of layers.
Because this algorithm is subtle and only needs to be implemented in a
single place, we provide a reference implementation.

For convenience, we provide functions that calculate all the chain ids
and just the top-level one. It is integrated with the
distribution/digest type for safety and convenience.

As part of this, the identity package has been introduced. For
consuming code, a few helpers have been provided to ease transition as
the name of the upstream package has not yet been finalized. Users of
this package should employ FromBytes, FromString and FromReader
where appropriate, which should ease the transition if these packages
change.

Tests are formulated based on pre-calculation of chain identifiers to
ensure correctness.

Signed-off-by: Stephen J Day [email protected]

Relates to #482.
Requires #487.

@stevvooe
Contributor Author

stevvooe commented Dec 8, 2016

PTAL @opencontainers/image-spec-maintainers

@stevvooe stevvooe mentioned this pull request Dec 8, 2016
4 tasks
@wking
Contributor

wking commented Dec 8, 2016 via email

@wking
Contributor

wking commented Dec 8, 2016 via email

@stevvooe
Contributor Author

stevvooe commented Dec 8, 2016

See #487.

Also, note that the dependency hasn't been added. Considering adding digest package here, as well.

import "github.com/docker/distribution/digest"

// ChainID takes a slice of digests and returns the ChainID corresponding to
// the last entry. Typically, these are a list of layer DiffIDs, with the
Contributor

I'd rather drop “to the last entry”. The ChainID is a property of the slice as a whole, not a property of one entry in particular.

Contributor Author

Why did you quote yourself as the canonical definition of ChainID?

Actually, a list of digests has a ChainID for each entry in the array. This function returns the last entry.

Contributor

> Why did you quote yourself as the canonical definition of ChainID?

I quoted myself as the clearest (to me ;) description of ChainID being a property of an array (which you point out is not blindingly obvious in the current spec wording, #482).

> Actually, a list of digests has a ChainID for each entry in the array. This function returns the last entry.

The spec definition is for ChainID, singular, not a ChainIDs array. And the ChainID being returned here corresponds to the whole slice, not just the last entry. Yes, the returned ChainID is the same digest as the last entry in the mutated slice from your ChainIDs function, but I think that's an implementation detail that's too peripheral for the opening sentence of this comment.

Contributor Author

@wking Seriously, if you continue to split hairs on topics you have literally no clue about, I am going to block you. In fact, I remember explaining chain ids to you months ago and having you walk away with no understanding in their use or calculation.

Now, you show up, quoting yourself, rather than the proposed explanation, as canonical?

> And the ChainID being returned here corresponds to the whole slice, not just the last entry.

These are the same thing.

Contributor

> > And the ChainID being returned here corresponds to the whole slice, not just the last entry.
>
> These are the same thing.

No they aren't, and you pointed this out in #482. If the chain ID was a property of an entry, the function signature here would be:

func ChainID(digest digest.Digest) digest.Digest

and you'd call it with ChainID(config.RootFS.DiffIDs[3]), etc. But the chain ID is really a property of the slice (defined for slices with length > 0), so the signature is:

func ChainID(digests []digest.Digest) (digest.Digest, error)

And you call it with ChainID(config.RootFS.DiffIDs[:4]), etc.
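
The slice-based signature proposed above can be sketched with only the standard library. This is a minimal illustration, not the code under review: it assumes the spec's recurrence (the chain ID of a single entry is that entry's DiffID, and each further entry is folded in as `Digest(previous + " " + DiffID)`), uses plain `sha256:<hex>` strings in place of the `digest.Digest` type, and the names `chainID`, `sha256Digest`, and `errEmptyChain` are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// errEmptyChain is a hypothetical error for the case the spec leaves undefined.
var errEmptyChain = errors.New("chain ID is undefined for an empty list of diff IDs")

// sha256Digest returns the "sha256:<hex>" digest string of s.
func sha256Digest(s string) string {
	sum := sha256.Sum256([]byte(s))
	return "sha256:" + hex.EncodeToString(sum[:])
}

// chainID folds the diff IDs left to right. A one-element slice yields
// that element unchanged; every later element is combined with the
// running chain ID as Digest(chainID + " " + diffID).
func chainID(diffIDs []string) (string, error) {
	if len(diffIDs) == 0 {
		return "", errEmptyChain
	}
	id := diffIDs[0]
	for _, d := range diffIDs[1:] {
		id = sha256Digest(id + " " + d)
	}
	return id, nil
}

func main() {
	// Stand-in diff IDs; real ones are digests of uncompressed layer tars.
	diffIDs := []string{
		sha256Digest("layer 0"),
		sha256Digest("layer 1"),
		sha256Digest("layer 2"),
	}
	// Each prefix of the slice has its own chain ID.
	for n := 1; n <= len(diffIDs); n++ {
		id, err := chainID(diffIDs[:n])
		if err != nil {
			panic(err)
		}
		fmt.Printf("chainID(diffIDs[:%d]) = %s\n", n, id)
	}
	_, err := chainID(nil)
	fmt.Println("empty slice:", err)
}
```

Calling the function with successive prefixes, as in `main`, shows why the argument is a slice rather than a single digest: the result depends on every entry up to and including the last one.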

// ChainID takes a slice of digests and returns the ChainID corresponding to
// the last entry. Typically, these are a list of layer DiffIDs, with the
// result providing the ChainID identifying the result of sequential
// application of the preceeding layers.
Contributor

“preceeding” → “preceding”

ChainIDs(chainIDs)

if len(chainIDs) == 0 {
return ""
Contributor

Neither the current definition nor any of the alternative phrasings proposed in #482 define chainIDs for empty arrays. I think we want:

func ChainID(digests []digest.Digest) (digest.Digest, error)

with an error being returned when len(digests) == 0.

Contributor Author

There is no need to create an error for this condition.

Contributor

This is a reference implementation for the definition, so I think we do need an “undefined behavior” error for the undefined case of ChainID([]). Alternatively, we could define the ChainID for that case to be the empty string, but I don't see the point since we always require at least one layer (with #407 still in flight).

It would have been nice if the chainID algorithm was:

ChainID([L₀, …, Lₙ₋₁]) = { SHA256hex(''), when n == 0
                         { SHA256hex(ChainID([L₀, …, Lₙ₋₂]) + " " + DiffID(Lₙ₋₁)), when n > 0

to cleanly cover the n == 0 case, but I think changing to that now would create more confusion than the increased cleanliness is worth.
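
For comparison, the hypothetical alternative recurrence above can be sketched as follows. This is only an illustration of the "it would have been nice" variant, not the algorithm in the spec: here even a one-entry list is hashed, so its results differ from the spec's chain IDs. The names `altChainID` and `sha256Hex` are made up for this sketch, and the digests are bare hex strings to match the `SHA256hex` notation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sha256Hex returns the lowercase hex SHA-256 of s.
func sha256Hex(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

// altChainID implements the alternative recurrence: the base case
// (n == 0) is the hash of the empty string, and every entry, including
// the first, is folded in as SHA256hex(previous + " " + diffID).
func altChainID(diffIDs []string) string {
	id := sha256Hex("")
	for _, d := range diffIDs {
		id = sha256Hex(id + " " + d)
	}
	return id
}

func main() {
	fmt.Println("n == 0:", altChainID(nil))
	fmt.Println("n == 1:", altChainID([]string{"A"}))
	fmt.Println("n == 2:", altChainID([]string{"A", "B"}))
}
```

With this shape there is no special case for the empty list and no error return, which is the cleanliness argument made above.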

Contributor Author

@wking And, as I said in #482, that definition needs to be adjusted. As the specification is written currently, it is undefined.

Contributor

And with all of the proposals in #482, the chainID for an empty array is still undefined. In the absence of a spec definition (which is where things stand with this PR), I think we should be raising an “undefined” error.

// and after the call.
//
// As an exmaple, given the chain of ids `[A, B, C]`, the result `[A,
// ChainID(A|B), ChainID(A|B|C)]` will be written back to the slice.
Contributor

“exmaple” → “example”

And the pipe syntax is not particularly intuitive for me. Can we use:

As an example, given the digests [A, B, C], the result [A, ChainID([A, B]), ChainID([A, B, C])] will be written back to the slice.

// As an exmaple, given the chain of ids `[A, B, C]`, the result `[A,
// ChainID(A|B), ChainID(A|B|C)]` will be written back to the slice.
//
// The input is provided as a return value for convenience.
Contributor

I'd rather drop this, since it might mask the mutation you're doing.

Contributor Author

I'd rather not. The trick to avoiding these bugs is to read the function documentation and understand the behavior.

Contributor

I agree that reading the docs is an important step for the dev who first uses the function. Maybe they know (for whatever reason) that the current code base no longer needs the original diffIDs, and they save a copy by using:

middleChainIDs := chain.ChainIDs(config.RootFS.DiffIDs)[2:4]

All very nice.

Some time later, another dev is going to come along and read the consuming code, not realize that config.RootFS.DiffIDs has been mutated, and use the mutated values. Are they going to go back and read the docs (“I wonder if ChainIDs is mutating its argument…”)? Probably not.

You could guard against that with the initial dev using:

// warning: mutates config.RootFS.DiffIDs
middleChainIDs := chain.ChainIDs(config.RootFS.DiffIDs)[2:4]  

On the other hand, not returning the mutated slice makes calls look more suspicious for mutation:

chain.ChainIDs(config.RootFS.DiffIDs)

^ that does not sound like a logging function, so it's probably mutating the argument.

And I doubt the diffID slice will ever be so long that allocating a new slice for the chainIDs is a big performance drag. I'd rather allocate and return a new slice. But there is some performance impact to that, so if you have to keep the mutation, I'm ok with that, but I'd rather make the mutation as obvious as possible in the function signature.
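
The two options discussed in this thread can be put side by side. The sketch below is an assumption-laden illustration rather than the PR's code: `chainIDsInPlace` and `chainIDsCopy` are hypothetical names, plain strings stand in for `digest.Digest`, and the short inputs `A`, `B`, `C` are placeholders for real diff IDs.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sha256Digest returns the "sha256:<hex>" digest string of s.
func sha256Digest(s string) string {
	sum := sha256.Sum256([]byte(s))
	return "sha256:" + hex.EncodeToString(sum[:])
}

// chainIDsInPlace overwrites ids[i] with the chain ID of ids[:i+1].
// It returns nothing, so a call site cannot be mistaken for a pure function.
func chainIDsInPlace(ids []string) {
	for i := 1; i < len(ids); i++ {
		ids[i] = sha256Digest(ids[i-1] + " " + ids[i])
	}
}

// chainIDsCopy leaves its argument untouched and returns a new slice,
// at the cost of one allocation.
func chainIDsCopy(diffIDs []string) []string {
	out := make([]string, len(diffIDs))
	copy(out, diffIDs)
	chainIDsInPlace(out)
	return out
}

func main() {
	diffIDs := []string{"A", "B", "C"}

	chain := chainIDsCopy(diffIDs)
	fmt.Println(diffIDs) // the input still holds the original diff IDs
	fmt.Println(chain)

	chainIDsInPlace(diffIDs)
	fmt.Println(diffIDs[2] == chain[2]) // both variants agree on the result
}
```

The copying variant costs one slice allocation per call, which is the performance trade-off mentioned above.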

@wking
Contributor

wking commented Dec 8, 2016 via email

@stevvooe
Contributor Author

stevvooe commented Dec 8, 2016

> I don't like the spec consumes Docker, which consumes the spec, etc. reference cycle.

The spec does not "consume" docker. The digest package is just a Go package from docker. Is the problem that the dependency has the name "docker" in it, like it is some big-bad word?

@wking wking mentioned this pull request Dec 8, 2016
@wking
Contributor

wking commented Dec 8, 2016 via email

@stevvooe
Contributor Author

@vbatts @philips Could I get some feedback on this before I take another pass?

Contributor

@philips philips left a comment

Overall implementation looks fine, but there is the library to sort out.

// identifying the result of subsequent layer applications.
package chain

import "github.com/docker/distribution/digest"
Contributor

We need to vendor this, right?

Or, we move that package into this repo. Given the strong dependence on that library, and that these string digests are part of the spec, it seems we should add that package here as well. We already have one open-coded impl here:

sum := sha256.Sum256([]byte(tt.manifest))

Contributor Author

Given we have a solid package and mature implementation, I'm thinking that we can bring the distribution/digest into OCI. I'm seeing some disconcerting patterns being propagated that this package solves.

Contributor

Yes, I'd be strongly in favour of moving it in here so that any spec implementers can/will easily leverage it.

Contributor Author

@jonboulle @philips After some consideration, I've broken this out into a separate package, https://github.com/docker/go-digest. The reason being that this has wider use than just OCI images.

If possible, it might be good to move this over to opencontainers/go-digest but I think this meets the current concern.

Contributor

To be clear, my concerns were around importing an unversioned package from a large out-of-org repository (nothing to do with the particular provenance or import path of said package). Moving to importing an unversioned package from a small out-of-org repository is a minor improvement, but ideally I'd rather see this in-tree, with a secondary preference for another opencontainers/* repo and vendored from there.

Contributor Author

@jonboulle @philips I'm not sure if your implications apply here, seeing the history of the digest repository. In fact, moving it out will make it less stable, but I want to make everyone happy.

If you can get me an opencontainers/go-digest repository by 1pm PST, I'll be more than happy to move it over and support the transition in Docker itself.

@stevvooe stevvooe force-pushed the chainid-implementation branch 2 times, most recently from 1979514 to 3ea8049 Compare December 19, 2016 23:29
@stevvooe stevvooe changed the title chain: add implementation of ChainID identity: add implementation of ChainID Dec 19, 2016
@stevvooe
Contributor Author

@philips Updated to vendor docker/go-digest. Moved to package identity and added some helpers to ease transition if package naming changes.

digest "github.com/docker/go-digest"
)

// FromReader returns the most valid digest for the underlying content using
Contributor

What does "most valid digest" mean?

Contributor Author

In the tarsum days, this used to do content resolution. I'll update the comment.

I've fixed this upstream and will update this PR

@jonboulle
Contributor

Thanks, generally this looks good - unfortunately it looks like the git validation checker is choking on the vendor commits again :-( - @vbatts what did we end up doing for that?

@stevvooe stevvooe force-pushed the chainid-implementation branch 2 times, most recently from de7ed1b to 2201048 Compare December 20, 2016 20:14
@stevvooe
Contributor Author

@jonboulle I looked for an easy patch to git-validation but it doesn't look straightforward.

@jonboulle
Contributor

@stevvooe I forgot about #148 (comment) - did you use make update-deps?

@stevvooe
Contributor Author

@philips @jonboulle Updated to use go-digest package.

There may be some follow up after we work out some administrative details on go-digest (such as a release), but let's get this in.

@@ -1,29 +0,0 @@
Blackfriday is distributed under the Simplified BSD License:
Contributor

Accidental removal?

Contributor Author

@wking I ran the make update-deps.

Contributor Author

The problem is that glide-vc is not taking test dependencies into account.

We also need to stop committing broken garbage into the Makefile. None of this is making this easier.

Contributor Author

Ok, this PR should be correct now.

I'm sending a separate PR with the fixes to the Makefile.

@@ -1,19 +0,0 @@
Copyright (c) 2015 Dmitri Shuralyov
Contributor

Another accidental removal?

@stevvooe
Contributor Author

@opencontainers/image-spec-maintainers PTAL

- package: github.com/xeipuuv/gojsonschema
  version: d5336c75940ef31c9ceeb0ae64cf92944bccb4ee
- package: github.com/russross/blackfriday
  version: ~v1.4
- package: github.com/shurcooL/sanitized_anchor_name
  version: 10ef21a441db47d8b13ebcc5fd2310f636973c77
- package: github.com/opencontainers/go-digest
Contributor

Can you add a version, please?

Contributor

> Can you add a version, please?

We still need to cut one (opencontainers/go-digest#23), as @stevvooe mentions in his earlier comment.

Contributor

Look at the immediately prior dependency. Version can be a hash.

Contributor Author

That is in the glide.lock file. Only specify a version here if it matters when updating.

Contributor

No, it's here in glide.yaml - every other package has an associated version. Can we just keep it consistent for now? IMO glide-update should ideally be idempotent unless someone intentionally requests otherwise.

@jonboulle
Contributor

jonboulle commented Jan 16, 2017

LGTM
I still prefer https://github.com/opencontainers/image-spec/pull/486/files#r96237294 but am not going to block right now.

Approved with PullApprove

@philips
Contributor

philips commented Jan 17, 2017

LGTM

Approved with PullApprove

@stevvooe stevvooe merged commit 0ff14aa into opencontainers:master Jan 17, 2017
@vbatts vbatts mentioned this pull request Jan 25, 2017