-
Notifications
You must be signed in to change notification settings - Fork 652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
identity: add implementation of ChainID #486
identity: add implementation of ChainID #486
Conversation
PTAL @opencontainers/image-spec-maintainers |
1ab16c3
to
122313b
Compare
Travis points out some missing license headers [1].
[1]: https://travis-ci.org/opencontainers/image-spec/builds/182417754#L327
|
And it has some problems with *testing.T.Run [1].
[1]: https://travis-ci.org/opencontainers/image-spec/builds/182417754#L336
|
See #487. Also, note that the dependency hasn't been added. Considering adding digest package here, as well. |
122313b
to
8d32d51
Compare
import "github.com/docker/distribution/digest" | ||
|
||
// ChainID takes a slice of digests and returns the ChainID corresponding to | ||
// the last entry. Typically, these are a list of layer DiffIDs, with the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather drop “to the last entry”. The ChainID is a property of the slice as a whole, not a property of one entry in particular.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My did you quote yourself as the canonical definition of ChainID?
Actually, a list of digests has a ChainID for each entry in the array. This function returns the last entry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My did you quote yourself as the canonical definition of ChainID?
I quoted myself as the clearest (to me ;) description of ChainID being a property of an array (which you point out is not blindingly obvious in the current spec wording, #482.
Actually, a list of digests has a ChainID for each entry in the array. This function returns the last entry.
The spec definition is for ChainID, singular, not a ChainIDs array. And the ChainID being returned here corresponds to the whole slice, not just the last entry. Yes, the returned ChainID is the same digest as the last entry in the mutated slice from your ChainIDs
function, but I think that's an implementation detail that's too peripheral for the opening sentence of this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wking Seriously, if you continue to split hairs on topics you have literally no clue about, I am going to block you. In fact, I remember explaining chain ids to you months ago and having you walk away with no understanding in their use or calculation.
Now, you show up, quoting yourself, rather than the proposed explanation, as canonical?
And the ChainID being returned here corresponds to the whole slice, not just the last entry.
These are the same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the ChainID being returned here corresponds to the whole slice, not just the last entry.
These are the same thing.
No they aren't, and you pointed this out in #482. If the chain ID was a property of an entry, the function signature here would be:
func ChainID(digest digest.Digest) digest.Digest
and you'd call it with ChainID(config.RootFS.DiffIDs[3])
, etc. But the chain ID is really a property of the slice (defined for slices with length > 0), so the signature is:
func ChainID(digests []digest.Digest) (digest.Digest, error)
And you call it with ChainID(config.RootFS.DiffIDs[:4])
, etc.
// ChainID takes a slice of digests and returns the ChainID corresponding to | ||
// the last entry. Typically, these are a list of layer DiffIDs, with the | ||
// result providing the ChainID identifying the result of sequential | ||
// application of the preceeding layers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
“preceeding” → “preceding”
ChainIDs(chainIDs) | ||
|
||
if len(chainIDs) == 0 { | ||
return "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neither the current definition nor any of the alternative phrasings proposed in #482 define chainIDs for empty arrays. I think we want:
func ChainID(digests []digest.Digest) (digest.Digest, error)
with an error being returned when len(digests) == 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to create an error for this condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a reference implementation for the definiton, so I think we do need an “undefined behavior” error for the undefined case of ChainID([])
. Alternatively, we could define the ChainID for that case to be the empty string, but I don't see the point since we always require at least one layer (with #407 still in flight).
It would have been nice if the chainID algorithm was:
ChainID([L₀, …, Lₙ₋₁]) = { SHA256hex(''), when n == 0
{ SHA256hex(ChainID([L₀, …, Lₙ₋₂]) + " " + DiffID(Lₙ₋₁)), when n > 0
to cleanly cover the n == 0
case, but I think changing to that now would create more confusion than the increased cleanliness is worth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And with all of the proposals in #482, the chainID for an empty array is still undefined. In the absence of a spec definition (which is where things stand with PR), I think we should be raising an “undefined” error.
// and after the call. | ||
// | ||
// As an exmaple, given the chain of ids `[A, B, C]`, the result `[A, | ||
// ChainID(A|B), ChainID(A|B|C)]` will be written back to the slice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
“exmaple” → “example”
And the pipe syntax is not particularly intuitive for me. Can we use:
As an example, given the digests
[A, B, C]
, the result[A, ChainID([A, B]), ChainID([A, B, C])]
will be written back to the slice.
// As an exmaple, given the chain of ids `[A, B, C]`, the result `[A, | ||
// ChainID(A|B), ChainID(A|B|C)]` will be written back to the slice. | ||
// | ||
// The input is provided as a return value for convenience. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather drop this, since it might mask the mutation you're doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not. The trick to avoiding these bugs is to read the function documentation and understand the behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that reading the docs is an important step for the dev who first uses the function. Maybe they know (for whatever reason) that the current code base no longer needs the original diffIDs, and they save a copy by using:
middleChainIDs := chain.ChainIDs(config.RootFS.DiffIDs)[2:4]
All very nice.
Some time later, another dev is going to come along and read the consuming code, not realize that config.RootFS.DiffIDs
has been mutated, and use the mutated values. Are they going to go back and read the docs (“I wonder if ChainIDs
is mutating its argument…”)? Probably not.
You could guard against that with the initial dev using:
// warning: mutates config.RootFS.DiffIDs
middleChainIDs := chain.ChainIDs(config.RootFS.DiffIDs)[2:4]
On the other hand, not returning the mutated slice makes it calls that look more suspicious for mutation:
chain.ChainIDs(config.RootFS.DiffIDs)
^ that does not sound like a logging function, so it's probably mutating the argument.
And I doubt the diffID slice will ever be so long that allocating a new slice for the chainIDs is a big performance drag. I'd rather allocate and return a new slice. But there is some performance impact to that, so if you have to keep the mutation, I'm ok with that, but I'd rather make the mutation as obvious as possible in the function signature.
8d32d51
to
45ca19a
Compare
On Thu, Dec 08, 2016 at 02:03:15PM -0800, Stephen Day wrote:
Also, note that the dependency hasn't been added. Considering adding
digest package here, as well.
If we are including a reference ChainID implementation in this spec, I
think we *have* to include the digest package here too (or enough of
it to get ChainIDs working, which is not a big piece). I don't like
the spec consumes Docker, which consumes the spec, etc. reference
cycle.
|
The spec does not "consume" docker. The digest package is just a Go package from docker. Is the problem that the dependency has the name "docker" in it, like it is some big-bad word? |
On Thu, Dec 08, 2016 at 02:22:48PM -0800, Stephen Day wrote:
The spec does not "consume" docker. The digest package is just a Go
package from docker. Is the problem that the dependency has the name
"docker" in it, like it is some big-bad word?
Yes ;). To a spec reader, it is unclear how tightly
github.com/docker/distribution/digest is tied into the rest of
github.com/docker/distribution or what the process is for managing
github.com/docker/distribution/digest releases (are they independent
of github.com/docker/distribution as a whole?).
I'd be completely fine with a github.com/docker/digest that had no
non-Go-stdlib deps.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall implementation looks fine, but there is the library to sort out.
// identifying the result of subsequent layer applications. | ||
package chain | ||
|
||
import "github.com/docker/distribution/digest" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to vendor this, right?
Or, we move that package into this repo. Given the strong dependence on that library and that these string digests are part of the spec seems we should add that package here as well. We already have one open-coded impl here:
sum := sha256.Sum256([]byte(tt.manifest)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given we have a solid package and mature implementation, I'm thinking that we can bring the distribution/digest
into OCI. I'm seeing some disconcerting patterns being propagated that this package solves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'd be strongly in favour of moving it in here so that any spec implementers can/will easily leverage it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonboulle @philips After some consideration, I've broken this out into a separate package, https://github.com/docker/go-digest. The reason being that this has wider use that just OCI images.
If possible, it might be good to move this over to opencontainers/go-digest
but I think this meets the current concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be clear my concerns were around importing an unversioned package from a large out-of-org repository (nothing to with the particular provenance or import path of said package). Moving to importing an unversioned package from a small out-of-org repository is a minor improvement but ideally I'd rather see this in-tree, with a secondary preference for another opencontainers/* repo and vendored from there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonboulle @philips I'm not sure if your implications apply here, seeing the history of the digest repository. In fact, moving it out will make it less stable, but I want to make everyone happy.
If you can get me an opencontainers/go-digest
repository by the 1pm PST, I'll be more than happy to move it over and support the transition in Docker itself.
1979514
to
3ea8049
Compare
@philips Updated to vendor |
5189fe3
to
7d6ba7b
Compare
digest "github.com/docker/go-digest" | ||
) | ||
|
||
// FromReader returns the most valid digest for the underlying content using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "most valid digest" mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the tarsum days, this used to do content resolution. I'll update the comment.
I've fixed this upstream and will update this PR
Thanks, generally this looks good - unfortunately it looks like the git validation checker is choking on the vendor commits again :-( - @vbatts what did we end up doing for that? |
de7ed1b
to
2201048
Compare
@jonboulle I looked for an easy patch to |
@stevvooe I forgot about #148 (comment) - did you use |
2201048
to
76520cb
Compare
@philips @jonboulle Updated to use There may be some follow up after we work out some administrative details on go-digest (such as a release), but let's get this in. |
@@ -1,29 +0,0 @@ | |||
Blackfriday is distributed under the Simplified BSD License: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accidental removal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wking I ran the make update-deps
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that glide-vc
is not taking test dependencies into account.
We also need to stop committing broken garbage into the Makefile. None of this is making this easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this PR should be correct now.
I'm sending a separate PR with the fixes to the Makefile
.
@@ -1,19 +0,0 @@ | |||
Copyright (c) 2015 Dmitri Shuralyov |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another accidental removal?
The specification defines an algorithm to calculate a `ChainID`, which can be used to identify the result of subsequent applications of layers. Because this algorithm is subtle and only needs to implemented in a single place, we provide a reference implementation. For convenience, we provide functions that calculate all the chain ids and just the top-level one. It is is integrated with the distribution/digest type for safety and convenience. As part of this, the `identity` package has been introduced. For consuming code, a few helpers have been provide to ease transition as the name of the upstream package has not yet been finalized. Users of this package should employ `FromBytes`, `FromString` and `FromReader` where appropriate, which should ease the transition if these packages change. Tests are formulated based on pre-calculation of chain identifiers to ensure correctness. Signed-off-by: Stephen J Day <[email protected]>
Signed-off-by: Stephen J Day <[email protected]>
76520cb
to
05bcdc7
Compare
@opencontainers/image-spec-maintainers PTAL |
- package: github.com/xeipuuv/gojsonschema | ||
version: d5336c75940ef31c9ceeb0ae64cf92944bccb4ee | ||
- package: github.com/russross/blackfriday | ||
version: ~v1.4 | ||
- package: github.com/shurcooL/sanitized_anchor_name | ||
version: 10ef21a441db47d8b13ebcc5fd2310f636973c77 | ||
- package: github.com/opencontainers/go-digest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a version, please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a version, please?
We still need to cut one (opencontainers/go-digest#23), as @stevvooe mentions at in his earlier comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look at the immediately prior dependency. Version can be a hash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is in the glide.lock file. Only specify a version here if it matters when updating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's here in glide.yaml - every other package has an associated version. Can we just keep it consistent for now? IMO glide-update should ideally be idempotent unless someone intentionally requests otherwise.
LGTM |
The specification defines an algorithm to calculate a
ChainID
, whichcan be used to identify the result of subsequent applications of layers.
Because this algorithm is subtle and only needs to implemented in a
single place, we provide a reference implementation.
For convenience, we provide functions that calculate all the chain ids
and just the top-level one. It is is integrated with the
distribution/digest type for safety and convenience.
As part of this, the
identity
package has been introduced. Forconsuming code, a few helpers have been provide to ease transition as
the name of the upstream package has not yet been finalized. Users of
this package should employ
FromBytes
,FromString
andFromReader
where appropriate, which should ease the transition if these packages
change.
Tests are formulated based on pre-calculation of chain identifiers to
ensure correctness.
Signed-off-by: Stephen J Day [email protected]
Relates to #482.
Requires #487.