Proposal: additional Pre-Defined Annotation Keys #1046

Open
evankanderson opened this issue Apr 5, 2023 · 17 comments · May be fixed by #1062

Comments

@evankanderson

evankanderson commented Apr 5, 2023

While attempting to use the Pre-Defined Annotation Keys to track container images back to the corresponding source code, we (re)discovered that org.opencontainers.image.source (URL to get source code for building the image) is insufficient to determine the source code associated with the image when building from a monorepo. (See #886 for an earlier, less action-oriented version of this issue.)

The particular flavor of this issue is that org.opencontainers.image.source is likely to be something like https://github.com/opencontainers/image-spec.git, but those URLs don't provide a way to specify a sub-path within a git repository, which may be useful when building multiple images from a single repo. This is a common practice for many projects, including Kubernetes, Tekton, and Knative (three projects I've interacted with the most).

I'd like to propose documenting the following additional Pre-Defined Annotation Key:

org.opencontainers.image.source.subpath: A relative path within the source repository used as the base directory to build the container (string)

A few questions on definition which we can either ignore, document, or finesse to avoid defining:

  • Windows: do Windows image builds use Windows path conventions?
  • Default / empty value: Does this mean ., or unknown? If it means "unknown", does that mean tools should generally set .? If it means ., how much do all the existing images mess with this?
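To make the two open questions concrete, here is a minimal sketch of how a consuming tool might normalize the proposed value. It is only one possible reading, not something the spec defines: it assumes that an empty or missing value means "unknown" (rather than "."), that Windows-style separators are converted to forward slashes, and that the path must stay inside the repository. The function name `normalize_subpath` is hypothetical.

```python
from pathlib import PureWindowsPath
from typing import Optional


def normalize_subpath(raw: Optional[str]) -> Optional[str]:
    """Return a forward-slash relative path, or None when the value is unknown."""
    if raw is None or raw.strip() == "":
        # Assumption: empty or missing means "unknown", not "." (repository root).
        return None
    # PureWindowsPath accepts both "\\" and "/" as separators on any platform.
    path = PureWindowsPath(raw.strip())
    # Reject drive letters, leading slashes, and parent-directory escapes.
    if path.anchor or ".." in path.parts:
        raise ValueError(f"subpath must be relative to the repository: {raw!r}")
    return path.as_posix()
```

Under these assumptions, `cmd\controller` and `./cmd/controller` both normalize to `cmd/controller`, so producers on different platforms would emit comparable values.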

I'm happy to propose a PR if this would be useful. @imjasonh @sudo-bmitch as the last two to shepherd changes to this file.

@imjasonh
Member

imjasonh commented Apr 5, 2023

In theory there's nothing stopping you from including subpaths in your image.source value:

"org.opencontainers.image.source": "https://github.com/opencontainers/image-spec/path/to/sub/thing"

The spec language is pretty loose about how that's structured; it just says "URL to get source code". So if producers and consumers can agree on a format that is useful to them, I interpret that as allowed under the spec.

@evankanderson
Author

Given that source can be published as a .tgz or .zip, it feels like it's still useful to have a mechanism to indicate a path within the source code which is the basis for the build. I don't have a strong feeling about whether this would indicate the directory containing something like a Dockerfile or main.go, but making org.opencontainers.image.source be a URL which can't actually be fetched without special knowledge of e.g. GitHub URL structures doesn't feel great, especially since some other Git hosting providers like GitLab allow arbitrarily-nested paths to repos.
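A minimal sketch of the ambiguity, assuming a consumer that splits the URL path at a fixed depth (the `naive_split` helper and the `gitlab.example.com` URL are hypothetical, for illustration only):

```python
from urllib.parse import urlparse


def naive_split(url: str, repo_depth: int = 2):
    """Split a URL into (repo, subpath), assuming the repo is the first
    `repo_depth` path segments (owner/repo on GitHub)."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    repo = f"{parsed.scheme}://{parsed.netloc}/" + "/".join(segments[:repo_depth])
    subpath = "/".join(segments[repo_depth:])
    return repo, subpath


# Works for a GitHub-shaped URL:
github = naive_split("https://github.com/opencontainers/image-spec/path/to/sub/thing")

# Guesses wrong if the repository actually lives at group/subgroup/project:
gitlab = naive_split("https://gitlab.example.com/group/subgroup/project/path/to/sub/thing")
```

The second call reports `group/subgroup` as the repository, which is the kind of host-specific knowledge a separate subpath value would avoid.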

@sudo-bmitch
Contributor

Docker has a syntax that uses # and : to separate the git repo name from the path and tag/ref. I'm not sure how I feel about that for this use case; at the very least, the tag/ref is going to conflict with org.opencontainers.image.revision. Given that we already have a revision annotation, I get the logic of adding a path, but the value may be lower than hoped.
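For reference, a rough sketch of splitting that style of URL, assuming the `repo#ref:subdir` shape used for Docker git build contexts. The `split_git_context` helper is hypothetical and ignores edge cases that real tooling handles; it is only meant to show where the ref would overlap with the revision annotation.

```python
from typing import NamedTuple, Optional


class GitContext(NamedTuple):
    repo: str
    ref: Optional[str]     # overlaps with org.opencontainers.image.revision
    subdir: Optional[str]  # the part a subpath annotation would carry


def split_git_context(url: str) -> GitContext:
    """Split 'repo#ref:subdir' into its parts; missing parts become None."""
    repo, _, fragment = url.partition("#")
    ref, _, subdir = fragment.partition(":")
    return GitContext(repo, ref or None, subdir or None)
```

For example, `split_git_context("https://github.com/opencontainers/image-spec.git#main:schema")` yields the repo URL, the ref `main`, and the subdirectory `schema`.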

First, the source and revision annotations serve two purposes. Builders know where to look to rebuild. More importantly, image users can easily see whether the image is stale by looking at the commit history.

The diminishing return we are going to see here is that there is more than one build tool, and the commands to run to generate a build will differ by project. So just knowing the directory is not enough to create the image. Even if there's a Dockerfile, there may be commands that need to run on the host to set up the build, and args that need to be passed in.

Without knowing the path, the second use case of detecting stale images is possible with what we have now. And even if we add the path, the first use case of knowing how to rebuild the image still falls short in many situations. As an alternative, perhaps we need a free-form build command, but that could result in a quine-style issue, since the annotation to build the image may be included in the command itself.

@imjasonh
Member

imjasonh commented Apr 6, 2023

In general I interpret source and even revision as hints intended for humans, and not binding contracts for computers. If you happen to control both the producer and consumer of an image, you can stuff contractually-helpful context into it for both to use, but I don't think it's OCI's place to enforce that.

If your goal is source provenance, or reproducibility, which you'd maybe like to use to automate things, there are much more expressive and robust places to express that, e.g., in in-toto attestations that state the full build steps to reproduce, including where exactly to get source.

It occurs to me I probably should have asked "what are you trying to do" earlier in this thread. 😄

@evankanderson
Author

Imagine you have a big pile of images... no not quite that big. You'd like to understand which images are sequential updates to other images, for example to determine if a certain image is stale.

This works great, as @sudo-bmitch points out, for determining whether an individual image is stale, but it becomes more difficult if you have e.g. 12 independently-released functions which are developed in a monorepo style. source and revision are no longer sufficient to determine whether each function image is up-to-date without starting to take dependencies on conventions like pushing each function to a separate image name (and not copying them later into a shared repo).

Basically, we're attempting to correlate built images in a repository with earlier versions of the same image from the same supply chain / build tool, and we've discovered that Git monorepos (vs a Piper-style "I am the world" monorepo) don't have enough annotation metadata to distinguish e.g. the "fetch" and "ingest" functions in this repo:

Both containers have:

org.opencontainers.image.source: https://github.com/evankanderson/function-weather-demo.git
org.opencontainers.image.revision: 318fcc8

I want:

org.opencontainers.image.source: https://github.com/evankanderson/function-weather-demo.git
org.opencontainers.image.source.subpath: fetch
org.opencontainers.image.revision: 318fcc8

and

org.opencontainers.image.source: https://github.com/evankanderson/function-weather-demo.git
org.opencontainers.image.source.subpath: ingest
org.opencontainers.image.revision: 318fcc8
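A small sketch of the collision, using the annotation values above. The `identity` helper is hypothetical, and the `subpath` key is the one proposed in this issue, not part of the current spec.

```python
SOURCE = "org.opencontainers.image.source"
REVISION = "org.opencontainers.image.revision"
SUBPATH = "org.opencontainers.image.source.subpath"  # proposed here, not in the spec


def identity(annotations: dict, keys: tuple) -> tuple:
    """Build a comparison key from the selected annotations."""
    return tuple(annotations.get(k) for k in keys)


base = {
    SOURCE: "https://github.com/evankanderson/function-weather-demo.git",
    REVISION: "318fcc8",
}
fetch = {**base, SUBPATH: "fetch"}
ingest = {**base, SUBPATH: "ingest"}

# With today's keys the two images are indistinguishable...
same_without_subpath = identity(fetch, (SOURCE, REVISION)) == identity(ingest, (SOURCE, REVISION))
# ...while the proposed key separates them.
same_with_subpath = (
    identity(fetch, (SOURCE, SUBPATH, REVISION)) == identity(ingest, (SOURCE, SUBPATH, REVISION))
)
```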

@evankanderson
Author

Obviously, I could make up my own annotation key, but it would be nice to document it in a way that other tools could also benefit. Also obviously, many tools aren't setting these annotations today, but that doesn't stop us from trying to do better.

@sudo-bmitch
Contributor

For that problem, the directory may not be enough. From the same directory, multiple images may be generated. E.g. different build tools, docker can point to different Dockerfiles, build args can change the build. Some kind of additional identifier is needed but the directory for your use case may end up being a build arg value for another.

@evankanderson
Author

I defined the argument as subpath to allow for tools which might need to reference a file instead of a directory.

I'm not sure how I'd handle "this make target in this subdirectory", but "this Dockerfile" should be referenceable. I'd be willing to make the string suitably generic to support "a Bazel target at this Bazel path" if someone wants to help wordsmith the description.

Maybe:

A tool-specific path within the source repository which may be used to distinguish different build targets in the same repository.

For my purposes, I'd like source + subpath to be a primary key for "equivalent" artifacts over time, and source + subpath + revision to identify specific instances of those artifacts. This can be used (for example) with container scan results or SBOMs to track vulnerability exposure over time.
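As a rough illustration of that primary-key idea, the sketch below groups image records by source and subpath and orders each group by creation time. The record shape (`digest`, `created`, `annotations`) and the function name are assumptions made for this example, and the subpath key is the proposed one, not part of the current spec.

```python
from collections import defaultdict

SOURCE = "org.opencontainers.image.source"
SUBPATH = "org.opencontainers.image.source.subpath"  # proposed, not in the spec


def group_by_build_target(images):
    """Group image records by (source, subpath), oldest first within each group.

    Each record is assumed to be a dict with 'digest', 'created' (an ISO 8601
    UTC timestamp string), and 'annotations' (a dict of annotation values).
    """
    groups = defaultdict(list)
    for image in images:
        annotations = image["annotations"]
        key = (annotations.get(SOURCE), annotations.get(SUBPATH))
        groups[key].append(image)
    for history in groups.values():
        # ISO 8601 timestamps in the same timezone sort correctly as strings.
        history.sort(key=lambda img: img["created"])
    return dict(groups)
```

Scan results or SBOMs keyed by digest could then be joined against each group to follow one build target over time.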

@evankanderson
Author

It doesn't sound like there's a strong objection (though perhaps a bit of "who will use that") for adding the following pre-defined annotation key:

Key: org.opencontainers.image.source.subpath
Meaning: A tool-specific path within the source repository which may be used to distinguish different build targets in the same repository. For example, the path to a Dockerfile or a directory to invoke a CNCF Buildpack in.

If that seems acceptable, I'll send a PR for that shortly.

@imjasonh
Member

I'm not wholly opposed to adding the annotation key, but I would remove the example. The rest of the proposed meaning makes sense and is, I think, the right amount of vague.

evankanderson linked a pull request May 12, 2023 that will close this issue
@tianon
Member

tianon commented May 13, 2023

Sorry, but my opinion here is aligned with Brandon's in #1046 (comment); namely, I don't think a single additional key is sufficient for most tools, even if it covers your use case, and I would still suggest solving your specific problem with your own annotation. 🙈

@evankanderson
Author

@tianon -- what do you think about the proposition that org.opencontainers.image.source is not sufficient to distinguish whether two containers were built from the same source code? Is there a different mechanism you would suggest to correlate container images from the same source code over time (or do you think that's not a good general purpose use case)?

@tianon
Member

tianon commented Jun 23, 2023

I don't think I would interpret org.opencontainers.image.source to be unique, nor that it was intended to be. I think it was intended to be a hint or clue, and that for a given build, more data is definitely necessary in order to reproduce it, and just how much additional data is necessary is going to differ from tool to tool and even build to build within a given tool. 😅

In Docker's BuildKit tooling, this manifests as full provenance objects, for example: https://explore.ggcr.dev/?blob=docker/dockerfile@sha256:5bb344bbbc250f42b6cf85904aaec1feb8125af97d0e3f0302620e17d54224cc&mt=application%2Fvnd.in-toto%2Bjson&size=14542

@evankanderson
Author

I'm currently trying to consume annotations to determine, for example, "is container X a replacement for container Y" across different tools. While I certainly can invent new metadata and parsing code for each build tool in order to make that determination, I was hoping that this group could recommend a standard annotation so that build tool authors and container-curation tool authors could answer the question about the past/future relationship between two container images.

Is that a reasonable thing to attempt, regardless of whether or not this annotation is the right mechanism for implementing it?

@imjasonh
Member

I don't think it's a good fit for OCI to define an annotation for "are these the same thing" or "is this a replacement". The specifics of those statements are nebulous and shifting and context-dependent.

I'd recommend defining your own annotation, where you can define the semantics of it yourself.

@vbatts
Member

vbatts commented Jun 23, 2023 via email

@evankanderson
Author

> I don't think it's a good fit for OCI to define an annotation for "are these the same thing" or "is this a replacement". The specifics of those statements are nebulous and shifting and context-dependent.

Sorry, I was trying to explain a specific use-case, but the annotation would effectively be a correlation ID in a stream-processing sense -- image X and image Y are both related to the same underlying application built out of a (git) monorepo. Considering a case like e.g. https://github.com/tektoncd/pipeline/tree/main/cmd, where the source would be https://github.com/tektoncd/pipeline.git, I would like to be able to use image metadata to determine whether a given image is a release of ./cmd/entrypoint or ./cmd/controller (for example). Practical examples of this correlation include understanding the release cadence of specific images or trends in image size or included vulnerabilities over time. In the particular case of Tekton, I imagine that the ko tool would need to store this metadata (today it stores it in a history[n].created_by string as e.g. "ko build ko://github.com/tektoncd/pipeline/cmd/events", which doesn't seem standard).
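To show what consuming that non-standard string looks like today, here is a hedged sketch that recovers a repository-relative path from a ko-style created_by value. It assumes the Go import path is simply the repository host and path followed by the subpath, which does not hold for vanity import paths, and the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def subpath_from_ko_history(created_by: str, source: str) -> Optional[str]:
    """Derive a repo-relative path from a 'ko build ko://<import path>' string.

    Assumption: the import path starts with the host and path of `source`
    (minus a trailing '.git'). Returns None when that does not hold.
    """
    _, found, rest = created_by.partition("ko://")
    if not found or not rest.split():
        return None
    import_path = rest.split()[0]

    parsed = urlparse(source)
    repo_root = parsed.netloc + parsed.path
    if repo_root.endswith(".git"):
        repo_root = repo_root[: -len(".git")]

    if import_path == repo_root:
        return "."
    prefix = repo_root + "/"
    if import_path.startswith(prefix):
        return import_path[len(prefix):]
    return None
```

Every build tool would need its own parser along these lines, which is the per-tool work a shared annotation is meant to avoid.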

I agree with not putting this on a wiki -- if we don't think that it's worth adding an attribute to enable correlation of images built from a monorepo, then I think it's better to drop this feature altogether.

5 participants