This repository has been archived by the owner on Jul 18, 2023. It is now read-only.

OCI artifact manifest, Phase 1-Reference Types #29

Closed

@SteveLasker SteveLasker commented Feb 10, 2021

PR Status

On the July 21, 2021 OCI call, and additional OCI TOB discussion, the following plan of action was decided:

  • Over the next few weeks the OCI TOB will vote on TOB#99, then a modified version of #96 to reflect the finalized OCI Working group template.
  • While OCI TOB finalizes the working group process, the implementation of Artifacts PR#29 will take place under the artifacts-spec repo that has been created under the oras-project.
    - The artifacts-spec project README reflects the intent that the project will be proposed to be onboarded to the OCI once the working group process is defined.
  • To avoid OCI branding and trademark concerns, the artifacts-spec will use oras mediaTypes and oras paths for apis, avoiding dependencies or conflicts to the distribution-spec based apis.
  • Once OCI defines a working group process that enables the collaboration of the artifacts-spec working group, onboarding of the artifacts-spec repo to OCI can begin.

I'm leaving this PR open, and intact with the current files, to preserve the comments.
We continue to implement and take input under oras-project/artifacts-spec


The OCI artifact manifest generalizes the use of OCI image manifest, by reducing the constraints on all artifacts, enabling specific artifact-specs to set constraints for their type. Phase 1 adds support for artifacts to reference other artifacts through a subjectManifest property enabling reference graphs, as those required for secure supply chain efforts.
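
For readers new to the thread, here is a minimal sketch of what such a manifest might look like. It uses the field names that appear later in this conversation (`artifactType`, `blobs`, `subjectManifest`) and the manifest media type quoted in the review comments. The `artifactType` and blob media type values, the stand-in content, and the `descriptor` helper are assumptions made for illustration, not values from the spec.

```python
import hashlib
import json


def descriptor(media_type: str, content: bytes) -> dict:
    """Build an OCI-style content descriptor (mediaType, digest, size)."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(content).hexdigest(),
        "size": len(content),
    }


# Stand-in bytes for an existing image manifest and a signature blob.
image_manifest_bytes = b'{"schemaVersion": 2}'
signature_bytes = b"example signature payload"

artifact_manifest = {
    "mediaType": "application/vnd.oci.artifact.manifest.v1+json",
    # Assumed artifactType value, for illustration only.
    "artifactType": "application/vnd.example.signature.v1",
    "blobs": [descriptor("application/vnd.example.signature.blob", signature_bytes)],
    # The subject is the manifest that this artifact extends.
    "subjectManifest": descriptor(
        "application/vnd.oci.image.manifest.v1+json", image_manifest_bytes
    ),
}

print(json.dumps(artifact_manifest, indent=2))
```

The reference points from the new artifact to its subject, so the subject's own manifest and digest stay unchanged when a signature or SBoM is added.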

Phase 1: Reference Types

The PR focuses on Phase 1, enabling reference type support in 2021, supporting secure supply chain artifact types including signatures and SBoMs.

Phase 2: Generic Artifact Versioning Support

Phase 2 will focus on the scenarios outlined in PR #37.

By splitting these out into phases, we can reduce the scope, for 2021, while providing time for phase 2 to evolve.

See: artifact-manifest.md for the overview of content, and artifact-manifest-spec.md for spec details.

Signed-off-by: Steve Lasker [email protected]

@sudo-bmitch

This is looking really good. I am still trying to figure out how to tie in use cases that aren't attached to a specific image manifest, but instead the entire repository. Examples that come to mind are TUF targets and snapshots that represent the current state of all known signed images in a repository. Another example could be repository metadata of when it was created, who owns the repo, number of stars, number of pulls, etc.

Ideally, I'd like to have a way to query for these that doesn't conflict with the image tag namespace. If there's a way to query for an artifact by type, but without specifying the attached image digest, I think we'd have a solution.

@SteveLasker
Contributor Author

I am still trying to figure out how to tie in use cases that aren't attached to a specific image manifest, but instead the entire repository

Due to the high concurrency of content pushed/pulled to a registry, I don't believe we have a design to handle this. I'm also not sure we have a requirement.

Another example could be repository metadata of when it was created, who owns the repo, number of stars, number of pulls, etc.

This is yet another round of updates I'm hoping we can layer in, once we get past the new OCI Artifact Manifest discussions. See Adding Metadata Services to OCI Distribution-Draft for some initial thoughts. It would account for registries serving [read-only] content, such as pull count, "stars upon thars". I suspect the meta-data queries will come into the list API requirements as well. See Show/Get-Info API Requirements #232-Data Returned

@nishakm

nishakm commented Mar 9, 2021

Is there any resolution for @jonjohnsonjr's suggestion on using the OCI index to map references? Something like:

{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.index.v1+json",
      "size": 7143,
      "digest": "sha256:0228f90e926ba6b96e4f39cf294b2586d38fbb5a1e385c05cd1ee40ea54fe7fd",
      "annotations": {
        "org.opencontainers.image.ref.name": "stable-release"
      }
    },
    {
      "mediaType": "application/vnd.cncf.notary.v2+json",
      "size": 7143,
      "digest": "sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f",
      "references": [
        {
          "type": "signature",
          "artifact": "sha256:0228f90e926ba6b96e4f39cf294b2586d38fbb5a1e385c05cd1ee40ea54fe7fd"
        }
      ]
    }
  ],
  "annotations": {
    "com.example.index.revision": "r124356"
  }
}
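
As a rough sketch of how a client might use an index shaped like the one above, the snippet below looks up every entry that declares a `signature` reference to a given digest. The `references` field is part of this suggestion only and is not defined by the image-index spec; `find_references` is a hypothetical helper.

```python
IMAGE_DIGEST = "sha256:0228f90e926ba6b96e4f39cf294b2586d38fbb5a1e385c05cd1ee40ea54fe7fd"
SIGNATURE_DIGEST = "sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f"

# Same structure as the JSON above, trimmed to the fields the lookup needs.
index = {
    "schemaVersion": 2,
    "manifests": [
        {
            "mediaType": "application/vnd.oci.image.index.v1+json",
            "digest": IMAGE_DIGEST,
        },
        {
            "mediaType": "application/vnd.cncf.notary.v2+json",
            "digest": SIGNATURE_DIGEST,
            "references": [{"type": "signature", "artifact": IMAGE_DIGEST}],
        },
    ],
}


def find_references(index: dict, subject_digest: str, ref_type: str) -> list:
    """Return index entries whose `references` point at subject_digest with ref_type."""
    matches = []
    for desc in index.get("manifests", []):
        for ref in desc.get("references", []):
            if ref.get("artifact") == subject_digest and ref.get("type") == ref_type:
                matches.append(desc)
                break  # one matching reference is enough for this entry
    return matches


signatures = find_references(index, IMAGE_DIGEST, "signature")
print([d["digest"] for d in signatures])
```

Note that in this shape, adding a signature means rewriting the index, which changes the index digest.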

@SteveLasker
Contributor Author

We feel it's best to move forward with the proposal in this PR to decouple from image.manifest and image.index.
Using the new oci.artifact.manifest provides a clear definition for the references required for Notary & SBoMs. It also allows us to eventually add support for weak references, as sketched in #27

@nishakm

nishakm commented Mar 9, 2021

We feel it's best to move forward with the proposal in this PR to decouple from image.manifest and image.index.
Using the new oci.artifact.manifest provides a clear definition for the references required for Notary & SBoMs. It also allows us to eventually add support for weak references, as sketched in #27

There is some overlap between references and SPDX relationships. It seems to me that this could be useful here. Or maybe it's overkill and all we need is something describing undirected/directed and mandatory/optional references.

@SteveLasker
Contributor Author

These manifests (oci.image, oci.index, oci.artifacts) are very coupled to how content is stored in a registry, enabling content discovery, acquisition and eventual cleanup. Storing documents like SPDX, 3T-SBoM or others also makes sense as they are content that just happens to be in a registry. Mixing different manifest types can confuse things.
This is why I keep going back to what requirements are we trying to solve.

@jonjohnsonjr

jonjohnsonjr commented Mar 9, 2021

It would help me understand what's being proposed if you could revisit the OCI Artifact Manifest Properties section a bit.

I'd like to see a really rigorous description of these fields, similarly to how index and image are defined. Specifically, I want to understand what they mean.

Your current descriptions are really abstract and don't really describe the actual semantics of the fields. Let's separate out the format semantics from your expectations of how registries handle these so that we can discuss those individually. You also introduce a concept of "Extension artifacts" without defining what that is.

I have a feeling that you don't really care about the artifact format -- you actually care about the semantics of the relationships between artifacts. If I'm right, I would suggest that defining a new artifact is a terrible idea, and that what you actually want is to augment the properties of a descriptor such that we can express new kinds of relationships.

Your current proposal seems to be limited in that only new artifact manifests are allowed to have these new kinds of relationships, which seems inflexible and less powerful than enhancing the existing relationship abstraction we already have (the descriptor). I'd like to be able to express these kinds of things using existing formats and new formats. I don't want to have to invent a new format to express any other kinds of relationships we come up with.

@SteveLasker
Contributor Author

I just did a presentation on the new OCI Artifact Reference types and their supported scenarios and needs. The deck is here. As the videos are uploaded here I'll update with the specific link.

I have a few Notary v2 and ORAS updates to complete for Notary prototype-2. After that, I'll convert the current examples to an actual spec oci-artifact-manifest-spec.md, identifying the specifics you and others have been asking for.

For example:

  • [manifests] references must be in the same repo as they extend another artifact.
  • [manifests] entries are optional. Individual artifacts, like OPA or even the current implementation of Helm and CNAB could use this new manifest and simply not use [manifests] until they need them.
  • Artifacts that use oci.artifact.manifest and include a [manifests] entry are subject to deletion when their referenced manifest is deleted. If you delete net-monitor:v1, all the Notary v2 signatures and associated SBoMs would be deleted. (ref counted -1)
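
The deletion behavior in the last bullet can be sketched with a toy in-memory repository. This is only an illustration under simplifying assumptions: `Repo` is hypothetical, the [manifests] entries are reduced to plain digest strings, and a real registry would decide for itself how and when to reclaim content.

```python
from collections import defaultdict


class Repo:
    """Toy in-memory repository, for illustration only."""

    def __init__(self):
        self.manifests = {}                # digest -> digests it references
        self.referrers = defaultdict(set)  # referenced digest -> referring digests

    def put(self, digest, subjects=()):
        for subject in subjects:
            self.referrers[subject].add(digest)
        self.manifests[digest] = tuple(subjects)

    def delete(self, digest):
        """Delete a manifest and, transitively, every artifact that references it."""
        deleted = []
        stack = [digest]
        while stack:
            current = stack.pop()
            if current not in self.manifests:
                continue  # already removed
            subjects = self.manifests.pop(current)
            deleted.append(current)
            # Unlink this manifest from the ones it pointed at.
            for subject in subjects:
                self.referrers.get(subject, set()).discard(current)
            # Everything that pointed at this manifest goes too.
            stack.extend(self.referrers.pop(current, ()))
        return deleted


repo = Repo()
repo.put("sha256:image")                                 # net-monitor:v1
repo.put("sha256:sig1", subjects=["sha256:image"])       # signature of the image
repo.put("sha256:sbom", subjects=["sha256:image"])       # SBoM of the image
repo.put("sha256:sbom-sig", subjects=["sha256:sbom"])    # signature of the SBoM
repo.put("sha256:other")                                 # unrelated artifact

deleted = repo.delete("sha256:image")
print(sorted(deleted))
```

Deleting the image removes its signature, its SBoM, and the SBoM's signature, while the unrelated artifact stays.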

We have been through several rounds of discussions for changing the descriptor or one of the existing manifests. These were all non-starters, with lots of filibustering. Rather than thrash existing schemas, implying a lot of instability to tooling that's already making lots of assumptions about the current manifests, we're focused on the new manifest to address the new needs. Since it's a superset of image.manifest, there's nothing stopping the current image tools from adopting it. It could be the basis of the versioning problem we're having with any changes to image.manifest.

@jonjohnsonjr

jonjohnsonjr commented Mar 9, 2021

The deck is here.

Is this available in a less hostile file format?

filibustering

sigh

Since it's a superset of image.manifest

I don't believe you understand what superset means.

It could be the basis of the versioning problem we're having with any changes to image.manifest.

This does not solve any problems with versioning. It's just a new version. There aren't any proposed mechanisms for how to change it that differ in any way from what we have today, as far as I can tell.

@nishakm

nishakm commented Mar 9, 2021

Regarding requirements: What are they exactly? This is what I have been able to grok thus far:

  • We want to store artifacts that are related to a container image (signatures, SBoM, supplemental artifacts, etc)
  • We want to store artifacts that reference one or more container images edit: and their related artifacts (Helm charts, CNAB, k8s deployments, etc)
  • We want all of these collections of related and referenced artifacts to be movable from registry to registry without changing their relationships

From the garbage collection point of view, it makes sense to me that there needs to be a "root" that has all the connections to all of the artifacts, and OCI index seems to be a good candidate for it. But I can also see the need for something that describes all of these artifacts and their relationships and this is where the SBoM can actually help. Things like Helm charts and CNABs can have their own SBoM that describes all the related and required artifacts such as the container images and the signatures for the container images.

Regarding the digest of index.json, I don't think this is a problem. Folks want to know what changed and where in the artifact tree the change happened. IMHO, the digests are the versions.

@nishakm

nishakm commented Mar 10, 2021

Your current proposal seems to be limited in that only new artifact manifests are allowed to have these new kinds of relationships, which seems inflexible and less powerful than enhancing the existing relationship abstraction we already have (the descriptor). I'd like to be able to express these kinds of things using existing formats and new formats. I don't want to have to invent a new format to express any other kinds of relationships we come up with.

IIRC, there were some concerns on allowing arbitrary content descriptors with regards to backwards compatibility with existing client tools. Initially, I had looked at content descriptors to describe things and their relationships. Unfortunately, "backwards compatibility" seems to be the de-facto reason for not including something in the spec so my recollection may be faulty.

Personally, I think there is nothing stopping registries from being instantiated as an "everything else" storage solution like bundle.bar and creating a whole distributed thingy around that, including a new artifact merkle DAG that has nothing to do with the image spec.

@jonjohnsonjr

jonjohnsonjr commented Mar 10, 2021

Initially, I had looked at content descriptors to describe things and their relationships.

I do this all over the place, and it's a good pattern. The content descriptor is a generic and useful abstraction, even outside of OCI, and I've been trying to get more people to adopt it instead of inventing new stuff.

Personally, I think there is nothing stopping registries from being instantiated as an "everything else" storage solution like bundle.bar and creating a whole distributed thingy around that, including a new artifact merkle DAG that has nothing to do with the image spec.

This is exactly how the registry is designed and works today. I'm fine with creating a new kind of generic node in the DAG if we think we need one, but defining the semantics of that will be tricky. As far as I know, all registries today are "strongly typed" in that they only know how to parse a small number of node types (by their mediaType, as indicated in the Content-Type header): image and index.

Index is a list of pointers, so you can implement any kind of graph you want -- if you squint and think about Lisp, this is really powerful.

Image is a list of pointers + a special pointer. This is convenient, but not any more powerful than an index, really.

One unfortunate reality of dealing with registries in the wild is that there are vastly different interpretations of the image and registry specs, especially around garbage collection and what an image or index is allowed to reference. Can images only reference blobs? Can indexes reference blobs, or just manifests? What do we do if the registry doesn't understand a media type of a descriptor within a manifest? Should we just ignore it? Assume it's a blob? Assume it's a manifest? Are blobs and manifests in the same CAS namespace, or should those be treated separately -- e.g. if I push something through /manifests/ should it be readable through /blobs/ -- vice versa?

I've had a couple ideas around this (off topic but we can get into that if anyone is interested), but they would require registry operators to all agree on some semantics that are currently undefined and with mutually incompatible implementations :(

This is one reason I really want Steve to spell out the semantics of these new artifact types. Up until this point, we haven't defined anything about ref counting or garbage collection expectations. This new artifact type introduces requirements around that, so we need to address the baseline expectations of registries if we're going to layer on top of them. It doesn't make sense to define a weak reference if we don't also define a strong reference, or at least contrast the weak reference with "every other kind of reference is undefined behavior and registries can do whatever they want".

Unfortunately, "backwards compatibility" seems to be the de-facto reason for not including something in the spec so my recollection may be faulty.

I've brought up ~two separate concerns around backward compatibility, and I don't think I've done a great job of expressing my points, so let me try to clarify:

  1. If it's possible to adapt your use case to work with existing clients and registries such that we don't have to change anything and everything continues to work, we should do that. This was roughly the conclusion of the OCI Artifacts stuff, I believe.
  2. If we really need to add new functionality to clients or registries to support a new use case, let's do it in the least disruptive way possible:

I think we've gone past the first point and into the second point now, since registries will need to maintain or produce an inverted index for weak references. As I've said before, weak references and inverted indexes would be generally useful constructs for other artifacts, and I think they should be pulled out of this massive, confused proposal so that we can talk about the best way to go about implementing them in isolation.
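
To illustrate what "produce an inverted index" could mean in the simplest case, here is a sketch that scans forward references and flips them. The function name and the digest-to-digests input shape are assumptions for illustration; nothing in this thread prescribes how a registry would store or compute this.

```python
from collections import defaultdict


def build_inverted_index(manifests: dict) -> dict:
    """Map each referenced digest to the sorted digests of manifests that reference it.

    `manifests` maps a manifest digest to the list of digests it points at.
    """
    inverted = defaultdict(list)
    for digest, targets in manifests.items():
        for target in targets:
            inverted[target].append(digest)
    return {target: sorted(sources) for target, sources in inverted.items()}


# Forward references: the signature and the SBoM both point at the image.
forward = {
    "sha256:image": [],
    "sha256:sig": ["sha256:image"],
    "sha256:sbom": ["sha256:image"],
}

inverted = build_inverted_index(forward)
print(inverted)
```

Scanning on demand is linear in the number of manifests; a registry that maintains the index on every push trades write cost for faster lookups.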

I have a huge problem with just adding another artifact type and defining entirely new semantics for only that artifact type because it doesn't fit into the existing design of OCI data structures at all. We also ran into a similar problem with foreign layers, which I believe similarly landed in docker and OCI by fiat from Microsoft because it was a business requirement. It doesn't fit into the model, doesn't compose with other abstractions, is completely under-specced, and is a huge source of bugs -- they even have a CVE!

I'll try to explain again my issue with this, abstractly, in terms of boxes and arrows:

The current proposal defines a new type of box that is very slightly different in shape from the existing boxes, but the primary feature of this new type of box is that it has a new kind of arrow, even though those arrows are defined in the exact same way as arrows coming out of other boxes, and look identical, so there's no indication that they should be treated differently outside of the definition of the box. Also, only some of the arrows coming out of the new box are of the new kind.

image

At this point I don't really care about stopping Steve from defining a new artifact type. I think it's a bad idea, but my primary goal is just to make the design of the new mechanism not bad. These dashed arrows shouldn't be specific to an artifact manifest. We have already formally specified the behavior of arrows. Why can't we make "dashed" a property of an arrow instead of a property of the box that contains the arrow? The Descriptor definition specifically calls out that it should be considered for extension before doing format-specific things:

Extended Descriptor field additions proposed in other OCI specifications SHOULD first be considered for addition into this specification.

@nishakm

nishakm commented Mar 10, 2021

Initially, I had looked at content descriptors to describe things and their relationships.

I do this all over the place, and it's a good pattern. The content descriptor is a generic and useful abstraction, even outside of OCI, and I've been trying to get more people to adopt it instead of inventing new stuff.

This section probably needs more examples then. I don't quite understand how "This section defines the application/vnd.oci.descriptor.v1+json media type." and "mediaType string: This REQUIRED property contains the media type of the referenced content" relate.

Unfortunately, "backwards compatibility" seems to be the de-facto reason for not including something in the spec so my recollection may be faulty.

I've brought up ~two separate concerns around backward compatibility, and I don't think I've done a great job of expressing my points, so let me try to clarify:

1. If it's possible to adapt your use case to work with existing clients and registries such that we don't have to change _anything_ and everything continues to work, we should do that. This was roughly the conclusion of the OCI Artifacts stuff, I believe.

I thought this was not possible as existing clients will either try to spin up a set of blobs when they shouldn't or barf when encountering a manifest layout they do not understand.

2. If we really need to add new functionality to clients or registries to support a new use case, let's do it in the least disruptive way possible:

I'm not sure existing clients are capable of addressing supplemental or related artifacts. However, index.json sounds like it's capable of accommodating an "artifacts" manifest as Steve has described. The relationships/references thing can be discussed some more. My other concern with the content descriptor is the requirement to adhere to IANA descriptors. I suppose one could just use json, but I am still unsure how to actually use them 😅.

Unfortunately, most of my concerns around this proposal aren't really captured by the notary requirements, so it's hard to argue with Steve who will only consider concerns valid if they can be mapped directly to a notary v2 requirement.

I think other folks also have the need to be able to reference supplemental artifacts to verify supply chain integrity, provenance, etc. The spec, as it is, doesn't meet the base 3 requirements I had listed above. Can we start there instead?

@SteveLasker
Contributor Author

If it's possible to adapt your use case to work with existing clients and registries such that we don't have to change anything and everything continues to work, we should do that. This was roughly the conclusion of the OCI Artifacts stuff, I believe.

Artifacts "v1" was really about formalizing what people were already doing: stuffing additional content types in a registry, and just making them look like images, by using the same mediaTypes of an oci.image. While it was easier to identify the type through a formal manifest.artifactType property, it was felt to be too risky to make a breaking change to the schema, and we could just use manifest.config.mediaType. So we did.

The new oci.artifact.manifest supports a new reference type. To your point, these are considered strong references. The weak references (#27) were deferred, for now. If the referenced artifact under [manifests] is deleted, the artifact referencing it should also be deleted (ref count -1). I'll get this written up in the oci-artifact-manifest-spec.md next week.

If we really need to add new functionality to clients or registries to support a new use case, let's do it in the least disruptive way possible:

The new oci.artifact.manifest is new, but not intended for the existing clients. In fact, it's explicitly avoiding the existing clients as a new manifest.mediaType, to ensure we can innovate without breaking compatibility.

I think we've gone past the first point and into the second point now since registries will need to maintain or produce an inverted index for weak references. As I've said before, weak references and inverted indexes would be generally useful constructs for other artifacts

Yes, we will need a new index, for which registry operators can choose their specific implementation. Just a minor point of clarity, as I'd like to think of these as strong/hard references. When you post an oci.image.manifest, the digests referenced by the manifest must already exist in the registry/repo. If not, the manifest put fails. This will be the same for entries in [manifests]. It would not be the case with [references] as defined in the punted #27 proposal.
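
A minimal sketch of that push-time check, assuming a toy repo modeled as a single dict keyed by digest. Keeping blobs and manifests in one namespace is a simplification for the example, and `put_manifest` and `ManifestUnknownError` are hypothetical names.

```python
class ManifestUnknownError(Exception):
    """Raised when a pushed manifest references content the repo does not hold."""


def put_manifest(repo: dict, digest: str, manifest: dict) -> None:
    """Store `manifest` only if every digest it references is already in `repo`."""
    referenced = [d["digest"] for d in manifest.get("blobs", [])]
    referenced += [d["digest"] for d in manifest.get("manifests", [])]
    missing = [d for d in referenced if d not in repo]
    if missing:
        raise ManifestUnknownError(f"missing referenced content: {missing}")
    repo[digest] = manifest


# The image manifest and the signature blob were pushed earlier.
repo = {"sha256:image": {}, "sha256:sigblob": b"signature bytes"}

# Accepted: both the blob and the referenced manifest exist.
put_manifest(
    repo,
    "sha256:sig",
    {"blobs": [{"digest": "sha256:sigblob"}], "manifests": [{"digest": "sha256:image"}]},
)

# Rejected: the referenced manifest was never pushed.
try:
    put_manifest(repo, "sha256:bad", {"manifests": [{"digest": "sha256:absent"}]})
    rejected = False
except ManifestUnknownError:
    rejected = True

print(rejected)
```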

I think they should be pulled out of this massive, confused proposal

What is massive and confusing?

The new manifest is pretty straightforward. It's a new manifest to decouple from image-specific scenarios. This frees up OCI Image v2, and allows artifacts, which could be images, to evolve cleanly.

  1. A new manifest.artifactType property to decouple from manifest.config.mediaType
  2. [layers] renamed to [blobs]
  3. [manifests] collection for "hard links" to existing manifests in the same repo.

image

At this point I don't really care about stopping Steve from defining a new artifact type. I think it's a bad idea,
Unfortunately, most of my concerns around this proposal aren't really captured by the notary requirements, so it's hard to argue with Steve who will only consider concerns valid if they can be mapped directly to a notary v2 requirement.

I'm mapping designs to meet requirements. Notary, SBoM, GPL Source, Nydus and other artifact types benefit from these. So, yes, these designs do map to requirements, not just Notary. If Notary v2 isn't adopted, these enhancements have value unto themselves. So, I'm not really sure what you're objecting to.

Usable workflows, enabled for the masses to easily create and consume Notary v2 signatures

We've incorporated a lot of great feedback, including the flow to push the image as a digest, push the signature, then do the tag update, so I think we're incorporating all relevant and actionable feedback. We've also demonstrated pretty clean workflows (nv2 demo script and nv2 video), so I'm still not sure what you're objecting to, or even what you're proposing. There's just a lot of debate. You don't have to agree. That's the beauty of opinions and extensions. You don't have to agree or even implement them.

The spec, as it is, doesn't meet the base 3 requirements I had listed above. Can we start there instead?

Can you list the 3 requirements?

@nishakm

nishakm commented Mar 11, 2021

Can you list the 3 requirements?

  • We want to store artifacts that are related to a container image (signatures, SBoM, supplemental artifacts, etc)
  • We want to store artifacts that reference one or more container images and their related artifacts (Helm charts, CNAB, k8s deployments, etc)
  • We want all of these collections of related and referenced artifacts to be movable from registry to registry without changing their relationships

I'm going to add a 4th one here: We need to be able to append artifacts based on their relationships

@SteveLasker
Contributor Author

SteveLasker commented Mar 11, 2021

Thanks @nishakm,
All 3 are covered in this proposal.
The PR has some example manifests, for a signature and SBoM here

Below is an image that shows how the individual artifacts are linked together:

  1. net-monitor:v1 image
  2. 3 linked signatures of the net-monitor:v1 image
  3. An SBoM, linked to the net-monitor:v1 image
  4. A signature of the net-monitor:v1 SBoM
  5. Yet Another Artifact Type (YAAT), linked to the SBoM
  6. A signature of the net-monitor:v1 - SBoM - YAAT.

All the downward arrows are represented by the existing manifests, and the config and [blobs] collection of the oci.artifact.manifest. The upward arrows represent the entries in the new [manifests] collection.

image

The target experience we're shooting for with the Notary prototype-2 is sketched here

ORAS will be used as a CLI, for demonstration purposes, but ORAS and nv2 will also provide libraries, so you can build this docker type experience

@SteveLasker
Contributor Author

Details on the oci.artifact.manifest spec provided. Including a change from manifests to references.

@SteveLasker
Contributor Author

SteveLasker commented Mar 18, 2021

I'm on the fence between using [manifests], [references], [manifest-refs] or something else.
The intent is a collection of manifests, as OCI artifacts can refer to other manifests. It's not intended to refer to other blobs.
While it dupes the name of manifests in the OCI Index, that's actually ok, as they both are a collection of manifests. The difference is the OCI Index is a "downward" collection of manifests that make up a thing, pivoted on platform/arch. While the OCI Artifact manifests are a reverse ("upward") reference to manifests, to extend their data.

The other thing to notice in this manifest is it's a subset of the oci-image restrictions. The intent dates back to the refactoring of various artifact types.
Distribution supports all types of artifacts, based on a few manifests.
OCI Artifacts is the means to generically define how something can be structured, to be stored.
Then, you have various Artifact specs, including the image-spec, that take advantage of the various manifests.

The setup here is the image-spec could be a more narrowly defined use of the oci.artifact.manifest spec as it provides a superset of capabilities, with a subset of constraints. It also has clearly defined versioning semantics.

image

aviral26 added a commit to aviral26/distribution that referenced this pull request Jun 17, 2021
aviral26 added a commit to aviral26/nv2 that referenced this pull request Jun 17, 2021
@SteveLasker
Contributor Author

SteveLasker commented Jul 14, 2021

To clarify the current high-level differences with the artifact-manifest and the existing image-manifest:

| Existing Image Manifest | Proposed Artifacts Manifest |
|---|---|
| config REQUIRED | config optional as it's just another entry in the blobs collection with a config mediaType |
| layers REQUIRED | blobs, which renamed layers to reflect general usage are OPTIONAL |
| layers ORDINAL | blobs are defined by the specific artifact spec. Helm isn't ordinal, while other artifact types, like container images MAY make them ordinal |
| manifest.config.mediaType used to uniquely identify different artifact types. | manifest.artifactType added to lift the workaround for using manifest.config.mediaType on a REQUIRED, but not always used property, decoupling config.mediaType from artifactType. |
| | subjectManifest OPTIONAL, enabling an artifact to extend another artifact (SBOM, Signatures, Nydus, Scan Results) |
| | /referrers api for discovering referenced artifacts, with the ability to filter by artifactType |
| | Lifecycle management defined, starting to provide standard expectations for how users can manage their content. It doesn't define GC as an internal detail |
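
One way to read the differences above is as a mechanical re-shaping of an image-style manifest. The sketch below is an interpretation, not part of the PR: `to_artifact_manifest` is a hypothetical helper, the example media types are placeholders, and reusing `config.mediaType` as the `artifactType` is an assumption based on how artifact types are identified today.

```python
from typing import Optional


def to_artifact_manifest(image_manifest: dict, subject: Optional[dict] = None) -> dict:
    """Re-express an image-style manifest in the proposed artifact-manifest shape."""
    config = image_manifest["config"]
    artifact = {
        "mediaType": "application/vnd.oci.artifact.manifest.v1+json",
        # Assumption: carry the type over from config.mediaType.
        "artifactType": config["mediaType"],
        # config becomes just another blob; layers are renamed to blobs.
        "blobs": [config] + list(image_manifest.get("layers", [])),
    }
    if subject is not None:
        # Optional link to the manifest this artifact extends.
        artifact["subjectManifest"] = subject
    return artifact


# Placeholder media types and digests, for illustration only.
image_style = {
    "schemaVersion": 2,
    "config": {"mediaType": "application/vnd.example.config.v1+json",
               "digest": "sha256:cfg", "size": 2},
    "layers": [{"mediaType": "application/vnd.example.content.v1.tar",
                "digest": "sha256:content", "size": 1024}],
}

converted = to_artifact_manifest(image_style)
print(converted["artifactType"], len(converted["blobs"]))
```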

The artifact manifest approach to reference types is based on a new manifest, enabling registries and clients to opt-into the behavior, with clear and consistent expectations, rather than slipping new content into a registry, or client, that may, or may not know how to lifecycle manage the new content. See Discussion of a new manifest #41

@SteveLasker SteveLasker force-pushed the oci-artifact-manifest branch from 8b7d4ff to 67857e6 Compare July 14, 2021 23:37
- The max number of blobs is not defined, but MAY be limited by [distribution-spec][oci-distribution-spec] implementations.
- An encountered `descriptor.mediaType` that is unknown to the implementation MUST be ignored.

- **`subjectManifest`** *descriptor*

I would just call this subject, since there is no reason to restrict to a specific manifest.

Contributor Author

How would that be different than another entry in the [blobs] collection?
This has been the subject of the cycles and lifecycle management questions, which revolve around the manifest being the thing we reason about for user interaction, while the blobs have been implementation details the client and registry can optimize around.
I'd like to have more conversation around this to understand what it would mean for an artifact to contain blobs, but link to other blobs, as opposed to focusing on artifacts referencing other artifacts (manifests).


This field contains the `mediaType` of this document, differentiating from [image-manifest][oci-image-manifest-spec] and [oci-image-index]. The mediaType for this manifest type MUST be `application/vnd.oci.artifact.manifest.v1+json`, where the version WILL change to reflect newer versions. Artifact authors SHOULD support multiple `mediaType` versions to provide the best user experience for their artifact type.

- **`artifactType`** *string*

How is an artifactType different than a mediaType? Why not just have a mediaType?

Member

as you determined above this spec reads as it is looking at "this" format (manifest) to be defined by mediaType and artifactType is a sub-type within artifacts registered with iana.. with certain additional rules and content types expected in the blobs (artifact blobs)

Is this a second type system then?

Contributor Author

I may be missing some level of detail in the question here.
There are a few manifests that registries would know how to process. Hopefully, we can converge on a few, and not have too many, but we are adding one more to generalize and add capabilities over the oci.image.manifest.
The registries need to know about these manifests, as defined by the manifest.mediaType. At least that's how I read the difference between image.manifest and image.index.

This decouples the registry from knowing about the specific artifacts, just like a filesystem knows how to store files, but doesn't care about the file.extension.

artifactTypes are the means by which Helm, WASM, Notary, SPDX, CycloneDX, Cosign identify themselves. The registry doesn't care from a processing perspective for how to store and retrieve it. But, the registry UX could show icons and details for the type and security scanners can determine how they scan and process the different types of artifacts, as they know what it is, so they know how to scan them. Similarly, each artifact tooling would know if they should continue processing the manifest, or reject it. Just like Word knows how to reject opening a .mp4 file, vs a .docx file.
And the /referrers API can filter by artifactType, so it could be asked to return only artifactType=cncf.notary.v2 manifests.
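
The filtering described here can be sketched in a few lines of Python. This is an illustration only, not the actual referrers API: `filter_referrers` and the sample manifest list are hypothetical, and the only fields assumed are `artifactType` and `subjectManifest` as named in this PR, plus the `digest` field of a descriptor.

```python
from typing import Iterable, Optional


def filter_referrers(manifests: Iterable[dict], subject_digest: str,
                     artifact_type: Optional[str] = None) -> list:
    """Return manifests whose subjectManifest points at subject_digest.

    Hypothetical helper: a real registry would answer this from its own
    index rather than by scanning a list of parsed manifests.
    """
    matches = []
    for manifest in manifests:
        subject = manifest.get("subjectManifest") or {}
        if subject.get("digest") != subject_digest:
            continue
        # artifact_type=None means "no filter": return every referrer.
        if artifact_type is not None and manifest.get("artifactType") != artifact_type:
            continue
        matches.append(manifest)
    return matches


# Illustrative data only; the digests are placeholders, not real content hashes.
IMAGE_DIGEST = "sha256:" + "a" * 64
stored = [
    {"artifactType": "cncf.notary.v2", "subjectManifest": {"digest": IMAGE_DIGEST}},
    {"artifactType": "example.sbom", "subjectManifest": {"digest": IMAGE_DIGEST}},
    {"artifactType": "cncf.notary.v2", "subjectManifest": {"digest": "sha256:" + "b" * 64}},
]

signatures = filter_referrers(stored, IMAGE_DIGEST, artifact_type="cncf.notary.v2")
```

The registry never has to understand what a `cncf.notary.v2` artifact contains; it only compares the string.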

@deleteriousEffect deleteriousEffect Jul 23, 2021

I agree with Steve here, having a spec that allows any kind of typed object to be put in a field requires registry implementations to become stricter with what they accept compared to today. For example, registries might reject an otherwise well-formed and to spec OCI Image because they don't recognize the config blob's media type, since they need to verify each object is something they know how to deal with in relatively fundamental ways. For distribution, this is as basic as knowing where this object should be located at all, given the digest.

I don't believe ignoring the unknown mediatypes is appropriate for registries accepting pushes either, as this has implications for lifecycle management. For example, I believe some implementations accepted buildkit caches, but since they are OCI Image Index which presumably contained manifests, the associated blobs were deleted during garbage collection. This is pretty opaque to the end user, and I believe that registries which do lifecycle management of manifests and their referenced objects should return MANIFEST_INVALID during push if they are not able to maintain the lifecycle of the manifest correctly.

Having broad categories of "manifests" and "blobs", which are distinctions that I think most people here understand conventionally, allows registries to have a basic understanding of the level of expectations around an object without having to know the media type beforehand. There are already separate API routes for blobs and manifests, so this distinction exists within the v2 API spec today.

For manifests, this does mean that registry implementations will have to know about the mediatype to read the manifest content correctly, since registry implementations are expected to perform actions that require inspecting the manifest, such as validation. However, for an object that's classified as a blob, the registry implementation is safe to make certain assumptions, such as that the object will not reference other objects in a way that's relevant to the registry, and that it should not be parsed as JSON. Having this clarity would allow registries and clients enough wiggle room to work together without having to keep in lock step.
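
A minimal sketch of that split, in Python, under stated assumptions: the set of known manifest types, the `PushError` class, and the `accept_manifest` / `accept_blob` helpers are all hypothetical and do not come from any registry implementation. Only the `MANIFEST_INVALID` code and the media type strings are taken from this thread and the OCI specs.

```python
import json

# Manifest media types this hypothetical registry knows how to parse and
# lifecycle-manage. The exact set is an assumption for illustration.
KNOWN_MANIFEST_TYPES = {
    "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.oci.artifact.manifest.v1+json",
}


class PushError(Exception):
    """Hypothetical error carrying a registry error code."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code


def accept_manifest(content_type: str, body: bytes) -> dict:
    """Validate a manifest push; reject types whose lifecycle we cannot manage."""
    if content_type not in KNOWN_MANIFEST_TYPES:
        raise PushError("MANIFEST_INVALID", f"unsupported manifest type {content_type!r}")
    try:
        return json.loads(body)
    except ValueError as exc:  # covers invalid JSON and undecodable bytes
        raise PushError("MANIFEST_INVALID", "manifest is not valid JSON") from exc


def accept_blob(body: bytes) -> int:
    """Blobs are opaque: take the bytes without parsing or following references."""
    return len(body)
```

Rejecting an unknown manifest type at push time surfaces the problem to the user immediately, instead of silently garbage-collecting referenced blobs later.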

@deleteriousEffect I generally agree with you, but I don't see how this relates to artifactType.

This REQUIRED property specifies the artifact manifest schema version.
For this version of the specification, this MUST be `3`. The value of this field WILL change as the manifest schema evolves. Minor version changes to the `oci.artifact.manifest` spec MUST be additive, while major version changes MAY be breaking. Artifact clients MUST implement version checking to allow for future, as yet unknown changes. Artifact clients MUST ignore properties added in minor versions. Artifact clients MAY support major changes, but there is no guarantee of compatibility, as major changes MAY introduce breaking behavior. Artifact authors MAY support new and older schemaVersions to provide the best user experience.

- **`mediaType`** *string*

Embedding the mediaType directly in the object itself makes future migration challenging or impossible. I know this supports being able to detect the type of the object, but usually this should be done through a specific descriptor path.

If we want to have this ability, we should come up with a different name. Using the mediaType to set the content of "THIS" document will be confusing, since we use it in both roles.

Contributor Author

I'm not sure how this correlates with the ^ q&a.

This is a good thread to tug on, for some background: opencontainers/image-spec#411 (review)


For **Phase 1**, an artifact manifest provides an optional collection of blobs and a reference to the manifest of another artifact.

- **`schemaVersion`** *int*

I would just drop this, as it is a vestige of old types. No need to carry this into new versions.

Member

@mikebrow mikebrow Jul 21, 2021

can I get a hell yes! ( to removing it)

Contributor Author

I've always wondered how schemaVersion relates to the embedded mediaType version: application/vnd.oci.artifact.manifest.v1+json
I'd be happy to avoid the duplication.

All of everything we see today is schemaVersion 2. It was just used to differentiate it early on. It's not needed for new manifests where this doesn't matter.

Contributor Author

Just asking for clarity.
There is a difference in the schema of the .json document between image.manifest, image.index and artifact.manifest
What I believe you're saying is the manifest.mediaType version provides enough info for how to process the .json document?

> What I believe you're saying is the manifest.mediaType version provides enough info for how to process the .json document?

Implementations generally shouldn't be trying to peek inside an artifact in order to understand how to process it. By that point, it's often too late. Instead, content should generally be presented alongside a descriptor, which contains the mediaType of the content.

For a registry, these descriptors are sometimes communicated via HTTP headers by the client on push and the server on pull. Sometimes, the descriptors are embedded within other content, e.g. when pulling a manifest list, you select one of the manifests entries (based on some criteria, e.g. platform) and process what you fetch based on the mediaType in that descriptor.

If you try to use the embedded mediaType to process everything, you'll probably end up having to parse content twice. Once to find its mediaType, then again to parse based on the mediaType.

Very old clients didn't send or process HTTP headers sufficiently to behave properly, so this schemaVersion existed to differentiate between schema 1 and schema 2 images. Now that we live in a universe with a nice type system, that shouldn't be necessary.

If you were to add a schemaVersion field to a new kind of artifact, you absolutely don't want to set it to 1 or 2, because this could break old clients that are using schemaVersion to select between two manifest schemas. You could use schemaVersion: 3, but it's easier to just drop it altogether and rely on mediaTypes and descriptors in order to parse content.
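
The descriptor-first approach described above might look like the following Python sketch. The parser functions and the `PARSERS` table are hypothetical stand-ins; the point is only that the parser is chosen from the descriptor's `mediaType` (from an HTTP header or a parent document) before the content is parsed, so the content is parsed once.

```python
import json

IMAGE_MANIFEST = "application/vnd.oci.image.manifest.v1+json"
IMAGE_INDEX = "application/vnd.oci.image.index.v1+json"


def parse_image_manifest(doc: dict) -> str:
    # Stand-in for real processing of an image manifest.
    return f"image manifest with {len(doc.get('layers', []))} layer(s)"


def parse_image_index(doc: dict) -> str:
    # Stand-in for real processing of an image index.
    return f"image index with {len(doc.get('manifests', []))} manifest(s)"


# Dispatch table keyed on the descriptor's mediaType, not on any field
# embedded in the content itself.
PARSERS = {IMAGE_MANIFEST: parse_image_manifest, IMAGE_INDEX: parse_image_index}


def process(descriptor: dict, content: bytes) -> str:
    """Choose the parser from the descriptor, then parse the content once."""
    parser = PARSERS.get(descriptor["mediaType"])
    if parser is None:
        raise ValueError(f"unsupported mediaType {descriptor['mediaType']!r}")
    return parser(json.loads(content))


summary = process({"mediaType": IMAGE_INDEX}, b'{"manifests": [{}, {}]}')
```

No `schemaVersion` check appears anywhere: an unsupported type is rejected before the bytes are touched.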

Contributor Author

Thanks Jon,
Based on all this feedback and more understanding for how versioning is processed for the document to safely evolve, I did remove schemaVersion from the current artifact.manifest and examples


Phase 1 of the OCI Artifact spec will support reference types to existing [OCI Artifacts][oci-artifacts]. The REQUIRED `artifactType` is a unique value, as registered with iana.org. See [registering unique types][registering-iana]. The `artifactType` is equivalent to OCI Artifacts that used the `manifest.config.mediaType` to differentiate the type of artifact. Artifact authors that implement `oci.artifact.manifest` use `artifactType` to differentiate the type of artifact. Example: (`example.sbom` from `cncf.notary`).
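
As a rough illustration of the shape being discussed, the Python sketch below assembles a Phase 1 artifact manifest for an SBOM that references an image. The field names (`mediaType`, `artifactType`, `blobs`, `subjectManifest`) come from this PR; the `descriptor` helper, the placeholder byte strings, and the `application/json` blob type are assumptions, and `schemaVersion` is left out because the thread reports it was removed.

```python
import hashlib
import json

ARTIFACT_MANIFEST_TYPE = "application/vnd.oci.artifact.manifest.v1+json"


def descriptor(media_type: str, content: bytes) -> dict:
    """Build a descriptor (mediaType, digest, size) for some content."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(content).hexdigest(),
        "size": len(content),
    }


# Placeholder bytes standing in for a real image manifest and SBOM document.
image_manifest_bytes = b'{"placeholder": "image manifest"}'
sbom_bytes = b'{"placeholder": "sbom"}'

sbom_manifest = {
    "mediaType": ARTIFACT_MANIFEST_TYPE,
    "artifactType": "example.sbom",
    # The artifact's own content, stored and served as opaque blobs.
    "blobs": [descriptor("application/json", sbom_bytes)],
    # The artifact this SBOM refers to, identified by its manifest descriptor.
    "subjectManifest": descriptor(
        "application/vnd.oci.image.manifest.v1+json", image_manifest_bytes
    ),
}

print(json.dumps(sbom_manifest, indent=2))
```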

- **`blobs`** *array of objects*

Member

note to self.. need to officially define layer in the distribution spec.. missed it.

Member

Not sure I like defining blobs here. Gut says CAS should own blobs, and image would define layers as an array of blobs having ordinality. Over here in the artifacts spec, this would be artifacts or artifactBlobs: an array of blobs where ordinality is optional.

Member

course we could do the same thing by changing objects to CAS objects or something similar..

I would prefer something pointing towards their use. Should this be "subjects"? How does it relate to the manifest?

Contributor Author

We seem to be discussing the differences between a manifest, which represents an artifact (container image, helm, wasm, opa, ...) and the content that makes up that artifact, represented as blobs

These are distinct types. They both use descriptors to define how they're persisted as CAS objects, but they have different meanings: the registry and a client process a manifest for its content, but should just store and serve blobs (aka layers).
I'm going to defer this one for now, as this concept, as I understand it, is core to the referenceTypes and the work being built atop it.
I'll suggest we continue this conversation as we transition to the artifacts-spec repo.

@SteveLasker
Contributor Author

Thanks for all the great feedback, including Hayley's great feedback above ^ around lifecycle management importance, and richer standards around manifests.

On the July 21, 2021 OCI call, and additional OCI TOB discussion, the following plan of action was decided:

  • Over the next few weeks the OCI TOB will vote on TOB#99, then a modified version of #96 to reflect the finalized OCI Working group template.
  • While OCI TOB finalizes the working group process, the implementation of Artifacts PR#29 will take place under the artifacts-spec repo that has been created under the oras-project.
    - The artifacts-spec project README reflects the intent that the project will be proposed to be onboarded to the OCI once the working group process is defined.
  • To avoid OCI branding and trademark concerns, the artifacts-spec will use oras mediaTypes and oras paths for apis, avoiding dependencies or conflicts to the distribution-spec based apis.
  • Once OCI defines a working group process that enables the collaboration of the artifacts-spec working group, onboarding of the artifacts-spec repo to OCI can begin.

Thank you for all the great feedback, and please help us round out artifacts under the oras-project, to continue to standardize registry capabilities for all artifact types.

@SteveLasker
Contributor Author

I got pinged by a few folks to keep this PR open for continued feedback, while the OCI Working group process evolves.

@SteveLasker SteveLasker reopened this Jul 23, 2021
| `layers` REQUIRED | `blobs`, renamed from `layers` to reflect general usage, are OPTIONAL |
| `layers` ORDINAL | `blobs` are defined by the specific artifact spec. Helm isn't ordinal, while other artifact types, like container images, MAY make them ordinal |
| `manifest.config.mediaType` used to uniquely identify different artifact types. | `manifest.artifactType` added to lift the workaround for using `manifest.config.mediaType` on a REQUIRED, but not always used property. |
| | `subjectManifest` OPTIONAL, enabling an artifact to extend another artifact (SBOM, Signatures, Nydus, Scan Results) |

Is there any concern in having subjectManifest REQUIRED? In the case it extends another artifact like another SBOM etc, it will include the manifest descriptor of the corresponding artifact

Contributor Author

@SteveLasker SteveLasker Aug 6, 2021

Please see the current version at https://github.com/oras-project/artifacts-spec/blob/main/artifact-manifest.md
subjectManifest is now optional.

Member

This link is a 404 for me. https://github.com/oras-project/artifacts-spec/blob/main/artifact-manifest.md seems to be the new location

Contributor Author

Ughh, github is good with repo renames, but not so good with file renames. We merged oras-project/artifacts-spec#5 to cleanup filenames.

I fixed the link above.

@mikebrow
Member

A lot of great content here... alas this draft will go read only soon as the artifacts mission is being moved to opencontainers/image-spec and this repo is being archived.

@mikebrow
Member

closing for now due to pending archive action.. pls reopen if archive is not completed and/or if you believe this close to be in error

@mikebrow mikebrow closed this Jul 17, 2023