Skip to content
This repository has been archived by the owner on Jul 18, 2023. It is now read-only.

Latest commit

 

History

History
226 lines (159 loc) · 17.9 KB

artifact-authors.md

File metadata and controls

226 lines (159 loc) · 17.9 KB

Open Containers Initiative

Artifact Authors Guidance

This reference provides guidance for authoring new artifact types.

OCI Artifacts Support

OCI Artifacts provides a reference for artifact authors and registry implementors to support new artifact types with pre 1.0 implementations of the distribution-spec. More particularly, this repository has been tasked by the OCI TOB to serve 3 primary goals:

  1. Artifact Authors - guidance for authoring new artifact types, including a clearing house for well known artifact types.
  2. Registry Operators and Vendors - guidance for how operators and vendors can support new artifact types, including how they can opt-in or out of well known artifact types. Registry operators that already implement mediaType filtering will not have to change. The artifact repo will provide context on how new mediaTypes can be used, and how mediaTypes can be associated with a type of artifact.
  3. Clearing House for Well-known Artifacts - artifact authors can submit their artifact definitions, providing registry operators a list by which they can easily support.

Dependencies for the Guidance Provided by this Reference

Scope

This reference is dependent on OCI manifest 1.0 to represent individual artifact types including OCI Image, Helm and Singularity.

Future Scope

Future versions will support new artifact types representing collections of artifacts using OCI Index. A means to identify an index as a type of artifact will be required.

Defining OCI Artifact Types

As registries become content addressable distribution points, tools that pull artifacts must know if they can operate on the artifact. Artifact types are equivalent to file extensions.

A set of comparable file extension use cases include:

  • When users open files, the host operating system typically launches the appropriate program.
  • When users open a file, from within a program, the open dialog filters to the supported types.
  • When search or security software scan the contents of a storage solution, the software must know how to process the different types of content.
  • When users view the storage contents, information about the content is rendered based on the derived type(s) of the content being shown.

OCI Artifacts enables these use cases for OCI distribution spec based registries by specifying a manifest.config.mediaType on the content pushed to a registry.

Visualizing Artifacts

The manifest config.mediaType is the equivalent of a file extension, enabling artifact type differentiation. An example of completed artifacts follows:

Icon Artifact config.mediaType
OCI Image application/vnd.oci.image.config.v1+json
Helm Chart application/vnd.cncf.helm.config.v1+json
Singularity, by Sylabs application/vnd.sylabs.sif.config.v1+json

Defining a Unique Artifact Type

A unique artifact type is similar to defining a file extension. Defining a unique artifact allows various tools to know how to uniquely work with the type. It allows a registry to display the type and tooling, such as vulnerability scanners, a means to know if and how they should interact with the contents.

Defining a unique type involves uniqueness for computer processing, and uniqueness for humans.

For computer processing, artifacts are defined by setting the manifest.config.mediaType to a globally unique value.

Note: The config.mediaType of application/vnd.oci.image.config.v1+json is reserved for artifacts intended to be run and instanced by docker, containerd and other OCI Image runtimes and tool chains.

Each new artifact type MUST be uniquely defined. See registering unique types with IANA for registering unique well-known mediaTypes.

Each mediaType SHOULD use all lower case characters. Multi-part names MAY concatenate words, or use - to separate words. eg: (subtype, sub-type)

The following config.mediaType format is used to differentiate the type of artifact:

[registration-tree].[org|company|entity].[objectType].[optional-subType].config.[version]+[optional-configFormat]

  • registration-tree - represents an IANA root tree. See iana registration trees for public and private tree options.
  • org|company|entity - represents an open source foundation (oci, cncf) or a company (microsoft, ibm) or some unique entity.
  • objectType - a value representing the short name of the type.
  • optional-subType - provides additional extensibility of an objectType
  • version - provides artifact authors the ability to revision their schemas and formatting, enabling tools to identify which format they will process.
  • optional-configFormat - while config.mediaType represents the unique identifier of an artifact, providing a config object is optional. If a config object is provided, .json, .yaml or other standard textual formats are preferred. By utilizing standard text formats, registry operators MAY parse the contents to provide additional context to users. If the config object is null, the optional-configFormat MUST be empty.

Example config.mediaTypes

Artifact Type mediaType
null config application/vnd.oci.null-sample.config.v1
Config as .json application/vnd.oci.json-sample.config.v2+json
Config as .yaml application/vnd.oci.yaml-sample.config.v3+yaml

See visualizing artifacts for additional mediaType examples.

Defining Supported Layer Types

The content of an artifact is represented through a collection of layers and an optional config object. The layers of an artifact may be independent files, collections of files, or ordered collection of tarballs persisted as blobs within a registries storage provider.

Defining artifact layers involves:

  1. Defining layer content, understanding layer de-duping
  2. Optional versioning of layer content
  3. Defining layer.mediaTypes

As an example, OCI Images are represented through an ordered collection of tar archives. Each tar archive represents a layer. Each layer overlays the previous layer.

Some artifacts may be represented as a single file, such as a config artifact representing a deployment details for an application. While other artifacts may include a config object, and a collection of binaries compressed as yet another layer. By separating the layers, the artifact author can benefit from layer de-duplication and concurrent downloading of each layer using various distribution-spec clients. How an artifact client processes layers, merging them with overlay, or extracting them to different locations or hosts is up to the artifact authors.

Layer Content Format

The content layer format is up to the artifact author and may utilize standard or custom formats. Authors are encouraged to utilize existing formats which may benefit from existing libraries to parse those formats. Authors may choose to persist individual files with standard or custom formats, such as .json, .yaml, .bin, .sif, or store a collection of files as a .tar archive. Optionally, the author MAY choose to compress the contents. The extension of the layer.mediaType MUST reflect the format of the blob and its optional compression.

Artifacts authors may define different layers for a range of reasons:

  1. Split layers due to size:
    • Splitting an artifact into multiple layers enables concurrent downloading.
    • Some artifacts will benefit from common layers, such as OCI Images having common base layers and/or common framework/runtime layers.
      For example, in OCI, the layers are ordered where each layer overlays the previous layer, creating a unified collection of files.
      All layer types are the same, with the position in the index being the important aspect.
  2. Split layers for different groupings, reuse and optimization:
    • An artifact may have non-unique environmental config and runtime collections of files. The runtime layer is sent to a host, while the configuration information may be sent to another location on the same host, or to a different host altogether.
    • Layers may split for different localization options, where an artifact client can determine which layers it requires for locales and language requirements. The runtime layer, or the localization layers may be individually updated, isolating the storage and download time for delta changes.
    • Layers may include optional source code, licenses or software bill of materials (SBoM) that are required for legal requirements or policy management. Different elements of a pipeline may pull the layers they require, while saving bandwidth, memory and compute processing for the layers they don't require.
    • In these cases, the ordered position of the layer may not be important, rather the sub-type of the layer identifies its purpose.

De-duping Layers

When designing layer formats, consider how a registry may de-dupe layers that share the same content across different artifacts. Layers may be shared across different instances of an artifact as they're persisted to a registry. The registry may handle de-duping based on the digest of each layer. Artifact layers types are not expected, nor required to be shared across different artifact types. A layer type that represents an OCI Image is not expected to be shared with a helm chart. However, two helm charts that have the same content may be de-duped within the registry. The layer.mediaType provides the definition of the contents.

Layer Versioning

Layer formats that may change should define a version to future proof new enhancements. How the artifact tooling processes forward and backward compatibility is up to the artifact and tooling authors.

Defining layer.mediaTypes

layer.mediaTypes fall into 2 categories:

  1. Existing generic formats: as defined by IANA.
    These types are neither artifact nor vendor|org|entity specific, nor are they versioned.
    Generic formats enable simple single layer scenarios, or where there's little to no differentiation between the different layers.
  2. Vendor|Org|Entity definitions: identify different versioned layer types within the same artifact.
    Unique definitions enable layers to be handled individually without assuming ordered positioning. Each layer may use an existing IANA format, but represent different usages within the artifact type.
    A Registry Doc artifact (application/vnd.oci.sample-repo-doc.config.v1) which represents the contents of a repo cloud be represented as:
    • application/vnd.oci.sample-repo-doc.summary.layer.v1.md - used as the readme of a repo, displayed on a registry webpage.
    • application/vnd.oci.sample-repo-doc.content.layer.v1.tar - provides detailed documentation, including multiple .md files for offline reading.

Vendor|Org|Entity definitions utilize the layer.mediaType with the following format:
[registration-tree].[org|company|entity].[layerType].[optional-layerSubType].layer.[version].[fileFormat]+[optional-compressionFormat]

  • registration-tree - represents an IANA root tree. See iana registration trees for public and private tree options.
  • org|company|entity - represents an open source foundation (oci, cncf), a company or some unique entity.
  • layerType - a value representing the short name of the type.
  • optional-layerSubType - provides additional extensibility of a layerType
  • version - provides artifact authors the ability to revision their schemas and formatting, enabling tools to identify which format they will process. Using a version enables artifact tools to identify the format for processing.
  • fileFormat - provides for standard, or custom formats, enabling artifact authors to leverage existing or custom content processing libraries.
  • optional-compressionFormat - in addition to the fileFormat, optional compression is used. (eg: +gzip, +zstd)

Example Layer Types

Artifact Type mediaType
Simple text file layer application/text
Non-compressed OCI Image layer application/vnd.oci.image.layer.v1.tar
gzip Compressed OCI Image layer application/vnd.oci.image.layer.v1.tar+gzip
gzip Compressed Docker Image layer application/vnd.docker.image.rootfs.diff.tar+gzip
gzip Compressed Helm Chart layer application/vnd.cncf.helm.chart.content.v1.tar+gzip
Singularity SIF layer application/vnd.sylabs.sif.layer.v1.sif
Sample Doc Artifact Summary layer application/vnd.oci.sample-repo-doc.summary.layer.v1.md
Sample Doc Artifact Content layer application/vnd.oci.sample-repo-doc.content.layer.v1.tar

Optional: Defining Config Schema

When defining an artifact type, the persistence of the artifact may be broken up into content and configuration. Configuration may be stored as a layer, or it may be stored and referenced by the manifest.config.

While the value of manifest.config.mediaType is used to determine the artifact type, the persistance of a config object is OPTIONAL. Artifacts can push a null reference for config.json persistance.

Some benefits of placing content in manifest.config include:

  • Tooling can pull the configuration object prior to layers. An artifact component may use the config to determine how and where the layer should be instantiated. The artifact component might send layer request to different compute instances, such as OCI Image layers being distributed differently based on the target OS.
  • Registries may opt-into parsing the manifest.config object if it provides meaningful top-level information. For example, OCI Image Config stores OS, Architecture and Platform information that some registry operators may wish to display. The config object is easy to pull & parse, as opposed to getting a layer url to pull, possibly decompress and parse.

Registries which implement the artifact spec MAY:

  • Parse and process the contents of manifest.config, based on the provided schema of manifest.config.mediaType, offering additional information or actions.
  • Ignore the contents and validation of the config object.

Registering Unique Types with IANA

New artifact types SHOULD register their unique config.mediaType and unique layer.mediaType with IANA to assure ownership of the type. Well-known types MUST register their unique mediaTypes to be published for registry consumption.

References

RFC reference: