diff --git a/tap19.md b/tap19.md
index 19fa521..93dfb61 100644
--- a/tap19.md
+++ b/tap19.md
@@ -13,13 +13,13 @@
 
 # Abstract
 
-This TAP proposes extending TUF to content addressed applications and
-ecosystems. These systems typically have non-file representations of objects
-with specific hashing routines. This document describes how TUF implementations
-can adopt these hashing routines and the properties these content addressed
-systems must have to ensure their hash values are robust. Some popular
-content addressed ecosystems or applications are Git, IPFS, and OSTree, while
-these semantics are also visible elsewhere such as in containers.
+This TAP explores how TUF can be adapted to content addressed ecosystems that
+have in-built integrity checks for their artifacts. While the TUF specification
+supports verifying artifact integrity, it also describes many other semantics
+such as key distribution and the ability to delegate trust to different
+entities. Essentially, content addressed systems such as Git, IPFS, and OSTree
+have artifact integrity capabilities which can be complemented by all of TUF's
+other features.
 
 # Motivation
 
@@ -104,35 +104,39 @@ uses a Merkle tree under the hood.
 
 # Specification
 
-The key differences between regular file targets and content addressable
-objects such as Merkle DAG nodes are in how their hashes are computed and how
-the TUF verification workflow applies to them. As such, the key focus of this
-document is to articulate what is required to design a TUF implementation
-capable of recording Merkle objects. This TAP considers two content addressed
-systems that both use Merkle DAGs--Git and the Interplanetary Filesystem (IPFS).
-These systems differ significantly in the type of data each Merkle node
-represents. In Git, each node in the DAG represents a _commit_, or a record of
-changes made, while in IPFS, each DAG node represents a file, or the root of a
-tree of nodes that collectively represent a file. These systems are different
-enough to ensure the contents of this TAP can apply to multiple types of Merkle
-Tree or DAG systems not explicitly considered here.
+The TUF specification uses file hashes in a number of contexts. All Targets
+metadata entries are expected to record hashes of the corresponding entries
+using one or more algorithms. Snapshot metadata records the hashes of all
+Targets metadata considered valid at the time of its issuance, and Timestamp
+metadata points to the currently valid Snapshot metadata file. In each of these
+contexts, TUF operates with the assumption that the artifacts whose hashes are
+recorded are regular files.
+
+If these artifacts, TUF metadata or otherwise, were stored in a content
+addressed system instead, they would each already be associated with a unique
+identifier by that system created using the content of the artifact. Typically,
+the identifier is a hash calculated using an ecosystem-specific representation
+of the artifact. For examples, see the [motivating examples](#motivation).
+
+TUF can directly use these identifiers in its metadata instead of requiring
+users to calculate separate hash values. As TUF's metadata is agnostic to the
+hashing routine employed, this change does not require a change to the schema of
+how hashes are recorded. That said, TUF metadata will need to be updated to
+indicate the ecosystem in question.
 
 Presently, each entry in TUF's targets metadata has two key parts--the
 identification of the target, and the characteristics of the target.
-Incorporating Merkle objects will require consideration to both of these
-aspects, as well as to how they are handled during verification.
 
 ## Identifying the Target
 
-Currently, file targets are identified by a path that is relative to the
-repository's base URL. As discussed before, a Merkle DAG is a hash-based data
-structure, so every node is associated with a hash value. Therefore, as the
-identifier of each node is ecosystem specific, the strategy used to identify a
-target node will vary accordingly.
+As TUF is centered around regular file artifacts, each entry uses a path that is
+relative to the repository's base URL. In content addressed systems, the name is
+not as straightforward, and can instead be ecosystem specific. For example, in
+the Git use case, the entry's name can identify the repository and the Git ref
+the entry applies to.
 
-In order to support different Merkle DAG ecosystems, this TAP proposes using
-RFC 3986's URI structure for the target identifier. This has the following
-structure.
+Therefore, the entry must clearly identify the ecosystem it pertains to. This
+TAP proposes using RFC 3986's URI structure for the entry's identifier.
 
 ```
 :
@@ -140,13 +144,10 @@ structure.
 
 The `scheme` contains a token that uniquely identifies the Merkle DAG ecosystem
 while `hier-part` contains the location or identifier of the specific target.
-
-For example, every Git repository contains a Merkle DAG, in which every node is
-a commit object, and each commit has a unique identifier generated using SHA-1.
-So, when the Merkle DAG in question is that of a Git repository, the target
-identifier may point to the repository as a whole or perhaps a specific branch
-or tag within it. The details of a Git-specific implementation of this TAP
-must be communicated using a POUF.
+In the Git example, the `scheme` may be `git` and the `hier-part` can indicate
+the repository and other information. Note that the specifics of how this TAP
+applies to Git repositories must be recorded in the corresponding POUF, this
+document does not formally specify how it applies to any particular ecosystem.
 
 ```
 git:
@@ -154,24 +155,9 @@ git:?branch=
 git:?tag=
 ```
 
-On the other hand, IPFS introduces the concept of locating arbitrary artifacts
-by their content, rather than by a particular location. When a file is added to
-IPFS, it is then available at an endpoint that uses the cryptographic hash of
-its contents. In this instance, it makes sense to use this identifier in TUF
-metadata.
-
-```
-ipfs:
-```
-
-It is important to note that a file can encompass multiple nodes in the IPFS
-Merkle DAG, and in such situations, the identifier should be the root node
-which points to the other nodes that make up the file.
-
-As noted above, this TAP considers these ecosystems at a high level to
-demonstrate the proposed changes. More detailed descriptions of how to record
-Git or IPFS artifacts considering various use and edge cases must be published
-as a POUF dedicated to each ecosystem.
+If an ecosystem only relies on hash identifiers, the `hier-part` can record that
+directly. In these instances, the `hashes` field may be omitted. As before, this
+must be unambiguously described in the ecosystem's POUF.
 
 ## Recording the Characteristics of the Target
 
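
For reference, the identifier structure proposed under "Identifying the Target" (a `scheme`, a colon, then a `hier-part`, optionally followed by a `?` query as in the `git:` examples) could be split by a client roughly as sketched below. This is only an illustration under stated assumptions, not something the TAP specifies: the names `TargetIdentifier` and `parse_target_identifier`, the sample value `git:example/repo?branch=main`, and the decision to reject an empty `hier-part` are all hypothetical. The TAP leaves ecosystem-specific details to a POUF, and fragments (`#...`) are not handled here.

```python
import re
from dataclasses import dataclass
from typing import Dict
from urllib.parse import parse_qsl

# RFC 3986 scheme syntax: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
_SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


@dataclass
class TargetIdentifier:
    """Hypothetical container for a parsed target identifier."""

    scheme: str            # token naming the ecosystem, e.g. "git"
    hier_part: str         # location or identifier of the specific target
    query: Dict[str, str]  # optional parameters, e.g. {"branch": "main"}


def parse_target_identifier(identifier: str) -> TargetIdentifier:
    """Split '<scheme>:<hier-part>[?<query>]' into its components."""
    # The scheme ends at the first colon; later colons belong to the rest.
    scheme, sep, rest = identifier.partition(":")
    if not sep or not _SCHEME_RE.fullmatch(scheme):
        raise ValueError(f"missing or invalid scheme in {identifier!r}")
    # Assumption of this sketch: a target entry needs a non-empty hier-part.
    if not rest:
        raise ValueError(f"empty hier-part in {identifier!r}")
    # Everything after the first '?' is treated as the query component.
    hier_part, _, query_string = rest.partition("?")
    return TargetIdentifier(scheme, hier_part, dict(parse_qsl(query_string)))


if __name__ == "__main__":
    # Hypothetical identifier; the real format for Git is left to its POUF.
    target = parse_target_identifier("git:example/repo?branch=main")
    print(target.scheme, target.hier_part, target.query)
```

A client could dispatch on `scheme` to choose the ecosystem-specific verification routine, and treat an identifier with no query (for example, one whose `hier-part` is itself a hash) as the case where the `hashes` field may be omitted.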