(WIP) Rewrite TAP 19 to be more focused on TUF properties

Signed-off-by: Aditya Sirish <[email protected]>
theupdateframework · Mar 17, 2023 · 2fd3632
1 parent 91d5caf
Showing 1 changed file with 40 additions and 54 deletions.
diff --git a/tap19.md b/tap19.md
@@ -13,13 +13,13 @@
 
 # Abstract
 
-This TAP proposes extending TUF to content addressed applications and
-ecosystems. These systems typically have non-file representations of objects
-with specific hashing routines. This document describes how TUF implementations
-can adopt these hashing routines and the properties these content addressed
-systems must have to ensure their hash values are robust. Some popular
-content addressed ecosystems or applications are Git, IPFS, and OSTree, while
-these semantics are also visible elsewhere such as in containers.
+This TAP explores how TUF can be adapted to content addressed ecosystems that
+have in-built integrity checks for their artifacts. While the TUF specification
+supports verifying artifact integrity, it also describes many other semantics
+such as key distribution and the ability to delegate trust to different
+entities. Essentially, content addressed systems such as Git, IPFS, and OSTree
+have artifact integrity capabilities which can be complemented by all of TUF's
+other features.
 
 # Motivation
 
@@ -104,74 +104,60 @@ uses a Merkle tree under the hood.
 
 # Specification
 
-The key differences between regular file targets and content addressable
-objects such as Merkle DAG nodes are in how their hashes are computed and how
-the TUF verification workflow applies to them. As such, the key focus of this
-document is to articulate what is required to design a TUF implementation
-capable of recording Merkle objects. This TAP considers two content addressed
-systems that both use Merkle DAGs--Git and the Interplanetary Filesystem (IPFS).
-These systems differ significantly in the type of data each Merkle node
-represents. In Git, each node in the DAG represents a _commit_, or a record of
-changes made, while in IPFS, each DAG node represents a file, or the root of a
-tree of nodes that collectively represent a file. These systems are different
-enough to ensure the contents of this TAP can apply to multiple types of Merkle
-Tree or DAG systems not explicitly considered here.
+The TUF specification uses file hashes in a number of contexts. All Targets
+metadata entries are expected to record hashes of the corresponding entries
+using one or more algorithms. Snapshot metadata records the hashes of all
+Targets metadata considered valid at the time of its issuance, and Timestamp
+metadata points to the currently valid Snapshot metadata file. In each of these
+contexts, TUF operates with the assumption that the artifacts whose hashes are
+recorded are regular files.
+
+If these artifacts, TUF metadata or otherwise, were stored in a content
+addressed system instead, that system would already assign each of them a
+unique identifier derived from the content of the artifact. Typically,
+the identifier is a hash calculated using an ecosystem-specific representation
+of the artifact. For examples, see the [motivating examples](#motivation).
+
+TUF can directly use these identifiers in its metadata instead of requiring
+users to calculate separate hash values. As TUF's metadata is agnostic to the
+hashing routine employed, this change does not require a change to the schema of
+how hashes are recorded. That said, TUF metadata will need to be updated to
+indicate the ecosystem in question.
 
 Presently, each entry in TUF's targets metadata has two key parts--the
 identification of the target, and the characteristics of the target.
-Incorporating Merkle objects will require consideration to both of these
-aspects, as well as to how they are handled during verification.
 
 ## Identifying the Target
 
-Currently, file targets are identified by a path that is relative to the
-repository's base URL. As discussed before, a Merkle DAG is a hash-based data
-structure, so every node is associated with a hash value. Therefore, as the
-identifier of each node is ecosystem specific, the strategy used to identify a
-target node will vary accordingly.
+As TUF is centered around regular file artifacts, each entry uses a path that is
+relative to the repository's base URL. In content addressed systems, the name is
+not as straightforward, and can instead be ecosystem specific. For example, in
+the Git use case, the entry's name can identify the repository and the Git ref
+the entry applies to.
 
-In order to support different Merkle DAG ecosystems, this TAP proposes using
-RFC 3986's URI structure for the target identifier. This has the following
-structure.
+Therefore, the entry must clearly identify the ecosystem it pertains to. This
+TAP proposes using RFC 3986's URI structure for the entry's identifier.
 
 ```
 <scheme>:<hier-part>
 ```
 
 The `scheme` contains a token that uniquely identifies the Merkle DAG ecosystem
 while `hier-part` contains the location or identifier of the specific target.
-
-For example, every Git repository contains a Merkle DAG, in which every node is
-a commit object, and each commit has a unique identifier generated using SHA-1.
-So, when the Merkle DAG in question is that of a Git repository, the target
-identifier may point to the repository as a whole or perhaps a specific branch
-or tag within it. The details of a Git-specific implementation of this TAP
-must be communicated using a POUF.
+In the Git example, the `scheme` may be `git` and the `hier-part` can indicate
+the repository and other information. Note that the specifics of how this TAP
+applies to Git repositories must be recorded in the corresponding POUF; this
+document does not formally specify how it applies to any particular ecosystem.
 
 ```
 git:<repo identifier>
 git:<repo identifier>?branch=<branch name>
 git:<repo identifier>?tag=<tag name>
 ```
 
-On the other hand, IPFS introduces the concept of locating arbitrary artifacts
-by their content, rather than by a particular location. When a file is added to
-IPFS, it is then available at an endpoint that uses the cryptographic hash of
-its contents. In this instance, it makes sense to use this identifier in TUF
-metadata.
-
-```
-ipfs:<node identifier>
-```
-
-It is important to note that a file can encompass multiple nodes in the IPFS
-Merkle DAG, and in such situations, the identifier should be the root node
-which points to the other nodes that make up the file.
-
-As noted above, this TAP considers these ecosystems at a high level to
-demonstrate the proposed changes. More detailed descriptions of how to record
-Git or IPFS artifacts considering various use and edge cases must be published
-as a POUF dedicated to each ecosystem.
+If an ecosystem relies only on hash identifiers, the `hier-part` can record the
+identifier directly. In these instances, the `hashes` field may be omitted. As
+before, this must be unambiguously described in the ecosystem's POUF.
 
 ## Recording the Characteristics of the Target