Commit

(WIP) Rewrite TAP 19 to be more focused on TUF properties
Signed-off-by: Aditya Sirish <[email protected]>
adityasaky committed Mar 17, 2023
1 parent 91d5caf commit 2fd3632
Showing 1 changed file with 40 additions and 54 deletions.
94 changes: 40 additions & 54 deletions tap19.md
@@ -13,13 +13,13 @@

# Abstract

This TAP proposes extending TUF to content addressed applications and
ecosystems. These systems typically have non-file representations of objects
with specific hashing routines. This document describes how TUF implementations
can adopt these hashing routines and the properties these content addressed
systems must have to ensure their hash values are robust. Some popular
content addressed ecosystems or applications are Git, IPFS, and OSTree, while
these semantics are also visible elsewhere such as in containers.
This TAP explores how TUF can be adapted to content addressed ecosystems that
have in-built integrity checks for their artifacts. While the TUF specification
supports verifying artifact integrity, it also describes many other semantics
such as key distribution and the ability to delegate trust to different
entities. Essentially, content addressed systems such as Git, IPFS, and OSTree
have artifact integrity capabilities which can be complemented by all of TUF's
other features.

# Motivation

@@ -104,74 +104,60 @@ uses a Merkle tree under the hood.

# Specification

The key differences between regular file targets and content addressable
objects such as Merkle DAG nodes are in how their hashes are computed and how
the TUF verification workflow applies to them. As such, the key focus of this
document is to articulate what is required to design a TUF implementation
capable of recording Merkle objects. This TAP considers two content addressed
systems that both use Merkle DAGs--Git and the Interplanetary Filesystem (IPFS).
These systems differ significantly in the type of data each Merkle node
represents. In Git, each node in the DAG represents a _commit_, or a record of
changes made, while in IPFS, each DAG node represents a file, or the root of a
tree of nodes that collectively represent a file. These systems are different
enough to ensure the contents of this TAP can apply to multiple types of Merkle
Tree or DAG systems not explicitly considered here.
The TUF specification uses file hashes in a number of contexts. All Targets
metadata entries are expected to record hashes of the corresponding entries
using one or more algorithms. Snapshot metadata records the hashes of all
Targets metadata considered valid at the time of its issuance, and Timestamp
metadata points to the currently valid Snapshot metadata file. In each of these
contexts, TUF operates with the assumption that the artifacts whose hashes are
recorded are regular files.
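
As a point of reference, the regular-file case described above can be sketched as follows. This is a minimal illustration, assuming SHA-256 as the chosen algorithm; the helper name `describe_file_target` is hypothetical and not part of any TUF implementation.

```python
import hashlib


def describe_file_target(content: bytes) -> dict:
    """Build the length and hashes recorded for a regular-file target.

    Hypothetical helper: the hash is computed over the raw file bytes,
    which is the assumption TUF makes for regular files.
    """
    return {
        "length": len(content),
        "hashes": {"sha256": hashlib.sha256(content).hexdigest()},
    }


# Example: describe an in-memory artifact as if it were a file on disk.
entry = describe_file_target(b"example artifact contents\n")
print(entry["length"], entry["hashes"]["sha256"])
```

The point of the sketch is that the hash input is exactly the file's bytes, with no ecosystem-specific framing.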

If these artifacts, TUF metadata or otherwise, were stored in a content
addressed system instead, they would each already be associated with a unique
identifier by that system created using the content of the artifact. Typically,
the identifier is a hash calculated using an ecosystem-specific representation
of the artifact. For examples, see the [motivating examples](#motivation).
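
To illustrate what an ecosystem-specific representation means in practice, the sketch below contrasts a plain SHA-1 file hash with the identifier Git assigns to the same bytes stored as a blob in a SHA-1 repository. Git hashes a `blob <size>\0` header followed by the content, so the two values differ. The function names are hypothetical, and repositories using Git's SHA-256 object format would use a different hash function.

```python
import hashlib


def plain_sha1(content: bytes) -> str:
    """Hash the raw bytes, as a regular-file hash would."""
    return hashlib.sha1(content).hexdigest()


def git_blob_id(content: bytes) -> str:
    """Compute the ID Git assigns to a blob in a SHA-1 repository.

    Git prepends the header b"blob <size>\\0" before hashing.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


data = b"example artifact contents\n"
# Same bytes, two different identifiers.
print(plain_sha1(data))
print(git_blob_id(data))
```

A verifier that recomputed a plain file hash would therefore never match the identifier the content addressed system already provides.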

TUF can directly use these identifiers in its metadata instead of requiring
users to calculate separate hash values. As TUF's metadata is agnostic to the
hashing routine employed, this change does not require a change to the schema of
how hashes are recorded. That said, TUF metadata will need to be updated to
indicate the ecosystem in question.

Presently, each entry in TUF's targets metadata has two key parts--the
identification of the target, and the characteristics of the target.
Incorporating Merkle objects will require consideration of both of these
aspects, as well as to how they are handled during verification.

## Identifying the Target

Currently, file targets are identified by a path that is relative to the
repository's base URL. As discussed before, a Merkle DAG is a hash-based data
structure, so every node is associated with a hash value. Therefore, as the
identifier of each node is ecosystem specific, the strategy used to identify a
target node will vary accordingly.
As TUF is centered around regular file artifacts, each entry uses a path that is
relative to the repository's base URL. In content addressed systems, the name is
not as straightforward, and can instead be ecosystem-specific. For example, in
the Git use case, the entry's name can identify the repository and the Git ref
the entry applies to.

In order to support different Merkle DAG ecosystems, this TAP proposes using
RFC 3986's URI structure for the target identifier. This has the following
structure.
Therefore, the entry must clearly identify the ecosystem it pertains to. This
TAP proposes using RFC 3986's URI structure for the entry's identifier.

```
<scheme>:<hier-part>
```

The `scheme` contains a token that uniquely identifies the Merkle DAG ecosystem
while `hier-part` contains the location or identifier of the specific target.

For example, every Git repository contains a Merkle DAG, in which every node is
a commit object, and each commit has a unique identifier generated using SHA-1.
So, when the Merkle DAG in question is that of a Git repository, the target
identifier may point to the repository as a whole or perhaps a specific branch
or tag within it. The details of a Git-specific implementation of this TAP
must be communicated using a POUF.
In the Git example, the `scheme` may be `git` and the `hier-part` can indicate
the repository and other information. Note that the specifics of how this TAP
applies to Git repositories must be recorded in the corresponding POUF; this
document does not formally specify how it applies to any particular ecosystem.

```
git:<repo identifier>
git:<repo identifier>?branch=<branch name>
git:<repo identifier>?tag=<tag name>
```
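
A client could split such an identifier into its parts before dispatching to ecosystem-specific logic. The sketch below is one possible approach and not something this TAP specifies: the function name `parse_entry_identifier`, the returned dictionary layout, and the repository name `example-repo` are all assumptions made for illustration. In RFC 3986 terms, the `?branch=...` portion is the URI's query component.

```python
from urllib.parse import parse_qs


def parse_entry_identifier(identifier: str) -> dict:
    """Split '<scheme>:<location>[?key=value]' into its components.

    Hypothetical helper; the actual grammar for each ecosystem is
    expected to be defined in that ecosystem's POUF.
    """
    scheme, sep, rest = identifier.partition(":")
    if not sep or not scheme or not rest:
        raise ValueError(f"not a scheme-qualified identifier: {identifier!r}")
    location, _, query = rest.partition("?")
    # Keep only the first value for each query key, e.g. branch or tag.
    params = {key: values[0] for key, values in parse_qs(query).items()}
    return {"scheme": scheme.lower(), "location": location, "params": params}


parsed = parse_entry_identifier("git:example-repo?branch=main")
print(parsed["scheme"], parsed["location"], parsed["params"])
```

The scheme can then select the verification routine for the ecosystem, while identifiers without a scheme are rejected outright.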

On the other hand, IPFS introduces the concept of locating arbitrary artifacts
by their content, rather than by a particular location. When a file is added to
IPFS, it is then available at an endpoint that uses the cryptographic hash of
its contents. In this instance, it makes sense to use this identifier in TUF
metadata.

```
ipfs:<node identifier>
```

It is important to note that a file can encompass multiple nodes in the IPFS
Merkle DAG, and in such situations, the identifier should be the root node
which points to the other nodes that make up the file.

As noted above, this TAP considers these ecosystems at a high level to
demonstrate the proposed changes. More detailed descriptions of how to record
Git or IPFS artifacts considering various use and edge cases must be published
as a POUF dedicated to each ecosystem.
If an ecosystem relies only on hash identifiers, the `hier-part` can record that
directly. In these instances, the `hashes` field may be omitted. As before, this
must be unambiguously described in the ecosystem's POUF.

## Recording the Characteristics of the Target
