Add TAP supporting content addressed targets #156

Merged on Jun 2, 2023 (1 commit)

Contributor

@adityasaky adityasaky commented Jun 20, 2022

This TAP proposes supporting Merkle DAG objects / nodes as targets in TUF metadata. This allows us to extend TUF's protections to various popular applications like Git and other content addressable systems like IPFS and OSTree.

  • One key change proposed in this TAP is that the ecosystem or application decides how to calculate the hash of its artifact, and this value is re-used.
  • Generally speaking, to record the hash of a non-file artifact, we need some representation of it. This TAP indirectly proposes using the representation used by the application or ecosystem in question by re-using its hash-based identifier.

The TAP goes into some detail about the properties the application or ecosystem under consideration must possess to ensure the integrity of the hash values. Feedback as a whole is welcome! There are some discussion points inline that I'm also going to copy here:

  • Should conforming implementations also implement base TUF for regular files?
  • What makes sense for length for Git objects? The commit object files pre-compression?
    • This is a use-case specific question that's perhaps better handled when considering the ecosystem specifically rather than this TAP as a whole, but I also wonder if we can provide more guidance on fields that are affected by these changes.
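To make the "ecosystem decides how to calculate the hash" point concrete, here is a minimal sketch of how Git derives an object ID in SHA-1 repositories. It is an illustration only, not part of the TAP text: the helper name `git_object_id` is hypothetical. It also bears on the length question above, since the length Git hashes is the byte length of the uncompressed object content.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 ID Git assigns to an object (hypothetical helper name)."""
    # Git hashes a header of the form "<type> <length>\0" followed by the raw,
    # uncompressed content. <length> is the byte length of that content.
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known ID in SHA-1 repositories.
print(git_object_id("blob", b""))
```

A TUF implementation that re-uses this identifier does not need to define its own representation of the commit, tree, or blob; it records the value the ecosystem already computes.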

@adityasaky adityasaky force-pushed the merkle-dag-tap branch 2 times, most recently from 39bbc23 to fee9704 on June 22, 2022 14:45
@znewman01

Very cool, I can see many ways in which this would be useful :)

However, I fail to see what about this proposal is specific to Merkle DAGs. For instance, it seems like Docker images could also benefit (longer digression below). But it really just seems to be describing a pluggable way of adding targets that aren't simple files to TUF.

I agree that Merkle DAGs are a good candidate for non-file types of things we want to hash, and that we need to be really careful about the characteristics of such hashing techniques when we add them. But they seem to be inessential to the proposal.


Docker's not the best example here because there is a single file that you can SHA-256 hash to get the digest: the "manifest", which is a JSON file. I suppose in some sense the manifest represents a Merkle DAG, because it contains hashes of the layers of the image. But checking that a full Docker image has the appropriate digest involves hashing the layers and comparing them to the manifest. So I think it fits.

@trishankatdatadog
Member

Cc @erickt

@adityasaky
Contributor Author

Hi Zack, thanks for your comments! I want to note some thoughts:

  • I agree that Merkle DAG objects are a subset of entities we can record in TUF metadata. Initially, the idea with this specific TAP was not to clear the way for all non-file entities (that would require something like in-toto's ITE-4: https://github.com/in-toto/ITE/blob/master/ITE/4/README.adoc). In most cases, we'd have to specifically define how to calculate the hashes of these abstract entities, since there may well be nothing we can use as is. ITE-4 handles that using something called a "hashable representation".
  • The reason it talks about Merkle DAG ecosystems specifically is because apart from opening the door to non-file entities, it also allows for hash values that the TUF implementation didn't explicitly calculate, but was instead provided with. While there are varying degrees of verifiability for these values (it's quite straightforward to verify a Git commit's ID for example), one other thing this draft says is that the valid existence of a node in the graph can be used for that node's verification (Git does this well, for example, erroring out when there are invalid / tampered commit objects). This is where the idea of trusted ecosystems that validate their Merkle DAGs came from, given regular auditing of these applications / ecosystems to ensure new changes haven't undermined the assumptions made.
  • We may not want to go this route for any ecosystems, instead requiring values to be explicitly validated each time. In this case, the TUF implementation would likely have to be aware of the nuances of the ecosystem in greater detail than it'd need to verify the existence of a node in a trusted application. I think this could get quite complicated when we get to "storage backends" like IPFS. I'm currently working on a python-tuf proof-of-concept that's looked at Git so far, but IPFS is what I want to play with next.
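As a rough sketch of the distinction drawn above, the following toy content-addressed store refuses to return a node whose content no longer matches its ID, so a client can treat "the store returned the node" as verification. Everything here is hypothetical and for illustration only: `TinyCAS` and `verify_target` are not python-tuf APIs, and SHA-256 stands in for whatever hash the real ecosystem uses.

```python
import hashlib


class TinyCAS:
    """Hypothetical in-memory content-addressed store (not a real library)."""

    def __init__(self):
        self._nodes = {}  # node ID (hex digest) -> content bytes

    def put(self, content: bytes) -> str:
        node_id = hashlib.sha256(content).hexdigest()
        self._nodes[node_id] = content
        return node_id

    def get(self, node_id: str) -> bytes:
        content = self._nodes[node_id]  # raises KeyError if the node is absent
        # The store itself validates the node on every read.
        if hashlib.sha256(content).hexdigest() != node_id:
            raise ValueError(f"node {node_id} failed integrity check")
        return content


def verify_target(store: TinyCAS, recorded_id: str) -> bool:
    """Accept a target if the trusted store can produce a valid node for its ID."""
    try:
        store.get(recorded_id)
    except (KeyError, ValueError):
        return False
    return True
```

In this sketch the TUF client only needs the identifier recorded in metadata and a store it trusts to validate reads. Explicit validation would instead require the client to reimplement the store's hashing rules itself.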

There was interest in the last community meeting for something like ITE-4, i.e., more open to non-file entities than what is here, so you're not alone in that regard.

@adityasaky adityasaky force-pushed the merkle-dag-tap branch 2 times, most recently from 6d83f29 to 50b65db on November 30, 2022 15:45
@adityasaky adityasaky changed the title Add TAP supporting Merkle DAG targets Add TAP supporting content addressed targets Jan 25, 2023
@lukpueh
Member

lukpueh commented Feb 23, 2023

This is interesting. Which of TUF's security properties do we actually care about here? Is TUF's integrity protection even relevant when targets are content addressable? In other words, does the client need to verify the target hash at all, or is it enough that the target path is in targets metadata?

@adityasaky
Contributor Author

adityasaky commented Mar 7, 2023

Which of TUF's security properties do we actually care about here?

That's a great point and I think emphasizing that in the text will greatly help. With content addressed systems, we care about all of TUF's properties minus artifact integrity. Let me take a pass on that.

Also note that we've proposed a prototype of this TAP as a GSoC 2023 task. That should help us clarify some of these ideas and better evaluate how this TAP would work in practice.

@adityasaky adityasaky force-pushed the merkle-dag-tap branch 3 times, most recently from 941d7c5 to 91d5caf on March 15, 2023 21:46
@adityasaky
Contributor Author

@lukpueh I reworked the TAP to focus on TUF properties that matter outside of artifact integrity validation. LMK what you think!

@trishankatdatadog
Member

(Sorry, just wanted to say pls count me out of reviewing this TAP now as I will be on a few weeks of PTO. Thanks!)

Contributor

@jkjell jkjell left a comment

This looks great and sounds really interesting! I dropped a bunch of noob questions in my review. One thing I wasn't quite sure where to put concerns the common and implicit tendency of many of these content addressable systems to be used in a distributed environment. This often leads to separate integrity checks at the "server" and the "client" of the application. I don't know if that needs to be addressed more explicitly or would simply be covered in the "Security Assessment" of the ecosystem.

Member

@lukpueh lukpueh left a comment

Thanks for the updates, @adityasaky, this looks a lot better!

Contributor

@renatav renatav left a comment

I think that there are a couple of things that should be covered by the POUF according to the TAP that we haven't addressed in the TAF POUF, like backwards compatibility. Is that a blocker?

Member

@JustinCappos JustinCappos left a comment

I have a few minor concerns with this as stated in my comments. I am supportive in general, but would like to hear more from other community members (especially Lukas, Jussi, and Marina).

If we had a more definitive policy about having "core TUF" and "TUF extensions" this would be an easy, immediate approve from me as a TUF extension.

@adityasaky adityasaky force-pushed the merkle-dag-tap branch 4 times, most recently from a97fb67 to 442951d on April 25, 2023 21:44
@znewman01

CC @sudo-bmitch who mentioned that OCI is a little weird—they wrap the content-addressed blobs in some metadata, then use that as the hash

(I know OCI isn't a primary use case here, but it could be an interesting one.)

@sudo-bmitch

The TL;DR on OCI is you have the following:

  • A tag that points to a manifest (effectively a mutable symbolic link to a hash)
  • A manifest, which is an OCI JSON structure containing either a list of manifests or a list of blob hashes (but not both)
  • Blobs are any data you want, stored by hash

If a blob isn't referenced by a manifest, a registry will usually garbage collect it after some time. So to structure things in OCI you'd want to identify what the individual blobs need to be, manifests to reference those blobs, and what tags to use to locate the manifests. The sha256 hash of the blobs will be the same across different CAS implementations, but the hash of an OCI manifest will probably only exist in the OCI implementation (but it may be similar in concept to the Git directory listing and hash).
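The tag, manifest, and blob relationship described above can be sketched as follows. This is a simplified illustration and not the OCI image manifest schema: the manifest here carries only a `layers` list of digests, and the in-memory dictionaries and helper names (`push_blob`, `push_manifest`, `verify_image`) are hypothetical.

```python
import hashlib
import json

blobs = {}      # digest -> blob bytes
manifests = {}  # digest -> serialized manifest bytes
tags = {}       # mutable tag name -> manifest digest


def digest(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def push_blob(data: bytes) -> str:
    d = digest(data)
    blobs[d] = data
    return d


def push_manifest(layer_digests, tag: str) -> str:
    # The manifest digest covers the exact serialized bytes of the manifest.
    body = json.dumps({"layers": [{"digest": d} for d in layer_digests]}).encode()
    d = digest(body)
    manifests[d] = body
    tags[tag] = d  # the tag is just a mutable pointer to the digest
    return d


def verify_image(manifest_digest: str) -> bool:
    """Check the manifest bytes, then every blob the manifest references."""
    body = manifests.get(manifest_digest)
    if body is None or digest(body) != manifest_digest:
        return False
    for layer in json.loads(body)["layers"]:
        data = blobs.get(layer["digest"])
        if data is None or digest(data) != layer["digest"]:
            return False
    return True


root = push_manifest([push_blob(b"layer one"), push_blob(b"layer two")], tag="latest")
print(verify_image(tags["latest"]))
```

Because the manifest lists the blob digests, recording only the manifest digest is enough to pin the whole image, while the tag can move freely.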

mnm678 previously approved these changes May 17, 2023
Contributor

@mnm678 mnm678 left a comment

One nit, but this looks ready to merge as a draft

Signed-off-by: Aditya Sirish <[email protected]>
Co-authored-by: Renata Vaderna <[email protected]>
Co-authored-by: John Ericson <[email protected]>
Member

@JustinCappos JustinCappos left a comment

Given the recent changes, I'm happy to approve merging this as a draft.

@adityasaky
Contributor Author

@sudo-bmitch thanks for the OCI-specific information! This TAP should cleanly apply to OCI by using the digest of the "root" manifest. I'd prefer to add this use case in a separate PR rather than in this one, though, so that we can explore this structure some more.

@mnm678 mnm678 merged commit 6b08237 into theupdateframework:master Jun 2, 2023
@adityasaky adityasaky deleted the merkle-dag-tap branch June 2, 2023 14:27
10 participants