Prototype support for content addressable systems such as IPFS #2325

Open
adityasaky opened this issue Mar 7, 2023 · 8 comments

@adityasaky
Collaborator

NOTE: This ticket is for a potential GSoC 2023 task.

TUF’s specification was written with artifacts stored in traditional file systems in mind. As such, it specifies explicitly how artifacts must be hashed in order to guarantee their integrity. Since TUF was first created, however, content addressable systems for storage and data transmission have become more prominent. Some examples of these systems are Git, the InterPlanetary File System (IPFS), and OSTree. All of these can present a file-like interface for artifacts they store, and have built-in mechanisms for ensuring the integrity of artifacts. When TUF is used with these systems, it is redundant for it to also ensure artifact integrity. Instead, TUF can delegate these guarantees to the underlying content addressable system, and focus on higher level security properties the specification provides. As part of this GSoC project, the participant will add support to an existing TUF implementation to delegate artifact integrity verification to the underlying content addressable system, specifically IPFS.
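As a rough illustration of why the integrity check becomes redundant, here is a minimal sketch of a content-addressed store. It assumes a toy in-memory store keyed by a plain hex SHA-256 digest; `ContentAddressedStore` is a hypothetical name, and real systems such as IPFS use their own identifier formats (CIDs), not this one.

```python
import hashlib


class ContentAddressedStore:
    """Toy in-memory store where a blob's address is its SHA-256 digest.

    Hypothetical illustration only; not how IPFS, Git, or OSTree are implemented.
    """

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content itself.
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Re-hashing on retrieval is the built-in integrity check:
        # content that does not match its address is rejected.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError(f"content does not match address {address}")
        return data


store = ContentAddressedStore()
address = store.put(b"example artifact")
retrieved = store.get(address)
```

With a store like this, metadata only needs to name the address; a separate hash entry for the same artifact would repeat a check the store already performs.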

Also see: theupdateframework/taps#156

Primary Goal

Allow delegating just the black-box targets to the content-addressing system. This is what our current draft TAP, theupdateframework/taps#156, specifies. This is less invasive since, as stated above, targets are already black-box data to the rest of TUF. The draft TAP is fairly agnostic about which mechanism is used; Git, IPFS, and OSTree above are only examples. Given the black-box nature of targets, we think this is the correct choice. The GSoC mentee is welcome to support just one of those systems, or several, in their prototype implementation.

Stretch Goal

TBD/WIP

GSoC Mentors

If accepted, this task will be mentored by myself (@adityasaky), John Ericson (@Ericson2314), and Marina Moore (@mnm678). This ticket was authored by all of us.

@pandyasio

Hi, I am interested in working on this project and applying for GSoC 2023. How can I contact you?

@adityasaky
Collaborator Author

This task has been assigned to @shubham4443. @mnm678 would it be possible to assign it to him formally on the issue?

@mnm678
Contributor

mnm678 commented Jun 9, 2023

@shubham4443 if you add a comment here I can assign you (GitHub limits assignees to folks who have commented or have permission in the repo)

@shubham4443

@mnm678 Adding a comment.

@jku
Member

jku commented Jun 29, 2023

Just thinking out loud here: The seeming difficulty in properly integrating IPFS (and the fact that the use cases in the TAP seem so different from each other from an implementation perspective) leads me to wonder whether it makes sense for python-tuf to handle the download at all. The whole point of TAP-19 seems to be that the TUF library no longer manages integrity, only the correct delegation... so why would we go through the trouble of abstracting the concept of "download a thing" for all of {http,ipfs,git,ostree}?

What if the application that uses python-tuf just worked like this instead:

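import requests
import tuf.ngclient

updater = tuf.ngclient.Updater(...)

# get_targetinfo() returns None when no trusted role lists the target
if not updater.get_targetinfo(targetpath):
    raise RuntimeError("oops, target not found")

# tuf has now confirmed the targetpath is signed by the correctly delegated role: we can download
# (gateway_url and parse_cid are application-provided, not part of python-tuf)
response = requests.get(gateway_url + parse_cid(targetpath), timeout=5)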

I can see a couple of possible issues:

  • This wouldn't use python-tuf artifact cache, that could be seen as a negative... but considering the design includes a local ipfs gateway, that seems like the correct place to cache things in this design?
  • the application would have to specifically support IPFS (instead of just using TUF to download a file without caring about the mechanism). I'm not sure if this is a major negative as
    • the different TAP19 systems will likely require that anyway (a git repo/commit is not a file -- there's no point in pretending it is)
    • the IPFS implementation in the PR requires the local gateway anyway: so in practice the application needs to ensure that a gateway is running, meaning it does know about the mechanism
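To make the proposed flow a bit more concrete, here is a minimal sketch under stated assumptions. `parse_cid` is written as a hypothetical helper that assumes target paths look like `ipfs://<CID>`; neither the thread nor the draft TAP confirms that format. The `fetch` callable stands in for whatever HTTP client the application uses, and `download_ipfs_target` is an illustrative name, not a python-tuf API. The only python-tuf behaviour relied on is that `Updater.get_targetinfo()` returns `None` for an unknown target.

```python
from typing import Any, Callable, Optional, Protocol


class TargetInfoSource(Protocol):
    """Anything with python-tuf's Updater.get_targetinfo() shape."""

    def get_targetinfo(self, target_path: str) -> Optional[Any]: ...


def parse_cid(targetpath: str) -> str:
    """Hypothetical helper: assumes target paths look like 'ipfs://<CID>'."""
    prefix = "ipfs://"
    if not targetpath.startswith(prefix) or len(targetpath) == len(prefix):
        raise ValueError(f"not an IPFS target path: {targetpath!r}")
    return targetpath[len(prefix):]


def download_ipfs_target(
    updater: TargetInfoSource,
    targetpath: str,
    gateway_url: str,
    fetch: Callable[[str], bytes],
) -> bytes:
    """Check delegation with TUF first, then fetch by CID from a gateway."""
    # TUF only confirms that a correctly delegated role signed this path.
    if updater.get_targetinfo(targetpath) is None:
        raise LookupError(f"target not found in TUF metadata: {targetpath}")
    # Integrity of the bytes is left to the content-addressed system.
    return fetch(gateway_url + parse_cid(targetpath))
```

An application could pass something like `lambda url: requests.get(url, timeout=5).content` as `fetch`, with `gateway_url` pointing at its local gateway (for example `http://127.0.0.1:8080/ipfs/`, assuming a default local setup).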

@jku
Member

jku commented Jun 29, 2023

What if the application that uses python-tuf just worked like this instead:

updater = tuf.ngclient.Updater(...)

if not updater.get_targetinfo(targetpath):
    raise RuntimeError("oops, target not found")

# tuf has now confirmed the targetpath is signed by the correctly delegated role: we can download
response = requests.get(gateway_url + parse_cid(targetpath), timeout=5)

Or, as another option: a small python-tuf-ipfs library could implement a downloader client with a nice IPFS-specific API that just uses python-tuf as above.

@Ericson2314

The stretch goal up in the original post is content-addressing the metadata. I finally found some time this morning, and clarified and wrote down my thoughts in https://github.com/Ericson2314/tuf-content-addressing-notes. I would be more than happy to transfer that repo to this org / otherwise make it a collaboration!

@adityasaky in #2415 (comment) you wrote:

@Ericson2314 I'm not sure if this is practical, though it depends on "root" in your message. Do you mean we remove the snapshot role and have the timestamp role identify the IPFS root node that contains the current set of all TUF metadata?

Yes, I was very unclear, and my thoughts have since baked. The tl;dr of the notes above is:

  • snapshot role vs timestamp role separation still seems good
  • It's the consistent snapshots protocol (e.g. version numbers in file names) that is obviated
  • snapshot objects themselves are more important, because they replace the notion of a repository.
  • the root object (in the sense of the object defining the root role) != the root object of the Merkle DAG, confusing! :)
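One way to picture the second and third bullets is the sketch below. It assumes a toy encoding where a snapshot is a JSON object linking each metadata file by its SHA-256 digest; `content_id` and `build_snapshot` are hypothetical names, and this is not the format from the linked notes or the TUF specification.

```python
import hashlib
import json


def content_id(data: bytes) -> str:
    """Toy content identifier: hex SHA-256 of the bytes (not a real IPFS CID)."""
    return hashlib.sha256(data).hexdigest()


def build_snapshot(metadata_files: dict) -> bytes:
    """Build a snapshot object that links each metadata file by content ID."""
    links = {name: content_id(data) for name, data in metadata_files.items()}
    # Sorted keys keep the encoding, and so the snapshot's own ID, deterministic.
    return json.dumps({"links": links}, sort_keys=True).encode()


# Two versions of a targets metadata file; the contents are placeholders.
targets_v1 = b'{"targets": {"app.bin": "cid-a"}}'
targets_v2 = b'{"targets": {"app.bin": "cid-b"}}'

snapshot_v1 = build_snapshot({"targets": targets_v1})
snapshot_v2 = build_snapshot({"targets": targets_v2})

# Each snapshot's ID pins one exact set of metadata, so no version
# number is needed in a file name to tell the two states apart.
snapshot_id_v1 = content_id(snapshot_v1)
snapshot_id_v2 = content_id(snapshot_v2)
```

In this picture a signed timestamp would only need to name a snapshot ID, and everything reachable from that ID is fixed by the hashes.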

@shubham4443

Prototype can now be found here - https://github.com/theupdateframework/tap19-ipfs-poc
