This repository has been archived by the owner on Feb 8, 2023. It is now read-only.

IPFS: add a transport layer (and append-only data structures) #148

Open
gritzko opened this issue Jul 24, 2016 · 8 comments

Comments

@gritzko

gritzko commented Jul 24, 2016

(moved from an e-mail thread with @jbenet @haadcode @diasdavid)
Consider TCP/IP (Transmission Control Protocol / Internet Protocol). Once you open a network address, you get a data stream. That affords a certain flexibility, so IP becomes the hourglass waist: IP can run on top of anything, and everything can be implemented on top of IP. Most application-level protocols run on top of TCP, which provides the data-pipe API.

[Image: the TCP/IP hourglass diagram (Zittrain)]

IPFS as such is a git-like graph of static blobs. That makes it simple enough to be the waist, but too limiting to become everybody's base API (data pipe). Can it be the waist then?
I should be able to open a content address (hash or sig) to get a stream of frames.
That is the base API for 95% of stuff.

Streams of immutable frames take a convenient middle ground between mutability and immutability.
IMO, IPFS needs two further steps of generalization:

  • add append-only frame streams as a generalization of the DAG
  • add hyperlogs/partially ordered logs as a further generalization (this allows for multiple concurrent writers).

That way IPFS becomes the waist of sufficient expressive power, on par with TCP/IP.
Merkleweb is simple enough (IP) and streams provide the API to build on.
The current diagram shows "naming" between "merkledag" and "applications".
That is why it feels so strange: naming tries to play the role of the transport layer here!

[Image: IPFS stack diagram]

The merkledag is the "thin waist" of authenticated data structures: a minimal set of information needed to represent and transfer arbitrary authenticated data structures. More complex data structures are implemented on top of the merkledag.

IPFS needs a transport layer and a transport API.
Something like https://github.com/haadcode/ipfs-log must be a part of the waist.
That will require pubsub/multicast machinery, obviously.

As a side note, I would recommend borrowing some of the crypto machinery from RFC 7574, which was designed for this kind of use case. (Just ping me if you need it generalized to partial orders.)

(23 Jul: created; 24 Jul: edited for clarity, moved to notes)

@jbenet
Member

jbenet commented Jul 24, 2016

some quick notes:

  • thanks very much for your thoughts and recommendations! 😄 👍
  • We disagree on the semantics of some terms. it is not my hope here to find agreement on "what is more right", just to establish how we think about it.
  • as mentioned elsewhere, we ARE working on pubsub and it WILL be exposed to ipfs applications. but it's not meant to be the thin waist for authenticated data structures.
  • the stack diagram above is not complete, at all. that's just a projection that's useful to think about.
  • this is a more complete diagram, but even still incomplete:
  • of course naming is not a transport... naming is one of several layers that can be (is) added between IPLD and applications. it depends very much on what they use. naming is shown there because many applications use it, and there's an important feature for IPFS: IPNS naming. Also IPNS naming itself depends on underlying transport changes (eg dht vs pub/sub).
  • for us, thin-waist does not always require a transport. for example JSON is a thin waist of APIs and it works over a large variety of transports (HTTP, REST, custom RPCs, telehash, and more).
  • HTML5 (HTML, JS, CSS3) is a thin waist of many application platforms, also deployed over a variety of transports, in the web (http, http2), native apps (electron, cordova), and more.
  • of course the transport is critical. but not the main point of our "thin waist" description. (similar to HTML/JSON vs HTTP). the transport actually varies depending on the use case. you can glob all the relevant transports together and say it's a waist with a common interface (IPFS or even libp2p), but we prefer to think of the underlying data structures and semantics as "the thin waist for authenticated data structures". that's much more important and more likely to be used in other systems that have very specific transport/distribution mechanics.
  • Think of IPFS as a transport for IPLD (eg HTTP to HTML/JSON). Of course, IPFS aims to cover many general cases and provide a good enough transport for all IPLD data structs, but it won't cover all cases / applications.

@jbenet
Member

jbenet commented Jul 24, 2016

btw i should mention that i think it will take us a few rounds of discussion like this and being exposed to similar use cases/requirements to synchronize our views, so it's totally fine to disagree lots for a while.

@gritzko
Author

gritzko commented Jul 27, 2016

A Merkle DAG is a really good abstraction for the waist. Naturally so.
But it is strictly immutable, so to implement any mutability you'll need side channels. Hence, IPLD is not the waist.

Consider TCP/IP. The stuff below the waist works in terms of packet/datagram/frame forwarding. The stuff above works in terms of data streams. Both abstractions are very natural, very general, very convenient. The magic is how streams turn into packets and vice-versa.

IPLD object forwarding is perfectly OK for the lower part of the hourglass: objects are immutable, cacheable, integrity-checked. But the upper part, IMO, needs that data-stream foundation to build on.

I imagine:

    // get a static blob
    var stream = ipfs_api.open("LONG_LONG_HASH");
    // get a git-like DAG
    var dag_stream = ipfs_api.open("HEAD_HASH", O_RECURSIVE);
    // get a live video feed
    var live_video_stream = ipfs_api.open("STREAM_PUBLIC_KEY", O_FOLLOW);
    // get a partially ordered database log (hyperlog)
    var db_op_log_stream = ipfs_api.open("INITIAL_PUBLIC_KEY",
        O_FOLLOW | O_FOLLOW_INVITED_KEYS);

The upper interface is then a stream of immutable IPLD objects/frames; the lower interface is object/frame forwarding. You can focus on the magic in between and let the crowd's creativity blossom above and below.

[Image: whiteboard sketch, 27 Jul 2016]

Also, the "pub/sub" mental model is possibly off the mark a bit. When we open a TCP connection to receive live data, we don't consider it "pub/sub". We just "read" from a network "address". If we can "just read" from a content address (hash or key), then we have it.

28Jul edit: hash or key

@jbenet
Member

jbenet commented Aug 1, 2016

@gritzko okay i think we have found agreement! \o/

And wow, what agreement! in the last few days i think i came to understand more of what you initially proposed (sorry if we were speaking past each other). glad we synchronized much faster :) and i think it's great.

This last post jibes enormously well with what @mikolalysenko @nicola @diasdavid and I have been discussing in Lisbon the past few days. We've reviewed a bunch of pub/sub lit and landed on "pub/sub of IPLD objects" being the best way to make pub/sub work, but also to upgrade the IPFS core interface with a corecursive programming model (in addition to the existing recursive support).

The gist (people are writing this up) is that we want to be able to subscribe to a given key (i.e. representing some IPLD object), and receive (as emitted pub/sub messages) objects that link to the key, either directly or in a log. The stream can fork (i.e. objects can be sent that do not form a strict log) for purposes of partition tolerance. Different heads can be merged back and published (like git, blockchains, hyperlog, orbit-db). Any object gaps (due to partitions, being offline, failures, or omission) can just be retrieved normally. There's a lot more, but this gist should give you the idea that i think we're on the same page :) 👍
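
A rough in-memory sketch of that "pub/sub of IPLD objects" idea, assuming a toy `subscribe`/`publish` interface (none of these names are a real API, and real links would be hashes, not labels): objects that link back to a subscribed key are delivered to subscribers, two objects with the same `prev` form a fork, and a later object listing both heads merges them, git-style.

```javascript
// Illustrative only: a tiny in-memory pub/sub of link-carrying objects.
const subscribers = new Map(); // topic key -> array of callbacks

function subscribe(key, cb) {
  if (!subscribers.has(key)) subscribers.set(key, []);
  subscribers.get(key).push(cb);
}

// Publishing an object on `key` notifies every subscriber of that key.
// In the real design, the object would link to the key (directly or
// through a log) and gaps could be fetched on demand.
function publish(key, obj) {
  (subscribers.get(key) || []).forEach((cb) => cb(obj));
}

const seen = [];
subscribe('ROOT_KEY', (obj) => seen.push(obj));
// Two concurrent writers fork from the same head...
publish('ROOT_KEY', { prev: ['ROOT_KEY'], data: 'fork A' });
publish('ROOT_KEY', { prev: ['ROOT_KEY'], data: 'fork B' });
// ...and a later object merges both heads back into one history.
publish('ROOT_KEY', { prev: ['fork A', 'fork B'], data: 'merge' });
```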

@mikolalysenko

@gritzko it is spooky how close that is to what @nicola and I were talking about a few days back.

@nicola
Member

nicola commented Aug 1, 2016

@gritzko, @mikolalysenko and I were talking exactly about this:

Also, the "pub/sub" mental model is possibly off the mark a bit. When we open a TCP connection to receive live data, we don't consider it "pub/sub". We just "read" from a network "address". If we can "just read" from a content address (hash or key), then we have it.

I will sync again with @mikolalysenko (the gist would have been ready if I had not been sick!)

@nicola
Member

nicola commented Aug 1, 2016

This is more or less what we had at the end of the night:
[Image: notebook photo]

@jbenet
Member

jbenet commented Aug 1, 2016

Thanks for the notebook picture @nicola. the serendipity with this issue continues. I wrote this, got interrupted before posting, and when i came back:


5 participants