bubble the spec up #7

Merged
merged 4 commits into from
May 11, 2016
Changes from 2 commits
163 changes: 117 additions & 46 deletions README.md
@@ -1,84 +1,155 @@
# multistream - self-describing protocol streams
multistream
===========

multistream is a format -- or simple protocol -- for disambiguating and
layering streams. It is extremely simple.
[![](https://img.shields.io/badge/made%20by-Protocol%20Labs-blue.svg?style=flat-square)](http://ipn.io)
[![](https://img.shields.io/badge/project-IPFS-blue.svg?style=flat-square)](http://ipfs.io/)
[![](https://img.shields.io/badge/freenode-%23ipfs-blue.svg?style=flat-square)](http://webchat.freenode.net/?channels=%23ipfs)

multistream is one of the `multi-protocols`, a set of protocols that solve
problems with self-description.
> Friendly protocol multiplexing. It enables a multicodec to be negotiated between two entities.

## Motivation

To decode an incoming stream of data, a program must either (a) know the format of the data a priori, or (b) learn the format from the data itself. (a) precludes running protocols that may provide one of many kinds of formats without prior agreement on which. multistream makes (b) neat using self-description.
Some protocols have sub-protocols or protocol suites. Often, these sub-protocols are optional extensions. Selecting which protocol to use -- or even knowing what is available to choose from -- is not simple.

Moreover, this self-description allows straightforward layering of protocols without having to implement support in the parent (or encapsulating) one.
What if there were a protocol that allowed mounting or nesting other protocols, and made it easy to select which protocol to use? (This is sort of like ports, but managed at the protocol level -- not the OS -- and human-readable.)

## How it works - Protocol Description
### Protocol

A `multistream` stream MUST begin with a simple header, followed by an arbitrary stream of data for the specified protocol:
The actual protocol is very simple. It is itself a multistream protocol, so it has a multicodec header, and it exposes a set of other protocols that the remote side can use. The remote side must send:

```
<header>
<arbitrary-stream-data>
> <multicodec-header-of-multistream>
> <multicodec-header-for-whatever-protocol-that-we-want-to-speak>
```

### The header
For example:

The header has three parts:

- `hdr-len` - a varint length, in bytes, for security and binary protocols.
- `path` - the path of the protocol in a universal namespace. UTF-8; must start with a slash.
- `\n` - a newline at the end, for the benefit of text protocols **(included in hdr-len)**.

It looks like this:
```
<hdr-len><path>\n
> /ipfs/QmdRKVhvzyATs3L6dosSb6w8hKuqfZK2SyPVqcYJ5VLYa2/multistream-select/0.3.0
> /ipfs/QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/ipfs-dht/0.2.3
```
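The three-part header described above (`hdr-len`, `path`, trailing newline) can be sketched in a few lines of Python. This is only an illustration: the function names `encode_varint` and `encode_header` are made up here, and the varint is assumed to be the unsigned base-128 (LEB128-style) encoding, which the text does not state explicitly.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned base-128 varint (assumed encoding)."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F          # low seven bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte, high bit clear
            return bytes(out)


def encode_header(path: str) -> bytes:
    """Build <hdr-len><path>\\n, where hdr-len counts the path plus the newline."""
    if not path.startswith("/"):
        raise ValueError("path must start with a slash")
    body = path.encode("utf-8") + b"\n"
    return encode_varint(len(body)) + body


# The /echo/1.0 header: one length byte (10), nine path bytes, one newline.
print(encode_header("/echo/1.0").hex(" "))
```

Counting the newline inside `hdr-len` means a reader can take the length prefix, read exactly that many bytes, and know the last one is the newline.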

So the full protocol is:
- The `<multicodec-header-of-multistream>` signals that a protocol selection is happening.
- The `<multicodec-header-for-whatever-protocol-that-we-want-to-speak>` should describe a valid, listed protocol. Otherwise we return a `na` ("not available") error:

```
<hdr-len><path>\n
<arbitrary-stream-data>
na\n

# in hex (note the varint prefix = 3)
# 0x036e610a
```
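To see how the hex form above maps back to `na`, here is a minimal decoding sketch. The helper name `decode_message` is hypothetical, and the varint is again assumed to be an unsigned base-128 encoding.

```python
from typing import Tuple


def decode_message(buf: bytes) -> Tuple[str, bytes]:
    """Split one length-prefixed message off the front of buf.

    Returns the message text without its trailing newline, plus the unread bytes.
    """
    length = 0
    shift = 0
    pos = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        length |= (byte & 0x7F) << shift
        if not byte & 0x80:      # high bit clear: this was the last varint byte
            break
        shift += 7
    end = pos + length
    if end > len(buf):
        raise ValueError("truncated message")
    body = buf[pos:end]
    if not body.endswith(b"\n"):
        raise ValueError("message must end with a newline")
    return body[:-1].decode("utf-8"), buf[end:]


message, rest = decode_message(bytes.fromhex("036e610a"))
print(message, rest)
```

The prefix `03` covers the two bytes of `na` plus the newline, which is why the whole message is four bytes long.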

For example:
For example:

```sh
# this header:
# open connection + send multicodec headers, including one for a protocol that is not available
> /ipfs/QmdRKVhvzyATs3L6dosSb6w8hKuqfZK2SyPVqcYJ5VLYa2/multistream-select/0.3.0
> /ipfs/QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/some-protocol-that-is-not-available

# open connection + signal protocol not available.
< /ipfs/QmdRKVhvzyATs3L6dosSb6w8hKuqfZK2SyPVqcYJ5VLYa2/multistream-select/0.3.0
< na

# send a selection of a valid protocol + upgrade the conn and send traffic
> /ipfs/QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/ipfs-dht/0.2.3
> <dht-traffic>
> ...

# receive confirmation of the selected protocol + receive traffic
< /ipfs/QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/ipfs-dht/0.2.3
< <dht-traffic>
< ...
```
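The listener's side of the exchange above could look roughly like the following sketch. It works on already-decoded lines and leaves out the length-prefix framing; the function name `listen` and the choice to send nothing when the peer does not open with the multistream-select header are assumptions, not something the README specifies.

```python
from typing import Iterable, List, Optional, Set, Tuple

MULTISTREAM_SELECT = (
    "/ipfs/QmdRKVhvzyATs3L6dosSb6w8hKuqfZK2SyPVqcYJ5VLYa2/multistream-select/0.3.0"
)


def listen(supported: Set[str], incoming: Iterable[str]) -> Tuple[List[str], Optional[str]]:
    """Answer a dialer's proposals; return (lines sent back, selected protocol or None)."""
    lines = iter(incoming)
    replies: List[str] = []
    if next(lines, None) != MULTISTREAM_SELECT:
        return replies, None             # assumed behaviour: no handshake, no reply
    replies.append(MULTISTREAM_SELECT)   # echo the multistream-select header
    for proposal in lines:
        if proposal in supported:
            replies.append(proposal)     # echo the path back to confirm the selection
            return replies, proposal
        replies.append("na")             # not available, the dialer may try again
    return replies, None


DHT = "/ipfs/QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/ipfs-dht/0.2.3"
MISSING = "/ipfs/QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/some-protocol-that-is-not-available"
replies, selected = listen({DHT}, [MULTISTREAM_SELECT, MISSING, DHT])
print(replies, selected)
```

Once `listen` returns a selected protocol, the connection is considered upgraded and everything after that point is traffic for the selected protocol.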

/echo/1.0
#### Listing

# with hexdump -C style inspection:
0a 2f 65 63 68 6f 2f 31 2e 30 0a |./echo/1.0.|
It is also possible to "list" the available protocols. A list message is simply:

0a - varint hdr len 10 bytes
2f-30 - "/echo/1.0" 9 bytes
0a - newline 1 byte
```
ls\n

### The protocol path
# in hex (note the varint prefix = 3)
0x036c730a
```

`multistream` allows us to specify different protocols in a universal namespace, so that we can recognize, multiplex, and embed them easily. We use the notion of a `path` instead of an `id` because it is meant to be a Unix-friendly URI.
So a remote side asking for a protocol listing would look like this:

A good path name should be decipherable -- meaning that if some machine or developer -- who has no idea about your protocol -- encounters the path string, they should be able to look it up and resolve how to use it.
```sh
# request
<multistream-header-for-multistream-select>
ls\n

# response
# TODO: maybe include a varint number of protocols here ?
Member Author

@whyrusleeping @jbenet this is something we never set in stone.

Should we include a varint at the beginning of a ls response with the number of protocols (length prefixed messages) that will come next, or a varint with the size of all of those protocols summed together?

Member

i think both: a varint of protocols and a varint of total size in bytes (the second is hard on dynamic systems, but may be fine and prevent attack on unsuspecting clients)

Member

GREAT catch

Member

i think the way I do it is a varint saying the entire size of the response

Member Author

Having only a varint with the number of protocols might make things simpler to parse, since we know the number of rounds to apply.

I'm not seeing any benefit from having another varint for the global size added to the first one.

Member Author

Thank you :)

Just to confirm, there will still be <> per protocol, correct?

(note: We've been including \n on the end of the protos https://github.com/diasdavid/js-multistream/blob/master/src/lib/interactive.js#L45)

Member

The current way I do the ls response is to send:

<varint length of entire response>
<varint protocol name length>/protocol/name<newline>
...
...
...
...
<newline>

Member Author

@daviddias daviddias Apr 26, 2016

I am expanding the example to remove any kind of ambiguity. This is what Juan proposes:

<varint-total-response-size-in-bytes><varint-number-of-protocols>\n
<varint of (protocol length + newline)><protocol><newline>
<varint of (protocol length + newline)><protocol><newline>
...
<varint of (protocol length + newline)><protocol><newline>

@whyrusleeping sounds good?

Member Author

Can we have a verdict here? :)

Member

@jbenet jbenet May 11, 2016

@diasdavid we settled this before -- #7 (comment) -- go-multistream needs to change

<multicodec-for-multistream>
<multicodec-of-available-protocol>
<multicodec-of-available-protocol>
<multicodec-of-available-protocol>
...
```
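For reference, the `ls` response layout proposed in the Apr 26, 2016 review comment (total size, number of protocols, then one length-prefixed entry per protocol) could be serialized as in the sketch below. The thread does not spell out exactly which bytes the total size covers; this sketch assumes it counts only the per-protocol entries that follow the first line. The names `encode_varint`, `frame`, and `encode_ls_response` are hypothetical, and the varint is assumed to be an unsigned base-128 encoding.

```python
from typing import List


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned base-128 varint (assumed encoding)."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def frame(protocol: str) -> bytes:
    """<varint of (protocol length + newline)><protocol><newline>"""
    body = protocol.encode("utf-8") + b"\n"
    return encode_varint(len(body)) + body


def encode_ls_response(protocols: List[str]) -> bytes:
    """<varint-total-size><varint-number-of-protocols>\\n followed by the framed entries."""
    entries = b"".join(frame(p) for p in protocols)
    # Assumption: the total size counts the entry bytes only, not this first line.
    return encode_varint(len(entries)) + encode_varint(len(protocols)) + b"\n" + entries


print(encode_ls_response(["/a/1", "/b/2"]).hex(" "))
```

Carrying both numbers lets a reader either loop a known number of times or cap how many bytes it is willing to buffer, which is the trade-off discussed in the comments.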

An example of a good path name is:
For example:

```
/bittorrent.org/1.0
```sh
# send request
> /ipfs/QmdRKVhvzyATs3L6dosSb6w8hKuqfZK2SyPVqcYJ5VLYa2/multistream-select/0.3.0
> ls

# get response
< /ipfs/QmdRKVhvzyATs3L6dosSb6w8hKuqfZK2SyPVqcYJ5VLYa2/multistream-select/0.3.0
< /ipfs/QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/ipfs-dht/0.2.3
< /ipfs/QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/ipfs-dht/1.0.0
< /ipfs/QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/ipfs-bitswap/0.4.3
< /ipfs/QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/ipfs-bitswap/1.0.0

# send selection, upgrade connection, and start protocol traffic
> /ipfs/QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/ipfs-dht/0.2.3
> <ipfs-dht-request-0>
> <ipfs-dht-request-1>
> ...

# receive selection, and upgraded protocol traffic.
< /ipfs/QmVXZiejj3sXEmxuQxF2RjmFbEiE9w7T82xDn3uYNuhbFb/ipfs-dht/0.2.3
< <ipfs-dht-response-0>
< <ipfs-dht-response-1>
< ...
```

An example of a _great_ path name is:
### Example

```
/ipfs/Qmaa4Rw81a3a1VEx4LxB7HADUAXvZFhCoRdBzsMZyZmqHD/ipfs.protocol
/http/w3id.org/ipfs/ipfs-1.1.0.json
# greeting
> /http/multiproto.io/multistream-select/1.0
< /http/multiproto.io/multistream-select/1.0

# list available protocols
> /http/multiproto.io/multistream-select/1.0
> ls
< /http/google.com/spdy/3
< /http/w3c.org/http/1.1
< /http/w3c.org/http/2
< /http/bittorrent.org/1.2
< /http/git-scm.org/1.2
< /http/ipfs.io/exchange/bitswap/1
< /http/ipfs.io/routing/dht/2.0.2
< /http/ipfs.io/network/relay/0.5.2

# select protocol
> /http/multiproto.io/multistream-select/1.0
> ls
> /http/w3id.org/http/1.1
> GET / HTTP/1.1
>
< /http/w3id.org/http/1.1
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=UTF-8
< Content-Length: 12
<
< Hello World
```

These path names happen to be resolvable -- not just in a "multistream muxer" -- but on the internet as a whole (provided the program (or OS) knows how to use the `/ipfs` and `/http` protocols).

## Implementations

- go-multistream (WIP)
- node-multistream (WIP)
# Implementations

- [js-multistream](https://github.com/diasdavid/js-multistream) - JavaScript Implementation
- [go-multistream](https://github.com/whyrusleeping/go-multistream) - Go Implementation
- [mss-nc](https://github.com/whyrusleeping/mss-nc) - multistream netcat written in Go
147 changes: 0 additions & 147 deletions multistream/README.md

This file was deleted.