add HTTP spec #508

Merged
merged 38 commits into from
Jun 13, 2024
Changes from 14 commits
bc1aa59
add HTTP spec
marten-seemann Jan 22, 2023
1f075f6
2nd attempt for server auth
marten-seemann Jan 29, 2023
12f86b8
require client to authenticate the server when doing client auth
marten-seemann Jan 29, 2023
146c09a
better motivation for libp2p+HTTP (#515)
marten-seemann Feb 14, 2023
5398f5d
fix a few typos
marten-seemann Feb 14, 2023
b6c1bc2
http: use .well-known/libp2p.json for configuration
marten-seemann Mar 2, 2023
8a57943
http: nest libp2p.json config to allow for future configuration
marten-seemann Mar 2, 2023
d506145
Merge pull request #529 from libp2p/http-well-known-configuration
MarcoPolo Jun 1, 2023
946f516
Reformat the spec from the Point of View of an implementer
MarcoPolo Jul 7, 2023
3681472
Add link
MarcoPolo Jul 7, 2023
dd5d07c
Merge comments
MarcoPolo Jul 10, 2023
46d1857
Merge pull request #556 from libp2p/marco/http-update
MarcoPolo Jul 10, 2023
ebe612c
Add note about how this is just one possible auth mechanism
MarcoPolo Jul 10, 2023
7e5a077
Add lidel to interest group
MarcoPolo Jul 14, 2023
db2b3b5
Update http/README.md
MarcoPolo Jul 17, 2023
6319458
Formatting
MarcoPolo Jul 17, 2023
c7c9c43
Add thomas
MarcoPolo Jul 17, 2023
454e25c
Use metadata map and call it protocols
MarcoPolo Jul 17, 2023
a25267b
Add mermaid diagrom for HTTP semantics vs transport
MarcoPolo Jul 17, 2023
3014b22
Grammar fixes
MarcoPolo Jul 17, 2023
f96359b
Lidel suggestions
MarcoPolo Jul 17, 2023
1e87960
Define where the libp2p-token will be
MarcoPolo Jul 17, 2023
d0f0d93
Grammar fix
MarcoPolo Jul 17, 2023
8fbd64a
Specify IX vs NX in auth scheme
MarcoPolo Jul 19, 2023
71415b0
Add SNI and HTTP_libp2p_token to Noise extensions
MarcoPolo Jul 19, 2023
4a03bb0
Reword Namespace section a bit
MarcoPolo Aug 2, 2023
877899d
Remove SNI and token from extensions
MarcoPolo Aug 2, 2023
dc71f2c
Define the multiaddr URI
MarcoPolo Aug 24, 2023
d8850aa
update protocol name for IPFS gateway
marten-seemann Oct 4, 2023
78e8ca1
Be clear about no pipelining
MarcoPolo Mar 14, 2024
d30efda
Use SHOULD instead of MUST
MarcoPolo Mar 18, 2024
8628b5a
Update RFC for connection: close
MarcoPolo Apr 3, 2024
3c0ac40
Rename well-known
MarcoPolo Apr 3, 2024
75bc635
Add sentence on why POST and other mappings
MarcoPolo Apr 3, 2024
f95e4db
Sukun's review comments
MarcoPolo Apr 15, 2024
e3eb9dc
Small typo fixes
MarcoPolo Apr 15, 2024
95ffe6d
Update to http-path
MarcoPolo Jun 3, 2024
8f44d00
Merge pull request #568 from libp2p/marco/multiaddr-scheme
MarcoPolo Jun 5, 2024
114 changes: 114 additions & 0 deletions http/README.md
@@ -0,0 +1,114 @@
# HTTP

| Lifecycle Stage | Maturity | Status | Latest Revision |
| --------------- | ------------- | ------ | --------------- |
| 1A | Working Draft | Active | r0, 2023-01-23 |

Authors: [@marten-seemann], [@MarcoPolo]

Interest Group: [@lidel]

[@marten-seemann]: https://github.com/marten-seemann
[@MarcoPolo]: https://github.com/MarcoPolo
[@lidel]: https://github.com/lidel

## Introduction

This document defines how libp2p nodes can offer and use an HTTP transport alongside their other transports to support application protocols with HTTP semantics. This allows a wider variety of nodes to participate in the libp2p network, for example:

- Browsers communicating with other libp2p nodes without needing a WebSocket, WebTransport, or WebRTC connection.
- HTTP-only edge workers running application protocols and responding to peers on the network.
- `curl` making requests to other libp2p nodes from the command line.

It also allows application protocols to make use of HTTP intermediaries, such as HTTP caches and layer 7 proxies and load balancers. This is all in addition to the existing features that libp2p provides, such as:

- Connectivity – Work on top of WebRTC, WebTransport, QUIC, TCP, or an HTTP transport.
- Hole punching – Work with peers behind NATs.
- Peer ID Authentication – Authenticate your peer by their libp2p peer id.
- Peer discovery – Learn about a peer given their peer id.

## HTTP Transport vs HTTP Semantics
Contributor

I don't know if there is precedent for this in libp2p specs, but I thought the visual diagram you had created for earlier HTTP discussions was useful.

Contributor

Hesitant to include that diagram since it's a png and hard to update. I've tried replicating with Mermaid so that it's text based. wdyt? I don't think it's as nice, and it may be a bit more confusing than the text.

Contributor

I've attached the image I was referring to so others are aware.

It seems like that diagram was particularly useful when it came to talking about how "vanilla/simple request response" maps. That's not in scope for this spec. I'm still not opposed to having the diagram you have, although I do think it would be useful to color code the lines like you did in the diagram below to cover the "HTTP over libp2p" and "Just HTTP" cases.

Libp2p request response


HTTP is a bit of an overloaded term. This section aims to clarify what we’re talking about when we say “HTTP”.

*HTTP semantics* ([RFC 9110](https://www.rfc-editor.org/rfc/rfc9110.html)) is the stateless application-level protocol that you work with when, for example, writing HTTP APIs.

*HTTP transport* is the layer that takes a high-level request/response defined in terms of HTTP semantics, encodes it, and sends it over the wire.

When this document says *HTTP* it is generally referring to *HTTP semantics*.

## Interoperability with existing HTTP systems

A goal of this spec is to allow libp2p to interoperate with existing HTTP servers and clients. Care is taken in this document not to introduce anything that would break interoperability with existing systems.
Comment on lines +72 to +76
Contributor

So this is a bit confusing to me. Above you are saying that you generally refer to HTTP semantics, and the next sentence says that a goal is to interoperate with existing HTTP servers and clients, which refers to the transport, correct?

Contributor

Refers to both actually


## HTTP Transport

Nodes MUST use HTTPS (i.e. they MUST NOT use plaintext HTTP). It is RECOMMENDED to use HTTP/2 and HTTP/3.

Nodes signal support for their HTTP transport using the `/http` component in their multiaddr. e.g. `/dns4/example.com/tls/http` . See the [HTTP multiaddr component spec](https://github.com/libp2p/specs/pull/550) for more details.
Member

💭 so.. this exists:

$ curl -q https://cid.contact/routing/v1/providers/QmbWqxBEKC3P8tqsKc98xmWNzrzDtRLMiMPL8wBuTGsMnR | jq
{
      "Protocol": "unknown",
      "Schema": "unknown",
      "ID": "QmUA9D3H7HeCYsirB3KmPSvZh3dNXMZas6Lwgr4fv1HTTp",
      "Addrs": [
        "/dns4/dag.w3s.link/tcp/443/https"
      ]
    },

but #550 does not mention /https notation at all.

@MarcoPolo Should we explicitly state in that spec that /https is an alias for /tls/http + do the below here?
Or was this discussed already elsewhere?

Suggested change
Nodes signal support for their HTTP transport using the `/http` component in their multiaddr. e.g. `/dns4/example.com/tls/http` . See the [HTTP multiaddr component spec](https://github.com/libp2p/specs/pull/550) for more details.
Nodes signal support for their HTTP transport using the `/http` or `/https` component in their multiaddr. e.g. `/dns4/example.com/tls/http` or `/dns4/example.com/https` . See the [HTTP multiaddr component spec](https://github.com/libp2p/specs/blob/master/http/transport-component.md) for more details.

Contributor

Should we explicitly state in that spec that /https is an alias for /tls/http

/https is an alias for /tls/http: multiformats/multiaddr#109 and multiformats/multicodec#145

https://github.com/multiformats/multiaddr/blob/master/protocols.csv?plain=1#L31

I'm fine to add a note about /https somewhere though


## Namespace

libp2p does not squat the global namespace. libp2p application protocols can be discovered by the [well-known resource](https://www.rfc-editor.org/rfc/rfc8615) `.well-known/libp2p`. This allows server operators to dynamically change the URLs of the application protocols offered, and not hard-code any assumptions about how a certain resource is meant to be interpreted.

```json
{
  "services": {
Contributor

I'm not the biggest fan of the term "services" here. I think I would prefer "protocols", "applications", or maybe something else. Thoughts?

Member

maybe "endpoints"? (since that is what we define here, HTTP API endpoints for specific service/protocol)

Contributor

maybe "endpoints"? (since that is what we define here, HTTP API endpoints for specific service/protocol)

The endpoints are the values though, really what we define is metadata for all supported protocols. We might extend this metadata with additional information later (see @BigLep's comment below). My vote goes down for protocols.

    "/kad/1.0.0": "/kademlia/",
    "/ipfs-http/1.0.0": "/"
Contributor

So today we have a pair of information: <libp2p protocol name, URL path>
I agree maps make it easy for encoding a collection of pairs.

Do we ever expect to need more information (tuple)?

Maybe encode like:

endpoints : [
  {
    "protocolName" : "/kad/1.0.0",
    "path" : "/kademlia"
  },  {
    ...
  }
]

?

This makes it self documenting and allows expansion?

Contributor

Given that protocol IDs have to be unique anyway, how about a map of string to object?

{
	"/kad/1.0.0": {
		"path": "/kademlia"
	}
}

Perhaps we can define that implementations must accept both representations and the "string:string" representation is a short-form of the "string:object" notation?

Contributor

Some more thoughts on this:

  1. The path alone is not enough to form a request. We need more (out-of-band) information like what method to use.
  2. Do you expect a protocol-spec like kademlia to be extended on how it can be accessed from an HTTP transport? (And which methods need authentication for example?)

Contributor

I'm a bit hesitant to add the ability to put more metadata without a use case behind it, but making it a map with path seems easy enough, and would allow adding new fields in a backwards compatible way, which is nice.


  1. The path alone is not enough to form a request. We need more (out-of-band) information like what method to use.

This is up to the application protocol. The application protocol defines how it works and what HTTP methods it uses for what. This metadata only describes where the application protocol is mounted at. It doesn't describe the application protocol.

  1. Do you expect a protocol-spec like kademlia to be extended on how it can be accessed from an HTTP transport? (And which methods need authentication for example?)

Yes. That's up to the application protocol. Kademlia could make use of the simple request-response abstraction in #561 to define this, but ultimately the application protocol decides this.

Contributor

I've updated this to be a map from protocol name to metadata that includes a path.

Contributor

3. Do you expect a protocol-spec like kademlia to be extended on how it can be accessed from an HTTP transport? (And which methods need authentication for example?)

Yes. That's up to the application protocol. Kademlia could make use of the simple request-response abstraction in #561 to define this, but ultimately the application protocol decides this.

I see, thanks for explaining. Can we make this clearer? Something like:

This mapping only defines a URL namespace for certain application protocols. It is entirely up to the application protocol (like kademlia) to define how it can be interacted with over HTTP.

  }
}
```

The resource contains a mapping of application protocols to their respective URL paths. For example, this configuration file would tell a client:

1. That the Kademlia protocol is available at `/kademlia/`, and
2. That the [IPFS Path Gateway API](https://specs.ipfs.tech/http-gateways/path-gateway/) is mounted at `/`.
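
As a rough illustration of how a client might consume this resource, the sketch below parses a document shaped like the example and looks up where a protocol is mounted. The helper name `resolve_path` is hypothetical and not part of the spec; the `services` key follows the example as written in this revision of the document.

```python
import json
from typing import Optional

# Body of `.well-known/libp2p`, modeled on the example mapping in this section.
WELL_KNOWN_BODY = """
{
  "services": {
    "/kad/1.0.0": "/kademlia/",
    "/ipfs-http/1.0.0": "/"
  }
}
"""


def resolve_path(well_known_body: str, protocol_id: str) -> Optional[str]:
    """Return the URL path where `protocol_id` is mounted, or None if absent.

    Hypothetical helper: the spec defines the mapping, not this function.
    """
    config = json.loads(well_known_body)
    return config.get("services", {}).get(protocol_id)


print(resolve_path(WELL_KNOWN_BODY, "/kad/1.0.0"))
print(resolve_path(WELL_KNOWN_BODY, "/unknown/1.0.0"))
```

A client that gets `None` back would treat the protocol as not offered by this server rather than guessing a path.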

It is valid to expose a service at `/`. It is RECOMMENDED that the server resolve more specific URLs before less specific ones. For example, a path of `/kademlia/foo` should be routed to the Kademlia protocol rather than the IPFS Path Gateway API.
Contributor

It is RECOMMENDED that the server resolve more specific URLs before less specific ones

Do we need to specify this? Is there an RFC that deals with path resolution that we could link instead?

from @marten-seemann #556 (comment)


I believe most HTTP path routing libraries do this already. This is RECOMMENDED because it would be confusing to a client if /kademlia (in this example) was routed one way with one server but a different way for another server with the same "services" map.

Member

@MarcoPolo perhaps we could rephrase this to be more actionable for implementers.
How about:

Suggested change
It is valid to expose a service at `/`. It is RECOMMENDED that the server resolve more specific URLs before less specific ones. e.g. a path of `/kademlia/foo` should be routed to the Kademlia protocol rather than the IPFS HTTP API.
The implementation MUST facilitate the coexistence of different service endpoints by ensuring that more specific URLs are resolved before less specific ones. For example, when registering handlers, more specific paths like `/kademlia/foo` should take precedence over a less specific handler, such as `/`, which should be registered last.

Contributor

Works for me, but I would change this to a "SHOULD" rather than a "MUST". Thanks!
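
The "more specific before less specific" resolution discussed in this thread could be sketched as a longest-prefix match over the mount paths. This is only one possible approach, not something the spec mandates; the function name `route` is hypothetical and the mapping reuses the example paths from the Namespace section.

```python
from typing import Dict, Optional

# Protocol ID -> mount path, as in the example mapping.
SERVICES: Dict[str, str] = {
    "/kad/1.0.0": "/kademlia/",
    "/ipfs-http/1.0.0": "/",
}


def route(request_path: str, services: Dict[str, str]) -> Optional[str]:
    """Pick the protocol whose mount path is the longest prefix of the request path."""
    best_protocol = None
    best_len = -1
    for protocol_id, mount in services.items():
        if request_path.startswith(mount) and len(mount) > best_len:
            best_protocol, best_len = protocol_id, len(mount)
    return best_protocol


print(route("/kademlia/foo", SERVICES))
print(route("/ipfs/some-cid", SERVICES))
```

Most HTTP routing libraries already behave this way, so an implementation would usually rely on its router rather than hand-rolling the match.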


## Peer ID Authentication

When using the HTTP Transport, peer id authentication is optional. You only pay for it if you need it. This benefits use cases that don’t need peer authentication (e.g. fetching content addressed data) or authenticate some other way (not tied to libp2p peer ids).

Peer ID authentication in the HTTP Transport follows a pattern similar to how
libp2p adds Peer ID authentication in WebTransport and WebRTC. We run the
standard libp2p Noise handshake, but using `IX` for client and server
authentication or `NX` for just server authentication.

Note: This is just one form of Peer ID authentication. Other forms may be added
in the future (with a different `www-authenticate` value) or be added to the
application protocols themselves.

### Authentication flow

1. The client initiates a request that it knows must be authenticated OR the client responds to a `401` with the header `www-authenticate: libp2p-noise` (The server MAY also include `libp2p-token` as an authentication scheme).
2. The client sets the `Authorization` [header](https://www.rfc-editor.org/rfc/rfc9110.html#section-11.6.2) to `libp2p-noise <multibase-encoded-noise-protobuf>`. This initiates the `IX` or `NX` handshake.
    1. The protobuf is multibase encoded, but clients MUST only use encodings that are HTTP header safe (refer to the [token68 definition](https://www.rfc-editor.org/rfc/rfc9110.html#section-11.2)). To set the minimum bar for interoperability, clients and servers MUST support base32 encoding ("b" in the multibase table).
2. When the server receives this request and `IX` was used, it can authenticate the client.
3. The server responds with `Authentication-Info` field set to `libp2p-noise <multibase-encoding-noise-protobuf-response>`.
Contributor

TIL about Authentication-Info!

    1. The server MUST include the SNI used for the connection in the [Noise extensions](https://github.com/libp2p/specs/blob/master/noise/README.md#noise-extensions).
    2. The server MAY include a token that the client can use to avoid doing another Noise handshake in the future. The client would use this token by setting the `Authorization` header to `libp2p-token <token>`.
Contributor

How and where is this token included in the response? In the noise extensions? In an HTTP header?

This is essentially a cookie, right? Would it make sense to actually use the Set-Cookie header?

Contributor

Cookie headers have a bit of baggage with them. For example they are a Forbidden header name, and cannot be used with a service worker proxy.

The token would be included in the Noise extensions along with the SNI used. That wasn't clear, thanks for pointing that out!

3. When the client receives this response, it can authenticate the server’s peer ID.
4. The client verifies the SNI in the Noise extension matches the one used to initiate the connection. The client MUST close the connection if they differ.
    1. The client SHOULD remember this connection is authenticated.
    2. The client SHOULD use the `libp2p-token` if provided for future authorized requests.

This costs one round trip, but can piggyback on an appropriate request.
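
To make the header encoding in the flow above concrete, here is a minimal sketch of building and parsing the `Authorization` value with multibase base32 (prefix `b`, lowercase RFC 4648 alphabet, no padding). The function names are hypothetical, and the payload bytes are a placeholder standing in for a serialized Noise handshake protobuf; the handshake itself is out of scope here.

```python
import base64

SCHEME = "libp2p-noise"


def multibase_base32(data: bytes) -> str:
    """Encode as multibase base32: "b" prefix, lowercase, padding stripped."""
    return "b" + base64.b32encode(data).decode("ascii").lower().rstrip("=")


def multibase_base32_decode(text: str) -> bytes:
    """Decode a multibase string, handling only the base32 ("b") encoding."""
    if not text.startswith("b"):
        raise ValueError("this sketch only handles multibase base32 ('b')")
    body = text[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding that b32decode expects
    return base64.b32decode(body)


def authorization_value(noise_message: bytes) -> str:
    """Build the value of the Authorization header for a Noise message."""
    return f"{SCHEME} {multibase_base32(noise_message)}"


def parse_authorization_value(value: str) -> bytes:
    """Recover the raw Noise message bytes from an Authorization header value."""
    scheme, _, payload = value.partition(" ")
    if scheme != SCHEME:
        raise ValueError(f"unexpected auth scheme: {scheme!r}")
    return multibase_base32_decode(payload)


placeholder = b"hello"  # stand-in for the serialized Noise protobuf
header = authorization_value(placeholder)
print(header)
print(parse_authorization_value(header) == placeholder)
```

Lowercase base32 without padding uses only characters allowed by the token68 grammar, which is why it is safe to place directly in the header.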

### Authentication Endpoint

Because the client needs to make a request to authenticate the server, and the client may not want to make the real request before authenticating the server, the server MAY provide an authentication endpoint. This authentication endpoint is like any other application protocol, and it shows up in `.well-known/libp2p`, but it only does the authentication flow. It doesn’t send any other data besides what is defined in the above Authentication flow. The protocol id for the authentication endpoint is `/http-noise-auth/1.0.0`.

## Using HTTP semantics over stream transports

Application protocols using HTTP semantics can run over any libp2p stream transport. Clients open a new stream using `/http/1.1` as the protocol identifier. Clients encode their HTTP request as an HTTP/1.1 message and send it over the stream. Clients parse the response as an HTTP/1.1 message and then close the stream.

HTTP/1.1 is chosen as the minimum bar for interoperability, but other encodings of HTTP semantics are possible as well and may be specified in a future update.
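
As a sketch of what "encode as an HTTP/1.1 message" can look like in practice, the code below serializes a request to bytes and parses a raw response with the Python standard library. Opening the libp2p stream and negotiating `/http/1.1` are not shown; the helper names and the `Host` value are assumptions made for illustration.

```python
import http.client
import io
from typing import Tuple


def encode_request(method: str, path: str, host: str, body: bytes = b"") -> bytes:
    """Serialize a request as an HTTP/1.1 message, ready to write to a stream."""
    head = (
        f"{method} {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


class _BytesSocket:
    """Minimal socket stand-in so http.client can parse bytes already read."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    def makefile(self, *args, **kwargs) -> io.BytesIO:
        return io.BytesIO(self._data)


def parse_response(raw: bytes) -> Tuple[int, bytes]:
    """Parse a complete HTTP/1.1 response and return (status, body)."""
    response = http.client.HTTPResponse(_BytesSocket(raw))
    response.begin()
    return response.status, response.read()


request_bytes = encode_request("GET", "/kademlia/", "example.com")
print(request_bytes.decode("ascii"))
print(parse_response(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"))
```

In a real client, `request_bytes` would be written to the stream and the bytes read back would be passed to `parse_response` before the stream is closed.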

## Using other request-response semantics (not HTTP)

This document has focused on using HTTP semantics, but HTTP may not be the common denominator amongst all transports (current and future). It may be desirable to use some other request-response semantics for your application-level protocol, perhaps something like rust-libp2p’s [request-response](https://docs.rs/libp2p/0.52.1/libp2p/request_response/index.html) abstraction. Nothing specified in this document prohibits mapping other semantics onto HTTP semantics to keep the benefits of using an HTTP transport.

To support the simple request-response semantics, for example, the request MUST be encoded within a `POST` request to the proper URL (as defined in the Namespace section). The response is read from the body of the HTTP response. The client MUST authenticate the server and itself **before** making the request.
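
A minimal sketch of this mapping, assuming the mount path has already been looked up from the well-known resource: the request payload becomes the body of a `POST` to that path. The helper name, the base URL, and the `Content-Type` value are assumptions for illustration only, and the authentication step that the text requires before sending is not shown.

```python
import urllib.request


def build_request(base_url: str, mount_path: str, payload: bytes) -> urllib.request.Request:
    """Wrap a request-response payload in a POST to the protocol's mount path.

    Hypothetical helper: the Content-Type below is an assumption, not from the spec.
    """
    url = base_url.rstrip("/") + mount_path
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )


request = build_request("https://example.com", "/kademlia/", b"ping")
print(request.get_method(), request.full_url)
```

The response payload would then be read from the body of the HTTP response to this request.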