Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Signed Address Records #217

Merged
merged 17 commits into from
Nov 19, 2020
Merged

RFC: Signed Address Records #217

merged 17 commits into from
Nov 19, 2020

Conversation

yusefnapora
Copy link
Contributor

@yusefnapora yusefnapora commented Oct 4, 2019

Hey all, this introduces an RFC for a self-certified address record that could be published to a DHT or gossiped about in a pubsub topic without worrying about it being altered in-flight.

This is motivated by the discussion in libp2p/libp2p#47 and libp2p/go-libp2p#436.

I wrote up the record as an IPLD schema, but we could go with another format if we'd rather not have a dependency on IPLD.

I also wasn't sure whether we might want to issue records about peers other than ourselves - this draft does not, but we could have records with distinct subject and issuer peers.

There are some TODOs at the bottom that are better tracked here:

  • how to store signed records
    • should be separate from "working set" that's optimized for retrieval
    • need to store unaltered bytes
  • figure out if IPLD is the way to go here. If not, what serialization format, etc.
  • extend identify protocol to include signed records?

@bigs @raulk @mgoelzer @Stebalien @vyzo

@ghost
Copy link

ghost commented Oct 5, 2019

cc @whyrusleeping

@yusefnapora
Copy link
Contributor Author

What do people think about IPLD? I mostly just thought the schema notation is pretty cool, but I think if we want to use it for real we'd need a block store (at least an in-memory one), which doesn't feel like it's really a libp2p-layer concern. Maybe I'm overthinking that though.

Other options that I can think of are protobuf, which plays nice with most other libp2p things, or CBOR. Raw CBOR is as annoying to work with as raw JSON, but it does have a canonical encoding that is deterministic, which protobuf lacks. I'm not sure that a deterministic encoding is strictly necessary, and as far as I can tell, IPLD doesn't enforce one either. Anyway, protobuf is likely the simplest to use, so that seems like a decent way to go.

@yusefnapora
Copy link
Contributor Author

I should note that I haven't tried actually using IPLD yet :) I'm going to play with it a bit and see how it works in practice. If anyone has experience with it, how is it?

@yusefnapora
Copy link
Contributor Author

I've been thinking about this some more and just wanted to write up where my head is at before I start revising.

Ultimately I think IPLD is overkill for this, and we should just use protobufs. Earlier I was leaning away from this because there's no guarantees of deterministic encoding for protobufs, which was an issue with the peer id spec. However, as long as we sign, transmit and store serialized protobufs, we can still validate the record before deserializing it, and we don't really have to worry about whether our encoder has the same behavior as the one that produced the record.

I think I want to tease this apart into two RFCs:

  1. An all-purpose "signed envelope" record containing a peer id, byte string payload, and a signature of the payload. May optionally include public key (which must be consistent with peer id). Will require a "signature service" that can sign and validate.
  2. An address record with routability & confidence scores. This will get serialized to a byte string, and the encoded record will be wrapped in a signed envelope. The peer store will accept these directly, or inside a signed envelope. If inside an envelope, peer store will reject if signature is invalid. If signature is valid, mark the record as certified & surface that via peer store APIs.

This is basically the same idea as the original draft, except instead of an AddressEnvelope struct that stores a specific structured data type, the envelope only stores opaque binary blobs.

e.g.:

message SignedEnvelope {
  bytes peerId = 1;
  bytes contents = 2;
  bytes signature = 3;

  optional PublicKey pubkey = 4;
}

message AddressInfo {
  bytes addr = 1;
  Routability routability = 2;
  Confidence confidence = 3;
}

message AddressState {
  bytes subject = 1;
  repeated AddressInfo addresses = 2;
  // etc...
}

Storing serialized protobufs as a byte string inside another protobuf feels weird, but it seems necessary, given the lack of a deterministic encoder.

Does this make sense?

@Stebalien
Copy link
Member

  1. Instead of requiring the peer ID and optionally having a key, how about we just always have the key and then always derive the peer ID? That way it's impossible to just assume that the key matches the peer ID?

  2. Yeah, a byte string within a protobuf seems like the right tactic if we're using protobufs.

Also note: we need to tag the data with some kind of "purpose" string so we can't reuse these records as messages in other protocols with signed messages. We do this in pubsub by prepending libp2p-pubsub: before signing/verifying the signature (but not when serializing) but I wonder if we should have a better solution. Maybe prepend a length-prefixed string? Null terminated string?

@yusefnapora
Copy link
Contributor Author

how about we just always have the key and then always derive the peer ID

That is better 👍

Good question about the "domain separation" string. Maybe we could have a fixed string, e.g. "libp2p-signed-message:" but also allow the user to add an arbitrary "purpose" string that gets appended to the fixed string. This could be included in plain text as a field in the SignedEnvelope message, which might be a nice way to let users indicate how to process the contents of the message.

Something like

{
  publicKey: { 
    //...
  },
  contents: Buffer(),
  purpose: "AddressState",
  signature: Buffer()
}

And you prepend "libp2p-signed-message:" ++ purpose to the contents to get the signature ("libp2p-signed-message:AddressState" in the example).

@Stebalien
Copy link
Member

This could be included in plain text as a field in the SignedEnvelope message, which might be a nice way to let users indicate how to process the contents of the message.

That will take extra space in the message itself which may discourage high entropy names.

I actually like forcing users to explicitly specify the purpose/domain. If we include it in the message, users can be lazy and ignore it. If we don't, users will have to call some function like:

func Open(domain string, envelope []byte) (message []byte, author PublicKey, err error) { ... }

And you prepend "libp2p-signed-message:"

I'm not sure if libp2p-signed-message is all that important as long as users specify purpose strings. However, I guess it can't hurt.

Maybe prepend a length-prefixed string? Null terminated string?

BTW, my concern here is that foo is a prefix of foobar which would allow an attacker to add bar to the signed data and pass it off as a foo message. This isn't an issue if we require this prefix to end in : but it may be better to just length-prefix it.

@yusefnapora
Copy link
Contributor Author

Ah, I see where you're coming from with the out-of-band domain string, that makes sense.

Using only user-controlled data for the domain string makes me nervous, although I'm not sure I can really come up with a plausible attack. Still, there's not really any cost to using the fixed string plus the user-provided one, so might be worth doing just in case.

@ghost
Copy link

ghost commented Oct 22, 2019

From my experience with security auditors, letting arbitrary or user-defined input control the interpretation of a message is going to raise a yellow flag for them even if it's not exploitable. (A plain text field in the protobuf does not seem as likely to raise those concerns.) Perhaps something to consider since we've fixed a lot of other stuff that was not technically broken (but is a nonstandard approach) to satisfy auditor concerns.

@Stebalien
Copy link
Member

This is less of an arbitrary user-defined input than the bytes being signed as the purpose string is fixed at compile time.

My only concerns are:

  1. Ideally, we should always combine the purpose with the signed data the same way.
  2. We need to structure the signed data such that we can split $purpose$signedData into the $purpose and $$ignedData without external information. Otherwise, as noted above, messages for protocol foobar could be passed to protocol foo.

purpose:[data] makes (1) easy as we already do this in pubsub, TLS, and QUIC (once we cut a release). However, that does mean that : can't appear in the purpose.

@yusefnapora
Copy link
Contributor Author

Good point - the "user" that controls the domain string is another libp2p subsystem, and there's no external input at runtime.

I guess I lean towards using a : separator and prohibiting : in the purpose string, since it's consistent with what we're doing elsewhere.

Copy link
Member

@raulk raulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shaping up well. Just a handful of initial comments.

RFC/0002-signed-envelopes.md Outdated Show resolved Hide resolved

```protobuf
message SignedEnvelope {
PublicKey publicKey = 1; // see peer id spec for definition
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Food for thought. Including the pubkey may be superfluous for some signature schemes.

Given an ECDSA signature, one can recover the public key provided we know the curve, the hash function, and the plaintext that was signed. Bitcoin and Ethereum use that trick heavily to validate transactions.

See:

https://crypto.stackexchange.com/questions/18105/how-does-recovering-the-public-key-from-an-ecdsa-signature-work
https://crypto.stackexchange.com/questions/60218/recovery-public-key-from-secp256k1-signature-and-message

RFC/0002-signed-envelopes.md Show resolved Hide resolved
LOCAL = 3;

// public internet
GLOBAL = 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need a PROXIMITY enum value for network-less transports that rely on physical proximity, e.g. Bluetooth.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
GLOBAL = 4;
PUBLIC = 4;

}
```

If Alice wants to publish her address to a public shared resource like a DHT,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we recommend that records are rejected outright when they leak addresses that are outside the scope of the discovery mechanism?


// Confidence indicates how much we believe in the validity of the
// address.
enum Confidence {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What classes of problems do you foresee communicating our perceived confidence of our own addresses would solve?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's to signal which dials to prioritise, we can simplify this by conveying a 1-byte precedence value in the range (-128, 128).

Copy link
Member

@raulk raulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes from a sync call with @yusefnapora.

Regarding the peer routing record:

  • For now, modelling confidence and routability should be out of scope, because these two concepts are not yet modelled in the libp2p stack.
  • We should think about "record extensions", where confidence and routability would be two examples of future extensions.
  • Another extension I'd like to see is a "service bloom filter" that allows peers to advertise their protocols compactly and efficiently (without enumerating them, therefore with a degree of privacy preservation), and receivers of that record would be able to probabilistically test whether that authoring peer supports a protocol they're interested in, without having to connect to them.

Regarding the envelope:

  • Add a type field: not sure if this should be a string, an enumeration, a multicodec.
  • Hoist the sequence number from the peer routing record to the envelope.
  • These two fields need to be part of the plaintext to sign.

Copy link
Member

@raulk raulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More thoughts regarding the "routability" aspect:

  • What addresses do we want to communicate in a peer routing record?
  • Discovery mechanisms have different scopes of operation, e.g. mDNS operates on LAN and loopback, PEX and the DHT are mostly public, but we do want to facilitate two local peers finding each other and establishing a local connection.
  • For PEX and DHT, maybe we shouldn't even publish local addresses? Doing so has implications in security (you can order a machine to perform local dials to some process in their local network).
  • Instead of publishing local addresses in a global venue (e.g. DHT), we should be having a mechanism so that peers establishing a connection via a public endpoint can detect if they're behind the same network, and "upgrade" to a better route.
  • I guess in some scenarios, NAT hairpinning takes care of the above.

A possibility is having the address filtering be explicit. In most cases, we can communicate local and public endpoints, filtering out the localhost endpoints.

If the user wants to enable localhost endpoint advertisement, they can do so by enabling a flag on the Host.

In the implementation realm, assuming there'll be a peer routing record generator component, which knows all addresses and spits out records whenever they change, notifying downstream components (e.g. DHT, PEX, etc.), those components could provide an address filter when registering.

The problem with the above approach is that it can lead to a "chimera" scenario. If two peers find each other via two venues (e.g. mDNS, DHT), the routing records with different scopes could collide and override each other. Unless we isolate routing records per domain/scope in the peerstore.

Food for thought.

@raulk
Copy link
Member

raulk commented Nov 1, 2019

@yusefnapora aims to have a final draft of incorporating comments by today, in order to be reviewed and merged on Monday, to have a minimal implementation ready by the end of the week in Go.

@Stebalien
Copy link
Member

Another extension I'd like to see is a "service bloom filter" that allows peers to advertise their protocols compactly and efficiently (without enumerating them, therefore with a degree of privacy preservation),

Hm, not really. If these bloom filters are at all useful, peers will be able to ask "does peer X likely speak protocol Y". I'd expect an attacker to ask "which peers speak protocol Y" (e.g., trying to find all the IPFS, Filecoin, etc. peers).

However, shrinking the size of the protocol list is likely worth it (but multicodecs!).

@Stebalien
Copy link
Member

We should think about "record extensions", where confidence and routability would be two examples of future extensions.

YES!

@Stebalien
Copy link
Member

Hoist the sequence number from the peer routing record to the envelope.

Why? Are we planning on duplicating these? I thought the idea was to make the envelope general-purpose.

Add a type field: not sure if this should be a string, an enumeration, a multicodec.

Multicodec prefixed byte string? That allows:

This is also consistent with ENS records.

@yusefnapora
Copy link
Contributor Author

@Stebalien about moving the sequence number to the envelope, we were thinking it might be broadly useful for other kinds of payload, but as I was writing it up, I realized that each system that uses envelopes would need to have their own rules about how to interpret the sequence number (e.g. whether to invalidate & replace old records, merge them, etc). So it may be best to just leave it in the routing record and keep the envelope as simple as we can make it.

For the type field, I think your suggestion makes sense and gives us a lot of flexibility. 👍

Since we're talking about prefixing multiple fields to the content to sign instead of just the domain string, I'm now thinking that length-prefixing the fields is better than using a : separator. Especially if we want the type field to contain a CID, since a valid CID might randomly contain the UTF-8 code point for :.

@yusefnapora
Copy link
Contributor Author

Just wanted to update this, since I've been a bit quiet this week. I've been working on an initial implementation in Go that I'm hoping to make PRs for today.

I diverged from what's written up here in one place. It seemed a bit silly to use varints for length-prefixing when building up the buffer to sign, since we don't need to save space on the wire or anything. So I just used uint64, which seems like way more than enough to fit any payload we might want to use envelopes for. I'll update this RFC to reflect that in a second.

Also, how does everyone feel about the name "routing records"? In the implementation I ended up calling them RoutingStateRecords to emphasize that they contain transient state, but I don't know if that's really any better. I'm also wondering if an even more generic name like "peer records" makes sense, since we're thinking about putting things like the protocol bloom filter in here, which isn't strictly a routing concern. Maybe "availability records"? IDK 🚲 🏠

Anyway, my plan for today is to fix up a few things in the go implementation and push those branches. Then I was thinking I'd merge this RFC in and start new PRs to turn the RFCs into Working Draft specs and move the conversation over to those PRs.

Copy link
Member

@raulk raulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've already merged a reference implementation into go-libp2p, so this can be promoted to a spec and be tagged with status 2A (Candidate Recommendation).

@vasco-santos -- are you planning to implement in js-libp2p?

@jacobheun
Copy link
Contributor

Now that this has been implemented in Go and JS I intend to merge this later today unless there is a reasonable objection. Ideally we would couple the approved RFC in a merge with its spec per #198, but I'd prefer this PR not sit open, completed, until someone can dedicate time to completing it.

@aarshkshah1992
Copy link
Contributor

@jacobheun Should we go ahead and merge this ?

@jacobheun jacobheun merged commit b70ccf2 into master Nov 19, 2020
@jacobheun jacobheun deleted the rfc/address-records branch November 19, 2020 14:05
mxinden added a commit to libp2p/rust-libp2p that referenced this pull request Jan 7, 2021
This commit upgrades the current gossipsub implementation to support the [v1.1
spec](https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/gossipsub-v1.1.md).

It adds a number of features, bug fixes and performance improvements. 

Besides support for all new 1.1 features, other improvements that are of particular note: 

- Improved duplicate LRU-time cache (this was previously a severe bottleneck for
  large message throughput topics)
- Extended message validation configuration options
- Arbitrary topics (users can now implement their own hashing schemes)
- Improved message validation handling - Invalid messages are no longer dropped
  but sent to the behaviour for application-level processing (including scoring)
- Support for floodsub, gossipsub v1 and gossipsub v2
- Protobuf encoding has been shifted into the behaviour. This has permitted two
  improvements:
     1. Message size verification during publishing (report to the user if the
        message is too large before attempting to send).
     2. Message fragmentation. If an RPC is too large it is fragmented into its
        sub components and sent in smaller chunks.

Additional Notes

The peer eXchange protocol defined in the v1.1 spec is inactive in its current
form. The current implementation permits sending `PeerId` in `PRUNE` messages,
however a `PeerId` is not sufficient to form a new connection to a peer. A
`Signed Address Record` is required to safely transmit peer identity
information. Once these are confirmed (libp2p/specs#217)
a future PR will implement these and make PX usable.

Co-authored-by: Max Inden <[email protected]>
Co-authored-by: Rüdiger Klaehn <[email protected]>
Co-authored-by: blacktemplar <[email protected]>
Co-authored-by: Rüdiger Klaehn <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Roman S. Borschel <[email protected]>
Co-authored-by: Roman Borschel <[email protected]>
Co-authored-by: David Craven <[email protected]>
santos227 pushed a commit to santos227/rustlib that referenced this pull request Jun 20, 2022
This commit upgrades the current gossipsub implementation to support the [v1.1
spec](https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/gossipsub-v1.1.md).

It adds a number of features, bug fixes and performance improvements. 

Besides support for all new 1.1 features, other improvements that are of particular note: 

- Improved duplicate LRU-time cache (this was previously a severe bottleneck for
  large message throughput topics)
- Extended message validation configuration options
- Arbitrary topics (users can now implement their own hashing schemes)
- Improved message validation handling - Invalid messages are no longer dropped
  but sent to the behaviour for application-level processing (including scoring)
- Support for floodsub, gossipsub v1 and gossipsub v2
- Protobuf encoding has been shifted into the behaviour. This has permitted two
  improvements:
     1. Message size verification during publishing (report to the user if the
        message is too large before attempting to send).
     2. Message fragmentation. If an RPC is too large it is fragmented into its
        sub components and sent in smaller chunks.

Additional Notes

The peer eXchange protocol defined in the v1.1 spec is inactive in its current
form. The current implementation permits sending `PeerId` in `PRUNE` messages,
however a `PeerId` is not sufficient to form a new connection to a peer. A
`Signed Address Record` is required to safely transmit peer identity
information. Once these are confirmed (libp2p/specs#217)
a future PR will implement these and make PX usable.

Co-authored-by: Max Inden <[email protected]>
Co-authored-by: Rüdiger Klaehn <[email protected]>
Co-authored-by: blacktemplar <[email protected]>
Co-authored-by: Rüdiger Klaehn <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Roman S. Borschel <[email protected]>
Co-authored-by: Roman Borschel <[email protected]>
Co-authored-by: David Craven <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants