Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhancement(Identify): add a behavior event that reports the list of locally supported protocols whenever that changes #4332

Closed
dhuseby opened this issue Aug 15, 2023 · 8 comments · Fixed by #4527
Assignees

Comments

@dhuseby
Copy link
Contributor

dhuseby commented Aug 15, 2023

Description

Modify the behavior/handler to emit a behavior event with the new set of local protocols whenever it changes.

Motivation

Some behaviors--most notably the kademlia behavior--only adds its protocol to the local set when it gets into the correct state. Having the protocol list change be reported via an event would make it simple to not only report the protocols the peer is currently supporting but also to track/confirm state changes in other behaviors.

Current Implementation

Currently, the identify behavior only logs and pushes whenever the local_supported_protocols set changes.

Are you planning to do it yourself in a pull request?

Maybe.

@thomaseizinger
Copy link
Contributor

thomaseizinger commented Aug 16, 2023

Thanks for opening this issue!

There are two things we need to consider:

  • The list of supported protocols is scoped to a connection to a specific peer, it is not a global thing.
  • I am not sure this should be tied to identify because it technically doesn't have anything to do with the protocol.

But it is difficult to make this a SwarmEvent because we are planning to do #4268 and the protocols are as I said above scoped to a connection, not the Swarm. It would be weird to report a change in protocols to each behaviour in addition to each connection handler.

You could make a custom behaviour that forwards these events from the connection to the user but I am guessing you wanted this out of the box.

Perhaps we should extend the Sent and Pushed variants of Event to carry the information we sent? That way, diffing of protocol changes with a peer can be implemented on top?

@thomaseizinger
Copy link
Contributor

Perhaps we should extend the Sent and Pushed variants of Event to carry the information we sent? That way, diffing of protocol changes with a peer can be implemented on top?

This doesn't necessarily help newcomers though because they'd need to understand that they need to compose something here.

Perhaps a dedicated behaviour is better because we can make that one specific to diffing the list of protocols directly which makes that more discoverable?

@dhuseby
Copy link
Contributor Author

dhuseby commented Aug 17, 2023

First of all I found it weird that a developer...(ehem)...

protocols/identify/src 󰇙  master 󰇙  v1.71.1 ❯ git blame -L 369,371 handler.rs
137443b9d (Thomas Eizinger 2023-05-25 03:09:35 +0200 369)                 let after = log::log_enabled!(Level::Debug)
137443b9d (Thomas Eizinger 2023-05-25 03:09:35 +0200 370)                     .then(|| self.local_protocols_to_string())
137443b9d (Thomas Eizinger 2023-05-25 03:09:35 +0200 371)                     .unwrap_or_default();

...thought it was useful to have a debug message when the local set of supported protocols changed but not emit an event.

It looks like this could easily be reported as a behavior event with the peer id, added/removed, and maybe the connection id. It could potentially be useful information for a connection manager type bit of code. Just like the Ping behavior is designed to support a connection manager, why not Identify?

@dhuseby
Copy link
Contributor Author

dhuseby commented Aug 17, 2023

So something weird is going on that I think is a bug. Now that I have a behavior event, my test peer and identify client shows some strange behavior. What's happening is I run my test peer and then I run my identify client which is just an app that dials the test peer, runs identify and then dumps the protocols the peer reports. My test peer has keep_alive, ping, identify (w/ local protocols changed event), and kademlia behaviors. It starts up and does nothing but list. I doesn't dial nor do a kademlia.bootstrap().

The first time I run the identify client, I get the following output:

fleyg 󰇙  main! 󰇙  0.1.0 󰇙  v1.71.1 ❯ cargo run --bin ident -- --addr /ip4/127.0.0.1/tcp/4920
    Finished dev [unoptimized + debuginfo] target(s) in 0.08s
     Running `target/debug/ident --addr /ip4/127.0.0.1/tcp/4920`
[2023-08-17T03:17:03Z INFO  ident] Local peer id: 12D3KooWQaPj8yyNzcMCMmwSRhgxKAvB8rSJwWxwhii8cViwPQWe
[2023-08-17T03:17:03Z INFO  ident] Dialed via addr /ip4/127.0.0.1/tcp/4920
[2023-08-17T03:17:03Z INFO  ident] Protocols changed: 12D3KooWHxzP5QEbgDntaRtGyHxnFoxuG9JmscrUAVwgdVrcrwX2
[2023-08-17T03:17:03Z INFO  ident] 	/ipfs/id/1.0.0
[2023-08-17T03:17:03Z INFO  ident] 	/ipfs/id/push/1.0.0
[2023-08-17T03:17:03Z INFO  ident] Identify Sent: 12D3KooWHxzP5QEbgDntaRtGyHxnFoxuG9JmscrUAVwgdVrcrwX2
[2023-08-17T03:17:03Z INFO  ident] Identify Received: 12D3KooWHxzP5QEbgDntaRtGyHxnFoxuG9JmscrUAVwgdVrcrwX2
[2023-08-17T03:17:03Z INFO  ident] 	Protocol: ipfs/0.1.0
[2023-08-17T03:17:03Z INFO  ident] 	Agent: fleyg/0.0.1
[2023-08-17T03:17:03Z INFO  ident] 	Addr: /ip4/127.0.0.1/tcp/59014
[2023-08-17T03:17:03Z INFO  ident] 	Protocols:
[2023-08-17T03:17:03Z INFO  ident] 		/ipfs/id/1.0.0
[2023-08-17T03:17:03Z INFO  ident] 		/ipfs/id/push/1.0.0
[2023-08-17T03:17:03Z INFO  ident] 		/ipfs/ping/1.0.0

You can see that it reports my test peer advertises three protocols: /ipfs/id/1.0.0, /ipfs/id/push/1.0.0, and /ipfs/ping/1.0.0. That's normal because my test peer is running behind a firewall. But when I run the ident peer again I get a different result:

fleyg 󰇙  main! 󰇙  0.1.0 󰇙  v1.71.1 ❯ cargo run --bin ident -- --addr /ip4/127.0.0.1/tcp/4920
    Finished dev [unoptimized + debuginfo] target(s) in 0.11s
     Running `target/debug/ident --addr /ip4/127.0.0.1/tcp/4920`
[2023-08-17T03:17:11Z INFO  ident] Local peer id: 12D3KooWM6erPWAdk1cPvU7FxmbsAGyLW3uBV2HNpTczp3himkWV
[2023-08-17T03:17:11Z INFO  ident] Dialed via addr /ip4/127.0.0.1/tcp/4920
[2023-08-17T03:17:11Z INFO  ident] Protocols changed: 12D3KooWHxzP5QEbgDntaRtGyHxnFoxuG9JmscrUAVwgdVrcrwX2
[2023-08-17T03:17:11Z INFO  ident] 	/ipfs/id/1.0.0
[2023-08-17T03:17:11Z INFO  ident] 	/ipfs/id/push/1.0.0
[2023-08-17T03:17:11Z INFO  ident] Identify Sent: 12D3KooWHxzP5QEbgDntaRtGyHxnFoxuG9JmscrUAVwgdVrcrwX2
[2023-08-17T03:17:11Z INFO  ident] Identify Received: 12D3KooWHxzP5QEbgDntaRtGyHxnFoxuG9JmscrUAVwgdVrcrwX2
[2023-08-17T03:17:11Z INFO  ident] 	Protocol: ipfs/0.1.0
[2023-08-17T03:17:11Z INFO  ident] 	Agent: fleyg/0.0.1
[2023-08-17T03:17:11Z INFO  ident] 	Addr: /ip4/127.0.0.1/tcp/53818
[2023-08-17T03:17:11Z INFO  ident] 	Protocols:
[2023-08-17T03:17:11Z INFO  ident] 		/ipfs/ping/1.0.0
[2023-08-17T03:17:11Z INFO  ident] 		/ipfs/id/1.0.0
[2023-08-17T03:17:11Z INFO  ident] 		/ipfs/id/push/1.0.0
[2023-08-17T03:17:11Z INFO  ident] 		/ipfs/kad/1.0.0

Now it is reporting that my test peer also advertises /ipfs/kad/1.0.0. The output from the test peer looks like this:

fleyg 󰇙  main! 󰇙  0.1.0 󰇙  v1.71.1 ❯ cargo run --bin fleyg
    Finished dev [unoptimized + debuginfo] target(s) in 0.20s
     Running `target/debug/fleyg`
[2023-08-17T03:16:44Z INFO  fleyg] Local peer id: 12D3KooWHxzP5QEbgDntaRtGyHxnFoxuG9JmscrUAVwgdVrcrwX2
[2023-08-17T03:16:44Z INFO  fleyg] Kademlia protocol: /ipfs/kad/1.0.0
[2023-08-17T03:17:03Z INFO  fleyg] Advertising to: 12D3KooWQaPj8yyNzcMCMmwSRhgxKAvB8rSJwWxwhii8cViwPQWe
[2023-08-17T03:17:03Z INFO  fleyg] 	/ipfs/id/1.0.0
[2023-08-17T03:17:03Z INFO  fleyg] 	/ipfs/id/push/1.0.0
[2023-08-17T03:17:03Z INFO  fleyg] 	/ipfs/ping/1.0.0
[2023-08-17T03:17:11Z INFO  fleyg] Advertising to: 12D3KooWM6erPWAdk1cPvU7FxmbsAGyLW3uBV2HNpTczp3himkWV
[2023-08-17T03:17:11Z INFO  fleyg] 	/ipfs/ping/1.0.0
[2023-08-17T03:17:11Z INFO  fleyg] 	/ipfs/id/1.0.0
[2023-08-17T03:17:11Z INFO  fleyg] 	/ipfs/id/push/1.0.0
[2023-08-17T03:17:11Z INFO  fleyg] 	/ipfs/kad/1.0.0

You can clearly see that the first connection results in a LocalProtocolsChanged behavior event saying that the local protocols changed from none to the three protocols reported in the first connection. Then the second connection results in another behavior event on the new connection that includes the kademlia DHT protocol. Weird. Something about connecting to the peer causes the kademlia behavior to change states. I bet if I add behavior events for the state changes in kademlia that I'll see it change. I think this is a bug...maybe?

@mxinden
Copy link
Member

mxinden commented Aug 17, 2023

You can clearly see that the first connection results in a LocalProtocolsChanged behavior event saying that the local protocols changed from none to the three protocols reported in the first connection. Then the second connection results in another behavior event on the new connection that includes the kademlia DHT protocol. Weird. Something about connecting to the peer causes the kademlia behavior to change states. I bet if I add behavior events for the state changes in kademlia that I'll see it change. I think this is a bug...maybe?

libp2p-kad likely changes from Mode::Client to Mode::Server. Are you calling Swarm::add_external_address based on incoming libp2p-identify events? If so, that tells libp2p-kad that it is publicly reachable and thus it switches from Mode::Client to Mode::Server and thus advertises the /ipfs/kad/1.0.0 protocol.

Also, to rule out the above, you can simply set the Mode at start-up time:

swarm
.behaviour_mut()
.kademlia
.set_mode(Some(kad::Mode::Server));

If the above is not the case, mind sharing your code? That would make remote debugging a bit easier @dhuseby.

@thomaseizinger
Copy link
Contributor

thomaseizinger commented Aug 17, 2023

...thought it was useful to have a debug message when the local set of supported protocols changed but not emit an event.

The main information we are logging is that we are pushing those changes to the remote node, not that the protocols changed. See #3613 and linked issues for details :)

I'll respond on the PR to the particular feature request.

@dhuseby
Copy link
Contributor Author

dhuseby commented Aug 17, 2023

libp2p-kad likely changes from Mode::Client to Mode::Server. Are you calling Swarm::add_external_address based on incoming libp2p-identify events? If so, that tells libp2p-kad that it is publicly reachable and thus it switches from Mode::Client to Mode::Server and thus advertises the /ipfs/kad/1.0.0 protocol.

You're right, that is what is happening, however I'm erroneously adding a loop back address as as the observed address. That's incorrect. The peer and the test client are both running locally. What strategy do production servers use to avoid adding non-routable addresses as swarm external addresses? It seems like relying entirely on the identify response is error prone as illustrated in this example.

https://github.com/dhuseby/fleyg/blob/517829396674ef87051180baf62404fc2d7e349b/src/bin/fleyg.rs#L130-L141

The output confirms your hunch:

fleyg 󰇙  main! 󰇙  0.1.0 󰇙  v1.71.1 ❯ cargo run --bin fleyg
   Compiling fleyg v0.1.0 (/home/user/Projects/pl/fleyg)
    Finished dev [unoptimized + debuginfo] target(s) in 12.34s
     Running `target/debug/fleyg`
[2023-08-17T23:15:02Z INFO  fleyg] Local peer id: 12D3KooWAoTn5gkMYkeXDnWqf2Y4N4Raw1Eg8BRxddo43wzJds6E
[2023-08-17T23:15:02Z INFO  fleyg] Kademlia protocol: /ipfs/kad/1.0.0
[2023-08-17T23:15:13Z INFO  fleyg] Advertising to: 12D3KooWL4riXPuN2WPV9z1VbiW9G4tqUvyS6sRf95ZcWRbMtf4j
[2023-08-17T23:15:13Z INFO  fleyg] 	/ipfs/id/1.0.0
[2023-08-17T23:15:13Z INFO  fleyg] 	/ipfs/ping/1.0.0
[2023-08-17T23:15:13Z INFO  fleyg] 	/ipfs/id/push/1.0.0
[2023-08-17T23:15:18Z INFO  fleyg] Advertising to: 12D3KooWAY98uWmXW8CGkCcdTrPnNaP1eKSPFjyw2MQ5ykTCYrAL
[2023-08-17T23:15:18Z INFO  fleyg] 	/ipfs/ping/1.0.0
[2023-08-17T23:15:18Z INFO  fleyg] 	/ipfs/id/1.0.0
[2023-08-17T23:15:18Z INFO  fleyg] 	/ipfs/id/push/1.0.0
[2023-08-17T23:15:18Z INFO  fleyg] Identify Received: 12D3KooWAY98uWmXW8CGkCcdTrPnNaP1eKSPFjyw2MQ5ykTCYrAL
[2023-08-17T23:15:18Z INFO  fleyg] 	Protocol: ipfs/0.1.0
[2023-08-17T23:15:18Z INFO  fleyg] 	Agent: ident/0.0.1
[2023-08-17T23:15:18Z INFO  fleyg] 	Addr: /ip4/127.0.0.1/tcp/4920
[2023-08-17T23:15:18Z INFO  fleyg] 	Protocols:
[2023-08-17T23:15:18Z INFO  fleyg] 		/ipfs/id/1.0.0
[2023-08-17T23:15:18Z INFO  fleyg] 		/ipfs/id/push/1.0.0
[2023-08-17T23:15:18Z INFO  fleyg] Advertising to: 12D3KooWAY98uWmXW8CGkCcdTrPnNaP1eKSPFjyw2MQ5ykTCYrAL
[2023-08-17T23:15:18Z INFO  fleyg] 	/ipfs/ping/1.0.0
[2023-08-17T23:15:18Z INFO  fleyg] 	/ipfs/kad/1.0.0
[2023-08-17T23:15:18Z INFO  fleyg] 	/ipfs/id/1.0.0
[2023-08-17T23:15:18Z INFO  fleyg] 	/ipfs/id/push/1.0.0

If the above is not the case, mind sharing your code? That would make remote debugging a bit easier @dhuseby.

Sure, all of my code is here: https://github.com/dhuseby/fleyg

@thomaseizinger
Copy link
Contributor

thomaseizinger commented Aug 18, 2023

It seems like relying entirely on the identify response is error prone as illustrated in this example.

Correct. That is why we changed that behaviour in the last release and no longer do that automatically.

What strategy do production servers use to avoid adding non-routable addresses as swarm external addresses?

AutoNAT solves this problem. Unfortunately, AutoNATv1 has certain issues which is why AutoNATv2 is in the works. Additionally, our AutoNATv1 implementation has some issues when it used together with port reuse. See #3889.

In preparation for AutoNATv2, we need to rework the Transport::dial interface which should fix this issue, see #4226.

In the meantime, you have two options:

  • If your node is publicly reachable, manually configure the correct IP address.
  • Manually set Mode::Server if you know that your node is publicly reachable because you are forwarding a port somewhere in your router.

@mergify mergify bot closed this as completed in #4527 Oct 20, 2023
mergify bot pushed a commit that referenced this issue Oct 20, 2023
Starting in #3980, we make more use of the identify push protocol by actively notifying the peer when our locally supported protocols change. We emit an event (`Pushed`) to the user in that case. This event however does not include **what** we pushed.

Users might be interested in the internal changes that we push to remote peers. This patch adds the identify `Info` to the `Pushed` variant.

Fixes #4332.

Pull-Request: #4527.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment