Don't poll network unnecessarily. #1977
Looks good as far as I understand it!
Thanks for the PR. To summarise in my own words after a first review, this PR does the following:
- It removes `service.rs` and the separate `MdnsService`, collapsing the (simplified) code into the `Mdns` behaviour while just moving other query types from `service.rs` to the new `query.rs` module.
- It makes the MDNS record TTL for response records configurable, with a default of 5 minutes as was previously fixed in a constant.
- It makes the interval at which multicast queries are sent out configurable with a default of 5 minutes, as opposed to the current frequency of 20 seconds, while additionally sending out a query whenever we join a new multicast group on some network interface to ensure timely discovery in most cases despite the long query interval. Lost queries are effectively compensated for by that regular query interval.
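For illustration only, the two configurable values in the summary above could be modelled roughly as follows. The type and field names (`MdnsConfigSketch`, `ttl`, `query_interval`) are assumptions made for this sketch and are not taken from the PR; only the two 5-minute defaults come from the summary.

```rust
use std::time::Duration;

/// Hypothetical configuration type; the names are illustrative and are
/// not the API introduced by the PR.
#[derive(Debug, Clone, PartialEq)]
pub struct MdnsConfigSketch {
    /// TTL advertised in the response records this node sends.
    pub ttl: Duration,
    /// Interval between periodic multicast queries.
    pub query_interval: Duration,
}

impl Default for MdnsConfigSketch {
    fn default() -> Self {
        Self {
            // Both defaults are 5 minutes, as described in the summary.
            ttl: Duration::from_secs(5 * 60),
            query_interval: Duration::from_secs(5 * 60),
        }
    }
}
```

A caller on a large network might raise `query_interval` to trade slower recovery from lost datagrams for less MDNS traffic.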
Does that sound about right as a summary?
I left some comments that probably need to be resolved but the direction looks good to me. I would prefer though if we keep the default query interval lower, say at 1 minute, documenting the fact that the larger your network the larger you may want to configure the query interval at the increased risk of delayed discoveries due to lost datagrams, but at the gain of less MDNS network traffic.
This PR massively reduces the mdns bandwidth requirements.
It would be great for the record if you could put absolute numbers on these gains with these changes as observed on some networks, i.e. a rough before and after comparison.
Well, I'd actually make the default to never query the network (at an interval). It is probably not required at all. Once we deploy this in production we can play with the settings, but I suspect we could make the interval much larger. The only reason why it may be needed is that UDP is an unreliable transport, so packets may be dropped without anyone noticing. I think the 20s interval was required before mainly because we didn't have if-watch and were therefore not notified of network changes.
I think so, too, and what is done here seems certainly more in the spirit of the libp2p MDNS spec, which states "When a peer starts (or detects a network change), it sends a query for all peers.". Nevertheless we currently have the notion of TTLs of the discovered peer records and that is why I would prefer to keep the query interval shorter than the default TTL - to avoid intermittent "expired" events that are immediately followed again by "discovered" events. Since the default TTL is 5 minutes, how about 4 minutes then for the default query interval? Of course, since the TTL of a record is specified by the remote that sent it, if nodes are configured differently the local query interval may still be larger than the received record TTL. If we feel like it, we could dynamically update the interval based on the shortest TTL received that did not yet expire, but I think that goes beyond the scope of this PR and we can leave that for another time, if there is interest.
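The idea of deriving the interval from the shortest unexpired TTL is left for later in the comment above, but a minimal sketch of it might look like this. Everything here is hypothetical: `DiscoveredRecord`, `next_query_interval`, the safety `margin` and the one-second floor are assumptions for illustration, not code from the PR.

```rust
use std::time::{Duration, Instant};

/// Hypothetical view of a discovered record: the TTL the remote advertised
/// and the local point in time at which the record expires.
pub struct DiscoveredRecord {
    pub ttl: Duration,
    pub expires_at: Instant,
}

/// Returns the configured interval, shortened so that it stays `margin`
/// below the shortest TTL among records that have not yet expired.
/// The result is never shorter than one second (an arbitrary floor).
pub fn next_query_interval(
    configured: Duration,
    records: &[DiscoveredRecord],
    now: Instant,
    margin: Duration,
) -> Duration {
    let shortest_live_ttl = records
        .iter()
        .filter(|r| r.expires_at > now) // ignore already expired records
        .map(|r| r.ttl)
        .min();
    match shortest_live_ttl {
        Some(ttl) => configured
            .min(ttl.saturating_sub(margin))
            .max(Duration::from_secs(1)),
        None => configured,
    }
}
```

With a configured interval of 4 minutes and a remote advertising a 2-minute TTL, a 1-minute margin would bring the local interval down to 1 minute.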
My personal opinion is that TTLs should be ignored. The application should store all known addresses and discard them if a dial failure occurs. If the dial failure is temporary, the address will be rediscovered at a later time. There is a need for most applications to have an address book of some sort (substrate implements its own). I think we could consider adding a general behaviour for that to rust-libp2p that does the right thing for most cases. [0] is what ipfs-embed currently does, and users of ipfs-embed can add additional discovery mechanisms based on the gossipsub api (if you aren't using a dht you need a mechanism for peers to tell you about peers on their local subnet).
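As a rough sketch of the address-book idea described above (keep every discovered address, drop one only when dialing it fails), something like the following could work. `AddressBookSketch` and its methods are hypothetical names, and plain strings stand in for `PeerId` and `Multiaddr`; this is not what ipfs-embed or substrate actually implement.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical address book: no TTLs, addresses live until a dial fails.
/// Strings stand in for peer ids and multiaddresses in this sketch.
#[derive(Default)]
pub struct AddressBookSketch {
    peers: HashMap<String, HashSet<String>>,
}

impl AddressBookSketch {
    /// Record an address learned from any discovery mechanism.
    pub fn on_discovered(&mut self, peer: &str, addr: &str) {
        self.peers
            .entry(peer.to_owned())
            .or_default()
            .insert(addr.to_owned());
    }

    /// Forget an address after a failed dial; a later discovery re-adds it.
    pub fn on_dial_failure(&mut self, peer: &str, addr: &str) {
        let now_empty = match self.peers.get_mut(peer) {
            Some(addrs) => {
                addrs.remove(addr);
                addrs.is_empty()
            }
            None => false,
        };
        if now_empty {
            self.peers.remove(peer);
        }
    }

    /// Known addresses of a peer, sorted for deterministic output.
    pub fn addresses(&self, peer: &str) -> Vec<&str> {
        let mut addrs: Vec<&str> = self
            .peers
            .get(peer)
            .map(|set| set.iter().map(String::as_str).collect())
            .unwrap_or_default();
        addrs.sort();
        addrs
    }
}
```

If the failure was temporary, the next mDNS response from that peer simply calls `on_discovered` again.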
It removes service.rs and the separate MdnsService, collapsing the (simplified) code into the Mdns behaviour while just moving other query types from service.rs to the new query.rs module.
I am in favor of merging the two.
protocols/mdns/src/behaviour.rs
    @@ -107,6 +164,77 @@ impl Mdns {
         pub fn discovered_nodes(&self) -> impl ExactSizeIterator<Item = &PeerId> {
             self.discovered_nodes.iter().map(|(p, _, _)| p)
         }

         fn inject_mdns_packet(&mut self, packet: MdnsPacket, params: &impl PollParameters) {
             self.timeout.set_interval(self.interval);
Say there is one very noisy node on the network broadcasting an mdns packet every 1 minute. With a timeout of e.g. 5 minutes configured locally, the timeout would never fire, thus the local node would never broadcast a query and thus addresses from other nodes would expire locally, correct?
In case I am not mistaken with the assumption above, I would see two ways forward:
- Do not reset the timeout, always sending out a query at each interval.
- Remove the notion of TTLs for address as suggested by David.
I have yet to put more thought into this, so please feel free to ignore the comment.
This isn't quite accurate. If a peer sends a query, all other peers respond with their addresses to the multicast address. So you really only need one peer to make the query, as everyone will get the updates from everyone. This means that they will not expire locally.
I would prefer to not reset the timeout for now and leave the removal of the TTL to another PR.
If the timeout is smaller than the TTL, it should still be safe to reset the timeout. As I said, it uses multicast, so everyone gets all queries and all responses.
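To make the scenario in this thread concrete, here is a small whole-second simulation of the query timer. It is a sketch under assumptions: `local_queries_sent` is a hypothetical helper, not code from the PR, and it only counts how many queries the local node itself sends, with and without resetting the timer on incoming packets.

```rust
/// Counts the queries the local node sends within `horizon_secs`.
/// `incoming_packet_times` are the seconds at which an mDNS packet from
/// another node arrives. With `reset_on_packet`, every incoming packet
/// pushes the local deadline back by a full interval.
pub fn local_queries_sent(
    interval_secs: u64,
    incoming_packet_times: &[u64],
    horizon_secs: u64,
    reset_on_packet: bool,
) -> u64 {
    let mut deadline = interval_secs;
    let mut sent = 0;
    for t in 1..=horizon_secs {
        // Incoming packets are handled before the timer is checked.
        if reset_on_packet && incoming_packet_times.contains(&t) {
            deadline = t + interval_secs;
        }
        if t >= deadline {
            sent += 1;
            deadline = t + interval_secs;
        }
    }
    sent
}
```

With a noisy node sending a packet every 60 seconds and a local interval of 300 seconds, the resetting variant sends no queries of its own over 30 minutes, while the non-resetting variant sends six. As pointed out in the replies, zero local queries need not mean expired records: responses go to the multicast address, so one querying peer refreshes everyone as long as it queries more often than the TTL.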
Thanks for clarifying @dvc94ch!
That may be a good thing to do, I'd just prefer if we leave it to a possible follow-up PR, to not expand the scope here too much and to make reviewing of follow-up changes that remove the TTL easier. From my perspective as long as this PR keeps the query interval a bit lower than the TTL, it is already good to go.
Currently we have customers with >100 nodes in their local network. We received reports that mdns was responsible for 25% to 75% of network traffic. As you can imagine our customers weren't very happy about that. This PR massively reduces the mdns bandwidth requirements.
Mdns needs two sockets, a send socket and a receive socket. The receive socket is listening on the mdns multicast address. When you join a network and receive an `IfEvent::Up` event, the send socket is used to send an mdns query to the multicast address. Everyone listening on the multicast address responds with their mdns records, including yourself, to the multicast address, so everyone gets a fresh view of the network and the existing peers learn about your records.

To avoid the case where you join a network and your initial discovery message was lost, I re-added a timer that by default will send an mdns query to the multicast address if there have been no incoming mdns queries in the last five minutes. This timeout is configurable.

There is still one remaining issue, and that is making `if-watch` pollable using manual futures. But that should minimally affect this PR other than requiring a version bump.

Companion PR libp2p/if-watch#7