feat(kad): add `refresh_interval` config used to poll `bootstrap` #4838
Thanks for taking this on! I've left some feedback :)
Co-authored-by: Thomas Eizinger <[email protected]>
Looks like a good start to me. Correct me if I'm wrong, but should we only poll the timer if
Why? In order to maintain a healthy routing table, Something that we'd have to think about is: When a node starts up, it doesn't have any connections yet. Thus, we should probably delay the very first bootstrap query until we have a connection to another DHT peer, at which point we should instantly bootstrap and only perform it on an interval going forward. If we naively call it on an interval, we will execute it straight at startup where it will fail because there are no peers, resulting in a delay of 5 minutes until we actually bootstrap.
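To make the cost of the naive schedule concrete, here is a minimal sketch that compares when the first successful bootstrap would happen under both approaches. The function names are hypothetical and it is not taken from the PR; it assumes a bootstrap attempt only succeeds if at least one DHT peer is already connected.

```rust
use std::time::Duration;

/// Naive schedule (assumption): attempts fire at t = 0, interval, 2 * interval, ...
/// and an attempt only succeeds once the first peer is connected.
/// Returns the time since startup of the first successful attempt.
fn naive_first_success(first_peer_at: Duration, interval: Duration) -> Duration {
    assert!(!interval.is_zero(), "interval must be non-zero");
    let i = interval.as_nanos();
    let p = first_peer_at.as_nanos();
    // Number of ticks until one lands at or after `first_peer_at` (ceiling division).
    let ticks = (p + i - 1) / i;
    interval * ticks as u32
}

/// Connection-triggered schedule: bootstrap as soon as the first peer is connected.
fn on_connection_first_success(first_peer_at: Duration) -> Duration {
    first_peer_at
}
```

With a 5 minute interval and a first connection 2 seconds after startup, the naive schedule first succeeds after 5 minutes, while the connection-triggered one succeeds after 2 seconds.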
cc @mxinden
I do wonder if it should be done automatically instead of waiting for it to be called, given that the spec does mention doing it at the start, so maybe it could be made a configurable option, although the developer loses out on flexibility on making that call themselves. Regardless, in its current state, a successful call to |
Those are good points. How about the following:
For now, I'd leave the current |
Today users forget to (a) call bootstrap at startup and (b) call bootstrap continuously thereafter. In my eyes, in most cases one should do both (a) and (b). Thus I believe it should be the default behavior and the goal of #4730.
That would again require users to call a method in the default case which they will likely forget. How about triggering the first bootstrap once the first node is added?
I am fine with leaving the |
I will admit that I would be guilty of (b), though for (a), the only times I find myself not bootstrapping are when I wish to only query peers added to the routing table manually. I take it that (a) should be done when one peer is added regardless and not wait around until a later point? Are there any implications from not bootstrapping in the first place even when it's only one peer added (with that peer not containing any other peers in its table)?
To quote @mxinden from an earlier conversation: A bootstrap is just a query. If you want to fire a query asap on startup, fire it. A bootstrap just fills up the buckets to make future queries faster. If you already have a query you want the result for, running it will give you quicker results than first waiting for the bootstrap!
Hi everyone. Sorry to barge in 3 days later, but we just saw this PR and might have some interesting thoughts to share. We totally agree with the initial issue that it would be a great thing for the end user if the Kademlia behaviour handled bootstrapping on its own. (Why not also call it However, we think it might be a good idea to have a system a little bit more evolved than just bootstrapping based on an interval. Moreover, there is already in the mDNS behaviour, a system allowing to create many packets at first and gradually reaching a cruise interval ( We did and are currently using a specific development to handle bootstrap in a more evolved manner, so we would be more than happy to help on this subject, and why not upstreaming it into the Our bootstrap is composed of several "components" and we are not saying that everything should be integrated in the
If you find this interesting, I will gladly elaborate and share code samples. |
Thank you for the input @stormshield-frb ! Whilst interesting, I am not sure we need an elaborate algorithm like this as a default. If we keep the current This follows the mantra of: "make the easy things easy, make the hard things possible". Would that work for everybody? |
Hi @thomaseizinger, you're welcome 😉 Glad you found it interesting. Indeed, it might be too much by default, and I do agree with your mantra. However, quoting you from an earlier comment:
Doing something like it is currently done in For the metric part (number of
As @mxinden said, we would gladly appreciate it if the current |
Yes we definitely need to handle that. I think to start with, it is probably easiest to have a simple state variable that tracks, whether we've already successfully run at least 1 bootstrap. Upon every new connection, we can then initiate a bootstrap if we haven't successfully bootstrapped yet (we should also track whether there is currently one in progress). It could be that the first peer we connect to does not support kademlia so we should not give up after the first peer. Something like:

```rust
enum InitialBootstrap {
    Active(QueryId),
    Completed
}
```

and:

```rust
initial_bootstrap: Option<InitialBootstrap>
```

If the bootstrap fails, we set it back to

What do people think?
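A minimal, self-contained sketch of how such a state variable could be driven. Everything beyond `InitialBootstrap`, `QueryId` and `initial_bootstrap` is hypothetical: `QueryId` is a stand-in for the real type, `BootstrapTracker` and its methods are made up for illustration, and resetting to `None` on failure is an assumption so that the next connection retries.

```rust
/// Stand-in for the real query identifier type (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct QueryId(u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InitialBootstrap {
    Active(QueryId),
    Completed,
}

/// Hypothetical holder for the state variable.
#[derive(Debug, Default)]
struct BootstrapTracker {
    initial_bootstrap: Option<InitialBootstrap>,
    next_query_id: u64,
}

impl BootstrapTracker {
    /// Called on every new connection. Starts a bootstrap only if none is
    /// in progress and none has completed yet; returns the new query id.
    fn on_connection_established(&mut self) -> Option<QueryId> {
        match self.initial_bootstrap {
            None => {
                let id = QueryId(self.next_query_id);
                self.next_query_id += 1;
                self.initial_bootstrap = Some(InitialBootstrap::Active(id));
                Some(id)
            }
            // Already active or completed: nothing to do.
            Some(_) => None,
        }
    }

    /// Called when a query finishes. Only reacts to the tracked bootstrap query.
    fn on_query_finished(&mut self, id: QueryId, success: bool) {
        if self.initial_bootstrap == Some(InitialBootstrap::Active(id)) {
            self.initial_bootstrap = if success {
                Some(InitialBootstrap::Completed)
            } else {
                // Assumption: reset so that the next connection retries.
                None
            };
        }
    }
}
```

In this sketch a failed first attempt (for example because the first peer does not support Kademlia) does not block later attempts, while a completed bootstrap suppresses further connection-triggered ones.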
Absolutely. This is a good idea. We actually do that in our code base. However, it raises the question: what should we do if the first bootstrap fails? Should we try another bootstrap right after? Some time after? What if this one fails too? In my opinion, this situation conforms precisely to the recommended practices of using exponential backoff. What do you think?
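For reference, a minimal sketch of what an exponential backoff schedule for failed bootstraps could look like. The `Backoff` type and its parameters are hypothetical and not part of this PR; the doubling factor and the cap are assumptions.

```rust
use std::time::Duration;

/// Hypothetical retry schedule: the delay doubles after every failure, up to `max`.
#[derive(Debug, Clone)]
struct Backoff {
    initial: Duration,
    max: Duration,
    current: Duration,
}

impl Backoff {
    fn new(initial: Duration, max: Duration) -> Self {
        // Clamp so that the first delay never exceeds the cap.
        let initial = initial.min(max);
        Self { initial, max, current: initial }
    }

    /// Delay to wait before the next attempt after a failure.
    fn next_delay(&mut self) -> Duration {
        let delay = self.current;
        self.current = self.current.saturating_mul(2).min(self.max);
        delay
    }

    /// Call after a successful bootstrap to start over from the initial delay.
    fn reset(&mut self) {
        self.current = self.initial;
    }
}
```

Starting at 1 second with a 5 second cap, successive failures would wait 1, 2, 4, 5, 5, ... seconds until `reset` is called.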
Hi @thomaseizinger! FYI, one more update was pushed thanks to @stormshield-frb. I also fixed conflicts, although some pipelines are not passing due to the master branch.
Awesome, I'll take a look sometime later this week! |
This pull request has merge conflicts. Could you please resolve them @PanGan21? 🙏 |
Excellent work, thank you for adding all these tests and sorry that this has been dragging on for so long.
Just one request regarding the version and changelog entry, otherwise ACK.
@jxs Feel free to approve and merge this once the version and changelog have been corrected.
thanks for the ping Thomas, LGTM only one minor doc remark
Thanks Panagiotis!
Approvals have been dismissed because the PR was updated after the send-it label was applied.
Thanks!
@thomaseizinger @stormshield-frb |
Previously, users were responsible for calling `bootstrap` on an interval. This was documented but hard to discover for people new to the library. To maintain healthy routing tables, it is advised to regularly call `bootstrap`. By default, we will now do this automatically every 5 minutes and once we add a peer to our routing table, assuming we didn't bootstrap yet. This is especially useful as part of nodes starting up and connecting to bootstrap nodes. Closes: #4730. Pull-Request: #4838. Co-authored-by: stormshield-frb <[email protected]>
After testing `master`, we encountered a bug due to #4838 when doing automatic or periodic bootstrap if the node has no known peers. Since it failed immediately, I thought there was no need to call the `bootstrap_status.on_started` method. But not doing so never resets the periodic timer inside `bootstrap_status`, so the node gets stuck trying to bootstrap every time `poll` is called on `kad::Behaviour`. Pull-Request: #5349.
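The failure mode can be reproduced with a toy model. This is a minimal sketch, not the actual `bootstrap_status` implementation: `BootstrapStatusSketch`, its fields and `attempts_during_polls` are hypothetical, and time is modelled as a `Duration` since startup.

```rust
use std::time::Duration;

/// Toy stand-in for the periodic timer kept inside `bootstrap_status` (assumption).
struct BootstrapStatusSketch {
    interval: Duration,
    next_due: Duration,
}

impl BootstrapStatusSketch {
    fn new(interval: Duration) -> Self {
        Self { interval, next_due: interval }
    }

    /// True once the periodic deadline has been reached.
    fn is_due(&self, now: Duration) -> bool {
        now >= self.next_due
    }

    /// Resets the periodic timer. Skipping this call leaves the deadline in the past.
    fn on_started(&mut self, now: Duration) {
        self.next_due = now + self.interval;
    }
}

/// Counts bootstrap attempts over `polls` consecutive polls at time `now`,
/// where every attempt fails immediately (no known peers).
fn attempts_during_polls(
    status: &mut BootstrapStatusSketch,
    now: Duration,
    polls: usize,
    call_on_started_on_failure: bool,
) -> usize {
    let mut attempts = 0;
    for _ in 0..polls {
        if status.is_due(now) {
            attempts += 1; // bootstrap attempted, fails immediately
            if call_on_started_on_failure {
                status.on_started(now);
            }
        }
    }
    attempts
}
```

In this model, ten polls at the deadline produce ten attempts when `on_started` is skipped on failure, and a single attempt when it is called.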
As discussed in the last maintainer call, some improvements are probably necessary for the automatic bootstrap feature (introduced by #4838). Indeed, as @drHuangMHT mentioned in #5341 and as @guillaumemichel has agreed, triggering a bootstrap every time an update happens inside the routing table consumes a lot more resources. The idea behind the automatic bootstrap feature is that, when a peer is starting, if a routing table update happens we probably don't want to wait for the periodic bootstrap to trigger and we want to trigger it right now. However, as @guillaumemichel said, this is something we want to do at startup or when a network connectivity problem happens; we don't want to do that all the time. This PR is a proposal to automatically trigger a bootstrap on routing table update, but only when we have fewer than `K_VALUE` peers in it (meaning that we are starting up or something went wrong, and the fact that a new peer is inserted is probably a sign that the network connectivity issue is resolved). I have also added a new triggering condition, as mentioned in the maintainer call: when discovering a new listen address, if we have no connected peers, we trigger a bootstrap. This condition is based on our own experience at Stormshield: some peers were starting before the network interfaces were up, so the automatic and periodic bootstrap failed, and when the network interfaces were finally up, we were waiting X minutes for the periodic bootstrap to actually trigger a bootstrap and join the p2p network. Pull-Request: #5474.
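The two triggering conditions described above can be summarised as a small predicate. This is a minimal sketch under stated assumptions: the `Trigger` enum and `should_trigger_bootstrap` are hypothetical names, and the value 20 for `K_VALUE` is assumed here (the usual Kademlia bucket size), since the text above only names the constant.

```rust
/// Assumed bucket size; the description above only refers to `K_VALUE` by name.
const K_VALUE: usize = 20;

/// Hypothetical events that may trigger an automatic bootstrap.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Trigger {
    /// A peer was inserted into or updated in the routing table.
    RoutingUpdated,
    /// A new listen address was discovered.
    NewListenAddr,
}

/// Decides whether an automatic bootstrap should be started for `trigger`.
fn should_trigger_bootstrap(
    trigger: Trigger,
    routing_table_len: usize,
    connected_peers: usize,
) -> bool {
    match trigger {
        // Only while the table is still sparse (startup or recovery).
        Trigger::RoutingUpdated => routing_table_len < K_VALUE,
        // Only if we are not connected to anyone yet.
        Trigger::NewListenAddr => connected_peers == 0,
    }
}
```

A well-populated routing table therefore no longer causes a bootstrap on every update, while a node that comes up before its network interfaces still bootstraps as soon as a listen address appears.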
Description
Previously, users were responsible for calling `bootstrap` on an interval. This was documented but hard to discover for people new to the library. To maintain healthy routing tables, it is advised to regularly call `bootstrap`. By default, we will now do this automatically every 5 minutes and once we add a peer to our routing table, assuming we didn't bootstrap yet. This is especially useful as part of nodes starting up and connecting to bootstrap nodes.

Closes: #4730.
Attributions
Co-authored-by: stormshield-frb [email protected]