Kademlia: Refactor iterative queries. #1154

romanb · 2019-06-02T09:41:30Z

This is a suggested refactoring of iterative queries (query.rs). It addresses the following issues:

Queries no longer terminate prematurely due to counting results from peers farther from the target while results from closer peers are still pending. (Kademlia query termination isn't properly implemented #1105).
Queries no longer ignore reported closer peers that are not duplicates just because they are currently not among the num_results closest. The currently num_results closest may contain peers marked as failed or pending / waiting. Hence all reported closer peers that are not duplicates must be considered candidates that may still end up among the num_results closest that successfully responded.
Bounded parallelism based on the active_counter was not working correctly, as new (not yet contacted) peers closer to the target may be discovered at any time and thus appear in closer_peers before the already active / pending peers.
The Frozen query mechanism allowed all remaining not-yet contacted peers to be contacted, but their results were discarded, because inject_rpc_result would only incorporate results while the query is Iterating. Moreover, the Frozen state was a terminal state. The Frozen state has been reworked into a Stalled state that implements a slightly more permissive variant of the following from the paper / specs:

If a round of FIND_NODEs fails to return a node any closer than the closest already seen, the initiator resends the FIND_NODE to all of the k closest nodes it has not already queried.

Importantly, though not explicitly mentioned, the query can move back to Iterating if it makes further progress again as a result of these requests. The Stalled state thus allows (temporarily) higher parallelism in an effort to make progress and bring the query to an end.

Other Changes

Repeated distance calculations between the same peers and the target is avoided.
Enabled by Kademlia: Correct XOR metric. #1108, a more appropriate data structure (BTreeMap keyed by Distance) for the incrementally updated list of closer peers is used. Besides resulting in a simpler implementation, it yields better performance for queries in larger DHTs. The data structure needs efficient lookups (to avoid duplicates) and insertions at any position, both of which larger(r) vectors are not that good at. Some unscientific comparisons showed a ~40-60% improvement in somewhat pathological scenarios with at least 20 healthy nodes, each returning a distinct list of closer 20 peers to the requestor. A previous assumption may have been that the vector always stays very small, but that is not the case in larger DHTs: Even if the lists of closer peers reported by the 20 contacted peers are heavily overlapping, typically a lot more than 20 peers have to be (at least temporarily) considered as candidates for the closest peers until the query completes. See also issue (2) above. Even in the case where the list stays close to 20 peers throughout, it still requires a lookup for every single reported closer peer to avoid duplicates.
Cancellation of still pending RPCs upon timeout has been removed. Besides only canceling RPCs that have not yet been sent - a rare case after the timeout elapsed - the implementation actually removed all pending RPCs for a peer, regardless of the query they belong(ed) to. Meaning that a timeout in one query could cancel a pending RPC of a concurrent, more recently started query to the same peer. It is according to the Kademlia paper / specs that a peer that does not respond will not stop the query from terminating, but the query still accepts / incorporates delayed results that come in while the query is still running. This has been realised by further distinguishing between QueryPeerState::Failed (permanent) and QueryPeerState::Unresponsive (timeout).

New Tests

New tests have been added for:

Query termination conditions.
Bounded parallelism.
Absence of duplicates.

tomaka · 2019-06-03T09:56:05Z

What is the relationship between this PR and #1144?
Am I right to think that #1144 should go in first?

romanb · 2019-06-03T10:07:04Z

What is the relationship between this PR and #1144?
Am I right to think that #1144 should go in first?

Yes, that was my intention.

montekki · 2019-06-03T10:08:36Z

@tomaka @romanb I think that at least #1155 should go in before #1144 , shouldn't it?

romanb · 2019-06-03T10:21:24Z

@tomaka @romanb I think that at least #1155 should go in before #1144 , shouldn't it?

Of course, that's why I extracted it into a separate PR.

AgeManning · 2019-06-06T01:10:31Z

protocols/kad/src/query.rs

+
+    /// A successful result from the peer has been delivered.
+    ///
+    /// This is a final state, reached as a result of a call to `on_sucess`.


on_sucess -> on_success

Refactoring of iterative queries (`query.rs`) to improve both correctness and performance (for larger DHTs): Correctness: 1. Queries no longer terminate prematurely due to counting results from peers farther from the target while results from closer peers are still pending. (libp2p#1105). 2. Queries no longer ignore reported closer peers that are not duplicates just because they are currently not among the `num_results` closest. The currently `max_results` closest may contain peers marked as failed or pending / waiting. Hence all reported closer peers that are not duplicates must be considered candidates that may still end up among the `num_results` closest that successfully responded. 3. Bounded parallelism based on the `active_counter` was not working correctly, as new (not yet contacted) peers closer to the target may be discovered at any time and thus appear in `closer_peers` before the already active / pending peers. 4. The `Frozen` query mechanism allowed all remaining not-yet contacted peers to be contacted, but their results were discarded, because `inject_rpc_result` would only incorporate results while the query is `Iterating`. The `Frozen` state has been reworked into a `Stalled` state that implements a slightly more permissive variant of the following from the paper / specs: "If a round of FIND_NODEs fails to return a node any closer than the closest already seen, the initiator resends the FIND_NODE to all of the k closest nodes it has not already queried.". Importantly, though not explicitly mentioned, the query can move back to `Iterating` if it makes further progress again as a result of these requests. The `Stalled` state thus allows (temporarily) higher parallelism in an effort to make progress and bring the query to an end. Performance: 1. Repeated distance calculations between the same peers and the target is avoided. 2. Enabled by libp2p#1108, use of a more appropriate data structure (`BTreeMap`) for the incrementally updated list of closer peers. The data structure needs efficient lookups (to avoid duplicates) and insertions at any position, both of which large(r) vectors are not that good at. Unscientific benchmarks showed a ~40-60% improvement in somewhat pathological scenarios with at least 20 healthy nodes, each possibly returning a distinct list of closer 20 peers to the requestor. A previous assumption may have been that the vector always stays very small, but that is not the case in larger clusters: Even if the lists of closer peers reported by the 20 contacted peers are heavily overlapping, typically a lot more than 20 peers have to be (at least temporarily) considered as closest peers until the query completes. See also issue (2) above. New tests are added for: * Query termination conditions. * Bounded parallelism. * Absence of duplicates.

tomaka

I wanted to looked more into details, but keep delaying this.
I took a glance a while ago, and looked good.

This was referenced Jun 2, 2019

Kademlia: Fix premature query termination. #1155

Merged

Kademlia: Send all pending RPCs for a peer on established connection. #1156

Merged

romanb force-pushed the kad-query-termpar branch 2 times, most recently from 5d6d72b to c392787 Compare June 4, 2019 15:18

romanb mentioned this pull request Jun 5, 2019

Kad query continues beyond sufficient results? #1163

Closed

AgeManning reviewed Jun 6, 2019

View reviewed changes

romanb force-pushed the kad-query-termpar branch from a1cb248 to 4512ada Compare June 7, 2019 19:40

romanb force-pushed the kad-query-termpar branch from 4512ada to 271b496 Compare June 13, 2019 10:15

This was referenced Jun 13, 2019

Kademlia: Address some TODOs - Refactoring - API updates. romanb/rust-libp2p#2

Closed

Kademlia: Address some TODOs - Refactoring - API updates. #1174

Merged

tomaka approved these changes Jun 20, 2019

View reviewed changes

Merge branch 'master' into kad-query-termpar

7fc00ad

romanb merged commit 69bd0df into libp2p:master Jun 20, 2019

romanb deleted the kad-query-termpar branch June 20, 2019 11:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Kademlia: Refactor iterative queries. #1154

Kademlia: Refactor iterative queries. #1154

romanb commented Jun 2, 2019

tomaka commented Jun 3, 2019 •

edited by romanb

Loading

romanb commented Jun 3, 2019

montekki commented Jun 3, 2019

romanb commented Jun 3, 2019

AgeManning Jun 6, 2019

tomaka left a comment

Kademlia: Refactor iterative queries. #1154

Kademlia: Refactor iterative queries. #1154

Conversation

romanb commented Jun 2, 2019

Other Changes

New Tests

tomaka commented Jun 3, 2019 • edited by romanb Loading

romanb commented Jun 3, 2019

montekki commented Jun 3, 2019

romanb commented Jun 3, 2019

AgeManning Jun 6, 2019

Choose a reason for hiding this comment

tomaka left a comment

Choose a reason for hiding this comment

tomaka commented Jun 3, 2019 •

edited by romanb

Loading