Kademlia query termination isn't properly implemented #1105

Closed
tomaka opened this issue May 6, 2019 · 0 comments
Assignees: romanb
Labels: difficulty:easy, priority:important (the changes needed are critical for libp2p, or are blocking another project), priority:nicetohave

Comments


tomaka commented May 6, 2019

A Kademlia query should terminate when we have found k entries that are responsive and did not yield any closer entry.
From a quick look at the code, this does not seem to be implemented correctly, and we terminate queries too soon.

Some context: #1101 (comment)
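
For illustration, a minimal sketch of the intended termination check, assuming a simplified `u64` distance and illustrative names (`PeerState`, `is_finished`) rather than the actual `query.rs` types:

```rust
/// State of a peer in an iterative query (illustrative).
#[derive(Clone, Copy)]
enum PeerState {
    NotContacted,
    Waiting,
    Succeeded,
    Failed,
}

/// `closest` must be sorted by ascending distance to the target (real
/// Kademlia uses 256-bit XOR distances; `u64` keeps the sketch short).
/// The query may terminate only once the `k` closest known peers have all
/// responded successfully, i.e. no closer peer is still pending or unknown.
fn is_finished(closest: &[(u64, PeerState)], k: usize) -> bool {
    let mut responded = 0;
    for &(_, state) in closest {
        match state {
            // A closer peer has not answered yet: keep iterating, since its
            // response may reveal peers even closer to the target.
            PeerState::NotContacted | PeerState::Waiting => return false,
            PeerState::Succeeded => {
                responded += 1;
                if responded == k {
                    // k responsive peers found, none closer still pending.
                    return true;
                }
            }
            // Failed peers are skipped; they do not count towards k.
            PeerState::Failed => {}
        }
    }
    false // Fewer than k responsive peers known so far.
}
```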

@tomaka tomaka added the priority:nicetohave, priority:important, and difficulty:easy labels May 6, 2019
@romanb romanb self-assigned this May 7, 2019
romanb pushed a commit to romanb/rust-libp2p that referenced this issue Jun 1, 2019
romanb pushed a commit to romanb/rust-libp2p that referenced this issue Jun 2, 2019
romanb pushed a commit to romanb/rust-libp2p that referenced this issue Jun 4, 2019
romanb pushed a commit to romanb/rust-libp2p that referenced this issue Jun 4, 2019
romanb pushed a commit to romanb/rust-libp2p that referenced this issue Jun 4, 2019
romanb pushed a commit to romanb/rust-libp2p that referenced this issue Jun 4, 2019
romanb pushed a commit to romanb/rust-libp2p that referenced this issue Jun 7, 2019
romanb pushed a commit to romanb/rust-libp2p that referenced this issue Jun 11, 2019
romanb pushed a commit to romanb/rust-libp2p that referenced this issue Jun 12, 2019
romanb pushed a commit to romanb/rust-libp2p that referenced this issue Jun 12, 2019
romanb pushed a commit to romanb/rust-libp2p that referenced this issue Jun 13, 2019
romanb added a commit that referenced this issue Jun 20, 2019
Refactoring of iterative queries (`query.rs`) to improve both
correctness and performance (for larger DHTs):

Correctness:

  1. Queries no longer terminate prematurely due to counting results
     from peers farther from the target while results from closer
     peers are still pending. (#1105).

  2. Queries no longer ignore reported closer peers that are not duplicates
     just because they are currently not among the `num_results` closest.
     The `max_results` peers that are currently closest may include peers
     marked as failed or pending / waiting. Hence all reported closer
     peers that are not duplicates must be considered candidates that may
     still end up among the `num_results` closest that successfully responded.

  3. Bounded parallelism based on the `active_counter` was not working
     correctly, as new (not yet contacted) peers closer to the target
     may be discovered at any time and thus appear in `closer_peers`
     before the already active / pending peers.

  4. The `Frozen` query mechanism allowed all remaining not-yet contacted
     peers to be contacted, but their results were discarded, because
     `inject_rpc_result` would only incorporate results while the
     query is `Iterating`. The `Frozen` state has been reworked into
     a `Stalled` state that implements a slightly more permissive
     variant of the following from the paper / specs: "If a round of
     FIND_NODEs fails to return a node any closer than the closest
     already seen, the initiator resends the FIND_NODE to all of the
     k closest nodes it has not already queried." Importantly, though
     not explicitly mentioned, the query can move back to `Iterating`
     if it makes further progress again as a result of these requests.
     The `Stalled` state thus allows (temporarily) higher parallelism
     in an effort to make progress and bring the query to an end.

Performance:

  1. Repeated distance calculations between the same peers and the
     target are avoided.

  2. Enabled by #1108, use of a more appropriate data structure (`BTreeMap`) for
     the incrementally updated list of closer peers. The data structure needs
     efficient lookups (to avoid duplicates) and insertions at any position,
     both of which large(r) vectors handle poorly. Unscientific benchmarks
     showed a ~40-60% improvement in somewhat pathological scenarios with at least
     20 healthy nodes, each possibly returning a distinct list of 20 closer peers
     to the requestor. A previous assumption may have been that the vector always
     stays very small, but that is not the case in larger clusters: Even if the
     lists of closer peers reported by the 20 contacted peers are heavily overlapping,
     typically a lot more than 20 peers have to be (at least temporarily) considered
     as closest peers until the query completes. See also issue (2) above.

New tests are added for:

  * Query termination conditions.
  * Bounded parallelism.
  * Absence of duplicates.
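
For illustration, a minimal sketch of the candidate bookkeeping and bounded parallelism described above, assuming illustrative names (`ClosestPeers`, `add_candidate`, `next_to_contact`) and `u64` in place of the real 256-bit XOR distances; this is not the actual `query.rs` implementation:

```rust
use std::collections::BTreeMap;

/// State of a candidate peer (illustrative, not the actual `query.rs` type).
#[derive(Clone, Copy, PartialEq, Eq)]
enum PeerState {
    NotContacted,
    Waiting,
    Succeeded,
    Failed,
}

struct ClosestPeers {
    /// Candidates ordered by ascending XOR distance to the target. For a
    /// fixed target, XOR distances are unique per peer, so the distance
    /// (simplified to `u64` here) can serve as the key.
    peers: BTreeMap<u64, PeerState>,
    /// Upper bound on in-flight requests while the query is iterating.
    parallelism: usize,
    /// Number of closest peers the query should ultimately return.
    num_results: usize,
}

impl ClosestPeers {
    /// Record a reported peer as a candidate unless it is already known,
    /// regardless of whether it currently ranks among the closest.
    fn add_candidate(&mut self, distance: u64) {
        self.peers.entry(distance).or_insert(PeerState::NotContacted);
    }

    /// Requests currently in flight. The parallelism bound is derived from
    /// this count rather than a running counter, because newly discovered
    /// peers may be inserted ahead of peers that are already waiting.
    fn num_waiting(&self) -> usize {
        self.peers
            .values()
            .filter(|s| **s == PeerState::Waiting)
            .count()
    }

    /// Pick the closest not-yet-contacted candidates to query next. While
    /// stalled, temporarily allow up to `num_results` requests in flight in
    /// an effort to make progress; otherwise respect the normal bound.
    fn next_to_contact(&mut self, stalled: bool) -> Vec<u64> {
        let limit = if stalled { self.num_results } else { self.parallelism };
        let free = limit.saturating_sub(self.num_waiting());
        let mut picked = Vec::new();
        for (&dist, state) in self.peers.iter_mut() {
            if picked.len() == free {
                break;
            }
            if *state == PeerState::NotContacted {
                *state = PeerState::Waiting;
                picked.push(dist);
            }
        }
        picked
    }
}
```

Keyed by distance, the `BTreeMap` gives ordered traversal from the closest candidate outwards together with O(log n) duplicate checks and insertions, which is the property the vector-based list lacked.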
@romanb romanb closed this as completed Jun 20, 2019