
Kademlia: Optimise iteration over closest keys / entries. #1117

Merged: romanb merged 3 commits into libp2p:master from kad-closest on May 22, 2019

Conversation

romanb (Contributor) commented May 13, 2019

Based on #1108.

The current implementation for finding the entries whose keys are closest to some target key in the Kademlia routing table involves copying the keys of all buckets into a new Vec which is then sorted based on the
distances of the entries to the target and turned into an iterator from which only a small number of elements (by default 20) are drawn.

This commit introduces an iterator for finding the closest keys (or entries) to a target that visits the buckets in the optimal order, based on the information contained in the distance bit-string representing the distance between the local key and the target (introduced in #1108). That is, the contents of the next bucket to visit are cloned and sorted only as more elements are drawn from the iterator.

Correctness is tested against full-table scans.
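To make the difference concrete, here is a small self-contained sketch (illustrative only; it uses a toy `u8` key space with XOR distance instead of the real `Key` type, and assumes the bucket visiting order is given, derived from the distance bit-string as explained further down in this thread):

```rust
// Toy model: `buckets[i]` holds keys at distance [2^i, 2^(i+1)) from the local key.

/// Old approach: copy every key out of every bucket, sort all of them by
/// distance to the target, then take the first `k`.
fn closest_full_scan(buckets: &[Vec<u8>], target: u8, k: usize) -> Vec<u8> {
    let mut all: Vec<u8> = buckets.iter().flatten().copied().collect();
    all.sort_by_key(|key| key ^ target);
    all.truncate(k);
    all
}

/// New approach (conceptually): visit the buckets in an order derived from
/// the distance between the local key and the target, cloning and sorting
/// only one bucket at a time, stopping as soon as `k` keys have been drawn.
fn closest_lazy(buckets: &[Vec<u8>], bucket_order: &[usize], target: u8, k: usize) -> Vec<u8> {
    let mut result = Vec::with_capacity(k);
    for &i in bucket_order {
        let mut bucket: Vec<u8> = buckets[i].clone();
        bucket.sort_by_key(|key| key ^ target);
        for key in bucket {
            if result.len() == k {
                return result; // remaining buckets are never touched
            }
            result.push(key);
        }
    }
    result
}
```

With a correct visiting order, keys come out in nondecreasing distance to the target, so iteration can stop after `k` elements without ever touching the remaining buckets.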

There was some overlap between the handling of pending nodes and the newly introduced iterator(s), in the sense that any access to a bucket should check whether a pending node can be applied in order to bring the bucket up-to-date, and there were a few TODOs left around the pending node handling. I therefore refactored that part and added more tests. In particular, I did the following:

  • The (internal) bucket API for a single bucket was moved to `kbucket::bucket`, extracting large parts of code formerly embedded directly into the Entry API. That improves reuse and testability.
  • The (public) Entry API was moved to the `kbucket::entry` sub-module and simplified due to the first point. The Entry API now just mediates access to the internal bucket API.
  • The pending node handling reflects the following policy: the nodes in a bucket are ordered from least-recently connected to most-recently connected, i.e. a "connection-oriented" variant of what is described in the paper (a minimal sketch of this ordering follows right below).
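A minimal sketch of just that ordering invariant, with a hypothetical `Bucket` type that is not the PR's actual code:

```rust
// Nodes are kept least-recently connected first and most-recently connected
// last, so marking a node as (re)connected moves it to the tail.
struct Bucket<T: PartialEq> {
    nodes: Vec<T>,
}

impl<T: PartialEq> Bucket<T> {
    fn mark_connected(&mut self, node: &T) {
        if let Some(pos) = self.nodes.iter().position(|n| n == node) {
            let n = self.nodes.remove(pos);
            self.nodes.push(n); // now the most-recently connected entry
        }
    }
}
```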

Due to refactorings the diffs are a bit large and I suggest the following (bottom-up) order for an effective review:

  1. The new kbucket::bucket module.
  2. The new kbucket::entry module.
  3. The new iterator(s) in the `kbucket` module and the new top-level functions `closest` and `closest_keys`, implementing the optimised bucket iteration. Here is the relevant part in the `kbucket` module.
  4. The remaining full diff of the `kbucket` and behaviour modules, which adapts the rest of the code to the above changes. That should be quick after seeing all the previous changes.

ghost assigned romanb on May 13, 2019
ghost added the in progress label on May 13, 2019
romanb force-pushed the kad-closest branch 2 times, most recently from 7ca0e04 to 765033a on May 20, 2019
The current implementation for finding the entries whose keys are closest
to some target key in the Kademlia routing table involves copying the
keys of all buckets into a new `Vec` which is then sorted based on the
distances to the target and turned into an iterator from which only a
small number of elements (by default 20) are drawn.

This commit introduces an iterator over buckets for finding the closest
keys to a target that visits the buckets in the optimal order, based on
the information contained in the distance bit-string representing the
distance between the local key and the target.

Correctness is tested against full-table scans.

Also included:

  * Updated documentation.
  * The `Entry` API was moved to the `kbucket::entry` sub-module for
    ease of maintenance.
  * The pending node handling has been slightly refactored in order to
    bring code and documentation in agreement and clarify the semantics
    a little.
tomaka (Member) commented May 20, 2019

As a heads up, let me know whether this is ready for review (otherwise I won't review it).

romanb (Contributor, Author) commented May 20, 2019

Now ready for review. The PR description has been updated and should be re-read by anyone who read an earlier version. I'm only polishing documentation here and there and may add one or two additional tests that come to mind, but otherwise I'm moving on to #146 from here.

romanb marked this pull request as ready for review May 20, 2019 14:35
tomaka (Member) left a comment

A few nit-picks. I admit that I don't fully understand the logic of the closest iterator, but that's because I'm having trouble wrapping my head around the distance thing. But the code looks solid.

As a general remark, I'm not a fan of having the data structure update itself over time. I think it would be preferable to explicitly call a method that accounts for timeouts in the k-buckets, instead of having for example iter() automatically account for that. However, since it was already like that before, that's out of the scope of this PR.

```diff
@@ -377,7 +377,7 @@ impl<'a> PollParameters<'a> {
     }

     /// Returns the list of the addresses nodes can use to reach us.
-    pub fn external_addresses(&self) -> impl ExactSizeIterator<Item = &Multiaddr> {
+    pub fn external_addresses(&self) -> impl ExactSizeIterator<Item = &Multiaddr> + Clone {
```
tomaka (Member) commented May 20, 2019

Seeing the way that you use `external_addresses`, I don't think that this additional `Clone` is needed.

romanb (Contributor, Author):

You mean like this? It requires one more cloning of the addresses than is strictly necessary though, doesn't it? Do you prefer that or am I overlooking something?

tomaka (Member) commented May 21, 2019

Isn't it possible to directly write `multiaddrs: parameters.external_addresses().cloned().collect()` and remove that `let local_addrs = ...` altogether?

romanb (Contributor, Author):

Totally, thanks!
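For illustration, a self-contained analogue of the agreed simplification; the `Params`/`Multiaddr` stand-ins below are hypothetical, not the real libp2p types:

```rust
#[derive(Clone, Debug, PartialEq)]
struct Multiaddr(String); // stand-in for the real `Multiaddr`

struct Params {
    addrs: Vec<Multiaddr>,
}

impl Params {
    // Stand-in for `PollParameters::external_addresses`.
    fn external_addresses(&self) -> impl Iterator<Item = &Multiaddr> {
        self.addrs.iter()
    }
}

fn main() {
    let parameters = Params {
        addrs: vec![Multiaddr("/ip4/127.0.0.1/tcp/4001".to_string())],
    };
    // Instead of binding the cloned addresses to an intermediate
    // `local_addrs` variable, collect them directly where they are needed.
    let multiaddrs: Vec<Multiaddr> = parameters.external_addresses().cloned().collect();
    assert_eq!(multiaddrs.len(), 1);
}
```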

```diff
@@ -203,39 +204,49 @@ impl<TSubstream> Kademlia<TSubstream> {
     /// Adds a known address for the given `PeerId`. We are connected to this address.
     // TODO: report if the address was inserted? also, semantics unclear
```
tomaka (Member):

This is totally out of the scope of this PR, but the semantics of `add_connected_address` vs `add_not_connected_address` are extremely crappy and come from a time when Kademlia was even less correctly implemented.

romanb (Contributor, Author):

What irritates me about these functions is that they seemingly allow user code to choose whether a node is considered to be connected, even though the Kademlia behaviour has no knowledge of that connection. I know that the old implementation actually ignored the connected argument, making the two functions equivalent, maybe exactly for that reason. How about fusing these two functions into just `add_address` and giving it the semantics of adding a known address for a peer to the routing table, with no influence on the connection status (meaning disconnected if the peer associated with the address is not yet in the routing table)?
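To make the proposed semantics concrete, a self-contained analogue with stand-in types (not the actual API; `String` replaces `PeerId`/`Multiaddr`):

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Status { Connected, Disconnected }

// Stand-in for the routing table.
struct RoutingTable {
    entries: HashMap<String, (Vec<String>, Status)>,
}

impl RoutingTable {
    /// Adds a known address for `peer` without influencing its connection
    /// status; a peer not yet in the table is inserted as disconnected.
    fn add_address(&mut self, peer: &str, address: &str) {
        let entry = self
            .entries
            .entry(peer.to_string())
            .or_insert_with(|| (Vec::new(), Status::Disconnected));
        entry.0.push(address.to_string());
    }
}

fn main() {
    let mut table = RoutingTable { entries: HashMap::new() };
    table.add_address("peer-a", "/ip4/127.0.0.1/tcp/4001");
    assert_eq!(table.entries["peer-a"].1, Status::Disconnected);
}
```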

tomaka (Member):

I'm fine with fusing methods.

```rust
}

impl<T> Eq for Key<T> {}

/// A (safe) index into a `KBucketsTable`, i.e. a non-negative integer in the
```
tomaka (Member):

Minor nit-pick: to me "safe" means "memory-safe".

Suggested change:
```diff
-/// A (safe) index into a `KBucketsTable`, i.e. a non-negative integer in the
+/// A (type-safe) index into a `KBucketsTable`, i.e. a non-negative integer in the
```

```rust
    /// `None` indicates that there are no connected entries in the bucket, i.e.
    /// the bucket is either empty, or contains only entries for peers that are
    /// considered disconnected.
    first_connected_pos: Option<usize>,
```
tomaka (Member):

Why use an `Option` instead of setting it to `nodes.len()`? For simplicity?

romanb (Contributor, Author):

I find the semantics clearer if `first_connected_pos` is either not set (`None`) or is a valid index into `nodes`, instead of attaching special meaning to out-of-bounds values (w.r.t. `nodes`).
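To illustrate the invariant behind that choice, a sketch with stand-in types (not the PR's code):

```rust
#[derive(Debug, PartialEq)]
enum Status { Connected, Disconnected }

struct Bucket<T> {
    /// Ordered from least-recently to most-recently connected, with all
    /// disconnected nodes at the front.
    nodes: Vec<T>,
    /// Position of the first connected node, if any. `None` means the bucket
    /// is empty or contains only disconnected nodes.
    first_connected_pos: Option<usize>,
}

impl<T> Bucket<T> {
    fn status(&self, pos: usize) -> Status {
        assert!(pos < self.nodes.len());
        // A position is "connected" exactly when it lies at or after
        // `first_connected_pos`; no out-of-bounds sentinel is required.
        if self.first_connected_pos.map_or(false, |i| pos >= i) {
            Status::Connected
        } else {
            Status::Disconnected
        }
    }
}

fn main() {
    let b = Bucket { nodes: vec!["a", "b", "c"], first_connected_pos: Some(1) };
    assert_eq!(b.status(0), Status::Disconnected);
    assert_eq!(b.status(2), Status::Connected);
}
```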

romanb (Contributor, Author) commented May 22, 2019

> A few nit-picks. I admit that I don't fully understand the logic of the closest iterator, but that's because I'm having trouble wrapping my head around the distance thing. But the code looks solid.

It is certainly not straightforward, but a small example may help to illustrate the principle and provide an intuition for why this procedure works (for which the tests against full-table scans provide additional confidence):

Let the keyspace be [0, 2^4), i.e. 0000 through 1111. Furthermore let the local_key be 1010 and the target be 1100 with (XOR) distance function d. Then the distance between local_key and target is given by d(1010, 1100) = 0110 = 6. We are looking for the keys closest to target in the buckets of the routing table of local_key, with increasing distance.

The closest key to target is obviously 1100, i.e. target itself, with distance 0. That key has distance 6 from local_key, as seen above, so it falls into bucket 2 of the routing table which covers the distance interval [2^2, 2^3) from local_key. Therefore bucket 2 is the first bucket to visit. That bucket covers all keys of the form 11xx whose distances to target are 0-3 (1100 through 1111).

So the next closest key to the target not in bucket 2 must have distance at least 4 to target, i.e. must differ in the bit position for 2^2 from target. That is the key 1000, which conceptually becomes the new target. The distance of that new target to the local_key is d(1010, 1000) = 0010 = 2, i.e. bucket 1 covering distances [2^1, 2^2). Therefore bucket 1 is the next to visit. That bucket covers all keys of the form 100x whose distances to the new target are 0 (1000) and 1 (1001), and correspondingly 4 + 0 = 4 and 4 + 1 = 5 from the original target.

So the next closest key to the original target not in bucket 1 must have distance at least 6, i.e. must differ in the bit positions for 2^2 and 2^1 from the original target. But that is the key 1010 which is the local_key and is hence skipped, so we continue with 1011 as the new target, having distance 7 from the original target. The distance to the local_key is obviously 1, so it is the sole key covered by bucket 0. Therefore bucket 0 is the next bucket to visit.

So the next closest key to the original target not in bucket 0 must have distance at least 8, i.e. must differ in the bit position for 2^3 from the original target. That is the key 0100 which again conceptually becomes the new target. That key has distance d(1010, 0100) = 1110 = 14 from the local_key, thus falls in bucket 3 covering the distances [2^3, 2^4). That bucket covers all keys of the form 0xxx, i.e. whose distance to the local_key has no leading zeros - the furthest bucket covering half of the keyspace. Therefore bucket 3 is the next (and in this tiny example, the last) bucket to visit whose keys cover distances 8-15 from the local_key as well as the original target.

The order in which to visit the buckets to find the closest keys to target is thus 2, 1, 0, 3.

Closer inspection of this procedure shows that it derives mechanically from the binary representation of the distance between local_key and target: One first takes the bit positions showing a 1 from left to right (which I referred to in the code as the "zooming in" part, since it moves to buckets closer and closer to the local_key), followed by taking the bit positions containing a 0 from right to left (the "zooming out" part, since we are now moving away from the local_key). Intuitively it is clear that flipping bit positions in the target where target differs from local_key results in keys closer and closer to the local_key (but further and further from target), whereas flipping bit positions where the local_key and target agree results in keys with increasing distance from both.
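In code, the visiting order falls out of the distance bits directly. A small self-contained sketch of the principle (not the PR's implementation), reproducing the example above:

```rust
/// Derives the bucket visiting order from the distance between the local key
/// and the target, where bucket `i` covers distances `[2^i, 2^(i+1))`.
fn bucket_visit_order(distance: u32, num_bits: u32) -> Vec<u32> {
    let mut order = Vec::with_capacity(num_bits as usize);
    // "Zooming in": bit positions containing a 1, from left to right,
    // i.e. buckets closer and closer to the local key.
    for i in (0..num_bits).rev() {
        if distance & (1 << i) != 0 {
            order.push(i);
        }
    }
    // "Zooming out": bit positions containing a 0, from right to left,
    // i.e. moving away from both the local key and the target.
    for i in 0..num_bits {
        if distance & (1 << i) == 0 {
            order.push(i);
        }
    }
    order
}

fn main() {
    let local_key = 0b1010u32;
    let target = 0b1100u32;
    let distance = local_key ^ target; // 0b0110 = 6
    assert_eq!(bucket_visit_order(distance, 4), vec![2, 1, 0, 3]);
}
```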

> As a general remark, I'm not a fan of having the data structure update itself over time. I think it would be preferable to explicitly call a method that accounts for timeouts in the k-buckets, instead of having for example iter() automatically account for that. However, since it was already like that before, that's out of the scope of this PR.

I share the mixed feelings about the current approach. On the upside, a user of the API cannot forget to apply pending entries, since that happens automatically when accessing the routing table. The downside is that the `KBucketsTable` needs to keep track of these results and provide an API for consuming them (here `KBucketsTable::take_applied_pending`) in order to know about all insertions. As you said, this was basically the partially implemented approach already present, which I only tried to bring to its logical conclusion in the context of this PR.

I agree that the alternative of having an explicit API call of the form `KBucketsTable::apply_pending` that must be called by the user is worth considering for future work, assuming that is what you had in mind. It has the downside that client code of such an API may either forget to call it or call it sub-optimally (e.g. too infrequently), resulting in "stale" results from the routing table. However, the application of pending entries is already subject to a timeout that is in itself inaccurate, so sub-optimal calls to `apply_pending` may be of little practical importance. On a technical note, I think with such an approach the `KBucketsTable` should still keep track of which buckets have elapsed pending entries, i.e. up to 256 boolean values, so that `KBucketsTable::apply_pending` does not need to traverse all buckets.
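Sketched with stand-in types (this is the speculative alternative under discussion, not code from this PR): one flag per bucket lets `apply_pending` touch only buckets with an elapsed pending entry.

```rust
struct Bucket<T> {
    nodes: Vec<T>,
    pending: Option<T>,
}

struct KBucketsTable<T> {
    buckets: Vec<Bucket<T>>,
    /// `elapsed[i]` is set when bucket `i` has a pending entry whose
    /// timeout has elapsed (up to 256 flags).
    elapsed: Vec<bool>,
}

impl<T: Clone> KBucketsTable<T> {
    /// Explicitly applies all elapsed pending entries and returns the
    /// inserted nodes, so the caller learns about every insertion.
    fn apply_pending(&mut self) -> Vec<T> {
        let mut applied = Vec::new();
        for (i, bucket) in self.buckets.iter_mut().enumerate() {
            if !self.elapsed[i] {
                continue; // buckets without elapsed pending entries are skipped
            }
            self.elapsed[i] = false;
            if let Some(node) = bucket.pending.take() {
                // Simplified: a real implementation would first evict the
                // least-recently connected disconnected node if the bucket is full.
                bucket.nodes.push(node.clone());
                applied.push(node);
            }
        }
        applied
    }
}
```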

A second alternative could be to leave the concept of pending entries entirely outside of the `kbucket` module, instead also allowing deletion of entries via the `Entry` API. I haven't thought that through much further yet, however.

tomaka (Member) commented May 22, 2019

Thanks for the explanation!

> I agree that the alternative of having an explicit API call of the form `KBucketsTable::apply_pending` that must be called by the user is worth considering for future work, assuming that is what you had in mind.

Yes, that's what I had in mind.

> A second alternative could be to leave the concept of pending entries entirely outside of the `kbucket` module, instead also allowing deletion of entries via the `Entry` API. I haven't thought that through much further yet, however.

I think that this is a viable option. I expect the last few buckets to be full and constantly have a pending node, but the first 245 buckets or so will probably never be full. It would therefore make a lot of sense to store the pending nodes in an entirely separate container.
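Purely as an illustration of that idea (speculative future work, not part of this PR), the pending nodes could live in a small map keyed by bucket index:

```rust
use std::collections::HashMap;

/// Pending nodes stored outside the buckets, keyed by bucket index, so the
/// many never-full buckets carry no per-bucket pending slot.
struct PendingNodes<T> {
    by_bucket: HashMap<usize, T>,
}

impl<T> PendingNodes<T> {
    fn insert(&mut self, bucket: usize, node: T) {
        self.by_bucket.insert(bucket, node);
    }

    fn take(&mut self, bucket: usize) -> Option<T> {
        self.by_bucket.remove(&bucket)
    }
}
```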

But again, this is for future work.

romanb merged commit 09f54df into libp2p:master May 22, 2019
romanb deleted the kad-closest branch May 22, 2019 12:49
tomaka mentioned this pull request May 23, 2019
g-r-a-n-t pushed a commit to g-r-a-n-t/rust-libp2p that referenced this pull request Jun 13, 2019
* Kademlia: Optimise iteration over closest entries.
* Rewrite pending node handling and add tests.