
Kademlia: Optimise iteration over closest keys / entries. #1117

Merged: romanb merged 3 commits into libp2p:master from kad-closest on May 22, 2019

Conversation

romanb (Contributor) commented May 13, 2019

Based on #1108.

The current implementation for finding the entries whose keys are closest to some target key in the Kademlia routing table involves copying the keys of all buckets into a new Vec which is then sorted based on the
distances of the entries to the target and turned into an iterator from which only a small number of elements (by default 20) are drawn.

This commit introduces an iterator for finding the closest keys (or entries) to a target that visits the buckets in the optimal order, based on the information contained in the distance bit-string representing the distance between the local key and the target (introduced in #1108). That is, the contents of the next bucket to visit are cloned and sorted only as more elements are drawn from the iterator.

Correctness is tested against full-table scans.
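To make the difference concrete, here is a small self-contained sketch (illustrative only; it uses a toy `u8` key space with XOR distance instead of the real `Key` type, and assumes the bucket visiting order is given, derived from the distance bit-string as explained further down in this thread):

```rust
// Toy model: `buckets[i]` holds keys at distance [2^i, 2^(i+1)) from the local key.

/// Old approach: copy every key out of every bucket, sort all of them by
/// distance to the target, then take the first `k`.
fn closest_full_scan(buckets: &[Vec<u8>], target: u8, k: usize) -> Vec<u8> {
    let mut all: Vec<u8> = buckets.iter().flatten().copied().collect();
    all.sort_by_key(|key| key ^ target);
    all.truncate(k);
    all
}

/// New approach (conceptually): visit the buckets in an order derived from
/// the distance between the local key and the target, cloning and sorting
/// only one bucket at a time, stopping as soon as `k` keys have been drawn.
fn closest_lazy(buckets: &[Vec<u8>], bucket_order: &[usize], target: u8, k: usize) -> Vec<u8> {
    let mut result = Vec::with_capacity(k);
    for &i in bucket_order {
        let mut bucket: Vec<u8> = buckets[i].clone();
        bucket.sort_by_key(|key| key ^ target);
        for key in bucket {
            if result.len() == k {
                return result; // remaining buckets are never touched
            }
            result.push(key);
        }
    }
    result
}
```

With a correct visiting order, keys come out in nondecreasing distance to the target, so iteration can stop after `k` elements without ever touching the remaining buckets.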

There was some overlap between the handling of pending nodes and the newly introduced iterator(s), in the sense that any access to a bucket should check whether a pending node can be applied in order to bring the bucket up-to-date, and there were a few TODOs left around the pending node handling. I therefore refactored that part and added more tests. In particular, I did the following:

  • The (internal) bucket API for a single bucket was moved to `kbucket::bucket`, extracting large parts of code formerly embedded directly into the Entry API. That improves reuse and testability.
  • The (public) Entry API was moved to the `kbucket::entry` sub-module and simplified due to the first point. The Entry API now just mediates access to the internal bucket API.
  • The pending node handling reflects the following policy: the nodes in a bucket are ordered from least-recently connected to most-recently connected, i.e. a "connection-oriented" variant of what is described in the paper (a minimal sketch of this ordering follows right below).
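A minimal sketch of just that ordering invariant, with a hypothetical `Bucket` type that is not the PR's actual code:

```rust
// Nodes are kept least-recently connected first and most-recently connected
// last, so marking a node as (re)connected moves it to the tail.
struct Bucket<T: PartialEq> {
    nodes: Vec<T>,
}

impl<T: PartialEq> Bucket<T> {
    fn mark_connected(&mut self, node: &T) {
        if let Some(pos) = self.nodes.iter().position(|n| n == node) {
            let n = self.nodes.remove(pos);
            self.nodes.push(n); // now the most-recently connected entry
        }
    }
}
```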

Due to refactorings the diffs are a bit large and I suggest the following (bottom-up) order for an effective review:

  1. The new kbucket::bucket module.
  2. The new kbucket::entry module.
  3. The new iterator(s) in the `kbucket` module and the new top-level functions `closest` and `closest_keys`, implementing the optimised bucket iteration. Here is the relevant part in the `kbucket` module.
  4. The remaining full diff of the `kbucket` and behaviour modules, which adapts the rest of the code to the above changes. That should be quick after seeing all the previous changes.

ghost assigned romanb on May 13, 2019
ghost added the in progress label on May 13, 2019
romanb force-pushed the kad-closest branch 2 times, most recently from 7ca0e04 to 765033a on May 20, 2019
The current implementation for finding the entries whose keys are closest
to some target key in the Kademlia routing table involves copying the
keys of all buckets into a new `Vec` which is then sorted based on the
distances to the target and turned into an iterator from which only a
small number of elements (by default 20) are drawn.

This commit introduces an iterator over buckets for finding the closest
keys to a target that visits the buckets in the optimal order, based on
the information contained in the distance bit-string representing the
distance between the local key and the target.

Correctness is tested against full-table scans.

Also included:

  * Updated documentation.
  * The `Entry` API was moved to the `kbucket::entry` sub-module for
    ease of maintenance.
  * The pending node handling has been slightly refactored in order to
    bring code and documentation in agreement and clarify the semantics
    a little.
tomaka (Member) commented May 20, 2019

As a heads up, let me know whether this is ready for review (otherwise I won't review it).

romanb (Contributor, Author) commented May 20, 2019

Now ready for review. The PR description has been updated and should be re-read by anyone who read an earlier version. I'm only polishing documentation here and there and may add one or two additional tests that come to mind, but otherwise I'm moving on to #146 from here.

romanb marked this pull request as ready for review May 20, 2019 14:35
tomaka (Member) left a comment

A few nit-picks. I admit that I don't fully understand the logic of the closest iterator, but that's because I'm having trouble wrapping my head around the distance thing. But the code looks solid.

As a general remark, I'm not a fan of having the data structure update itself over time. I think it would be preferable to explicitly call a method that accounts for timeouts in the k-buckets, instead of having for example iter() automatically account for that. However, since it was already like that before, that's out of the scope of this PR.

```diff
@@ -377,7 +377,7 @@ impl<'a> PollParameters<'a> {
     }

     /// Returns the list of the addresses nodes can use to reach us.
-    pub fn external_addresses(&self) -> impl ExactSizeIterator<Item = &Multiaddr> {
+    pub fn external_addresses(&self) -> impl ExactSizeIterator<Item = &Multiaddr> + Clone {
```
tomaka (Member) commented May 20, 2019

Seeing the way that you use `external_addresses`, I don't think that this additional `Clone` is needed.

romanb (Contributor, Author):

You mean like this? It requires one more cloning of the addresses than is strictly necessary though, doesn't it? Do you prefer that or am I overlooking something?

tomaka (Member) commented May 21, 2019

Isn't it possible to directly write `multiaddrs: parameters.external_addresses().cloned().collect()` and remove that `let local_addrs = ...` altogether?

romanb (Contributor, Author):

Totally, thanks!
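For illustration, a self-contained analogue of the agreed simplification; the `Params`/`Multiaddr` stand-ins below are hypothetical, not the real libp2p types:

```rust
#[derive(Clone, Debug, PartialEq)]
struct Multiaddr(String); // stand-in for the real `Multiaddr`

struct Params {
    addrs: Vec<Multiaddr>,
}

impl Params {
    // Stand-in for `PollParameters::external_addresses`.
    fn external_addresses(&self) -> impl Iterator<Item = &Multiaddr> {
        self.addrs.iter()
    }
}

fn main() {
    let parameters = Params {
        addrs: vec![Multiaddr("/ip4/127.0.0.1/tcp/4001".to_string())],
    };
    // Instead of binding the cloned addresses to an intermediate
    // `local_addrs` variable, collect them directly where they are needed.
    let multiaddrs: Vec<Multiaddr> = parameters.external_addresses().cloned().collect();
    assert_eq!(multiaddrs.len(), 1);
}
```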

```diff
@@ -203,39 +204,49 @@ impl<TSubstream> Kademlia<TSubstream> {
     /// Adds a known address for the given `PeerId`. We are connected to this address.
     // TODO: report if the address was inserted? also, semantics unclear
```
tomaka (Member):

This is totally out of the scope of this PR, but the semantics of `add_connected_address` vs `add_not_connected_address` are extremely crappy and come from a time when Kademlia was even less correctly implemented.

romanb (Contributor, Author):

What irritates me about these functions is that they seemingly allow user code to choose whether a node is considered to be connected, even though the Kademlia behaviour has no knowledge of that connection. I know that the old implementation actually ignored the connected argument, making the two functions equivalent, maybe exactly for that reason. How about fusing these two functions into just `add_address` and giving it the semantics of adding a known address for a peer to the routing table, with no influence on the connection status (meaning disconnected if the peer associated with the address is not yet in the routing table)?
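To make the proposed semantics concrete, a self-contained analogue with stand-in types (not the actual API; `String` replaces `PeerId`/`Multiaddr`):

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Status { Connected, Disconnected }

// Stand-in for the routing table.
struct RoutingTable {
    entries: HashMap<String, (Vec<String>, Status)>,
}

impl RoutingTable {
    /// Adds a known address for `peer` without influencing its connection
    /// status; a peer not yet in the table is inserted as disconnected.
    fn add_address(&mut self, peer: &str, address: &str) {
        let entry = self
            .entries
            .entry(peer.to_string())
            .or_insert_with(|| (Vec::new(), Status::Disconnected));
        entry.0.push(address.to_string());
    }
}

fn main() {
    let mut table = RoutingTable { entries: HashMap::new() };
    table.add_address("peer-a", "/ip4/127.0.0.1/tcp/4001");
    assert_eq!(table.entries["peer-a"].1, Status::Disconnected);
}
```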

tomaka (Member):

I'm fine with fusing methods.

```rust
}

impl<T> Eq for Key<T> {}

/// A (safe) index into a `KBucketsTable`, i.e. a non-negative integer in the
```
tomaka (Member):

Minor nit-pick: to me "safe" means "memory-safe".

Suggested change:
```diff
-/// A (safe) index into a `KBucketsTable`, i.e. a non-negative integer in the
+/// A (type-safe) index into a `KBucketsTable`, i.e. a non-negative integer in the
```

```rust
    /// `None` indicates that there are no connected entries in the bucket, i.e.
    /// the bucket is either empty, or contains only entries for peers that are
    /// considered disconnected.
    first_connected_pos: Option<usize>,
```
tomaka (Member):

Why use an `Option` instead of setting it to `nodes.len()`? For simplicity?

romanb (Contributor, Author):

I find the semantics clearer if `first_connected_pos` is either not set (`None`) or is a valid index into `nodes`, instead of attaching special meaning to out-of-bounds values (w.r.t. `nodes`).
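To illustrate the invariant behind that choice, a sketch with stand-in types (not the PR's code):

```rust
#[derive(Debug, PartialEq)]
enum Status { Connected, Disconnected }

struct Bucket<T> {
    /// Ordered from least-recently to most-recently connected, with all
    /// disconnected nodes at the front.
    nodes: Vec<T>,
    /// Position of the first connected node, if any. `None` means the bucket
    /// is empty or contains only disconnected nodes.
    first_connected_pos: Option<usize>,
}

impl<T> Bucket<T> {
    fn status(&self, pos: usize) -> Status {
        assert!(pos < self.nodes.len());
        // A position is "connected" exactly when it lies at or after
        // `first_connected_pos`; no out-of-bounds sentinel is required.
        if self.first_connected_pos.map_or(false, |i| pos >= i) {
            Status::Connected
        } else {
            Status::Disconnected
        }
    }
}

fn main() {
    let b = Bucket { nodes: vec!["a", "b", "c"], first_connected_pos: Some(1) };
    assert_eq!(b.status(0), Status::Disconnected);
    assert_eq!(b.status(2), Status::Connected);
}
```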

romanb (Contributor, Author) commented May 22, 2019

> A few nit-picks. I admit that I don't fully understand the logic of the closest iterator, but that's because I'm having trouble wrapping my head around the distance thing. But the code looks solid.

It is certainly not straightforward, but a small example may help to illustrate the principle and provide an intuition for why this procedure works (for which the tests against full-table scans provide additional confidence):

Let the keyspace be [0, 2^4), i.e. 0000 through 1111. Furthermore let the local_key be 1010 and the target be 1100 with (XOR) distance function d. Then the distance between local_key and target is given by d(1010, 1100) = 0110 = 6. We are looking for the keys closest to target in the buckets of the routing table of local_key, with increasing distance.

The closest key to target is obviously 1100, i.e. target itself, with distance 0. That key has distance 6 from local_key, as seen above, so it falls into bucket 2 of the routing table which covers the distance interval [2^2, 2^3) from local_key. Therefore bucket 2 is the first bucket to visit. That bucket covers all keys of the form 11xx whose distances to target are 0-3 (1100 through 1111).

So the next closest key to the target not in bucket 2 must have distance at least 4 to target, i.e. must differ in the bit position for 2^2 from target. That is the key 1000, which conceptually becomes the new target. The distance of that new target to the local_key is d(1010, 1000) = 0010 = 2, i.e. bucket 1 covering distances [2^1, 2^2). Therefore bucket 1 is the next to visit. That bucket covers all keys of the form 100x whose distances to the new target are 0 (1000) and 1 (1001), and correspondingly 4 + 0 = 4 and 4 + 1 = 5 from the original target.

So the next closest key to the original target not in bucket 1 must have distance at least 6, i.e. must differ in the bit positions for 2^2 and 2^1 from the original target. But that is the key 1010 which is the local_key and is hence skipped, so we continue with 1011 as the new target, having distance 7 from the original target. The distance to the local_key is obviously 1, so it is the sole key covered by bucket 0. Therefore bucket 0 is the next bucket to visit.

So the next closest key to the original target not in bucket 0 must have distance at least 8, i.e. must differ in the bit position for 2^3 from the original target. That is the key 0100 which again conceptually becomes the new target. That key has distance d(1010, 0100) = 1110 = 14 from the local_key, thus falls in bucket 3 covering the distances [2^3, 2^4). That bucket covers all keys of the form 0xxx, i.e. whose distance to the local_key has no leading zeros - the furthest bucket covering half of the keyspace. Therefore bucket 3 is the next (and in this tiny example, the last) bucket to visit whose keys cover distances 8-15 from the local_key as well as the original target.

The order in which to visit the buckets to find the closest keys to target is thus 2, 1, 0, 3.

Closer inspection of this procedure shows that it derives mechanically from the binary representation of the distance between local_key and target: One first takes the bit positions showing a 1 from left to right (which I referred to in the code as the "zooming in" part, since it moves to buckets closer and closer to the local_key), followed by taking the bit positions containing a 0 from right to left (the "zooming out" part, since we are now moving away from the local_key). Intuitively it is clear that flipping bit positions in the target where target differs from local_key results in keys closer and closer to the local_key (but further and further from target), whereas flipping bit positions where the local_key and target agree results in keys with increasing distance from both.
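In code, the visiting order falls out of the distance bits directly. A small self-contained sketch of the principle (not the PR's implementation), reproducing the example above:

```rust
/// Derives the bucket visiting order from the distance between the local key
/// and the target, where bucket `i` covers distances `[2^i, 2^(i+1))`.
fn bucket_visit_order(distance: u32, num_bits: u32) -> Vec<u32> {
    let mut order = Vec::with_capacity(num_bits as usize);
    // "Zooming in": bit positions containing a 1, from left to right,
    // i.e. buckets closer and closer to the local key.
    for i in (0..num_bits).rev() {
        if distance & (1 << i) != 0 {
            order.push(i);
        }
    }
    // "Zooming out": bit positions containing a 0, from right to left,
    // i.e. moving away from both the local key and the target.
    for i in 0..num_bits {
        if distance & (1 << i) == 0 {
            order.push(i);
        }
    }
    order
}

fn main() {
    let local_key = 0b1010u32;
    let target = 0b1100u32;
    let distance = local_key ^ target; // 0b0110 = 6
    assert_eq!(bucket_visit_order(distance, 4), vec![2, 1, 0, 3]);
}
```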

> As a general remark, I'm not a fan of having the data structure update itself over time. I think it would be preferable to explicitly call a method that accounts for timeouts in the k-buckets, instead of having for example iter() automatically account for that. However, since it was already like that before, that's out of the scope of this PR.

I share the mixed feelings about the current approach. On the upside, a user of the API cannot forget to apply pending entries, since that happens automatically when accessing the routing table. The downside is that the `KBucketsTable` needs to keep track of these results and provide an API for consuming them (here `KBucketsTable::take_applied_pending`) in order to know about all insertions. As you said, this was basically the partially implemented approach already present, which I only tried to bring to its logical conclusion in the context of this PR.

I agree that the alternative of having an explicit API call of the form `KBucketsTable::apply_pending` that must be called by the user is worth considering for future work, assuming that is what you had in mind. It has the downside that client code of such an API may either forget to call it or call it sub-optimally (e.g. too infrequently), resulting in "stale" results from the routing table. However, the application of pending entries is already subject to a timeout that is in itself inaccurate, so sub-optimal calls to `apply_pending` may be of little practical importance. On a technical note, I think with such an approach the `KBucketsTable` should still keep track of which buckets have elapsed pending entries, i.e. up to 256 boolean values, so that `KBucketsTable::apply_pending` does not need to traverse all buckets.
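Sketched with stand-in types (this is the speculative alternative under discussion, not code from this PR): one flag per bucket lets `apply_pending` touch only buckets with an elapsed pending entry.

```rust
struct Bucket<T> {
    nodes: Vec<T>,
    pending: Option<T>,
}

struct KBucketsTable<T> {
    buckets: Vec<Bucket<T>>,
    /// `elapsed[i]` is set when bucket `i` has a pending entry whose
    /// timeout has elapsed (up to 256 flags).
    elapsed: Vec<bool>,
}

impl<T: Clone> KBucketsTable<T> {
    /// Explicitly applies all elapsed pending entries and returns the
    /// inserted nodes, so the caller learns about every insertion.
    fn apply_pending(&mut self) -> Vec<T> {
        let mut applied = Vec::new();
        for (i, bucket) in self.buckets.iter_mut().enumerate() {
            if !self.elapsed[i] {
                continue; // buckets without elapsed pending entries are skipped
            }
            self.elapsed[i] = false;
            if let Some(node) = bucket.pending.take() {
                // Simplified: a real implementation would first evict the
                // least-recently connected disconnected node if the bucket is full.
                bucket.nodes.push(node.clone());
                applied.push(node);
            }
        }
        applied
    }
}
```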

A second alternative could be to leave the concept of pending entries entirely outside of the `kbucket` module, instead also allowing deletion of entries via the `Entry` API. I haven't thought that through much further yet, however.

tomaka (Member) commented May 22, 2019

Thanks for the explanation!

> I agree that the alternative of having an explicit API call of the form `KBucketsTable::apply_pending` that must be called by the user is worth considering for future work, assuming that is what you had in mind.

Yes, that's what I had in mind.

> A second alternative could be to leave the concept of pending entries entirely outside of the `kbucket` module, instead also allowing deletion of entries via the `Entry` API. I haven't thought that through much further yet, however.

I think that this is a viable option. I expect the last few buckets to be full and constantly have a pending node, but the first 245 buckets or so will probably never be full. It would therefore make a lot of sense to store the pending nodes in an entirely separate container.
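Purely as an illustration of that idea (speculative future work, not part of this PR), the pending nodes could live in a small map keyed by bucket index:

```rust
use std::collections::HashMap;

/// Pending nodes stored outside the buckets, keyed by bucket index, so the
/// many never-full buckets carry no per-bucket pending slot.
struct PendingNodes<T> {
    by_bucket: HashMap<usize, T>,
}

impl<T> PendingNodes<T> {
    fn insert(&mut self, bucket: usize, node: T) {
        self.by_bucket.insert(bucket, node);
    }

    fn take(&mut self, bucket: usize) -> Option<T> {
        self.by_bucket.remove(&bucket)
    }
}
```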

But again, this is for future work.

romanb merged commit 09f54df into libp2p:master May 22, 2019
romanb deleted the kad-closest branch May 22, 2019 12:49
tomaka mentioned this pull request May 23, 2019
g-r-a-n-t pushed a commit to g-r-a-n-t/rust-libp2p that referenced this pull request Jun 13, 2019
* Kademlia: Optimise iteration over closest entries.
* Rewrite pending node handling and add tests.