
refactor: reuse nim-libp2p peerstore + move peermanager logic #1383

Merged
alrevuelta merged 6 commits into master from reuse-nlp2p-peerstore-1
Nov 24, 2022

Conversation

@alrevuelta (Contributor) commented Nov 15, 2022

Closes #622

  • Reuse the nim-libp2p peerstore and remove duplicated fields.
  • Move all logic for accessing peers and retrieving peers matching specific conditions (e.g. a supported protocol) from peer_manager to the peer store.
  • Add a new CLI flag to configure the number of peers stored by the nim-libp2p peerstore.
  • Add a new field tracking peer origin (discv5, static, dns). Currently unused; a sketch of the idea follows below.
  • Add test coverage for the peer manager, since it is being modified and some functions were not unit tested.
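
To illustrate the origin field on top of the extensible libp2p peer store — a minimal sketch only; the enum and book names (PeerOrigin, OriginBook) are illustrative and not necessarily the ones this PR adds:

import libp2p/[peerstore, peerid]

type
  # Illustrative names; the PR's actual field may differ.
  PeerOrigin = enum
    UnknownOrigin, Discv5, Static, Dns
  OriginBook = ref object of PeerBook[PeerOrigin]

let peerStore = PeerStore.new(capacity = 5)

var p: PeerId
discard p.init("QmeuZJbXrszW2jdT7GdduSjQskPU3S7vvGWKtKgDfkDvW1")

peerStore[OriginBook][p] = Discv5
echo peerStore[OriginBook][p] # Discv5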

@status-im-auto (Collaborator) commented Nov 15, 2022

Jenkins Builds

Older builds (6):

Commit   #  Finished (UTC)       Duration  Platform  Result
7a94032  1  2022-11-15 13:03:07  ~8 min    linux     📄 log
7a94032  1  2022-11-15 13:07:05  ~12 min   macos     📄 log
bb92436  2  2022-11-22 09:39:31  ~7 min    linux     📄 log
bb92436  2  2022-11-22 09:40:01  ~8 min    macos     📄 log
165f36f  3  2022-11-22 13:05:56  ~7 min    linux     📄 log
165f36f  3  2022-11-22 13:11:52  ~13 min   macos     📄 log

Latest builds:

Commit      #  Finished (UTC)       Duration  Platform  Result
✔️ d1a1d12  4  2022-11-22 13:32:55  ~14 min   macos     📦 bin
✔️ d1a1d12  4  2022-11-22 13:33:55  ~15 min   linux     📦 bin
779cc03     5  2022-11-23 23:04:32  ~17 min   macos     📄 log

waku.nimble (review thread: outdated, resolved)
    return some(peerStored.toRemotePeerInfo())
  else:
    return none(RemotePeerInfo)

proc reconnectPeers*(pm: PeerManager,
Contributor:

I'm unsure if we should keep this reconnection logic, at least in the medium term: it was added before we had discovery methods, mainly to reconnect fleet nodes to each other after a restart. The main difference between the previous peer store and the underlying libp2p one is that the previous store only contained peers we initiated a connection to, whereas the libp2p peer store also contains all nodes connecting to us. This means we previously only attempted to reconnect to peers we had explicitly connected to (e.g. via the connectToNodes API call). Because GossipSub has a mandatory backoff time before GRAFTing a PRUNEd connection (which is implicitly what reconnecting after a restart does), we had to wait before attempting to reconnect to each node. IIRC this can block for quite a while at node startup (and possibly much longer now that reconnection will be attempted for all nodes). Now that nodes generally use DNS discovery on boot to reconnect, perhaps the whole idea of peer persistence and reconnection should be rethought?

@alrevuelta (author):

Good point. If that's fine I will just change what is needed to not break anything and leave this for another PR.

perhaps the whole idea of peer persistence and reconnection should be rethought?

Yep. Regarding reconnections, I think reconnectPeers can be rescoped to a runPeeringStrategy (not happy with the name) with the following functionality (a rough sketch follows the list):

  • Runs in an infinite loop.
  • If connectedPeers is below a threshold, tries to connect to new peers from the peerstore.
  • Respects the backoff (as is done now).
  • Selection criteria TBD: peer score, keeping a given inbound/outbound ratio, keeping a store/relay/etc. node ratio, and so on.
  • Forces disconnections if needed.
  • Is totally agnostic of peer discovery.
  • Does not add peers to the peer store, as is done now with pm.addPeer(remotePeerInfo, proto).

Beyond runPeeringStrategy, we will have runPruningStrategy, which will remove peers from the peerstore, and runDiscoveryStrategy, which will populate the peerstore with new peers. Names TBD.
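
A minimal sketch of such a loop, assuming hypothetical helpers (connectedPeerCount, candidatesRespectingBackoff, dialPeer) and an arbitrary threshold; none of these names exist in this PR:

import chronos

const targetPeers = 10  # arbitrary threshold, for illustration only

# Placeholder helpers; a real implementation would query the peer store
# and the libp2p switch instead.
proc connectedPeerCount(): int = 0
proc candidatesRespectingBackoff(): seq[string] = @[]
proc dialPeer(peer: string) {.async.} = discard

proc runPeeringStrategy() {.async.} =
  # Discovery-agnostic: only consumes peers already in the peer store
  # and never adds new ones to it.
  while true:
    if connectedPeerCount() < targetPeers:
      for peer in candidatesRespectingBackoff():
        await dialPeer(peer)
    await sleepAsync(30.seconds)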

Contributor:

Good point. If that's fine I will just change what is needed to not break anything and leave this for another PR.

Absolutely. I think, though, this PR will change the current behaviour, unless I'm missing something. Previously on startup a peer would reconnect only to peers that were statically added or connected via DNS discovery. Now all peers will both be persisted and a reconnection attempted.

Changed my thinking halfway through typing this: the peers we persist will still only be the ones added via addPeers() (i.e. the ones we ourselves statically configured, discovered via DNS, etc) and not include all peers in the peer store (which would include incoming connections too)? My worry was that we'll now keep persisting all these peers and reattempt connecting to all of them forever (since the peer storage is never truncated/pruned as it stands).

@alrevuelta (author):

we'll now keep persisting all these peers and reattempt connecting to all of them forever

As I can see in Kibana, we are already attempting to connect to all peers forever, so after checking it I would say the behaviour is the same. But I have this in mind and plan to investigate it further, also in relation to the issue in the discv5 loop that we discussed. We are constantly discovering peers and connecting to them, but since we don't have that many peers, we are always trying to connect to the same 10-20 nodes.

which would include incoming connections too

Yes! Will address all this in another PR, with a new field indicating Direction (in/out). That would then be used to complete #1206 (showing the number of in/out connections). A possible shape is sketched below.
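
A possible shape for that follow-up, purely as a hedged sketch (the enum and book names are hypothetical):

import libp2p/[peerstore, peerid]

type
  # Hypothetical names for the planned follow-up PR.
  PeerDirection = enum
    Inbound, Outbound
  DirectionBook = ref object of PeerBook[PeerDirection]

let store = PeerStore.new(capacity = 5)
var p: PeerId
discard p.init("QmeuZJbXrszW2jdT7GdduSjQskPU3S7vvGWKtKgDfkDvW1")
store[DirectionBook][p] = Inbound

# Counting in/out connections (what #1206 asks to expose).
var inbound, outbound = 0
for _, dir in store[DirectionBook].book:
  if dir == Inbound: inc inbound else: inc outbound
echo inbound, " in / ", outbound, " out"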

# Check if peer is reachable.
-    if pm.peerStore.connectionBook[storedInfo.peerId] == CannotConnect:
+    if pm.peerStore[ConnectionBook][storedInfo.peerId] == CannotConnect: #TODO what if it doesnt exist?
Contributor:

I'm also curious about this TODO. :D

@Menduist (Contributor) commented Nov 18, 2022:

It returns default(T), which is debatable. Alternatives:

if storedInfo.peerId in pm.peerStore[ConnectionBook] and pm.peerStore[ConnectionBook][storedInfo.peerId] == CannotConnect:

# or a pattern I'm starting to like and that we could add, even though it's ugly: {} returning an Option
if pm.peerStore[ConnectionBook]{storedInfo.peerId} == some(CannotConnect):

@alrevuelta (author) commented Nov 18, 2022:

I love snippets that can be run with nim r file.nim. Looks like it just prints an empty string "".

import libp2p/peerstore
import libp2p/peerid

let myPeerStore = PeerStore.new(capacity = 5)

type ConnectionBook = ref object of PeerBook[string]

var p1, p2: PeerId
let ok1 = p1.init("QmeuZJbXrszW2jdT7GdduSjQskPU3S7vvGWKtKgDfkDvW1")
let ok2 = p2.init("QmeuZJbXrszW2jdT7GdduSjQskPU3S7vvGWKtKgDfkDvW2")

echo ok1, ok2
myPeerStore[ConnectionBook][p1] = "Connected"

echo myPeerStore[ConnectionBook][p1]
echo myPeerStore[ConnectionBook][p2] # missing key: prints default(T), i.e. ""

#Output:
#truetrue
#Connected
#
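
Appending the membership-check pattern suggested above to the same snippet makes the missing-key case explicit (assuming the book supports in / contains, as the suggestion implies):

# Guarded access distinguishes a missing key from default(T) ("").
if p2 in myPeerStore[ConnectionBook]:
  echo myPeerStore[ConnectionBook][p2]
else:
  echo "p2 has no ConnectionBook entry"

#Output:
#p2 has no ConnectionBook entry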


@jm-clius (Contributor) left a comment:

In general I think the direction looks good!
Two things to consider:

  • The subtle differences in peer persistence and reconnection (with backoff times) now that we consider all connected peers (incoming and outgoing), versus the previous peer store where we had more control (only peers we initiated connections to were added to the store).
  • Keeping the PRs related to this as contained as possible, so we can review and increment towards the complete feature. Probably a good idea to keep the existing API as stable as possible until we have a design for the improved peer manager API. I agree, though, with moving the peer* API procs to the peerStore rather than the manager.

@arnetheduck (Contributor):

How is application-specific per-peer data stored when using the libp2p-based peer manager?

@Menduist (Contributor) commented Nov 18, 2022:

See the "new" peer store that enables to create custom books with custom types:
https://github.com/status-im/nim-libp2p/blob/unstable/libp2p/peerstore.nim#L11-L23
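
So application-specific per-peer data is just a typed book. A minimal, self-contained illustration (the book name and payload here are made up):

import libp2p/[peerstore, peerid]

# Any application-defined payload type works as a book value.
type LastSeenBook = ref object of PeerBook[int64]

let store = PeerStore.new(capacity = 5)
var p: PeerId
discard p.init("QmeuZJbXrszW2jdT7GdduSjQskPU3S7vvGWKtKgDfkDvW1")

store[LastSeenBook][p] = 1_668_765_432'i64  # e.g. a Unix timestamp
echo store[LastSeenBook][p]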

@alrevuelta (author) left a comment:

Thanks @jm-clius, really appreciate the comments. Left some answers :)

waku/v2/node/peer_manager/peer_manager.nim (2 resolved threads)
@alrevuelta alrevuelta marked this pull request as ready for review November 22, 2022 13:06
@LNSD (Contributor) left a comment:

Please check my comments.

Looking forward to understanding what your plan for the peer manager is.

tests/v2/test_peer_store_extended.nim (3 resolved threads, outdated)
tests/v2/test_peer_manager.nim (resolved)
waku/v2/node/jsonrpc/admin_api.nim (resolved, outdated)
waku/v2/node/peer_manager/peer_manager.nim (resolved)
waku/v2/node/peer_manager/waku_peer_store.nim (resolved)
  ConnectionBook* = ref object of PeerBook[Connectedness]

-  DisconnectBook* = ref object of PeerBook[int64] # Keeps track of when peers were disconnected in Unix timestamps
+  # Keeps track of when peers were disconnected in Unix timestamps
+  DisconnectBook* = ref object of PeerBook[int64]
Contributor:

This clearly demonstrates that we need a common time module for timestamps. We should use the same timestamp format.

cc @jm-clius

@alrevuelta (author):

+1 noted.

Contributor:

Could we make this a Timestamp now (currently just an alias for int64) to save some refactoring in future? A sketch follows.
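
Based on the comment above (Timestamp is currently just an int64 alias in nim-waku), the suggestion amounts to:

import libp2p/peerstore

type
  Timestamp = int64  # stand-in for the existing nim-waku alias
  # Keeps track of when peers were disconnected, as Unix timestamps
  DisconnectBook* = ref object of PeerBook[Timestamp]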

waku/v2/node/peer_manager/waku_peer_store.nim (resolved)
Comment on lines +80 to +94:

proc peers*(peerStore: PeerStore): seq[StoredInfo] =
  ## Get all the stored information of every peer.
-  let allKeys = concat(toSeq(keys(peerStore.addressBook.book)),
-                       toSeq(keys(peerStore.protoBook.book)),
-                       toSeq(keys(peerStore.keyBook.book))).toHashSet()
+  let allKeys = concat(toSeq(peerStore[AddressBook].book.keys()),
+                       toSeq(peerStore[ProtoBook].book.keys()),
+                       toSeq(peerStore[KeyBook].book.keys())).toHashSet()

  return allKeys.mapIt(peerStore.get(it))

proc peers*(peerStore: PeerStore, proto: string): seq[StoredInfo] =
  # Return the known info for all peers registered on the specified protocol
  peerStore.peers.filterIt(it.protos.contains(proto))

proc peers*(peerStore: PeerStore, protocolMatcher: Matcher): seq[StoredInfo] =
  # Return the known info for all peers matching the provided protocolMatcher
  peerStore.peers.filterIt(it.protos.anyIt(protocolMatcher(it)))
Contributor:

This proc's name is a noun. We use nouns for accessing properties (e.g. a property we did not make public because we don't want to allow modifying it) or "dynamic properties" (virtual properties derived from other properties). These are functions, as they have parameters. Please consider renaming the methods to something like getPeersByProtocol.

@alrevuelta (author):

+1 on getPeersByProtocol. This proc already existed, so I just reused the name. Will follow up with a PR with renamings once the heavy stuff is done.

@jm-clius (Contributor) left a comment:

LGTM! Thanks. It may be worth doing some log analysis after this is deployed to the fleets (it should now be auto-deployed to both wakuv2.test and status.test) to double-check that e.g. reconnection works as before.

waku/v2/node/peer_manager/peer_manager.nim (2 resolved threads)

@alrevuelta merged commit 43fd11b into master on Nov 24, 2022
@alrevuelta deleted the reuse-nlp2p-peerstore-1 branch on November 24, 2022 at 13:11
Closed issue: Waku Peer Store: Integrated libp2p peer store (#622)