refactor: reuse nim-libp2p peerstore + move peermanager logic #1383
Conversation
    return some(peerStored.toRemotePeerInfo())
  else:
    return none(RemotePeerInfo)

proc reconnectPeers*(pm: PeerManager,
I'm unsure if we should keep this reconnection logic, at least in the medium term: this was added before discovery methods, mainly to reconnect fleet nodes to each other after a restart. The main difference between the previous peer store and the underlying libp2p one is that the previous peer store only contained peers that we initiated a connection to, whereas the libp2p peer store also contains all nodes connecting to us. This implies that we only attempted to reconnect to peers that we explicitly connected to before (e.g. using the connectToNodes API call). Because GossipSub has a mandatory backoff time before GRAFTing a PRUNEd connection (which is implicitly what a reconnection after a restart does), we had to wait before attempting to reconnect to each node. IIRC this can block for quite a while when the node starts (and maybe much more so now that reconnection will be attempted to all nodes). Now that nodes generally use DNS discovery on boot to reconnect to nodes, perhaps the whole idea of peer persistence and reconnection should be rethought?
Good point. If that's fine, I will just change what is needed to not break anything and leave this for another PR.

perhaps the whole idea of peer persistence and reconnection should be rethought?

Yep. Regarding reconnections, I think reconnectPeers can be rescoped to runPeeringStrategy (not happy with the name) with the following functionality:
- Infinite loop.
- If connectedPeers is below a threshold, try to connect to new peers from the peerstore.
- It respects the backoff (as it does now).
- Some criteria, TBD, can be added on which peers to select: score, keeping a given inbound/outbound ratio, keeping a store/relay/etc. nodes ratio, and so on.
- Forces disconnections if needed.
- It is totally agnostic of peer discovery.
- And it does not add peers to the peer store as is done now: pm.addPeer(remotePeerInfo, proto)

Beyond runPeeringStrategy, we will have runPruningStrategy, which will remove peers from the peerstore, and runDiscoveryStrategy, which will populate the peerstore with new peers. Names TBD.
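For illustration only, here is a minimal, self-contained sketch of what such a peering loop could look like. All names and types (SimplePeerStore, connectedCount, dialCandidates, targetPeers, the 30-second interval) are made up for the example and are not the actual nwaku or nim-libp2p API:

import std/[tables, times, os]

type
  PeerId = string                        # stand-in for the real libp2p PeerId
  Connectedness = enum NotConnected, CannotConnect, CanConnect, Connected

  SimplePeerStore = object
    connectedness: Table[PeerId, Connectedness]
    lastDisconnect: Table[PeerId, float]   # Unix timestamp of the last disconnection

proc connectedCount(store: SimplePeerStore): int =
  ## Number of peers currently marked as Connected.
  for c in store.connectedness.values:
    if c == Connected: inc result

proc dialCandidates(store: SimplePeerStore, backoff: float): seq[PeerId] =
  ## Peers we may dial: not connected and past their backoff window.
  let now = epochTime()
  for id, c in store.connectedness:
    if c in {NotConnected, CanConnect} and
       now - store.lastDisconnect.getOrDefault(id, 0.0) >= backoff:
      result.add id

proc runPeeringStrategy(store: var SimplePeerStore, targetPeers = 10, backoff = 60.0) =
  ## Infinite loop: keep the number of connections close to targetPeers,
  ## dialing only peers whose backoff has expired.
  while true:
    let missing = targetPeers - store.connectedCount()
    if missing > 0:
      let cands = store.dialCandidates(backoff)
      for id in cands[0 ..< min(missing, cands.len)]:
        # A real implementation would dial here and update the books on success/failure.
        store.connectedness[id] = Connected
    sleep(30_000)  # re-evaluate every 30 seconds

A real version would of course be async (chronos) and dial through the switch; this only shows the threshold-plus-backoff shape of the loop.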
Good point. If that's fine I will just change what is needed to not break anything and leave this for another PR.

Absolutely. I think, though, this PR will change the current behaviour, unless I'm missing something. Previously, on startup, a peer would reconnect only to peers that were statically added or connected via DNS discovery. Now all peers will both be persisted and a reconnection attempted.

Changed my thinking halfway through typing this: the peers we persist will still only be the ones added via addPeers() (i.e. the ones we ourselves statically configured, discovered via DNS, etc.) and not include all peers in the peer store (which would also include incoming connections)? My worry was that we'll now keep persisting all these peers and reattempt connecting to all of them forever (since the peer storage is never truncated/pruned as it stands).
we'll now keep persisting all these peers and reattempt connecting to all of them forever

As I can see in Kibana, we are already attempting to connect to all peers forever, so after checking it I would say the behaviour is the same. But I have this in mind and plan to investigate it further, also in relation to the issue in the discv5 loop that we discussed. We are constantly discovering peers and connecting to them, but since we don't have that many peers, we are always trying to connect to the same 10-20 nodes.

which would include incoming connections too

Yes! Will address all this in another PR, with a new field indicating Direction in/out. Then that would be used to complete #1206 (showing the number of in/out connections).
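For concreteness, such a direction field could follow the same custom-book pattern used elsewhere in this PR. The names below (PeerDirection, DirectionBook, Inbound/Outbound) are only a guess at what that follow-up might add:

import libp2p/peerstore

type
  # Whether we dialed the peer (Outbound) or it dialed us (Inbound)
  PeerDirection* = enum
    Inbound, Outbound
  DirectionBook* = ref object of PeerBook[PeerDirection]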
# Check if peer is reachable.
if pm.peerStore.connectionBook[storedInfo.peerId] == CannotConnect:
if pm.peerStore[ConnectionBook][storedInfo.peerId] == CannotConnect: #TODO what if it doesnt exist?
I'm also curious about this TODO. :D
It will return default(T), which is debatable:
if storedInfo.peerId in pm.peerStore[ConnectionBook] and pm.peerStore[ConnectionBook][storedInfo.peerId] == CannotConnect:
# Or a pattern I'm starting to like and that we could add, even though it's ugly: {} returning an Option
if pm.peerStore[ConnectionBook]{storedInfo.peerId} == some(CannotConnect)
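A rough sketch of what such a {} accessor could look like. This operator does not exist in nim-libp2p today; the sketch only assumes the [] and in/contains operations on PeerBook that are already used above:

import std/options
import libp2p/peerstore, libp2p/peerid

proc `{}`*[T](book: PeerBook[T], peerId: PeerId): Option[T] =
  ## Option-returning lookup: none(T) when the peer has no entry in this book.
  if peerId in book:
    result = some(book[peerId])
  else:
    result = none(T)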
I love snippets that can be run with nim r file.nim. Looks like it just prints an empty string "".
import libp2p/peerstore
import libp2p/peerid

let myPeerStore = PeerStore.new(capacity = 5)

# Custom book with a custom value type
type ConnectionBook = ref object of PeerBook[string]

var p1, p2: PeerId
let ok1 = p1.init("QmeuZJbXrszW2jdT7GdduSjQskPU3S7vvGWKtKgDfkDvW1")
let ok2 = p2.init("QmeuZJbXrszW2jdT7GdduSjQskPU3S7vvGWKtKgDfkDvW2")
echo ok1, ok2

myPeerStore[ConnectionBook][p1] = "Connected"
echo myPeerStore[ConnectionBook][p1]
echo myPeerStore[ConnectionBook][p2]   # missing key: prints default(T), i.e. an empty string

#Output:
#truetrue
#Connected
#
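Continuing the same snippet, the membership check suggested above avoids silently reading default(T) for a missing peer:

if p2 in myPeerStore[ConnectionBook]:
  echo myPeerStore[ConnectionBook][p2]
else:
  echo "p2 has no ConnectionBook entry"   # instead of printing an empty string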
In general I think the direction looks good!
Two things to consider:
- the subtle differences of peer persistence and reconnection (with backoff times) now that we consider all connected peers (incoming and outgoing) vs the previous peer store, where we had more control (only peers we initiated a connection to were added to the store).
- keeping the PRs related to this as contained as possible, so we can review and increment towards the complete feature. Probably a good idea to keep the existing API as stable as possible until we have a design for the improved peer manager API. Agree though with moving the peer* API procs to the peerStore rather than the manager.
how is application-specific per-peer data stored when using the libp2p-based peer manager?

See the "new" peer store, which makes it possible to create custom books with custom types.
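For example (illustrative only; the book name and value type below are made up, but the ref object of PeerBook[T] pattern is the one shown elsewhere in this thread):

import libp2p/peerstore

type
  # Application-specific per-peer data lives in a custom book, keyed by PeerId
  AppPeerData = object
    lastSeen: int64
    notes: string
  AppDataBook = ref object of PeerBook[AppPeerData]

# Books are indexed by their type on the shared peer store, e.g.:
#   peerStore[AppDataBook][somePeerId] = AppPeerData(lastSeen: ts, notes: "relay peer")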
thanks @jm-clius really appreciate the comments. left some answers :)
Force-pushed 7a94032 to bb92436
Please check my comments.
Looking forward to understanding what your plan for the peer manager is.
ConnectionBook* = ref object of PeerBook[Connectedness]

DisconnectBook* = ref object of PeerBook[int64] # Keeps track of when peers were disconnected in Unix timestamps
# Keeps track of when peers were disconnected in Unix timestamps
DisconnectBook* = ref object of PeerBook[int64]
This clearly demonstrates that we need a common time module for timestamps. We should use the same timestamp format.
cc @jm-clius
+1 noted.
Could we make this a Timestamp now (which is currently just an alias for int64) to save some refactoring in the future?
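Assuming Timestamp stays an alias for int64 as described, the change would only touch the book's value type, e.g.:

import libp2p/peerstore

type
  Timestamp* = int64   # stand-in for the existing project-wide alias

  # Keeps track of when peers were disconnected in Unix timestamps
  DisconnectBook* = ref object of PeerBook[Timestamp]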
proc peers*(peerStore: PeerStore): seq[StoredInfo] =
  ## Get all the stored information of every peer.

  let allKeys = concat(toSeq(keys(peerStore.addressBook.book)),
                       toSeq(keys(peerStore.protoBook.book)),
                       toSeq(keys(peerStore.keyBook.book))).toHashSet()
  let allKeys = concat(toSeq(peerStore[AddressBook].book.keys()),
                       toSeq(peerStore[ProtoBook].book.keys()),
                       toSeq(peerStore[KeyBook].book.keys())).toHashSet()

  return allKeys.mapIt(peerStore.get(it))

proc peers*(peerStore: PeerStore, proto: string): seq[StoredInfo] =
  # Return the known info for all peers registered on the specified protocol
  peerStore.peers.filterIt(it.protos.contains(proto))

proc peers*(peerStore: PeerStore, protocolMatcher: Matcher): seq[StoredInfo] =
  # Return the known info for all peers matching the provided protocolMatcher
  peerStore.peers.filterIt(it.protos.anyIt(protocolMatcher(it)))
This proc's name is a "noun". We use nouns for accessing properties (e.g., a property that we did not make public because we don't want to allow modifying it) or "dynamic properties" (virtual properties derived from other properties). These are functions, as they have parameters. Please consider renaming the methods to something like getPeersByProtocol.
+1 for getPeersByProtocol. This proc already existed, so I just reused the name. Will follow up with a PR with renamings once the heavy stuff is done.
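A possible shape for that follow-up rename, reusing the body of the existing overload shown in the diff above (getPeersByProtocol is only the suggested name, not something in this PR; it relies on the same imports and peers proc as that module):

proc getPeersByProtocol*(peerStore: PeerStore, proto: string): seq[StoredInfo] =
  # Same behaviour as the existing peers(peerStore, proto) overload
  peerStore.peers.filterIt(it.protos.contains(proto))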
Force-pushed d1a1d12 to 779cc03
Force-pushed 779cc03 to ea4e79e
LGTM! Thanks. May be worth doing some log analyses after this gets deployed to the fleets (should now be auto-deployed to both wakuv2.test and status.test) to double check that e.g. reconnection works as before.
Closes #622
- Reuse the nim-libp2p peerstore and remove duplicated fields.
- Move peer-management logic from the peer_manager to the peer store.