refactor: reuse nim-libp2p peerstore + move peermanager logic #1383
Conversation
    return some(peerStored.toRemotePeerInfo())
  else:
    return none(RemotePeerInfo)

proc reconnectPeers*(pm: PeerManager,
I'm unsure if we should keep this reconnection logic, at least in the medium term: this was added before discovery methods, mainly to reconnect fleet nodes to each other after a restart. The main difference between the previous peer store and the underlying libp2p one is that the previous peer store only contained peers that we initiated a connection to, whereas the libp2p peer store also contains all nodes connecting to us. This implies that we only attempted to reconnect to peers that we explicitly connected to before (e.g. using the connectToNodes API call). Because GossipSub has a mandatory backoff time before GRAFTing a PRUNEd connection (which is implicitly what a reconnection after a restart does), we had to wait before attempting to reconnect to each node. IIRC this can block for quite a while when the node starts (and maybe much more so now that reconnection will be attempted to all nodes). Now that nodes generally use DNS discovery on boot to reconnect to nodes, perhaps the whole idea of peer persistence and reconnection should be rethought?
Good point. If that's fine, I will just change what is needed to not break anything and leave this for another PR.

perhaps the whole idea of peer persistence and reconnection should be rethought?

Yep. Regarding reconnections, I think reconnectPeers can be rescoped to runPeeringStrategy (not happy with the name) with the following functionality:
- Infinite loop.
- If connectedPeers is below a threshold, try to connect to new peers from the peerstore.
- It respects the backoff (as it does now).
- Some criteria, TBD, can be added on which peers to select: score, keeping a given inbound/outbound ratio, keeping a store/relay/etc. nodes ratio, and so on.
- Forces disconnections if needed.
- It is totally agnostic of peer discovery.
- And it does not add peers to the peer store as is done now: pm.addPeer(remotePeerInfo, proto)

Beyond runPeeringStrategy, we will have runPruningStrategy, which will remove peers from the peerstore, and runDiscoveryStrategy, which will populate the peerstore with new peers. Names TBD.
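For illustration only, here is a minimal, self-contained sketch of what such a peering loop could look like. All names and types (SimplePeerStore, connectedCount, dialCandidates, targetPeers, the 30-second interval) are made up for the example and are not the actual nwaku or nim-libp2p API:

import std/[tables, times, os]

type
  PeerId = string                        # stand-in for the real libp2p PeerId
  Connectedness = enum NotConnected, CannotConnect, CanConnect, Connected

  SimplePeerStore = object
    connectedness: Table[PeerId, Connectedness]
    lastDisconnect: Table[PeerId, float]   # Unix timestamp of the last disconnection

proc connectedCount(store: SimplePeerStore): int =
  ## Number of peers currently marked as Connected.
  for c in store.connectedness.values:
    if c == Connected: inc result

proc dialCandidates(store: SimplePeerStore, backoff: float): seq[PeerId] =
  ## Peers we may dial: not connected and past their backoff window.
  let now = epochTime()
  for id, c in store.connectedness:
    if c in {NotConnected, CanConnect} and
       now - store.lastDisconnect.getOrDefault(id, 0.0) >= backoff:
      result.add id

proc runPeeringStrategy(store: var SimplePeerStore, targetPeers = 10, backoff = 60.0) =
  ## Infinite loop: keep the number of connections close to targetPeers,
  ## dialing only peers whose backoff has expired.
  while true:
    let missing = targetPeers - store.connectedCount()
    if missing > 0:
      let cands = store.dialCandidates(backoff)
      for id in cands[0 ..< min(missing, cands.len)]:
        # A real implementation would dial here and update the books on success/failure.
        store.connectedness[id] = Connected
    sleep(30_000)  # re-evaluate every 30 seconds

A real version would of course be async (chronos) and dial through the switch; this only shows the threshold-plus-backoff shape of the loop.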
Good point. If that's fine I will just change what is needed to not break anything and leave this for another PR.

Absolutely. I think, though, this PR will change the current behaviour, unless I'm missing something. Previously, on startup, a peer would reconnect only to peers that were statically added or connected via DNS discovery. Now all peers will both be persisted and a reconnection attempted.

Changed my thinking halfway through typing this: the peers we persist will still only be the ones added via addPeers() (i.e. the ones we ourselves statically configured, discovered via DNS, etc.) and not include all peers in the peer store (which would also include incoming connections)? My worry was that we'll now keep persisting all these peers and reattempt connecting to all of them forever (since the peer storage is never truncated/pruned as it stands).
we'll now keep persisting all these peers and reattempt connecting to all of them forever

As I can see in Kibana, we are already attempting to connect to all peers forever, so after checking it I would say the behaviour is the same. But I have this in mind and plan to investigate it further, also in relation to the issue in the discv5 loop that we discussed. We are constantly discovering peers and connecting to them, but since we don't have that many peers, we are always trying to connect to the same 10-20 nodes.

which would include incoming connections too

Yes! Will address all this in another PR, with a new field indicating Direction in/out. Then that would be used to complete #1206 (showing the number of in/out connections).
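For concreteness, such a direction field could follow the same custom-book pattern used elsewhere in this PR. The names below (PeerDirection, DirectionBook, Inbound/Outbound) are only a guess at what that follow-up might add:

import libp2p/peerstore

type
  # Whether we dialed the peer (Outbound) or it dialed us (Inbound)
  PeerDirection* = enum
    Inbound, Outbound
  DirectionBook* = ref object of PeerBook[PeerDirection]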
# Check if peer is reachable.
if pm.peerStore.connectionBook[storedInfo.peerId] == CannotConnect:
if pm.peerStore[ConnectionBook][storedInfo.peerId] == CannotConnect: #TODO what if it doesnt exist?
I'm also curious about this TODO. :D
It will return default(T), which is debatable:
if storedInfo.peerId in pm.peerStore[ConnectionBook] and pm.peerStore[ConnectionBook][storedInfo.peerId] == CannotConnect:
# Or a pattern I'm starting to like and that we could add, even though it's ugly: {} returning an Option
if pm.peerStore[ConnectionBook]{storedInfo.peerId} == some(CannotConnect)
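A rough sketch of what such a {} accessor could look like. This operator does not exist in nim-libp2p today; the sketch only assumes the [] and in/contains operations on PeerBook that are already used above:

import std/options
import libp2p/peerstore, libp2p/peerid

proc `{}`*[T](book: PeerBook[T], peerId: PeerId): Option[T] =
  ## Option-returning lookup: none(T) when the peer has no entry in this book.
  if peerId in book:
    result = some(book[peerId])
  else:
    result = none(T)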
I love snippets that can be run with nim r file.nim. Looks like it just prints an empty string "".
import libp2p/peerstore
import libp2p/peerid

let myPeerStore = PeerStore.new(capacity = 5)

# Custom book with a custom value type
type ConnectionBook = ref object of PeerBook[string]

var p1, p2: PeerId
let ok1 = p1.init("QmeuZJbXrszW2jdT7GdduSjQskPU3S7vvGWKtKgDfkDvW1")
let ok2 = p2.init("QmeuZJbXrszW2jdT7GdduSjQskPU3S7vvGWKtKgDfkDvW2")
echo ok1, ok2

myPeerStore[ConnectionBook][p1] = "Connected"
echo myPeerStore[ConnectionBook][p1]
echo myPeerStore[ConnectionBook][p2]   # missing key: prints default(T), i.e. an empty string

#Output:
#truetrue
#Connected
#
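Continuing the same snippet, the membership check suggested above avoids silently reading default(T) for a missing peer:

if p2 in myPeerStore[ConnectionBook]:
  echo myPeerStore[ConnectionBook][p2]
else:
  echo "p2 has no ConnectionBook entry"   # instead of printing an empty string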
In general I think the direction looks good!
Two things to consider:
- the subtle differences of peer persistence and reconnection (with backoff times) now that we consider all connected peers (incoming and outgoing) vs the previous peer store, where we had more control (only peers we initiated a connection to were added to the store).
- keeping the PRs related to this as contained as possible, so we can review and increment towards the complete feature. Probably a good idea to keep the existing API as stable as possible until we have a design for the improved peer manager API. Agree though with moving the peer* API procs to the peerStore rather than the manager.
how is application-specific per-peer data stored when using the libp2p-based peer manager?

See the "new" peer store, which makes it possible to create custom books with custom types.
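For example (illustrative only; the book name and value type below are made up, but the ref object of PeerBook[T] pattern is the one shown elsewhere in this thread):

import libp2p/peerstore

type
  # Application-specific per-peer data lives in a custom book, keyed by PeerId
  AppPeerData = object
    lastSeen: int64
    notes: string
  AppDataBook = ref object of PeerBook[AppPeerData]

# Books are indexed by their type on the shared peer store, e.g.:
#   peerStore[AppDataBook][somePeerId] = AppPeerData(lastSeen: ts, notes: "relay peer")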
thanks @jm-clius really appreciate the comments. left some answers :)
Force-pushed 7a94032 to bb92436
Please check my comments.
Looking forward to understanding what your plan for the peer manager is.
ConnectionBook* = ref object of PeerBook[Connectedness]

DisconnectBook* = ref object of PeerBook[int64] # Keeps track of when peers were disconnected in Unix timestamps
# Keeps track of when peers were disconnected in Unix timestamps
DisconnectBook* = ref object of PeerBook[int64]
This clearly demonstrates that we need a common time module for timestamps. We should use the same timestamp format.
cc @jm-clius
+1 noted.
Could we make this a Timestamp now (which is currently just an alias for int64) to save some refactoring in the future?
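Assuming Timestamp stays an alias for int64 as described, the change would only touch the book's value type, e.g.:

import libp2p/peerstore

type
  Timestamp* = int64   # stand-in for the existing project-wide alias

  # Keeps track of when peers were disconnected in Unix timestamps
  DisconnectBook* = ref object of PeerBook[Timestamp]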
proc peers*(peerStore: PeerStore): seq[StoredInfo] =
  ## Get all the stored information of every peer.

  let allKeys = concat(toSeq(keys(peerStore.addressBook.book)),
                       toSeq(keys(peerStore.protoBook.book)),
                       toSeq(keys(peerStore.keyBook.book))).toHashSet()
  let allKeys = concat(toSeq(peerStore[AddressBook].book.keys()),
                       toSeq(peerStore[ProtoBook].book.keys()),
                       toSeq(peerStore[KeyBook].book.keys())).toHashSet()

  return allKeys.mapIt(peerStore.get(it))

proc peers*(peerStore: PeerStore, proto: string): seq[StoredInfo] =
  # Return the known info for all peers registered on the specified protocol
  peerStore.peers.filterIt(it.protos.contains(proto))

proc peers*(peerStore: PeerStore, protocolMatcher: Matcher): seq[StoredInfo] =
  # Return the known info for all peers matching the provided protocolMatcher
  peerStore.peers.filterIt(it.protos.anyIt(protocolMatcher(it)))
This proc's name is a "noun". We use nouns for accessing properties (e.g., a property that we did not make public because we don't want to allow modifying it) or "dynamic properties" (virtual properties derived from other properties). These are functions, as they have parameters. Please consider renaming the methods to something like getPeersByProtocol.
+1 for getPeersByProtocol. This proc already existed, so I just reused the name. Will follow up with a PR with renamings once the heavy stuff is done.
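A possible shape for that follow-up rename, reusing the body of the existing overload shown in the diff above (getPeersByProtocol is only the suggested name, not something in this PR; it relies on the same imports and peers proc as that module):

proc getPeersByProtocol*(peerStore: PeerStore, proto: string): seq[StoredInfo] =
  # Same behaviour as the existing peers(peerStore, proto) overload
  peerStore.peers.filterIt(it.protos.contains(proto))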
Force-pushed d1a1d12 to 779cc03
Force-pushed 779cc03 to ea4e79e
LGTM! Thanks. May be worth doing some log analyses after this gets deployed to the fleets (should now be auto-deployed to both wakuv2.test and status.test) to double check that e.g. reconnection works as before.
Closes #622
- Reuse the nim-libp2p peerstore and remove duplicated fields.
- Move peer-management logic from the peer_manager to the peer store.