Fix CPU usage in small devnets #14446
Conversation
beacon-chain/p2p/discovery.go
Outdated
continue
}

if searchInProgress {
Why the `searchInProgress` boolean? It seems like it's only used for logging. Do we really need to differentiate it?
`searchInProgress` is only used for logging, yes.

> Do we really need to differentiate it?

Differentiate what and what? `start` and `continue`? If yes, we do not strictly need to differentiate `start` and `continue`. However, we need to differentiate `start` and `stop`. This kind of boolean is needed for that. (Maybe you have a solution without?)
My specifications are:
- If we are starting / continuing / finishing a search for active peers, I want to know it in debug.
- If we are starting / continuing / finishing a search for peers for a given subnet, I want to know it in debug.
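For illustration, here is a minimal, hypothetical sketch of the logging pattern described in these specs: a single boolean separates the "starting" log from the "continuing" ones, and the loop exit marks "finished". The function and variable names are made up for the example and are not the actual Prysm code.

```go
package main

import (
	"log"
	"time"
)

// searchForActivePeers illustrates the start / continue / finish debug logs.
// searchInProgress only exists to tell the first iteration (start) apart
// from the following ones (continue); leaving the loop means finished.
func searchForActivePeers(peerCount func() int, threshold int) {
	searchInProgress := false

	for peerCount() < threshold {
		if searchInProgress {
			log.Println("DEBUG: Continuing to search for new active peers")
		} else {
			log.Println("DEBUG: Starting to search for new active peers")
			searchInProgress = true
		}

		// One discovery round would run here; we just simulate some work.
		time.Sleep(100 * time.Millisecond)
	}

	if searchInProgress {
		log.Println("DEBUG: Finished searching for new active peers")
	}
}

func main() {
	// Simulate a peer count that grows by one on every check until it reaches 3.
	count := 0
	searchForActivePeers(func() int { count++; return count }, 3)
}
```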
Just to clarify, `listenForNewNodes` isn't used for searching peers on our subnets. It is a background discovery routine that we use for general peer discovery. If you want to log for a subnet search, it needs to be in `FindPeersInSubnet`.

> If we are starting / continuing / finishing a search for active peers, I want to know it in debug.

Is it sufficient to just know when you are still 'searching' for new peers? You don't need the boolean for that. I don't see the value in tracking `starting` because it only happens once.
> Just to clarify, `listenForNewNodes` isn't used for searching peers on our subnets. It is a background discovery routine that we use for general peer discovery. If you want to log for a subnet search, it needs to be in `FindPeersInSubnet`.

Yes, there is different logging for that. For a dedicated subnet we have:

DEBUG p2p: Searching for new peers for a subnet - success currentPeerCount=1 targetPeerCount=1 topic=/eth2/edf8b306/data_column_sidecar_101/ssz_snappy

> Is it sufficient to just know when you are still 'searching' for new peers? You don't need the boolean for that. I don't see the value in tracking starting because it only happens once.

It only happens once (in the BN lifecycle) in the best case. If, during the life of the BN, the peer count goes below the threshold, then `start` happens again.
==> Fixed in fa4f693
beacon-chain/p2p/service.go
Outdated
@@ -43,6 +43,10 @@ var _ runtime.Service = (*Service)(nil)
// defined below.
var pollingPeriod = 6 * time.Second

// When looking for new nodes, if not enough nodes are found,
// we stop after this amount of iterations.
var batchSize = 40_000
40,000 seems like a pretty big amount; was 2,000 too small previously?
Actually, I moved from 2,000 to 40,000 to avoid too frequent logging.
Fixed in 7f70bec.
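For context, a hedged sketch of what an iteration cap like `batchSize` buys: one discovery pass consumes at most `batchSize` candidates and then returns, so the per-pass log line fires at most once per pass instead of the loop spinning forever on a tiny node set. The iterator interface and function names below are illustrative, not Prysm's actual discv5 wiring.

```go
package main

import "fmt"

// nodeIterator is a stand-in for a discv5-style iterator that, on a small
// devnet, may keep yielding the same few nodes over and over.
type nodeIterator interface {
	Next() bool
	Node() string
}

// When looking for new nodes, if not enough nodes are found,
// we stop after this amount of iterations.
var batchSize = 40_000

// findPeersOnce runs a single bounded discovery pass: it stops either when
// enough unique candidates are found or after batchSize iterations.
func findPeersOnce(it nodeIterator, want int) map[string]bool {
	seen := make(map[string]bool)
	for i := 0; i < batchSize && len(seen) < want && it.Next(); i++ {
		seen[it.Node()] = true
	}
	return seen
}

// fakeIterator mimics a small devnet: only 3 distinct nodes ever show up.
type fakeIterator struct{ i int }

func (f *fakeIterator) Next() bool   { f.i++; return true }
func (f *fakeIterator) Node() string { return fmt.Sprintf("node-%d", f.i%3) }

func main() {
	// Without the batchSize cap, this search for 10 unique peers would never end.
	peers := findPeersOnce(&fakeIterator{}, 10)
	fmt.Printf("one bounded pass found %d unique candidates\n", len(peers))
}
```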
* `CustodyCountFromRemotePeer`: Set happy path in the outer scope.
* `FindPeersWithSubnet`: Improve logging.
* `listenForNewNodes`: Avoid infinite loop in a small subnet.
* Address Nishant's comment.
* Fix Nishant's comment.
Please read commit by commit.
Avoid an infinite loop (pegging a whole CPU) when looking for peers in small subnets.
Reminder: This former PR stopped using mainnet bootnodes in devnets. When doing this, an issue appeared on small devnets: we entered an infinite loop when looking for peers in a given subnet. This former PR fixed that issue.
However, even with this former PR, we can still enter an infinite loop when looking for active peers (in general, not in a given subnet).
The current PR fixes this last issue.
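A hedged sketch of the overall shape this implies (illustrative names only, not the actual `listenForNewNodes` implementation): a background loop wakes on a polling ticker and, only while the peer count is below the target, runs one bounded search pass. Because each pass returns instead of looping until enough peers appear, a small devnet that can never reach the target no longer pins a CPU core.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Polling period between discovery passes (shortened here so the demo exits
// quickly; the value discussed in the diff above is 6 seconds).
var pollingPeriod = 100 * time.Millisecond

// backgroundDiscovery wakes on every tick and runs a bounded search pass
// only when more peers are needed. The CPU is idle between ticks.
func backgroundDiscovery(ctx context.Context, peerCount func() int, target int, onePass func()) {
	ticker := time.NewTicker(pollingPeriod)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if peerCount() >= target {
				continue // enough peers; nothing to do this tick
			}
			onePass() // bounded pass, e.g. capped by batchSize as above
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 450*time.Millisecond)
	defer cancel()

	passes := 0
	backgroundDiscovery(ctx,
		func() int { return 1 }, // stuck below target, as on a tiny devnet
		8,
		func() { passes++ },
	)
	fmt.Printf("ran %d bounded passes instead of one infinite busy loop\n", passes)
}
```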
In terms of CPU usage, we switch from the first profile to the second:
[CPU usage graphs with associated CPU profiling, before and after the change - Prysm super node: red, Prysm full node: blue]
E2E local test: OK