Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix CPU usage in small devnets #14446

Merged
merged 5 commits into from
Sep 13, 2024
Merged

Fix CPU usage in small devnets #14446

merged 5 commits into from
Sep 13, 2024

Conversation

nalepae
Copy link
Contributor

@nalepae nalepae commented Sep 12, 2024

Please read commit by commit.

Avoid infinite loop (with a whole CPU usage) when looking for peer in a small subnets.

Reminder: This former PR stopped using mainnet bootnodes in devnet. When doing this, an issue appeared on small devnets: We entered in an infinite loop when looking for peers in a given subnet. This former PR fixed this issue.

However, even with this former PR, we can still enter in an infinite loop when looking for active peers (in general, not in a given subnet.)

The current PR fixes this last issue.

In term of CPU usage, we switch from that (associated CPU profiling):

image

to that:

image

(Prysm super node: Red - Prysm full node: Blue)

E2E local test: OK

@nalepae nalepae changed the title Peerdas look for peers cpy Fix CPU usage in small devnets Sep 12, 2024
@nalepae nalepae force-pushed the peerdas-look-for-peers-cpy branch from 7e5be76 to a123ab0 Compare September 12, 2024 11:16
@nalepae nalepae force-pushed the peerdas-look-for-peers-cpy branch from a123ab0 to 330d07f Compare September 12, 2024 11:19
@nalepae nalepae marked this pull request as ready for review September 12, 2024 11:41
@nalepae nalepae requested a review from a team as a code owner September 12, 2024 11:41
@nalepae nalepae requested review from terencechain, nisdas and james-prysm and removed request for a team September 12, 2024 11:41
continue
}

if searchInProgress {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the searchInProgress boolean ? it seems like its only used for logging. Do we really need to differentiate it ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

searchInProgress is only used for logging, yes.

Do we really need to differentiate it ?
Differentiate what and what? start and continue?

If yes, we do not strictly need to differentiate start and continue.
However, we need to differentiate start and stop.
This kind of boolean is needed for that. (Maybe you have a a solution without?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My specifications are:

  • If we are starting / continuing / finishing a search for active peers, I want to know it in debug
  • If we are starting / continuing / finishing a search for peers for a given subnets, I want to know it in debug.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, listenForNewNodes isn't used for searching peers on our subnets. It is a background discovery routine that we use for general peer discovery. If you want to log for a subnet search it needs to be in FindPeersInSubnet

If we are starting / continuing / finishing a search for active peers, I want to know it in debug

Is it sufficient to just know when you are still 'searching' for new peers ? You don't need the boolean for that. I don't see the value in tracking starting because it only happens once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, listenForNewNodes isn't used for searching peers on our subnets. It is a background discovery routine that we use for general peer discovery. If you want to log for a subnet search it needs to be in FindPeersInSubnet

Yes, there is different logging for that. For a dedicated subnet we have:

DEBUG p2p: Searching for new peers for a subnet - success currentPeerCount=1 targetPeerCount=1 topic=/eth2/edf8b306/data_column_sidecar_101/ssz_snappy

Is it sufficient to just know when you are still 'searching' for new peers ? You don't need the boolean for that. I don't see the value in tracking starting because it only happens once.

It only happens once (in the BN lifecycle) in the best case. If, during the life of the BN, the peer counts goes bellow the threshold, then start happens again.

==> Fixed in fa4f693

@@ -43,6 +43,10 @@ var _ runtime.Service = (*Service)(nil)
// defined below.
var pollingPeriod = 6 * time.Second

// When looking for new nodes, if not enough nodes are found,
// we stop after this amount of iterations.
var batchSize = 40_000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

40000 seems like a pretty big amount, was 2000 too small previously ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I moved from 2.000 to 40.000 to avoid too frequent logging.

Fixed in 7f70bec.

@nalepae nalepae merged commit 56df2d0 into peerDAS Sep 13, 2024
13 of 16 checks passed
@nalepae nalepae deleted the peerdas-look-for-peers-cpy branch September 13, 2024 10:27
nalepae added a commit that referenced this pull request Oct 7, 2024
* `CustodyCountFromRemotePeer`: Set happy path in the outer scope.

* `FindPeersWithSubnet`: Improve logging.

* `listenForNewNodes`: Avoid infinite loop in a small subnet.

* Address Nishant's comment.

* FIx Nishant's comment.
nalepae added a commit that referenced this pull request Oct 7, 2024
* `CustodyCountFromRemotePeer`: Set happy path in the outer scope.

* `FindPeersWithSubnet`: Improve logging.

* `listenForNewNodes`: Avoid infinite loop in a small subnet.

* Address Nishant's comment.

* FIx Nishant's comment.
nalepae added a commit that referenced this pull request Oct 7, 2024
* `CustodyCountFromRemotePeer`: Set happy path in the outer scope.

* `FindPeersWithSubnet`: Improve logging.

* `listenForNewNodes`: Avoid infinite loop in a small subnet.

* Address Nishant's comment.

* FIx Nishant's comment.
nalepae added a commit that referenced this pull request Oct 7, 2024
* `CustodyCountFromRemotePeer`: Set happy path in the outer scope.

* `FindPeersWithSubnet`: Improve logging.

* `listenForNewNodes`: Avoid infinite loop in a small subnet.

* Address Nishant's comment.

* FIx Nishant's comment.
nalepae added a commit that referenced this pull request Oct 8, 2024
* `CustodyCountFromRemotePeer`: Set happy path in the outer scope.

* `FindPeersWithSubnet`: Improve logging.

* `listenForNewNodes`: Avoid infinite loop in a small subnet.

* Address Nishant's comment.

* FIx Nishant's comment.
nalepae added a commit that referenced this pull request Oct 23, 2024
* `CustodyCountFromRemotePeer`: Set happy path in the outer scope.

* `FindPeersWithSubnet`: Improve logging.

* `listenForNewNodes`: Avoid infinite loop in a small subnet.

* Address Nishant's comment.

* FIx Nishant's comment.
nalepae added a commit that referenced this pull request Oct 23, 2024
* `CustodyCountFromRemotePeer`: Set happy path in the outer scope.

* `FindPeersWithSubnet`: Improve logging.

* `listenForNewNodes`: Avoid infinite loop in a small subnet.

* Address Nishant's comment.

* FIx Nishant's comment.
nalepae added a commit that referenced this pull request Oct 23, 2024
* `CustodyCountFromRemotePeer`: Set happy path in the outer scope.

* `FindPeersWithSubnet`: Improve logging.

* `listenForNewNodes`: Avoid infinite loop in a small subnet.

* Address Nishant's comment.

* FIx Nishant's comment.
nalepae added a commit that referenced this pull request Oct 24, 2024
* `CustodyCountFromRemotePeer`: Set happy path in the outer scope.

* `FindPeersWithSubnet`: Improve logging.

* `listenForNewNodes`: Avoid infinite loop in a small subnet.

* Address Nishant's comment.

* FIx Nishant's comment.
nalepae added a commit that referenced this pull request Nov 20, 2024
* `CustodyCountFromRemotePeer`: Set happy path in the outer scope.

* `FindPeersWithSubnet`: Improve logging.

* `listenForNewNodes`: Avoid infinite loop in a small subnet.

* Address Nishant's comment.

* FIx Nishant's comment.
nalepae added a commit that referenced this pull request Nov 22, 2024
* `CustodyCountFromRemotePeer`: Set happy path in the outer scope.

* `FindPeersWithSubnet`: Improve logging.

* `listenForNewNodes`: Avoid infinite loop in a small subnet.

* Address Nishant's comment.

* FIx Nishant's comment.
nalepae added a commit that referenced this pull request Nov 22, 2024
* `CustodyCountFromRemotePeer`: Set happy path in the outer scope.

* `FindPeersWithSubnet`: Improve logging.

* `listenForNewNodes`: Avoid infinite loop in a small subnet.

* Address Nishant's comment.

* FIx Nishant's comment.
nalepae added a commit that referenced this pull request Nov 25, 2024
* `CustodyCountFromRemotePeer`: Set happy path in the outer scope.

* `FindPeersWithSubnet`: Improve logging.

* `listenForNewNodes`: Avoid infinite loop in a small subnet.

* Address Nishant's comment.

* FIx Nishant's comment.
nalepae added a commit that referenced this pull request Nov 25, 2024
* `CustodyCountFromRemotePeer`: Set happy path in the outer scope.

* `FindPeersWithSubnet`: Improve logging.

* `listenForNewNodes`: Avoid infinite loop in a small subnet.

* Address Nishant's comment.

* FIx Nishant's comment.
nalepae added a commit that referenced this pull request Nov 27, 2024
* `CustodyCountFromRemotePeer`: Set happy path in the outer scope.

* `FindPeersWithSubnet`: Improve logging.

* `listenForNewNodes`: Avoid infinite loop in a small subnet.

* Address Nishant's comment.

* FIx Nishant's comment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants