Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Export number of expected chunks/blocks in epoch to prometheus #8759

Merged
merged 2 commits into from
Apr 24, 2023

Conversation

VanBarbascu
Copy link
Contributor

@VanBarbascu VanBarbascu commented Mar 20, 2023

The producer to block/chunk assignment is known ahead of time so we can
estimate what we expect to see at the end of the epoch.

We are computing these numbers only once per epoch.

Tested on mainnet and localnet.
On localnet:

  • started 4 nodes
  • stopped one of them for a delta of 200 blocks
  • started it again

@VanBarbascu VanBarbascu requested a review from a team as a code owner March 20, 2023 18:27
@VanBarbascu VanBarbascu requested a review from mzhangmzz March 20, 2023 18:27
@VanBarbascu VanBarbascu marked this pull request as draft March 20, 2023 18:28
@VanBarbascu
Copy link
Contributor Author

VanBarbascu commented Mar 20, 2023

only focus on the last commit. The first two will be merged in this PR.

  • update changelog

@VanBarbascu VanBarbascu force-pushed the prometheus_metrics branch 2 times, most recently from 5740721 to 41a5d13 Compare March 21, 2023 15:28
@VanBarbascu VanBarbascu marked this pull request as ready for review March 21, 2023 15:30
.map_or(0..0, |epoch_start_height| {
epoch_start_height..(epoch_start_height + blocks_in_epoch)
})
.fold(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be easier to read if we just change it to a regular for loop; the fold is only used to iterate over the indexes, not to, say, fold some function over the actual elements in the container.

@@ -177,6 +181,45 @@ impl InfoHelper {
}
}

fn record_epoch_settlement_info(head: &Tip, client: &crate::client::Client) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea how long this takes? For 43200 heights in an epoch I just wanna make sure this won't take too long and make validators miss block/chunk production, since I assume this is run on ClientActor.

Copy link
Contributor Author

@VanBarbascu VanBarbascu Mar 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran perf on the RPC node and the function does not show up in the graph. I even looked for log_summary call. I believe they are negligible.

The function is also called once per epoch to avoid unnecessary computation.

Check the CPU flamegraph below.
https://drive.google.com/file/d/1x7HSYTFWBZOgHlh1g7_3ev5r1IPsIT-L/view?usp=sharing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well my concern isn't about the overall performance impact on CPU. My concern is about exactly that one time latency on epoch transitions. Validators are very keen on not missing even a single chunk or block, so if this introduces say 100ms delay then this may cause that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have created a localnet with 4 nodes to run perf during the epoch transition. The results are here but I cannot see a difference.

Let me know if you have any suggestion on how to test this more accurately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option here to avoid overloading the ClientActor is to compute this metric async when a new epoch begins. The drawback here is that a lot of stuff happens at epoch start as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I am not sure how long something runs, I run it 1000 times. Just loop your function 1000 times, run your tests like that, see the huge delay and estimate the real delay by dividing by 1000.

@robin-near robin-near requested review from robin-near and removed request for mzhangmzz March 22, 2023 19:26
@robin-near
Copy link
Contributor

Looks like sample_chunk_producer hashes 48 bytes and then uses that to index into an array. The bottleneck would seem to be the hash rate on 48 byte inputs. If we can benchmark/calculate how long that takes on a typical machine then that'd be enough of a justification.

@VanBarbascu
Copy link
Contributor Author

I measured the time spent in the sample functions for chunk and blocks.
My node is tracking only one shard in mainnet.
The time is showed in microseconds.

Setup:

  • Code: 129cec3
  • Host: n2-standard-8

Result:

The call costs 1.3 ms for a 43200 iteration
In the current setup it potentially adds ~6 ms. (4 shards + 1 iteration for block producers)

Mar 28 11:45:28 razvan-mainnet-rpc sh[1277299]: 2023-03-28T11:45:28.304034Z  INFO TIME: Time spent in sample_chunk_producer total:1310 count:43200 avg:0
Mar 28 11:45:28 razvan-mainnet-rpc sh[1277299]: 2023-03-28T11:45:28.304050Z  INFO TIME: Time spent in sample_block_producer total:1204 count:43200 avg:0

robin-near
robin-near previously approved these changes Mar 28, 2023
Copy link
Contributor

@robin-near robin-near left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the benchmark! Seems fine for now, but when we scale to 100 shards this will become 25x more expensive and then it may be a potential issue.

@VanBarbascu
Copy link
Contributor Author

Bad news... I ran the test again because my node was out of date and the result does not look good.
It ran over all 4 shards and in total it takes just under 100ms.

Mar 28 16:55:25 razvan-mainnet-rpc sh[1299785]: 2023-03-28T16:55:25.220848Z  INFO TIME: Time spent in sample_chunk_producer total:74805 count:172800 avg:0
Mar 28 16:55:25 razvan-mainnet-rpc sh[1299785]: 2023-03-28T16:55:25.220874Z  INFO TIME: Time spent in sample_block_producer total:18605 count:43200 avg:0

@VanBarbascu VanBarbascu requested a review from robin-near March 28, 2023 17:11
@robin-near
Copy link
Contributor

I see, that's unfortunate :(

Let's take a step back. Do we really need to have such an exact count? Is it OK to have an approximate? Because we know the exact sampling weights so we can calculate the expected ("expected" as in probability theory) number of blocks or chunks.

@VanBarbascu VanBarbascu dismissed robin-near’s stale review March 30, 2023 10:57

The benchmark that you based your approve on was not accurate.

@VanBarbascu
Copy link
Contributor Author

I got some time to get back to this.
I used the stake of each validator divided by the sum of stake to calculate the weight and the results are different from the alias sampling. This is expected as the alias sampling is using the rand function.

In this paste you can see the comparison.
near_validators_chunks_expected_in_epoch is the metric using the weights / sum
near_validators_chunks_expected_in_epoh_2 the metric using the alias sampler for each block/chunk.

If we chose to go with the approximation, @nikurt are these values going to be helpful or misleading?

This is the code from my fork that I used to generate the metrics.

@VanBarbascu VanBarbascu marked this pull request as draft April 19, 2023 09:50
@nikurt
Copy link
Contributor

nikurt commented Apr 21, 2023

In the provided sample I see the approximations being off by 40% (6 expected instead of 10 expected).
However, the exact calculation isn't exact as well. For example, if the final epoch block gets skipped, one more block will be expected from a validator.

I vote for the approximation.

@VanBarbascu
Copy link
Contributor Author

Thanks @nikurt for the feedback. Just updated the code. This is the latest output from the metrics page link.

@VanBarbascu VanBarbascu marked this pull request as ready for review April 21, 2023 14:52
@@ -243,6 +284,11 @@ impl InfoHelper {
InfoHelper::record_tracked_shards(&head, &client);
InfoHelper::record_block_producers(&head, &client);
InfoHelper::record_chunk_producers(&head, &client);
if self.epoch_id.as_ref().map_or(true, |epoch_id| epoch_id != &head.epoch_id) {
Copy link
Contributor

@nikurt nikurt Apr 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if self.epoch_id.as_ref().map_or(true, |epoch_id| epoch_id != &head.epoch_id) {
if self.epoch_id.ne(&head.epoch_id) {

let blocks_in_epoch = client.config.epoch_length;
let number_of_shards =
client.runtime_adapter.num_shards(&head.epoch_id).unwrap_or_default();
if let Ok(epoch_info) = epoch_info {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add comments that this is an approximation and how it's computed.

The producer to block/chunk assignment is known ahead of time so we can
estimate what we expect to see at the end of the epoch.

We are computing these numbers only once per epoch.
@near-bulldozer near-bulldozer bot merged commit e7575ae into near:master Apr 24, 2023
@VanBarbascu VanBarbascu deleted the prometheus_metrics branch April 25, 2023 08:47
nikurt pushed a commit that referenced this pull request Apr 25, 2023
…8759)

The producer to block/chunk assignment is known ahead of time so we can
estimate what we expect to see at the end of the epoch.

We are computing these numbers only once per epoch.

Tested on mainnet and localnet. 
On localnet:
- started 4 nodes
- stopped one of them for a delta of 200 blocks
- started it again
nikurt pushed a commit that referenced this pull request Apr 28, 2023
…8759)

The producer to block/chunk assignment is known ahead of time so we can
estimate what we expect to see at the end of the epoch.

We are computing these numbers only once per epoch.

Tested on mainnet and localnet. 
On localnet:
- started 4 nodes
- stopped one of them for a delta of 200 blocks
- started it again
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants