-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add histogram metrics for block arrival time #5708
Conversation
Why not use a histogram metric? https://prometheus.io/docs/concepts/metric_types/ |
I didn't know about histogram. Will look into this |
Codecov Report
@@ Coverage Diff @@
## master #5708 +/- ##
=======================================
Coverage 48.24% 48.24%
=======================================
Files 241 241
Lines 21656 21656
=======================================
Hits 10449 10449
Misses 9334 9334
Partials 1873 1873 |
beacon-chain/sync/metrics.go
Outdated
prometheus.HistogramOpts{ | ||
Name: "block_arrival_latency_milliseconds", | ||
Help: "Captures blocks propagation time. Blocks arrival in milliseconds distribution", | ||
Buckets: []float64{1, 2, 3, 4, 5, 6}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to update these buckets to be milliseconds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes. Nice catch. I just fixed it.
Really need ☕ ....
great suggestion! |
} | ||
diff := uint64(roughtime.Now().UnixNano() - startTime.UnixNano()) | ||
|
||
// Denominate everything in milliseconds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would actually be microseconds. If you use the time API as suggested above, it’s more clear
1 microsecond is 1000ns.
if err != nil { | ||
return err | ||
} | ||
diff := uint64(roughtime.Now().UnixNano() - startTime.UnixNano()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about using the time API?
diffMs := startTime.Sub(roughtime.Now()) / time.Milliseconds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh nice. Getting a crash course on time pkg today...
I think it's suppose to be roughtime.Now().Sub(startTime)
if err != nil { | ||
return err | ||
} | ||
diffMs := roughtime.Now().Sub(startTime) / time.Millisecond |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think is backwards and you’ll have negative numbers. Did you confirm at runtime?
LGTM, tested locally and here's what I got after a few slots.
|
Oh wow. You don't get many delayed blocks compare to mine... blocks_arrival_in_seconds_total_bucket{le="1"} 3 |
Looking forward to monitoring over time, but yeah I see most blocks arriving in the first 2 buckets |
What type of PR is this?
What does this PR do? Why is it needed?
Pre read: https://hackmd.io/dVbmIMHNQ6aby77g0-ME8A
This PR adds useful metrics to capture beacon blocks propagation time on arrival. It only captures once per slot so not too bad.
I think this will be really helpful to roll out to cluster for observing network stability
Which issues(s) does this PR fix?
Helped to debug #5665