-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add babylon-related metrics #158
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Those are all very useful metrics.
Note that we should not alarm when the number of failed checkpoints submitted to Babylon surpasses 0
. One can insert a bad checkpoint on purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great improvement! Just some minor comments
metrics/monitor.go
Outdated
Registry: registry, | ||
ValidEpochsCounter: registerer.NewCounter(prometheus.CounterOpts{ | ||
Name: "vigilante_monitor_valid_epochs", | ||
Help: "The total number of verified epochs", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Help: "The total number of verified epochs", | |
Help: "The total number of valid epochs", |
Name: "vigilante_monitor_invalid_epochs", | ||
Help: "The total number of invalid BTC headers", | ||
}), | ||
LivenessAttacksCounter: registerer.NewCounter(prometheus.CounterOpts{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the monitor will just panick upon a liveness attack, do we need this metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to panic the monitor or the validator but send an alert instead and let the operator decide how to deal with this.
ValidEpochsCounter prometheus.Counter | ||
InvalidEpochsCounter prometheus.Counter | ||
ValidBTCHeadersCounter prometheus.Counter | ||
InvalidBTCHeadersCounter prometheus.Counter | ||
LivenessAttacksCounter prometheus.Counter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these be pointers or objects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are interfaces, so it depends on whether the actual values are pointers or objects.
SuccessfulHeadersCounter prometheus.Counter | ||
SuccessfulCheckpointsCounter prometheus.Counter | ||
FailedHeadersCounter prometheus.Counter | ||
FailedCheckpointsCounter prometheus.Counter | ||
SecondsSinceLastHeaderGauge prometheus.Gauge | ||
SecondsSinceLastCheckpointGauge prometheus.Gauge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question here
SuccessfulCheckpointsCounter prometheus.Counter | ||
FailedCheckpointsCounter prometheus.Counter | ||
SecondsSinceLastCheckpointGauge prometheus.Gauge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same ques here
metrics/reporter.go
Outdated
func (sm *ReporterMetrics) RecordMetrics() { | ||
go func() { | ||
for { | ||
time.Sleep(1 * time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use ticker here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how to use ticker here. Btw, why can't we use sleep here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It basically achieves the same functionality as sleeping here, but a more formal way.
Something like below
ticker := time.NewTicker(1 * time.Second)
for range ticker.C {
// do our stuff
}
You can find multiple usages of this in vigilante repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. It's good to know. Thanks a lot!
sm.SecondsSinceLastHeaderGauge.Inc() | ||
sm.SecondsSinceLastCheckpointGauge.Inc() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the two gauges will never decrease, maybe they can be counters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They will be reset when the last header/checkpoint is successfully sent. For example, see this
This PR adds Babylon-related metrics for monitoring purposes. The metrics and alarming rules are described as follows. cc @philmln.