-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core/tracker: implement analyse cluster participation #846
Conversation
2650393
to
417af6a
Compare
Codecov Report
@@ Coverage Diff @@
## main #846 +/- ##
==========================================
- Coverage 55.30% 54.85% -0.46%
==========================================
Files 111 112 +1
Lines 11539 11666 +127
==========================================
+ Hits 6382 6399 +17
- Misses 4240 4328 +88
- Partials 917 939 +22
Continue to review full report at Codecov.
|
"github.com/prometheus/client_golang/prometheus/promauto" | ||
) | ||
|
||
var participationGauge = promauto.NewGaugeVec(prometheus.GaugeOpts{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Namespace: core
Subsystem: tracker
Name: duty_participation (or just participation)
Align label to "duty", "peer" and "pubkey".
core/tracker/tracker.go
Outdated
@@ -42,11 +46,15 @@ const ( | |||
sentinel | |||
) | |||
|
|||
// shareIdx represents a peer's share index. | |||
type shareIdx int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather not introduce this type, naming var shareIdx int
is what we use through the rest of the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i introduced this type to avoid any confusion with int and shareIdx since shareIdx
is used in this file very extensively
core/tracker/tracker.go
Outdated
} | ||
|
||
// newParticipationReporter returns a new participation reporter function which logs and instruments peer participation | ||
func newParticipationReporter(peers []p2p.Peer) func(context.Context, core.Duty, map[core.PubKey]map[shareIdx]bool, map[core.PubKey]map[shareIdx]bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internalise state my keeping last reported state inside this function, do not require it to kept outside
core/tracker/tracker.go
Outdated
var absentPeers []string | ||
for _, peer := range peers { | ||
// Peer index is 0 indexed while shareIdx is 1 indexed. | ||
if dvPeers[shareIdx(peer.Index+1)] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a ShareIdx
method to Peer
maybe
core/tracker/tracker.go
Outdated
} | ||
|
||
// Avoid spamming from identical logs. | ||
if len(absentPeers) > 0 && !reflect.DeepEqual(currentParticipation[pubKey], lastParticipation[pubKey]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
best not use reflect in production code.
Also you should also log if some peers didn't participate last time, and all participated this time, e.g. "all peers participated".
Note this is still going to spam the logs if we have 1000 validators in a cluster.
core/tracker/tracker.go
Outdated
// event represents an event emitted by a core workflow component. | ||
type event struct { | ||
duty core.Duty | ||
component component | ||
pubkey core.PubKey | ||
shareIdx shareIdx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make a note that this is optional field only set for some component events and list them.
core/tracker/tracker.go
Outdated
eventsPerDV[e.pubkey] = append(eventsPerDV[e.pubkey], e) | ||
} | ||
|
||
for pubKey, dvEvents := range eventsPerDV { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about including pubkey in the whole analysis. Since we group pubkeys by duty, either all the pubkeys will be there that should be there for the peer or none will be there for the peer. It doesn't add any value. I would say we should exclude pubkey from this. Just do duty and peer, not pubkey.
core/tracker/metrics.go
Outdated
Namespace: "core", | ||
Subsystem: "tracker", | ||
Name: "participation", | ||
Help: "Set to 1 if peer participated successfully for the given duty and Distributed Validator public key or else 0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove pubkey stuff from help
core/tracker/tracker.go
Outdated
@@ -47,6 +50,7 @@ type event struct { | |||
duty core.Duty | |||
component component | |||
pubkey core.PubKey | |||
shareIdx int // This is an optional field only set by parSigEx and parSigDBInternal events. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and validatorapi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also maybe add: ShareIdx is 1 indexed, so 0 indicates unset.
@@ -46,6 +46,11 @@ type Peer struct { | |||
Name string | |||
} | |||
|
|||
// ShareIdx returns share index of this Peer. | |||
func (p Peer) ShareIdx() int { | |||
return p.Index + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add comment about 1-indexed
core/tracker/tracker.go
Outdated
t.failedDutyReporter(duty, failed, failedComponent.String(), failedMsg) | ||
|
||
// TODO(dhruv): Case of cluster participation | ||
// analyseParticipation(duty, t.events[duty]) | ||
currentParticipation := analyseParticipation(t.events[duty]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: participatedShares := ...
core/tracker/tracker.go
Outdated
for _, e := range events { | ||
// If we get a parSigDBInternal event, then the current node participated successfully. | ||
// If we get a parSigEx event, then the corresponding peer with e.shareIdx participated successfully. | ||
if e.shareIdx > 0 && (e.component == parSigEx || e.component == parSigDBInternal) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather be defensive, error if shareIdx 0 for expected components.
core/tracker/tracker.go
Outdated
// newParticipationReporter returns a new participation reporter function which logs and instruments peer participation. | ||
func newParticipationReporter(peers []p2p.Peer) func(context.Context, core.Duty, map[int]bool) { | ||
// lastParticipation is the set of peers who participated in the last duty. | ||
lastParticipation := make(map[int]bool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: prevAbsent := make(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can just make this []string
and compare against fmt.Sprint(absent) != fmt.Sprint(prevAbsent)
core/tracker/tracker.go
Outdated
// lastParticipation is the set of peers who participated in the last duty. | ||
lastParticipation := make(map[int]bool) | ||
|
||
return func(ctx context.Context, duty core.Duty, currentParticipation map[int]bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: participatedShares
core/tracker/tracker.go
Outdated
} | ||
|
||
// Avoid identical logs if same set of peers didn't participated. | ||
if len(absentPeers) > 0 && !uniqueParticipation { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if fmt.Sprint(absent) != fmt.Sprint(prevAbsent) {
if len(absent) == 0 {
log.Info(ctx, "All peers participated in duty", z.Any("duty"))
else {
log.Info(ctx, "Not all peers participated in duty", z.Any("duty"), z.Any("absent", absentPeers))
}
}
}
// analyseParticipation(duty, t.events[duty]) | ||
participatedShares, err := analyseParticipation(t.events[duty]) | ||
if err != nil { | ||
log.Error(ctx, "Invalid participated shares", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, you need to add tracker log topic to ctx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
core/tracker/tracker.go
Outdated
// If we get a parSigEx event, then the corresponding peer with e.shareIdx participated successfully. | ||
if e.component == parSigEx || e.component == parSigDBInternal { | ||
if e.shareIdx == 0 { | ||
return nil, errors.New("invalid shareIdx", z.Any("component", e.component)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"shareIdx empty"
Implements analyse cluster participation and participation reporter to calculate cluster participation based on partial signatures exchanged and instruments the results.
category: feature
ticket: #821