-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core/scheduler: add duty query method #384
Conversation
Codecov Report
@@ Coverage Diff @@
## main #384 +/- ##
==========================================
- Coverage 56.93% 56.81% -0.13%
==========================================
Files 63 63
Lines 5605 5506 -99
==========================================
- Hits 3191 3128 -63
+ Misses 1993 1965 -28
+ Partials 421 413 -8
Continue to review full report at Codecov.
|
core/scheduler/scheduler.go
Outdated
@@ -290,6 +327,32 @@ type slot struct { | |||
SlotDuration time.Duration | |||
} | |||
|
|||
func newSlot(ctx context.Context, eth2Cl eth2Provider, height uint64) (slot, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolveDuties doesn't need the full internal slot
type, it only uses slot and epoch int64s. I feel it would be simpler to refactor resolveDuties rather than adding this which adds some code duplication (from newSlotTicker
).
core/scheduler/scheduler.go
Outdated
@@ -173,6 +209,7 @@ func (s *Scheduler) scheduleSlot(ctx context.Context, slot slot) error { | |||
} | |||
|
|||
// resolveDuties resolves the duties for the slot's epoch, caching the results. | |||
// Do not call if you do not hold the dutiesMutex. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you do not hold the mutex when call this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The caller holds the mutex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment not accurate anymore, can remove
core/scheduler/scheduler.go
Outdated
s.dutiesMutex.Lock() | ||
defer s.dutiesMutex.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always decrease the time locks held to a minimum. I would strongly discourage doing network calls while a lock is held for example. Beacon API calls often take multiple seconds, during which you will be locking up validatorapi
in this case (or whatever other components use GetDuty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest adding getter/setter methods for mutex protected data. Then the logic can be lock free.
func (s *Scheduler) getFetchArgSet(duty core.Duty) core.FetchArgSet, bool {
s.dutiesMutex.Lock()
defer s.dutiesMutex.Unlock()
return s.duties[duty]
}
func (s *Scheduler) setFetchArg(duty core.Duty, pubkey core.Pubkey, core.FetchArg) bool {
s.dutiesMutex.Lock()
defer s.dutiesMutex.Unlock()
argSet, ok := s.duties[duty]
if !ok {
argSet = make(core.FetchArgSet)
}
if _, ok := argSet[pubkey]; ok {
return false
}
s.duties[duty] = set
return true
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@corverroos it is not just the duties map that we are sharing, it is also the resolvedEpoch, that is why I put mutex around everything. Otherwise, we will need another mutex and another extra private methods for resolvedEpoch, that perhaps that is what you want.
I recommend a redesign, where these duties are on a external component that stores them, and also the resolved epochs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can update the resolveDuties map in setFetchArg
and also add a getter resolvedEpoch
with similar pattern.
Another mutex isn't required, nor a is refactor I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we add a map, I am confused why we keep resolved epochs instead of resolved slots?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I don't think I can update the resolvedEpochs or resolveSlot as an setFetchArg does not mean the whole slot is resolved.
If we move to a resolvedSlot map to bool then is ok, but I think it would be easier then to just put an empty value when a slot is resolved but there are no duties.
@@ -123,8 +125,42 @@ func (s *Scheduler) Run() error { | |||
} | |||
} | |||
|
|||
func (s *Scheduler) GetDuty(ctx context.Context, duty core.Duty) (core.FetchArgSet, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add a godoc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sorry about that
core/scheduler/scheduler.go
Outdated
return s.duties[duty], nil | ||
} | ||
|
||
func (s *Scheduler) isEpochResolved(epoch uint64) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your concern that this will result in skipping a bunch of duties if a VC calls some far future slot. We should refactor resolvedEpoch
into a map or something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose to refactor with an external component that has all that logic, perhaps ResolvedDutiesDB ?
Perhaps we should define:
- Duty
- ResolvedDuty
- SolvedDuty
or some separated concepts and include them in the type system for clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think that is necessary, we are almost there.
Let's think about this for a moment, all this is because the validatorAPI and the scheduler need to know the public key of a proposer for a given slot. Instead of making the validtorAPI communicate with the scheduler and access some data structure that was thought to be internal of scheduler in the first time. Lets create a new component that provides this information (and caches it if the BN cliend does not cache it) to both the scheduler and the validatorAPI. |
@corverroos please check it before I add tests |
return argSet, ok | ||
} | ||
|
||
func (s *Scheduler) setFetchArg(duty core.Duty, pubkey core.PubKey, set core.FetchArg) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not used?
@corverroos I forgot a few places to call the private methods, I think now they are all |
core/scheduler/scheduler.go
Outdated
@@ -206,7 +230,7 @@ func (s *Scheduler) resolveDuties(ctx context.Context, slot slot) error { | |||
|
|||
duty := core.Duty{Slot: int64(attDuty.Slot), Type: core.DutyAttester} | |||
|
|||
argSet, ok := s.duties[duty] | |||
argSet, ok := s.getFetchArgSet(duty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to call getFetchArgSet
first. Just call setFetchArgSet
below.
if !s.setFetchArg(duty, pubkey, arg){
log.Debug(ctx, "Ignoring previously resolved duty", z.Any("duty", duty))
continue
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
612e695
to
937add0
Compare
core/scheduler/scheduler.go
Outdated
@@ -159,7 +182,7 @@ func (s *Scheduler) scheduleSlot(ctx context.Context, slot slot) error { | |||
} | |||
|
|||
span.End() | |||
delete(s.duties, duty) | |||
s.deleteDuty(duty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we delete duties after scheduling, so calls to GetDuty
will error after it has been scheduled.
Suggest removing the deleteDuty
function and adding a TODO:
// TODO(leo): Trim duties after some time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we had the shared component this would not be necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, we will indeed not need to do this here . But we will still need to do it in DutyResolver
though. So same logic is required somewhere.
core/scheduler/scheduler.go
Outdated
func (s *Scheduler) isEpochResolved(epoch uint64) bool { | ||
return s.getResolvedEpoch() >= epoch | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this will return true for default s.resolvedEpoch=math.MaxUint64
before the first epoch is resolved which is technically not correct, but maybe that is not a big issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@corverroos ok, so we need to take this into account
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Ok, I am going to add tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Add a method to the scheduler to query a duty arguments once it has been resolved
category: feature
ticket: #352