Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace validator wait for activation stream with polling #14514

Merged
merged 22 commits into from
Oct 10, 2024

Conversation

james-prysm
Copy link
Contributor

@james-prysm james-prysm commented Oct 7, 2024

What type of PR is this?

Other

What does this PR do? Why is it needed?

deprecates the wait for activation event stream. This PR simplifies the checks for validator activation by polling against the status every epoch instead of receiving gRPC streams. By implementing this way we move one step closer to migrating to a full REST based validator client that can more easily connect with beacon nodes from other clients.

tested with web UI in holesky

Which issues(s) does this PR fix?

Fixes #

Other notes for review

Acknowledgements

  • I have read CONTRIBUTING.md.
  • I have made an appropriate entry to CHANGELOG.md.
  • I have added a description to this PR with sufficient context for reviewers to understand this PR.

@james-prysm james-prysm added the Cleanup Code health! label Oct 7, 2024
@james-prysm james-prysm requested a review from a team as a code owner October 7, 2024 15:56
@james-prysm james-prysm added the API Api related tasks label Oct 7, 2024
@james-prysm james-prysm changed the title Remove validator wait activation Replace validator wait for activation stream with polling Oct 7, 2024
@james-prysm james-prysm marked this pull request as draft October 7, 2024 15:58
@@ -341,46 +327,6 @@ func TestCanonicalHeadSlot_OK(t *testing.T) {
assert.Equal(t, primitives.Slot(0), headSlot, "Mismatch slots")
}

func TestWaitMultipleActivation_LogsActivationEpochOK(t *testing.T) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicate test no longer needed, covered in activation_test


func (v *validator) handleReconnection(ctx context.Context, span octrace.Span, err error, message string, accountsChangedChan <-chan [][fieldparams.BLSPubkeyLength]byte) error {
tracing.AnnotateError(span, err)
attempts := streamAttempts(ctx)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should i rename this streamAttempts variable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code block is small enough that I don't think it matters too much. It should be very clear what it is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess it's not really a stream

@james-prysm james-prysm marked this pull request as ready for review October 7, 2024 16:11
@james-prysm james-prysm added the Ready For Review A pull request ready for code review label Oct 7, 2024
CHANGELOG.md Outdated Show resolved Hide resolved

func (v *validator) handleReconnection(ctx context.Context, span octrace.Span, err error, message string, accountsChangedChan <-chan [][fieldparams.BLSPubkeyLength]byte) error {
tracing.AnnotateError(span, err)
attempts := streamAttempts(ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code block is small enough that I don't think it matters too much. It should be very clear what it is

validator/client/wait_for_activation.go Show resolved Hide resolved
validator/client/wait_for_activation.go Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
validator/client/wait_for_activation.go Outdated Show resolved Hide resolved

// Step 3: update validator statuses in cache.
if err := v.updateValidatorStatusCache(ctx, validatingKeys); err != nil {
return v.handleReconnection(ctx, span, err, "Connection broken while waiting for activation. Reconnecting...", accountsChangedChan)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we try to reconnect here? Failing to update the cache does not mean the connection is broken


func (v *validator) getValidatorCount(ctx context.Context) (int64, error) {
// TODO: revisit https://github.com/prysmaticlabs/prysm/pull/12471#issuecomment-1568320970 to review if ValidatorCount api can be removed.
// "-1" indicates that validator count endpoint is not supported by the beacon node.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put this comment as the function documentation. Otherwise callers will get confused when they see -1 returned

validator/client/wait_for_activation.go Outdated Show resolved Hide resolved
validator/client/wait_for_activation.go Show resolved Hide resolved
Comment on lines 152 to 155
"slot": slot,
"seconds_sinceStart": ss,
"next_epoch_start_slot": firstSlotOfNextEpoch,
"slots_until_next_start": firstSlotOfNextEpoch - slot,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is all of this information really useful? I personally would care only about the next_epoch_start_slot

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was useful for debugging, just having the slot wasn't always meaningful/exact

"seconds_sinceStart": ss,
"next_epoch_start_slot": firstSlotOfNextEpoch,
"slots_until_next_start": firstSlotOfNextEpoch - slot,
}).Debugf("Waiting %d seconds", waitTime)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should provide a little more context because it's not obvious what this log refers to. Maybe Waiting until next epoch before re-checking validator statuses...?

if err != nil {
return err
}
nextEpochStartDuration, err := slots.ToTime(genesisTimeSec, firstSlotOfNextEpoch)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: the result of ToTime is not a duration, but a point in time

if err != nil {
return err
}
waitTime := uint64(nextEpochStartDuration.Unix()-time.Now().Unix()) + ss
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is incorrect, you should not be adding ss. You already know the exact point in time when the next epoch will start, so subtracting the current time from that time gives you the correct answer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it, the time isn't always 100% exact, sometimes it's like 1-2 seconds off when the new epoch starts
the +ss is just from the current time in the slot so it guarantees it is within the bounds of the start of the first epoch. sometimes adding a 1 second buffer doesn't work

@james-prysm james-prysm added Blocked Blocked by research or external factors and removed Ready For Review A pull request ready for code review labels Oct 8, 2024
@james-prysm james-prysm added Ready For Review A pull request ready for code review and removed Blocked Blocked by research or external factors labels Oct 9, 2024
validator/client/wait_for_activation.go Outdated Show resolved Hide resolved
validator/client/wait_for_activation.go Outdated Show resolved Hide resolved
Comment on lines 305 to 311
log.WithFields(logrus.Fields{
"current_slot": currentSlot,
"next_epoch_start_slot": firstSlotOfNextEpoch,
"slots_until_next_start": firstSlotOfNextEpoch - currentSlot,
"total_wait_time": waitTime,
"is_epoch_start": IsEpochStart(currentSlot),
}).Warn("Waiting until next epoch before re-checking validator statuses...")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log does not belong here

Comment on lines 306 to 310
"current_slot": currentSlot,
"next_epoch_start_slot": firstSlotOfNextEpoch,
"slots_until_next_start": firstSlotOfNextEpoch - currentSlot,
"total_wait_time": waitTime,
"is_epoch_start": IsEpochStart(currentSlot),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our convention is to use camelCase for log fields

@james-prysm james-prysm added this pull request to the merge queue Oct 10, 2024
Merged via the queue into develop with commit 6c22ede Oct 10, 2024
17 of 18 checks passed
@james-prysm james-prysm deleted the remove-validator-wait-activation branch October 10, 2024 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Api related tasks Cleanup Code health! Ready For Review A pull request ready for code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants