-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace validator wait for activation stream with polling #14514
Conversation
@@ -341,46 +327,6 @@ func TestCanonicalHeadSlot_OK(t *testing.T) { | |||
assert.Equal(t, primitives.Slot(0), headSlot, "Mismatch slots") | |||
} | |||
|
|||
func TestWaitMultipleActivation_LogsActivationEpochOK(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duplicate test no longer needed, covered in activation_test
|
||
func (v *validator) handleReconnection(ctx context.Context, span octrace.Span, err error, message string, accountsChangedChan <-chan [][fieldparams.BLSPubkeyLength]byte) error { | ||
tracing.AnnotateError(span, err) | ||
attempts := streamAttempts(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should i rename this streamAttempts variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code block is small enough that I don't think it matters too much. It should be very clear what it is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess it's not really a stream
|
||
func (v *validator) handleReconnection(ctx context.Context, span octrace.Span, err error, message string, accountsChangedChan <-chan [][fieldparams.BLSPubkeyLength]byte) error { | ||
tracing.AnnotateError(span, err) | ||
attempts := streamAttempts(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code block is small enough that I don't think it matters too much. It should be very clear what it is
Co-authored-by: Sammy Rosso <[email protected]>
Co-authored-by: Sammy Rosso <[email protected]>
|
||
// Step 3: update validator statuses in cache. | ||
if err := v.updateValidatorStatusCache(ctx, validatingKeys); err != nil { | ||
return v.handleReconnection(ctx, span, err, "Connection broken while waiting for activation. Reconnecting...", accountsChangedChan) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we try to reconnect here? Failing to update the cache does not mean the connection is broken
|
||
func (v *validator) getValidatorCount(ctx context.Context) (int64, error) { | ||
// TODO: revisit https://github.com/prysmaticlabs/prysm/pull/12471#issuecomment-1568320970 to review if ValidatorCount api can be removed. | ||
// "-1" indicates that validator count endpoint is not supported by the beacon node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put this comment as the function documentation. Otherwise callers will get confused when they see -1
returned
"slot": slot, | ||
"seconds_sinceStart": ss, | ||
"next_epoch_start_slot": firstSlotOfNextEpoch, | ||
"slots_until_next_start": firstSlotOfNextEpoch - slot, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is all of this information really useful? I personally would care only about the next_epoch_start_slot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was useful for debugging, just having the slot wasn't always meaningful/exact
"seconds_sinceStart": ss, | ||
"next_epoch_start_slot": firstSlotOfNextEpoch, | ||
"slots_until_next_start": firstSlotOfNextEpoch - slot, | ||
}).Debugf("Waiting %d seconds", waitTime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should provide a little more context because it's not obvious what this log refers to. Maybe Waiting until next epoch before re-checking validator statuses...
?
if err != nil { | ||
return err | ||
} | ||
nextEpochStartDuration, err := slots.ToTime(genesisTimeSec, firstSlotOfNextEpoch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: the result of ToTime
is not a duration, but a point in time
if err != nil { | ||
return err | ||
} | ||
waitTime := uint64(nextEpochStartDuration.Unix()-time.Now().Unix()) + ss |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is incorrect, you should not be adding ss
. You already know the exact point in time when the next epoch will start, so subtracting the current time from that time gives you the correct answer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested it, the time isn't always 100% exact, sometimes it's like 1-2 seconds off when the new epoch starts
the +ss is just from the current time in the slot so it guarantees it is within the bounds of the start of the first epoch. sometimes adding a 1 second buffer doesn't work
Co-authored-by: Radosław Kapka <[email protected]>
Co-authored-by: Radosław Kapka <[email protected]>
time/slots/slottime.go
Outdated
log.WithFields(logrus.Fields{ | ||
"current_slot": currentSlot, | ||
"next_epoch_start_slot": firstSlotOfNextEpoch, | ||
"slots_until_next_start": firstSlotOfNextEpoch - currentSlot, | ||
"total_wait_time": waitTime, | ||
"is_epoch_start": IsEpochStart(currentSlot), | ||
}).Warn("Waiting until next epoch before re-checking validator statuses...") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This log does not belong here
time/slots/slottime.go
Outdated
"current_slot": currentSlot, | ||
"next_epoch_start_slot": firstSlotOfNextEpoch, | ||
"slots_until_next_start": firstSlotOfNextEpoch - currentSlot, | ||
"total_wait_time": waitTime, | ||
"is_epoch_start": IsEpochStart(currentSlot), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our convention is to use camelCase
for log fields
Co-authored-by: Radosław Kapka <[email protected]>
Co-authored-by: Radosław Kapka <[email protected]>
What type of PR is this?
Other
What does this PR do? Why is it needed?
deprecates the wait for activation event stream. This PR simplifies the checks for validator activation by polling against the status every epoch instead of receiving gRPC streams. By implementing this way we move one step closer to migrating to a full REST based validator client that can more easily connect with beacon nodes from other clients.
tested with web UI in holesky
Which issues(s) does this PR fix?
Fixes #
Other notes for review
Acknowledgements