-
Notifications
You must be signed in to change notification settings - Fork 989
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow honest validators to reorg late blocks #3034
Conversation
Since this is optional, it might be nice to separate it from other parts of the spec a bit more, similar to optimistic sync. We could contain the complexity a bit, especially if it might be phased out in the future. |
e92cf07
to
1c8cc82
Compare
What is the circuit-breaker mechanism for when latency is > 4s? Is the intention that checking "single_slot_reorg" and "participation_ok" should act as one, because, when latency is > 4s and blocks do not get attestations in their slot, the parent block would either be too old or not have enough weight? If that's the case, I think this should be clarified. It might also be nice to have an explicit circuit-breaker that's easier to reason about, and gives clear liveness guarantees (e.g. explicitly turn off the reorg functionality whenever too many proposals are being missed) |
@fradamt Yeah the single slot & participation conditions are meant to be the circuit-breaker for asynchrony (although I think the single slot condition alone is sufficient). I think the outcome is that with message delay
Are you thinking of something like the builder API circuit-breaker conditions about number of missed blocks in the last N blocks and number of epochs since finalization? We could add number-of-skipped-blocks check (while keeping |
## Proposed Changes With proposer boosting implemented (#2822) we have an opportunity to re-org out late blocks. This PR adds three flags to the BN to control this behaviour: * `--disable-proposer-reorgs`: turn aggressive re-orging off (it's on by default). * `--proposer-reorg-threshold N`: attempt to orphan blocks with less than N% of the committee vote. If this parameter isn't set then N defaults to 20% when the feature is enabled. * `--proposer-reorg-epochs-since-finalization N`: only attempt to re-org late blocks when the number of epochs since finalization is less than or equal to N. The default is 2 epochs, meaning re-orgs will only be attempted when the chain is finalizing optimally. For safety Lighthouse will only attempt a re-org under very specific conditions: 1. The block being proposed is 1 slot after the canonical head, and the canonical head is 1 slot after its parent. i.e. at slot `n + 1` rather than building on the block from slot `n` we build on the block from slot `n - 1`. 2. The current canonical head received less than N% of the committee vote. N should be set depending on the proposer boost fraction itself, the fraction of the network that is believed to be applying it, and the size of the largest entity that could be hoarding votes. 3. The current canonical head arrived after the attestation deadline from our perspective. This condition was only added to support suppression of forkchoiceUpdated messages, but makes intuitive sense. 4. The block is being proposed in the first 2 seconds of the slot. This gives it time to propagate and receive the proposer boost. ## Additional Info For the initial idea and background, see: ethereum/consensus-specs#2353 (comment) There is also a specification for this feature here: ethereum/consensus-specs#3034 Co-authored-by: Michael Sproul <[email protected]> Co-authored-by: pawan <[email protected]>
## Proposed Changes With proposer boosting implemented (#2822) we have an opportunity to re-org out late blocks. This PR adds three flags to the BN to control this behaviour: * `--disable-proposer-reorgs`: turn aggressive re-orging off (it's on by default). * `--proposer-reorg-threshold N`: attempt to orphan blocks with less than N% of the committee vote. If this parameter isn't set then N defaults to 20% when the feature is enabled. * `--proposer-reorg-epochs-since-finalization N`: only attempt to re-org late blocks when the number of epochs since finalization is less than or equal to N. The default is 2 epochs, meaning re-orgs will only be attempted when the chain is finalizing optimally. For safety Lighthouse will only attempt a re-org under very specific conditions: 1. The block being proposed is 1 slot after the canonical head, and the canonical head is 1 slot after its parent. i.e. at slot `n + 1` rather than building on the block from slot `n` we build on the block from slot `n - 1`. 2. The current canonical head received less than N% of the committee vote. N should be set depending on the proposer boost fraction itself, the fraction of the network that is believed to be applying it, and the size of the largest entity that could be hoarding votes. 3. The current canonical head arrived after the attestation deadline from our perspective. This condition was only added to support suppression of forkchoiceUpdated messages, but makes intuitive sense. 4. The block is being proposed in the first 2 seconds of the slot. This gives it time to propagate and receive the proposer boost. ## Additional Info For the initial idea and background, see: ethereum/consensus-specs#2353 (comment) There is also a specification for this feature here: ethereum/consensus-specs#3034 Co-authored-by: Michael Sproul <[email protected]> Co-authored-by: pawan <[email protected]>
## Proposed Changes With proposer boosting implemented (#2822) we have an opportunity to re-org out late blocks. This PR adds three flags to the BN to control this behaviour: * `--disable-proposer-reorgs`: turn aggressive re-orging off (it's on by default). * `--proposer-reorg-threshold N`: attempt to orphan blocks with less than N% of the committee vote. If this parameter isn't set then N defaults to 20% when the feature is enabled. * `--proposer-reorg-epochs-since-finalization N`: only attempt to re-org late blocks when the number of epochs since finalization is less than or equal to N. The default is 2 epochs, meaning re-orgs will only be attempted when the chain is finalizing optimally. For safety Lighthouse will only attempt a re-org under very specific conditions: 1. The block being proposed is 1 slot after the canonical head, and the canonical head is 1 slot after its parent. i.e. at slot `n + 1` rather than building on the block from slot `n` we build on the block from slot `n - 1`. 2. The current canonical head received less than N% of the committee vote. N should be set depending on the proposer boost fraction itself, the fraction of the network that is believed to be applying it, and the size of the largest entity that could be hoarding votes. 3. The current canonical head arrived after the attestation deadline from our perspective. This condition was only added to support suppression of forkchoiceUpdated messages, but makes intuitive sense. 4. The block is being proposed in the first 2 seconds of the slot. This gives it time to propagate and receive the proposer boost. ## Additional Info For the initial idea and background, see: ethereum/consensus-specs#2353 (comment) There is also a specification for this feature here: ethereum/consensus-specs#3034 Co-authored-by: Michael Sproul <[email protected]> Co-authored-by: pawan <[email protected]>
## Proposed Changes With proposer boosting implemented (sigp#2822) we have an opportunity to re-org out late blocks. This PR adds three flags to the BN to control this behaviour: * `--disable-proposer-reorgs`: turn aggressive re-orging off (it's on by default). * `--proposer-reorg-threshold N`: attempt to orphan blocks with less than N% of the committee vote. If this parameter isn't set then N defaults to 20% when the feature is enabled. * `--proposer-reorg-epochs-since-finalization N`: only attempt to re-org late blocks when the number of epochs since finalization is less than or equal to N. The default is 2 epochs, meaning re-orgs will only be attempted when the chain is finalizing optimally. For safety Lighthouse will only attempt a re-org under very specific conditions: 1. The block being proposed is 1 slot after the canonical head, and the canonical head is 1 slot after its parent. i.e. at slot `n + 1` rather than building on the block from slot `n` we build on the block from slot `n - 1`. 2. The current canonical head received less than N% of the committee vote. N should be set depending on the proposer boost fraction itself, the fraction of the network that is believed to be applying it, and the size of the largest entity that could be hoarding votes. 3. The current canonical head arrived after the attestation deadline from our perspective. This condition was only added to support suppression of forkchoiceUpdated messages, but makes intuitive sense. 4. The block is being proposed in the first 2 seconds of the slot. This gives it time to propagate and receive the proposer boost. ## Additional Info For the initial idea and background, see: ethereum/consensus-specs#2353 (comment) There is also a specification for this feature here: ethereum/consensus-specs#3034 Co-authored-by: Michael Sproul <[email protected]> Co-authored-by: pawan <[email protected]>
# store.realized_finalization[head_root] | ||
ffg_competitive = True | ||
|
||
# Check the head weight only if the attestations from the head slot have already been applied. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Check the head weight only if the attestations from the head slot have already been applied. | |
# Check the head weight only if the attestations from the current slot have already been applied. |
This function is phrased in a way that it is agnostic to whether we are calling it on the slot we are about to propose or during the previous slot. However, what worries me is the situation where current_slot = N
, we are supposed to propose at N+1
, head_slot = N-1
and we call during slot N
. In this case, suppose the block N
is based on N-2
, forking off N-1
, it is a timely block, and we haven't yet counted attestations because we are calling this function during N
. What will happen is that our head will still be N-1
and every check here will pass. It may well be that later, at the time we need to propose, during N+1, this timely block at N would become our head, but this function here may return true.
Now this is not a problem per-se in this specification since later on, when calling get_proposer_head
we will get a misprediction because we will not have passed single_slot_reorg
. Anyway, I feel it's better to change this condition here from checking that the attestations for the head block to have been counted to "changing that the attestations for the current slot have been counted if we are calling late in the slot". I do recognize that this is difficult to specify, but perhaps it's better to not be agnostic as to the timing in which this function is being called.
As a side in this long comment: if we are in the situation above, and we call this function during N, then the assert store.proposer_boost_root == Root()
will fail in this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the head is at N - 1
and we are proposing at N + 1
then this function won't return true. This function infers the proposal_slot
from the head slot, so it will check whether we are proposing at slot N:
proposal_slot = head_block.slot + 1 |
i.e. if we are actually proposing at N+1
and some other block for N
already exists, the proposing_reorg_slot
check will be False
.
Therefore I think the spec already minimises the chance of a misprediction. It's a really nice test case though, I'll make sure it gets a spec test when we add them (I'll also add a test to Lighthouse before then).
Anyway, I feel it's better to change this condition here from checking that the attestations for the head block to have been counted to "changing that the attestations for the current slot have been counted if we are calling late in the slot".
I'm open to making this change, but would like to clarify something about how you envisage Prysm's implementation applying attestations at 10 or 11 secs into the slot will work: are you going to avoid updating the fork choice store's current slot "early"? I.e. if you are 10s into slot N
will you apply the attestations for slot N
and leave get_current_slot(store) == N
? The way everything is written at the moment maintains the invariant that only attestations from get_current_slot(store) - 1
or earlier are applied, supporting Lighthouse's implementation which advances the fork choice slot early at 11.5s into slot N
and pre-emptively increments the fork choice slot to N + 1
.
As a side in this long comment: if we are in the situation above, and we call this function during N, then the assert store.proposer_boost_root == Root() will fail in this function.
Nice catch on the assert
! I think we need to change it so we return False
rather than erroring, or assert that the proposer boost is for a block other than head_block
:
assert store.proposer_root_root != head_root
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i.e. if we are actually proposing at N+1 and some other block for N already exists, the
proposing_reorg_slot
check will beFalse
.
Indeed I mixed the previous check, but now I'm worried about something else: validator_is_connected
can return true for a large number of validators in the case of a pool. In the above scenario, a pool proposing at N and N+1 will actually pass with the wrong validator the check for proposing_reorg_slot
.
are you going to avoid updating the fork choice store's current slot "early"? I.e. if you are 10s into slot N will you apply the attestations for slot N and leave
get_current_slot(store) == N
?
Yeah that's my plan, I'm open to change this to make it closer to LightHouse, but it seems very hard on our end cause we track the wallclock on forkchoice and tricking it to change the slot involves bad hacks like changing the genesis time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
validator_is_connected can return true for a large number of validators in the case of a pool. In the above scenario, a pool proposing at N and N+1 will actually pass with the wrong validator the check for proposing_reorg_slot.
@potuz I don't understand what you mean here, can you spell it out in a bit more detail? I thought we already covered this case by demonstrating that the fcU for N - 1
won't be suppressed, and if the head switches to N
upon counting attestations towards the end of slot N
, then the fcU for N
also won't be suppressed because it will fail the parent_slot_ok
check (N
's parent is N - 2
). Hence we'll still build on N
via the normal process (no re-org attempts).
1dd136a
to
d6ab7c9
Compare
I've rebased this on
I think my current preference for Q2 is option (C). I think the only outstanding difference between Prysm and Lighthouse is our handling of unrealized justification. Lighthouse currently won't re-org a block that updates unrealized justification, while Prysm will, because it assumes it can achieve equivalent justification by packing the block's attestations. |
Given https://ethresear.ch/t/selfish-mining-in-pos/15551/3 I suggest we add a constant
And require |
I am starting to get more worried about this change -- the reason is that it implements a version of the block/slot fork choice, but without the mitigations that would have to be in place to avoid reorging under poor networking conditions. Also, we should probably open a discussion about the 4s attestation timeline at this point. In order for attestations, we need (1) block propagation and (2) block validation all to happen in this time slot, which is starting to become more work (e.g. with 4844) and does not feel like the appropriate slot division anymore since the work in the other three thirds remains the same (and is overall less time critical). |
There is a constraint on the slot of a parent block as a mitigation to poor network connections, i.e.
I think we should raise it on one of the next calls. It would be nice to see the distribution of attestation and aggregates submitting over time into their respective slot phases. |
Yes, but that makes any slot whose parent was not reorged subject to a reorg. Seems pretty harsh!
Yeah, it's a good idea, I will check if we can get this data |
FWIW, we have so far not implemented this in Nimbus (yet), for the simple reason it's not part of the spec. +1 that having things like this behind flags dubious at best and should be considered a temporary solution to address urgent problems and not a permanent state. |
Edit: ah I see by voting you mean the vote the proposer is casting by asserting his head was the previous one. Will leave this comment in case someone else also took "attesting" for the meaning of "voting" I don't see how this modifies voting rules:: attesters in the orphaned committee would not have attested for the late block cause they sent attestations at 4 seconds. Attesters in the next committee are following the usual forkchoice and proposer boost mandates them to vote for the reorging block. |
Just to clarify this, the only reason Prysm has this behind a flag is to protect from an implementation bug that can cause a liveness issue. We do this with most changes that touch core functions like block production. This has nothing to do with the feature being or not merged in the spec. As far as we were concerned this was thoroughly discussed and agreed in ACD albeit some reservations about disclosing because of certain forkchoice attacks that's were present at the time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late assignment!
I added some basic test cases, updated test format with should_override_forkchoice_update
result, and fixed trivial things.
TODOs
- add the mocking
validator_is_connected
value to test vector and format - confirm if test format is ok and run with a client
- check if we should output
get_proposer_head
result. It is labeled optional so I hesitated. also, consider movingget_proposer_head
to a separate place (Allow honest validators to reorg late blocks #3034 (comment))
It would be nice to check again in the ACDC call, and then we can move this PR to "last call" status + merge.
Hi @hwwhww, sorry for the slow reply! Thanks for cleaning this up and adding tests, I really appreciate that 😊
I tried to clarify the wording of this section in 3f1bc20. Both functions are optional, and
I think the tests should include I also think we should just leave it in the current location, as it seems like we may want to make this feature mandatory in future (Potuz mentioned that several PBS designs rely on it, but I haven't dug deeper).
I'll try running the test vectors with Lighthouse soon. Maybe once they also include |
6fb6a00
to
b9285b8
Compare
@michaelsproul thank you! I updated the test format: https://github.com/ethereum/consensus-specs/blob/d8440f8bb4496d053a36b5fef64420d5149156bb/tests/formats/fork_choice/README.md#checks-step
Now running the fork-choice testgen. |
# Make get_current_slot(store) >= attestation.data.slot + 1 | ||
min_time_to_include = (attestation.data.slot + 1) * spec.config.SECONDS_PER_SLOT | ||
if store.time < min_time_to_include: | ||
spec.on_tick(store, min_time_to_include) | ||
test_steps.append({'tick': int(min_time_to_include)}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note: this change would modify other test vectors.
FC-only test vectors of commit d8440f8 |
Tests are passing on Lighthouse (sigp/lighthouse#4887). I expected at least one test to fail because Lighthouse doesn't have the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I'll try to add more test cases in other PRs!
- added `REORG_HEAD_WEIGHT_THRESHOLD`, `REORG_PARENT_WEIGHT_THRESHOLD`, and `REORG_MAX_EPOCHS_SINCE_FINALIZATION` to phase0 configuration, and defaulted - added utility functions: - `isShufflingStable` - `isFinalizationOk` - `calculateCommitteeFraction` which were part of the changes in ethereum/consensus-specs#3034 partially addresses Consensys#6595 Signed-off-by: Paul Harris <[email protected]>
With proposer boosting implemented (sigp#2822) we have an opportunity to re-org out late blocks. This PR adds three flags to the BN to control this behaviour: * `--disable-proposer-reorgs`: turn aggressive re-orging off (it's on by default). * `--proposer-reorg-threshold N`: attempt to orphan blocks with less than N% of the committee vote. If this parameter isn't set then N defaults to 20% when the feature is enabled. * `--proposer-reorg-epochs-since-finalization N`: only attempt to re-org late blocks when the number of epochs since finalization is less than or equal to N. The default is 2 epochs, meaning re-orgs will only be attempted when the chain is finalizing optimally. For safety Lighthouse will only attempt a re-org under very specific conditions: 1. The block being proposed is 1 slot after the canonical head, and the canonical head is 1 slot after its parent. i.e. at slot `n + 1` rather than building on the block from slot `n` we build on the block from slot `n - 1`. 2. The current canonical head received less than N% of the committee vote. N should be set depending on the proposer boost fraction itself, the fraction of the network that is believed to be applying it, and the size of the largest entity that could be hoarding votes. 3. The current canonical head arrived after the attestation deadline from our perspective. This condition was only added to support suppression of forkchoiceUpdated messages, but makes intuitive sense. 4. The block is being proposed in the first 2 seconds of the slot. This gives it time to propagate and receive the proposer boost. For the initial idea and background, see: ethereum/consensus-specs#2353 (comment) There is also a specification for this feature here: ethereum/consensus-specs#3034 Co-authored-by: Michael Sproul <[email protected]> Co-authored-by: pawan <[email protected]>
Specify a function in fork choice
get_proposer_head
which returns the parent block upon which a proposer may build while possibly reorging the canonical head (if it was broadcast late and lacks sufficient attestation weight).The mechanism is designed with safeguards built in so that it does not cause liveness failures:
N + 2
may build on blockN
if blockN + 1
arrived late.REORG_MAX_EPOCHS_SINCE_FINALIZATION
. The default is to require optimal finalization (every 2 epochs).REORG_HEAD_WEIGHT_THRESHOLD
is set to 20% (half of the proposer boost) so that an attacker must control up toPROPOSER_BOOST_SCORE - REORG_WEIGHT_THRESHOLD
= 20% of attester weight in order to perform an ex-ante reorg. The attack works by releasing attestations for the block at slotN + 1
around the same time as the honest proposer publishes the attempted reorg block atN + 2
. The late attestations tip the weight of theN + 1
block from< REORG_WEIGHT_THRESHOLD
to> PROPOSER_BOOST_SCORE
, meaning that blockN + 1
will be preferred to blockN + 2
.REORG_PARENT_WEIGHT_THRESHOLD
is set to 160% so that an attacker that is hoarding attestations cannot cause the attempted re-org to fail. For details of this calculation see: https://ethresear.ch/t/selfish-mining-in-pos/15551.For background on the development of this idea, see: sigp/lighthouse#2860.
Pros
Cons
(block, slot)
fork choice. If a viable implementation of(block, slot)
fork choice emerges then reorging late blocks will become the default behaviour.Thanks to @potuz @mkalinin @terencechain @fradamt @casparschwa @djrtwo for feedback on this idea.