Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix kusama validators getting 0 backing rewards the first session tey enter the active set #3733

Merged
merged 1 commit into from
Mar 18, 2024

Conversation

alexggh
Copy link
Contributor

@alexggh alexggh commented Mar 18, 2024

This is a cherry-pick from master of #3722

Expected patches for (1.7.0):
pallet-authorithy-discovery

@alexggh alexggh requested review from a team as code owners March 18, 2024 13:30
@alexggh alexggh changed the base branch from master to release-polkadot-v1.7.0 March 18, 2024 13:31
@alexggh alexggh force-pushed the alexaggh/release-polkadot-v1.7.0 branch from 9543eb6 to 446646b Compare March 18, 2024 13:31
@alexggh alexggh requested a review from a team as a code owner March 18, 2024 13:31
@alexggh alexggh changed the base branch from release-polkadot-v1.7.0 to release-polkadot-v1.7.2 March 18, 2024 13:32
@alexggh alexggh changed the title Alexaggh/release polkadot v1.7.0 Backport #3722 Mar 18, 2024
…y enter the active set (#3722)

There is a problem in the way we update `authorithy-discovery` next keys
and because of that nodes that enter the active set would be noticed at
the start of the session they become active, instead of the start of the
previous session as it was intended. This is problematic because:

1. The node itself advertises its addresses on the DHT only when it
notices it should become active on around ~10m loop, so in this case it
would notice after it becomes active.
2. The other nodes won't be able to detect the new nodes addresses at
the beginning of the session, so it won't added them to the reserved
set.

With 1 + 2, we end-up in a situation where the the new node won't be
able to properly connect to its peers because it won't be in its peers
reserved set. Now, the nodes accept by default`MIN_GOSSIP_PEERS: usize =
25` connections to nodes that are not in the reserved set, but given
Kusama size(> 1000 nodes) you could easily have more than`25` new nodes
entering the active set or simply the nodes don't have slots anymore
because, they already have connections to peers not in the active set.

In the end what the node would notice is 0 backing rewards because it
wasn't directly connected to the peers in its backing group.

## Root-cause

The flow is like this:
1. At BAD_SESSION - 1, in `rotate_session` new nodes are added to
QueuedKeys
https://github.com/paritytech/polkadot-sdk/blob/02e1a7f476d7d7c67153e975ab9a1bdc02ffea12/substrate/frame/session/src/lib.rs#L609
```
 <QueuedKeys<T>>::put(queued_amalgamated.clone());
<QueuedChanged<T>>::put(next_changed);
```
2. AuthorityDiscovery::on_new_session is called with `changed` being the
value of `<QueuedChanged<T>>:` at BAD_SESSION - **2** because it was
saved before being updated
https://github.com/paritytech/polkadot-sdk/blob/02e1a7f476d7d7c67153e975ab9a1bdc02ffea12/substrate/frame/session/src/lib.rs#L613
3. At BAD_SESSION - 1, `AuthorityDiscovery::on_new_session` doesn't
updated its next_keys because `changed` was false.
4. For the entire durations of `BAD_SESSION - 1` everyone calling
runtime api `authorities`(should return past, present and future
authorities) won't discover the nodes that should become active .
5. At the beginning of BAD_SESSION, all nodes discover the new nodes are
authorities, but it is already too late because reserved_nodes are
updated only at the beginning of the session by the `gossip-support`.
See above why this bad.

## Fix
Update next keys with the queued_validators at every session, not matter
the value of `changed` this is the same way babe pallet correctly does
it.
https://github.com/paritytech/polkadot-sdk/blob/02e1a7f476d7d7c67153e975ab9a1bdc02ffea12/substrate/frame/babe/src/lib.rs#L655

## Notes

- The issue doesn't reproduce with proof-authorities changes like
`versi` because `changed` would always be true and `AuthorityDiscovery`
correctly updates its next_keys every time.
- Confirmed at session `37651` on kusama that this is exactly what it
happens by looking at blocks with polkadot.js.

## TODO
- [ ] Move versi on proof of stake and properly test before and after
fix to confirm there is no other issue.

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>
Co-authored-by: Bastian Köcher <[email protected]>
(cherry picked from commit 8d0cd4f)
Signed-off-by: Alexandru Gheorghe <[email protected]>
@alexggh alexggh force-pushed the alexaggh/release-polkadot-v1.7.0 branch from 446646b to ecad3ef Compare March 18, 2024 13:38
@paritytech-review-bot paritytech-review-bot bot requested a review from a team March 18, 2024 13:39
@alexggh alexggh changed the base branch from release-polkadot-v1.7.2 to release-crates-io-v1.7.0 March 18, 2024 13:39
@paritytech-review-bot paritytech-review-bot bot requested a review from a team March 18, 2024 13:39
@alexggh alexggh changed the title Backport #3722 Fix kusama validators getting 0 backing rewards the first session tey enter the active set Mar 18, 2024
@alexggh alexggh added the T1-FRAME This PR/Issue is related to core FRAME, the framework. label Mar 18, 2024
@alexggh alexggh added the R0-silent Changes should not be mentioned in any release notes label Mar 18, 2024
@bkchr bkchr merged commit a19c4ce into release-crates-io-v1.7.0 Mar 18, 2024
30 of 32 checks passed
@bkchr bkchr deleted the alexaggh/release-polkadot-v1.7.0 branch March 18, 2024 15:38
@alexggh alexggh mentioned this pull request Mar 19, 2024
19 tasks
alexggh added a commit to alexggh/runtimes that referenced this pull request Mar 21, 2024
To pick up the fix for
paritytech/polkadot-sdk#3733

Signed-off-by: Alexandru Gheorghe <[email protected]>
fellowship-merge-bot bot added a commit to polkadot-fellows/runtimes that referenced this pull request Mar 21, 2024
To pick up the fix for
paritytech/polkadot-sdk#3733

<!-- Remember that you can run `/merge` to enable auto-merge in the PR
-->

<!-- Remember to modify the changelog. If you don't need to modify it,
you can check the following box.
Instead, if you have already modified it, simply delete the following
line. -->

- [x] Does not require a CHANGELOG entry

Signed-off-by: Alexandru Gheorghe <[email protected]>
Co-authored-by: fellowship-merge-bot[bot] <151052383+fellowship-merge-bot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
R0-silent Changes should not be mentioned in any release notes T1-FRAME This PR/Issue is related to core FRAME, the framework.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants