Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Runtime: allow backing multiple candidates of same parachain on different cores #3231

Merged
merged 63 commits into from
Feb 23, 2024

Conversation

sandreim
Copy link
Contributor

@sandreim sandreim commented Feb 6, 2024

Fixes #3144

Builds on top of #3229

Summary

Some preparations for Runtime to support elastic scaling, guarded by config node features bit FeatureIndex::ElasticScalingMVP. This PR introduces a per-candidate CoreIndex but does it in a hacky way to avoid changing CandidateCommitments, CandidateReceipts primitives and networking protocols.

Including CoreIndex in BackedCandidate

If the ElasticScalingMVP feature bit is enabled then BackedCandidate::validator_indices is extended by 8 bits.
The value stored in these bits represents the assumed core index for the candidate.

It is temporary solution which works by creating a mapping from BackedCandidate to CoreIndex by assuming the CoreIndex can be discovered by checking in which validator group the validator that signed the statement is.

TODO:

Signed-off-by: Andrei Sandu <[email protected]>
Signed-off-by: Andrei Sandu <[email protected]>
Signed-off-by: Andrei Sandu <[email protected]>
Signed-off-by: Andrei Sandu <[email protected]>
Signed-off-by: Andrei Sandu <[email protected]>
…:paritytech/polkadot-sdk into sandreim/runtime_core_index_step1
@alindima
Copy link
Contributor

Seeing this new InjectCoreIndex feature made me think about how are we going to enable elastic scaling (on the runtime side). On the node side, all of the changes that I can think of are backwards compatible.

I think we'll need another NodeFeature bit, AllowMultipleCoresPerPara, which will switch on the logic in the runtime for accepting multiple candidates for a parachain.

This way:

  1. if InjectCoreIndex is enabled and AllowMultipleCoresPerPara is enabled, elastic scaling is enabled and working
  2. if InjectCoreIndex is enabled and AllowMultipleCoresPerPara is disabled, parachains can acquire multiple cores but the runtime will only accept one candidate at a time. (what this PR aims to do)
  3. if InjectCoreIndex is disabled and AllowMultipleCoresPerPara is enabled, this makes no sense. Elastic scaling won't work and parachains who acquire multiple cores will be stalled (the current status quo)
  4. if InjectCoreIndex is disabled and AllowMultipleCoresPerPara is disabled. Identical to option 3 above.

Seeing this, I'm wondering if having this InjectCoreIndex makes any sense on its own after all. I guess it does make sense because it'll be disabled once we have the CoreIndex in the receipt.

CC @eskimor

@alindima
Copy link
Contributor

Maybe a NodeFeature is not the most suitable name for this, because in this case it enables changes in the runtime as well (not only on the node). Especially the AllowMultipleCoresPerPara I mention above. But I think it's fine

Signed-off-by: Andrei Sandu <[email protected]>
Signed-off-by: Andrei Sandu <[email protected]>
Signed-off-by: Andrei Sandu <[email protected]>
Signed-off-by: Andrei Sandu <[email protected]>
Signed-off-by: Andrei Sandu <[email protected]>
…:paritytech/polkadot-sdk into sandreim/runtime_core_index_step1
@sandreim sandreim changed the title Elastic scaling: allow backing multiple candidates of same parachain on different cores Runtime: allow backing multiple candidates of same parachain on different cores Feb 13, 2024
@sandreim
Copy link
Contributor Author

@alindima
I think InjectCoreIndex is sufficient to enable elastic scaling in Runtime as well as node side. Not sure why you want another feature. The soft switch which you seem to be looking for is actually having paras assigned to multiple cores.

@alindima
Copy link
Contributor

alindima commented Feb 13, 2024

@alindima I think InjectCoreIndex is sufficient to enable elastic scaling in Runtime as well as node side. Not sure why you want another feature. The soft switch which you seem to be looking for is actually having paras assigned to multiple cores.

I thought we wanted to maintain the possibility of not having elastic scaling but still not bricking a parachain which occupies more cores.
Now that I think about it again, maybe it isn't very helpful. Only for the possibility of disabling elastic scaling to mitigate a bug. If there were some bug with elastic scaling that stalled a parachain, disabling InjectCoreIndex would not unbrick the parachain. Maybe it's not worth the complexity.
Also, in the future, after CoreIndex is present in the candidatereceipt, disabling InjectCoreIndex will no longer disable elastic scaling.

@sandreim
Copy link
Contributor Author

I thought we wanted to maintain the possibility of not having elastic scaling but still not bricking a parachain which occupies more cores.

This is included in this PR, they won't break if the InjectCoreIndex feature is enabled, even if parachains have multiple cores assigned and their collators are unupgraded.

@alindima
Copy link
Contributor

I thought we wanted to maintain the possibility of not having elastic scaling but still not bricking a parachain which occupies more cores.

This is included in this PR, they won't break if the InjectCoreIndex feature is enabled, even if parachains have multiple cores assigned and their collators are unupgraded.

I am talking about after we enable elastic scaling

@sandreim
Copy link
Contributor Author

Anything we implement for elastic scaling must not break parachains that don't make use of this feature, while also not requiring them to upgrade (except of when we add CoreIndex to the commited candidate receipt).

We should prevent parachains (I'll cut a ticket) from acquiring multiple cores as a safety measure and as an enablement mechanism. We initially plan to launch on Rococo, while on other chains the possibility of assigning multiple cores is disabled - hence elastic scaling is disabled.

Signed-off-by: Andrei Sandu <[email protected]>
Base automatically changed from sandreim/backing_multiple_cores_per_para to master February 22, 2024 08:47
alice: js-script ./0012-enable-node-feature.js with "1" return is 0 within 600 seconds

# Wait two sessions for the config to be updated.
sleep 120 seconds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to wait for the actual session change 😬 - If that is not possible right now, we should add that feature.

Copy link
Contributor

@tdimitrov tdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good @alindima. Left a few nits and a question.

polkadot/runtime/parachains/src/paras_inherent/mod.rs Outdated Show resolved Hide resolved
// cases.
if context == ProcessInherentDataContext::Enter {
ensure!(
!dropped_elastic_scaling_candidates,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR but why we check if we have dropped 'elastic scaling candidates' here, but we are fine dropping candidates without assigned core (which happens here)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out!

prior to this PR, we just assumed that one para maps to one core. So we were only checking if the candidate's paraid has one scheduled core.
if it didn't, we'd return an error. if multiple candidates with the same paraid were supplied, we'd only check for one scheduled core.

The logic I added is more comprehensive and will work for proper elastic scaling also. But I decided to be more lenient and just filter out unscheduled candidates instead of erroring.

However, the check you mentioned should be I think extended to all dropped candidates 👍️ I don't see a reason why it should only be for some of them
because they should be all filtered by the block author. the other block importers should not do this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

polkadot/runtime/parachains/src/paras_inherent/mod.rs Outdated Show resolved Hide resolved
@paritytech-cicd-pr
Copy link

The CI pipeline was cancelled due to failure one of the required jobs.
Job name: test-linux-stable 2/3
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/5329732

@eskimor eskimor added this pull request to the merge queue Feb 23, 2024
Merged via the queue into master with commit 2431001 Feb 23, 2024
129 of 131 checks passed
@eskimor eskimor deleted the sandreim/runtime_core_index_step1 branch February 23, 2024 17:01
eskimor pushed a commit that referenced this pull request Feb 23, 2024
…rent cores (#3231)

Fixes #3144

Builds on top of #3229

Some preparations for Runtime to support elastic scaling, guarded by
config node features bit `FeatureIndex::ElasticScalingMVP`. This PR
introduces a per-candidate `CoreIndex` but does it in a hacky way to
avoid changing `CandidateCommitments`, `CandidateReceipts` primitives
and networking protocols.

If the `ElasticScalingMVP` feature bit is enabled then
`BackedCandidate::validator_indices` is extended by 8 bits.
The value stored in these bits represents the assumed core index for the
candidate.

It is temporary solution which works by creating a mapping from
`BackedCandidate` to `CoreIndex` by assuming the `CoreIndex` can be
discovered by checking in which validator group the validator that
signed the statement is.

TODO:
- [x] fix tests
- [x] add new tests
- [x] Bump runtime API for Kusama, so we have that node features thing!
-> polkadot-fellows/runtimes#194

---------

Signed-off-by: Andrei Sandu <[email protected]>
Signed-off-by: alindima <[email protected]>
Co-authored-by: alindima <[email protected]>
eskimor pushed a commit that referenced this pull request Feb 26, 2024
…rent cores (#3231)

Fixes #3144

Builds on top of #3229

### Summary
Some preparations for Runtime to support elastic scaling, guarded by
config node features bit `FeatureIndex::ElasticScalingMVP`. This PR
introduces a per-candidate `CoreIndex` but does it in a hacky way to
avoid changing `CandidateCommitments`, `CandidateReceipts` primitives
and networking protocols.

#### Including `CoreIndex` in `BackedCandidate`
If the `ElasticScalingMVP` feature bit is enabled then
`BackedCandidate::validator_indices` is extended by 8 bits.
The value stored in these bits represents the assumed core index for the
candidate.

It is temporary solution which works by creating a mapping from
`BackedCandidate` to `CoreIndex` by assuming the `CoreIndex` can be
discovered by checking in which validator group the validator that
signed the statement is.

TODO:
- [x] fix tests
- [x] add new tests
- [x] Bump runtime API for Kusama, so we have that node features thing!
-> polkadot-fellows/runtimes#194

---------

Signed-off-by: Andrei Sandu <[email protected]>
Signed-off-by: alindima <[email protected]>
Co-authored-by: alindima <[email protected]>
ordian added a commit that referenced this pull request Feb 28, 2024
…head-data

* origin/master: (51 commits)
  Runtime Upgrade ref docs and Single Block Migration example pallet  (#1554)
  Collator overseer builder unification (#3335)
  Introduce storage attr macro `#[disable_try_decode_storage]` and set it on `System::Events` and `ParachainSystem::HostConfiguration` (#3454)
  Add Polkadotters bootnoders per IBP application (#3423)
  Add documentation around FRAME Origin (#3362)
  Bridge zombienet tests: Check amount received at destination (#3490)
  Snowbridge benchmark tests fix (#3424)
  fix(zombienet): increase timeout in download artifacts (#3376)
  Cleanup String::from_utf8 (#3446)
  [prdoc] Validate crate names (#3467)
  Limit max execution time for `test-linux-stable` CI jobs (#3483)
  Introduce Notification block pinning limit (#2935)
  frame-support: Improve error reporting when having too many pallets (#3478)
  add Encointer as trusted teleporter for Westend (#3411)
  [pallet-xcm] Adjust benchmarks (teleport_assets/reserve_transfer_assets) not relying on ED (#3464)
  Add more debug logs to understand if statement-distribution misbehaving (#3419)
  Remove redundant parachains assigner pallet. (#3457)
  Use generic hash for runtime wasm in resolve_state_version_from_wasm (#3447)
  Runtime: allow backing multiple candidates of same parachain on different cores  (#3231)
  Bridge zombienet tests: move all "framework" files under one folder (#3462)
  ...
@patriciobcs patriciobcs self-assigned this Apr 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I5-enhancement An additional feature request. T8-polkadot This PR/Issue is related to/affects the Polkadot network.
Projects
Development

Successfully merging this pull request may close these issues.

Fix (backing) relying on para ids being unique
6 participants