prospective-parachains rework #4035
Conversation
Awesome work here and on the Versi testing, @alindima!
Just have some more nits and questions, but it's good to go anyway.
polkadot/node/core/prospective-parachains/src/fragment_chain/mod.rs (outdated review thread, resolved)
polkadot/node/core/prospective-parachains/src/fragment_chain/mod.rs (outdated review thread, resolved)
if (self.chain.len() + unconnected) < self.scope.max_depth {
	PotentialAddition::Anyhow
} else if (self.chain.len() + unconnected) == self.scope.max_depth {
Why can we add one more if we are already at max_depth?
that's a good question.
Even prior to this PR, we were accepting max_depth + 1 unincluded candidates at all times.
Not sure why it was designed this way. I assume it's because a max_candidate_depth of 0 is equivalent to synchronous backing, which should still allow one candidate to be backed (hence the +1).
I see, that makes some sense indeed.
The question is: now that we have async backing enabled, do we still want to keep this?
The question is: now that we have async backing enabled, do we still want to keep this?

I don't have a good reason for changing this. Moreover, we seem to have discovered that we need a max_depth + 1 unincluded segment size also on the collator, so maybe it's here for a reason.
Nevertheless, I'd leave it as is unless we find a good reason to change it.
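To make the `max_depth + 1` behaviour discussed above concrete, here's a minimal, self-contained sketch. The `Anyhow` variant and the comparisons mirror the snippet quoted above; the other variants, the function shape and the numbers are simplified assumptions for illustration, not the subsystem's actual code:

```rust
/// Simplified stand-in for the outcome of the depth check above.
#[derive(Debug, PartialEq)]
enum PotentialAddition {
    Anyhow,      // room for more candidates regardless of connectivity
    IfConnected, // only one slot left (assumed variant, for illustration)
    None,
}

/// `max_depth` counts candidates *on top of* the one that synchronous backing
/// would already allow, so the effective limit is `max_depth + 1`.
fn can_add_candidate(chain_len: usize, unconnected: usize, max_depth: usize) -> PotentialAddition {
    let total = chain_len + unconnected;
    if total < max_depth {
        PotentialAddition::Anyhow
    } else if total == max_depth {
        PotentialAddition::IfConnected
    } else {
        PotentialAddition::None
    }
}

fn main() {
    // max_depth = 0 corresponds to synchronous backing: an empty chain still
    // accepts one (connected) candidate, because 0 == max_depth.
    assert_eq!(can_add_candidate(0, 0, 0), PotentialAddition::IfConnected);
    // With that one candidate in place, nothing more is accepted.
    assert_eq!(can_add_candidate(1, 0, 0), PotentialAddition::None);
}
```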
polkadot/node/core/prospective-parachains/src/fragment_chain/mod.rs (outdated review thread, resolved)
let Some(relay_parent) = self.scope.ancestor(relay_parent) else { return false };

if relay_parent.number < earliest_rp.number {
	return false // relay parent moved backwards.
Does it make sense to log a trace here, just in case, for debugging?
I don't think it helps, because this function is also called when pruning the candidate storage on a new leaf update.
Therefore, it's bound to happen most of the time under normal circumstances.
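For context, here's a hedged sketch of the invariant the snippet above checks: the candidate's relay parent must be known to the scope and must not be earlier than the relay parent it is compared against. The `Scope` type and the hash/number stand-ins are simplified assumptions, not the real `fragment_chain` types:

```rust
use std::collections::HashMap;

type RelayHash = u64; // illustrative stand-in for a real hash type

/// Simplified scope: the relay parents a fragment chain may use, with their numbers.
struct Scope {
    ancestors: HashMap<RelayHash, u32>,
}

impl Scope {
    fn ancestor(&self, hash: &RelayHash) -> Option<u32> {
        self.ancestors.get(hash).copied()
    }
}

/// Mirrors the check above: unknown relay parents are rejected, and the relay
/// parent may not move backwards relative to the earlier one.
/// Per the discussion above, this also returns false during routine pruning on
/// leaf updates, which is why a trace log here would mostly be noise.
fn relay_parent_ok(scope: &Scope, relay_parent: RelayHash, earliest_rp_number: u32) -> bool {
    let Some(number) = scope.ancestor(&relay_parent) else { return false };
    number >= earliest_rp_number
}

fn main() {
    let scope = Scope { ancestors: HashMap::from([(0xa, 10), (0xb, 11)]) };
    assert!(relay_parent_ok(&scope, 0xb, 10));  // same or newer relay parent: fine
    assert!(!relay_parent_ok(&scope, 0xa, 11)); // moved backwards: rejected
    assert!(!relay_parent_ok(&scope, 0xc, 10)); // out of scope: rejected
}
```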
/// access to the parent's HeadData. Can be retried once the candidate outputting this head data is
/// seconded.
#[derive(Debug, Clone, Eq, PartialEq, Hash)]
pub struct BlockedCollationId {
Is this really an ID? Maybe just BlockedCollation?
It's only the key used in the HashMap, hence it's an ID. It maps to a fetched collation.
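To illustrate the naming discussion, a rough sketch of how such an ID can serve as a `HashMap` key for blocked collations. The field names, the `BlockedCollation` value type and the hash/para-id stand-ins are assumptions for illustration only, not the exact definitions in this PR:

```rust
use std::collections::HashMap;

// Hypothetical simplified types, for illustration only.
type HeadHash = [u8; 32];
type ParaId = u32;

/// Identifies a collation that is blocked on its parent's HeadData.
/// It is only used as a lookup key, hence the `Id` suffix.
#[derive(Debug, Clone, Copy, Eq, PartialEq, Hash)]
struct BlockedCollationId {
    para_id: ParaId,
    parent_head_data_hash: HeadHash,
}

/// Placeholder for the fetched collation data the key points to.
struct BlockedCollation {
    pov_bytes: Vec<u8>,
}

fn main() {
    let mut blocked: HashMap<BlockedCollationId, Vec<BlockedCollation>> = HashMap::new();

    let key = BlockedCollationId { para_id: 2000, parent_head_data_hash: [0u8; 32] };
    blocked.entry(key).or_default().push(BlockedCollation { pov_bytes: vec![] });

    // Once the parent candidate is seconded and its head data becomes known,
    // the entry can be drained and the blocked collations retried.
    if let Some(ready) = blocked.remove(&key) {
        let total: usize = ready.iter().map(|c| c.pov_bytes.len()).sum();
        println!("retrying {} blocked collation(s), {total} bytes total", ready.len());
    }
}
```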
prdoc/pr_4035.prdoc (outdated review thread)
- audience: Node Dev
  description: |
    Reworks prospective-parachains so that we allow a number of unconnected candidates (for which we don't yet know
    the parent candidate). Needed for elastic scaling. Also simplifies it to not allow parachain forks and cycles.
I'd first mention that this fundamentally changes what information the subsystem stores and operates on: from a tree to just a chain and a bunch of unconnected candidates.
Reworked the prdoc, let me know if it sounds better now.
Good job! Left some nits, couldn't find any problems with it.
polkadot/node/core/prospective-parachains/src/fragment_chain/mod.rs (outdated review thread, resolved)
if self.is_fork_or_cycle(
	candidate.parent_head_data_hash(),
	Some(candidate.output_head_data_hash()),
) {
	continue
We don't expect this to happen, right? So let's log a debug if it does.
Well, it can happen if there's a parachain fork or cycle on another relay chain fork. The candidate storage holds all candidates across all relay parents, so when populating the chain we may attempt to introduce a candidate that is present on another relay chain fork (but those candidates will probably not pass these sanity checks). It doesn't necessarily mean something's wrong.
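For readers unfamiliar with the terminology, here is a hedged sketch of what a fork/cycle check along these lines can look like. The `ChainView` structure and hash stand-ins are simplified assumptions, not the actual `FragmentChain` internals:

```rust
use std::collections::HashSet;

type HeadDataHash = u64; // simplified stand-in for a real hash type

/// Minimal chain view: which parent head-data hashes are already extended,
/// and which head-data hashes already appear somewhere in the chain.
struct ChainView {
    extended_parents: HashSet<HeadDataHash>,
    seen_outputs: HashSet<HeadDataHash>,
}

impl ChainView {
    /// A fork: some candidate in the chain already builds on this parent.
    /// A cycle: the candidate's output head data re-appears earlier in the
    /// chain (or the candidate outputs its own parent).
    fn is_fork_or_cycle(&self, parent: HeadDataHash, output: Option<HeadDataHash>) -> bool {
        if self.extended_parents.contains(&parent) {
            return true // fork
        }
        match output {
            Some(out) if out == parent => true,                    // self-cycle
            Some(out) if self.seen_outputs.contains(&out) => true, // longer cycle
            _ => false,
        }
    }
}

fn main() {
    let chain = ChainView {
        extended_parents: [1].into_iter().collect(),
        seen_outputs: [1, 2].into_iter().collect(),
    };
    // A candidate building on head 1 would fork the existing chain.
    assert!(chain.is_fork_or_cycle(1, Some(3)));
    // A candidate building on head 3 and outputting head 1 would close a cycle.
    assert!(chain.is_fork_or_cycle(3, Some(1)));
    // A candidate building on head 2 with a fresh output is fine.
    assert!(!chain.is_fork_or_cycle(2, Some(4)));
}
```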
Excellent work @alindima
* master:
  - improve MockValidationDataInherentDataProvider to support async backing (#4442)
  - Bump `proc-macro-crate` to the latest version (#4409)
  - [ci] Run check-runtime-migration in GHA (#4441)
  - prospective-parachains rework (#4035)
  - [ci] Add forklift to GHA ARC (#4372)
  - `CheckWeight` SE: Check for extrinsic length + proof size combined (#4326)
  - Add generate and verify logic for `AncestryProof` (#4430)
  - Rococo AH: undeploy trie migration (#4414)
  - Remove `substrate-frame-cli` (#4403)
  - migrations: `take()` should consume read and write operation weight (#4302)
  - `remote-externalities`: store block header in snapshot (#4349)
  - xcm-emulator: Use `BlockNumberFor` instead of `parachains_common::BlockNumber=u32` (#4434)
  - Remove `pallet::getter` usage from authority-discovery pallet (#4091)
  - Remove pallet::getter usage from pallet-contracts-mock-network (#4417)
  - Add docs to request_core_count (#4423)
Are fragment chains backwards compatible with fragment trees? Should we do the corresponding updates in KAGOME ASAP?

Fragment trees would maintain parachain forks of backable candidates. In the end, the provisioner subsystem on the block author would only choose the first one to back. Now, fragment chains enable validators to not maintain parachain forks. If the block author gets two backable candidates that have the same parent, it will not even fetch/store the second one. We make the assumption that parachains shouldn't be creating forks (often or ever) and, if they do, they won't utilise the full throughput. This shouldn't break compatibility with nodes that still hold fragment trees. We don't expect every validator to use this implementation right away. That being said, this PR is more than a switch from fragment trees to fragment chains. It enables parallel validation of candidates of the same para across different backing groups (needed for elastic scaling).
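A minimal sketch of the "one backable candidate per parent" behaviour described above, assuming a simplified per-parent store (the types and names here are illustrative, not the provisioner's or backing subsystem's actual API):

```rust
use std::collections::HashMap;

type HeadDataHash = u64;
type CandidateHash = u64;

/// Simplified stand-in: per parent head data, keep only the first backable
/// candidate seen; a later fork for the same parent is not fetched or stored.
struct BackableStore {
    per_parent: HashMap<HeadDataHash, CandidateHash>,
}

impl BackableStore {
    fn should_fetch(&mut self, parent: HeadDataHash, candidate: CandidateHash) -> bool {
        if self.per_parent.contains_key(&parent) {
            false // a candidate for this parent is already known: skip the fork
        } else {
            self.per_parent.insert(parent, candidate);
            true
        }
    }
}

fn main() {
    let mut store = BackableStore { per_parent: HashMap::new() };
    assert!(store.should_fetch(10, 0xaaa));
    // A second backable candidate on the same parent is a fork: not fetched.
    assert!(!store.should_fetch(10, 0xbbb));
}
```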
Reworks prospective-parachains so that we allow a number of unconnected candidates (for which we don't know the parent candidate yet). Needed for elastic scaling: paritytech#3541. Without this, candidate B will not be validated and backed until candidate A (its parent) is validated and a backing statement reaches the validator.

Due to the high complexity of the subsystem, I rewrote parts of it so that we don't concern ourselves with candidates which form cycles or which form parachain forks. We now have "fragment chains" instead of "fragment trees". This greatly simplifies some of the code and is a compromise we can make. We just need to make sure that cycle-producing parachains don't brick the relay chain and that fork-producing parachains can still make some progress (on one core at least). The only forks that are allowed are those on the relay chain, obviously.

Unconnected candidates are kept in the `CandidateStorage` and, whenever a new candidate is introduced, we try to repopulate the chain with as many candidates as we can.

Also fixes paritytech#3219. Guide changes will be done as part of: paritytech#3699

TODOs:
- [x] see if we can replace the `Cow` over the candidate commitments with an `Arc` over the entire `ProspectiveCandidate`. It's only being overwritten in unit tests. We can work around that.
- [x] finish fragment_chain unit tests
- [x] add more prospective-parachains subsystem tests
- [x] test with zombienet what happens if a parachain is creating cycles (it should not brick the relay chain)
- [x] test with zombienet a parachain that is creating forks. It should keep producing blocks from time to time (one bad collator should not DOS the parachain, even if throughput decreases)
- [x] add some more logs and metrics
- [x] add prdoc and remove the "silent" label

---------

Signed-off-by: Andrei Sandu <[email protected]>
Co-authored-by: Andrei Sandu <[email protected]>
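A rough sketch of the "repopulate the chain from unconnected candidates" idea from the description, under simplified assumptions (the types below are illustrative stand-ins, not the subsystem's actual `CandidateStorage`/`FragmentChain`; the real code additionally rejects forks and cycles, as discussed above):

```rust
use std::collections::HashMap;

type HeadDataHash = u64;

/// A candidate is identified here only by the head data it builds on and the
/// head data it outputs (simplified; the real types carry much more).
#[derive(Clone, Copy)]
struct Candidate {
    parent_head: HeadDataHash,
    output_head: HeadDataHash,
}

/// Unconnected candidates, keyed by the parent head data they require.
struct CandidateStorage {
    by_parent: HashMap<HeadDataHash, Candidate>,
}

/// Extend the chain as far as possible: every time the chain tip's output head
/// data matches an unconnected candidate's parent, attach it and repeat,
/// bounded by the allowed chain length.
fn repopulate(chain: &mut Vec<Candidate>, mut tip: HeadDataHash, storage: &CandidateStorage, max_len: usize) {
    while chain.len() < max_len {
        match storage.by_parent.get(&tip) {
            Some(next) => {
                chain.push(*next);
                tip = next.output_head;
            },
            None => break,
        }
    }
}

fn main() {
    let storage = CandidateStorage {
        by_parent: HashMap::from([
            (1, Candidate { parent_head: 1, output_head: 2 }),
            (2, Candidate { parent_head: 2, output_head: 3 }),
        ]),
    };
    let mut chain = Vec::new();
    // Starting from head 1, both previously-unconnected candidates connect.
    repopulate(&mut chain, 1, &storage, 4);
    assert_eq!(chain.len(), 2);
    assert_eq!(chain[0].parent_head, 1);
}
```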
Good update
Makes paritytech#4035 easier to review
Resolves #4800

# Problem

In #4035, we removed support for parachain forks and cycles and added support for backing unconnected candidates (candidates for which we don't yet know the full path to the latest included block), which is useful for elastic scaling (parachains using multiple cores). Removing support for backing forks turned out to be a bad idea, as there are legitimate cases for a parachain to fork (if it has another consensus mechanism, for example BABE or PoW). This leads to validators getting lower backing rewards (depending on whether they back the winning fork or not) and higher pressure on only half of the backing group (during availability-distribution, for example). Since we don't yet have approval voting rewards, backing rewards are a pretty big deal (which may change in the future).

# Description

A backing group is now allowed to back forks. Once a candidate becomes backed (has the minimum backing votes), we don't accept new forks unless they adhere to the new fork selection rule (have a lower candidate hash). This helps with keeping the implementation simpler, since forks will only be taken into account for candidates which are not backed yet (only seconded). Having this fork selection rule also helps with reducing the work backing validators need to do, since they have a shared way of picking the winning fork. Once they see a candidate backed, they can all decide to back a fork and not accept new ones. But they still accept new ones during the seconding phase (until the backing quorum is reached). Therefore, a block author which is not part of the backing group will likely not even see the forks (only the winning one).

Just as before, a parachain producing forks will still not be able to leverage elastic scaling but will still work with a single core. Also, cycles are still not accepted.

## Some implementation details

`CandidateStorage` is no longer a subsystem-wide construct. It was previously holding candidates from all relay chain forks and complicated the code. Each fragment chain now holds its candidate chain and its potential candidates. This should not increase the storage consumption since the heavy candidate data is already wrapped in an Arc and shared. It does, however, allow for great simplifications and increases readability.

`FragmentChain`s are now only creating a chain with backed candidates and the fork selection rule. As said before, `FragmentChain`s are now also responsible for maintaining their own potential candidate storage.

Since we no longer have the subsystem-wide `CandidateStorage`, when getting a new leaf update, we use the storage of our latest ancestor, which may contain candidates seconded/backed that are still in scope.

When a candidate is backed, the fragment chains which hold it are recreated (due to the fork selection rule, it could trigger a "reorg" of the fragment chain).

I generally tried to simplify the subsystem and not introduce unnecessary optimisations that would otherwise complicate the code and not gain us much (fragment chains wouldn't realistically ever hold many candidates).

TODO:
- [x] update metrics
- [x] update docs and comments
- [x] fix and add unit tests
- [x] tested with fork-producing parachain
- [x] tested with cycle-producing parachain
- [x] versi test
- [x] prdoc
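A hedged sketch of the fork selection rule described above (lower candidate hash wins). The types and function names are simplified assumptions for illustration; the point is that the comparison is deterministic, so all backing validators converge on the same fork without coordination:

```rust
use std::cmp::Ordering;

type CandidateHash = [u8; 32]; // simplified stand-in

/// Fork selection rule as described above: among candidates competing for the
/// same parent, the one with the lower candidate hash wins. Every validator
/// applies the same comparison, so they all pick the same fork.
fn fork_selection_rule(a: &CandidateHash, b: &CandidateHash) -> Ordering {
    a.cmp(b)
}

/// Whether a newly seen fork should replace the current (not yet backed) one.
fn should_replace(current: &CandidateHash, new: &CandidateHash) -> bool {
    fork_selection_rule(new, current) == Ordering::Less
}

fn main() {
    let current = [0xff; 32];
    let lower = [0x01; 32];
    assert!(should_replace(&current, &lower));
    assert!(!should_replace(&lower, &current));
}
```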