add elastic scaling MVP guide #4663
Conversation
//! 1. **A parachain can use at most 3 cores at a time.** This limitation stems from the fact that
//!    every parablock has an execution timeout of 2 seconds and the relay chain block authoring
//!    takes 6 seconds. Therefore, assuming parablock authoring is sequential, a collator only has
//!    enough time to build 3 candidates in a relay chain slot.
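For context, the 3-candidate limit in the quoted passage is just the ratio of the two timings; a minimal illustrative sketch (constant names are not from the guide):

```rust
// Illustrative arithmetic only: a 6s relay chain slot divided by the 2s
// parablock execution timeout leaves room for at most 3 sequentially
// authored candidates per relay chain block.
const RELAY_CHAIN_SLOT_MILLIS: u32 = 6_000;
const PARABLOCK_EXECUTION_TIMEOUT_MILLIS: u32 = 2_000;
const MAX_SEQUENTIAL_CANDIDATES: u32 =
    RELAY_CHAIN_SLOT_MILLIS / PARABLOCK_EXECUTION_TIMEOUT_MILLIS; // = 3
```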
This assumes that using the full 2s of execution is the only use case; it is also possible to use little computation but reach the PoV limit.
Yeah, I created this guide assuming that parachains wanting to use multiple cores would do so to achieve higher throughput, but it can also be used to achieve lower latency (at least to inclusion in a candidate). I'll rephrase.
> higher throughput

can also mean more data.
//! 1. Increase the `BLOCK_PROCESSING_VELOCITY` to the desired value. In this example, 3.
//!
//! ```rust
//! const BLOCK_PROCESSING_VELOCITY: u32 = 3;
Suggested change:
- //! const BLOCK_PROCESSING_VELOCITY: u32 = 3;
+ //! const BLOCK_PROCESSING_VELOCITY: u32 = (RELAY_CHAIN_SLOT_TIME / MIN_SLOT_DURATION);
use docify please :)
//! 2. Decrease the `MILLISECS_PER_BLOCK` to the desired value. In this example, 2000.
//!
//! ```rust
//! const MILLISECS_PER_BLOCK: u32 = 2000;
Suggested change:
- //! const MILLISECS_PER_BLOCK: u32 = 2000;
+ //! const MILLISECS_PER_BLOCK: u32 = MIN_SLOT_DURATION;
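Taken together, the two suggestions above tie the parachain's timing constants to the relay chain slot; a minimal sketch assuming the reviewer's proposed derivation (`RELAY_CHAIN_SLOT_TIME` and `MIN_SLOT_DURATION` are placeholder names from the suggestions, not identifiers that exist in the template today):

```rust
// Sketch only: with a 6s relay chain slot and a 2s parachain slot, the
// parachain produces 3 blocks per relay chain block.
const RELAY_CHAIN_SLOT_TIME: u32 = 6_000; // ms
const MIN_SLOT_DURATION: u32 = 2_000; // ms
const MILLISECS_PER_BLOCK: u32 = MIN_SLOT_DURATION;
const BLOCK_PROCESSING_VELOCITY: u32 = RELAY_CHAIN_SLOT_TIME / MIN_SLOT_DURATION; // = 3
```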
//!
//! **This guide assumes full familiarity with Asynchronous Backing and its terminology, as defined
//! in <https://wiki.polkadot.network/docs/maintain-guides-async-backing>.
//! Furthermore, the parachain should have already been upgraded according to the guide.**
You can also link to #4363 once it is merged.
Moreover, I think you can also benefit a bit from suggestions similar to #4363 (comment).
//! still [work in progress](https://github.com/paritytech/polkadot-sdk/issues/1829).
//! Below are described the current limitations of the MVP:
//!
//! 1. **Limited core count**. Parachain block authoring is sequential, so the second block will
How do we know that these 3 para-blocks are still valid when imported in 3 parallel cores?

For example, there are 2 tx in each parablock. The collator proposes `[t1, t2, t3, t4, t5, t6]` and they are all valid. But the validity of `t6` depends on the execution of `t1`. When imported in 3 cores, `t1` and `t6` are no longer present together.

In general, I would assume all of this to be fixed in the cumulus block building code. My question is, does it?
These 3 blocks are expected to form a chain; the ones that don't will not be included.
> These 3 blocks are expected to form a chain; the ones that don't will not be included.

Yes, and a candidate will not be included until all of its ancestors are included. If one ancestor is not included (times out availability) or is concluded invalid via a dispute, all of its descendants will also be evicted from the cores. So we only deal with candidate chains.
Sorry, I still don't get this.

@sandreim if they form a chain, and part of the chain is executed in one core and part of it in another core, how does either of the cores check that the whole thing is a chain?

In my example, `[t1, t2, t3, t4, t5, t6]`: `[t1, t2, t3]` goes into one core, `[t4, t5, t6]` into another. The whole `[t1 -> t6]` indeed forms a chain, and execution of t5 depends on the execution of t2.

Perhaps what you mean to say is that the transactions that go into different cores must in fact be independent of one another?
The transactions are not independent. We achieve parallel execution even in that case, and still check they form a chain by passing in the appropriate validation inputs (`PersistedValidationData`).
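For reference, the linked struct is `PersistedValidationData` from polkadot-primitives; roughly (reproduced from memory, check the source for the authoritative definition):

```rust
// Approximate excerpt from polkadot-primitives: the validation inputs passed
// alongside each candidate. `parent_head` is what lets a core validate a block
// whose parent was occupying a different core, since validation starts from
// that parent's post-state.
pub struct PersistedValidationData<H = Hash, N = BlockNumber> {
    /// The parent head-data.
    pub parent_head: HeadData,
    /// The relay-chain block number this is in the context of.
    pub relay_parent_number: N,
    /// The relay-chain block storage root this is in the context of.
    pub relay_parent_storage_root: H,
    /// The maximum legal size of a POV block, in bytes.
    pub max_pov_size: u32,
}
```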
Is this the answer? `[t1, t2, t3]` goes into one core, `[t4, t5, t6]` into another, but the PoV of the latter contains the full execution of the former?

I think this is fine, but truthfully, to scale up, I think different transactions going into different cores must be independent, or else the system can only scale as much as you can jack up one collator.
> but the PoV of the latter contains the full execution of the former?

The PoV of `[t4, t5, t6]` would refer to the state post `[t1, t2, t3]` execution.

> I think different transactions going into different cores must be independent, or else the system can only scale as much as you can jack up one collator

One way of achieving that without jacking up one collator would be to have a DAG instead of a blockchain (two blocks having the same parent state). But then you'd need to somehow ensure they are truly independent. This could be done by e.g. specifying dependencies in the transactions themselves (à la Solana or Ethereum access lists).

Another way would be to rely on the multiple CPU cores of a collator and implement execution on the collator side differently, with optimistic concurrency control (à la Monad). This only requires modification on the collator side and does not affect the transaction format.
Okay, thanks @ordian.

I totally agree with all of your directions as well. I am not sure if you have seen it or not, but my MSc thesis was on the same topic 🙈 https://github.com/kianenigma/SonicChain. I think what I have done there is similar to access lists, and it should be quite easy to add to FRAME and Substrate: each tx declares, via its code author, which storage keys it "thinks" it will access. Then the collators can easily agree among themselves to collate non-conflicting transactions.

This is a problem that is best solved from the collator side, and once there is a lot of demand. Polkadot is already doing what it should do, and should not do any "magic" to handle this.

Once there is more demand:

- Either collators just jack up, as they kinda are expected to do now. This won't scale a lot, but it will for a bit.
- I think the access-list approach is super cool and will scale (see the sketch after this list).
- OCC is fancy but similarly doesn't scale, because there are only so many CPU cores, and you are still bound to one collator somehow filling up 8 Polkadot cores. Option 2 is much more powerful, because you can enable 8 collators to fill 8 blocks simultaneously.
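A minimal sketch of the access-list idea described above, assuming a hypothetical scheme where each transaction declares the storage keys it expects to read and write (none of these types exist in FRAME today; this is just an illustration of the conflict check collators could agree on):

```rust
// Hypothetical sketch: transactions declare the storage keys they expect to
// touch, and a collator only bundles transactions whose declared accesses do
// not conflict with anything already selected for the block it is building.
use std::collections::HashSet;

struct DeclaredAccess {
    reads: HashSet<Vec<u8>>,
    writes: HashSet<Vec<u8>>,
}

// Two transactions conflict if one writes a key the other reads or writes.
fn conflicts(a: &DeclaredAccess, b: &DeclaredAccess) -> bool {
    a.writes.iter().any(|k| b.reads.contains(k) || b.writes.contains(k))
        || b.writes.iter().any(|k| a.reads.contains(k))
}

/// Greedily select transactions that do not conflict with anything selected so far.
fn select_non_conflicting(txs: Vec<DeclaredAccess>) -> Vec<DeclaredAccess> {
    let mut selected: Vec<DeclaredAccess> = Vec::new();
    for tx in txs {
        if selected.iter().all(|s| !conflicts(s, &tx)) {
            selected.push(tx);
        }
    }
    selected
}
```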
> OCC is fancy but similarly doesn't scale, because there are only so many CPU cores, and you are still bound to one collator somehow filling up 8 Polkadot cores. Option 2 is much more powerful, because you can enable 8 collators to fill 8 blocks simultaneously.

I agree here only partially. First, you can't produce (para)blocks at a rate faster than collators/full-nodes can import them, unless they are not checking everything themselves. But even if they are not checking, this assumes that the bottleneck will be CPU and not storage/IO, which is not currently the case. Even with NOMT and other future optimizations, you can't accept transactions faster than you can modify the state. You need to know the latest state in order to check transactions, unless we're talking about sharding the parachain's state itself.

Another argument is that single-threaded performance is going to reach a plateau eventually (whether it's Moore's law or physics), and nowadays even smartphones have 8 cores, so why not utilize them all instead of doing everything single-threaded?

That being said, I think options 2 and 3 are composable; you can do both.
The current status quo is that we rely on 1 (beefy collators). 2 can certainly scale well, but it seems complicated and is not really compatible with the relay chain, which expects chains, not a DAG. #4696 (comment) shows the limitations of what is possible with reference hardware and 5 collators.

We did a nice brainstorming session with @skunert and @eskimor on the subject some time ago. We think the best way forward is to implement a transaction streaming mechanism: at the beginning of each slot, the block author sends transactions to the next block author as it pushes them into the current block. By the time it announces the block, the next author should already have all state changes applied, so it doesn't need to wait to import the block and can immediately start building its own. And so on.

If that is not enough, the next block author can start to speculatively build its next block, updating the transactions and state as it learns what the current author is putting in its block.
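A rough sketch of the streaming idea under the assumptions above (names and channel choice are made up; this is only meant to show the shape of the hand-off, not an actual collator API):

```rust
// Hypothetical sketch: the current author forwards each transaction to the
// next author as it pushes it into the block under construction, so the next
// author can pre-apply state changes instead of waiting for block import.
use std::sync::mpsc;

type Transaction = Vec<u8>;

/// Current block author: build the block and stream each included transaction.
fn author_block(txs: Vec<Transaction>, to_next_author: mpsc::Sender<Transaction>) {
    for tx in txs {
        // ... push `tx` into the block being built (omitted) ...
        let _ = to_next_author.send(tx);
    }
}

/// Next block author: eagerly apply incoming transactions to a speculative
/// state, so importing the announced block is (almost) free.
fn next_author(from_current_author: mpsc::Receiver<Transaction>) {
    for _tx in from_current_author {
        // ... apply `_tx` to the speculative state (omitted) ...
    }
}
```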
It seems best to me for you to first coordinate with #4363, possibly push it to completion if Radha is not available, and then build this on top of it.
That's a good suggestion. I think we should align on better terminology and minimize the number of changes required to enable these features.
Applied changes to #4363. Needs one more approval for merge.
Would be happy to review this once it is updated based on the previous guides.
Ok, I thought about using docify here. But how can I, considering that the parachain template is not updated for elastic scaling yet? (And we don't plan to update it yet, as it's still an experimental MVP.)
Another problem I see with docify is that it only works on types; you cannot annotate and import blocks of code (unless you define artificial functions for them).
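For illustration, a rough sketch of the "artificial function" workaround mentioned above, assuming docify's `export` attribute and `embed!` macro as used elsewhere in this repo (the file path and item name here are made up):

```rust
// Wrap the snippet in an exportable item so docify can capture it.
#[docify::export]
fn elastic_scaling_example_constants() {
    const MILLISECS_PER_BLOCK: u32 = 2000;
    const BLOCK_PROCESSING_VELOCITY: u32 = 3;
}

// Then, in the guide's doc comment, something along the lines of:
// #[doc = docify::embed!("./src/elastic_scaling_example.rs", elastic_scaling_example_constants)]
```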
I tried using docify as much as it made sense. Please review.
Resolves #4468
Gives instructions to parachain teams on how to enable the elastic scaling MVP.
Still a draft because it depends on further changes we make to the slot-based collator: #4097
Parachains cannot use this yet because the collator was not released and no relay chain network has been configured for elastic scaling yet.