
add elastic scaling MVP guide #4663

Merged · 7 commits merged into master from alindima/elastic-scaling-mvp-guide on Jul 17, 2024

Conversation

alindima (Contributor):

Resolves #4468

Gives instructions to parachain teams on how to enable the elastic scaling MVP.

Still a draft because it depends on further changes we make to the slot-based collator: #4097

Parachains cannot use this yet because the collator has not been released and no relay chain network has been configured for elastic scaling yet.

alindima added the R0-silent (Changes should not be mentioned in any release notes) and T11-documentation (This PR/Issue is related to documentation) labels, May 31, 2024
alindima marked this pull request as draft, May 31, 2024 14:44
Comment on lines 24 to 27
```rust
//! 1. **A parachain can use at most 3 cores at a time.** This limitation stems from the fact that
//! every parablock has an execution timeout of 2 seconds and the relay chain block authoring
//! takes 6 seconds. Therefore, assuming parablock authoring is sequential, a collator only has
//! enough time to build 3 candidates in a relay chain slot.
```
Contributor:

This assumes that using the full 2s of execution is the only use case; it is also possible to use little computation but reach the PoV limit.

alindima (Contributor, author):

Yeah, I created this guide assuming parachains that want to use multiple cores would do so to achieve higher throughput, but it can also be used to achieve lower latency (at least to inclusion in a candidate). I'll rephrase.

Member:

> higher throughput,

can also mean more data.
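To make the arithmetic behind the quoted limitation concrete, a minimal sketch (the constant names here are illustrative, not identifiers from the codebase):

```rust
// Illustrative constants only, not identifiers from the codebase. With a
// 6-second relay-chain slot and a 2-second parablock execution budget,
// sequential authoring fits at most 3 candidates per relay-chain slot.
const RELAY_CHAIN_SLOT_MILLIS: u32 = 6_000;
const PARABLOCK_EXECUTION_TIMEOUT_MILLIS: u32 = 2_000;
const MAX_CANDIDATES_PER_SLOT: u32 =
    RELAY_CHAIN_SLOT_MILLIS / PARABLOCK_EXECUTION_TIMEOUT_MILLIS; // = 3
```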

```rust
//! 1. Increase the `BLOCK_PROCESSING_VELOCITY` to the desired value. In this example, 3.
//!
//! ```rust
//! const BLOCK_PROCESSING_VELOCITY: u32 = 3;
//! ```
```
Contributor:

Suggested change:
```diff
-//! const BLOCK_PROCESSING_VELOCITY: u32 = 3;
+//! const BLOCK_PROCESSING_VELOCITY: u32 = (RELAY_CHAIN_SLOT_TIME / MIN_SLOT_DURATION);
```

Contributor:

use docify please :)
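A hedged sketch of how the suggested formula would evaluate, assuming the values discussed in this thread; `RELAY_CHAIN_SLOT_TIME` and `MIN_SLOT_DURATION` follow the naming in the suggestion above and are not verified identifiers:

```rust
// Names follow the suggestion above; they are assumptions, not verified
// identifiers. With a 6000 ms relay-chain slot and a 2000 ms minimum
// parachain slot, the velocity works out to 3 parablocks per relay block.
const RELAY_CHAIN_SLOT_TIME: u32 = 6_000; // ms
const MIN_SLOT_DURATION: u32 = 2_000; // ms
const BLOCK_PROCESSING_VELOCITY: u32 = RELAY_CHAIN_SLOT_TIME / MIN_SLOT_DURATION; // = 3
```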

```rust
//! 2. Decrease the `MILLISECS_PER_BLOCK` to the desired value. In this example, 2000.
//!
//! ```rust
//! const MILLISECS_PER_BLOCK: u32 = 2000;
//! ```
```
Contributor:

Suggested change:
```diff
-//! const MILLISECS_PER_BLOCK: u32 = 2000;
+//! const MILLISECS_PER_BLOCK: u32 = MIN_SLOT_DURATION;
```
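In typical parachain runtimes the block-time constant also feeds the slot duration. A sketch of how this step might look in context, following the common parachain-template pattern (the `SLOT_DURATION` tie-in is an assumption here; verify against your own runtime):

```rust
// Follows the common parachain-template pattern (types as in the template);
// verify against your own runtime. A 2000 ms block time means the parachain
// attempts to author a block every 2 seconds, i.e. 3 blocks per 6-second
// relay-chain slot.
pub const MILLISECS_PER_BLOCK: u64 = 2_000;
pub const SLOT_DURATION: u64 = MILLISECS_PER_BLOCK;
```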

```rust
//!
//! **This guide assumes full familiarity with Asynchronous Backing and its terminology, as defined
//! in <https://wiki.polkadot.network/docs/maintain-guides-async-backing>.
//! Furthermore, the parachain should have already been upgraded according to the guide.**
```
kianenigma (Contributor), Jun 4, 2024:

You can also link to #4363 once it is merged.

Moreover, I think you can also benefit a bit from suggestions similar to #4363 (comment).

PTAL: https://paritytech.github.io/polkadot-sdk/master/polkadot_sdk_docs/meta_contributing/index.html#why-rust-docs

```rust
//! still [work in progress](https://github.com/paritytech/polkadot-sdk/issues/1829).
//! Below are described the current limitations of the MVP:
//!
//! 1. **Limited core count**. Parachain block authoring is sequential, so the second block will
```
Contributor:

How do we know that these 3 para-blocks are still valid when imported in 3 parallel cores?

For example, say there are two transactions in each parablock. The collator proposes [t1, t2, t3, t4, t5, t6] and they are all valid, but the validity of t6 depends on the execution of t1. When imported in 3 cores, t1 and t6 are no longer present in the same core's validation.

In general, I would assume all of this to be fixed in the cumulus block-building code. My question is: does it?

Contributor:

These 3 blocks are expected to form a chain; the ones that don't will not be included.

alindima (Contributor, author):

> These 3 blocks are expected to form a chain; the ones that don't will not be included.

Yes, and a candidate will not be included until all of its ancestors are included. If one ancestor is not included (times out on availability) or is concluded invalid via a dispute, all of its descendants will also be evicted from the cores. So we only deal with candidate chains.

Contributor:

Sorry, I still don't get this.

@sandreim if they form a chain, and part of the chain is executed in one core and part of it in another core, how does either of the cores check that the whole thing is a chain?

In my example, [t1, t2, t3, t4, t5, t6], [t1, t2, t3] goes into one core and [t4, t5, t6] into another. The whole [t1 -> t6] indeed forms a chain, and the execution of t5 depends on the execution of t2.

Perhaps what you mean to say is that the transactions that go into different cores must in fact be independent of one another?

sandreim (Contributor), Jul 22, 2024:

The transactions are not independent. We achieve parallel execution even in that case, and still check they form a chain, by passing in the appropriate validation inputs (`PersistedValidationData`). We can validate t2 because we already have the parent head data of t1 from the collator of t2. So we can correctly construct the inputs, and the PoV contains the right data (t2 was built after t1 by the collator).
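For reference, the struct referred to above is defined in polkadot-primitives roughly as follows (a sketch from the primitives crate around this PR's timeframe; check the source for the authoritative definition). Its `parent_head` field is what lets a validator check that a candidate chains off its parent:

```rust
/// Persisted validation data, passed to the parachain validation function.
/// Sketch of the polkadot-primitives definition; `Hash`, `BlockNumber`, and
/// `HeadData` are polkadot-primitives types.
pub struct PersistedValidationData<H = Hash, N = BlockNumber> {
    /// The parent head-data: lets validation check that the candidate builds on it.
    pub parent_head: HeadData,
    /// The relay-chain block number this is in the context of.
    pub relay_parent_number: N,
    /// The relay-chain block storage root this is in the context of.
    pub relay_parent_storage_root: H,
    /// The maximum legal size of a PoV block, in bytes.
    pub max_pov_size: u32,
}
```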

Contributor:

Is this the answer?

[t1, t2, t3] goes into one core and [t4, t5, t6] into another, but the PoV of the latter contains the full execution of the former?

I think this is fine, but truthfully, to scale up, I think different transactions going into different cores must be independent, or else the system can only scale as much as you can jack up one collator.

ordian (Member), Jul 22, 2024:

> but the PoV of the latter contains the full execution of the former?

The PoV of [t4, t5, t6] would refer to the state post [t1, t2, t3] execution.

> I think different transactions going into different cores must be independent, or else the system can only scale as much as you can jack up one collator

One way of achieving that without jacking up one collator would be to have a DAG instead of a blockchain (two blocks having the same parent state). But then you'd need to somehow ensure they are truly independent. This could be done by, e.g., specifying dependencies in the transactions themselves (à la Solana, or Ethereum access lists).

Another way would be to rely on the multiple CPU cores of a collator and implement execution on the collator side differently, with optimistic concurrency control (à la Monad). This only requires modifications on the collator side and does not affect the transaction format.

Contributor:

Okay, thanks @ordian.

I totally agree with all of your directions as well. I am not sure if you have seen it or not, but my MSc thesis was on the same topic 🙈 https://github.com/kianenigma/SonicChain. I think what I have done there is similar to access lists, and it should be quite easy to add to FRAME and Substrate: each tx declares, via its code author, what storage keys it "thinks" it will access. Then the collators can easily agree among themselves to collate non-conflicting transactions.

This is a problem that is best solved from the collator side, and once there is a lot of demand. Polkadot is already doing what it should do, and should not do any "magic" to handle this.

Once there is more demand:

  1. Either collators just jack up, as they kinda are expected to do now. This won't scale a lot, but it will for a bit.
  2. I think the access list stuff is super cool and will scale (see the sketch after this list).
  3. OCC is fancy but similarly doesn't scale, because there are only so many CPU cores, and you are still bound to one collator somehow filling up 8 Polkadot cores. Option 2 is much more powerful, because you can enable 8 collators to fill 8 blocks simultaneously.
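A minimal sketch of the access-list idea from point 2, assuming a hypothetical `DeclaredTx` wrapper (none of these types exist in FRAME; this illustrates the design, not an API):

```rust
// Hypothetical sketch of the access-list idea: each transaction declares the
// storage keys it expects to touch, and collators only schedule transactions
// with disjoint key sets onto blocks destined for different cores. None of
// these types exist in FRAME today.
use std::collections::HashSet;

type StorageKey = Vec<u8>;

struct DeclaredTx {
    /// Opaque encoded transaction.
    payload: Vec<u8>,
    /// Keys the author *claims* the transaction will read or write.
    access_list: HashSet<StorageKey>,
}

/// True if two transactions can safely go into blocks built for different
/// cores in the same relay-chain slot: their declared key sets don't overlap.
fn independent(a: &DeclaredTx, b: &DeclaredTx) -> bool {
    a.access_list.is_disjoint(&b.access_list)
}
```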

Member:

> OCC is fancy but similarly doesn't scale, because there are only so many CPU cores, and you are still bound to one collator somehow filling up 8 Polkadot cores. Option 2 is much more powerful, because you can enable 8 collators to fill 8 blocks simultaneously.

I agree here only partially. First, you can't produce (para)blocks at a rate faster than collators/full-nodes can import them, unless they are not checking everything themselves. But even if they are not checking, this assumes that the bottleneck will be CPU and not storage/IO, which is not currently the case. Even with NOMT and other future optimizations, you can't accept transactions faster than you can modify the state. You need to know the latest state in order to check transactions. Unless we're talking about sharding the parachain's state itself.

Another argument is that single-threaded performance is going to reach a plateau eventually (whether it's Moore's law or physics), and nowadays even smartphones have 8 cores, so why not utilize them all instead of doing everything single-threaded?

That being said, I think options 2 and 3 are composable; you can do both.

sandreim (Contributor), Jul 22, 2024:

The current status quo is that we rely on option 1 (beefy collators). Option 2 can surely scale well, but it seems complicated and is not really compatible with the relay chain, which expects chains, not a DAG. #4696 (comment) shows the limitations of what is possible with reference hardware and 5 collators.

We did a nice brainstorming session with @skunert and @eskimor on the subject some time ago. We think the best way forward is to implement a transaction streaming mechanism: at the beginning of each slot, the block author sends transactions to the next block author as it pushes them into the current block. By the time it announces the block, the next author should already have all state changes applied, doesn't need to wait to import the block, and can immediately start building its own block. And so on.

If that is not enough, the next block author can start to speculatively build its next block, updating the transactions and state as it learns what the current author is putting in its block.
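A minimal sketch of the streaming idea, assuming a simple channel between consecutive block authors; every name here is illustrative, not an existing cumulus API:

```rust
// Sketch of the transaction-streaming idea described above. The current
// author streams each transaction to the next author as it pushes it into
// its block, so the next author can apply state changes eagerly instead of
// waiting for block import.
use std::sync::mpsc;

type Tx = Vec<u8>;

fn current_author(block_txs: Vec<Tx>, to_next: mpsc::Sender<Tx>) {
    for tx in block_txs {
        // ...push `tx` into the block under construction...
        let _ = to_next.send(tx); // stream it to the next author immediately
    }
    // Announce the block; by now the next author has applied every change.
}

fn next_author(from_current: mpsc::Receiver<Tx>) {
    for _tx in from_current {
        // Apply `_tx` to local state eagerly (speculatively, per the comment
        // above), updating as the current author's block takes shape.
    }
    // When the announced block arrives, no import wait is needed: start
    // building the next block right away.
}
```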

kianenigma (Contributor) left a comment:

It seems best to me for you to first coordinate with #4363, possibly pushing it to completion if Radha is not available, then build this on top of it.

sandreim (Contributor) commented Jun 4, 2024:

> It seems best to me for you to first coordinate with #4363, possibly pushing it to completion if Radha is not available, then build this on top of it.

That's a good suggestion. I think we should align on better terminology and minimize the amount of changes required to enable these features.

DrW3RK (Contributor) commented Jun 10, 2024:

Applied changes to #4363. Needs one more approval for merge.

kianenigma (Contributor):

> Applied changes to #4363. Needs one more approval for merge.

Almost, but not yet :)

skunert self-requested a review, June 25, 2024 15:23
kianenigma (Contributor):

Would be happy to review this once it is updated based on the previous guides.

alindima (Contributor, author):

> Would be happy to review this once it is updated based on the previous guides.

Thanks, I'll let you know when this is ready. I've put this on hold a bit, waiting for #4097 to be merged (and it has been) and to get more specific performance numbers here: #4696

alindima (Contributor, author):

OK, I thought about using docify here. But how can I, considering that the parachain template is not updated for elastic scaling yet? (And we don't plan to update it yet, as it's still an experimental MVP.)
AFAICT I can only reference existing code, which is not really helpful. We can use docify only after we modify the template, by which time the async backing guide will no longer be able to use docify. @kianenigma

alindima (Contributor, author) commented Jul 12, 2024:

Another problem I see with docify is that it only works on types: you cannot annotate and import blocks of code (unless you define artificial functions for them).

alindima removed the R0-silent (Changes should not be mentioned in any release notes) label, Jul 12, 2024
alindima marked this pull request as ready for review, July 12, 2024 08:35
alindima (Contributor, author):

I tried using docify as much as it made sense. Please review.

sandreim added this pull request to the merge queue, Jul 17, 2024
Merged via the queue into master with commit 0db5092 Jul 17, 2024
157 of 162 checks passed
sandreim deleted the alindima/elastic-scaling-mvp-guide branch, July 17, 2024 09:53
ordian added a commit that referenced this pull request Jul 17, 2024
* master:
  add elastic scaling MVP guide (#4663)
  Send PeerViewChange with high priority (#4755)
  [ci] Update forklift in CI image (#5032)
  Adjust base value for statement-distribution regression tests (#5028)
  [pallet_contracts] Add support for transient storage in contracts host functions (#4566)
  [1 / 5] Optimize logic for gossiping assignments (#4848)
  Remove `pallet-getter` usage from pallet-session (#4972)
  command-action: added scoped permissions to the github tokens (#5016)
  net/litep2p: Propagate ValuePut events to the network backend (#5018)
  rpc: add back rpc logger (#4952)
  Updated substrate-relay version for tests (#5017)
  Remove most all usage of `sp-std` (#5010)
  Use sp_runtime::traits::BadOrigin (#5011)
paritytech-ci pushed a commit that referenced this pull request Jul 17, 2024
jpserrat pushed a commit to jpserrat/polkadot-sdk that referenced this pull request Jul 18, 2024
ordian added a commit that referenced this pull request Jul 18, 2024 (merge of master, 125 commits)
TarekkMA pushed a commit to moonbeam-foundation/polkadot-sdk that referenced this pull request Aug 2, 2024
ordian added a commit that referenced this pull request Aug 6, 2024 (merge of master, 130 commits)
Labels
T11-documentation This PR/Issue is related to documentation.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Elastic scaling: document how to enable elastic scaling in parachain runtime
6 participants