
add elastic scaling MVP guide #4663

Merged · 7 commits · Jul 17, 2024

Changes from 2 commits
142 changes: 142 additions & 0 deletions docs/sdk/src/guides/enable_elastic_scaling_mvp.rs
@@ -0,0 +1,142 @@
//! # Enable elastic scaling MVP for a parachain
//!
//! **This guide assumes full familiarity with Asynchronous Backing and its terminology, as defined
//! in <https://wiki.polkadot.network/docs/maintain-guides-async-backing>.
//! Furthermore, the parachain should have already been upgraded according to the guide.**
Contributor @kianenigma (Jun 4, 2024):

You can also link to #4363 once it is merged.

Moreover, I think you can also benefit a bit from suggestions similar to #4363 (comment).

PTAL: https://paritytech.github.io/polkadot-sdk/master/polkadot_sdk_docs/meta_contributing/index.html#why-rust-docs

//!
//! ## Quick introduction to elastic scaling
//!
//! [Elastic scaling](https://polkadot.network/blog/elastic-scaling-streamling-growth-on-polkadot)
//! is a feature that will enable parachains to seamlessly scale up/down their block space usage in
//! order to increase throughput or lower their latency.
//!
//! At present, with Asynchronous Backing enabled, a parachain can only include a block on the relay
//! chain every 6 seconds, regardless of how many cores the parachain acquires. Elastic scaling
//! builds further on the 10x throughput increase of Async Backing, enabling collators to submit up
//! to 3 parachain blocks per relay chain block, resulting in a further 3x throughput increase.
//!
//! ## Current limitations of the MVP
//!
//! The full implementation of elastic scaling spans across the entire relay/parachain stack and is
//! still [work in progress](https://github.com/paritytech/polkadot-sdk/issues/1829).
//! The current limitations of the MVP are described below:
//!
//! 1. **Limited core count**. Parachain block authoring is sequential, so the second block will
Contributor:

How do we know that these 3 para-blocks are still valid when imported in 3 parallel cores?

For example, there are 2 tx in each parablock. The collator proposes [t1, t2, t3, t4, t5, t6] and they are all valid. But the validity of t6 depends on the execution of t1. When imported in 3 cores, t1 and t6 are no longer present.

In general, I would assume all of this to be fixed in the cumulus block building code. My question is, does it?

Contributor:

These 3 blocks are expected to form a chain, the ones that don't will not be included.

Contributor (author):

> These 3 blocks are expected to form a chain, the ones that don't will not be included.

Yes, and a candidate will not be included until all of its ancestors are included. If one ancestor is not included (times out availability) or is concluded invalid via a dispute, all of its descendants will also be evicted from the cores. So we only deal with candidate chains.

Contributor:

Sorry, I still don't get this.

@sandreim if they form a chain, and part of the chain is executed in one core and part of it in another core, how does either of the cores check that the whole thing is a chain?

In my example, [t1, t2, t3, t4, t5, t6], [t1, t2, t3] goes into one core, [t4, t5, t6] into another. The whole [t1 -> t6] indeed forms a chain, and the execution of t5 depends on the execution of t2.

Perhaps what you mean to say is that the transactions that go into different cores must in fact be independent of one another?

Contributor @sandreim (Jul 22, 2024):

The transactions are not independent. We achieve parallel execution even in that case, and still check that they form a chain, by passing in the appropriate validation inputs (`PersistedValidationData`). We can validate t2 because we already have the parent head data of t1 from the collator of t2. So we can correctly construct the inputs, and the PoV contains the right data (t2 was built after t1 by the collator).
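For reference, here is a sketch of the `PersistedValidationData` struct mentioned above, as defined in `polkadot-primitives` (field set reproduced from memory; check the crate for the authoritative definition):

```rust
use polkadot_primitives::{BlockNumber, Hash, HeadData};

/// Validation inputs persisted on the relay chain and passed to the parachain
/// validation function for each candidate.
pub struct PersistedValidationData<H = Hash, N = BlockNumber> {
    /// The parent head data, i.e. the head produced by the previous candidate in
    /// the chain (so validating the block with [t4, t5, t6] starts from the state
    /// after [t1, t2, t3]).
    pub parent_head: HeadData,
    /// The relay chain block number this candidate is executed in the context of.
    pub relay_parent_number: N,
    /// The relay chain storage root for that context.
    pub relay_parent_storage_root: H,
    /// The maximum legal size of the PoV, in bytes.
    pub max_pov_size: u32,
}
```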

Contributor:

Is this the answer?

[t1, t2, t3] goes into one core, [t4, t5, t6], but the PoV of the latter contains the full execution of the former?

I think this is fine, but truthfully, to scale up, different transactions going into different cores must be independent, or else the system can only scale as much as you can jack up one collator.

Member @ordian (Jul 22, 2024):

> but the PoV of the latter contains the full execution of the former?

PoV of [t4, t5, t6] would refer to the state post [t1, t2, t3] execution.

> I think different transactions going into different cores must be independent, or else the system can only scale as much as you can jack up one collator

One way of achieving that without jacking up one collator would be to have a DAG instead of a blockchain (two blocks having the same parent state). But then you'd need to somehow ensure they are truly independent. This could be done by e.g. specifying dependencies in the transactions themselves (a la Solana or Ethereum access lists).

Another way would be to rely on multiple CPU cores of a collator and implement execution on the collator side differently, with optimistic concurrency control (a la Monad). This only requires modification on the collator side and does not affect the transaction format.

Contributor:

Okay, thanks @ordian.

I totally agree with all of your directions as well. I am not sure if you have seen it or not, but my MSc thesis was on the same topic 🙈 https://github.com/kianenigma/SonicChain. I think what I have done there is similar to access lists, and it should be quite easy to add to FRAME and Substrate: each tx declares, via its code author, what storage keys it "thinks" it will access. Then the collators can easily agree among themselves to collate non-conflicting transactions.

This is a problem that is best solved on the collator side, and only once there is a lot of demand. Polkadot is already doing what it should do, and should not do any "magic" to handle this.

Once there is more demand:

  1. Either collators just jack up their hardware, as they are kind of expected to do now. This won't scale a lot, but it will for a bit.
  2. I think the access list stuff is super cool and will scale (a rough sketch of the idea follows below).
  3. OCC is fancy but similarly doesn't scale, because there are only so many CPU cores, and you are still bound to one collator somehow filling up 8 Polkadot cores. Option 2 is much more powerful, because you can enable 8 collators to fill 8 blocks simultaneously.
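A purely illustrative sketch of the access-list idea (hypothetical types, not an existing FRAME or Substrate API): each transaction declares the storage keys it expects to touch, and collators only schedule transactions onto different cores when their declared key sets do not overlap:

```rust
use std::collections::HashSet;

/// Hypothetical transaction annotated with an access list by its author.
struct Tx {
    id: &'static str,
    /// Storage keys the author "thinks" the transaction will access.
    access_list: HashSet<Vec<u8>>,
}

/// Two transactions may be collated into blocks for different cores only if
/// their declared key sets are disjoint.
fn conflict_free(a: &Tx, b: &Tx) -> bool {
    a.access_list.is_disjoint(&b.access_list)
}

fn main() {
    let t1 = Tx { id: "t1", access_list: HashSet::from([b"alice".to_vec()]) };
    let t6 = Tx { id: "t6", access_list: HashSet::from([b"alice".to_vec(), b"bob".to_vec()]) };
    // t6 touches a key that t1 also touches, so both must stay in the same chain of blocks.
    println!("{} / {} conflict-free: {}", t1.id, t6.id, conflict_free(&t1, &t6));
}
```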

Member:

> OCC is fancy but similarly doesn't scale, because there are only so many CPU cores, and you are still bound to one collator somehow filling up 8 Polkadot cores. Option 2 is much more powerful, because you can enable 8 collators to fill 8 blocks simultaneously.

I agree here only partially. First, you can't produce (para)blocks at a rate faster than collators/full-nodes can import them, unless they are not checking everything themselves. But even if they are not checking, this assumes that the bottleneck will be CPU and not storage/IO, which is not currently the case. Even with NOMT and other future optimizations, you can't accept transactions faster than you can modify the state. You need to know the latest state in order to check transactions, unless we're talking about sharding the parachain's state itself.

Another argument is that single-threaded performance is going to reach a plateau eventually (whether it's Moore's law or physics), and nowadays even smartphones have 8 cores, so why not utilize them all instead of doing everything single-threaded?

That being said, I think options 2 and 3 are composable; you can do both.

Contributor @sandreim (Jul 22, 2024):

The current status quo is that we rely on option 1 (beefy collators). Option 2 for sure is something that can scale well, but it seems complicated and is not really compatible with the relay chain, which expects chains, not a DAG. #4696 (comment) shows the limitations of what is possible with reference hardware and 5 collators.

We did a nice brainstorming session with @skunert and @eskimor on the subject some time ago. We think that the best way to go forward is to implement a transaction streaming mechanism. At the beginning of each slot, the block author sends transactions to the next block author as it pushes them into the current block. By the time it announces the block, the next author should already have all state changes applied, doesn't need to wait to import it, and can immediately start building its own block. And so on.

If that is not enough, the next block author can start to speculatively build its next block, updating the transactions and state as it learns what the current author is putting in its blocks (a rough sketch of the streaming idea follows below).
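A rough, purely illustrative sketch of such a streaming handoff (hypothetical types and functions, not the actual cumulus design): the current author forwards each extrinsic to the next author as soon as it is pushed into the block being built, so the next author can pre-apply the state changes instead of waiting for a full import:

```rust
use std::sync::mpsc;
use std::thread;

type Extrinsic = Vec<u8>;

/// Current block author: pushes extrinsics into its own block and streams each
/// one to the next author immediately.
fn current_author(stream: mpsc::Sender<Extrinsic>, extrinsics: Vec<Extrinsic>) {
    for xt in extrinsics {
        // push_into_block(&xt); // hypothetical: include in the block being built
        let _ = stream.send(xt); // stream to the next author right away
    }
}

/// Next block author: pre-applies the streamed extrinsics so that, once the
/// block is announced, it can start authoring without an import delay.
fn next_author(stream: mpsc::Receiver<Extrinsic>) {
    for _xt in stream {
        // apply_to_state(&_xt); // hypothetical: speculative state update
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || next_author(rx));
    current_author(tx, vec![b"t1".to_vec(), b"t2".to_vec()]);
    handle.join().unwrap();
}
```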

//! start being built only after the previous block is imported. The current block production is
//! capped at 2 seconds of execution. Therefore, assuming the full 2 seconds are used, a
//! parachain can only utilise at most 3 cores in a relay chain slot of 6 seconds. If the full
//! execution time is not being used, higher core counts can be achieved.
//! 2. **Single collator requirement for consistently scaling beyond a core at full authorship
//! duration of 2 seconds per block.** Using the current implementation with multiple collators
//! adds additional latency to the block production pipeline. Assuming block execution takes
//! about the same time as authorship, the additional overhead is equal to the duration of the
//! authorship plus the block announcement. Since each collator must first import the previous
//! block before authoring a new one, this would amount to 12 seconds for 3 cores and 8 seconds
//! for 2 cores (see the arithmetic sketch after this list).
//! 3. **Trusted collator set.** The collator set needs to be trusted until there’s a mitigation
//! that would prevent or deter multiple collators from submitting the same collation.
//! A solution is being discussed [here](https://github.com/polkadot-fellows/RFCs/issues/92).
//! 4. **Fixed scaling.** For true elasticity, the parachain must be able to seamlessly acquire or
//! sell coretime as the user demand grows and shrinks over time, in an automated manner. This is
//! currently lacking - a parachain can only scale up or down by “manually” acquiring coretime.
//! This is not in the scope of the relay chain functionality. Parachains can already start
//! implementing such autoscaling, but we aim to provide a framework/examples for developing
//! autoscaling strategies.
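//!
//! A rough sketch of the arithmetic behind the 12-second and 8-second figures in limitation 2,
//! assuming importing a block costs roughly as much as the 2-second authorship (the exact split
//! between import, authorship and announcement will differ in practice):
//!
//! ```rust
//! const AUTHORING_SECS: u32 = 2;
//! // Assumption for this sketch: importing the previous block costs about as much as authoring it.
//! const IMPORT_SECS: u32 = 2;
//!
//! // Each collator must import the previous block before authoring its own, so the pipeline
//! // length grows linearly with the number of blocks per relay chain slot.
//! const PIPELINE_SECS_3_CORES: u32 = 3 * (IMPORT_SECS + AUTHORING_SECS); // 12 seconds
//! const PIPELINE_SECS_2_CORES: u32 = 2 * (IMPORT_SECS + AUTHORING_SECS); // 8 seconds
//! ```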
//!
//! ## Using elastic scaling MVP
//!
//! ### Prerequisites
//!
//! - Ensure Asynchronous Backing is enabled on the network and you have enabled it on the parachain
//! using [this guide](https://wiki.polkadot.network/docs/maintain-guides-async-backing).
//! - Ensure the `AsyncBackingParams.max_candidate_depth` value is configured to a value that is at
//! least double the maximum targeted parachain velocity. For example, if the parachain will build
//! at most 3 candidates per relay chain block, the `max_candidate_depth` should be at least 6
//! (see the illustration after this list).
//! - Use a trusted single collator for maximum throughput.
//! - Ensure enough coretime is assigned to the parachain. For maximum throughput the upper bound is
//! 3 cores.
//! - Use the latest cumulus release, which includes the necessary elastic scaling changes.
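//!
//! As an illustration of the `max_candidate_depth` prerequisite, the snippet below sketches the
//! async backing parameters a relay chain could use for a parachain targeting a velocity of 3.
//! The struct is the one from `polkadot-primitives` (field set reproduced from memory; check the
//! crate for the authoritative definition), and on a live or test relay chain these values are
//! set through the relay chain's `configuration` pallet rather than in parachain code:
//!
//! ```rust,ignore
//! use polkadot_primitives::AsyncBackingParams;
//!
//! // `max_candidate_depth` must be at least 2 * velocity; `allowed_ancestry_len` bounds how old
//! // a relay parent a collator may still build on. The value 3 here is only illustrative.
//! let params = AsyncBackingParams { max_candidate_depth: 6, allowed_ancestry_len: 3 };
//! ```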
//!
//! The following steps assume using the cumulus parachain template.
//!
//! ### Phase 1 - Update Parachain Node
//!
//! This phase consists of plugging in the new slot-based collator node.
//!
//! 1. In `node/src/service.rs` import the slot based collator instead of the lookahead collator.
//!
//! ```rust
//! use cumulus_client_consensus_aura::collators::slot_based::{self as aura, Params as AuraParams};
//! ```
//!
//! 2. In `start_consensus()`
//! - Remove the `overseer_handle` param (also remove the
//! `OverseerHandle` type import if it’s not used elsewhere).
//! - In `AuraParams`, remove the `sync_oracle` and `overseer_handle` fields and add a
//! `slot_drift` field with a value of `Duration::from_secs(1)`.
//! - Replace the single future returned by `aura::run` with the two futures returned by it and
//! spawn them as separate tasks:
//! ```rust
//! let (collation_future, block_builder_future) = aura::run::<
//! Block,
//! sp_consensus_aura::sr25519::AuthorityPair,
Contributor:

This should not be a requirement. The slot based collator should be activated by a CLI parameter.

Contributor (author):

This section is targeted at paras that use the cumulus template, not polkadot-parachain. If they want to switch to using elastic scaling, they can't just use a CLI parameter; they still need to do these steps.

Once the slot-based collator is merged, I'll add a section here for parachains utilising the polkadot-parachain binary.

Contributor:

Guides should be built for template/parachain, not polkadot-parachain IMO.

That being said, as noted above, I think this doc can be written better if it reuses the code of template/parachain.

//! _,
//! _,
//! _,
//! _,
//! _,
//! _,
//! _,
//! _>(params);
//! task_manager
//! .spawn_essential_handle()
//! .spawn("collation-task", None, collation_future);
//! task_manager
//! .spawn_essential_handle()
//! .spawn("block-builder-task", None, block_builder_future);
//! ```
//!
//! 3. In `start_parachain_node()` remove the `sync_service` and `overseer_handle` params passed to
//! `start_consensus`.
//!
//! ### Phase 2 - Activate fixed factor scaling in the runtime
//!
//! This phase consists of a couple of changes that need to be made to the parachain’s runtime in
//! order to utilise fixed factor scaling.
//!
//! First of all, you need to decide the upper limit to how many parachain blocks you need to
//! produce per relay chain block (in direct correlation with the number of acquired cores). This
//! should be either 1 (no scaling), 2 or 3. This is called the parachain velocity.
//!
//! If you configure a velocity which is different from the number of assigned cores, the measured
Contributor:

I think we can just compute the rest of the values based on the minimum parachain block time (MIN_SLOT_DURATION)

Contributor (author):

Yes, they can. I specifically say how they all relate to each other and give the formulas to derive them. I think the parachain teams can decide how to code them; this is just an example.

Contributor (author):

I added the constant computations based on the maximum velocity.

Contributor:

I have definitely seen this velocity stuff in the async backing guides PR as well. I think it is best to first push that to completion in the best possible shape, then build this on top of it.

//! velocity in practice will be the minimum of these two.
//!
//! The chosen velocity should also be used to compute:
//! - The slot duration, by dividing the 6000 ms relay chain slot duration by the velocity.
//! - The unincluded segment capacity, by multiplying the velocity by 2 and adding 1.
//!
//! Let’s assume a desired velocity of 3 parachain blocks per relay chain block. The needed changes
//! would all be done in `runtime/src/lib.rs`:
//!
//! 1. Increase the `BLOCK_PROCESSING_VELOCITY` to the desired value. In this example, 3.
//!
//! ```rust
//! const BLOCK_PROCESSING_VELOCITY: u32 = 3;
Contributor:

Suggested change:
- `//! const BLOCK_PROCESSING_VELOCITY: u32 = 3;`
+ `//! const BLOCK_PROCESSING_VELOCITY: u32 = (RELAY_CHAIN_SLOT_TIME / MIN_SLOT_DURATION);`

Contributor:

use docify please :)

//! ```
//!
//! 2. Set the `MILLISECS_PER_BLOCK` to the desired value.
//!
//! ```rust
//! const MILLISECS_PER_BLOCK: u32 =
//! RELAY_CHAIN_SLOT_DURATION_MILLIS / BLOCK_PROCESSING_VELOCITY;
//! ```
//! Note: for a parachain which measures time in terms of its own block number, changing block
//! time may cause complications, requiring additional changes. See the section ["Timing by
//! block number" of the async backing guide](https://wiki.polkadot.network/docs/maintain-guides-async-backing#timing-by-block-number).
//!
//! 3. Increase the `UNINCLUDED_SEGMENT_CAPACITY` to the desired value.
//!
//! ```rust
//! const UNINCLUDED_SEGMENT_CAPACITY: u32 = 2 * BLOCK_PROCESSING_VELOCITY + 1;
//! ```
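//!
//! Putting the above together, a minimal sketch of how the constants relate for the assumed
//! velocity of 3 (constant names follow this guide and the parachain template; adapt them to your
//! runtime as needed):
//!
//! ```rust
//! const RELAY_CHAIN_SLOT_DURATION_MILLIS: u32 = 6000;
//! const BLOCK_PROCESSING_VELOCITY: u32 = 3;
//! const MILLISECS_PER_BLOCK: u32 = RELAY_CHAIN_SLOT_DURATION_MILLIS / BLOCK_PROCESSING_VELOCITY;
//! const UNINCLUDED_SEGMENT_CAPACITY: u32 = 2 * BLOCK_PROCESSING_VELOCITY + 1;
//!
//! // Compile-time sanity checks for the velocity-3 example used in this guide.
//! const _: () = assert!(MILLISECS_PER_BLOCK == 2000);
//! const _: () = assert!(UNINCLUDED_SEGMENT_CAPACITY == 7);
//! ```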
3 changes: 3 additions & 0 deletions docs/sdk/src/guides/mod.rs
@@ -29,3 +29,6 @@ pub mod enable_pov_reclaim;

/// How to enable metadata hash verification in the runtime.
pub mod enable_metadata_hash;

/// How to enable elastic scaling MVP on a parachain.
pub mod enable_elastic_scaling_mvp;