Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Download KeyValues in remote externalities early #3584

Open
wants to merge 23 commits into
base: master
Choose a base branch
from

Conversation

dharjeezy
Copy link
Contributor

@dharjeezy dharjeezy commented Mar 5, 2024

closes #2494

Polkadot address: 12GyGD3QhT4i2JJpNzvMf96sxxBLWymz4RdGCxRH5Rj5agKW

@dharjeezy dharjeezy marked this pull request as ready for review March 7, 2024 16:02
Copy link
Contributor

@liamaharon liamaharon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking like a good start, but I have some questions.

Could you please also

  • write a test confirming a snapshot created from rococo state using the stream technique matches a snapshot created using the existing technique? you can use wss://rococo-try-runtime-node.parity-chains.parity.io:443.
  • benchmark downloading rococo state against the existing implementation and post the results in your PR

Thanks

Comment on lines 448 to 451
let builder = Arc::new(self.clone());

for (start_key, end_key) in start_keys.into_iter().zip(end_keys) {
let permit = parallel
.clone()
.acquire_owned()
.await
.expect("semaphore is not closed until the end of loop");

let builder = builder.clone();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why clone the builder here?

Comment on lines 443 to 444
let start_keys: Vec<Option<&StorageKey>> = binding.iter().map(Some).collect();
let start_keys = start_keys.clone();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why re-allocate start_keys?

@liamaharon liamaharon added the T0-node This PR/Issue is related to the topic “node”. label Mar 18, 2024
block: B::Hash,
parallel: usize,
) -> Result<Vec<StorageKey>, &'static str> {
) -> impl Stream<Item = Vec<KeyValue>> + 'a {
/// Divide the workload and return the start key of each chunks. Guaranteed to return a
/// non-empty list.
fn gen_start_keys(prefix: &StorageKey) -> Vec<StorageKey> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it necessary to still divide the workload when using streams? I tried to use this and it seemed to process each chunk sequentially.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm, i am trying to build off what was done initially, it isn't sequential actually, the stream keeps receiving keys and storing them as long as they are available

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not sure why it's necessary to divide the workload when using the stream approach?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. How can i determine the vec of storage key using the prefix?

@dharjeezy
Copy link
Contributor Author

dharjeezy commented Mar 23, 2024

  • wss://rococo-try-runtime-node.parity-chains.parity.io:443

Hello @liamaharon this is the result of the test ran

The first is the changes i made where we are streaming keys and saving them.

Screenshot 2024-03-23 at 18 06 44

The second is the initial, where all keys are gotten first.

Screenshot 2024-03-23 at 18 16 41

@liamaharon
Copy link
Contributor

So based on your benchmark the time to scrape has gone from 46s to 196s? That should be looked into.

@dharjeezy
Copy link
Contributor Author

So based on your benchmark the time to scrape has gone from 46s to 196s? That should be looked into.

Oh, so the previous function just only download keys only without retrieving their value and storing them
The new function i created downloads key and retrieve value and store them on the fly.

Maybe i will need to include key retrieval and storage when testing for the previous function.

…ing_kvs_early

# Conflicts:
#	substrate/utils/frame/remote-externalities/src/lib.rs
@dharjeezy dharjeezy requested a review from liamaharon May 1, 2024 13:12
@dharjeezy
Copy link
Contributor Author

@liamaharon can you please help review again

@liamaharon
Copy link
Contributor

@liamaharon can you please help review again

Hey @dharjeezy, have you benchmarked scraping Polkadot state again?

@dharjeezy
Copy link
Contributor Author

dharjeezy commented May 28, 2024

@liamaharon here are the images showing the time taking

The first image is the current way in which kvs are downloaded

Screenshot 2024-05-28 at 13 49 20

The second image is the new one based on my changes using stream

Screenshot 2024-05-28 at 14 09 23

@liamaharon
Copy link
Contributor

We expect that starting to downloading the values while scraping the keys should result in significant speedup. Can you please take a look at why we're not observing that?

@dharjeezy
Copy link
Contributor Author

Hello @liamaharon i have tried improving how the code works, my recent commit introduces a way of batch inserting into the storage through concurrent means, but i noticed the speed is still not significant enough.

…ing_kvs_early

# Conflicts:
#	substrate/utils/frame/remote-externalities/Cargo.toml
@paritytech-cicd-pr
Copy link

The CI pipeline was cancelled due to failure one of the required jobs.
Job name: test-linux-stable-int
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6591822

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T0-node This PR/Issue is related to the topic “node”.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remote Externalities: Start downloading KVs early
3 participants