Download KeyValues in remote externalities early #3584
base: master
Conversation
Looking like a good start, but I have some questions.
Could you please also:
- write a test confirming a snapshot created from rococo state using the stream technique matches a snapshot created using the existing technique? You can use wss://rococo-try-runtime-node.parity-chains.parity.io:443.
- benchmark downloading rococo state against the existing implementation and post the results in your PR.

Thanks
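A hedged sketch of what such a test could look like. `scrape_with_stream` and `scrape_existing` are hypothetical helpers standing in for the new streaming path and the current implementation; they are stubbed here so the sketch compiles, and the sorted key-value comparison is the essential part:

```rust
// Hypothetical helpers: in a real test these would scrape rococo state over
// the node URI via the streaming and the existing code paths respectively;
// stubbed here so the sketch is self-contained.
async fn scrape_with_stream(_uri: &str) -> Vec<(Vec<u8>, Vec<u8>)> {
    Vec::new()
}
async fn scrape_existing(_uri: &str) -> Vec<(Vec<u8>, Vec<u8>)> {
    Vec::new()
}

#[tokio::test]
async fn stream_snapshot_matches_existing_snapshot() {
    let uri = "wss://rococo-try-runtime-node.parity-chains.parity.io:443";

    let mut streamed = scrape_with_stream(uri).await;
    let mut existing = scrape_existing(uri).await;

    // Arrival order may differ between the two techniques, so compare sorted.
    streamed.sort();
    existing.sort();
    assert_eq!(streamed, existing);
}
```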
```rust
let builder = Arc::new(self.clone());

for (start_key, end_key) in start_keys.into_iter().zip(end_keys) {
    let permit = parallel
        .clone()
        .acquire_owned()
        .await
        .expect("semaphore is not closed until the end of loop");

    let builder = builder.clone();
```
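For context, a minimal self-contained sketch of the bounded-concurrency pattern this diff relies on, assuming tokio's `Semaphore` (the chunk numbers stand in for key ranges; none of the names come from the PR):

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

#[tokio::main]
async fn main() {
    // At most 4 chunk downloads may be in flight at once.
    let parallel = Arc::new(Semaphore::new(4));
    let mut handles = Vec::new();

    for chunk in 0..16u32 {
        // `acquire_owned` consumes an `Arc<Semaphore>`, hence the clone; the
        // permit is released when it is dropped at the end of the task.
        let permit = parallel.clone().acquire_owned().await.expect("semaphore not closed");
        handles.push(tokio::spawn(async move {
            let _permit = permit;
            // A real implementation would download one key range here.
            println!("processing chunk {chunk}");
        }));
    }
    for handle in handles {
        handle.await.expect("task panicked");
    }
}
```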
Why clone the builder here?
```rust
let start_keys: Vec<Option<&StorageKey>> = binding.iter().map(Some).collect();
let start_keys = start_keys.clone();
```
Why re-allocate start_keys?
```diff
     block: B::Hash,
     parallel: usize,
-) -> Result<Vec<StorageKey>, &'static str> {
+) -> impl Stream<Item = Vec<KeyValue>> + 'a {
+    /// Divide the workload and return the start key of each chunk. Guaranteed to return a
+    /// non-empty list.
+    fn gen_start_keys(prefix: &StorageKey) -> Vec<StorageKey> {
```
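A hedged sketch of what such key-space division could look like. The chunking scheme below (spreading the first byte after the prefix evenly across a fixed number of chunks) is an illustrative assumption, not the PR's exact logic:

```rust
use sp_core::storage::StorageKey;

/// Illustrative only: split the key space under `prefix` into `chunks`
/// ranges by extending the prefix with evenly spaced first bytes.
/// Assumes `1 <= chunks <= 256`.
fn gen_start_keys(prefix: &StorageKey, chunks: usize) -> Vec<StorageKey> {
    let step = 256 / chunks;
    // The first chunk starts at the bare prefix so no key is skipped.
    let mut keys = vec![prefix.clone()];
    for i in 1..chunks {
        let mut key = prefix.0.clone();
        key.push((i * step) as u8);
        keys.push(StorageKey(key));
    }
    keys
}
```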
Why is it necessary to still divide the workload when using streams? I tried to use this and it seemed to process each chunk sequentially.
Hmm, I am trying to build off what was done initially. It isn't actually sequential: the stream keeps receiving keys and storing them as long as they are available.
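A minimal sketch of that shape using `futures::stream`, where `fetch_values` is a hypothetical stand-in for the per-chunk RPC call: chunks are polled concurrently and batches are yielded as soon as any chunk completes, not strictly in chunk order.

```rust
use futures::stream::{self, StreamExt};

// Hypothetical stand-in for fetching the key-values of one chunk over RPC.
async fn fetch_values(chunk: u32) -> Vec<(u32, u32)> {
    vec![(chunk, chunk * 2)]
}

#[tokio::main]
async fn main() {
    let batches: Vec<Vec<(u32, u32)>> = stream::iter(0..8u32)
        .map(fetch_values)
        .buffer_unordered(4) // up to 4 chunk downloads in flight at once
        .collect()
        .await;
    println!("received {} batches", batches.len());
}
```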
Still not sure why it's necessary to divide the workload when using the stream approach?
Ok. How can I determine the vec of storage keys using the prefix?
…' into dami/start_downloading_kvs_early
Hello @liamaharon, this is the result of the test I ran. The first is with my changes, where we stream keys and save them; the second is the initial implementation, where all keys are fetched first.
So based on your benchmark the time to scrape has gone from 46s to 196s? That should be looked into.
Oh, so the previous function only downloads keys, without retrieving their values and storing them. Maybe I need to include value retrieval and storage when benchmarking the previous function.
…ing_kvs_early # Conflicts: # substrate/utils/frame/remote-externalities/src/lib.rs
@liamaharon can you please help review again?
Hey @dharjeezy, have you benchmarked scraping Polkadot state again?
@liamaharon here are the images showing the time taken. The first image is the current way in which KVs are downloaded; the second is the new one based on my changes using streams.
We expect that starting to download the values while scraping the keys should result in a significant speedup. Can you please take a look at why we're not observing that?
Hello @liamaharon, I have tried improving how the code works. My recent commit introduces batch inserting into storage concurrently, but I noticed the speedup is still not significant.
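One way to overlap download and insertion, sketched with a tokio mpsc channel (all names hypothetical): downloader tasks send completed batches while a single consumer inserts them, so insertion work never blocks the network side.

```rust
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    // Batches of (key, value) pairs flow from the downloader to the inserter.
    let (tx, mut rx) = mpsc::channel::<Vec<(Vec<u8>, Vec<u8>)>>(16);

    // Downloader: in the real code this side would be the key-value stream.
    tokio::spawn(async move {
        for batch in 0..4u8 {
            let kvs = vec![(vec![batch], vec![batch.wrapping_mul(2)])]; // stand-in batch
            if tx.send(kvs).await.is_err() {
                break; // receiver dropped
            }
        }
    });

    // Consumer: insert batches into storage as they arrive.
    let mut total = 0;
    while let Some(batch) = rx.recv().await {
        total += batch.len(); // a real impl would insert into externalities here
    }
    println!("inserted {total} key-values");
}
```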
…ing_kvs_early # Conflicts: # substrate/utils/frame/remote-externalities/Cargo.toml
The CI pipeline was cancelled due to the failure of one of the required jobs.
closes #2494
Polkadot address: 12GyGD3QhT4i2JJpNzvMf96sxxBLWymz4RdGCxRH5Rj5agKW