Improvements to all_forks syncing #2114
Automatically approving tomaka's pull requests. This auto-approval will be removed once more maintainers are active.
twiggy diff report: difference in .wasm size before and after this pull request.
@tomaka thanks for the write up 🙏 it's very helpful.
Is there some sort of automatic switching between the optimistic and all-forks options in practice? Or does the user have to manually choose whichever method makes more sense to them?
The situation at the moment is:
The idea for the full node is that you would always start in optimistic mode and then transition to all-forks mode. Substrate currently uses some heuristics that are something along the lines of "if the majority of peers have a best block that is >50 blocks higher than ours, then use optimistic mode" (the equivalent of optimistic mode is actually called "major syncing" in Substrate).
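To make that heuristic concrete, here is a rough sketch of what such a check could look like. This is not Substrate's or smoldot's actual code: `SyncMode`, `MAJOR_SYNC_GAP` and `choose_sync_mode` are hypothetical names, and the strict-majority rule is only one reading of "the majority of peers".

```rust
/// The two syncing strategies discussed in this thread (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyncMode {
    Optimistic,
    AllForks,
}

/// Assumed gap, taken from the ">50 blocks" figure quoted above.
pub const MAJOR_SYNC_GAP: u64 = 50;

/// Picks optimistic mode when a strict majority of peers report a best block
/// more than `MAJOR_SYNC_GAP` blocks above ours, and all-forks mode otherwise.
pub fn choose_sync_mode(local_best: u64, peer_bests: &[u64]) -> SyncMode {
    let far_ahead = peer_bests
        .iter()
        .filter(|&&peer_best| peer_best > local_best.saturating_add(MAJOR_SYNC_GAP))
        .count();

    // With no peers at all, `0 > 0` is false and we fall back to all-forks.
    if far_ahead * 2 > peer_bests.len() {
        SyncMode::Optimistic
    } else {
        SyncMode::AllForks
    }
}
```

With a local best block of 10, peers at 100, 200 and 12 would select optimistic mode, while peers at 60, 61 and 12 would not, since only one of them is more than 50 blocks ahead.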
lgtm, but have a few nitpicks
Co-authored-by: Anton Kaliaev <[email protected]>
🚀
I don't know how to title this PR more appropriately than "generic improvements to the code".
This PR basically finishes the original vision of the `pending_blocks.rs` module by adding an `unnecessary_unverified_blocks` function.
For context, the data structure in `pending_blocks.rs` tracks blocks that we know exist but can't be verified yet (because they're not a direct child of a block that has been verified, or because we haven't downloaded their header yet, or in the case of the full node because we only have their header and not their body).
This newly-added `unnecessary_unverified_blocks` function returns a list of unverified blocks that aren't strictly necessary for the syncing to make progress. These are blocks that are known to be on a bad fork, and blocks that are somewhere in between the best block of a peer and a locally-known fork.
For example, if we are at block 10 and a peer announces block 15, we will download block 14, then block 13, then block 12, etc. (note: in reality we should download all 5 blocks at once, but in the worst case scenario we'll download them one by one). After block 13 has been downloaded, we can remove block 14 from memory (and keep block 13) without impacting the syncing, as the download of block 12 will still happen.
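As a rough illustration of the rule described above (not the code in this PR), here is a minimal sketch where blocks are identified by a plain `u64` instead of a height and hash. `PendingSketch`, `Unverified` and `unnecessary_ids` are hypothetical names. A block counts as unnecessary if it is flagged as bad, or if its parent is itself already stored, because it is then the parent that drives the next download.

```rust
use std::collections::BTreeMap;

/// An unverified block as tracked by this sketch (hypothetical type).
#[derive(Debug, Clone)]
pub struct Unverified {
    /// Identifier of the parent block.
    pub parent: u64,
    /// True if the block is known to be on a bad fork.
    pub bad: bool,
}

/// Simplified stand-in for the data structure in `pending_blocks.rs`.
#[derive(Debug, Default)]
pub struct PendingSketch {
    blocks: BTreeMap<u64, Unverified>,
}

impl PendingSketch {
    /// Records an unverified block.
    pub fn insert(&mut self, id: u64, parent: u64, bad: bool) {
        self.blocks.insert(id, Unverified { parent, bad });
    }

    /// Returns, in ascending order, the blocks that can be dropped without
    /// stalling the syncing: bad blocks, and blocks whose parent is stored too.
    pub fn unnecessary_ids(&self) -> Vec<u64> {
        self.blocks
            .iter()
            .filter(|(_, block)| block.bad || self.blocks.contains_key(&block.parent))
            .map(|(id, _)| *id)
            .collect()
    }
}
```

After inserting block 14 (parent 13) and block 13 (parent 12), `unnecessary_ids` returns only block 14, which matches the example: block 13 has to stay because it is what tells us to request block 12.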
These unnecessary unverified blocks are removed from the data structure if it reaches a certain size.
It is important to do so, in order to avoid spam attacks where peers announce for example block 9999 when the chain is actually only at block 10. The PR sets the arbitrary threshold to 100 blocks, meaning that in that example, after we've downloaded block 9900, we will start removing block 9998, then block 9997, etc. Despite removing blocks, we will, in finite time, still be able to determine whether this alternative chain is good or bad. If the announcement of block 9999 turned out to be legit, then we will be synchronized at block 110, and will start downloading the blocks from block 9999 again from scratch. It isn't great, but again it guarantees that we'll be fully in sync in finite time, and without an explosion in memory usage.
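The scenario above can be simulated with a small sketch. It assumes the worst case of one block downloaded per step, and it simplifies the eviction policy to "drop the highest stored block whenever more than 100 are held", which differs in details from the PR (for instance, the announced block itself is evicted here). `MAX_UNVERIFIED` and `simulate_backward_download` are hypothetical names.

```rust
use std::collections::BTreeSet;

/// Assumed cap, matching the arbitrary threshold of 100 blocks mentioned above.
pub const MAX_UNVERIFIED: usize = 100;

/// Simulates downloading blocks one by one from `announced` down to
/// `local_best + 1`. Returns the lowest block held at the end, the highest
/// block held at the end, and the largest number of blocks held at once,
/// or `None` if there was nothing to download.
pub fn simulate_backward_download(local_best: u64, announced: u64) -> Option<(u64, u64, usize)> {
    let mut held: BTreeSet<u64> = BTreeSet::new();
    let mut peak = 0;
    let mut height = announced;

    while height > local_best {
        held.insert(height);

        // Only the lowest held block is needed to request the next parent,
        // so the highest ones are evicted first once the cap is exceeded.
        while held.len() > MAX_UNVERIFIED {
            let highest = *held.iter().next_back()?;
            held.remove(&highest);
        }

        peak = peak.max(held.len());
        height -= 1;
    }

    let lowest = *held.iter().next()?;
    let highest = *held.iter().next_back()?;
    Some((lowest, highest, peak))
}
```

For a local best block of 10 and an announced block 9999, the simulation ends up holding blocks 11 to 110 and never stores more than 100 blocks, which is the "synchronized at block 110" outcome described above.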
For long range downloads, such as the initial syncing, the so-called "optimistic" syncing should be used instead. In other words, long-range downloads are out of scope of this code.
In addition to this new method, this PR does a lot of renaming and documentation clean ups, in order to clarify how the data structure works. I've had to do this in order to implement `unnecessary_unverified_blocks` because I was myself a bit lost. It introduces some small hacks into `all_forks.rs`, but `all_forks.rs` isn't super clean at the moment, and I don't think its quality is really degraded by these changes. It is necessary to put `pending_blocks.rs` in a clean state before `all_forks.rs` itself can be cleaned up.
@melekes I understand that this is a bit complicated since you don't know about this code at all, so I'm fine with you not properly reviewing this.