
Maximize Peer Capacity When Syncing #13820

Merged: 11 commits merged into develop from maximizePeerCapacity on Mar 30, 2024

Conversation

nisdas
Member

@nisdas nisdas commented Mar 28, 2024

What type of PR is this?

Bug Fix

What does this PR do? Why is it needed?

  • We filter for peers with higher capacity, and therefore more bandwidth, so that we receive blocks more quickly. Currently we wait on and select suboptimal peers, which significantly slows syncing to the chain head.
  • Add a unit test for the new method

Which issues(s) does this PR fix?

N/A

Other notes for review

@nisdas nisdas added Ready For Review A pull request ready for code review Priority: High High priority item labels Mar 28, 2024
@nisdas nisdas requested a review from a team as a code owner March 28, 2024 07:13
bestPeers := f.hasSufficientBandwith(wantedPeers, req.Count)
// We append the best peers to the front so that higher capacity
// peers are dialed first.
peers = append(bestPeers, peers...)
Contributor

@rkapka rkapka Mar 28, 2024
Won't this result in bestPeers appearing twice in the slice? So you'll request blobs from these peers twice?

Member Author

The initial idea is to prefer peers with throughput, and if all of them fail we just fall back to the whole peer set. It does duplicate them, but I don't see much benefit in removing duplicates.

Contributor

If we do decide to request from multiple peers in a loop (despite my other comment), the first call to requestBlob for a peer that wasn't in the hasSufficientBandwidth result will call waitForBandwidth, causing the thread to block. In the worst case this is one of the bestPeers, which may have a longer wait because we just made a request to it.

Instead of doing that, I would suggest only appending pid, i.e. if none of the peers that seemed to have bandwidth in the first check are able to serve the blobs, wait until the peer that served you the blocks has bandwidth.

@@ -606,6 +626,18 @@ func (f *blocksFetcher) waitForBandwidth(pid peer.ID, count uint64) error {
return nil
}

func (f *blocksFetcher) hasSufficientBandwith(peers []peer.ID, count uint64) []peer.ID {
Contributor

typo: Bandwith -> Bandwidth

// We append the best peers to the front so that higher capacity
// peers are dialed first.
peers = append(bestPeers, peers...)
for i := 0; i < len(peers); i++ {
Contributor

I like the addition of hasSufficientBandwith (sic), falling back to other peers if bandwidth isn't available. But I worry that this loop can burn through too much peer capacity in unhappy cases (e.g. a bad block batch where all peers fail to serve the corresponding blobs). Could we just try the first best peer and then fail the batch as usual?

Member Author

@nisdas nisdas Mar 29, 2024

The previous behaviour was to try all peers rather than failing after one peer. If we do fail after one peer, that would make this more fragile. The main reason for our blob issues has been that we wait on a single peer. If you look at blocks, we have always tried to dial many peers before exiting the method.

prestonvanloon
prestonvanloon previously approved these changes Mar 30, 2024
@nisdas nisdas enabled auto-merge March 30, 2024 14:09
@nisdas nisdas added this pull request to the merge queue Mar 30, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 30, 2024
@prestonvanloon prestonvanloon added this pull request to the merge queue Mar 30, 2024
Merged via the queue into develop with commit 65b90ab Mar 30, 2024
16 of 17 checks passed
@prestonvanloon prestonvanloon deleted the maximizePeerCapacity branch March 30, 2024 15:19