
Maximize Peer Capacity When Syncing #13820

Merged: 11 commits merged into develop from maximizePeerCapacity on Mar 30, 2024

Conversation

nisdas
Member

@nisdas nisdas commented Mar 28, 2024

What type of PR is this?

Bug Fix

What does this PR do? Why is it needed?

  • We filter for peers with higher capacity, and therefore more bandwidth, so that we receive blocks more quickly. Currently we wait on and select suboptimal peers, which significantly slows syncing to the chain head.
  • Add a unit test for the new method

Which issues(s) does this PR fix?

N/A

Other notes for review

@nisdas nisdas added Ready For Review A pull request ready for code review Priority: High High priority item labels Mar 28, 2024
@nisdas nisdas requested a review from a team as a code owner March 28, 2024 07:13
bestPeers := f.hasSufficientBandwith(wantedPeers, req.Count)
// We append the best peers to the front so that higher capacity
// peers are dialed first.
peers = append(bestPeers, peers...)
Contributor

@rkapka rkapka Mar 28, 2024
Won't this result in bestPeers appearing twice in the slice? So you'll request blobs from these peers twice?

Member Author

The initial idea is to prefer peers with throughput, and if all of them fail we just fall back to the whole peer set. It does duplicate them, but I don't see much benefit in removing duplicates.

Contributor

If we do decide to request from multiple peers in a loop (despite my other comment), the first call to requestBlob for a peer that wasn't in the hasSufficientBandwidth result will call waitForBandwidth, causing the thread to block. In the worst case this is one of the bestPeers, which may have a longer wait because we just made a request to it.

Instead of doing that, I would suggest only appending pid, i.e. if none of the peers that seemed to have bandwidth in the first check are able to serve the blobs, wait until the peer that served you the blocks has bandwidth.

@@ -606,6 +626,18 @@ func (f *blocksFetcher) waitForBandwidth(pid peer.ID, count uint64) error {
return nil
}

func (f *blocksFetcher) hasSufficientBandwith(peers []peer.ID, count uint64) []peer.ID {
Contributor

typo: Bandwith -> Bandwidth

// We append the best peers to the front so that higher capacity
// peers are dialed first.
peers = append(bestPeers, peers...)
for i := 0; i < len(peers); i++ {
Contributor

I like the addition of hasSufficientBandwith (sic), falling back to other peers if bandwidth isn't available. But I worry that this loop can burn through too much peer capacity in unhappy cases (e.g. a bad block batch where all peers fail to serve the corresponding blobs). Could we just try the first best peer and then fail the batch as usual?

Member Author

@nisdas nisdas Mar 29, 2024

The previous behaviour was to try all peers rather than failing after one peer. If we do fail after one peer, that would make this more fragile. The main reason for our blob issues has been that we wait on a single peer. If you look at blocks, we have always tried to dial many peers before exiting the method.

prestonvanloon
prestonvanloon previously approved these changes Mar 30, 2024
@nisdas nisdas enabled auto-merge March 30, 2024 14:09
@nisdas nisdas added this pull request to the merge queue Mar 30, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 30, 2024
@prestonvanloon prestonvanloon added this pull request to the merge queue Mar 30, 2024
Merged via the queue into develop with commit 65b90ab Mar 30, 2024
16 of 17 checks passed
@prestonvanloon prestonvanloon deleted the maximizePeerCapacity branch March 30, 2024 15:19