Maximize Peer Capacity When Syncing #13820
Conversation
```go
bestPeers := f.hasSufficientBandwith(wantedPeers, req.Count)
// We append the best peers to the front so that higher capacity
// peers are dialed first.
peers = append(bestPeers, peers...)
```
Won't this result in `bestPeers` appearing twice in the slice? So you'll request blobs from these peers twice.
The initial idea is to prioritize peers with available throughput, and if all of them fail we simply fall back to the whole peer set. It does duplicate them, but I don't see much benefit in removing the duplicates.
If we do decide to request from multiple peers in a loop (despite my other comment), the first call to `requestBlob` for a peer that wasn't in the `hasSufficientBandwidth` result will call `waitForBandwidth`, causing the thread to block. In the worst case this is one of the `bestPeers`, which may have a longer wait because we just made a request to it. Instead of doing that, I would suggest only appending `pid` - i.e. if none of the peers that seem to have bandwidth in the first check are able to serve the blobs, wait until the peer that served you the blocks has bandwidth.
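A rough sketch of the flow being suggested: try only the peers that passed the bandwidth check, then fall back to the single block-serving peer rather than blocking on every remaining peer. The `fetcher` type and its methods here are hypothetical simplifications for illustration, not the PR's actual implementation:

```go
package main

import (
	"errors"
	"fmt"
)

// peerID stands in for libp2p's peer.ID.
type peerID string

// fetcher is a toy stand-in for blocksFetcher; bandwidth tracks each
// peer's remaining capacity (illustrative only).
type fetcher struct {
	bandwidth map[peerID]uint64
}

// hasSufficientBandwidth filters peers down to those with enough capacity.
func (f *fetcher) hasSufficientBandwidth(peers []peerID, count uint64) []peerID {
	var out []peerID
	for _, p := range peers {
		if f.bandwidth[p] >= count {
			out = append(out, p)
		}
	}
	return out
}

// tryRequest consumes capacity if the peer can serve the request.
func (f *fetcher) tryRequest(p peerID, count uint64) bool {
	if f.bandwidth[p] >= count {
		f.bandwidth[p] -= count
		return true
	}
	return false
}

// requestBlobs tries only peers that already have capacity; if none
// succeed, it falls back to the block-serving peer (pid) instead of
// blocking on each peer in turn.
func (f *fetcher) requestBlobs(pid peerID, wanted []peerID, count uint64) (peerID, error) {
	for _, p := range f.hasSufficientBandwidth(wanted, count) {
		if f.tryRequest(p, count) {
			return p, nil
		}
	}
	if f.tryRequest(pid, count) {
		return pid, nil
	}
	return "", errors.New("no peer could serve the blobs")
}

func main() {
	f := &fetcher{bandwidth: map[peerID]uint64{"a": 0, "b": 5, "pid": 10}}
	p, err := f.requestBlobs("pid", []peerID{"a", "b"}, 4)
	fmt.Println(p, err) // b <nil>
}
```

In the real fetcher the fallback would presumably call `waitForBandwidth(pid, count)` so that only one peer is ever blocked on.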
Co-authored-by: Radosław Kapka <[email protected]>
```go
@@ -606,6 +626,18 @@ func (f *blocksFetcher) waitForBandwidth(pid peer.ID, count uint64) error {
	return nil
}

func (f *blocksFetcher) hasSufficientBandwith(peers []peer.ID, count uint64) []peer.ID {
```
typo: `Bandwith` -> `Bandwidth`
```go
// We append the best peers to the front so that higher capacity
// peers are dialed first.
peers = append(bestPeers, peers...)
for i := 0; i < len(peers); i++ {
```
I like the addition of `hasSufficientBandwith` (sic), to fall back to other peers if bandwidth isn't available. But I worry that this loop can burn through too much peer capacity in unhappy cases (e.g. bad block batch, all peers fail to give corresponding blobs). Could we just try the first best peer and then fail the batch as usual?
The previous behaviour was to try all peers rather than failing at one peer. If we do fail after one peer, it would make this more fragile - the main reason for our blob issues has been that we wait on a single peer. Note that for blocks we have always tried dialing many peers before exiting the method.
What type of PR is this?
Bug Fix
What does this PR do? Why is it needed?
Which issue(s) does this PR fix?
N.A
Other notes for review