catchup: Dynamic parallel catchup #5802
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5802 +/- ##
==========================================
- Coverage 55.61% 55.57% -0.05%
==========================================
Files 475 475
Lines 66869 66880 +11
==========================================
- Hits 37189 37166 -23
- Misses 27161 27192 +31
- Partials 2519 2522 +3
... and 11 files with indirect coverage changes
Could you clarify why the duration measurement works? I expected fetchers to exit immediately if there is no such block, or to return the block if it is available in the remote ledger.
I still think a new REST or WS endpoint streaming blocks would work better.
I'm not sure what that has to do with this PR.
As I understand it, the purpose of this PR is to reduce excess requests for not-yet-existing blocks from follower nodes that have already caught up. I propose an alternative approach to the same problem.
Unfortunately that's not a solution, because Conduit uses the sync API.
This PR actually reduces the number of requests made by a follower once it has caught up.
Need to remove the debugging log.
Looks good. I think we should cleanly get to a resolution on the conversation around line 47 of service.go.
Using the download duration and a 1s threshold seems like a weird hack that affects all nodes, not just follower nodes, and is very arbitrary (why 1s?). @algorandskiy shouldn't the right solution to requesting blocks too far into the future involve smarter polling, or introduce additional calls to WaitMem(), like how fetchAndWrite waits for lookbackComplete?
I think a better solution would incorporate feedback from peers about what their latest round was, like: https://github.com/cce/go-algorand/tree/blockNotAvailableErrMsg-latest
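
For reference, a minimal sketch of the WaitMem-style gating suggested in the comment above, assuming a channel-returning wait like the lookbackComplete pattern in fetchAndWrite; the signatures here are illustrative placeholders, not the actual ledger API:

```go
package catchup

import "context"

// fetchGated sketches gating a fetch on local ledger progress: before
// requesting round r from peers, wait until round r-1 has been written
// locally, the way fetchAndWrite waits on lookbackComplete. waitMem stands
// in for a WaitMem-like call returning a channel that is closed once the
// given round is available locally.
func fetchGated(ctx context.Context, r uint64,
	waitMem func(rnd uint64) chan struct{},
	fetch func(ctx context.Context, rnd uint64) error) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-waitMem(r - 1):
		// The previous round is in the ledger, so round r plausibly
		// exists on peers; issue the request now.
	}
	return fetch(ctx, r)
}
```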
@algorandskiy is working on this approach — smarter polling that introduces pauses in between retries when asking for blocks from the future, in #5809 — I think it would make this PR unnecessary. The problem this PR and #5809 are both trying to solve is aggressive polling once you get close to latest.
These are two different cases: 1) extra requests due to parallel fetching, which this PR addresses, and 2) extra requests due to retries on errors, which my PR addresses.
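
For illustration, a rough sketch of the "pauses between retries" behavior discussed for the second case (the actual change lives in #5809; the function and its parameters here are assumptions, not that PR's code):

```go
package catchup

import (
	"context"
	"time"
)

// fetchWithBackoff retries a block fetch with an increasing pause between
// attempts, instead of immediately re-requesting a round that peers do not
// have yet. initial and max bound the backoff delay.
func fetchWithBackoff(ctx context.Context, fetch func(context.Context) error,
	initial, max time.Duration) error {
	delay := initial
	for {
		if err := fetch(ctx); err == nil {
			return nil
		}
		// The round is probably not available yet (we are near the tip):
		// wait before retrying rather than hammering peers.
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(delay):
		}
		if delay *= 2; delay > max {
			delay = max
		}
	}
}
```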
Assuming this has been tested and works without causing the catchup pipeline to fail when continuously following the latest round, this could be a workaround that prevents cancelling the pipeline unnecessarily. But ideally we would replace it later with better polling and backoff, perhaps incorporating information from #5819, which tells you the latest round of the node you're requesting from.
It would be a little nicer to limit the scope of this change only to follower nodes, since regular nodes don't need to worry about following the tip of the chain using catchup.
Summary
This PR adds rate limiting to the block download service. It downloads two blocks and measures the download duration: if they were fetched in less than the round time, it allows full parallelism; otherwise it stays rate limited, since the node is likely at the latest round.
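
A minimal sketch of that heuristic, with injected placeholders (fetchBlock, roundTime, minParallel, maxParallel) rather than the PR's actual identifiers:

```go
package catchup

import (
	"context"
	"time"
)

// decideParallelism probes two consecutive rounds and times the downloads.
// If both arrive in under one round time, the node is behind and can fetch
// with full parallelism; otherwise it is likely at the tip and should stay
// rate limited to avoid requesting rounds that do not exist yet.
func decideParallelism(ctx context.Context,
	fetchBlock func(ctx context.Context, rnd uint64) error,
	next uint64, roundTime time.Duration, minParallel, maxParallel int) (int, error) {
	start := time.Now()
	for i := uint64(0); i < 2; i++ {
		if err := fetchBlock(ctx, next+i); err != nil {
			return minParallel, err
		}
	}
	if time.Since(start) < roundTime {
		return maxParallel, nil
	}
	return minParallel, nil
}
```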
Test Plan
So far I've tested manually with the print statement in the code. I'm not sure how to unit test this.
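
One possible way to unit test this: inject the fetch function and round time so the durations are controlled, then assert the chosen parallelism. This reuses the hypothetical decideParallelism sketch from the summary above, not the PR's actual code:

```go
package catchup

import (
	"context"
	"testing"
	"time"
)

func TestDecideParallelism(t *testing.T) {
	// Slow fetches: two 20ms downloads exceed the 10ms "round time",
	// so the heuristic should stay rate limited.
	slow := func(ctx context.Context, _ uint64) error {
		time.Sleep(20 * time.Millisecond)
		return nil
	}
	if p, err := decideParallelism(context.Background(), slow, 1, 10*time.Millisecond, 1, 16); err != nil || p != 1 {
		t.Fatalf("expected parallelism 1, got %d (err %v)", p, err)
	}

	// Instant fetches complete well under a one-second round time,
	// so full parallelism should be allowed.
	fast := func(ctx context.Context, _ uint64) error { return nil }
	if p, err := decideParallelism(context.Background(), fast, 1, time.Second, 1, 16); err != nil || p != 16 {
		t.Fatalf("expected parallelism 16, got %d (err %v)", p, err)
	}
}
```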