catchup: do not loop forever if there is no peers #6037

Merged

Conversation

algorandskiy
Contributor

Summary

  • Add a context cancellation check to the fetchRound peers retrieval loop
  • Without this check, some e2e tests could not finish when the other nodes quit but the last node fell into catchup mode
  • Revert the debug logging in TestCatchupOverGossip that came from "tests: preserve logs on LibGoalFixture failure" #6030

Diagnosed with debug logging in TestCatchupOverGossip: the catchup service attempted to stop but never finished, and kept logging entries like the one below until it was killed.

{
  "Context": "sync",
  "file": "service.go",
  "function": "github.com/algorand/go-algorand/catchup.(*Service).fetchRound",
  "level": "debug",
  "line": 762,
  "msg": "fetchRound: was unable to obtain a peer to retrieve the block from",
  "name": "127.0.0.1:0",
  "time": "2024-06-20T01:23:49.586961Z"
}
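
The failure mode can be sketched in isolation as follows. This is a minimal illustration, not the code in catchup/service.go: getPeer, errNoPeers, fetchWithRetry and cancelledFetchErr are hypothetical names, and the retry delay is an assumption. It shows a peer retrieval loop that would spin forever with no peers, and the cancellation check that lets it exit.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errNoPeers is a hypothetical sentinel error meaning "no peer is available".
var errNoPeers = errors.New("no peers available")

// getPeer is a hypothetical stand-in for a peer selector.
// In this sketch it never finds a peer, mimicking a node whose peers all quit.
func getPeer() (string, error) {
	return "", errNoPeers
}

// fetchWithRetry keeps asking for a peer until one is found or ctx is done.
func fetchWithRetry(ctx context.Context, retryDelay time.Duration) (string, error) {
	for {
		peer, err := getPeer()
		if err == nil {
			return peer, nil
		}
		// Without this select, the loop never exits when no peer ever appears.
		select {
		case <-ctx.Done():
			return "", ctx.Err()
		case <-time.After(retryDelay):
			// Retry after a short pause.
		}
	}
}

// cancelledFetchErr runs the loop with a context that is cancelled shortly
// after the loop starts, and returns the error the loop exits with.
func cancelledFetchErr() error {
	ctx, cancel := context.WithCancel(context.Background())
	go func() {
		time.Sleep(20 * time.Millisecond)
		cancel()
	}()
	_, err := fetchWithRetry(ctx, 5*time.Millisecond)
	return err
}

func main() {
	fmt.Println(cancelledFetchErr()) // prints "context canceled"
}
```

With the select removed, the same program would keep retrying until the process is killed, which matches the repeated log entry above.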

Test Plan

Existing tests


codecov bot commented Jun 20, 2024

Codecov Report

Attention: Patch coverage is 0% with 5 lines in your changes missing coverage. Please review.

Project coverage is 55.87%. Comparing base (d02ee6a) to head (338f8bf).

Files                Patch %   Lines
catchup/service.go   0.00%     5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6037      +/-   ##
==========================================
- Coverage   55.87%   55.87%   -0.01%     
==========================================
  Files         482      482              
  Lines       68571    68576       +5     
==========================================
+ Hits        38317    38318       +1     
+ Misses      27652    27649       -3     
- Partials     2602     2609       +7     


Contributor

@cce cce left a comment

LGTM ... I see you are just following the pattern below, but there is a slightly shorter 1-line check for ctx cancelled/timeout besides Done():

if err := s.ctx.Err(); err != nil {
	logging.Base().Debugf("ctx is done: %v", err)
	return
}

as per https://pkg.go.dev/context#Context

	// If Done is not yet closed, Err returns nil.
	// If Done is closed, Err returns a non-nil error explaining why:
	// Canceled if the context was canceled
	// or DeadlineExceeded if the context's deadline passed.
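
To compare the two forms side by side, here is a small standalone sketch. The helper names doneViaSelect, doneViaErr and checkBoth are hypothetical and exist only for this illustration; only the standard context package is used.

```go
package main

import (
	"context"
	"fmt"
)

// doneViaSelect reports whether ctx is done using a non-blocking receive on Done().
func doneViaSelect(ctx context.Context) bool {
	select {
	case <-ctx.Done():
		return true
	default:
		return false
	}
}

// doneViaErr reports the same condition with the shorter Err() check.
func doneViaErr(ctx context.Context) bool {
	return ctx.Err() != nil
}

// checkBoth evaluates both checks before and after cancelling a context.
func checkBoth() (beforeSelect, beforeErr, afterSelect, afterErr bool) {
	ctx, cancel := context.WithCancel(context.Background())
	beforeSelect, beforeErr = doneViaSelect(ctx), doneViaErr(ctx)
	cancel()
	afterSelect, afterErr = doneViaSelect(ctx), doneViaErr(ctx)
	return
}

func main() {
	fmt.Println(checkBoth()) // prints "false false true true"
}
```

Both checks agree before and after cancellation. The Err() form is shorter and also gives the reason (Canceled or DeadlineExceeded) for logging.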

Contributor

@jasonpaulos jasonpaulos left a comment

I agree with @cce's comment, but that's not a blocker

@algorandskiy algorandskiy merged commit 052f832 into algorand:master Jun 20, 2024
19 checks passed
3 participants