ledger: add callback to clear state between commitRound retries #6190
…e corruption in catchpointtracker
Left a few comments, looks okay overall.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6190      +/-   ##
==========================================
- Coverage   51.88%   51.85%   -0.04%
==========================================
  Files         639      639
  Lines       85489    85508      +19
==========================================
- Hits        44359    44339      -20
- Misses      38320    38354      +34
- Partials     2810     2815       +5
View full report in Codecov by Sentry.
0feac4b to 49dad4c
Summary
Some of our unit tests use an in-memory SQLite DB, rather than file-based SQLite, to make tests run faster. This requires enabling shared-cache mode so multiple goroutines can hold connections to the same in-memory DB. However, in shared-cache mode, even read operations require table-level locks.
To handle these lock errors, our dbutil.go wrapper around SQLite transactions (AtomicContext) implements retry logic, where a provided function is retried multiple times when the error is sqlite3.ErrLocked or sqlite3.ErrBusy. Our unit tests that use concurrent connections to the same in-memory SQLite DB often show many retries before successfully committing, due to contention. In regular on-disk operation, shared-cache mode is not enabled, and these errors do not occur.
The catchpointtracker's commitRound() function flushes a batch of round updates to a merkle trie. Unfortunately, commitRound() cannot be safely retried inside an AtomicContext, because it updates the trie's SQLite table as well as an in-memory cache. When retries occur, the DB transaction is rolled back (along with the other trackers' committed changes), but the in-memory data is not rolled back. This extra callback allows the catchpointtracker to clear state between retries of commitRound().
Related: #5568
Test Plan
This should only affect tests that use in-memory SQLite (used for faster test performance), and should make them more reliable. A new test, TestCatchpointTrackerFastRoundsDBRetry, was added that tries to corrupt the merkle trie. Without this PR it is flaky (fails most of the time, depending on timing/luck).