Investigate and fix panics on fast iavl branch #892
Comments
From the most recent discussion, our best guess is that there are problems with atomic commit. Currently, each module has a separate iavl store. A commit is atomic within a tree, but it is not atomic across trees, that is, between modules. As a result, if an operator kills the node at the wrong time, the modules' stores may end up at different heights. Relevant SDK issue: cosmos/cosmos-sdk#6370. However, even though this non-atomic commit between stores exists, our problem might still be something else, since the error "was already saved to a different hash" comes from a check within a single tree. More investigation is still needed.
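The failure mode described above can be sketched with a toy model. This is only an illustration, not the SDK's or iavl's code: `Store`, `CommitAll`, and `CheckHeights` are hypothetical names, and each `Store` stands in for one module's tree whose own commit is atomic.

```go
package main

import "fmt"

// Store is a hypothetical stand-in for one module's versioned store.
type Store struct {
	Name    string
	Version int64
}

// Commit advances this store by one version. In this toy model the
// commit of a single store is atomic.
func (s *Store) Commit() { s.Version++ }

// CommitAll commits the stores one after another. It stops once
// stopAfter stores have been committed, simulating a node that is
// killed partway through. A negative stopAfter means no interruption.
func CommitAll(stores []*Store, stopAfter int) {
	for i, s := range stores {
		if stopAfter >= 0 && i >= stopAfter {
			return
		}
		s.Commit()
	}
}

// CheckHeights returns an error if the stores are not all at the
// same version.
func CheckHeights(stores []*Store) error {
	if len(stores) == 0 {
		return nil
	}
	want := stores[0].Version
	for _, s := range stores[1:] {
		if s.Version != want {
			return fmt.Errorf("store %q is at version %d, but %q is at version %d",
				s.Name, s.Version, stores[0].Name, want)
		}
	}
	return nil
}

func main() {
	stores := []*Store{{Name: "bank"}, {Name: "staking"}, {Name: "gov"}}

	CommitAll(stores, -1) // clean commit: every store advances
	fmt.Println("after clean commit:", CheckHeights(stores))

	CommitAll(stores, 1) // killed after the first store committed
	fmt.Println("after interrupted commit:", CheckHeights(stores))
}
```

A startup check along the lines of `CheckHeights` would only detect a height mismatch between stores. It would not explain an error raised by a check inside a single tree.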
@UnityChaos @ValarDragon any other updates?
The SDK logical commit not being atomic is looking more likely to be the cause, based on what Unity was saying. Copying from Unity's message: "okay, so yeah after the crash on pruning, 6.3 (using fast cache), is reading the value for the next height:"
(When trying to replay block 3249090, 6.3 reads 14240 from fast cache, whereas 6.2 reads 25939 from commit multi store)
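One way to picture the symptom in Unity's message is a toy model with a versioned store and an unversioned "latest value" cache. This is an assumption-laden sketch, not how iavl's fast nodes are implemented: `ToyStore` and its methods are hypothetical, and the two values are reused from the message above purely for illustration. If the cache was already written for the next height when the node crashed, but the versioned commit was not, a replay of that block reads the next height's value from the cache.

```go
package main

import "fmt"

// ToyStore is a hypothetical model of a single key that is kept both
// in a versioned store and in an unversioned fast cache.
type ToyStore struct {
	committed  map[int64]int64 // height -> value as of that height
	height     int64           // last durably committed height
	fast       int64           // fast cache: only the latest value
	fastHeight int64           // height the fast cache value belongs to
}

func NewToyStore() *ToyStore {
	return &ToyStore{committed: map[int64]int64{}}
}

// WriteFast updates only the fast cache.
func (s *ToyStore) WriteFast(height, value int64) {
	s.fast = value
	s.fastHeight = height
}

// Commit updates the fast cache and then the versioned store.
// The two writes are separate steps, so a crash can land between them.
func (s *ToyStore) Commit(height, value int64) {
	s.WriteFast(height, value)
	s.committed[height] = value
	s.height = height
}

// ReadVersioned returns the value at the last committed height.
func (s *ToyStore) ReadVersioned() int64 { return s.committed[s.height] }

// ReadFast returns whatever the fast cache currently holds.
func (s *ToyStore) ReadFast() int64 { return s.fast }

// FastIsAhead reports whether the fast cache holds data for a height
// that the versioned store never committed.
func (s *ToyStore) FastIsAhead() bool { return s.fastHeight > s.height }

func main() {
	s := NewToyStore()
	s.Commit(1, 25939)

	// Simulated crash: the fast cache received the value for height 2,
	// but the versioned commit for height 2 never happened.
	s.WriteFast(2, 14240)

	// Replaying block 2 should start from the state at height 1.
	fmt.Println("versioned read:", s.ReadVersioned())
	fmt.Println("fast read:     ", s.ReadFast())
	fmt.Println("fast is ahead: ", s.FastIsAhead())
}
```

In this model, a guard like `FastIsAhead` on startup would be one way to notice the divergence before replaying. Whether an equivalent check is practical in the real fast node code is not something this sketch establishes.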
Can we detect whether we're replaying a block vs. executing a new block? If we're replaying a block, then don't enable the fast cache?
I'm trying to see why replaying a block is breaking things. The fast nodes should still stay consistent with the actual nodes even if we are replaying.
PR with the fix: osmosis-labs/cosmos-sdk#115
Still need to address the "active reader" issue related to a misconfiguration between pruning and snapshot. Unity explained how to trigger that bug in the PR above.
This PR currently contains the latest changes related to this issue.
This issue seems to be resolved, closing.
Background
Validators are experiencing issues on the new fast node branch with panics such as:
This is an issue to investigate and fix the problems reported.
Acceptance Criteria