Fix data source rewind across more than one block #5083
Conversation
`Action::Restart` already drops the block stream and replaces the guard in the `instances` map, cancelling it in a very real way.
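For illustration, a minimal, self-contained sketch of that drop-to-cancel pattern — `StreamGuard` and `restart` are stand-ins here, not the actual graph-node types:

```rust
use std::collections::HashMap;

// Stand-in for the real guard type: cancellation happens on `Drop`.
struct StreamGuard;

impl Drop for StreamGuard {
    fn drop(&mut self) {
        // In the real code, dropping the guard signals the old block
        // stream to stop.
        println!("old block stream cancelled");
    }
}

fn restart(instances: &mut HashMap<String, StreamGuard>, id: &str) {
    // Inserting a new guard under the same key drops the previous one,
    // which cancels the previous block stream.
    instances.insert(id.to_string(), StreamGuard);
}
```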
When reverting, the state should be reverted to the target block of the revert, so pass `revert_to_ptr` as the parameter and rename the function to `fn revert_state_to`. When restarting, it should revert to the last known good block, without actually reverting that block itself. The call in the `Err` case was redundant with the call during restart.
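As a minimal sketch of the intended semantics (a simplified state type; the real runner state holds much more than this), `revert_state_to(block)` keeps `block` itself and discards everything created after it:

```rust
type BlockNumber = i32;

struct RunnerState {
    // items keyed by the block at which they were created
    created_at: Vec<(BlockNumber, String)>,
}

impl RunnerState {
    /// Revert in-memory state *to* `block`: discard everything
    /// created after `block`, keeping `block` itself.
    fn revert_state_to(&mut self, block: BlockNumber) {
        self.created_at.retain(|(b, _)| *b <= block);
    }
}
```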
```diff
 let query = format!(
-    "delete from {} where block_range @> $1 and lower(block_range) >= $1",
+    "delete from {} where block_range &> int4range($1, null)",
```
I assume you removed the `lower(block_range)` clause because there's no BRIN index on the data sources table?
I don't think there is.
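To spell out the behavioral difference (a hedged reading; `delete_query` is a hypothetical helper, not the actual graph-node code): for ranges of the form `[x, +inf)`, the old predicate only matched rows with `lower(block_range) = $1`, i.e. data sources created exactly at that block, while `&>` ("does not extend to the left of") matches every range starting at or after `$1` — which is what a rewind across more than one block needs:

```rust
// Hypothetical helper: builds the delete statement for a table name.
fn delete_query(table: &str) -> String {
    // `block_range &> int4range($1, null)` holds exactly when
    // lower(block_range) >= $1. The old predicate required both
    // `@> $1` (lower <= $1) and `lower >= $1`, i.e. lower = $1, so it
    // only removed data sources created at that single block.
    format!(
        "delete from {} where block_range &> int4range($1, null)",
        table
    )
}
```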
```rust
) -> Result<(), StoreError> {
// Use `@>` to leverage the gist index.
// This assumes all ranges are of the form [x, +inf).
pub(crate) fn revert(&self, conn: &PgConnection, block: BlockNumber) -> Result<(), StoreError> {
```
Since there's always some uncertainty with the various revert functions whether they revert `block` itself or not, it would be good to have a comment on this method that states that it will revert all blocks with a number >= `block`, i.e., that it's akin to a `revert_to(block - 1)`.
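The suggested comment could read roughly like this (wording illustrative, body elided):

```rust
/// Revert all data sources created at a block with a number >= `block`,
/// i.e., this is akin to a hypothetical `revert_to(block - 1)`.
pub(crate) fn revert(&self, conn: &PgConnection, block: BlockNumber) -> Result<(), StoreError> {
    // ...
}
```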
```diff
@@ -1361,7 +1349,7 @@ where
                 .deployment_head
                 .set(subgraph_ptr.number as f64);

-            self.revert_state(subgraph_ptr.number)?;
+            self.revert_state_to(revert_to_ptr.number)?;
```
This is where reverting by more than one block came into play, right?
Yes, this and the implementation in `dynds/private.rs` were both wrong, I think.
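For example (numbers illustrative): on a reorg from block 10 back to block 7, the state has to be reverted to block 7 in one step, discarding blocks 8 through 10; reverting relative to the old head only undoes a single block, which is why a rewind across more than one block went wrong.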
```rust
let mut a = self.blocks.get(&a).unwrap();
let mut b = self.blocks.get(&b).unwrap();
while a.number() > b.number() {
    dbg!(a.ptr().number);
```
I think that's a leftover `dbg!`.
```rust
    a = self.blocks.get(&a.parent_ptr()?).unwrap();
}
while b.number() > a.number() {
    dbg!(b.ptr().number);
```
Same here
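With the two leftover `dbg!` calls removed, the walk would presumably read as follows (the second loop's body is assumed to mirror the first; it isn't shown in the hunks above):

```rust
let mut a = self.blocks.get(&a).unwrap();
let mut b = self.blocks.get(&b).unwrap();
// Bring both pointers to the same height before looking for the
// common ancestor.
while a.number() > b.number() {
    a = self.blocks.get(&a.parent_ptr()?).unwrap();
}
while b.number() > a.number() {
    b = self.blocks.get(&b.parent_ptr()?).unwrap();
}
```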
```diff
@@ -811,7 +803,6 @@ where
             // Handle unexpected stream errors by marking the subgraph as failed.
             Err(e) => {
                 self.metrics.stream.deployment_failed.set(1.0);
-                self.revert_state(block_ptr.block_number())?;
```
Why was this `revert_state` removed?
#4810 was wrong. And that wasn't the only thing: the way the runner reverted in-memory state also didn't seem quite right. This adds a test which uncovered these issues. It should resolve #4994, unless that was hitting the assert for a different cause.
Needs an integration cluster run, of course.