-
Notifications
You must be signed in to change notification settings - Fork 987
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid writer poisoned
errors
#4533
Conversation
For correctness, you should use The current approach also has a performance issue, which is that it clears the entity cache on every |
I added a commit that hopefully addresses that |
let logger = self.store.logger.clone(); | ||
if let Err(e) = self.stop().await { | ||
warn!(logger, "Writable had error when stopping, it is safe to ignore this error"; "error" => e.to_string()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it sufficient to send a stop request here, shouldn't this also join the task handle of the writer, to ensure is finished?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am actually wondering if this is too defensive - if the writer is poisoned, it's because we had an error in the Queue
, and start_writer
will return, i.e. shut down, on any error. Maybe I should just remove that code and just leave a comment here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this would always fail - when a writer has been poisoned, it doesn't allow adding anything more to the queue. I changed this code to just log a warning if the join handle indicates that the background writer is still running. Joining here is a bit tricky since it risks blocking indexing of that subgraph if we really do have a problem with the writer stopping.
When the subgraph runner encounters an error, it needs to restart the store to clear any errors that might have happened.
We only need to reset the state of the runner when the `WritableStore` actually had to be restarted because of an error; if it had an error, we have to reset the state. Use `SubgraphRunner.revert_state` to properly reset the runner state.
Rebased to latest master. |
Woops .. I screwed up, I thought this had been approved already, and merged it. |
If a subgraph encountered an error, it would get restarted with some backoff. If that error was caused by some store interaction, the
WritableStore
would remain in poisoned state and refuse to process any more writes.This PR changes that so that the
WritableStore
is restarted when a subgraph is restarted, clearing the error. That requires that we clear any internal state that is based on assumptions of what has been written so far. I am not entirely sure that the changes incore/src/subgraph/runner.rs
is enough to do that, and would appreciate a thorough review of what other state might have to be cleared.