Longevity job is getting error "Storage error: Hummock error: ReadCurrentEpoch error Cannot read when cluster is under recovery" #7841
Comments
This error is expected when there are queries running in We can retry the batch query after recovery is done.
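A minimal sketch of such a client-side retry, for illustration only. The helper `run_with_retry`, its parameters, and the backoff schedule are hypothetical and not part of RisingWave; `run_query` stands for any callable that executes the batch query and raises an exception carrying the error text quoted in this issue.

```python
import time

# Substring of the error message reported in this issue.
RECOVERY_MARKER = "Cannot read when cluster is under recovery"


def run_with_retry(run_query, max_attempts=5, base_delay_s=1.0, sleep=time.sleep):
    """Call run_query(), retrying with exponential backoff on the recovery error.

    run_query is a hypothetical zero-argument callable that runs the batch
    query and returns its rows, or raises if the query fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except Exception as exc:
            # Retry only the recovery error; re-raise anything else, and
            # give up once the attempt budget is spent.
            if RECOVERY_MARKER not in str(exc) or attempt == max_attempts:
                raise
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay_s * 2 ** (attempt - 1))
```

Note that this only helps if recovery eventually completes; if the cluster stays in recovery, the last attempt re-raises the error.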
@hzxa21 As this error occurred with the default settings, what should be done to avoid such errors?
This is expected after some node crashes and the cluster is recovering itself. But why did a node fail during the longevity test? @sumittal
This error itself is expected, as explained by Patrick. The problem in this test is exactly the same as in rwc-3-longevity-20230208-170541: the cluster cannot complete recovery because:
I suspect there is some meta addr resolution issue and am investigating. BTW, if a worker node's heartbeat request doesn't succeed for a long time (10 min here), the worker node is expected to exit, which doesn't seem to have happened in this test (don't find
I'm taking a day off for something today. @shanicky would you please help to TAL. 🙏
See #7841 (comment)
Describe the bug
A recent longevity job is getting the error "Storage error: Hummock error: ReadCurrentEpoch error Cannot read when cluster is under recovery" while executing the "select * from LIMIT 1" query.
Job Details:
https://buildkite.com/risingwave-test/longevity-test/builds/359#01863c49-2ff3-4ded-8e38-821bd2136889
Step/timeline:
1) 17:08 UTC: Created materialized views q22, q101, q102 with 'STREAMING_PARALLELISM=3'.
2) 18:38 UTC: Able to fetch records from the materialized view.
3) 19:08 UTC: Unable to fetch data from the materialized view; the query fails with the "Storage error: Hummock error: ReadCurrentEpoch error Cannot read when cluster is under recovery" error.
However, there was not a single pod crash.
To Reproduce
No response
Expected behavior
No response
Additional context
No response