batch: lookup join accesses invalid epoch #6979

Closed
Tracked by #6640
BugenZhao opened this issue Dec 20, 2022 · 4 comments
Labels
component/batch Batch related related issue. type/bug Something isn't working

Comments

@BugenZhao
Member

When running the e2e tests in parallel on CI, a batch lookup join fails with an expired-epoch error:

2022-12-20T06:12:18.527699Z ERROR risingwave_storage::monitor::monitored_store: Failed in get: Hummock error: Expired Epoch: watermark 3557394892259328, epoch 3557394889375744.
  backtrace of inner error:
   0: <risingwave_storage::hummock::error::HummockError as core::convert::From<risingwave_storage::hummock::error::HummockErrorInner>>::from
             at ./src/storage/src/hummock/error.rs:65:10
   1: <T as core::convert::Into<U>>::into
             at /rustc/bdb07a8ec8e77aa10fb84fae1d4ff71c21180bb4/library/core/src/convert/mod.rs:726:9
   2: risingwave_storage::hummock::error::HummockError::expired_epoch
             at ./src/storage/src/hummock/error.rs:119:9
   3: risingwave_storage::hummock::utils::validate_epoch
             at ./src/storage/src/hummock/utils.rs:60:20
   4: risingwave_storage::hummock::state_store::<impl risingwave_storage::hummock::HummockStorage>::build_read_version_tuple
             at ./src/storage/src/hummock/state_store.rs:117:9
   5: risingwave_storage::hummock::state_store::<impl risingwave_storage::hummock::HummockStorage>::get::{{closure}}
             at ./src/storage/src/hummock/state_store.rs:67:13
   6: <risingwave_storage::store_impl::verify::VerifyStateStore<A,E> as risingwave_storage::store::StateStoreRead>::get::{{closure}}
             at ./src/storage/src/store_impl.rs:316:79
   7: <S as risingwave_storage::store_impl::boxed_state_store::DynamicDispatchedStateStoreRead>::get::{{closure}}
             at ./src/storage/src/store_impl.rs:707:47
   8: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/bdb07a8ec8e77aa10fb84fae1d4ff71c21180bb4/library/core/src/future/future.rs:124:9
   9: <async_stack_trace::StackTraced<F,_> as core::future::future::Future>::poll
             at ./src/utils/async_stack_trace/src/lib.rs:149:36
  10: <risingwave_storage::monitor::monitored_store::MonitoredStateStore<S> as risingwave_storage::store::StateStoreRead>::get::{{closure}}
             at ./src/storage/src/monitor/monitored_store.rs:112:17
  11: risingwave_storage::table::batch_table::storage_table::StorageTable<S>::get_row::{{closure}}
             at ./src/storage/src/table/batch_table/storage_table.rs:259:81
  12: <risingwave_batch::executor::join::distributed_lookup_join::InnerSideExecutorBuilder<S> as risingwave_batch::executor::join::local_lookup_join::LookupExecutorBuilder>::add_scan_range::{{closure}}
             at ./src/batch/src/executor/join/distributed_lookup_join.rs:389:17
  13: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/bdb07a8ec8e77aa10fb84fae1d4ff71c21180bb4/library/core/src/future/future.rs:124:9
  14: risingwave_batch::executor::join::lookup_join_base::LookupJoinBase<K>::do_execute::{{closure}}
             at ./src/batch/src/executor/join/lookup_join_base.rs:102:65

Compute node logs at https://buildkite.com/risingwavelabs/pull-request/builds/14221#01852e1b-256d-4d2a-ba95-45a72ff71360.
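To make the two numbers in the error easier to read, here is a small sketch that decodes them. It assumes (this is not stated in the issue) that a RisingWave epoch stores milliseconds since 2021-04-01T00:00:00Z in the bits above the low 16, and it models the failing check as a plain "epoch < watermark" comparison. The function names are illustrative only, not the ones in src/storage/src/hummock/utils.rs.

```rust
// Assumption: a RisingWave epoch stores milliseconds since
// 2021-04-01T00:00:00Z in the bits above the low 16.
const EPOCH_PHYSICAL_SHIFT_BITS: u32 = 16;
const MS_PER_DAY: u64 = 86_400_000;

/// Physical part of an epoch, in ms since 2021-04-01T00:00:00Z (assumed layout).
fn physical_ms(epoch: u64) -> u64 {
    epoch >> EPOCH_PHYSICAL_SHIFT_BITS
}

/// Simplified model of the check that raised the error:
/// a read epoch older than the watermark is rejected as expired.
fn is_expired(watermark: u64, epoch: u64) -> bool {
    epoch < watermark
}

/// Formats the UTC time-of-day part of an epoch as HH:MM:SS.mmm.
fn time_of_day(epoch: u64) -> String {
    let ms = physical_ms(epoch) % MS_PER_DAY;
    format!(
        "{:02}:{:02}:{:02}.{:03}",
        ms / 3_600_000,
        ms / 60_000 % 60,
        ms / 1_000 % 60,
        ms % 1_000
    )
}

fn main() {
    // Values copied from the error message in the log.
    let watermark: u64 = 3557394892259328;
    let epoch: u64 = 3557394889375744;

    println!("expired: {}", is_expired(watermark, epoch));
    println!("watermark at {} UTC", time_of_day(watermark));
    println!("epoch at     {} UTC", time_of_day(epoch));
    println!("lag: {} ms", physical_ms(watermark) - physical_ms(epoch));
}
```

Under that assumed layout the watermark decodes to 06:12:18.273 UTC, a fraction of a second before the log line's timestamp, and the epoch the lookup join tried to read is only 44 ms older than the watermark.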

@BugenZhao BugenZhao added type/bug Something isn't working component/batch Batch related related issue. labels Dec 20, 2022
@github-actions github-actions bot added this to the release-0.1.16 milestone Dec 20, 2022
@BugenZhao
Member Author

@BugenZhao
Member Author

cc @chenzl25 Would you please help to take a look? 🥰

@chenzl25
Contributor

> cc @chenzl25 Would you please help to take a look? 🥰

Sure.

@chenzl25
Contributor

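I think I know why the epoch expired. For distributed queries, we unpin the epoch as soon as every stage that contains a table scan has been scheduled, on the assumption that all Hummock iterators have been created by then. A distributed lookup join breaks this assumption: it cannot create all of its iterators at the beginning, because its inner side has to wait for the outer side to send rows before it can look up the table at runtime. By the time those lookups run, the snapshot may already be unpinned and the query epoch may have fallen below the watermark, which is exactly the Expired Epoch error in the log. This is where the snapshot is released: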

Stage(Scheduled(stage_id)) => {
    tracing::trace!(
        "Query stage {:?}-{:?} scheduled.",
        self.query.query_id,
        stage_id
    );
    self.scheduled_stages_count += 1;
    stages_with_table_scan.remove(&stage_id);
    if stages_with_table_scan.is_empty() {
        // We can be sure here that all the Hummock iterators have been created,
        // thus they all successfully pinned a HummockVersion.
        // So we can now unpin their epoch.
        tracing::trace!(
            "Query {:?} has scheduled all of its stages that have table scan (iterator creation).",
            self.query.query_id
        );
        pinned_snapshot_to_drop.take();
    }
}
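The race described above can be reproduced in a toy model. Everything in it is hypothetical: EpochManager, PinnedSnapshot and run_query are illustrative names, and the watermark rule (smallest pinned epoch, otherwise the latest committed epoch) is a simplification of what Hummock actually does. Dropping the guard with take() mirrors pinned_snapshot_to_drop.take() in the scheduler snippet.

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;
use std::rc::Rc;

/// Toy epoch manager (hypothetical): the watermark is the smallest pinned
/// epoch, or the latest committed epoch when nothing is pinned.
#[derive(Default)]
struct EpochManager {
    committed: u64,
    pins: BTreeMap<u64, usize>, // epoch -> number of active pins
}

impl EpochManager {
    fn watermark(&self) -> u64 {
        self.pins.keys().next().copied().unwrap_or(self.committed)
    }

    /// Rejects reads below the watermark, like the Expired Epoch check.
    fn read(&self, epoch: u64) -> Result<(), String> {
        let wm = self.watermark();
        if epoch < wm {
            Err(format!("Expired Epoch: watermark {wm}, epoch {epoch}"))
        } else {
            Ok(())
        }
    }
}

/// RAII guard: while it lives, its epoch stays pinned; dropping it unpins.
struct PinnedSnapshot {
    mgr: Rc<RefCell<EpochManager>>,
    epoch: u64,
}

impl PinnedSnapshot {
    fn pin(mgr: &Rc<RefCell<EpochManager>>) -> Self {
        let epoch = mgr.borrow().committed;
        *mgr.borrow_mut().pins.entry(epoch).or_insert(0) += 1;
        PinnedSnapshot { mgr: Rc::clone(mgr), epoch }
    }
}

impl Drop for PinnedSnapshot {
    fn drop(&mut self) {
        let mut mgr = self.mgr.borrow_mut();
        let remaining = {
            let n = mgr.pins.get_mut(&self.epoch).expect("pinned epoch must be tracked");
            *n -= 1;
            *n
        };
        if remaining == 0 {
            mgr.pins.remove(&self.epoch);
        }
    }
}

/// Runs one toy query. If `hold_pin_for_lookup` is false, the pin is dropped
/// once the table-scan stages are scheduled, as in the scheduler snippet.
fn run_query(hold_pin_for_lookup: bool) -> Result<(), String> {
    let mgr = Rc::new(RefCell::new(EpochManager { committed: 100, ..Default::default() }));
    let mut pinned = Some(PinnedSnapshot::pin(&mgr));
    let query_epoch = pinned.as_ref().unwrap().epoch;

    // A plain table scan creates its iterator while the pin is still held.
    mgr.borrow().read(query_epoch)?;

    // All stages with table scans are scheduled.
    if !hold_pin_for_lookup {
        pinned.take(); // mirrors `pinned_snapshot_to_drop.take()`
    }

    // Meanwhile, newer epochs are committed.
    mgr.borrow_mut().committed = 200;

    // Only now does the lookup join's inner side receive outer rows and read.
    let result = mgr.borrow().read(query_epoch);
    drop(pinned); // the query finishes
    result
}

fn main() {
    println!("{:?}", run_query(false));
    println!("{:?}", run_query(true));
}
```

In this model run_query(false) returns Err("Expired Epoch: watermark 200, epoch 100"), because nothing pins epoch 100 when the lookup finally reads, while run_query(true) keeps the pin alive and returns Ok(()). The second variant only illustrates why the pin matters; it is not a claim about how this issue was resolved.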


2 participants