-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix possible stuck on error in HashedDictionaryParallelLoader #60926
Conversation
This is an automated comment for commit 7e90871 with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page Successful checks
|
if (!shards_queues[shard]->push(std::move(shards_blocks[shard]))) | ||
throw Exception(ErrorCodes::LOGICAL_ERROR, "Could not push to shards queue #{}", shard); | ||
const auto & current_block = shards_blocks[shard]; | ||
while (!shards_queues[shard]->tryPush(current_block, /* milliseconds= */ 100)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
previously we moved block, shouldn't we do it now too?
if (!shards_queues[shard]->push(std::move(shards_blocks[shard]))) | ||
throw Exception(ErrorCodes::LOGICAL_ERROR, "Could not push to shards queue #{}", shard); | ||
const auto & current_block = shards_blocks[shard]; | ||
while (!shards_queues[shard]->tryPush(current_block, /* milliseconds= */ 100)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if I understood it correctly, we will fail the whole load operation if even a single block took more than 100ms to insert it. it seems to restrictive. and probably there are some sources that produce big blocks or a single block for the whole output.
maybe we could make it configurable or just set to a much bigger value. like 10s or a minute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We retry tryPush
until block is inserted.
I replaced push
to tryPush
and if
to while
, because I don't want thread to be blocked without checking if workers are alive. So, if we cannot push, we will check status of workers and loading_timeout
each 100 ms. If everything is alright we will retry push, otherwise we will have a chance to stop.
It's also the reason why std::move
is not used anymore, I hope it won't be significant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also the reason why std::move is not used anymore, I hope it won't be significant
yep, makes sense.
my question was "what if it cannot be inserted in 100ms at all", so we would retry but it won't help because the t/o is too low. but since it is just insertion in queue (i.e. shouldn't depend on block size) maybe it is indeed ok.
Fix #61013
It's tricky to reproduce in CI, it requires indserting specific faults into the code (wait until there're threads waiting on push and throwing exception from workter in that very moment)
It took >5min to stop the server:
No stuck threads related to dictionary in gdb.log, most likely something else... |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
SHARDS