Parallelize supervisor stop logic to make it run faster #17535
base: master
Conversation
Left some suggestions.
```java
}
log.info("Waiting for [%d] supervisors to shutdown", stopFutures.size());
try {
  FutureUtils.coalesce(stopFutures).get(80, TimeUnit.SECONDS);
```
I don't think we should use a timeout of 80s here since each supervisor could have a different value of shutdown timeout. We could either just do `get()` with no args (which would be no worse than what the code is currently doing) or use a longer timeout.
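For illustration, a minimal sketch of the no-timeout variant; `stopFutures`, `log`, and `FutureUtils` are the names already used in this diff, and the log messages are placeholders:

```java
// Sketch only: block until every supervisor's stop future completes instead of
// capping the overall wait at 80 seconds. Each supervisor already bounds its own
// shutdown via tuningConfig.shutdownTimeout, so this should not hang indefinitely.
log.info("Waiting for [%d] supervisors to shutdown", stopFutures.size());
try {
  FutureUtils.coalesce(stopFutures).get();
}
catch (Exception e) {
  log.warn(e, "Exception while waiting for supervisors to stop");
}
```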
```java
private volatile boolean started = false;

@Inject
public SupervisorManager(MetadataSupervisorManager metadataSupervisorManager)
{
  this.metadataSupervisorManager = metadataSupervisorManager;
  this.shutdownExec = MoreExecutors.listeningDecorator(
      Execs.multiThreaded(25, "supervisor-manager-shutdown-%d")
```
25 may be excessive in some cases and inadequate in others. Maybe initialize the executor lazily inside the `stop()` method; then the number of required threads can be computed at run time. The `shutdownExec` need not be a class-level field either.
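A rough sketch of that first suggestion, with `supervisors` standing in for the manager's map of running supervisors (the exact field name and the thread cap below are assumptions, not the actual implementation):

```java
// Sketch: create the shutdown executor only when stop() runs, sized by the number
// of supervisors that actually need to be stopped, and tear it down afterwards.
int numToStop = supervisors.size();   // "supervisors" is a placeholder for the manager's supervisor map
if (numToStop > 0) {
  final ListeningExecutorService shutdownExec = MoreExecutors.listeningDecorator(
      Execs.multiThreaded(Math.min(numToStop, 10), "supervisor-manager-shutdown-%d")  // cap is illustrative
  );
  try {
    // submit each supervisor's stop to shutdownExec, coalesce the futures, and wait
  }
  finally {
    shutdownExec.shutdownNow();
  }
}
```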
Alternatively, instead of using a completely new executor, you could consider using the `scheduledExec` inside each supervisor. That executor basically just sits idle most of the time and is responsible only for submitting `RunNotice`s to the notice queue.

You could add a `stopAsync` method to `SeekableStreamSupervisor` that does the following:

- returns a future that we coalesce and wait upon
- internally submits a runnable to the `scheduledExec` to perform the actual `stop`

I guess the only thing we will miss out on is parallelizing `autoscaler.stop()`, which should not be a concern?
The issue I was running into with this strategy is that part of the stop logic is shutting down the `scheduledExec` executor itself, and I couldn't really think of a great way to avoid this chicken-and-egg problem.
You could perhaps work around that problem by doing something like this (a rough sketch follows below):

- `stopAsync` sets the supervisor state to STOPPING
- `stopAsync` then submits a `stop()` runnable to the `scheduledExec`
- the `buildRunTask` method should check and submit the `RunNotice` only if the state of the supervisor is not STOPPING
- `stop()` can call `scheduledExec.shutdown()` instead of `scheduledExec.shutdownNow()`
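A hedged sketch of what that could look like in `SeekableStreamSupervisor`; the `markStopping()` helper and the exact stop signature used here are assumptions made for illustration:

```java
// Sketch only: stopAsync() records the STOPPING state, then reuses the existing
// scheduledExec to run the blocking stop work and returns a future that
// SupervisorManager can coalesce and wait on.
public ListenableFuture<Void> stopAsync()
{
  stateManager.markStopping();                  // assumed helper that records the STOPPING state
  final SettableFuture<Void> stopFuture = SettableFuture.create();
  scheduledExec.submit(() -> {
    try {
      stop(false);                              // existing blocking stop logic
      stopFuture.set(null);
    }
    catch (Exception e) {
      stopFuture.setException(e);
    }
  });
  return stopFuture;
}
```

With this shape, `buildRunTask` would skip submitting a `RunNotice` while the state is STOPPING, and `stop()` would call `scheduledExec.shutdown()` (not `shutdownNow()`) so the in-flight stop runnable is allowed to finish.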
Another alternative is to simply create a `shutdownExec` inside `stopAsync`. The difference from the current approach would be that the `SupervisorManager` doesn't need to handle the lifecycle of the shutdown executor.

Let me know what you think.
I think this makes sense, will update.
```java
  FutureUtils.coalesce(stopFutures).get(80, TimeUnit.SECONDS);
}
catch (Exception e) {
  log.warn(
```
Maybe log the exception too just in case it is something other than a timeout.
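For example, something along these lines (a sketch of the suggested change, assuming the logger overload that takes a throwable as its first argument):

```java
catch (Exception e) {
  // Log the exception itself so non-timeout failures are visible, not just timeouts.
  log.warn(e, "Failed to stop all supervisors cleanly");
}
```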
```java
if (!stopGracefully) {
  stopped = true;
  break;
}
```
If we have already parallelized the `stop` of supervisors, is this still needed?
We could probably pull it out.
Sometimes the `@LifecycleStop` method of `SupervisorManager` (`SupervisorManager.stop()`) can take a long time to run. This is because the method iterates through all running supervisors and calls stop on them serially. Each streaming `supervisor.stop()` call tries to push a `ShutdownNotice` onto its notice queue and then waits up to `tuningConfig.shutdownTimeout` for that notice to run and set `stopped = true`. This means the total run time can be the sum of `tuningConfig.shutdownTimeout` (default 80 seconds) across all supervisors.

This long stop time can cause lots of issues, most notably Overlord leadership issues if the ZK leader is terminated (but the ensemble maintains quorum). This is because an Overlord pod can get `becomeLeader` queued up behind `stopLeader` if it disconnects from and then reconnects to ZK (the two methods share a giant lock).

This PR parallelizes the supervisor stop logic so that `SupervisorManager.stop()` completes faster and this issue is avoided (although I feel that parts of the Overlord leadership process probably need to be revisited in general).
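At a high level, the change replaces the serial loop with something like the sketch below; `runningSupervisors()` is a placeholder for however the manager enumerates its supervisors, and the rest mirrors the diff excerpts above rather than being the exact final code:

```java
// Sketch of the parallelized shutdown: submit each supervisor's stop to the
// shutdown executor, then wait on the coalesced futures instead of stopping
// supervisors one at a time.
final List<ListenableFuture<Void>> stopFutures = new ArrayList<>();
for (Supervisor supervisor : runningSupervisors()) {   // placeholder accessor
  stopFutures.add(shutdownExec.submit(() -> {
    supervisor.stop(false);                            // non-graceful stop, as in the existing lifecycle stop
    return null;
  }));
}
log.info("Waiting for [%d] supervisors to shutdown", stopFutures.size());
try {
  FutureUtils.coalesce(stopFutures).get(80, TimeUnit.SECONDS);
}
catch (Exception e) {
  log.warn(e, "Timed out or failed while stopping supervisors");
}
```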
Still working on some unit tests for this change
Release note
Improve recovery time for Overlord leadership after ZK nodes are bounced.
Key changed/added classes in this PR
SupervisorManager
SeekableStreamSupervisor
This PR has: