Multiplex worker implementation limits parallelism #494
Comments
Fascinating, great detective work! Want to put up a PR? I can test it against our repo (multiplex worker performance regressed for us, so I'm super keen to try your fix).
@jongerrish If enabling multiplex workers was the only change at play for you, I think this shouldn't regress relative to normal persistent workers - at worst you'd perform the same (doing everything in serial). That said, I'll try to have a PR up in the next few days so you can test it out.
Well, with regular persistent workers we give like 8 max instances, so 8 processes for the
Yes, I meant that within each persistent worker, work would happen serially. There might be some other behavior at play - maybe Bazel tries to schedule more work on a worker if it's known to support multiplexing, or something like that.
Yeah, that makes sense. There should be just a single process for
@jdai8 how did you generate that graph, out of curiosity? Would like to have that tool in my tool belt :)
@jeffzoch take a look at the profiling section in the bazel docs: https://docs.bazel.build/versions/master/skylark/performance.html#performance-profiling. It's definitely a useful tool to have! Specifically, I ran:
in my reproducer project. Limiting the number of max instances sends every work request to one worker (forcing us to multiplex).
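(The original command was not preserved above. A hypothetical reconstruction from documented Bazel flags - the //src:all target, the profile path, and the KotlinCompile mnemonic are assumptions, not the original invocation - might look like:

bazel build //src:all --experimental_worker_multiplex --worker_max_instances=KotlinCompile=1 --profile=/tmp/multiplex.profile.gz
)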
Welp. I feel stupid. Lemme go fix that -- coroutines are still a bit obtuse for me.
I don't think #496 solves this. When running in my repro project, I would expect a profile to look like this (I generated this with my fix at #495). However, with the fix on master, I get something that looks like this, suggesting actions are still running sequentially:
Reading docs on flows... emphasis on “retains a sequential nature of flow if changing the context does not call for changing the dispatcher”
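A minimal, self-contained illustration of that sentence (slowStep is a made-up stand-in for the blocking compile work): flowOn changes which dispatcher the upstream runs on, but elements are still processed one at a time.

import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.flow.asFlow
import kotlinx.coroutines.flow.collect
import kotlinx.coroutines.flow.flowOn
import kotlinx.coroutines.flow.map
import kotlinx.coroutines.runBlocking

// Made-up stand-in for the blocking compile work.
fun slowStep(i: Int): Int { Thread.sleep(100); return i }

fun main() = runBlocking {
  (1..3).asFlow()
    .map { slowStep(it) }   // runs on Dispatchers.IO, but still one element at a time
    .flowOn(Dispatchers.IO) // moves upstream work to IO; does not parallelize it
    .collect { println(it) }
}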
Yeah, for parallel requests in flow you need to do something like this (I use this helper all the time):

import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.buffer
import kotlinx.coroutines.flow.flowOn
import kotlinx.coroutines.flow.map

/**
 * Maps each element with [f], running the calls concurrently while
 * preserving the emission order of the results.
 * @param scope the CoroutineScope the async work runs in
 * @param f the suspending function applied to each element
 */
fun <A, B> Flow<A>.suspendingParallelMap(scope: CoroutineScope, f: suspend (A) -> B): Flow<B> {
  return flowOn(Dispatchers.IO)
    .map { scope.async { f(it) } } // start each call eagerly, without waiting
    .buffer() // default capacity of 64 caps the number of in-flight tasks
    .map { it.await() } // await results in emission order
}

You can pass a size to the buffer to limit concurrency, @restingbull.
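A toy usage of the helper above, with strings standing in for real work requests and responses:

import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.flow.asFlow
import kotlinx.coroutines.flow.collect
import kotlinx.coroutines.runBlocking

fun main() = runBlocking {
  coroutineScope {
    (1..5).asFlow()
      .suspendingParallelMap(this) { "compiled $it" } // calls run concurrently
      .collect { println(it) } // results still arrive in order
  }
}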
Personally I find streams (assuming you are referring to Java 8 Streams) lackluster for this kind of thing, since the amount of control you have over the executor is not great - controlling the level of concurrency per task isn't as straightforward, and streams tend to shine in CPU-intensive work (since by default they run on the common ForkJoinPool). Channels are a good approach too, not too dissimilar from Flow; some of the channel API is getting deprecated, though, as Flow gains features, and I find working with Flow more pleasant. YMMV
While #498 parallelizes the compilation, it still sequences writing the work responses to stdout, since we're collecting the flow serially. I'm seeing this when enabling the multiplex flag: the small actions compile in parallel, but they're blocked on writing to stdout until the big action finishes. Once the big action finishes, they all write very quickly. This is only slightly better than the previous profile, where there was still a delay (to do the compilation) before each subsequent small action finished. Since the underlying work here is thread-blocking, I'm not sure what value coroutines and flow offer - they seem to be tripping us up more than helping. Wouldn't it be simpler just to use threads? That's what Bazel does for Java compilation.
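For what it's worth, a minimal sketch of the thread-based idea (this is not the actual #495 change; the protocol types and I/O functions below are hypothetical stand-ins):

import java.util.concurrent.Executors

// Hypothetical stand-ins for the worker protocol types and I/O.
data class WorkRequest(val args: List<String>)
data class WorkResponse(val exitCode: Int, val output: String)
fun readRequest(): WorkRequest? = TODO("read a length-delimited proto from stdin; null on EOF")
fun compile(request: WorkRequest): WorkResponse = TODO("invoke the compiler")
fun writeResponse(response: WorkResponse): Unit = TODO("write a length-delimited proto to stdout")

fun runWorkerLoop() {
  val pool = Executors.newCachedThreadPool()
  val stdoutLock = Any()
  while (true) {
    val request = readRequest() ?: break
    pool.execute {
      val response = compile(request) // requests compile in parallel on the pool
      synchronized(stdoutLock) { // only the stdout write is serialized
        writeResponse(response)
      }
    }
  }
  pool.shutdown()
}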
@jdai8 are you referring to this https://github.com/restingbull/rules_kotlin/blob/6998aba9ee01198e04a47d39302939ecbd7fda34/src/main/kotlin/io/bazel/worker/PersistentWorker.kt#L115-L122 being serial? If so, that would make sense - collection is done serially. If we want this part to be parallelized, we should apply the same pattern. I am also a bit confused by the usage of the private ThreadAwareDispatcher, but I'm not intimately familiar with exactly how this compilation works. Normally you can just use Dispatchers.IO as your dispatcher for this kind of work and call it a day, but again, I admit I'm not familiar with how this code is called.
@restingbull this PR is just for sharing ideas - but I was wondering if things would still work (performantly) by tweaking the persistent worker a bit.
Thanks @jeffzoch! Performance-wise, #501 looks good to me 👍 I still think using threads directly (as in #495) is a simpler implementation, since it doesn't look like we're using any coroutine/flow-specific features. I'll leave it up to @restingbull and others, though.
Given that #501 was merged, can we close this?
Yeah. Thanks!
The current multiplex worker implementation (at HEAD) sequences reads and writes:
rules_kotlin/src/main/kotlin/io/bazel/worker/PersistentWorker.kt, lines 83 to 113 at 51fe508
I think this artificially limits performance in certain cases. For example, if we receive a large work request first, all subsequent work requests will have to wait for it to finish before we write their work responses to stdout.
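As a reduced sketch of the sequencing described above (readRequest, compile, and writeResponse are made-up stand-ins, not the actual PersistentWorker code):

// Made-up stand-ins for the worker protocol I/O.
fun readRequest(): String? = TODO("blocking read of one work request from stdin; null on EOF")
fun compile(request: String): String = TODO("run the compilation")
fun writeResponse(response: String): Unit = TODO("blocking write of the response to stdout")

// The shape of the problem: each iteration fully finishes (read, compile,
// write) before the next request is even read, so one big request stalls
// every request behind it.
fun sequentialWorkerLoop() {
  while (true) {
    val request = readRequest() ?: break
    writeResponse(compile(request))
  }
}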
I tested this on a simple reproducer project here: https://github.com/jdai8/rules_kotlin_coroutine_repro. This project has one large source file, Big.kt, and several small SmallX.kt files. As you can see from this screenshot, //src:small and //src:small4 have to wait for //src:big to finish, even though the other small compilation actions have already finished (in <5s).

Let me know if I'm understanding this correctly. If so, I'm happy to put up a PR.