Coroutine scheduler monitoring #1360
Please, take a look at
This does look pretty useful, but it also seems like it might have a notable performance impact? The monitoring that looks attractive to me would be getting a gauge on the sizes of the CoroutineScheduler queues (global and local).

Our biggest fear is accidentally putting slow blocking work (or worse, deadlocks) in our main dispatcher (which happened to us once on a previous project using Kotlin coroutines incorrectly, and also when using Ratpack’s coroutine-style execution). So getting alerted if work is building up over time (i.e., if the queues are getting too big or growing indefinitely) seems helpful.

Would it be reasonable to expose some of these stats somewhere? These stats are specific to the CoroutineScheduler, so I don't think kotlinx-coroutines-debug is relevant. As an awful hack we are considering parsing
@glasser Yes, that can be done without the slow debug mode and makes sense. I'll keep it open as an enhancement.
Thanks! Should I interpret that as "you're going to do it" or "you'd accept patches"?
Unfortunately, we are not ready to accept patches right now because the scheduler is being actively reworked. But it would be really helpful if you could provide a more detailed example of the desired API shape and the problem you want to solve with this API. For example: "Ideally, we'd see it as a pluggable SPI service for the dispatcher with the following methods ..., so we could use it to trigger our monitoring if ..."
Interesting — is there a branch or design doc or something for the reworking? Curious how it's changing.

My proposal is pretty simple. A few of the core objects involved with coroutine scheduling should be (a) publicly accessible and (b) expose a few properties that provide statistics about them. It's fine if these are documented as "experimental, up for change, don't rely on this" and as "fetching these properties may have a performance impact if done frequently" (eg,

Most specifically, I'd want to have access to
I don't need kotlinx.coroutines to provide any machinery for hooking this up to my metrics service: I'm happy to keep, at the application (or external library) level, the code that takes the dispatchers I care about, polls them for metrics, and publishes to my metrics service of choice.
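To make that concrete, here is a rough sketch of the polling side. Every name in it is hypothetical (nothing like SchedulerStats exists in kotlinx.coroutines today); the reporting part would live entirely in application code:

```kotlin
import java.util.Timer
import kotlin.concurrent.fixedRateTimer

// Hypothetical sketch only: none of these property names exist in kotlinx.coroutines today.
// Suppose the scheduler exposed read-only statistics roughly like this:
interface SchedulerStats {
    val globalCpuQueueSize: Int       // tasks waiting in the global CPU queue
    val globalBlockingQueueSize: Int  // tasks waiting in the global blocking queue
    val createdWorkers: Int           // worker threads currently created by the scheduler
}

// Application-level polling: read the stats periodically and hand them to whatever
// metrics client the application already uses (Prometheus, Micrometer, ...).
fun monitorScheduler(stats: SchedulerStats, report: (name: String, value: Int) -> Unit): Timer =
    fixedRateTimer(name = "scheduler-metrics", daemon = true, period = 10_000L) {
        report("coroutines.global_cpu_queue", stats.globalCpuQueueSize)
        report("coroutines.global_blocking_queue", stats.globalBlockingQueueSize)
        report("coroutines.created_workers", stats.createdWorkers)
    }
```

The only thing the library would need to provide is read access to the counters; everything else stays on the application side.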
No for both, though the changes will, of course, be properly documented. Mostly it's about changing the parking/spinning strategy (without violating the liveness property) to reduce CPU consumption at low request rates and to get robust idle-thread termination. The change is just too intrusive and touches all the places in the scheduler. Thanks for the details!
This is for server usage. We are currently porting a few web servers from Ratpack to Ktor. Ratpack has a similar async structure to Kotlin coroutines (with a recommended usage of a pool of "compute" threads approximately equal in size to the number of CPUs, plus a scaling "blocking" pool), but because you have to do all work with explicit Promise composition rather than the nice syntax of Kotlin coroutines, we've found that developers often don't bother to keep blocking work out of the compute pool, and often implement error handling incorrectly (eg by putting try/catch/finally or retry loops around functions that return Promises rather than properly using the Promise API).

Our hope is that Kotlin coroutines will be much more accessible. But we still want to monitor that we're not clogging up the pools! (Ratpack Promises also have some other odd behavior — eg,
+1 to everything that @glasser said. Looking to start replacing some thread pools with coroutines in our high-volume, production, back-end service, and we would feel a lot better about it if we had some way to emit metrics about the health of the pools/scheduler. Thanks!
I have an app that launches millions of CPU-bound coroutines, and they are taking longer than expected to complete. I am wondering whether that is because of the overhead of scheduling and executing them. I would like to have monitoring on the queue size for this reason.
Any updates on this? Any news on when it may be implemented? We are also interested in monitoring the number of coroutines, and it is really disappointing that such a basic metric is not available by default.
Any updates on this? Any other ways of getting similar numbers? Wanting metrics basically for the same reasons as @glasser. :)
Any updates? I'm interested as well.
Also interested in this.
We aim to implement it in the next releases after 1.5.0.
Our use case is also high-load, server-side.
@qwwdfsad any updates? Also very much interested in this.
@soudmaijer for us this is so critical that I implemented the 'awful hack' that glasser mentioned. See https://github.com/joost-de-vries/spring-reactor-coroutine-metrics/tree/coroutineDispatcherMetrics/src/main/kotlin/metrics
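For reference, the hack boils down to something like the sketch below. It assumes the scheduler's string dump contains fragments such as "global CPU queue size = N"; that format is an implementation detail and may differ between kotlinx.coroutines versions, which is exactly why it is awful:

```kotlin
// Fragile sketch of the "awful hack": some kotlinx.coroutines versions include fragments
// like "global CPU queue size = 0" and "global blocking queue size = 0" in the scheduler's
// string dump. That format is an implementation detail and can change in any release.
private val cpuQueueRegex = Regex("""global CPU queue size = (\d+)""")
private val blockingQueueRegex = Regex("""global blocking queue size = (\d+)""")

// Takes the dump string however it was obtained (toString() or reflection into the
// scheduler) and extracts the two global queue sizes, or null if the format is unrecognised.
fun parseGlobalQueueSizes(schedulerDump: String): Pair<Int, Int>? {
    val cpu = cpuQueueRegex.find(schedulerDump)?.groupValues?.get(1)?.toIntOrNull()
    val blocking = blockingQueueRegex.find(schedulerDump)?.groupValues?.get(1)?.toIntOrNull()
    return if (cpu != null && blocking != null) cpu to blocking else null
}
```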
Does that mean that this will be addressed in 1.6.0 (which appears to be close to release)?
In IJ we have our own unlimited executor (let's call it

Using an effectively unlimited IO dispatcher would allow us to drop our own executor service (a single pool for the whole app) and avoid the unnecessary thread switches which inevitably happen between
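A minimal sketch of the kind of dispatcher this describes, assuming kotlinx-coroutines 1.6+, where limitedParallelism views of Dispatchers.IO may exceed the default thread cap while sharing threads with the default scheduler:

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.ExperimentalCoroutinesApi

// Sketch assuming kotlinx-coroutines 1.6+: a limitedParallelism view of Dispatchers.IO
// can exceed the default 64-thread cap, giving an effectively unbounded blocking pool
// that still shares its threads with the default scheduler, so hopping between it and
// Dispatchers.Default does not necessarily force a thread switch.
@OptIn(ExperimentalCoroutinesApi::class)
val effectivelyUnlimitedIo = Dispatchers.IO.limitedParallelism(4096)
```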
Is there any update on this issue?
any update?
@joost-de-vries is your hack still working out reasonably well for you?
Is there any update on this issue?
Are there any monitoring tools available for how many coroutines are currently active, their state, etc.? It would be nice if this could be exposed so that something like Prometheus can scrape it and visualise it in Grafana.
It would also help in debugging leaks, and errors that might show up as the coroutine count just rising linearly.
If not, can this be done by looking at the thread stats instead?
Go exposes this via runtime.NumGoroutines().
Related: #494
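One partial workaround that exists today: the kotlinx-coroutines-debug module can enumerate live coroutines and their states, which covers the counting part of this question, though with its own overhead. A rough sketch:

```kotlin
import kotlinx.coroutines.ExperimentalCoroutinesApi
import kotlinx.coroutines.debug.DebugProbes

// Sketch using the kotlinx-coroutines-debug module. This is not the scheduler-level
// metric discussed above, and installing the probes has a real performance cost, but it
// does give a count of live coroutines grouped by state (CREATED/RUNNING/SUSPENDED).
@OptIn(ExperimentalCoroutinesApi::class)
fun coroutineCountsByState(): Map<String, Int> {
    // Requires DebugProbes.install() to have been called once at application startup.
    return DebugProbes.dumpCoroutinesInfo()
        .groupingBy { it.state.toString() }
        .eachCount()
}
```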