add time-series metrics for memory consumption by proxy data structures #6473
Comments
cc @cpretzer
The proxy doesn't have many global data structures in the traditional sense; we don't have a single global hash map of service discovery destinations, for example. Instead, most memory allocations are either per-service or per-request.

Currently, we have metrics that track the number of services that have been built in different parts of the proxy (…). Another thing we don't currently have a metric for, but that would definitely be helpful, is recording the number of asynchronous tasks spawned on the proxy's runtime.

Beyond that, there might be a few other things that could be worth doing, but they're likely not as high impact as using the existing stack metrics and adding a count of how many tasks are currently active. We could consider instrumenting the proxy's buffers to track queue depth, letting us determine how many requests are waiting in a queue to be sent to a particular service. However, because the buffer queues are bounded, and requests that have been in a queue for too long are timed out, these queues shouldn't result in unbounded memory growth.

Also, we could potentially enhance the existing stack metrics so that we record the actual memory use of those services, as well as counts. However, this could take some work, since many of these services own pointers to heap-allocated data, and we would need to recursively traverse any such types to sum the size of their children as well. It turns out that determining the amount of heap memory used by an object in a language without a large runtime managing allocations is surprisingly difficult: while there is a Rust library for doing this, it isn't actively maintained and appears not to work correctly for reference-counted pointers (…).
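Since a count of currently-active tasks is called out above as one of the higher-impact additions, here is a minimal sketch of how such a gauge could be kept, assuming a Tokio runtime. The names `TASKS_ACTIVE` and `instrumented_spawn` are illustrative only and are not part of the proxy's existing code or metrics API:

```rust
use std::future::Future;
use std::sync::atomic::{AtomicU64, Ordering};

/// Gauge of tasks currently alive on the runtime (illustrative; a real
/// implementation would export this through the proxy's metrics registry).
static TASKS_ACTIVE: AtomicU64 = AtomicU64::new(0);

/// Decrements the gauge when dropped, so the count stays correct whether the
/// task completed normally or was cancelled.
struct TaskGuard;

impl Drop for TaskGuard {
    fn drop(&mut self) {
        TASKS_ACTIVE.fetch_sub(1, Ordering::Relaxed);
    }
}

/// Wrapper around `tokio::spawn` that keeps the gauge up to date.
pub fn instrumented_spawn<F>(fut: F) -> tokio::task::JoinHandle<F::Output>
where
    F: Future + Send + 'static,
    F::Output: Send + 'static,
{
    TASKS_ACTIVE.fetch_add(1, Ordering::Relaxed);
    let guard = TaskGuard;
    tokio::spawn(async move {
        let _guard = guard; // dropped when the task finishes or is aborted
        fut.await
    })
}
```

A real version would surface the value on the proxy's Prometheus endpoint rather than in a bare static, but the drop-guard pattern is what keeps the count accurate even when tasks are cancelled instead of running to completion.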
Closing as I'm not going to have the time to work on this any more.
#6441 and #6066 both report `linkerd-proxy` being OOM-killed for using too much memory. To the extent not already tracked, it would be useful in debugging this to add time-series metrics that track the size of various data structures in the proxy sidecar. Capturing these metrics may aid in debugging the linked issues.

I am willing to implement such metrics; I need assistance, though, in figuring out which places in the proxy are the most useful to instrument. I can then put together a PR.
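As the comment above notes, reporting the actual byte size of a data structure (rather than a simple count) means recursively walking its owned children. A purely illustrative sketch of that traversal follows; the `DeepSize` trait and its impls are hypothetical, not an existing crate or proxy API, and shared pointers such as `Arc` would need extra care to avoid double-counting:

```rust
use std::mem::{size_of, size_of_val};

/// Approximate bytes owned on the heap, not counting the value itself.
/// (Hypothetical trait for illustration only.)
trait DeepSize {
    fn deep_size(&self) -> usize;
}

impl DeepSize for String {
    fn deep_size(&self) -> usize {
        self.capacity()
    }
}

impl<T: DeepSize> DeepSize for Vec<T> {
    fn deep_size(&self) -> usize {
        // Space reserved for the elements' inline representation, plus
        // whatever each element owns on the heap in turn.
        self.capacity() * size_of::<T>()
            + self.iter().map(DeepSize::deep_size).sum::<usize>()
    }
}

fn main() {
    let routes: Vec<String> = vec!["/api/users".into(), "/api/orders".into()];
    let approx_bytes = size_of_val(&routes) + routes.deep_size();
    println!("approximate bytes for `routes`: {approx_bytes}");
}
```

Feeding a value like this into a gauge would give the requested time series, at the cost of traversing the structure on every scrape.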