Automate cold-start timing collection #2495
/milestone Scaling: Sub-Second Cold Start
I'm curious, are there any open proposals on how to address this feature request at this time?
@sjug you worked on this a bit. Do you have any recommendations?
I don't have any recommendations at this time. However, I'd be interested in learning more about ideas that others have discussed in this space. |
#2323 would be nice to get in for reporting 'how much time is spent before our application begins executing' to testgrid |
#2667 will somewhat contribute to this as well, as it'll measure the time it takes to scale an application up from 1-X, arguably cold-start as well, just a slightly different category. |
We have most of the metrics on the Kubernetes pod cold start available in Prometheus already. This ticket would include the difference on top of the pod start time. I don't know what level of tracing we have now for just the knative layer, or if we want to add Prometheus "metrics" all the way up.
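For context, "available in Prometheus already" could be exercised with something as simple as the sketch below, which asks the Prometheus HTTP API for the gap between pod creation and kubelet-reported start time. The Prometheus address, the `kube_pod_created`/`kube_pod_start_time` metric names (from kube-state-metrics), and the namespace filter are assumptions for illustration, not part of this issue:

```go
package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"net/url"
)

func main() {
	// Hypothetical in-cluster Prometheus address; adjust for your setup.
	promAddr := "http://prometheus-system-np.knative-monitoring:8080"

	// Seconds between pod creation and kubelet-reported pod start,
	// assuming kube-state-metrics is being scraped.
	query := `kube_pod_start_time{namespace="default"} - kube_pod_created{namespace="default"}`

	resp, err := http.Get(promAddr + "/api/v1/query?query=" + url.QueryEscape(query))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	// Raw JSON result; an automated collector would decode this and push it
	// somewhere trackable over time (testgrid or similar).
	fmt.Println(string(body))
}
```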
I've started poking at adding some more detailed tracing to the autoscaler in #2726 - AIUI we only support Prometheus metrics recording in CI, so I am not sure how we want to close that gap. IMO tracing is the right tool for the job here from the development side, so I think our options are:
Thoughts from the CI folks?
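For reference, tracing an internal component and exporting the spans to Zipkin could look roughly like the following sketch using OpenCensus. This is not the actual #2726 change; the collector address and span name are assumptions:

```go
package main

import (
	"context"
	"log"

	ocZipkin "contrib.go.opencensus.io/exporter/zipkin"
	openzipkin "github.com/openzipkin/zipkin-go"
	zipkinHTTP "github.com/openzipkin/zipkin-go/reporter/http"
	"go.opencensus.io/trace"
)

func main() {
	// Report spans to a Zipkin collector; the address is an assumption.
	reporter := zipkinHTTP.NewReporter("http://zipkin.istio-system.svc.cluster.local:9411/api/v2/spans")
	defer reporter.Close()

	endpoint, err := openzipkin.NewEndpoint("autoscaler", "")
	if err != nil {
		log.Fatal(err)
	}
	trace.RegisterExporter(ocZipkin.NewExporter(reporter, endpoint))
	trace.ApplyConfig(trace.Config{DefaultSampler: trace.AlwaysSample()})

	// Wrap the interesting part of a cold start in a span; the name is illustrative.
	ctx, span := trace.StartSpan(context.Background(), "scale-from-zero")
	defer span.End()
	_ = ctx // pass ctx down so child operations can create sub-spans
}
```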
We have enabled Zipkin and Prometheus in our perf tests. Have we enabled Zipkin tracing for the autoscaler? Last time I checked (a few months ago :)), Zipkin tracing was not available for internal Knative components.
We add the trace-id for every request made from the spoof library https://github.com/knative/pkg/blob/master/test/spoof/spoof.go#L172 |
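The idea is roughly the following simplified sketch (not the actual spoof code): tag each outgoing request with a trace ID header in the B3 format Zipkin understands so the resulting trace can be looked up afterwards.

```go
package spoofsketch

import (
	"crypto/rand"
	"encoding/hex"
	"net/http"
)

// addTraceID tags an outgoing request with a B3 trace ID so the resulting
// trace can later be looked up in Zipkin. Simplified sketch, not the
// knative/pkg spoof implementation.
func addTraceID(req *http.Request) (string, error) {
	buf := make([]byte, 8)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	traceID := hex.EncodeToString(buf) // 16 hex characters, a valid B3 trace ID
	req.Header.Set("X-B3-TraceId", traceID)
	req.Header.Set("X-B3-Sampled", "1")
	return traceID, nil
}
```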
Working on adding tracing internally here: #2726 |
@greghaynes You definitely have the best instrumentation here. Would it be possible to try and add it to automation in 0.8? For "v1" scope, I'd just like to see a benchmark in place that keeps us from going backwards on cold-start performance, not necessarily the deep visibility @greghaynes has, but bonus points the more we have.
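A regression guard along those lines could be as simple as timing the first successful response after scale-from-zero and failing if it blows a budget. A rough sketch, where the target URL, the budget, and the assumption that the service has already scaled to zero are all illustrative:

```go
package coldstart

import (
	"net/http"
	"testing"
	"time"
)

// TestColdStartBudget assumes the target service has already been scaled to
// zero and that targetURL points at its route (both assumptions).
func TestColdStartBudget(t *testing.T) {
	const targetURL = "http://coldstart-test.default.example.com"
	const budget = 10 * time.Second // illustrative threshold, tune per environment

	start := time.Now()
	for {
		resp, err := http.Get(targetURL)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				break
			}
		}
		if time.Since(start) > budget {
			t.Fatalf("cold start exceeded budget of %v", budget)
		}
		time.Sleep(100 * time.Millisecond)
	}
	t.Logf("cold start took %v", time.Since(start))
}
```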
I think we have this covered already in
Is there a way to get the cold-start timing collection from Prometheus now? @greghaynes
@yuxiaoba not yet, the implementation is still experimental and hasn't been merged. If you are interested in the implementation, you can have a look here: https://github.com/nimakaviani/serving/tree/pod-tracing-redux |
Expected Behavior
I expect to have a record of time spent broken down into components during cold-start. Including:
This should be automated and ideally reported to somewhere we can track over time (similar to testgrid).
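One way the per-component breakdown could be reported somewhere trackable is to record each phase as a tagged duration metric that Prometheus scrapes. A sketch using OpenCensus stats, with the metric name, tag name, phase names, and bucket bounds invented for illustration:

```go
package main

import (
	"context"
	"log"
	"time"

	"go.opencensus.io/stats"
	"go.opencensus.io/stats/view"
	"go.opencensus.io/tag"
)

var (
	// Hypothetical measure: duration of one cold-start component, in ms.
	coldStartPhaseMs = stats.Float64("cold_start_phase_ms",
		"Duration of a single cold-start phase", stats.UnitMilliseconds)
	// Hypothetical tag distinguishing the phases ("scheduling", "image-pull", ...).
	phaseKey = tag.MustNewKey("phase")
)

func main() {
	if err := view.Register(&view.View{
		Name:        "cold_start_phase_ms",
		Measure:     coldStartPhaseMs,
		Aggregation: view.Distribution(50, 100, 250, 500, 1000, 2500, 5000, 10000),
		TagKeys:     []tag.Key{phaseKey},
	}); err != nil {
		log.Fatal(err)
	}

	// Record one phase; in practice the timestamps would come from pod and
	// revision lifecycle events, and an exporter would expose this view
	// to Prometheus.
	recordPhase(context.Background(), "scheduling", 340*time.Millisecond)
}

func recordPhase(ctx context.Context, phase string, d time.Duration) {
	ctx, _ = tag.New(ctx, tag.Insert(phaseKey, phase))
	stats.Record(ctx, coldStartPhaseMs.M(float64(d)/float64(time.Millisecond)))
}
```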
Actual Behavior
These measurements can currently only be collected ad hoc.