
Automate cold-start timing collection #2495

Closed

greghaynes opened this issue Nov 14, 2018 · 15 comments
Labels
area/autoscale · area/monitoring · area/test-and-release · kind/feature · P1
Comments

@greghaynes
Contributor

Expected Behavior

I expect to have a record of time spent during cold-start, broken down into components, including:

  • How much time is spent before we ask our deployment to scale up
  • How much time is spent before our user application begins executing

This should be automated and ideally reported somewhere we can track it over time (similar to testgrid).

Actual Behavior

These measurements can currently only be gathered ad hoc.

knative-prow-robot added the area/monitoring and kind/feature labels Nov 14, 2018
@greghaynes
Contributor Author

/milestone Scaling: Sub-Second Cold Start

josephburnett added this to the Scaling: Sub-Second Cold Start milestone Nov 15, 2018
@vantuvt

vantuvt commented Dec 10, 2018

I'm curious, are there any open proposals on how to address this feature request at this time?

@josephburnett
Contributor

@sjug you worked on this a bit. Do you have any recommendations?

@vantuvt

vantuvt commented Dec 10, 2018

I don't have any recommendations at this time. However, I'd be interested in learning more about ideas that others have discussed in this space.

@greghaynes
Contributor Author

#2323 would be nice to get in for reporting "how much time is spent before our application begins executing" to testgrid.

@markusthoemmes
Contributor

#2667 will somewhat contribute to this as well, as it'll measure the time it takes to scale an application up from 1-X, arguably cold-start as well, just a slightly different category.

@sjug

sjug commented Dec 11, 2018 via email

@greghaynes
Contributor Author

I've started poking at adding some more detailed tracing to the autoscaler in #2726. AIUI we only support Prometheus metrics recording in CI, so I am not sure how we want to close that gap. IMO tracing is the right tool for the job here from the development side, so I think our options are:

  • Get zipkin recording in our perf tests
  • Also report Prometheus metrics for each of these spans

Thoughts from the CI folks?

/cc @jessiezcc @adrcunha @srinivashegde86

@srinivashegde86
Contributor

We have enabled Zipkin and Prometheus in our perf tests. Have we enabled Zipkin tracing for the autoscaler? Last time I checked (a few months ago :)), Zipkin tracing was not available for internal Knative components.

@srinivashegde86
Contributor

We add the trace-id to every request made from the spoof library: https://github.com/knative/pkg/blob/master/test/spoof/spoof.go#L172

@greghaynes
Contributor Author

Working on adding tracing internally here: #2726

mattmoor modified the milestones: Performance: Sub-Second Cold Start, Serving "v1" (ready for production) May 6, 2019
mattmoor added the area/autoscale and area/test-and-release labels May 15, 2019
eallred-google added the P1 label Jun 6, 2019
@mattmoor
Member

@greghaynes You definitely have the best instrumentation here. Would it be possible to try and add it to automation in 0.8?

For "v1" scope, I'd just like to see a benchmark in place that keeps us from going backwards on cold-start performance, not necessarily the deep visibility @greghaynes has, but bonus points the more we have.

@mattmoor
Member

I think we have this covered already in func TestScaleFromZero1(t *testing.T) and https://mako.dev/benchmark?benchmark_key=5762723203776512&maxruns=100. Sure they could be better, but we are tracking them.

@yuxiaoba

yuxiaoba commented Nov 1, 2019

Is there a way to get the cold-start timing collection from Prometheus now? @greghaynes

@nimakaviani
Contributor

@yuxiaoba Not yet, the implementation is still experimental and hasn't been merged. If you are interested in the implementation, you can have a look here: https://github.com/nimakaviani/serving/tree/pod-tracing-redux
