
Automate cold-start timing collection #2495

Closed

greghaynes opened this issue Nov 14, 2018 · 15 comments
Labels
area/autoscale · area/monitoring · area/test-and-release · kind/feature · P1
Comments

@greghaynes
Contributor

Expected Behavior

I expect to have a record of time spent during cold-start, broken down into components, including:

  • How much time is spent before we ask our deployment to scale up
  • How much time is spent before our user application begins executing

This should be automated and ideally reported somewhere we can track it over time (similar to testgrid).

Actual Behavior

These measurements can currently only be gathered ad hoc.

knative-prow-robot added the area/monitoring and kind/feature labels Nov 14, 2018
@greghaynes
Contributor Author

/milestone Scaling: Sub-Second Cold Start

josephburnett added this to the Scaling: Sub-Second Cold Start milestone Nov 15, 2018
@vantuvt

vantuvt commented Dec 10, 2018

I'm curious, are there any open proposals on how to address this feature request at this time?

@josephburnett
Contributor

@sjug you worked on this a bit. Do you have any recommendations?

@vantuvt

vantuvt commented Dec 10, 2018

I don't have any recommendations at this time. However, I'd be interested in learning more about ideas that others have discussed in this space.

@greghaynes
Contributor Author

#2323 would be nice to get in for reporting "how much time is spent before our application begins executing" to testgrid.

@markusthoemmes
Contributor

#2667 will somewhat contribute to this as well, as it'll measure the time it takes to scale an application up from 1-X, arguably cold-start as well, just a slightly different category.

@sjug

sjug commented Dec 11, 2018 via email

@greghaynes
Contributor Author

I've started poking at adding some more detailed tracing to the autoscaler in #2726. AIUI we only support Prometheus metrics recording in CI, so I am not sure how we want to close that gap. IMO tracing is the right tool for the job here from the development side, so I think our options are:

  • Get zipkin recording in our perf tests
  • Also report Prometheus metrics for each of these spans

Thoughts from the CI folks?

/cc @jessiezcc @adrcunha @srinivashegde86

@srinivashegde86
Contributor

We have enabled Zipkin and Prometheus in our perf tests. Have we enabled Zipkin tracing for the autoscaler? Last time I checked (a few months ago :)), Zipkin tracing was not available for internal Knative components.

@srinivashegde86
Contributor

We add the trace-id to every request made from the spoof library: https://github.com/knative/pkg/blob/master/test/spoof/spoof.go#L172

@greghaynes
Contributor Author

Working on adding tracing internally here: #2726

mattmoor modified the milestones: Performance: Sub-Second Cold Start, Serving "v1" (ready for production) May 6, 2019
mattmoor added the area/autoscale and area/test-and-release labels May 15, 2019
eallred-google added the P1 label Jun 6, 2019
@mattmoor
Member

@greghaynes You definitely have the best instrumentation here. Would it be possible to try and add it to automation in 0.8?

For "v1" scope, I'd just like to see a benchmark in place that keeps us from going backwards on cold-start performance, not necessarily the deep visibility @greghaynes has, but bonus points the more we have.

@mattmoor
Member

I think we have this covered already in func TestScaleFromZero1(t *testing.T) and https://mako.dev/benchmark?benchmark_key=5762723203776512&maxruns=100. Sure they could be better, but we are tracking them.

@yuxiaoba

yuxiaoba commented Nov 1, 2019

Is there a way to get the cold-start timing collection from Prometheus now? @greghaynes

@nimakaviani
Contributor

@yuxiaoba Not yet, the implementation is still experimental and hasn't been merged. If you are interested in the implementation, you can have a look here: https://github.com/nimakaviani/serving/tree/pod-tracing-redux
