Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stats: add busy_duration stats #4179

Merged
merged 7 commits into from
Nov 8, 2021
Merged

stats: add busy_duration stats #4179

merged 7 commits into from
Nov 8, 2021

Conversation

Darksonn
Copy link
Contributor

Add busy duration statistic. Refs: #3845

cc @Matthias247

@Darksonn Darksonn added A-tokio Area: The main tokio crate M-metrics Module: tokio/runtime/metrics labels Oct 19, 2021
}

pub(crate) fn about_to_park(&mut self) {
self.park_count += 1;

let busy_duration = self.last_resume_time.elapsed();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this should just add up those busy_duration times per worker in a continuously incrementing time value - then the application could still calculate diffs as below?

I guess the benefit of tokio doing it is that one can get smaller durations than one could get if only querying the runtime states every 1s - but on the other hand all information about everything except the last 16 parks is lost?

Maybe both metrics would be helpful, the smaller one for getting some samples around busy times and the full aggregate for having overall load averages (total_busy_time / total_time_elapsed is load on the worker), but for the fine grained one I'm not sure if 16parks is a good unit or not. It's a pretty short duration in the typical case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to split the considerations, I'm happy to start trying out one version with just a total_busy_duration metric.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a total_busy_time sounds reasonable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which direction do you prefer than? Adding that and keeping the other parts? Or just adding total_busy_time for now? I'm good with any.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just adding total_busy_time to this PR sounds reasonable to me.

@Darksonn Darksonn requested a review from Matthias247 November 2, 2021 12:37
@LucioFranco LucioFranco merged commit f45320a into master Nov 8, 2021
@LucioFranco LucioFranco deleted the busy_duration branch November 8, 2021 19:59
@carllerche
Copy link
Member

@Darksonn it seemed like the conclusion was to only add total_busy_time but the other metrics are still exposed. Did I misundertand?

@carllerche
Copy link
Member

I would recommend sticking w/ total. busy_total + a loop counter (park_count). Those two combined should be enough to get a good sense for busy time per iteration. The listener can compute an EWMA of busy time per park and that should be good enough

@carllerche
Copy link
Member

If the goal is to be able to see outliers hidden by averages, one option besides using histograms would be to let the user specify duration bands and track the total busy time within those bands. For example, the user could specify that they expect most park loops to spend at most 1ms "busy", that 1ms -> 5ms is "high", and over 5ms is unexpected. Using user-specified bands, we can track the total busy time within those bands along with the total count of times a loop landed in the band. The observer could use those total times to compute EMWAs. This should satisfy the objective without having to manage synchronization related to passing histograms out of the runtime.

hawkw added a commit that referenced this pull request Nov 15, 2021
# 1.14.0 (November 15, 2021)

### Fixed

- macros: fix compiler errors when using `mut` patterns in `select!`
  ([#4211])
- sync: fix a data race between `oneshot::Sender::send` and awaiting a
  `oneshot::Receiver` when the oneshot has been closed ([#4226])
- sync: make `AtomicWaker` panic safe ([#3689])
- runtime: fix basic scheduler dropping tasks outside a runtime context
  ([#4213])

### Added

- stats: add `RuntimeStats::busy_duration_total` ([#4179], [#4223])

### Changed

- io: updated `copy` buffer size to match `std::io::copy` ([#4209])

### Documented

- io: rename buffer to file in doc-test ([#4230])
- sync: fix Notify example ([#4212])

[#4211]: #4211
[#4226]: #4226
[#3689]: #3689
[#4213]: #4213
[#4179]: #4179
[#4223]: #4223
[#4209]: #4209
[#4230]: #4230
[#4212]: #4212
hawkw added a commit that referenced this pull request Nov 16, 2021
# 1.14.0 (November 15, 2021)

### Fixed

- macros: fix compiler errors when using `mut` patterns in `select!`
  ([#4211])
- sync: fix a data race between `oneshot::Sender::send` and awaiting a
  `oneshot::Receiver` when the oneshot has been closed ([#4226])
- sync: make `AtomicWaker` panic safe ([#3689])
- runtime: fix basic scheduler dropping tasks outside a runtime context
  ([#4213])

### Added

- stats: add `RuntimeStats::busy_duration_total` ([#4179], [#4223])

### Changed

- io: updated `copy` buffer size to match `std::io::copy` ([#4209])

### Documented

- io: rename buffer to file in doc-test ([#4230])
- sync: fix Notify example ([#4212])

[#4211]: #4211
[#4226]: #4226
[#3689]: #3689
[#4213]: #4213
[#4179]: #4179
[#4223]: #4223
[#4209]: #4209
[#4230]: #4230
[#4212]: #4212
hawkw added a commit that referenced this pull request Nov 16, 2021
# 1.14.0 (November 15, 2021)

### Fixed

- macros: fix compiler errors when using `mut` patterns in `select!`
  ([#4211])
- sync: fix a data race between `oneshot::Sender::send` and awaiting a
  `oneshot::Receiver` when the oneshot has been closed ([#4226])
- sync: make `AtomicWaker` panic safe ([#3689])
- runtime: fix basic scheduler dropping tasks outside a runtime context
  ([#4213])

### Added

- stats: add `RuntimeStats::busy_duration_total` ([#4179], [#4223])

### Changed

- io: updated `copy` buffer size to match `std::io::copy` ([#4209])

### Documented

- io: rename buffer to file in doc-test ([#4230])
- sync: fix Notify example ([#4212])

[#4211]: #4211
[#4226]: #4226
[#3689]: #3689
[#4213]: #4213
[#4179]: #4179
[#4223]: #4223
[#4209]: #4209
[#4230]: #4230
[#4212]: #4212
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate M-metrics Module: tokio/runtime/metrics
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants