Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stats: add busy_duration stats #4179

Merged
merged 7 commits into from
Nov 8, 2021
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
80 changes: 79 additions & 1 deletion tokio/src/runtime/stats/stats.rs
Original file line number Diff line number Diff line change
@@ -1,6 +1,9 @@
//! This file contains the types necessary to collect various types of stats.
use crate::loom::sync::atomic::{AtomicU64, Ordering::Relaxed};

use std::convert::TryFrom;
use std::time::{Duration, Instant};

/// This type contains methods to retrieve stats from a Tokio runtime.
#[derive(Debug)]
pub struct RuntimeStats {
Expand All @@ -14,6 +17,10 @@ pub struct WorkerStats {
park_count: AtomicU64,
steal_count: AtomicU64,
poll_count: AtomicU64,
busy_duration_min: AtomicU64,
busy_duration_max: AtomicU64,
busy_duration_last: AtomicU64,
busy_duration_total: AtomicU64,
}

impl RuntimeStats {
Expand All @@ -24,6 +31,10 @@ impl RuntimeStats {
park_count: AtomicU64::new(0),
steal_count: AtomicU64::new(0),
poll_count: AtomicU64::new(0),
busy_duration_min: AtomicU64::new(0),
busy_duration_max: AtomicU64::new(0),
busy_duration_last: AtomicU64::new(0),
busy_duration_total: AtomicU64::new(0),
});
}

Expand Down Expand Up @@ -54,13 +65,43 @@ impl WorkerStats {
pub fn poll_count(&self) -> u64 {
self.poll_count.load(Relaxed)
}

/// Returns the duration for which the runtime was busy last time it was busy.
pub fn busy_duration_last(&self) -> Duration {
Duration::from_nanos(self.busy_duration_last.load(Relaxed))
}

/// Returns the total amount of time this worker has been busy for.
pub fn busy_duration_total(&self) -> Duration {
Duration::from_nanos(self.busy_duration_total.load(Relaxed))
}

/// Returns the smallest busy duration since the last 16 parks.
pub fn busy_duration_min(&self) -> Duration {
Duration::from_nanos(self.busy_duration_min.load(Relaxed))
}

/// Returns the largest busy duration since the last 16 parks.
pub fn busy_duration_max(&self) -> Duration {
Duration::from_nanos(self.busy_duration_max.load(Relaxed))
}
}

pub(crate) struct WorkerStatsBatcher {
my_index: usize,
park_count: u64,
steal_count: u64,
poll_count: u64,
/// The last 16 busy durations in nanoseconds.
///
/// This array is set to contain the same value 16 times after the first
/// iteration. Since we only publish the min, max and last value in the
/// array, then this gives the correct result.
busy_duration: [u64; 16],
busy_duration_i: usize,
/// The total busy duration in nanoseconds.
busy_duration_total: u64,
last_resume_time: Instant,
}

impl WorkerStatsBatcher {
Expand All @@ -70,6 +111,10 @@ impl WorkerStatsBatcher {
park_count: 0,
steal_count: 0,
poll_count: 0,
busy_duration: [0; 16],
busy_duration_i: usize::MAX,
busy_duration_total: 0,
last_resume_time: Instant::now(),
}
}
pub(crate) fn submit(&mut self, to: &RuntimeStats) {
Expand All @@ -78,13 +123,46 @@ impl WorkerStatsBatcher {
worker.park_count.store(self.park_count, Relaxed);
worker.steal_count.store(self.steal_count, Relaxed);
worker.poll_count.store(self.poll_count, Relaxed);

let mut min = u64::MAX;
let mut max = 0;
let last = self.busy_duration[self.busy_duration_i % 16];
for &val in &self.busy_duration {
if val <= min {
min = val;
}
if val >= max {
max = val;
}
}
worker.busy_duration_min.store(min, Relaxed);
worker.busy_duration_max.store(max, Relaxed);
worker.busy_duration_last.store(last, Relaxed);
worker
.busy_duration_total
.store(self.busy_duration_total, Relaxed);
}

pub(crate) fn about_to_park(&mut self) {
self.park_count += 1;

let busy_duration = self.last_resume_time.elapsed();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this should just add up those busy_duration times per worker in a continuously incrementing time value - then the application could still calculate diffs as below?

I guess the benefit of tokio doing it is that one can get smaller durations than one could get if only querying the runtime states every 1s - but on the other hand all information about everything except the last 16 parks is lost?

Maybe both metrics would be helpful, the smaller one for getting some samples around busy times and the full aggregate for having overall load averages (total_busy_time / total_time_elapsed is load on the worker), but for the fine grained one I'm not sure if 16parks is a good unit or not. It's a pretty short duration in the typical case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to split the considerations, I'm happy to start trying out one version with just a total_busy_duration metric.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a total_busy_time sounds reasonable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which direction do you prefer than? Adding that and keeping the other parts? Or just adding total_busy_time for now? I'm good with any.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just adding total_busy_time to this PR sounds reasonable to me.

let busy_duration = u64::try_from(busy_duration.as_nanos()).unwrap_or(u64::MAX);
self.busy_duration_total += busy_duration;
if self.busy_duration_i == usize::MAX {
// We are parking for the first time. Set array to contain current
// duration in every slot.
self.busy_duration_i = 0;
self.busy_duration = [busy_duration; 16];
} else {
self.busy_duration_i = (self.busy_duration_i + 1) % 16;
self.busy_duration[self.busy_duration_i] = busy_duration;
}
}

pub(crate) fn returned_from_park(&mut self) {}
pub(crate) fn returned_from_park(&mut self) {
self.last_resume_time = Instant::now();
}

#[cfg(feature = "rt-multi-thread")]
pub(crate) fn incr_steal_count(&mut self, by: u16) {
Expand Down