-
Notifications
You must be signed in to change notification settings - Fork 732
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tracing: reduce disabled span Drop
overhead
#1974
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
hawkw
added
crate/tracing
Related to the `tracing` crate
kind/perf
Performance improvements
labels
Mar 8, 2022
tobz
approved these changes
Mar 8, 2022
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Core premise of the change looks good. Overhead reduction is 👌🏻 .
(All of this is modulo the failing checks, obviously.)
yeah, i broke the |
Signed-off-by: Eliza Weisman <[email protected]>
That's a really impressive performance improvement! |
hawkw
added a commit
that referenced
this pull request
Mar 8, 2022
## Motivation Disabled spans introduce a non-trivial amount of overhead, even when no `tracing` subscriber is in use. This is primarily due to the need to create and then drop the empty `Span` struct even when the span is disabled. While thinking about #1970 a bit, I noticed that one source of overhead is that dropping a disabled span always causes a function call, even when the span is empty. This could be avoided. ## Solution In this branch, I've changed the `Drop` impls for `Span`, `Entered`, and `EnteredSpan` to be `#[inline(always)]`. In the always-inlined functions, we perform a check for whether or not the span is empty, and if it is not empty, we call into the dispatcher method to drop the span or guard. The dispatcher methods are no longer inlined. Now, the function call only occurs when the span _is_ enabled, rather than always occurring in the `Drop` call. This significantly reduces the overhead for holding a disabled span, or a disabled `Entered` guard, in a scope. Also, the `log` integration when dropping a span would always check if the span had metadata, even when `log` is disabled. This means we would do an extra branch that wasn't necessary. I moved that into the macro that guards for whether or not the `log` crate is enabled, which also significantly reduces overhead. This change reduces the overhead of a disabled span by 50-70%, per the `no_subscriber.rs` benchmarks. I also improved those benchmarks a bit to test more cases, in order to find the precise difference in overhead between just constructing `Span::none()` and the actual `span!` macros. <details> <summary><code>no_subscriber.rs</code> benchmark results</summary> ``` no_subscriber/span time: [696.37 ps 696.53 ps 696.73 ps] change: [-50.599% -50.577% -50.544%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe no_subscriber/span_enter time: [465.58 ps 466.35 ps 467.61 ps] change: [-71.350% -71.244% -71.138%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 4 (4.00%) high mild 2 (2.00%) high severe no_subscriber/empty_span time: [226.15 ps 226.73 ps 227.36 ps] change: [-84.404% -84.045% -83.663%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 5 (5.00%) high mild 1 (1.00%) high severe no_subscriber/empty_struct time: [693.32 ps 693.76 ps 694.30 ps] change: [+1.7164% +1.9701% +2.2540%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 5 (5.00%) high mild 3 (3.00%) high severe no_subscriber/event time: [294.32 ps 301.68 ps 310.85 ps] change: [+0.3073% +2.1111% +4.1919%] (p = 0.03 < 0.05) Change within noise threshold. Found 16 outliers among 100 measurements (16.00%) 2 (2.00%) high mild 14 (14.00%) high severe no_subscriber/relaxed_load time: [463.24 ps 463.74 ps 464.33 ps] change: [+1.4046% +1.6735% +1.9366%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 1 (1.00%) low severe 6 (6.00%) high mild 9 (9.00%) high severe no_subscriber/acquire_load time: [465.28 ps 465.68 ps 466.08 ps] change: [+0.6837% +1.1755% +1.6034%] (p = 0.00 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe no_subscriber/log time: [231.11 ps 231.27 ps 231.45 ps] change: [-4.4700% -2.3810% -0.9164%] (p = 0.00 < 0.05) Change within noise threshold. Found 17 outliers among 100 measurements (17.00%) 3 (3.00%) low mild 8 (8.00%) high mild 6 (6.00%) high severe no_subscriber_field/span time: [1.6334 ns 1.6343 ns 1.6354 ns] change: [-12.401% -12.337% -12.279%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild no_subscriber_field/event time: [461.54 ps 461.84 ps 462.14 ps] change: [-0.3654% +0.1235% +0.5557%] (p = 0.62 > 0.05) No change in performance detected. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe no_subscriber_field/log time: [463.52 ps 463.98 ps 464.49 ps] change: [+0.3011% +0.8645% +1.6355%] (p = 0.01 < 0.05) Change within noise threshold. Found 18 outliers among 100 measurements (18.00%) 4 (4.00%) low mild 10 (10.00%) high mild 4 (4.00%) high severe ``` </details> Signed-off-by: Eliza Weisman <[email protected]>
hawkw
added a commit
that referenced
this pull request
Mar 8, 2022
## Motivation Disabled spans introduce a non-trivial amount of overhead, even when no `tracing` subscriber is in use. This is primarily due to the need to create and then drop the empty `Span` struct even when the span is disabled. While thinking about #1970 a bit, I noticed that one source of overhead is that dropping a disabled span always causes a function call, even when the span is empty. This could be avoided. ## Solution In this branch, I've changed the `Drop` impls for `Span`, `Entered`, and `EnteredSpan` to be `#[inline(always)]`. In the always-inlined functions, we perform a check for whether or not the span is empty, and if it is not empty, we call into the dispatcher method to drop the span or guard. The dispatcher methods are no longer inlined. Now, the function call only occurs when the span _is_ enabled, rather than always occurring in the `Drop` call. This significantly reduces the overhead for holding a disabled span, or a disabled `Entered` guard, in a scope. Also, the `log` integration when dropping a span would always check if the span had metadata, even when `log` is disabled. This means we would do an extra branch that wasn't necessary. I moved that into the macro that guards for whether or not the `log` crate is enabled, which also significantly reduces overhead. This change reduces the overhead of a disabled span by 50-70%, per the `no_subscriber.rs` benchmarks. I also improved those benchmarks a bit to test more cases, in order to find the precise difference in overhead between just constructing `Span::none()` and the actual `span!` macros. <details> <summary><code>no_subscriber.rs</code> benchmark results</summary> ``` no_subscriber/span time: [696.37 ps 696.53 ps 696.73 ps] change: [-50.599% -50.577% -50.544%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe no_subscriber/span_enter time: [465.58 ps 466.35 ps 467.61 ps] change: [-71.350% -71.244% -71.138%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 4 (4.00%) high mild 2 (2.00%) high severe no_subscriber/empty_span time: [226.15 ps 226.73 ps 227.36 ps] change: [-84.404% -84.045% -83.663%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 5 (5.00%) high mild 1 (1.00%) high severe no_subscriber/empty_struct time: [693.32 ps 693.76 ps 694.30 ps] change: [+1.7164% +1.9701% +2.2540%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 5 (5.00%) high mild 3 (3.00%) high severe no_subscriber/event time: [294.32 ps 301.68 ps 310.85 ps] change: [+0.3073% +2.1111% +4.1919%] (p = 0.03 < 0.05) Change within noise threshold. Found 16 outliers among 100 measurements (16.00%) 2 (2.00%) high mild 14 (14.00%) high severe no_subscriber/relaxed_load time: [463.24 ps 463.74 ps 464.33 ps] change: [+1.4046% +1.6735% +1.9366%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 1 (1.00%) low severe 6 (6.00%) high mild 9 (9.00%) high severe no_subscriber/acquire_load time: [465.28 ps 465.68 ps 466.08 ps] change: [+0.6837% +1.1755% +1.6034%] (p = 0.00 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe no_subscriber/log time: [231.11 ps 231.27 ps 231.45 ps] change: [-4.4700% -2.3810% -0.9164%] (p = 0.00 < 0.05) Change within noise threshold. Found 17 outliers among 100 measurements (17.00%) 3 (3.00%) low mild 8 (8.00%) high mild 6 (6.00%) high severe no_subscriber_field/span time: [1.6334 ns 1.6343 ns 1.6354 ns] change: [-12.401% -12.337% -12.279%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild no_subscriber_field/event time: [461.54 ps 461.84 ps 462.14 ps] change: [-0.3654% +0.1235% +0.5557%] (p = 0.62 > 0.05) No change in performance detected. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe no_subscriber_field/log time: [463.52 ps 463.98 ps 464.49 ps] change: [+0.3011% +0.8645% +1.6355%] (p = 0.01 < 0.05) Change within noise threshold. Found 18 outliers among 100 measurements (18.00%) 4 (4.00%) low mild 10 (10.00%) high mild 4 (4.00%) high severe ``` </details> Signed-off-by: Eliza Weisman <[email protected]>
hawkw
added a commit
that referenced
this pull request
Mar 9, 2022
hawkw
added a commit
that referenced
this pull request
Mar 9, 2022
# 0.1.32 (March 8th, 2022) This release reduces the overhead of creating and dropping disabled spans significantly, which should improve performance when no `tracing` subscriber is in use or when spans are disabled by a filter. ### Fixed - **attributes**: Compilation failure with `--minimal-versions` due to a too-permissive `syn` dependency ([#1960]) ### Changed - Reduced `Drop` overhead for disabled spans ([#1974]) - `tracing-attributes`: updated to [0.1.20][attributes-0.1.20] [#1974]: #1974 [#1960]: #1960 [attributes-0.1.20]: https://github.com/tokio-rs/tracing/releases/tag/tracing-attributes-0.1.20
hawkw
added a commit
that referenced
this pull request
Mar 9, 2022
# 0.1.32 (March 8th, 2022) This release reduces the overhead of creating and dropping disabled spans significantly, which should improve performance when no `tracing` subscriber is in use or when spans are disabled by a filter. ### Fixed - **attributes**: Compilation failure with `--minimal-versions` due to a too-permissive `syn` dependency ([#1960]) ### Changed - Reduced `Drop` overhead for disabled spans ([#1974]) - `tracing-attributes`: updated to [0.1.20][attributes-0.1.20] [#1974]: #1974 [#1960]: #1960 [attributes-0.1.20]: https://github.com/tokio-rs/tracing/releases/tag/tracing-attributes-0.1.20
hawkw
added a commit
that referenced
this pull request
Mar 9, 2022
hawkw
added a commit
that referenced
this pull request
Mar 9, 2022
# 0.1.32 (March 8th, 2022) This release reduces the overhead of creating and dropping disabled spans significantly, which should improve performance when no `tracing` subscriber is in use or when spans are disabled by a filter. ### Fixed - **attributes**: Compilation failure with `--minimal-versions` due to a too-permissive `syn` dependency ([#1960]) ### Changed - Reduced `Drop` overhead for disabled spans ([#1974]) - `tracing-attributes`: updated to [0.1.20][attributes-0.1.20] [#1974]: #1974 [#1960]: #1960 [attributes-0.1.20]: https://github.com/tokio-rs/tracing/releases/tag/tracing-attributes-0.1.20
kaffarell
pushed a commit
to kaffarell/tracing
that referenced
this pull request
May 22, 2024
Disabled spans introduce a non-trivial amount of overhead, even when no `tracing` subscriber is in use. This is primarily due to the need to create and then drop the empty `Span` struct even when the span is disabled. While thinking about tokio-rs#1970 a bit, I noticed that one source of overhead is that dropping a disabled span always causes a function call, even when the span is empty. This could be avoided. In this branch, I've changed the `Drop` impls for `Span`, `Entered`, and `EnteredSpan` to be `#[inline(always)]`. In the always-inlined functions, we perform a check for whether or not the span is empty, and if it is not empty, we call into the dispatcher method to drop the span or guard. The dispatcher methods are no longer inlined. Now, the function call only occurs when the span _is_ enabled, rather than always occurring in the `Drop` call. This significantly reduces the overhead for holding a disabled span, or a disabled `Entered` guard, in a scope. Also, the `log` integration when dropping a span would always check if the span had metadata, even when `log` is disabled. This means we would do an extra branch that wasn't necessary. I moved that into the macro that guards for whether or not the `log` crate is enabled, which also significantly reduces overhead. This change reduces the overhead of a disabled span by 50-70%, per the `no_subscriber.rs` benchmarks. I also improved those benchmarks a bit to test more cases, in order to find the precise difference in overhead between just constructing `Span::none()` and the actual `span!` macros. <details> <summary><code>no_subscriber.rs</code> benchmark results</summary> ``` no_subscriber/span time: [696.37 ps 696.53 ps 696.73 ps] change: [-50.599% -50.577% -50.544%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe no_subscriber/span_enter time: [465.58 ps 466.35 ps 467.61 ps] change: [-71.350% -71.244% -71.138%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 4 (4.00%) high mild 2 (2.00%) high severe no_subscriber/empty_span time: [226.15 ps 226.73 ps 227.36 ps] change: [-84.404% -84.045% -83.663%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 5 (5.00%) high mild 1 (1.00%) high severe no_subscriber/empty_struct time: [693.32 ps 693.76 ps 694.30 ps] change: [+1.7164% +1.9701% +2.2540%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 5 (5.00%) high mild 3 (3.00%) high severe no_subscriber/event time: [294.32 ps 301.68 ps 310.85 ps] change: [+0.3073% +2.1111% +4.1919%] (p = 0.03 < 0.05) Change within noise threshold. Found 16 outliers among 100 measurements (16.00%) 2 (2.00%) high mild 14 (14.00%) high severe no_subscriber/relaxed_load time: [463.24 ps 463.74 ps 464.33 ps] change: [+1.4046% +1.6735% +1.9366%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 1 (1.00%) low severe 6 (6.00%) high mild 9 (9.00%) high severe no_subscriber/acquire_load time: [465.28 ps 465.68 ps 466.08 ps] change: [+0.6837% +1.1755% +1.6034%] (p = 0.00 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe no_subscriber/log time: [231.11 ps 231.27 ps 231.45 ps] change: [-4.4700% -2.3810% -0.9164%] (p = 0.00 < 0.05) Change within noise threshold. Found 17 outliers among 100 measurements (17.00%) 3 (3.00%) low mild 8 (8.00%) high mild 6 (6.00%) high severe no_subscriber_field/span time: [1.6334 ns 1.6343 ns 1.6354 ns] change: [-12.401% -12.337% -12.279%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild no_subscriber_field/event time: [461.54 ps 461.84 ps 462.14 ps] change: [-0.3654% +0.1235% +0.5557%] (p = 0.62 > 0.05) No change in performance detected. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe no_subscriber_field/log time: [463.52 ps 463.98 ps 464.49 ps] change: [+0.3011% +0.8645% +1.6355%] (p = 0.01 < 0.05) Change within noise threshold. Found 18 outliers among 100 measurements (18.00%) 4 (4.00%) low mild 10 (10.00%) high mild 4 (4.00%) high severe ``` </details> Signed-off-by: Eliza Weisman <[email protected]>
kaffarell
pushed a commit
to kaffarell/tracing
that referenced
this pull request
May 22, 2024
# 0.1.23 (March 8, 2022) ### Changed - Removed `#[inline]` attributes from some `Dispatch` methods whose callers are now inlined ([tokio-rs#1974]) - Bumped minimum supported Rust version (MSRV) to Rust 1.49.0 ([tokio-rs#1913]) [tokio-rs#1913]: tokio-rs#1913 [tokio-rs#1974]: tokio-rs#1974
kaffarell
pushed a commit
to kaffarell/tracing
that referenced
this pull request
May 22, 2024
# 0.1.32 (March 8th, 2022) This release reduces the overhead of creating and dropping disabled spans significantly, which should improve performance when no `tracing` subscriber is in use or when spans are disabled by a filter. ### Fixed - **attributes**: Compilation failure with `--minimal-versions` due to a too-permissive `syn` dependency ([tokio-rs#1960]) ### Changed - Reduced `Drop` overhead for disabled spans ([tokio-rs#1974]) - `tracing-attributes`: updated to [0.1.20][attributes-0.1.20] [tokio-rs#1974]: tokio-rs#1974 [tokio-rs#1960]: tokio-rs#1960 [attributes-0.1.20]: https://github.com/tokio-rs/tracing/releases/tag/tracing-attributes-0.1.20
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Motivation
Disabled spans introduce a non-trivial amount of overhead, even when no
tracing
subscriber is in use. This is primarily due to the need tocreate and then drop the empty
Span
struct even when the span isdisabled.
While thinking about #1970 a bit, I noticed that one source of overhead
is that dropping a disabled span always causes a function call, even
when the span is empty. This could be avoided.
Solution
In this branch, I've changed the
Drop
impls forSpan
,Entered
, andEnteredSpan
to be#[inline(always)]
. In the always-inlinedfunctions, we perform a check for whether or not the span is empty, and
if it is not empty, we call into the dispatcher method to drop the span
or guard. The dispatcher methods are no longer inlined. Now, the
function call only occurs when the span is enabled, rather than always
occurring in the
Drop
call. This significantly reduces the overheadfor holding a disabled span, or a disabled
Entered
guard, in a scope.Also, the
log
integration when dropping a span would always check ifthe span had metadata, even when
log
is disabled. This means we woulddo an extra branch that wasn't necessary. I moved that into the macro
that guards for whether or not the
log
crate is enabled, which alsosignificantly reduces overhead.
This change reduces the overhead of a disabled span by 50-70%, per the
no_subscriber.rs
benchmarks. I also improved those benchmarks abit to test more cases, in order to find the precise difference in overhead
between just constructing
Span::none()
and the actualspan!
macros.no_subscriber.rs
benchmark results