Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: remove mandatory liballoc dependency with no-std #1017

Merged
merged 21 commits into from
Oct 7, 2020

Conversation

hawkw
Copy link
Member

@hawkw hawkw commented Oct 6, 2020

Motivation

Presently, tracing and tracing-core allow disabling the Rust
standard library. However, even when used with the standard library
disabled, they still require the use of alloc, which means some form
of global allocator must exist. This can be a significant limitation on
some very constrained embedded devices. Additionally, the use of any
heap allocations on these devices ought to be avoided.

Now that the callsite registry is no longer a Vec, we don't need heap
allocations for storing registered callsites. However, Dispatchs are
currently always stored in Arcs, and the global registry always
contains a Vec of Weak refs to them. This is required in order to
support thread-local scoped dispatcher API, where multiple Dispatches
can coexist. However, on no-std platforms without threads, we already
don't support the thread-local scoped dispatch API. In this case, we can
reduce the complexity and the number of allocations required by the
dispatcher.

Additionally, even when the standard library is in use, if the
dispatcher is a global default rather than a scoped dispatcher, it
remains set forever. This means it can never be dropped. When this is
the case, it's possible to clone the dispatcher by cloning a 'static
reference, rather than a (comparatively) costly Arc bump.

Solution

This branch changes the global default dispatcher to be represented as a
&'static dyn Subscriber reference on all platforms. In addition, the
dependency on liballoc is now feature flagged, allowing it to be
disabled along with the dependency on libstd. This means that
tracing can now be used with no heap allocations required.

In order to use tracing without liballoc, Dispatches can only be
created from &'static dyn Subscriber references, since this is the
only way to erase their types that exists without boxed trait objects.
Therefore, I've added a new Dispatch::from_static constructor that is
always available. This is more limiting than Dispatch::new --- some
ways of modeling runtime composition of the subscriber are much harder
(although it could still contain state). However, users without alloc
are probably fairly used to this --- the log crate has the same
limitation
when used
without std.

Also, I've changed the global subscriber registry when std is
enabled to use a RwLock rather than a mutex. Now, multiple callsites
can be registered simultaneously, and a write lock is only necessary to
add a new subscriber or deregister old ones. It's now fine to use
RwLock here, because we no longer use any locking at all when std is
not enabled --- we always use the single global default subscriber, so
we don't require a list of currently live subscribers.

I've modified the tracing and tracing-core crates to add new feature
flags for enabling std and alloc separately.

Closes #861

@hawkw hawkw mentioned this pull request Oct 6, 2020
@hawkw hawkw force-pushed the eliza/no-clone-global-dispatch branch from ddb2f6f to a134fbe Compare October 6, 2020 18:17
@hawkw hawkw marked this pull request as ready for review October 6, 2020 22:12
@hawkw hawkw requested a review from a team as a code owner October 6, 2020 22:12
@hawkw hawkw changed the title [WIP] core: remove liballoc dependency with no-std core: remove mandatory liballoc dependency with no-std Oct 6, 2020
@hawkw
Copy link
Member Author

hawkw commented Oct 6, 2020

cc @Hoverbear, @steveklabnik, and any other potential no_std users

@hawkw hawkw self-assigned this Oct 6, 2020
@hawkw hawkw added area/no-std Related to improving support for no-std. crate/core Related to the `tracing-core` crate crate/tracing Related to the `tracing` crate kind/feature New feature or request meta/breaking This is a breaking change, and should wait until the next breaking release. labels Oct 6, 2020
@hawkw hawkw added this to the tracing-core 0.2 milestone Oct 6, 2020
Copy link
Member

@LucioFranco LucioFranco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; Only small nits, and do we have CI checking this?

tracing-core/src/callsite.rs Show resolved Hide resolved
tracing-core/src/callsite.rs Show resolved Hide resolved
tracing-core/src/dispatcher.rs Outdated Show resolved Hide resolved
tracing-core/src/dispatcher.rs Show resolved Hide resolved
@@ -669,6 +670,60 @@
//!
//! [log-tracer]: https://docs.rs/tracing-log/latest/tracing_log/#convert-log-records-to-tracing-events
//!
//! ### `no_std` Support
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we just link to core's docs here rather than repeating?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it's likely that more people will be reading the tracing docs than the tracing-core ones, so i thought it was better to reproduce it here...@davidbarsky wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather reproduce here for now.

hawkw and others added 3 commits October 7, 2020 09:29
@hawkw hawkw merged commit 940808c into master Oct 7, 2020
@steveklabnik
Copy link
Contributor

siiiick excited to give this a try!

dvdplm added a commit to dvdplm/tracing that referenced this pull request Oct 16, 2020
…spatch-as-ref-tokio-rs#455

* upstream/master: (28 commits)
  opentelemetry: prepare for 0.8.0 release (tokio-rs#1036)
  docs: add favicon for extra pretty docs (tokio-rs#1033)
  subscriber: fix `reload` ergonomics (tokio-rs#1035)
  chore(deps): update crossbeam-channel requirement from 0.4.2 to 0.5.0 (tokio-rs#1031)
  opentelemetry: Assign default ids if missing (tokio-rs#1027)
  chore: remove deprecated add-path from CI (tokio-rs#1026)
  attributes:  fix `#[instrument(err)]` in case of early returns (tokio-rs#1006)
  core: remove mandatory liballoc dependency with no-std (tokio-rs#1017)
  chore(deps): update cfg-if requirement from 0.1.10 to 1.0.0 (tokio-rs#1023)
  subscriber: warn if trying to enable a statically disabled level (tokio-rs#990)
  subscriber: use macros for module declarations (tokio-rs#1009)
  chore: remove `stdlib.rs` (tokio-rs#1008)
  core: fix linked list tests reusing `Registration`s (tokio-rs#1016)
  subscriber: support dash in target names (tokio-rs#1012)
  docs: switch to intra-doc links in tracing-core (tokio-rs#1010)
  tracing-opentelemetry: implement additional record types (bool, i64, u64) (tokio-rs#1007)
  core: add intrusive linked list for callsite registry (tokio-rs#988)
  serde: allow tracing-serde to work on no_std. (tokio-rs#960)
  tracing: remove `Into<Option<Id>>` impl for `Span` (tokio-rs#1003)
  tracing: make `Entered` `!Send` (tokio-rs#1001)
  ...
dvdplm added a commit to dvdplm/tracing that referenced this pull request Oct 16, 2020
* upstream/master:
  opentelemetry: prepare for 0.8.0 release (tokio-rs#1036)
  docs: add favicon for extra pretty docs (tokio-rs#1033)
  subscriber: fix `reload` ergonomics (tokio-rs#1035)
  chore(deps): update crossbeam-channel requirement from 0.4.2 to 0.5.0 (tokio-rs#1031)
  opentelemetry: Assign default ids if missing (tokio-rs#1027)
  chore: remove deprecated add-path from CI (tokio-rs#1026)
  attributes:  fix `#[instrument(err)]` in case of early returns (tokio-rs#1006)
  core: remove mandatory liballoc dependency with no-std (tokio-rs#1017)
  chore(deps): update cfg-if requirement from 0.1.10 to 1.0.0 (tokio-rs#1023)
dvdplm added a commit to dvdplm/tracing that referenced this pull request Oct 19, 2020
…cros

* upstream/master: (30 commits)
  chore(deps): update pin-project requirement from 0.4 to 1.0 (tokio-rs#1038)
  chore: remove duplicated section from tracing/README.md (tokio-rs#1046)
  opentelemetry: prepare for 0.8.0 release (tokio-rs#1036)
  docs: add favicon for extra pretty docs (tokio-rs#1033)
  subscriber: fix `reload` ergonomics (tokio-rs#1035)
  chore(deps): update crossbeam-channel requirement from 0.4.2 to 0.5.0 (tokio-rs#1031)
  opentelemetry: Assign default ids if missing (tokio-rs#1027)
  chore: remove deprecated add-path from CI (tokio-rs#1026)
  attributes:  fix `#[instrument(err)]` in case of early returns (tokio-rs#1006)
  core: remove mandatory liballoc dependency with no-std (tokio-rs#1017)
  chore(deps): update cfg-if requirement from 0.1.10 to 1.0.0 (tokio-rs#1023)
  subscriber: warn if trying to enable a statically disabled level (tokio-rs#990)
  subscriber: use macros for module declarations (tokio-rs#1009)
  chore: remove `stdlib.rs` (tokio-rs#1008)
  core: fix linked list tests reusing `Registration`s (tokio-rs#1016)
  subscriber: support dash in target names (tokio-rs#1012)
  docs: switch to intra-doc links in tracing-core (tokio-rs#1010)
  tracing-opentelemetry: implement additional record types (bool, i64, u64) (tokio-rs#1007)
  core: add intrusive linked list for callsite registry (tokio-rs#988)
  serde: allow tracing-serde to work on no_std. (tokio-rs#960)
  ...
dvdplm added a commit to dvdplm/tracing that referenced this pull request Oct 19, 2020
…gger

* upstream/master: (31 commits)
  chore: fix tracing-macros::dbg (tokio-rs#1054)
  chore(deps): update pin-project requirement from 0.4 to 1.0 (tokio-rs#1038)
  chore: remove duplicated section from tracing/README.md (tokio-rs#1046)
  opentelemetry: prepare for 0.8.0 release (tokio-rs#1036)
  docs: add favicon for extra pretty docs (tokio-rs#1033)
  subscriber: fix `reload` ergonomics (tokio-rs#1035)
  chore(deps): update crossbeam-channel requirement from 0.4.2 to 0.5.0 (tokio-rs#1031)
  opentelemetry: Assign default ids if missing (tokio-rs#1027)
  chore: remove deprecated add-path from CI (tokio-rs#1026)
  attributes:  fix `#[instrument(err)]` in case of early returns (tokio-rs#1006)
  core: remove mandatory liballoc dependency with no-std (tokio-rs#1017)
  chore(deps): update cfg-if requirement from 0.1.10 to 1.0.0 (tokio-rs#1023)
  subscriber: warn if trying to enable a statically disabled level (tokio-rs#990)
  subscriber: use macros for module declarations (tokio-rs#1009)
  chore: remove `stdlib.rs` (tokio-rs#1008)
  core: fix linked list tests reusing `Registration`s (tokio-rs#1016)
  subscriber: support dash in target names (tokio-rs#1012)
  docs: switch to intra-doc links in tracing-core (tokio-rs#1010)
  tracing-opentelemetry: implement additional record types (bool, i64, u64) (tokio-rs#1007)
  core: add intrusive linked list for callsite registry (tokio-rs#988)
  ...
dvdplm added a commit to dvdplm/tracing that referenced this pull request Oct 22, 2020
…spatch-as-owned-tokio-rs#455

* upstream/master: (34 commits)
  subscriber: remove TraceLogger (tokio-rs#1052)
  subscriber: make Registry::enter/exit much faster (tokio-rs#1058)
  chore(deps): update env_logger requirement from 0.7 to 0.8 (tokio-rs#1050)
  chore: fix tracing-macros::dbg (tokio-rs#1054)
  chore(deps): update pin-project requirement from 0.4 to 1.0 (tokio-rs#1038)
  chore: remove duplicated section from tracing/README.md (tokio-rs#1046)
  opentelemetry: prepare for 0.8.0 release (tokio-rs#1036)
  docs: add favicon for extra pretty docs (tokio-rs#1033)
  subscriber: fix `reload` ergonomics (tokio-rs#1035)
  chore(deps): update crossbeam-channel requirement from 0.4.2 to 0.5.0 (tokio-rs#1031)
  opentelemetry: Assign default ids if missing (tokio-rs#1027)
  chore: remove deprecated add-path from CI (tokio-rs#1026)
  attributes:  fix `#[instrument(err)]` in case of early returns (tokio-rs#1006)
  core: remove mandatory liballoc dependency with no-std (tokio-rs#1017)
  chore(deps): update cfg-if requirement from 0.1.10 to 1.0.0 (tokio-rs#1023)
  subscriber: warn if trying to enable a statically disabled level (tokio-rs#990)
  subscriber: use macros for module declarations (tokio-rs#1009)
  chore: remove `stdlib.rs` (tokio-rs#1008)
  core: fix linked list tests reusing `Registration`s (tokio-rs#1016)
  subscriber: support dash in target names (tokio-rs#1012)
  ...
hawkw added a commit that referenced this pull request Dec 23, 2021
## Motivation

PR #1017 on `master` added a nice optimization for avoiding the
thread-local access in `dispatcher::get_default` when the thread-local
dispatcher is not in use. Since that PR changed a bunch of other stuff
that we can't also change on 0.1, it never made it to v0.1.x.

This branch backports *just* the `SCOPED_COUNT` optimization from #1017,
which _does_ apply on v0.1.x.
hawkw added a commit that referenced this pull request May 11, 2023
On the `master` (v0.2.x) branch, we track the number of scoped
dispatchers currently active, and use it to determine whether or not to
check thread-local storage at all. This optimization was introduced in
PR #1017.

It turns out that this change prevents the incorrect behavior where a
`None` dispatcher is cached in the thread local if a scoped dispatcher
is used before the global default is set (issue #2587). Therefore, this
commit backports the `SCOPED_COUNT` optimization (and *just* the
`SCOPED_COUNT` change) from #1017 to the `v0.1.x` branch.

Currently, this results in a substantial performance *regression*,
especially in the "no dispatcher" case. This is because the `None`
dispatcher gets cloned a lot now that it's no longer cached. A
subsequent commit will resolve this by backporting more of the changes
from #1017, but I wanted to make this in a separate commit to determine
if it fixes the tests for #2587.

```
     Running benches/dispatch_get_clone.rs (target/release/deps/dispatch_get_clone-d4d6ca1f9895e432)
Dispatch::get_clone/none
                        time:   [23.525 ns 23.534 ns 23.543 ns]
                        change: [+113.37% +113.59% +113.79%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
Dispatch::get_clone/scoped
                        time:   [11.071 ns 11.078 ns 11.084 ns]
                        change: [-0.1603% -0.0360% +0.1300%] (p = 0.67 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low severe
  3 (3.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
Dispatch::get_clone/global
                        time:   [24.036 ns 24.055 ns 24.075 ns]
                        change: [+118.73% +119.28% +119.84%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

     Running benches/dispatch_get_ref.rs (target/release/deps/dispatch_get_ref-6ce05749a0b1bf87)
Dispatch::get_ref/none  time:   [14.326 ns 14.365 ns 14.407 ns]
                        change: [+265.93% +266.48% +267.13%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe
Dispatch::get_ref/scoped
                        time:   [3.8639 ns 3.8681 ns 3.8725 ns]
                        change: [+0.3381% +0.6460% +0.9565%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  5 (5.00%) high severe
Dispatch::get_ref/global
                        time:   [14.378 ns 14.472 ns 14.586 ns]
                        change: [+272.95% +274.50% +276.12%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low severe
  4 (4.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

     Running benches/empty_span.rs (target/release/deps/empty_span-745c777d77b8b7ca)
empty_span/none         time:   [230.31 ps 230.47 ps 230.64 ps]
                        change: [-0.9622% -0.8240% -0.6836%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe
empty_span/scoped       time:   [233.29 ps 233.90 ps 234.48 ps]
                        change: [-0.0987% +0.1388% +0.3820%] (p = 0.28 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
empty_span/global       time:   [234.41 ps 234.60 ps 234.78 ps]
                        change: [-0.4066% -0.2249% -0.0607%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 19 outliers among 100 measurements (19.00%)
  5 (5.00%) low severe
  13 (13.00%) low mild
  1 (1.00%) high mild
empty_span/baseline_struct
                        time:   [704.20 ps 704.52 ps 704.86 ps]
                        change: [-0.2031% +0.0627% +0.2709%] (p = 0.64 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

     Running benches/enter_span.rs (target/release/deps/enter_span-7fc1c2a69c076475)
enter_span/none         time:   [0.0000 ps 0.0000 ps 0.0000 ps]
                        change: [-43.986% +9.0604% +110.03%] (p = 0.82 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe
enter_span/scoped       time:   [2.8087 ns 2.8171 ns 2.8263 ns]
                        change: [-6.2503% -5.8437% -5.5244%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  10 (10.00%) low severe
  4 (4.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
enter_span/global       time:   [2.7827 ns 2.7881 ns 2.7942 ns]
                        change: [-7.0005% -6.6984% -6.3814%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

     Running benches/event.rs (target/release/deps/event-6742eef6ebe07aa4)
event/none              time:   [233.17 ps 233.70 ps 234.26 ps]
                        change: [+0.8021% +1.2066% +1.7762%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 21 outliers among 100 measurements (21.00%)
  9 (9.00%) low severe
  8 (8.00%) high mild
  4 (4.00%) high severe
event/scoped            time:   [9.5149 ns 9.5274 ns 9.5408 ns]
                        change: [+4.4596% +4.6910% +4.8995%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
event/scoped_recording  time:   [33.624 ns 34.086 ns 34.597 ns]
                        change: [-7.7753% +1.4363% +11.588%] (p = 0.80 > 0.05)
                        No change in performance detected.
Found 28 outliers among 100 measurements (28.00%)
  12 (12.00%) low mild
  4 (4.00%) high mild
  12 (12.00%) high severe
event/global            time:   [22.544 ns 22.587 ns 22.631 ns]
                        change: [+143.27% +143.94% +144.64%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) low severe
  1 (1.00%) low mild

     Running benches/shared.rs (target/release/deps/shared-9c2531bf46089869)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/span_fields.rs (target/release/deps/span_fields-96dfd0a8a577dec6)
span_fields/none        time:   [3.2739 ns 3.2814 ns 3.2900 ns]
                        change: [+0.6402% +0.8871% +1.1473%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe
span_fields/scoped      time:   [29.177 ns 29.237 ns 29.283 ns]
                        change: [-4.7079% -4.4906% -4.2912%] (p = 0.00 < 0.05)
                        Performance has improved.
span_fields/scoped_recording
                        time:   [269.00 ns 269.32 ns 269.64 ns]
                        change: [+4.7209% +4.8928% +5.0580%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild
span_fields/global      time:   [39.700 ns 39.860 ns 40.081 ns]
                        change: [+31.179% +31.920% +33.131%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

     Running benches/span_no_fields.rs (target/release/deps/span_no_fields-f8c7d7a84f720442)
span_no_fields/none     time:   [1.3904 ns 1.3914 ns 1.3927 ns]
                        change: [-2.2161% -2.0729% -1.9314%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  5 (5.00%) high severe
span_no_fields/scoped   time:   [18.811 ns 18.818 ns 18.826 ns]
                        change: [+1.9002% +2.1274% +2.3444%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe
span_no_fields/scoped_recording
                        time:   [32.819 ns 32.844 ns 32.872 ns]
                        change: [-0.2256% -0.1680% -0.1081%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe
span_no_fields/global   time:   [36.148 ns 36.190 ns 36.229 ns]
                        change: [+91.653% +91.930% +92.168%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild

     Running benches/span_repeated.rs (target/release/deps/span_repeated-03bfaaf4ecd13d36)
span_repeated/none      time:   [730.43 ns 731.10 ns 731.79 ns]
                        change: [+0.8904% +1.1179% +1.3512%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
span_repeated/scoped    time:   [2.4945 µs 2.4980 µs 2.5017 µs]
                        change: [+3.4807% +3.6733% +3.8473%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
span_repeated/scoped_recording
                        time:   [4.9626 µs 4.9712 µs 4.9786 µs]
                        change: [-1.2378% -0.8863% -0.5529%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
span_repeated/global    time:   [3.8247 µs 3.8270 µs 3.8291 µs]
                        change: [+55.811% +55.944% +56.091%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
  ```
hawkw added a commit that referenced this pull request May 11, 2023
This backports more of the change from #1017 to `v0.1.x`. This change
makes cloning a `&'static` global dispatcher and a `None` dispatcher no
longer require an `Arc` clone. `None` dispatchers no longer allocate.

This undoes much of the performance regression from the previous commit:

```

     Running benches/baseline.rs (target/release/deps/baseline-9b70733ce49582d2)comparison/relaxed_load time:   [467.51 ps 468.35 ps 469.23 ps]
                        change: [-0.8492% -0.5683% -0.3377%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
comparison/acquire_load time:   [473.86 ps 476.62 ps 480.23 ps]
                        change: [+0.0428% +0.6687% +1.2841%] (p = 0.04 < 0.05)
                        Change within noise threshold.
Found 14 outliers among 100 measurements (14.00%)
  8 (8.00%) high mild
  6 (6.00%) high severe
comparison/log          time:   [235.40 ps 236.35 ps 237.77 ps]
                        change: [-0.3095% +0.3407% +1.2059%] (p = 0.50 > 0.05)
                        No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
  3 (3.00%) low severe
  6 (6.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe

     Running benches/dispatch_get_clone.rs (target/release/deps/dispatch_get_clone-d4d6ca1f9895e432)Dispatch::get_clone/none
                        time:   [8.3471 ns 8.3649 ns 8.3863 ns]
                        change: [-64.375% -64.301% -64.226%] (p = 0.00 < 0.05)
                        Performance has improved.
Dispatch::get_clone/scoped
                        time:   [17.054 ns 17.081 ns 17.108 ns]
                        change: [+53.653% +53.819% +53.991%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe
Dispatch::get_clone/global
                        time:   [8.4810 ns 8.4853 ns 8.4901 ns]
                        change: [-64.773% -64.741% -64.710%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  15 (15.00%) high mild
  1 (1.00%) high severe

     Running benches/dispatch_get_ref.rs (target/release/deps/dispatch_get_ref-6ce05749a0b1bf87)Dispatch::get_ref/none  time:   [1.8803 ns 1.8826 ns 1.8853 ns]
                        change: [-86.888% -86.856% -86.812%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe
Dispatch::get_ref/scoped
                        time:   [3.8294 ns 3.8422 ns 3.8529 ns]
                        change: [-1.5952% -1.3861% -1.1447%] (p = 0.00 < 0.05)
                        Performance has improved.
Dispatch::get_ref/global
                        time:   [1.8859 ns 1.8901 ns 1.8936 ns]
                        change: [-87.025% -86.963% -86.907%] (p = 0.00 < 0.05)
                        Performance has improved.

     Running benches/empty_span.rs (target/release/deps/empty_span-745c777d77b8b7ca)empty_span/none         time:   [234.18 ps 234.44 ps 234.68 ps]
                        change: [+1.4833% +1.6364% +1.7788%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
empty_span/scoped       time:   [235.19 ps 236.00 ps 236.71 ps]
                        change: [+0.5261% +0.8586% +1.1517%] (p = 0.00 < 0.05)
                        Change within noise threshold.
empty_span/global       time:   [235.04 ps 235.52 ps 236.15 ps]
                        change: [-0.0215% +0.7651% +2.1704%] (p = 0.17 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe
empty_span/baseline_struct
                        time:   [704.83 ps 705.28 ps 705.80 ps]
                        change: [+0.0246% +0.6522% +1.5423%] (p = 0.10 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

     Running benches/enter_span.rs (target/release/deps/enter_span-7fc1c2a69c076475)enter_span/none         time:   [0.0000 ps 0.0000 ps 0.0000 ps]
                        change: [-49.467% +1.5232% +101.49%] (p = 0.98 > 0.05)
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) high mild
  8 (8.00%) high severe
enter_span/scoped       time:   [3.7519 ns 3.7648 ns 3.7776 ns]
                        change: [+32.984% +33.297% +33.588%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) low mild
  1 (1.00%) high mild
enter_span/global       time:   [3.7399 ns 3.7532 ns 3.7687 ns]
                        change: [+33.878% +34.287% +34.719%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 19 outliers among 100 measurements (19.00%)
  14 (14.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

     Running benches/event.rs (target/release/deps/event-6742eef6ebe07aa4)event/none              time:   [234.52 ps 234.95 ps 235.48 ps]
                        change: [-0.2638% +0.2782% +0.7705%] (p = 0.34 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe
event/scoped            time:   [8.8083 ns 8.8738 ns 8.9531 ns]
                        change: [-7.2455% -6.8819% -6.4900%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
event/scoped_recording  time:   [32.472 ns 33.287 ns 34.363 ns]
                        change: [-9.3516% -0.4827% +9.1329%] (p = 0.89 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe
event/global            time:   [6.7528 ns 6.7610 ns 6.7701 ns]
                        change: [-70.199% -70.111% -70.013%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  8 (8.00%) high mild
  2 (2.00%) high severe

     Running benches/shared.rs (target/release/deps/shared-9c2531bf46089869)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/span_fields.rs (target/release/deps/span_fields-96dfd0a8a577dec6)span_fields/none        time:   [3.6797 ns 3.6839 ns 3.6888 ns]
                        change: [+12.223% +12.489% +12.735%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low severe
  3 (3.00%) high mild
span_fields/scoped      time:   [32.995 ns 33.166 ns 33.349 ns]
                        change: [+13.805% +14.178% +14.526%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe
span_fields/scoped_recording
                        time:   [280.94 ns 283.54 ns 286.81 ns]
                        change: [+4.7144% +5.4942% +6.4857%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe
span_fields/global      time:   [29.370 ns 29.434 ns 29.505 ns]
                        change: [-27.391% -26.777% -26.346%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

     Running benches/span_no_fields.rs (target/release/deps/span_no_fields-f8c7d7a84f720442)span_no_fields/none     time:   [1.6604 ns 1.6663 ns 1.6725 ns]
                        change: [+18.306% +18.708% +19.155%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe
span_no_fields/scoped   time:   [19.079 ns 19.107 ns 19.132 ns]
                        change: [+0.5306% +0.7813% +1.0349%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
span_no_fields/scoped_recording
                        time:   [32.628 ns 32.667 ns 32.707 ns]
                        change: [-0.3009% -0.1227% +0.0411%] (p = 0.18 > 0.05)
                        No change in performance detected.
span_no_fields/global   time:   [13.029 ns 13.042 ns 13.055 ns]
                        change: [-64.133% -64.076% -64.014%] (p = 0.00 < 0.05)
                        Performance has improved.

     Running benches/span_repeated.rs (target/release/deps/span_repeated-03bfaaf4ecd13d36)span_repeated/none      time:   [253.05 ns 256.87 ns 261.12 ns]
                        change: [-65.419% -65.179% -64.926%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high severe
span_repeated/scoped    time:   [2.2869 µs 2.2926 µs 2.2986 µs]
                        change: [-8.5393% -8.3402% -8.1049%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
span_repeated/scoped_recording
                        time:   [4.7143 µs 4.7286 µs 4.7451 µs]
                        change: [-5.0531% -4.7290% -4.3831%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe
span_repeated/global    time:   [2.1195 µs 2.1224 µs 2.1253 µs]
                        change: [-44.708% -44.624% -44.550%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
  ```
hawkw added a commit that referenced this pull request May 11, 2023
…2593)

## Motivation

Currently, when a call to `dispatcher::get_default` occurs, `tracing`
will check the thread-local default dispatcher first. If a thread-local
scoped default is set, it is returned. Otherwise, the thread will then
check the global default. If a global default is present, it is then
cached in the thread local, so that subsequent calls do not need to
check the global default.

Unfortunately, this behavior results in issues if the scoped default is
accessed (e.g. using `get_default` or creating a new span) *prior* to a
global default being set. When `get_default` runs for the first time and
there is no global default, a `none` dispatcher is cached as the
thread-local default. This means that the thread will now behave as
though its default dispatcher is `None` until the scoped default is
overridden, even if a global default is then set later. This is quite
bad, and results in issues such as #2587, #2436, and #2411.

## Solution

This branch makes several changes to remove the use of the thread-local
caching of the global default dispatcher, and to lessen the performance
impact of doing so.

On the `master` (v0.2.x) branch, we track the number of scoped
dispatchers currently active, and use it to determine whether or not to
check thread-local storage at all. This optimization was introduced in
PR #1017. This branch backports a similar change to `v0.1.x`.

In addition, #1017 also changes the dispatcher module to represent a
`Dispatch` internally using an enum of either an `Arc` in the case where
the dispatcher is scoped, or a `&'static dyn Subscriber + Send + Sync`
reference when the dispatcher is the global default. This makes cloning
and constructing the global default cheaper, and also allows us to
change the `None` dispatcher into a static singleton. That means that
the use of a `None` dispatcher no longer requires an allocation and arc
reference bump, an issue which was previously resolved by locally
caching a `None` dispatcher. A side benefit of this change is that
*cloning* a `Dispatch` is substantially cheaper when the dispatcher is a
global default, as it's just an `&'static` reference and no `Arc` bump
is necessary. This will also make cloning a `Span` cheaper when the
global default dispatcher is in use.

Finally, because the overhead of getting the global default is
substantially reduced, we are able to change the scoped default
dispatcher's behavior to remove the caching entirely. This means that
the category of bugs involving the local cache becoming stale is
resolved entirely.

Fixes #2587
Fixes #2436
Fixes #2411
Closes #2592

## Performance Impact

This change results in a change in performance characteristics. Running
the benchmarks, we observe a significant improvement in performance in
most of the benchmarks that use the global default dispatcher, and a
noticeable decrease in performance for some benchmarks using the scoped
default. In my opinion, this performance change is acceptable, as a
global default dispatcher is the common case for most users, and is
generally expected to perform better than the scoped default. In
addition, resolving the variety of bugs that are caused by the local
caching of the default when using the scoped default dispatcher is worth
a performance cost when the scoped default is in use.

<details>
<summary>Benchmark results:</summary>

```
     Running benches/baseline.rs (target/release/deps/baseline-9b70733ce49582d2)
comparison/relaxed_load time:   [456.48 ps 456.55 ps 456.63 ps]
                        change: [+3.0281% +3.3135% +3.5664%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe
comparison/acquire_load time:   [438.98 ps 439.32 ps 439.76 ps]
                        change: [-0.3725% -0.2092% -0.0614%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) high mild
  10 (10.00%) high severe
comparison/log          time:   [227.05 ps 227.14 ps 227.27 ps]
                        change: [+3.1351% +3.2984% +3.4537%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) high mild
  9 (9.00%) high severe
```
```
     Running benches/dispatch_get_clone.rs (target/release/deps/dispatch_get_clone-d4d6ca1f9895e432)
Dispatch::get_clone/none
                        time:   [8.3974 ns 8.4004 ns 8.4039 ns]
                        change: [-22.870% -22.796% -22.728%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe
Dispatch::get_clone/scoped
                        time:   [15.877 ns 15.959 ns 16.045 ns]
                        change: [+52.358% +52.943% +53.500%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  2 (2.00%) high mild
  14 (14.00%) high severe
Dispatch::get_clone/global
                        time:   [8.3962 ns 8.4000 ns 8.4054 ns]
                        change: [-19.126% -18.961% -18.817%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  2 (2.00%) low severe
  6 (6.00%) high mild
  7 (7.00%) high severe
```
```
     Running benches/dispatch_get_ref.rs (target/release/deps/dispatch_get_ref-6ce05749a0b1bf87)
Dispatch::get_ref/none  time:   [1.7551 ns 1.7564 ns 1.7579 ns]
                        change: [-51.858% -51.749% -51.644%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe
Dispatch::get_ref/scoped
                        time:   [3.6341 ns 3.6365 ns 3.6397 ns]
                        change: [-2.6892% -2.5955% -2.4968%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe
Dispatch::get_ref/global
                        time:   [1.7668 ns 1.7686 ns 1.7713 ns]
                        change: [-52.697% -52.647% -52.603%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe
```
```
     Running benches/empty_span.rs (target/release/deps/empty_span-745c777d77b8b7ca)
empty_span/none         time:   [227.02 ps 227.10 ps 227.20 ps]
                        change: [-0.1729% -0.0705% +0.0495%] (p = 0.24 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe
empty_span/scoped       time:   [218.51 ps 218.69 ps 218.90 ps]
                        change: [-0.7582% -0.6056% -0.4630%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
empty_span/global       time:   [217.85 ps 218.15 ps 218.56 ps]
                        change: [-2.6528% -2.4341% -2.1602%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe
empty_span/baseline_struct
                        time:   [655.54 ps 656.09 ps 656.76 ps]
                        change: [-1.6595% -1.4125% -1.1776%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe
```
```
     Running benches/enter_span.rs (target/release/deps/enter_span-7fc1c2a69c076475)
enter_span/none         time:   [0.0000 ps 0.0000 ps 0.0000 ps]
                        change: [-43.600% +6.5764% +109.38%] (p = 0.86 > 0.05)
                        No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
  6 (6.00%) high mild
  8 (8.00%) high severe
enter_span/scoped       time:   [2.6513 ns 2.6567 ns 2.6641 ns]
                        change: [+0.3121% +1.9504% +3.4648%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) high mild
  7 (7.00%) high severe
enter_span/global       time:   [3.2108 ns 3.2160 ns 3.2220 ns]
                        change: [+25.963% +26.742% +27.434%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
```
```
     Running benches/event.rs (target/release/deps/event-6742eef6ebe07aa4)
event/none              time:   [227.04 ps 227.18 ps 227.41 ps]
                        change: [-1.6751% -1.5743% -1.4711%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  6 (6.00%) high mild
  7 (7.00%) high severe
event/scoped            time:   [8.3849 ns 8.4335 ns 8.4888 ns]
                        change: [-3.4754% -3.0252% -2.6092%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
event/scoped_recording  time:   [36.916 ns 37.022 ns 37.194 ns]
                        change: [+8.1054% +18.714% +30.381%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  7 (7.00%) high severe
event/global            time:   [6.9694 ns 7.1677 ns 7.3469 ns]
                        change: [-23.407% -21.940% -20.398%] (p = 0.00 < 0.05)
                        Performance has improved.
```
```
     Running benches/span_fields.rs (target/release/deps/span_fields-96dfd0a8a577dec6)
span_fields/none        time:   [3.5936 ns 3.6008 ns 3.6106 ns]
                        change: [+17.160% +17.776% +18.413%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  8 (8.00%) high mild
  6 (6.00%) high severe
span_fields/scoped      time:   [33.751 ns 33.765 ns 33.779 ns]
                        change: [+22.689% +22.873% +23.037%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe
span_fields/scoped_recording
                        time:   [270.22 ns 270.55 ns 270.91 ns]
                        change: [+10.615% +10.827% +11.028%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild
span_fields/global      time:   [28.337 ns 28.428 ns 28.527 ns]
                        change: [+3.0582% +3.3355% +3.6278%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  13 (13.00%) high mild
```
```
     Running benches/span_no_fields.rs (target/release/deps/span_no_fields-f8c7d7a84f720442)
span_no_fields/none     time:   [1.5467 ns 1.5507 ns 1.5553 ns]
                        change: [+12.966% +13.206% +13.434%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  7 (7.00%) high mild
  1 (1.00%) high severe
span_no_fields/scoped   time:   [17.796 ns 17.810 ns 17.826 ns]
                        change: [+1.0381% +1.1673% +1.2914%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) high mild
  6 (6.00%) high severe
span_no_fields/scoped_recording
                        time:   [30.397 ns 30.459 ns 30.524 ns]
                        change: [-0.8489% -0.6268% -0.3915%] (p = 0.00 < 0.05)
                        Change within noise threshold.
span_no_fields/global   time:   [12.747 ns 12.791 ns 12.844 ns]
                        change: [-27.930% -27.672% -27.386%] (p = 0.00 < 0.05)
                        Performance has improved.
```
```
     Running benches/span_repeated.rs (target/release/deps/span_repeated-03bfaaf4ecd13d36)
span_repeated/none      time:   [699.28 ns 699.84 ns 700.53 ns]
                        change: [+2.4125% +2.6359% +2.8862%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe
span_repeated/scoped    time:   [2.5029 µs 2.5057 µs 2.5090 µs]
                        change: [+4.5095% +4.6605% +4.8122%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  8 (8.00%) low mild
  6 (6.00%) high mild
  2 (2.00%) high severe
span_repeated/scoped_recording
                        time:   [5.0509 µs 5.0535 µs 5.0566 µs]
                        change: [+0.7346% +1.0724% +1.3718%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
  6 (6.00%) high mild
  7 (7.00%) high severe
span_repeated/global    time:   [2.1264 µs 2.1272 µs 2.1282 µs]
                        change: [-11.213% -11.119% -11.031%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe
```

</details>
kaffarell pushed a commit to kaffarell/tracing that referenced this pull request May 22, 2024
…okio-rs#2593)

## Motivation

Currently, when a call to `dispatcher::get_default` occurs, `tracing`
will check the thread-local default dispatcher first. If a thread-local
scoped default is set, it is returned. Otherwise, the thread will then
check the global default. If a global default is present, it is then
cached in the thread local, so that subsequent calls do not need to
check the global default.

Unfortunately, this behavior results in issues if the scoped default is
accessed (e.g. using `get_default` or creating a new span) *prior* to a
global default being set. When `get_default` runs for the first time and
there is no global default, a `none` dispatcher is cached as the
thread-local default. This means that the thread will now behave as
though its default dispatcher is `None` until the scoped default is
overridden, even if a global default is then set later. This is quite
bad, and results in issues such as tokio-rs#2587, tokio-rs#2436, and tokio-rs#2411.

## Solution

This branch makes several changes to remove the use of the thread-local
caching of the global default dispatcher, and to lessen the performance
impact of doing so.

On the `master` (v0.2.x) branch, we track the number of scoped
dispatchers currently active, and use it to determine whether or not to
check thread-local storage at all. This optimization was introduced in
PR tokio-rs#1017. This branch backports a similar change to `v0.1.x`.

In addition, tokio-rs#1017 also changes the dispatcher module to represent a
`Dispatch` internally using an enum of either an `Arc` in the case where
the dispatcher is scoped, or a `&'static dyn Subscriber + Send + Sync`
reference when the dispatcher is the global default. This makes cloning
and constructing the global default cheaper, and also allows us to
change the `None` dispatcher into a static singleton. That means that
the use of a `None` dispatcher no longer requires an allocation and arc
reference bump, an issue which was previously resolved by locally
caching a `None` dispatcher. A side benefit of this change is that
*cloning* a `Dispatch` is substantially cheaper when the dispatcher is a
global default, as it's just an `&'static` reference and no `Arc` bump
is necessary. This will also make cloning a `Span` cheaper when the
global default dispatcher is in use.

Finally, because the overhead of getting the global default is
substantially reduced, we are able to change the scoped default
dispatcher's behavior to remove the caching entirely. This means that
the category of bugs involving the local cache becoming stale is
resolved entirely.

Fixes tokio-rs#2587
Fixes tokio-rs#2436
Fixes tokio-rs#2411
Closes tokio-rs#2592

## Performance Impact

This change results in a change in performance characteristics. Running
the benchmarks, we observe a significant improvement in performance in
most of the benchmarks that use the global default dispatcher, and a
noticeable decrease in performance for some benchmarks using the scoped
default. In my opinion, this performance change is acceptable, as a
global default dispatcher is the common case for most users, and is
generally expected to perform better than the scoped default. In
addition, resolving the variety of bugs that are caused by the local
caching of the default when using the scoped default dispatcher is worth
a performance cost when the scoped default is in use.

<details>
<summary>Benchmark results:</summary>

```
     Running benches/baseline.rs (target/release/deps/baseline-9b70733ce49582d2)
comparison/relaxed_load time:   [456.48 ps 456.55 ps 456.63 ps]
                        change: [+3.0281% +3.3135% +3.5664%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe
comparison/acquire_load time:   [438.98 ps 439.32 ps 439.76 ps]
                        change: [-0.3725% -0.2092% -0.0614%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) high mild
  10 (10.00%) high severe
comparison/log          time:   [227.05 ps 227.14 ps 227.27 ps]
                        change: [+3.1351% +3.2984% +3.4537%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) high mild
  9 (9.00%) high severe
```
```
     Running benches/dispatch_get_clone.rs (target/release/deps/dispatch_get_clone-d4d6ca1f9895e432)
Dispatch::get_clone/none
                        time:   [8.3974 ns 8.4004 ns 8.4039 ns]
                        change: [-22.870% -22.796% -22.728%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe
Dispatch::get_clone/scoped
                        time:   [15.877 ns 15.959 ns 16.045 ns]
                        change: [+52.358% +52.943% +53.500%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  2 (2.00%) high mild
  14 (14.00%) high severe
Dispatch::get_clone/global
                        time:   [8.3962 ns 8.4000 ns 8.4054 ns]
                        change: [-19.126% -18.961% -18.817%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  2 (2.00%) low severe
  6 (6.00%) high mild
  7 (7.00%) high severe
```
```
     Running benches/dispatch_get_ref.rs (target/release/deps/dispatch_get_ref-6ce05749a0b1bf87)
Dispatch::get_ref/none  time:   [1.7551 ns 1.7564 ns 1.7579 ns]
                        change: [-51.858% -51.749% -51.644%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe
Dispatch::get_ref/scoped
                        time:   [3.6341 ns 3.6365 ns 3.6397 ns]
                        change: [-2.6892% -2.5955% -2.4968%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe
Dispatch::get_ref/global
                        time:   [1.7668 ns 1.7686 ns 1.7713 ns]
                        change: [-52.697% -52.647% -52.603%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe
```
```
     Running benches/empty_span.rs (target/release/deps/empty_span-745c777d77b8b7ca)
empty_span/none         time:   [227.02 ps 227.10 ps 227.20 ps]
                        change: [-0.1729% -0.0705% +0.0495%] (p = 0.24 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe
empty_span/scoped       time:   [218.51 ps 218.69 ps 218.90 ps]
                        change: [-0.7582% -0.6056% -0.4630%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
empty_span/global       time:   [217.85 ps 218.15 ps 218.56 ps]
                        change: [-2.6528% -2.4341% -2.1602%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe
empty_span/baseline_struct
                        time:   [655.54 ps 656.09 ps 656.76 ps]
                        change: [-1.6595% -1.4125% -1.1776%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe
```
```
     Running benches/enter_span.rs (target/release/deps/enter_span-7fc1c2a69c076475)
enter_span/none         time:   [0.0000 ps 0.0000 ps 0.0000 ps]
                        change: [-43.600% +6.5764% +109.38%] (p = 0.86 > 0.05)
                        No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
  6 (6.00%) high mild
  8 (8.00%) high severe
enter_span/scoped       time:   [2.6513 ns 2.6567 ns 2.6641 ns]
                        change: [+0.3121% +1.9504% +3.4648%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) high mild
  7 (7.00%) high severe
enter_span/global       time:   [3.2108 ns 3.2160 ns 3.2220 ns]
                        change: [+25.963% +26.742% +27.434%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
```
```
     Running benches/event.rs (target/release/deps/event-6742eef6ebe07aa4)
event/none              time:   [227.04 ps 227.18 ps 227.41 ps]
                        change: [-1.6751% -1.5743% -1.4711%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  6 (6.00%) high mild
  7 (7.00%) high severe
event/scoped            time:   [8.3849 ns 8.4335 ns 8.4888 ns]
                        change: [-3.4754% -3.0252% -2.6092%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
event/scoped_recording  time:   [36.916 ns 37.022 ns 37.194 ns]
                        change: [+8.1054% +18.714% +30.381%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  7 (7.00%) high severe
event/global            time:   [6.9694 ns 7.1677 ns 7.3469 ns]
                        change: [-23.407% -21.940% -20.398%] (p = 0.00 < 0.05)
                        Performance has improved.
```
```
     Running benches/span_fields.rs (target/release/deps/span_fields-96dfd0a8a577dec6)
span_fields/none        time:   [3.5936 ns 3.6008 ns 3.6106 ns]
                        change: [+17.160% +17.776% +18.413%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  8 (8.00%) high mild
  6 (6.00%) high severe
span_fields/scoped      time:   [33.751 ns 33.765 ns 33.779 ns]
                        change: [+22.689% +22.873% +23.037%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe
span_fields/scoped_recording
                        time:   [270.22 ns 270.55 ns 270.91 ns]
                        change: [+10.615% +10.827% +11.028%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild
span_fields/global      time:   [28.337 ns 28.428 ns 28.527 ns]
                        change: [+3.0582% +3.3355% +3.6278%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  13 (13.00%) high mild
```
```
     Running benches/span_no_fields.rs (target/release/deps/span_no_fields-f8c7d7a84f720442)
span_no_fields/none     time:   [1.5467 ns 1.5507 ns 1.5553 ns]
                        change: [+12.966% +13.206% +13.434%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  7 (7.00%) high mild
  1 (1.00%) high severe
span_no_fields/scoped   time:   [17.796 ns 17.810 ns 17.826 ns]
                        change: [+1.0381% +1.1673% +1.2914%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) high mild
  6 (6.00%) high severe
span_no_fields/scoped_recording
                        time:   [30.397 ns 30.459 ns 30.524 ns]
                        change: [-0.8489% -0.6268% -0.3915%] (p = 0.00 < 0.05)
                        Change within noise threshold.
span_no_fields/global   time:   [12.747 ns 12.791 ns 12.844 ns]
                        change: [-27.930% -27.672% -27.386%] (p = 0.00 < 0.05)
                        Performance has improved.
```
```
     Running benches/span_repeated.rs (target/release/deps/span_repeated-03bfaaf4ecd13d36)
span_repeated/none      time:   [699.28 ns 699.84 ns 700.53 ns]
                        change: [+2.4125% +2.6359% +2.8862%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe
span_repeated/scoped    time:   [2.5029 µs 2.5057 µs 2.5090 µs]
                        change: [+4.5095% +4.6605% +4.8122%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  8 (8.00%) low mild
  6 (6.00%) high mild
  2 (2.00%) high severe
span_repeated/scoped_recording
                        time:   [5.0509 µs 5.0535 µs 5.0566 µs]
                        change: [+0.7346% +1.0724% +1.3718%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
  6 (6.00%) high mild
  7 (7.00%) high severe
span_repeated/global    time:   [2.1264 µs 2.1272 µs 2.1282 µs]
                        change: [-11.213% -11.119% -11.031%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe
```

</details>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/no-std Related to improving support for no-std. crate/core Related to the `tracing-core` crate crate/tracing Related to the `tracing` crate kind/feature New feature or request meta/breaking This is a breaking change, and should wait until the next breaking release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

core: avoid scoped dispatch overhead when scoped dispatchers are not in use
4 participants