All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
0.1.17 - 2024-12-04
- Set MSRV to 1.80 for `LazyLock` and new `size_of` prelude import.
- Reduced thread pool memory usage by many kilobytes by using rendezvous channels instead of array-based channels.
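As a rough illustration of the channel change, the sketch below hands work to a single worker over zero-capacity channels from the standard library. It assumes that "rendezvous channel" corresponds to `std::sync::mpsc::sync_channel(0)`; the `run_on_worker` helper is hypothetical and is not Divan's thread pool.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper (not part of Divan): sends each task to one worker
/// thread over a rendezvous channel and collects the doubled results.
fn run_on_worker(tasks: Vec<u32>) -> Vec<u32> {
    // Capacity 0 makes these rendezvous channels: no messages are queued,
    // and each `send` blocks until the other side is ready to `recv`.
    let (task_tx, task_rx) = mpsc::sync_channel::<u32>(0);
    let (result_tx, result_rx) = mpsc::sync_channel::<u32>(0);

    let worker = thread::spawn(move || {
        // The loop ends once `task_tx` has been dropped.
        for task in task_rx {
            if result_tx.send(task * 2).is_err() {
                break;
            }
        }
    });

    let mut results = Vec::with_capacity(tasks.len());
    for task in tasks {
        task_tx.send(task).expect("worker exited early");
        results.push(result_rx.recv().expect("worker exited early"));
    }

    // Dropping the sender lets the worker's loop finish so it can be joined.
    drop(task_tx);
    worker.join().expect("worker panicked");
    results
}
```

Calling `run_on_worker(vec![1, 2, 3])` hands each task directly to the waiting worker, with no queue in between.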
0.1.16 - 2024-11-25
- Thread pool for reusing threads across multi-threaded benchmarks. The result is that when running Divan benchmarks under a sampling profiler, the profiler's output will be cleaner and easier to understand. (#37)
- Track the maximum number of allocations during a benchmark.
- Make private `Arg::get` trait method not take `self`, so that text editors don't recommend using it. (#59)
- Cache `BenchOptions` using `LazyLock` instead of `OnceLock`, saving space and simplifying the implementation.
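For readers unfamiliar with the two types named in the last entry, here is a minimal sketch of the API difference. The `Options` struct and `compute_options` function are hypothetical stand-ins, not Divan's `BenchOptions`, and the sketch does not try to reproduce the space saving.

```rust
use std::sync::{LazyLock, OnceLock};

/// Hypothetical cached value; stands in for any lazily computed options.
#[derive(Debug, PartialEq)]
struct Options {
    sample_count: u32,
}

fn compute_options() -> Options {
    Options { sample_count: 100 }
}

// With `OnceLock`, the initializer is supplied at the access site.
static ONCE: OnceLock<Options> = OnceLock::new();

fn options_once() -> &'static Options {
    ONCE.get_or_init(compute_options)
}

// With `LazyLock`, the initializer is stored alongside the value and runs
// on first dereference, so access is a plain borrow.
static LAZY: LazyLock<Options> = LazyLock::new(compute_options);

fn options_lazy() -> &'static Options {
    &LAZY
}
```

Both accessors return the same value on every call. `LazyLock` requires Rust 1.80 or later.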
0.1.15 - 2024-10-31
- `CyclesCount` counter to display cycle throughput as Hertz.
- Track the maximum number of bytes allocated during a benchmark.
- Remove `has_cpuid` polyfill due to it no longer being planned for Rust, since CPUID is assumed to be available on all old x86 Rust targets.
- List generic benchmark type parameter `A<4>` before `A<32>`. (#64)
- Improve precision by using `f64` when calculating allocation count and sizes for the median samples.
- Multi-thread allocation counting in `sum_alloc_tallies` on macOS was loading a null pointer instead of the pointer initialized by `sync_threads`.
- Sort all output benchmark names naturally instead of lexicographically.
- Internally reuse `&[&str]` slice for `args` names.
- Subtract overhead of `AllocProfiler` from timings. Now that Divan also tracks the maximum bytes allocated, the overhead was apparent in timings.
- Simplify `ThreadAllocInfo::clear`.
- Move measured loop overhead from `SharedContext` to global `OnceLock`.
- Macros no longer rely on `std` being re-exported by Divan. Instead they use `::std` or `::core` to greatly simplify code. Although this is technically a breaking change, it is extremely unlikely that anyone does `extern crate std as x`.
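The natural ordering mentioned in this release (for example, `A<4>` before `A<32>`) can be sketched as below. This is an illustrative comparison function written for this note, not Divan's implementation; `natural_cmp` and `trim_zeros` are hypothetical names.

```rust
use std::cmp::Ordering;

/// Compares two names so that runs of ASCII digits are ordered by numeric
/// value, while all other bytes are compared as-is.
fn natural_cmp(a: &str, b: &str) -> Ordering {
    let (a, b) = (a.as_bytes(), b.as_bytes());
    let (mut i, mut j) = (0, 0);

    while i < a.len() && j < b.len() {
        if a[i].is_ascii_digit() && b[j].is_ascii_digit() {
            // Find where each digit run ends.
            let i_end = i + a[i..].iter().take_while(|c| c.is_ascii_digit()).count();
            let j_end = j + b[j..].iter().take_while(|c| c.is_ascii_digit()).count();

            // Ignore leading zeros; then a longer run is a larger number,
            // and equal-length runs compare digit by digit.
            let x = trim_zeros(&a[i..i_end]);
            let y = trim_zeros(&b[j..j_end]);
            let ord = x.len().cmp(&y.len()).then_with(|| x.cmp(y));
            if ord != Ordering::Equal {
                return ord;
            }
            i = i_end;
            j = j_end;
        } else {
            let ord = a[i].cmp(&b[j]);
            if ord != Ordering::Equal {
                return ord;
            }
            i += 1;
            j += 1;
        }
    }

    // The name with bytes left over sorts after its prefix.
    (a.len() - i).cmp(&(b.len() - j))
}

/// Strips leading `0` bytes from a run of digits.
fn trim_zeros(digits: &[u8]) -> &[u8] {
    let zeros = digits.iter().take_while(|&&c| c == b'0').count();
    &digits[zeros..]
}
```

Passing it to `sort_by`, as in `names.sort_by(|a, b| natural_cmp(a, b))`, orders names by the numbers they contain, whereas plain `sort` compares them character by character.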
0.1.14 - 2024-02-17
- Set correct field in `Divan::max_time`. (#45)
- Define `BytesCount::of_iter` in terms of `BytesCount::of_many`.
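To make the relationship between these two constructors concrete, here is a small sketch. The `BytesCount` type below is a hypothetical stand-in that stores a `usize`; the real type's fields and signatures may differ.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a byte counter; not Divan's `BytesCount`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BytesCount(usize);

impl BytesCount {
    /// Counts the size of a single `T`.
    fn of<T>() -> Self {
        Self(size_of::<T>())
    }

    /// Counts the size of `n` values of type `T`.
    fn of_many<T>(n: usize) -> Self {
        Self(size_of::<T>() * n)
    }

    /// Counts the bytes of every item an iterator yields, expressed through
    /// `of_many` rather than a separate calculation.
    fn of_iter<T, I: IntoIterator<Item = T>>(iter: I) -> Self {
        Self::of_many::<T>(iter.into_iter().count())
    }
}
```

Under these assumptions, `BytesCount::of_iter([1u16, 2, 3, 4])` equals `BytesCount::of_many::<u16>(4)`.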
0.1.13 - 2024-02-09
- Missing update to `divan-macros` dependency.
0.1.12 - 2024-02-09
- Display `args` option values with `Debug` instead if `ToString` is not implemented.

  This makes it simple to use enums with derived `Debug`:

  ```rust
  #[derive(Debug)]
  enum Arg {
      A,
      B,
  }

  #[divan::bench(args = [Arg::A, Arg::B])]
  fn bench_args(arg: &Arg) {
      ...
  }
  ```

- Documentation of when to use `black_box` in benchmarks.
0.1.11 - 2024-01-20
- Sorting negative `args` numbers.
0.1.10 - 2024-01-20
0.1.9 - 2024-01-20
- `args` option for providing runtime arguments to benchmarks:

  ```rust
  #[divan::bench(args = [1, 2, 3])]
  fn args_list(arg: usize) { ... }

  #[divan::bench(args = 1..=3)]
  fn args_range(arg: usize) { ... }

  const ARGS: &[usize] = &[1, 2, 3];

  #[divan::bench(args = ARGS)]
  fn args_const(arg: usize) { ... }
  ```

  This option may be preferred over the similar `consts` option because:
0.1.8 - 2023-12-19
- Reduce `AllocProfiler` footprint from 6-10ns to 1-2ns:
  - Thread-local values are now exclusively owned by their threads and are no longer kept in a global list. This enables some optimizations:
    - Performing faster unsynchronized arithmetic.
    - Removing one level of pointer indirection by storing the thread-local value entirely inline in `thread_local!`, rather than storing a pointer to a globally-shared instance.
    - Compiler emits SIMD arithmetic for x86_64 using `paddq`.
  - Improved thread-local lookup on x86_64 macOS by using a static lookup key instead of a dynamic key from `pthread_key_create`. Key 11 is used because it is reserved for Windows.

    The `dyn_thread_local` crate feature disables this optimization. This is recommended if your code or another dependency uses the same static key.
- Remove unused allocations if `AllocProfiler` is not active as the global allocator.
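The idea of thread-owned, unsynchronized tallies stored inline in `thread_local!` can be sketched with the standard library alone. This is only an illustration of the general technique; `EVENT_COUNT`, `record_event`, and the other names are hypothetical, and the macOS static-key optimization is not shown.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // The counter lives inline in the thread-local slot. Each thread owns
    // its own copy, so plain (non-atomic) arithmetic is enough.
    static EVENT_COUNT: Cell<u64> = const { Cell::new(0) };
}

/// Hypothetical hook: records one event for the current thread.
fn record_event() {
    EVENT_COUNT.with(|count| count.set(count.get() + 1));
}

/// Reads the current thread's tally.
fn current_thread_count() -> u64 {
    EVENT_COUNT.with(|count| count.get())
}

/// Records `events` events on a fresh thread and returns that thread's tally.
fn count_on_new_thread(events: u64) -> u64 {
    thread::spawn(move || {
        for _ in 0..events {
            record_event();
        }
        current_thread_count()
    })
    .join()
    .expect("thread panicked")
}
```

Each spawned thread starts from zero and never touches another thread's counter, which is what makes the unsynchronized arithmetic sound in this sketch.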
0.1.7 - 2023-12-13
- Improve `AllocProfiler` implementation documentation.
- Limit `AllocProfiler` mean count outputs to 4 significant digits to not be very wide and for consistency with other outputs.
0.1.6 - 2023-12-13
- `AllocProfiler` allocator that tracks allocation counts and sizes during benchmarks.
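For context, an allocator that tallies counts and sizes can be sketched as a thin wrapper around the system allocator. The `CountingAlloc` type below is a hypothetical, simplified illustration using shared atomics; it is not how `AllocProfiler` is implemented.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical counting allocator that forwards to the system allocator.
struct CountingAlloc {
    alloc_count: AtomicUsize,
    alloc_bytes: AtomicUsize,
}

impl CountingAlloc {
    const fn new() -> Self {
        Self {
            alloc_count: AtomicUsize::new(0),
            alloc_bytes: AtomicUsize::new(0),
        }
    }

    /// Returns `(number of allocations, total bytes requested)`.
    fn tallies(&self) -> (usize, usize) {
        (
            self.alloc_count.load(Ordering::Relaxed),
            self.alloc_bytes.load(Ordering::Relaxed),
        )
    }
}

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Record the request, then delegate the actual allocation.
        self.alloc_count.fetch_add(1, Ordering::Relaxed);
        self.alloc_bytes.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

/// Allocates and frees `size` bytes by calling the allocator directly,
/// which keeps the tallies deterministic for demonstration purposes.
fn alloc_and_free(allocator: &CountingAlloc, size: usize) {
    assert!(size > 0, "zero-sized allocations are not allowed here");
    let layout = Layout::from_size_align(size, 8).expect("invalid layout");
    unsafe {
        let ptr = allocator.alloc(layout);
        assert!(!ptr.is_null(), "allocation failed");
        allocator.dealloc(ptr, layout);
    }
}
```

To observe a whole program's allocations, a type like this would instead be installed with the `#[global_allocator]` attribute on a `static`.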
0.1.5 - 2023-12-05
- `black_box_drop` convenience function for `black_box` + `drop`. This is useful when benchmarking a lazy `Iterator` to completion with `for_each`:

  ```rust
  #[divan::bench]
  fn parse_iter() {
      let input: &str = // ...

      Parser::new(input)
          .for_each(divan::black_box_drop);
  }
  ```
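A standalone approximation of this helper can be written with `std::hint::black_box`. The function below is a sketch based only on the description "`black_box` + `drop`"; the `consume_all` wrapper is a hypothetical name added for demonstration.

```rust
use std::hint::black_box;

/// Approximation of the helper described above: passes the value through
/// `black_box` so the optimizer treats it as used, then drops it.
fn black_box_drop<T>(value: T) {
    drop(black_box(value));
}

/// Drives a lazy iterator to completion and reports how many items it
/// produced, discarding each item via `black_box_drop`.
fn consume_all<I: Iterator>(iter: I) -> usize {
    let mut count = 0;
    iter.inspect(|_| count += 1).for_each(black_box_drop);
    count
}
```

Because `black_box_drop` is generic over the item type, it can be passed to `for_each` by name without a closure.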
0.1.4 - 2023-12-02
- `From` implementations for counters on references to `u8`–`u64` and `usize`, such as `From<&u64>` and `From<&&u64>`. This allows for doing:

  ```rust
  bencher
      .with_inputs(|| { ... })
      .input_counter(ItemsCount::from)
      .bench_values(|n| { ... });
  ```

- `Bencher::count_inputs_as<C>` method to convert inputs to a `Counter`:

  ```rust
  bencher
      .with_inputs(|| -> usize {
          // ...
      })
      .count_inputs_as::<ItemsCount>()
      .bench_values(|n| -> Vec<usize> {
          (0..n).collect()
      });
  ```
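The value of `From` implementations on references is that a constructor can be passed by name wherever items arrive by reference. The sketch below shows this with a hypothetical `ItemsCount` newtype and plain iterators; it does not use Divan's `Bencher`.

```rust
/// Hypothetical counter newtype; stands in for a type like `ItemsCount`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ItemsCount(u64);

impl From<u64> for ItemsCount {
    fn from(n: u64) -> Self {
        Self(n)
    }
}

impl From<&u64> for ItemsCount {
    fn from(n: &u64) -> Self {
        Self(*n)
    }
}

impl From<&&u64> for ItemsCount {
    fn from(n: &&u64) -> Self {
        Self(**n)
    }
}

/// `iter()` yields `&u64`, so `From<&u64>` lets the function path be
/// passed directly instead of writing `|n| ItemsCount::from(*n)`.
fn counters(inputs: &[u64]) -> Vec<ItemsCount> {
    inputs.iter().map(ItemsCount::from).collect()
}

/// Iterating over a slice of references yields `&&u64`, which is covered
/// by the `From<&&u64>` implementation.
fn counters_from_refs(inputs: &[&u64]) -> Vec<ItemsCount> {
    inputs.iter().map(ItemsCount::from).collect()
}
```

Without the reference implementations, both helper functions would need explicit dereferencing closures.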
0.1.3 - 2023-11-21
- Convenience shorthand options for `#[divan::bench]` and `#[divan::bench_group]` counters:
  - `bytes_count` for `counter = BytesCount::from(n)`
  - `chars_count` for `counter = CharsCount::from(n)`
  - `items_count` for `counter = ItemsCount::from(n)`
- Support for NetBSD, DragonFly BSD, and Haiku OS by using pre-`main`.
- Set global thread counts using:
  - `Divan::threads`
  - `--threads A B C...` CLI arg
  - `DIVAN_THREADS=A,B,C` env var

  The following example will benchmark across 2, 4, and available parallelism thread counts:

  ```sh
  DIVAN_THREADS=0,2,4 cargo bench -q -p examples --bench atomic
  ```
- Set global `Counter`s at runtime using:
  - `Divan::counter`
  - `Divan::items_count`
  - `Divan::bytes_count`
  - `Divan::chars_count`
  - `--items-count N` CLI arg
  - `--bytes-count N` CLI arg
  - `--chars-count N` CLI arg
  - `DIVAN_ITEMS_COUNT=N` env var
  - `DIVAN_BYTES_COUNT=N` env var
  - `DIVAN_CHARS_COUNT=N` env var
- `From<C>` for `ItemsCount`, `BytesCount`, and `CharsCount` where `C` is `u8`–`u64` or `usize` (via `CountUInt` internally). This provides an alternative to the `new` constructor.
- `BytesCount::of_many` method similar to `BytesCount::of`, but with a parameter by which to multiply the size of the type.
- `BytesCount::u64`, `BytesCount::f64`, and similar methods based on `BytesCount::of_many`.
- `black_box` inside benchmark loop when deferring `Drop` of outputs. This is now done after the loop.
- `linkme` dependency in favor of pre-`main` to register benchmarks and benchmark groups. This is generally more portable and reliable.
- Now calling `black_box` at the end of the benchmark loop when deferring use of inputs or `Drop` of outputs.
0.1.2 - 2023-10-28
- Multi-threaded benchmarks being spread across CPUs, instead of pinning the main thread to CPU 0 and having all threads inherit the main thread's affinity.
0.1.1 - 2023-10-25
- Fix using LLD as linker for Linux by using the same pre-`main` approach as Windows.
Initial release. See blog post.