Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

0.1.17 - 2024-12-04

Changed

  • Set MSRV to 1.80 for LazyLock and new size_of prelude import.

  • Reduced thread pool memory usage by many kilobytes by using rendezvous channels instead of array-based channels.
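
As a rough illustration of the rendezvous-channel idea above, the sketch below uses the standard library's `sync_channel(0)`, which has no buffer, so each task is handed straight to a waiting worker. The `Task`, `spawn_worker`, and `run_tasks` names are hypothetical, and this is not Divan's actual thread pool, which may be built on different primitives.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread;

/// A boxed task that a worker thread runs once (hypothetical type).
type Task = Box<dyn FnOnce() + Send + 'static>;

/// Spawns one worker fed by a rendezvous (zero-capacity) channel.
///
/// `sync_channel(0)` keeps no message buffer: each `send` blocks until the
/// worker is ready to receive, so the task is handed over directly.
fn spawn_worker() -> (SyncSender<Task>, thread::JoinHandle<usize>) {
    let (sender, receiver) = sync_channel::<Task>(0);
    let handle = thread::spawn(move || {
        let mut completed = 0;
        // The loop ends once every sender has been dropped.
        for task in receiver {
            task();
            completed += 1;
        }
        completed
    });
    (sender, handle)
}

/// Runs `count` trivial tasks on a single worker and returns how many ran.
fn run_tasks(count: usize) -> usize {
    let (sender, handle) = spawn_worker();
    for _ in 0..count {
        let task: Task = Box::new(|| {});
        sender.send(task).expect("worker is alive");
    }
    // Close the channel so the worker exits its loop.
    drop(sender);
    handle.join().expect("worker did not panic")
}
```

Calling `run_tasks(3)` reuses the same worker thread for all three tasks.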

0.1.16 - 2024-11-25

Added

  • Thread pool for reusing threads across multi-threaded benchmarks. The result is that when running Divan benchmarks under a sampling profiler, the profiler's output will be cleaner and easier to understand. (#37)

  • Track the maximum number of allocations during a benchmark.

Changed

  • Make private Arg::get trait method not take self, so that text editors don't recommend using it. (#59)

  • Cache BenchOptions using LazyLock instead of OnceLock, saving space and simplifying the implementation.
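
To show why `LazyLock` can simplify a cache like this, here is a minimal sketch of the two standard library types side by side. `Options` and `make_options` are hypothetical stand-ins, not Divan's `BenchOptions` code, and `LazyLock` requires Rust 1.80 or later.

```rust
use std::sync::{LazyLock, OnceLock};

/// Hypothetical stand-in for cached benchmark options.
#[derive(Debug, PartialEq)]
struct Options {
    sample_count: u32,
}

fn make_options() -> Options {
    Options { sample_count: 100 }
}

// With `OnceLock`, every access site has to supply the initializer.
static ONCE: OnceLock<Options> = OnceLock::new();

fn options_via_once_lock() -> &'static Options {
    ONCE.get_or_init(make_options)
}

// With `LazyLock`, the initializer is stored with the value,
// so callers only dereference the static.
static LAZY: LazyLock<Options> = LazyLock::new(make_options);

fn options_via_lazy_lock() -> &'static Options {
    &LAZY
}
```

Both accessors return the same lazily initialized value; the `LazyLock` version just moves the initializer to the declaration.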

0.1.15 - 2024-10-31

Added

  • CyclesCount counter to display cycle throughput as Hertz.

  • Track the maximum number of bytes allocated during a benchmark.

Removed

  • Remove has_cpuid polyfill because it is no longer planned for Rust; CPUID is assumed to be available on all old x86 Rust targets.

Fixed

  • List generic benchmark type parameter A<4> before A<32>. (#64)

  • Improve precision by using f64 when calculating allocation count and sizes for the median samples.

  • Multi-thread allocation counting in sum_alloc_tallies on macOS was loading a null pointer instead of the pointer initialized by sync_threads.

Changes

  • Sort all output benchmark names naturally instead of lexicographically.

  • Internally reuse &[&str] slice for args names.

  • Subtract overhead of AllocProfiler from timings. Now that Divan also tracks the maximum bytes allocated, the overhead was apparent in timings.

  • Simplify ThreadAllocInfo::clear.

  • Move measured loop overhead from SharedContext to global OnceLock.

  • Macros no longer rely on std being re-exported by Divan. Instead they use ::std or ::core to greatly simplify code. Although this is technically a breaking change, it is extremely unlikely that any code does extern crate std as x.
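
The difference between natural and lexicographic ordering mentioned above (and the `A<4>` before `A<32>` fix) can be sketched with a small comparison function. This is an illustrative implementation under simple assumptions (ASCII digits only, no negative numbers), not the algorithm Divan uses; `natural_cmp` and `strip_zeros` are hypothetical names.

```rust
use std::cmp::Ordering;

/// Compares two strings so that runs of ASCII digits are ordered by numeric value.
fn natural_cmp(a: &str, b: &str) -> Ordering {
    let (a, b) = (a.as_bytes(), b.as_bytes());
    let (mut i, mut j) = (0, 0);

    while i < a.len() && j < b.len() {
        if a[i].is_ascii_digit() && b[j].is_ascii_digit() {
            // Take the full digit run from each side.
            let a_end = i + a[i..].iter().take_while(|c| c.is_ascii_digit()).count();
            let b_end = j + b[j..].iter().take_while(|c| c.is_ascii_digit()).count();

            // Strip leading zeros so that run length reflects magnitude.
            let a_num = strip_zeros(&a[i..a_end]);
            let b_num = strip_zeros(&b[j..b_end]);

            // A longer run is a larger number; equal lengths compare bytewise.
            let ord = a_num.len().cmp(&b_num.len()).then_with(|| a_num.cmp(b_num));
            if ord != Ordering::Equal {
                return ord;
            }
            i = a_end;
            j = b_end;
        } else {
            if a[i] != b[j] {
                return a[i].cmp(&b[j]);
            }
            i += 1;
            j += 1;
        }
    }
    // Whichever string still has input left sorts last.
    (a.len() - i).cmp(&(b.len() - j))
}

/// Drops leading `0` bytes from a run of digits.
fn strip_zeros(digits: &[u8]) -> &[u8] {
    let zeros = digits.iter().take_while(|&&c| c == b'0').count();
    &digits[zeros..]
}
```

Sorting `["A<32>", "A<4>", "A<16>"]` with `sort_by(|a, b| natural_cmp(a, b))` places `A<4>` first, whereas a plain `sort()` would place `A<16>` first.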

0.1.14 - 2024-02-17

Fixed

Changes

0.1.13 - 2024-02-09

Fixed

  • Missing update to divan-macros dependency.

0.1.12 - 2024-02-09

Added

  • Display args option values with Debug instead if ToString is not implemented.

    This makes it simple to use enums with derived Debug:

    #[derive(Debug)]
    enum Arg { A, B }
    
    #[divan::bench(args = [Arg::A, Arg::B])]
    fn bench_args(arg: &Arg) {
        ...
    }

  • Documentation of when to use black_box in benchmarks.

0.1.11 - 2024-01-20

Fixed

  • Sorting negative args numbers.

0.1.10 - 2024-01-20

Fixed

0.1.9 - 2024-01-20

Added

  • args option for providing runtime arguments to benchmarks:

    #[divan::bench(args = [1, 2, 3])]
    fn args_list(arg: usize) { ... }
    
    #[divan::bench(args = 1..=3)]
    fn args_range(arg: usize) { ... }
    
    const ARGS: &[usize] = &[1, 2, 3];
    
    #[divan::bench(args = ARGS)]
    fn args_const(arg: usize) { ... }

    This option may be preferred over the similar consts option because:

    • It is compatible with more types, only requiring that the argument type implements Any, Copy, Send, Sync, and ToString. Copy is not needed if the argument is used through a reference.
    • It does not increase compile times, unlike consts which needs to generate new code for each constant used.
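
The compile-time point can be illustrated without Divan at all. In the sketch below (all names hypothetical), the const-generic function is instantiated once per constant used, while the runtime version is compiled once and reused for every value.

```rust
/// One copy of this function is compiled for every distinct `N` used.
fn sum_to_const<const N: usize>() -> usize {
    (0..N).sum()
}

/// A single compiled copy serves every runtime value of `n`.
fn sum_to_runtime(n: usize) -> usize {
    (0..n).sum()
}

/// Calls both flavours for the values 1, 2, and 3.
fn compare() -> ([usize; 3], [usize; 3]) {
    // Three separate instantiations: `N = 1`, `N = 2`, `N = 3`.
    let via_consts = [sum_to_const::<1>(), sum_to_const::<2>(), sum_to_const::<3>()];
    // One function, called three times with different arguments.
    let via_args = [1, 2, 3].map(sum_to_runtime);
    (via_consts, via_args)
}
```

Both approaches compute the same results; only the amount of generated code differs.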

0.1.8 - 2023-12-19

Changes

  • Reduce AllocProfiler footprint from 6-10ns to 1-2ns:

    • Thread-local values are now exclusively owned by their threads and are no longer kept in a global list. This enables some optimizations:

      • Performing faster unsynchronized arithmetic.

      • Removing one level of pointer indirection by storing the thread-local value entirely inline in thread_local!, rather than storing a pointer to a globally-shared instance.

      • Compiler emits SIMD arithmetic for x86_64 using paddq.

    • Improved thread-local lookup on x86_64 macOS by using a static lookup key instead of a dynamic key from pthread_key_create. Key 11 is used because it is reserved for Windows.

      The dyn_thread_local crate feature disables this optimization. This is recommended if your code or another dependency uses the same static key.
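
A minimal sketch of the "exclusively owned, stored inline" idea, using only the standard library: because each thread owns its counter, plain `Cell` arithmetic is enough and no atomics are involved. The names are hypothetical and this is not `AllocProfiler`'s real implementation, which also uses the platform-specific lookup described above.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Stored inline in thread-local storage and touched only by its own
    // thread, so unsynchronized arithmetic is safe here.
    static ALLOC_COUNT: Cell<u64> = const { Cell::new(0) };
}

/// Bumps the current thread's counter by one.
fn record_alloc() {
    ALLOC_COUNT.with(|count| count.set(count.get() + 1));
}

/// Reads the current thread's counter.
fn current_count() -> u64 {
    ALLOC_COUNT.with(|count| count.get())
}

/// Records `n` events on a fresh thread and returns that thread's tally.
fn count_on_new_thread(n: u64) -> u64 {
    thread::spawn(move || {
        for _ in 0..n {
            record_alloc();
        }
        current_count()
    })
    .join()
    .expect("thread did not panic")
}
```

Each spawned thread starts from zero, and its updates never affect another thread's counter.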

Fixed

  • Remove unused allocations if AllocProfiler is not active as the global allocator.

0.1.7 - 2023-12-13

Changes

  • Improve AllocProfiler implementation documentation.

  • Limit AllocProfiler mean count outputs to 4 significant digits so that they are not overly wide and are consistent with other outputs.

0.1.6 - 2023-12-13

Added

  • AllocProfiler allocator that tracks allocation counts and sizes during benchmarks.
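
For readers unfamiliar with how an allocator can track counts and sizes, the following sketch wraps the system allocator with the standard `GlobalAlloc` trait. `CountingAlloc` and `measure_one_alloc` are hypothetical names; this is a simplified illustration, not `AllocProfiler` itself, and it uses shared atomics rather than any per-thread bookkeeping.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical allocator wrapper that tallies allocation count and bytes.
struct CountingAlloc {
    count: AtomicUsize,
    bytes: AtomicUsize,
}

impl CountingAlloc {
    const fn new() -> Self {
        Self { count: AtomicUsize::new(0), bytes: AtomicUsize::new(0) }
    }

    /// Returns `(allocation count, total bytes requested)`.
    fn tallies(&self) -> (usize, usize) {
        (self.count.load(Ordering::Relaxed), self.bytes.load(Ordering::Relaxed))
    }
}

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Record the request, then forward it to the system allocator.
        self.count.fetch_add(1, Ordering::Relaxed);
        self.bytes.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

/// Performs one allocation of `size` bytes through a fresh wrapper
/// and returns the resulting tallies.
fn measure_one_alloc(size: usize) -> (usize, usize) {
    assert!(size > 0, "GlobalAlloc requires a non-zero size");
    let profiler = CountingAlloc::new();
    let layout = Layout::from_size_align(size, 8).expect("valid layout");
    // SAFETY: the layout has non-zero size, and the pointer is freed
    // with the same layout it was allocated with.
    unsafe {
        let ptr = profiler.alloc(layout);
        if !ptr.is_null() {
            profiler.dealloc(ptr, layout);
        }
    }
    profiler.tallies()
}
```

To observe every allocation in a program, a wrapper like this would be installed as a static marked `#[global_allocator]`; see the `AllocProfiler` documentation for how Divan's own type is set up.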

0.1.5 - 2023-12-05

Added

  • black_box_drop convenience function for black_box + drop. This is useful when benchmarking a lazy Iterator to completion with for_each:

    #[divan::bench]
    fn parse_iter() {
        let input: &str = // ...
    
        Parser::new(input)
            .for_each(divan::black_box_drop);
    }
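
Conceptually, such a helper can be written with the standard library's `std::hint::black_box`. The sketch below is an assumption about the general shape, not Divan's source; `drain_lazy_iter` is a hypothetical function that shows a lazy iterator being driven to completion.

```rust
use std::hint::black_box;

/// Passes `value` through `black_box`, then drops it.
fn black_box_drop<T>(value: T) {
    drop(black_box(value));
}

/// Drives a lazy iterator to completion and reports how many items it produced.
fn drain_lazy_iter(n: u32) -> u32 {
    let mut produced = 0;
    (0..n)
        .map(|i| i.to_string())
        // Count each item as it is actually produced.
        .inspect(|_| produced += 1)
        // Each `String` is treated as used, then dropped.
        .for_each(black_box_drop);
    produced
}
```

Because `for_each` consumes every item, all of the `map` work runs even though the results are discarded.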

0.1.4 - 2023-12-02

Added

  • From implementations for counters on references to u8 through u64 and usize, such as From<&u64> and From<&&u64>. This allows for doing:

    bencher
        .with_inputs(|| { ... })
        .input_counter(ItemsCount::from)
        .bench_values(|n| { ... });

  • Bencher::count_inputs_as<C> method to convert inputs to a Counter:

    bencher
        .with_inputs(|| -> usize {
            // ...
        })
        .count_inputs_as::<ItemsCount>()
        .bench_values(|n| -> Vec<usize> {
            (0..n).collect()
        });
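
The benefit of the reference `From` implementations is that the conversion function can be passed by path wherever references are produced, with no wrapping closure. The sketch below uses a hypothetical local `ItemsCount` type for illustration; it is not Divan's counter type.

```rust
/// Hypothetical stand-in for a counter type.
#[derive(Debug, PartialEq)]
struct ItemsCount(u64);

impl From<u64> for ItemsCount {
    fn from(n: u64) -> Self {
        Self(n)
    }
}

// The reference impl is what allows `ItemsCount::from` to accept `&u64`.
impl From<&u64> for ItemsCount {
    fn from(n: &u64) -> Self {
        Self(*n)
    }
}

/// Converts each borrowed input into a counter.
fn counts_for(inputs: &[u64]) -> Vec<ItemsCount> {
    // `iter()` yields `&u64`, so the function path can be passed directly
    // instead of writing `|n| ItemsCount::from(*n)`.
    inputs.iter().map(ItemsCount::from).collect()
}
```

Without the `From<&u64>` implementation, the `map` call would need a closure that dereferences each item first.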

0.1.3 - 2023-11-21

Added

Removed

  • black_box inside benchmark loop when deferring Drop of outputs. This is now done after the loop.

  • linkme dependency in favor of pre-main to register benchmarks and benchmark groups. This is generally more portable and reliable.

Changed

  • Now calling black_box at the end of the benchmark loop when deferring use of inputs or Drop of outputs.

0.1.2 - 2023-10-28

Fixed

  • Multi-threaded benchmarks are now spread across CPUs, instead of pinning the main thread to CPU 0 and having all threads inherit the main thread's affinity.

0.1.1 - 2023-10-25

Fixed

  • Fix using LLD as linker for Linux by using the same pre-main approach as Windows.

0.1.0 - 2023-10-04

Initial release. See blog post.