This repository has been archived by the owner on Nov 27, 2022. It is now read-only.

How results will be updated? #13

Open
leudz opened this issue Sep 7, 2020 · 4 comments

Comments

@leudz
Collaborator

leudz commented Sep 7, 2020

For now, AFAIK, the results come from TomGillen's computer.
How are we going to update them?

A few ideas, don't hesitate to propose more:

  1. Have someone do it, always the same person on the same computer
    • pro: the results are very "stable"
    • con: we rely on a single person, who has to be available to run the benchmarks from time to time
  2. Anyone can do it, as long as all results are updated
    • pro: easy to update
    • con: the results can go up and down
  3. Delegate to a CI
    • pro: everything is automatic so every push will update the results
    • con: we don't control the environment

Regardless of what we choose, I think the specs of the computer that ran the benchmarks should be available fairly easily, maybe in the readme itself.
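As a rough sketch of how the specs could be captured alongside the results, the snippet below prints what the Rust standard library can report about the host. The function name `machine_summary` is hypothetical, and the standard library does not expose the CPU model or memory speed, so those would still have to be filled in by hand or by another tool.

```rust
use std::env::consts::{ARCH, OS};
use std::thread::available_parallelism;

/// Builds a one-line description of the host to publish next to the results.
/// Only covers what the standard library exposes (OS, architecture, logical CPUs).
fn machine_summary() -> String {
    // `available_parallelism` can fail on some platforms, so report "unknown"
    // instead of aborting the run.
    let cpus = match available_parallelism() {
        Ok(n) => n.get().to_string(),
        Err(_) => "unknown".to_string(),
    };
    format!("os: {}, arch: {}, logical cpus: {}", OS, ARCH, cpus)
}

fn main() {
    println!("{}", machine_summary());
}
```

The output of such a helper could be pasted into the readme whenever the results are refreshed, whichever of the three options is chosen.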

@colelawrence

It could be nice to measure performance through instruction counting, which is what https://perf.rust-lang.org/ uses as its default measurement.

I'm not exactly sure how it is set up, but if I understand correctly, it can lead to a much smaller margin of error between CI runs.

@colelawrence

I had a bit of downtime to play with this, so I ran cargo bench on my local machine and got the following results.

Note that I have a 12-core machine, which is not really representative of the average user, but the results do tell a very different story from the benches from 22 days ago.

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/release/deps/benchmarks-d2e805252823feec
Gnuplot not found, using plotters backend
simple_insert/legion    time:   [293.59 us 295.83 us 298.20 us]
                        change: [-32.876% -32.034% -31.134%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
simple_insert/legion_0.2.4
                        time:   [575.17 us 577.32 us 579.81 us]
                        change: [-46.035% -45.616% -45.249%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe
simple_insert/bevy      time:   [764.58 us 770.45 us 776.23 us]
                        change: [-11.959% -11.503% -10.991%] (p = 0.00 < 0.05)
                        Performance has improved.
simple_insert/hecs      time:   [551.06 us 554.85 us 559.23 us]
                        change: [-12.856% -11.456% -10.117%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  8 (8.00%) high mild
simple_insert/shipyard  time:   [2.0161 ms 2.0241 ms 2.0326 ms]
                        change: [-19.474% -18.745% -18.056%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
Benchmarking simple_insert/specs: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.5s, enable flat sampling, or reduce sample count to 50.
simple_insert/specs     time:   [1.4692 ms 1.4747 ms 1.4809 ms]
                        change: [-36.297% -35.899% -35.454%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

simple_iter/legion      time:   [10.937 us 10.948 us 10.960 us]
                        change: [-18.607% -18.333% -18.079%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe
simple_iter/legion (packed)
                        time:   [10.909 us 10.923 us 10.938 us]
                        change: [-20.629% -20.169% -19.739%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
simple_iter/legion_0.2.4
                        time:   [9.7724 us 9.7930 us 9.8131 us]
                        change: [-27.685% -27.248% -26.854%] (p = 0.00 < 0.05)
                        Performance has improved.
simple_iter/bevy        time:   [9.9188 us 9.9376 us 9.9556 us]
                        change: [-31.195% -30.889% -30.598%] (p = 0.00 < 0.05)
                        Performance has improved.
simple_iter/hecs        time:   [21.538 us 21.567 us 21.597 us]
                        change: [-19.596% -19.308% -19.040%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
simple_iter/shipyard    time:   [72.453 us 74.302 us 76.250 us]
                        change: [-32.402% -31.321% -30.208%] (p = 0.00 < 0.05)
                        Performance has improved.
simple_iter/shipyard (packed)
                        time:   [23.034 us 23.399 us 23.790 us]
                        change: [-49.579% -49.053% -48.467%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 21 outliers among 100 measurements (21.00%)
  21 (21.00%) high severe
simple_iter/specs       time:   [33.098 us 34.259 us 35.691 us]
                        change: [-1.5233% +1.3415% +5.1496%] (p = 0.44 > 0.05)
                        No change in performance detected.
Found 20 outliers among 100 measurements (20.00%)
  5 (5.00%) high mild
  15 (15.00%) high severe

fragmented_iter/legion  time:   [381.71 ns 382.83 ns 384.04 ns]
                        change: [-24.937% -24.698% -24.468%] (p = 0.00 < 0.05)
                        Performance has improved.
fragmented_iter/legion_0.2.4
                        time:   [1.0956 us 1.1016 us 1.1075 us]
                        change: [-38.667% -38.172% -37.722%] (p = 0.00 < 0.05)
                        Performance has improved.
fragmented_iter/bevy    time:   [1.0652 us 1.0679 us 1.0708 us]
                        change: [-40.105% -39.777% -39.451%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
fragmented_iter/hecs    time:   [1.0912 us 1.0928 us 1.0945 us]
                        change: [-39.445% -39.179% -38.961%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
fragmented_iter/shipyard
                        time:   [106.48 ns 106.64 ns 106.79 ns]
                        change: [-89.752% -89.710% -89.669%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe
fragmented_iter/specs   time:   [1.6610 us 1.6633 us 1.6656 us]
                        change: [-1.5877% -1.2251% -0.8472%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

schedule/legion         time:   [235.46 us 242.47 us 249.66 us]
                        change: [+350.18% +360.77% +372.19%] (p = 0.00 < 0.05)
                        Performance has regressed.
schedule/legion (packed)
                        time:   [232.07 us 238.43 us 244.94 us]
                        change: [+324.11% +332.95% +342.10%] (p = 0.00 < 0.05)
                        Performance has regressed.
schedule/legion_0.2.4   time:   [259.52 us 262.94 us 266.13 us]
                        change: [+76.007% +77.912% +79.836%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
schedule/bevy           time:   [64.144 us 64.264 us 64.386 us]
                        change: [-32.415% -32.187% -31.959%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
schedule/shipyard       time:   [359.33 us 360.12 us 360.95 us]
                        change: [-37.786% -37.335% -36.842%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe
schedule/shipyard (packed)
                        time:   [112.45 us 113.07 us 113.76 us]
                        change: [-63.112% -62.784% -62.495%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
schedule/specs          time:   [164.39 us 166.06 us 167.93 us]
                        change: [-32.313% -31.743% -31.131%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  19 (19.00%) high mild

heavy_compute/legion    time:   [333.54 us 337.30 us 341.68 us]
                        change: [-51.561% -50.821% -50.034%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
heavy_compute/legion (packed)
                        time:   [319.12 us 320.77 us 322.64 us]
                        change: [-55.677% -55.065% -54.456%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
heavy_compute/legion_0.2.4
                        time:   [3.1970 ms 3.2008 ms 3.2046 ms]
                        change: [-26.487% -26.207% -25.935%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
heavy_compute/bevy      time:   [629.79 us 634.51 us 639.19 us]
                        change: [-41.146% -40.584% -40.061%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
heavy_compute/hecs      time:   [585.76 us 588.70 us 591.76 us]
                        change: [-42.162% -41.767% -41.393%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  6 (6.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe
heavy_compute/shipyard  time:   [329.43 us 334.53 us 339.54 us]
                        change: [-57.275% -56.615% -55.946%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
heavy_compute/shipyard (packed)
                        time:   [337.05 us 343.24 us 350.40 us]
                        change: [-51.426% -50.129% -48.671%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
heavy_compute/specs     time:   [532.54 us 538.90 us 545.14 us]
                        change: [-47.185% -46.611% -46.091%] (p = 0.00 < 0.05)
                        Performance has improved.

add_remove_component/legion
                        time:   [3.7019 ms 3.7096 ms 3.7178 ms]
                        change: [-33.158% -32.596% -32.108%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild
add_remove_component/legion_0.2.4
                        time:   [2.2386 ms 2.2447 ms 2.2510 ms]
                        change: [-27.178% -26.937% -26.682%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
add_remove_component/hecs
                        time:   [7.3318 ms 7.3804 ms 7.4305 ms]
                        change: [-59.699% -59.438% -59.152%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
add_remove_component/shipyard
                        time:   [207.23 us 207.65 us 208.09 us]
                        change: [-92.821% -92.787% -92.756%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
add_remove_component/specs
                        time:   [97.710 us 98.436 us 99.291 us]
                        change: [-34.454% -33.923% -33.386%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe

serialize_text/legion   time:   [12.175 ms 12.200 ms 12.225 ms]
                        change: [-33.020% -31.672% -30.402%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

serialize_binary/legion time:   [4.5189 ms 4.5239 ms 4.5289 ms]
                        change: [-30.514% -29.507% -28.613%] (p = 0.00 < 0.05)
                        Performance has improved.

@TomGillen
Collaborator

It would be nice to at the very least automate the generation of the results table in the readme.
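A minimal sketch of what that automation might look like, assuming the table is built from Criterion's plain-text output in the format pasted above. The names `BenchResult`, `parse_criterion` and `to_table` are hypothetical, and the pipe-separated table layout is only a guess at what the readme would want.

```rust
/// One parsed benchmark: its name and the middle (point) estimate as printed.
#[derive(Debug, PartialEq)]
struct BenchResult {
    name: String,
    estimate: String,
}

/// Extracts `name  time: [low mid high]` entries from Criterion's text output.
/// Long names are printed on their own line with `time:` on the next line,
/// so the most recent candidate name line is remembered.
fn parse_criterion(output: &str) -> Vec<BenchResult> {
    let mut results = Vec::new();
    let mut pending_name: Option<String> = None;

    for line in output.lines() {
        if let Some((before, after)) = line.split_once("time:") {
            // Values look like `[293.59 us 295.83 us 298.20 us]`.
            let inner = after.trim().trim_start_matches('[').trim_end_matches(']');
            let parts: Vec<&str> = inner.split_whitespace().collect();
            let name = if before.trim().is_empty() {
                pending_name.take()
            } else {
                Some(before.trim().to_string())
            };
            // Expect three (value, unit) pairs; keep the middle pair.
            if let (Some(name), 6) = (name, parts.len()) {
                results.push(BenchResult {
                    name,
                    estimate: format!("{} {}", parts[2], parts[3]),
                });
            }
            pending_name = None;
        } else if !line.starts_with(char::is_whitespace)
            && line.contains('/')
            && !line.contains(':')
        {
            // An unindented `group/name` line with no colon is treated as a name.
            pending_name = Some(line.trim().to_string());
        }
    }
    results
}

/// Renders the parsed results as a simple pipe-separated table.
fn to_table(results: &[BenchResult]) -> String {
    let mut out = String::from("| benchmark | time |\n| --- | --- |\n");
    for r in results {
        out.push_str(&format!("| {} | {} |\n", r.name, r.estimate));
    }
    out
}

fn main() {
    // Two entries copied from the output earlier in this thread.
    let sample = "\
simple_insert/legion    time:   [293.59 us 295.83 us 298.20 us]
                        change: [-32.876% -32.034% -31.134%] (p = 0.00 < 0.05)
simple_insert/legion_0.2.4
                        time:   [575.17 us 577.32 us 579.81 us]
";
    print!("{}", to_table(&parse_criterion(sample)));
}
```

Parsing the text output is fragile if Criterion changes its formatting, so this is only meant to show that the table does not need to be edited by hand.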

For reference, the current results were run on an i7 5820K EE with 16GB @ 3200MHz. My laptop, with an i5-6360U, is significantly slower but shows similar relative performance. @colelawrence's numbers are for the most part not that surprising on hardware that appears to be ~40% faster, although there are some very notable exceptions. schedule/legion running 360% slower on faster hardware is a total mystery. fragmented_iter/shipyard running 90% faster actually brings it more in line with where I had expected shipyard to perform in a test specifically designed to torture archetypal ECSs and highlight the strengths of sparse-array-based ECS libraries. Even so, it is curious that it gains so much more of a speedup than the other libraries.

I wonder whether each library benefits to a significantly different degree from changes in CPU speed vs. memory speed vs. cache size.

Capturing results across a variety of hardware would obviously be interesting and probably useful, although logistically challenging.
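To put the percentages above in absolute terms, the earlier estimate can be recovered from each reported change. This is a sketch that assumes Criterion's `change` is the relative difference (new - old) / old against a stored baseline estimate; the helper name `baseline_from_change` is hypothetical, and the inputs are the point estimates from the run posted above.

```rust
/// Recovers the earlier estimate from the current time and a relative change,
/// assuming change = (new - old) / old, with `change_percent` given in percent.
fn baseline_from_change(current: f64, change_percent: f64) -> Option<f64> {
    let factor = 1.0 + change_percent / 100.0;
    // A change of -100% or lower cannot correspond to a positive baseline.
    if factor <= 0.0 {
        None
    } else {
        Some(current / factor)
    }
}

fn main() {
    // (name, current point estimate, unit, point change in percent)
    let rows = [
        ("schedule/legion", 242.47, "us", 360.77),
        ("fragmented_iter/shipyard", 106.64, "ns", -89.710),
        ("simple_iter/bevy", 9.9376, "us", -30.889),
    ];
    for &(name, current, unit, change) in rows.iter() {
        if let Some(old) = baseline_from_change(current, change) {
            println!(
                "{}: {:.2} {} -> {:.2} {} ({:.2}x the earlier time)",
                name, old, unit, current, unit, current / old
            );
        }
    }
}
```

Under that assumption, schedule/legion went from roughly 52.6 us to 242.47 us, while fragmented_iter/shipyard went from roughly 1.04 us to 106.64 ns, which is about a tenfold reduction rather than the ~1.4x expected from the hardware alone.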

@colelawrence

Couldn't we use a GitHub Action to run cargo bench on master? At the very least, that would give us a starting point for seeing how library version bumps affect performance.

The most urgent thing I see is simply to re-run the benches and update the numbers with @leudz's recent PR. Otherwise, the readme will continue to misrepresent the performance of shipyard.


3 participants