How will the results be updated? #13
It would be nice to measure performance through instruction counting, which is what https://perf.rust-lang.org/ uses as its default measurement. I'm not sure exactly how it is set up, but if I understand correctly, it can give a much smaller error threshold between CI runs.
I had a bit of downtime to play with this, and I ran the benchmarks locally. Note that I have a 12-core machine, so it's not very representative of average users, but it tells a very different story from the benches from 22 days ago.
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Running target/release/deps/benchmarks-d2e805252823feec
Gnuplot not found, using plotters backend
simple_insert/legion time: [293.59 us 295.83 us 298.20 us]
change: [-32.876% -32.034% -31.134%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
simple_insert/legion_0.2.4
time: [575.17 us 577.32 us 579.81 us]
change: [-46.035% -45.616% -45.249%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe
simple_insert/bevy time: [764.58 us 770.45 us 776.23 us]
change: [-11.959% -11.503% -10.991%] (p = 0.00 < 0.05)
Performance has improved.
simple_insert/hecs time: [551.06 us 554.85 us 559.23 us]
change: [-12.856% -11.456% -10.117%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
8 (8.00%) high mild
simple_insert/shipyard time: [2.0161 ms 2.0241 ms 2.0326 ms]
change: [-19.474% -18.745% -18.056%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
Benchmarking simple_insert/specs: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.5s, enable flat sampling, or reduce sample count to 50.
simple_insert/specs time: [1.4692 ms 1.4747 ms 1.4809 ms]
change: [-36.297% -35.899% -35.454%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
simple_iter/legion time: [10.937 us 10.948 us 10.960 us]
change: [-18.607% -18.333% -18.079%] (p = 0.00 < 0.05)
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) low mild
2 (2.00%) high mild
4 (4.00%) high severe
simple_iter/legion (packed)
time: [10.909 us 10.923 us 10.938 us]
change: [-20.629% -20.169% -19.739%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
simple_iter/legion_0.2.4
time: [9.7724 us 9.7930 us 9.8131 us]
change: [-27.685% -27.248% -26.854%] (p = 0.00 < 0.05)
Performance has improved.
simple_iter/bevy time: [9.9188 us 9.9376 us 9.9556 us]
change: [-31.195% -30.889% -30.598%] (p = 0.00 < 0.05)
Performance has improved.
simple_iter/hecs time: [21.538 us 21.567 us 21.597 us]
change: [-19.596% -19.308% -19.040%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
simple_iter/shipyard time: [72.453 us 74.302 us 76.250 us]
change: [-32.402% -31.321% -30.208%] (p = 0.00 < 0.05)
Performance has improved.
simple_iter/shipyard (packed)
time: [23.034 us 23.399 us 23.790 us]
change: [-49.579% -49.053% -48.467%] (p = 0.00 < 0.05)
Performance has improved.
Found 21 outliers among 100 measurements (21.00%)
21 (21.00%) high severe
simple_iter/specs time: [33.098 us 34.259 us 35.691 us]
change: [-1.5233% +1.3415% +5.1496%] (p = 0.44 > 0.05)
No change in performance detected.
Found 20 outliers among 100 measurements (20.00%)
5 (5.00%) high mild
15 (15.00%) high severe
fragmented_iter/legion time: [381.71 ns 382.83 ns 384.04 ns]
change: [-24.937% -24.698% -24.468%] (p = 0.00 < 0.05)
Performance has improved.
fragmented_iter/legion_0.2.4
time: [1.0956 us 1.1016 us 1.1075 us]
change: [-38.667% -38.172% -37.722%] (p = 0.00 < 0.05)
Performance has improved.
fragmented_iter/bevy time: [1.0652 us 1.0679 us 1.0708 us]
change: [-40.105% -39.777% -39.451%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
5 (5.00%) high mild
1 (1.00%) high severe
fragmented_iter/hecs time: [1.0912 us 1.0928 us 1.0945 us]
change: [-39.445% -39.179% -38.961%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
fragmented_iter/shipyard
time: [106.48 ns 106.64 ns 106.79 ns]
change: [-89.752% -89.710% -89.669%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) low mild
2 (2.00%) high mild
2 (2.00%) high severe
fragmented_iter/specs time: [1.6610 us 1.6633 us 1.6656 us]
change: [-1.5877% -1.2251% -0.8472%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
schedule/legion time: [235.46 us 242.47 us 249.66 us]
change: [+350.18% +360.77% +372.19%] (p = 0.00 < 0.05)
Performance has regressed.
schedule/legion (packed)
time: [232.07 us 238.43 us 244.94 us]
change: [+324.11% +332.95% +342.10%] (p = 0.00 < 0.05)
Performance has regressed.
schedule/legion_0.2.4 time: [259.52 us 262.94 us 266.13 us]
change: [+76.007% +77.912% +79.836%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild
schedule/bevy time: [64.144 us 64.264 us 64.386 us]
change: [-32.415% -32.187% -31.959%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
schedule/shipyard time: [359.33 us 360.12 us 360.95 us]
change: [-37.786% -37.335% -36.842%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe
schedule/shipyard (packed)
time: [112.45 us 113.07 us 113.76 us]
change: [-63.112% -62.784% -62.495%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
schedule/specs time: [164.39 us 166.06 us 167.93 us]
change: [-32.313% -31.743% -31.131%] (p = 0.00 < 0.05)
Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
19 (19.00%) high mild
heavy_compute/legion time: [333.54 us 337.30 us 341.68 us]
change: [-51.561% -50.821% -50.034%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
heavy_compute/legion (packed)
time: [319.12 us 320.77 us 322.64 us]
change: [-55.677% -55.065% -54.456%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
heavy_compute/legion_0.2.4
time: [3.1970 ms 3.2008 ms 3.2046 ms]
change: [-26.487% -26.207% -25.935%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) low mild
5 (5.00%) high mild
heavy_compute/bevy time: [629.79 us 634.51 us 639.19 us]
change: [-41.146% -40.584% -40.061%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
heavy_compute/hecs time: [585.76 us 588.70 us 591.76 us]
change: [-42.162% -41.767% -41.393%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) low mild
2 (2.00%) high mild
3 (3.00%) high severe
heavy_compute/shipyard time: [329.43 us 334.53 us 339.54 us]
change: [-57.275% -56.615% -55.946%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
heavy_compute/shipyard (packed)
time: [337.05 us 343.24 us 350.40 us]
change: [-51.426% -50.129% -48.671%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
heavy_compute/specs time: [532.54 us 538.90 us 545.14 us]
change: [-47.185% -46.611% -46.091%] (p = 0.00 < 0.05)
Performance has improved.
add_remove_component/legion
time: [3.7019 ms 3.7096 ms 3.7178 ms]
change: [-33.158% -32.596% -32.108%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high mild
add_remove_component/legion_0.2.4
time: [2.2386 ms 2.2447 ms 2.2510 ms]
change: [-27.178% -26.937% -26.682%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
add_remove_component/hecs
time: [7.3318 ms 7.3804 ms 7.4305 ms]
change: [-59.699% -59.438% -59.152%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
add_remove_component/shipyard
time: [207.23 us 207.65 us 208.09 us]
change: [-92.821% -92.787% -92.756%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
add_remove_component/specs
time: [97.710 us 98.436 us 99.291 us]
change: [-34.454% -33.923% -33.386%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) high mild
5 (5.00%) high severe
serialize_text/legion time: [12.175 ms 12.200 ms 12.225 ms]
change: [-33.020% -31.672% -30.402%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
serialize_binary/legion time: [4.5189 ms 4.5239 ms 4.5289 ms]
change: [-30.514% -29.507% -28.613%] (p = 0.00 < 0.05)
Performance has improved.
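The verdict lines in the log above ("Performance has improved.", "Change within noise threshold.", "No change in performance detected.") come from criterion comparing each run against the previous one. The sketch below models how I understand that decision works. It assumes criterion's defaults of a 0.05 significance level and a 1% noise threshold, and it is an illustration, not criterion's actual code. The `classify` function and `Verdict` enum are hypothetical names. A lower run-to-run error, which is what instruction counting would offer, would let CI use a tighter threshold.

```rust
/// Verdicts corresponding to the phrases criterion prints in the log.
#[derive(Debug, PartialEq)]
enum Verdict {
    Improved,
    Regressed,
    WithinNoise,
    NoChange,
}

/// Classify a benchmark change (a sketch, not criterion's real implementation).
/// `change_ci` holds the lower and upper bounds of the relative change in time
/// as fractions, e.g. (-0.32876, -0.31134) for "[-32.876% ... -31.134%]".
fn classify(change_ci: (f64, f64), p_value: f64, significance: f64, noise: f64) -> Verdict {
    let (lower, upper) = change_ci;
    if p_value >= significance {
        // The difference is not statistically significant.
        return Verdict::NoChange;
    }
    if lower > noise {
        // Even the best-case bound is a slowdown larger than the noise band.
        Verdict::Regressed
    } else if upper < -noise {
        // Even the worst-case bound is a speedup larger than the noise band.
        Verdict::Improved
    } else {
        // Significant, but the interval overlaps the +/- noise band.
        Verdict::WithinNoise
    }
}

fn main() {
    const SIGNIFICANCE: f64 = 0.05; // assumed criterion default
    const NOISE: f64 = 0.01; // assumed criterion default (1%)

    // Values copied from the log above.
    let cases = [
        ("simple_insert/legion", (-0.32876, -0.31134), 0.00),
        ("simple_iter/specs", (-0.015233, 0.051496), 0.44),
        ("fragmented_iter/specs", (-0.015877, -0.008472), 0.00),
        ("schedule/legion", (3.5018, 3.7219), 0.00),
    ];
    for (name, ci, p) in cases {
        println!("{}: {:?}", name, classify(ci, p, SIGNIFICANCE, NOISE));
    }
}
```

With these inputs the sketch reproduces the log's verdicts. For example, fragmented_iter/specs is significant (p = 0.00), but its upper bound of -0.85% does not clear the -1% band, so it lands in "within noise".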
It would be nice to at least automate the generation of the results table in the readme. For reference, the current results were run on an i7 5820K EE with 16GB @ 3200MHz. My laptop with an i5-6360U, while significantly slower, shows similar relative performance.

@colelawrence's numbers are mostly unsurprising on hardware that appears to be ~40% faster, although with some very notable exceptions. schedule/legion taking ~360% longer (about 4.6x the time) on faster hardware is a total mystery. fragmented_iter/shipyard taking ~90% less time (roughly a 10x speedup) actually brings it more in line with where I had expected shipyard to perform in a test specifically designed to torture archetypal ECSs and highlight the strengths of sparse-array-based ECS libraries. Even so, its gaining a much larger speedup than the other libraries is curious. I wonder whether each library benefits very differently from changes in CPU speed vs memory speed vs cache size. Capturing results across a variety of hardware would obviously be interesting and probably useful, although logistically challenging.
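The percentages criterion prints are relative changes in time, change = (new - old) / old. Turning them into "x times slower/faster" figures is plain arithmetic. The hypothetical helper below does that conversion with two numbers from the log. One caveat: as far as I know, criterion computes `change` against the last baseline saved in that machine's `target/criterion` directory, not against the numbers in the readme.

```rust
/// Convert criterion's relative change in time into a "times faster" factor.
/// change = (new - old) / old, so new / old = 1 + change.
fn speedup(change: f64) -> f64 {
    // old / new: above 1.0 means faster, below 1.0 means slower.
    1.0 / (1.0 + change)
}

fn main() {
    // schedule/legion: +360.77% means the new run takes about 4.61x as long.
    let legion = 3.6077;
    println!(
        "schedule/legion: {:.2}x the time, speedup {:.3}",
        1.0 + legion,
        speedup(legion)
    );

    // fragmented_iter/shipyard: -89.710% means roughly a 9.72x speedup.
    let shipyard = -0.89710;
    println!("fragmented_iter/shipyard: speedup {:.2}x", speedup(shipyard));
}
```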
Couldn't we use a GitHub Action to run the benchmarks?

The most urgent thing I see is simply to re-run the benches and update the numbers to include @leudz's recent PR. Otherwise, the readme will continue to misrepresent shipyard's performance.
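Whether the benches run in CI or by hand, regenerating the readme table could be scripted. Below is a minimal std-only sketch. It assumes the criterion terminal output shown above has been saved as text. It takes the middle value of each `time: [low estimate high]` triple and prints a markdown table. `parse`, `to_markdown`, and `Row` are hypothetical names, and the table layout is an assumption, not the readme's current format. Reading the JSON files criterion writes under `target/criterion` would probably be sturdier than scraping terminal text.

```rust
/// One benchmark result (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
struct Row {
    group: String,    // e.g. "simple_insert"
    library: String,  // e.g. "legion (packed)"
    estimate: String, // middle value of criterion's interval, e.g. "295.83 us"
}

/// Scrape criterion's terminal output. Handles both layouts in the log:
/// "name time: [lo mid hi]" on one line, or a long name on its own line
/// followed by "time: [lo mid hi]" on the next.
fn parse(output: &str) -> Vec<Row> {
    let mut rows = Vec::new();
    let mut pending: Option<String> = None;
    for raw in output.lines() {
        let line = raw.trim();
        let idx = match line.find("time:") {
            Some(idx) => idx,
            None => {
                // Remember a possible benchmark name; any other line clears it.
                pending = if line.contains('/') { Some(line.to_string()) } else { None };
                continue;
            }
        };
        let inline = line[..idx].trim();
        let name = if inline.is_empty() { pending.take() } else { Some(inline.to_string()) };
        pending = None;
        // "[293.59 us 295.83 us 298.20 us]" -> ["293.59", "us", "295.83", "us", ...]
        let tokens: Vec<&str> = line[idx + "time:".len()..]
            .trim()
            .trim_matches(|c: char| c == '[' || c == ']')
            .split_whitespace()
            .collect();
        let name = match name {
            Some(name) if tokens.len() >= 4 => name,
            _ => continue,
        };
        if let Some((group, library)) = name.split_once('/') {
            rows.push(Row {
                group: group.to_string(),
                library: library.to_string(),
                estimate: format!("{} {}", tokens[2], tokens[3]),
            });
        }
    }
    rows
}

/// Render rows as a markdown table for the readme.
fn to_markdown(rows: &[Row]) -> String {
    let mut out = String::from("| Benchmark | Library | Time |\n|---|---|---|\n");
    for row in rows {
        out.push_str(&format!("| {} | {} | {} |\n", row.group, row.library, row.estimate));
    }
    out
}

fn main() {
    // A few lines copied from the log above.
    let log = "Running target/release/deps/benchmarks-d2e805252823feec
simple_insert/legion time: [293.59 us 295.83 us 298.20 us]
change: [-32.876% -32.034% -31.134%] (p = 0.00 < 0.05)
simple_insert/legion_0.2.4
time: [575.17 us 577.32 us 579.81 us]
simple_iter/legion (packed)
time: [10.909 us 10.923 us 10.938 us]";
    print!("{}", to_markdown(&parse(log)));
}
```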
For now, AFAIK, the results come from TomGillen's computer.
How are we going to update them?
A few ideas (don't hesitate to propose more):
Regardless of what we choose, I think the specs of the computer that ran the benchmarks should be easy to find, maybe in the readme itself.
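One way to make the specs easy to find is to print them right next to the generated table. Below is a small std-only sketch. `MachineSpecs` and `specs_line` are hypothetical names. The standard library does not expose the CPU model or the amount of memory, so those fields are filled in by hand. The thread count, OS, and architecture are detected on whichever machine runs the code.

```rust
use std::env::consts::{ARCH, OS};
use std::thread;

/// Hardware details to publish alongside the results (hypothetical type).
/// std cannot report the CPU model or RAM, so those are typed in by hand.
struct MachineSpecs {
    cpu: String,
    ram: String,
}

/// Build a one-line description for the readme. The thread count and
/// OS/arch are detected at runtime, so run this on the benchmark machine.
fn specs_line(specs: &MachineSpecs) -> String {
    let threads = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    format!(
        "Results measured on {} ({} threads available), {}, {}/{}",
        specs.cpu, threads, specs.ram, OS, ARCH
    )
}

fn main() {
    // Example values: the hardware of the current readme results, as listed earlier in the thread.
    let specs = MachineSpecs {
        cpu: "i7 5820K EE".to_string(),
        ram: "16GB @ 3200MHz".to_string(),
    };
    println!("{}", specs_line(&specs));
}
```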