AVX2 Support for key-value sorting #145

Merged: 5 commits, May 6, 2024

sterrettm2

This patch adds AVX2 support for key-value sorting. Benchmarks comparing AVX2 against AVX512, and AVX2 against the scalar implementation, are included below.

AVX2
--------------------------------------------------------------------------
Benchmark                                Time             CPU   Iterations
--------------------------------------------------------------------------
simdkvsort/random_128/uint64_t        1939 ns         1946 ns       360240
simdkvsort/random_1k/uint64_t        18021 ns        18029 ns        39110
simdkvsort/random_100k/uint64_t    2359494 ns      2359426 ns          298
simdkvsort/random_1m/uint64_t     27181339 ns     27180247 ns           26
simdkvsort/random_10m/uint64_t   328366429 ns    328336094 ns            2
simdkvsort/random_100m/uint64_t 3818560389 ns   3818247990 ns            1
simdkvsort/random_128/int64_t         1939 ns         1948 ns       359936
simdkvsort/random_1k/int64_t         16188 ns        16201 ns        43224
simdkvsort/random_100k/int64_t     2094296 ns      2094317 ns          335
simdkvsort/random_1m/int64_t      24546170 ns     24543133 ns           28
simdkvsort/random_10m/int64_t    300429155 ns    300408697 ns            2
simdkvsort/random_100m/int64_t  3539816011 ns   3539431827 ns            1
simdkvsort/random_128/double          1657 ns         1665 ns       420794
simdkvsort/random_1k/double          13263 ns        13268 ns        52672
simdkvsort/random_100k/double      1707007 ns      1706945 ns          411
simdkvsort/random_1m/double       20710996 ns     20709242 ns           34
simdkvsort/random_10m/double     264312414 ns    264284784 ns            3
simdkvsort/random_100m/double   3124498665 ns   3124159045 ns            1
simdkvsort/random_128/uint32_t        1117 ns         1120 ns       626058
simdkvsort/random_1k/uint32_t         6418 ns         6428 ns       109035
simdkvsort/random_100k/uint32_t     884360 ns       884394 ns          791
simdkvsort/random_1m/uint32_t     10349431 ns     10348265 ns           68
simdkvsort/random_10m/uint32_t   129223761 ns    129208665 ns            5
simdkvsort/random_100m/uint32_t 1529426690 ns   1529322850 ns            1
simdkvsort/random_128/int32_t         1117 ns         1120 ns       625407
simdkvsort/random_1k/int32_t          6465 ns         6473 ns       108458
simdkvsort/random_100k/int32_t      922079 ns       922080 ns          758
simdkvsort/random_1m/int32_t      10840572 ns     10839740 ns           65
simdkvsort/random_10m/int32_t    134945071 ns    134931366 ns            5
simdkvsort/random_100m/int32_t  1591200513 ns   1591027400 ns            1
simdkvsort/random_128/float           1205 ns         1209 ns       578252
simdkvsort/random_1k/float            7378 ns         7390 ns        94558
simdkvsort/random_100k/float        975399 ns       975467 ns          716
simdkvsort/random_1m/float        11259949 ns     11258612 ns           62
simdkvsort/random_10m/float      138409369 ns    138397578 ns            5
simdkvsort/random_100m/float    1634287725 ns   1634146491 ns            1
AVX512
--------------------------------------------------------------------------
Benchmark                                Time             CPU   Iterations
--------------------------------------------------------------------------
simdkvsort/random_128/uint64_t        1174 ns         1177 ns       575703
simdkvsort/random_1k/uint64_t         7565 ns         7572 ns        92599
simdkvsort/random_100k/uint64_t    1099925 ns      1100012 ns          637
simdkvsort/random_1m/uint64_t     14643507 ns     14642890 ns           48
simdkvsort/random_10m/uint64_t   199687140 ns    199670122 ns            4
simdkvsort/random_100m/uint64_t 2487791770 ns   2487510844 ns            1
simdkvsort/random_128/int64_t         1177 ns         1180 ns       593556
simdkvsort/random_1k/int64_t          7617 ns         7625 ns        92009
simdkvsort/random_100k/int64_t     1099157 ns      1099172 ns          638
simdkvsort/random_1m/int64_t      14583034 ns     14581973 ns           48
simdkvsort/random_10m/int64_t    199769132 ns    199758441 ns            4
simdkvsort/random_100m/int64_t  2486518465 ns   2486306230 ns            1
simdkvsort/random_128/double          1079 ns         1081 ns       643342
simdkvsort/random_1k/double           6218 ns         6225 ns       112379
simdkvsort/random_100k/double      1005576 ns      1005649 ns          694
simdkvsort/random_1m/double       13543725 ns     13542388 ns           52
simdkvsort/random_10m/double     189044901 ns    189014084 ns            4
simdkvsort/random_100m/double   2369940116 ns   2369678581 ns            1
simdkvsort/random_128/uint32_t         878 ns          880 ns       795229
simdkvsort/random_1k/uint32_t         3586 ns         3588 ns       195088
simdkvsort/random_100k/uint32_t     548641 ns       548656 ns         1278
simdkvsort/random_1m/uint32_t      6862367 ns      6862099 ns          102
simdkvsort/random_10m/uint32_t    95574488 ns     95569892 ns            7
simdkvsort/random_100m/uint32_t 1191578120 ns   1191506814 ns            1
simdkvsort/random_128/int32_t          877 ns          879 ns       796114
simdkvsort/random_1k/int32_t          3591 ns         3594 ns       194537
simdkvsort/random_100k/int32_t      545403 ns       545420 ns         1272
simdkvsort/random_1m/int32_t       6858498 ns      6857780 ns          103
simdkvsort/random_10m/int32_t     95449697 ns     95438473 ns            7
simdkvsort/random_100m/int32_t  1190093444 ns   1190017700 ns            1
simdkvsort/random_128/float            942 ns          944 ns       744034
simdkvsort/random_1k/float            4677 ns         4679 ns       149366
simdkvsort/random_100k/float        622690 ns       622691 ns         1125
simdkvsort/random_1m/float         7654461 ns      7654027 ns           91
simdkvsort/random_10m/float      102934181 ns    102924527 ns            7
simdkvsort/random_100m/float    1270152278 ns   1269981042 ns            1
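
For context on the scalar comparison that follows, here is a minimal sketch of what a key-value sort does: it orders the keys and applies the same permutation to a parallel array of values. The function name `scalar_keyvalue_sort` and the index-permutation approach are assumptions made for illustration; this is neither the library's scalar benchmark baseline nor its SIMD implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical scalar reference (not the library's code): sorts `keys` in
// ascending order and rearranges `vals` with the same permutation, so each
// value stays paired with its original key.
template <typename K, typename V>
void scalar_keyvalue_sort(std::vector<K>& keys, std::vector<V>& vals) {
    assert(keys.size() == vals.size());

    // Build the identity permutation 0, 1, ..., n-1.
    std::vector<std::size_t> idx(keys.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});

    // Sort the indices by the keys they point to.
    std::sort(idx.begin(), idx.end(),
              [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

    // Gather keys and values through the sorted permutation.
    std::vector<K> sorted_keys(keys.size());
    std::vector<V> sorted_vals(vals.size());
    for (std::size_t i = 0; i < idx.size(); ++i) {
        sorted_keys[i] = keys[idx[i]];
        sorted_vals[i] = vals[idx[i]];
    }
    keys = std::move(sorted_keys);
    vals = std::move(sorted_vals);
}
```

Like a quicksort-based approach, this sketch is not stable: values attached to equal keys may come out in any order.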
AVX2 vs. Scalar (Old = scalarkvsort, New = simdkvsort; negative values mean the AVX2 path is faster)
Benchmark                                                                        Time             CPU      Time Old      Time New       CPU Old       CPU New
-------------------------------------------------------------------------------------------------------------------------------------------------------------
[scalarkvsort.*random_128 vs. simdkvsort.*random_128]/uint64_t                +0.1078         +0.1102          1798          1992          1801          1999
[scalarkvsort.*random_128 vs. simdkvsort.*random_128]/int64_t                 +0.0765         +0.0833          1773          1909          1775          1923
[scalarkvsort.*random_128 vs. simdkvsort.*random_128]/double                  -0.2372         -0.2336          2155          1644          2157          1653
[scalarkvsort.*random_128 vs. simdkvsort.*random_128]/uint32_t                -0.3402         -0.3394          1689          1114          1691          1117
[scalarkvsort.*random_128 vs. simdkvsort.*random_128]/int32_t                 -0.3457         -0.3445          1705          1116          1706          1118
[scalarkvsort.*random_128 vs. simdkvsort.*random_128]/float                   -0.4488         -0.4485          2177          1200          2179          1201
OVERALL_GEOMEAN                                                               -0.2251         -0.2230             0             0             0             0

Benchmark                                                                      Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------------------
[scalarkvsort.*random_1k vs. simdkvsort.*random_1k]/uint64_t                -0.2675         -0.2674         22952         16813         22963         16822
[scalarkvsort.*random_1k vs. simdkvsort.*random_1k]/int64_t                 -0.3494         -0.3493         23429         15243         23440         15253
[scalarkvsort.*random_1k vs. simdkvsort.*random_1k]/double                  -0.7164         -0.7162         44117         12513         44131         12523
[scalarkvsort.*random_1k vs. simdkvsort.*random_1k]/uint32_t                -0.6435         -0.6431         17177          6124         17184          6133
[scalarkvsort.*random_1k vs. simdkvsort.*random_1k]/int32_t                 -0.6460         -0.6458         17605          6231         17612          6238
[scalarkvsort.*random_1k vs. simdkvsort.*random_1k]/float                   -0.8430         -0.8428         44325          6960         44333          6968
OVERALL_GEOMEAN                                                             -0.6273         -0.6271             0             0             0             0

Benchmark                                                                      Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------------------
[scalarkvsort.*random_1m vs. simdkvsort.*random_1m]/uint64_t                -0.8630         -0.8630     192462669      26358200     192446270      26357705
[scalarkvsort.*random_1m vs. simdkvsort.*random_1m]/int64_t                 -0.8728         -0.8728     192637350      24496545     192630780      24496142
[scalarkvsort.*random_1m vs. simdkvsort.*random_1m]/double                  -0.9052         -0.9052     217953698      20668767     217937330      20667636
[scalarkvsort.*random_1m vs. simdkvsort.*random_1m]/uint32_t                -0.9384         -0.9384     168347341      10363963     168335643      10363599
[scalarkvsort.*random_1m vs. simdkvsort.*random_1m]/int32_t                 -0.9362         -0.9362     170325919      10866820     170300695      10866190
[scalarkvsort.*random_1m vs. simdkvsort.*random_1m]/float                   -0.9425         -0.9425     196415775      11294391     196401933      11294226
OVERALL_GEOMEAN                                                             -0.9152         -0.9152             0             0             0             0
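
The Time and CPU columns in the comparison tables above are relative changes, and the listed values are consistent with (new - old) / old. A small sketch of how to convert a row into a speedup factor follows; the helper names `relative_change` and `speedup` are hypothetical and not part of any benchmark tooling.

```cpp
#include <cassert>

// Relative change between two timings: (new - old) / old.
// Negative results mean the new timing is faster.
double relative_change(double old_time, double new_time) {
    assert(old_time > 0.0);
    return (new_time - old_time) / old_time;
}

// Speedup factor of the new timing over the old one: old / new.
double speedup(double old_time, double new_time) {
    assert(new_time > 0.0);
    return old_time / new_time;
}
```

For the random_1m/uint64_t row, `relative_change(192462669, 26358200)` is about -0.863, which corresponds to a speedup of roughly 7.3x over scalar. The same helper applied to the AVX2 and AVX512 tables gives, for random_1m/uint64_t, `speedup(27181339, 14643507)`, or about 1.9x in favor of AVX512.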

@sterrettm2 force-pushed the kv-avx2 branch 2 times, most recently from 023e10f to c55ab7a (April 23, 2024 19:07)
@r-devulap left a comment

LGTM! thanks @sterrettm2!

@r-devulap merged commit 3f958c7 into intel:main May 6, 2024
10 checks passed