Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(common): reduce buffer reallocations in HashKey::build #8885

Merged
merged 6 commits into from
Mar 30, 2023

Conversation

kwannoel
Copy link
Contributor

@kwannoel kwannoel commented Mar 30, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Thanks to @TennyZhuang and @st1page for their investigations.
This is part of optimizations suggested by @st1page: #8802 (comment).

We estimate size of HashKey to avoid unnecessary reallocations from vec.

#8683

Checklist For Contributors

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • I have demonstrated that backward compatibility is not broken by breaking changes and created issues to track deprecated features to be removed in the future. (Please refer to the issue)
  • All checks passed in ./risedev check (or alias, ./risedev c)

Checklist For Reviewers

  • I have requested macro/micro-benchmarks as this PR can affect performance substantially, and the results are shown.

Documentation

  • My PR DOES NOT contain user-facing changes.
Click here for Documentation

Types of user-facing changes

Please keep the types that apply to your changes, and remove the others.

  • Installation and deployment
  • Connector (sources & sinks)
  • SQL commands, functions, and operators
  • RisingWave cluster configuration changes
  • Other (please specify in the release note below)

Release note

@kwannoel kwannoel changed the title perf(common): reduce allocations from HashKey::build perf(common): reduce buffer reallocations in HashKey::build Mar 30, 2023
@kwannoel
Copy link
Contributor Author

Before and after (around 10-15% improvement):

noelkwan@Noels-MacBook-Pro benches % cargo bench --bench bench_hash_key_encoding -- "vec ser varchar" && git checkout - && cargo bench --bench bench_hash_key_encoding -- "vec ser varchar"
   Compiling risingwave_common v0.2.0-alpha (/Users/noelkwan/projects/risingwave/src/common)
    Finished bench [optimized] target(s) in 25.40s
     Running benches/bench_hash_key_encoding.rs (/Users/noelkwan/projects/risingwave/target/release/deps/bench_hash_key_encoding-b933324b2cd47305)
vec ser varchar KeySerialized, 128 rows, Pr[null]=0
                        time:   [25.341 µs 25.384 µs 25.428 µs]
                        change: [+13.522% +13.868% +14.235%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

vec ser varchar KeySerialized, 1024 rows, Pr[null]=0
                        time:   [203.85 µs 204.37 µs 204.86 µs]
                        change: [+12.867% +13.254% +13.648%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

vec ser varchar KeySerialized, 128 rows, Pr[null]=0.01
                        time:   [25.202 µs 25.233 µs 25.266 µs]
                        change: [+12.170% +12.472% +12.761%] (p = 0.00 < 0.05)
                        Performance has regressed.

vec ser varchar KeySerialized, 1024 rows, Pr[null]=0.01
                        time:   [185.17 µs 185.59 µs 186.01 µs]
                        change: [+4.6333% +7.0009% +9.1626%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild

vec ser varchar KeySerialized, 128 rows, Pr[null]=0.1
                        time:   [20.482 µs 20.509 µs 20.537 µs]
                        change: [+13.448% +13.777% +14.101%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

vec ser varchar KeySerialized, 1024 rows, Pr[null]=0.1
                        time:   [165.04 µs 165.43 µs 165.87 µs]
                        change: [+14.055% +14.376% +14.712%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Switched to branch 'kwannoel/reserve-buffer'
Your branch is up to date with 'origin/kwannoel/reserve-buffer'.
   Compiling risingwave_common v0.2.0-alpha (/Users/noelkwan/projects/risingwave/src/common)
    Finished bench [optimized] target(s) in 25.80s
     Running benches/bench_hash_key_encoding.rs (/Users/noelkwan/projects/risingwave/target/release/deps/bench_hash_key_encoding-b933324b2cd47305)
vec ser varchar KeySerialized, 128 rows, Pr[null]=0
                        time:   [22.171 µs 22.220 µs 22.273 µs]
                        change: [-12.841% -12.595% -12.325%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild

vec ser varchar KeySerialized, 1024 rows, Pr[null]=0
                        time:   [176.80 µs 177.03 µs 177.24 µs]
                        change: [-13.430% -13.232% -13.020%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high severe

vec ser varchar KeySerialized, 128 rows, Pr[null]=0.01
                        time:   [22.198 µs 22.228 µs 22.267 µs]
                        change: [-11.930% -11.787% -11.626%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

vec ser varchar KeySerialized, 1024 rows, Pr[null]=0.01
                        time:   [158.09 µs 158.27 µs 158.44 µs]
                        change: [-14.925% -14.707% -14.501%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

vec ser varchar KeySerialized, 128 rows, Pr[null]=0.1
                        time:   [17.546 µs 17.571 µs 17.597 µs]
                        change: [-14.388% -14.227% -14.065%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

vec ser varchar KeySerialized, 1024 rows, Pr[null]=0.1
                        time:   [141.64 µs 141.83 µs 142.00 µs]
                        change: [-14.311% -14.122% -13.934%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

src/common/src/hash/key.rs Show resolved Hide resolved
src/common/src/array/data_chunk.rs Outdated Show resolved Hide resolved
@kwannoel kwannoel enabled auto-merge March 30, 2023 09:33
@kwannoel kwannoel added this pull request to the merge queue Mar 30, 2023
@codecov
Copy link

codecov bot commented Mar 30, 2023

Codecov Report

Merging #8885 (528ebb2) into main (a311f5b) will increase coverage by 0.00%.
The diff coverage is 94.11%.

@@           Coverage Diff           @@
##             main    #8885   +/-   ##
=======================================
  Coverage   70.78%   70.78%           
=======================================
  Files        1177     1177           
  Lines      195369   195382   +13     
=======================================
+ Hits       138285   138296   +11     
- Misses      57084    57086    +2     
Flag Coverage Δ
rust 70.78% <94.11%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/common/src/array/data_chunk.rs 94.72% <90.90%> (-0.08%) ⬇️
src/common/src/hash/key.rs 84.89% <100.00%> (+0.05%) ⬆️

... and 5 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Merged via the queue into main with commit 83f1e9d Mar 30, 2023
@kwannoel kwannoel deleted the kwannoel/reserve-buffer branch March 30, 2023 14:13
Comment on lines +117 to +121
let estimated_key_size = data_chunk.estimate_value_encoding_size(column_idxes);
// Construct serializers for each row.
let mut serializers: Vec<Self::S> = hash_codes
.into_iter()
.map(Self::S::from_hash_code)
.map(|hashcode| Self::S::from_hash_code(hashcode, estimated_key_size))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is not good enough, we can get each row's value encoding size like #8924. And after that, we can avoid any grow of the vec

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants