
perf(encoding): use estimate size in data chunk encoding #8924

Merged
merged 24 commits into from
Apr 11, 2023

Conversation

Honeta
Contributor

@Honeta Honeta commented Mar 31, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

As the title says: use the estimated size of a data chunk to size the output buffer up front when encoding, instead of growing it on the fly. The benchmarks below show a 24–61% improvement.
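The idea in the title can be sketched as follows. This is a minimal illustration, not RisingWave's actual API: the `estimated_size` helper and the 1-byte-null-flag layout are assumptions made up for the example.

```rust
// Sketch: pre-size the output buffer from an estimated serialized size,
// so the encoder allocates once instead of reallocating while appending.
// The size formula and byte layout below are illustrative only.

fn estimated_size(values: &[Option<i16>]) -> usize {
    // 1 byte null flag per value, plus 2 payload bytes when non-null.
    values
        .iter()
        .map(|v| 1 + if v.is_some() { 2 } else { 0 })
        .sum()
}

fn encode(values: &[Option<i16>]) -> Vec<u8> {
    // Allocate the full buffer up front using the estimate.
    let mut buf = Vec::with_capacity(estimated_size(values));
    for v in values {
        match v {
            Some(x) => {
                buf.push(1); // non-null flag
                buf.extend_from_slice(&x.to_le_bytes());
            }
            None => buf.push(0), // null flag
        }
    }
    buf
}
```

When the estimate is exact (as here), `Vec` never reallocates during the loop, which is where the speedup in the benchmarks comes from.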

Checklist For Contributors

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features; see Sqlsmith: SQL feature generation #7934.)
  • I have demonstrated that backward compatibility is not broken by breaking changes and created issues to track deprecated features to be removed in the future. (Please refer to the issue)
  • All checks passed in ./risedev check (or alias, ./risedev c)

Checklist For Reviewers

  • I have requested macro/micro-benchmarks as this PR can affect performance substantially, and the results are shown.

Documentation

  • My PR DOES NOT contain user-facing changes.

Benchmark results:

data chunk encoding: Int16, 128 rows, Pr[null]=0
                        time:   [5.0985 µs 5.0997 µs 5.1011 µs]
                        change: [-60.708% -60.585% -60.481%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

data chunk encoding: Int16, 1024 rows, Pr[null]=0
                        time:   [70.243 µs 70.277 µs 70.334 µs]
                        change: [-45.606% -45.550% -45.464%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

data chunk encoding: Int16, 128 rows, Pr[null]=0.01
                        time:   [9.0632 µs 9.0638 µs 9.0645 µs]
                        change: [-51.416% -51.347% -51.211%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

data chunk encoding: Int16, 1024 rows, Pr[null]=0.01
                        time:   [63.245 µs 63.248 µs 63.252 µs]
                        change: [-51.149% -51.129% -51.111%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe

data chunk encoding: Int16, 128 rows, Pr[null]=0.1
                        time:   [7.4565 µs 7.4573 µs 7.4582 µs]
                        change: [-53.708% -53.633% -53.504%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  5 (5.00%) high mild
  2 (2.00%) high severe

data chunk encoding: Int16, 1024 rows, Pr[null]=0.1
                        time:   [59.937 µs 59.943 µs 59.950 µs]
                        change: [-53.571% -53.542% -53.502%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

data chunk encoding: String, 128 rows, Pr[null]=0
                        time:   [11.026 µs 11.032 µs 11.044 µs]
                        change: [-30.353% -30.099% -29.848%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

data chunk encoding: String, 1024 rows, Pr[null]=0
                        time:   [97.505 µs 97.516 µs 97.527 µs]
                        change: [-32.983% -32.899% -32.825%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

data chunk encoding: String, 128 rows, Pr[null]=0.01
                        time:   [10.976 µs 10.978 µs 10.980 µs]
                        change: [-24.373% -24.341% -24.295%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low mild
  6 (6.00%) high mild
  2 (2.00%) high severe

data chunk encoding: String, 1024 rows, Pr[null]=0.01
                        time:   [96.907 µs 96.923 µs 96.944 µs]
                        change: [-34.879% -34.853% -34.832%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

data chunk encoding: String, 128 rows, Pr[null]=0.1
                        time:   [10.756 µs 10.758 µs 10.760 µs]
                        change: [-34.491% -34.474% -34.456%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

data chunk encoding: String, 1024 rows, Pr[null]=0.1
                        time:   [94.911 µs 94.924 µs 94.938 µs]
                        change: [-45.633% -45.613% -45.596%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

data chunk encoding: Int16 and String, 128 rows, Pr[null]=0
                        time:   [10.961 µs 10.962 µs 10.964 µs]
                        change: [-34.081% -33.991% -33.764%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

data chunk encoding: Int16 and String, 1024 rows, Pr[null]=0
                        time:   [107.45 µs 107.48 µs 107.52 µs]
                        change: [-25.952% -25.843% -25.672%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe

data chunk encoding: Int16 and String, 128 rows, Pr[null]=0.01
                        time:   [10.883 µs 10.885 µs 10.887 µs]
                        change: [-35.649% -35.442% -35.241%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high severe

data chunk encoding: Int16 and String, 1024 rows, Pr[null]=0.01
                        time:   [92.819 µs 92.829 µs 92.839 µs]
                        change: [-37.325% -37.120% -36.926%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

data chunk encoding: Int16 and String, 128 rows, Pr[null]=0.1
                        time:   [11.008 µs 11.009 µs 11.011 µs]
                        change: [-33.145% -33.052% -32.900%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

data chunk encoding: Int16 and String, 1024 rows, Pr[null]=0.1
                        time:   [94.356 µs 94.364 µs 94.372 µs]
                        change: [-35.836% -35.695% -35.611%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low severe
  1 (1.00%) high mild
  2 (2.00%) high severe

data chunk encoding: Int16, Int32, Int64 and String, 128 rows, Pr[null]=0
                        time:   [13.827 µs 13.828 µs 13.830 µs]
                        change: [-60.049% -60.000% -59.879%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

data chunk encoding: Int16, Int32, Int64 and String, 1024 rows, Pr[null]=0
                        time:   [131.10 µs 131.12 µs 131.13 µs]
                        change: [-52.724% -52.711% -52.697%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low severe
  1 (1.00%) high mild
  3 (3.00%) high severe

data chunk encoding: Int16, Int32, Int64 and String, 128 rows, Pr[null]=0.01
                        time:   [13.837 µs 13.838 µs 13.841 µs]
                        change: [-59.884% -59.820% -59.723%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

data chunk encoding: Int16, Int32, Int64 and String, 1024 rows, Pr[null]=0.01
                        time:   [130.70 µs 130.71 µs 130.73 µs]
                        change: [-53.020% -52.903% -52.827%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  3 (3.00%) high severe

data chunk encoding: Int16, Int32, Int64 and String, 128 rows, Pr[null]=0.1
                        time:   [13.598 µs 13.599 µs 13.600 µs]
                        change: [-58.051% -57.943% -57.882%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

data chunk encoding: Int16, Int32, Int64 and String, 1024 rows, Pr[null]=0.1
                        time:   [115.88 µs 115.90 µs 115.91 µs]
                        change: [-56.546% -56.537% -56.528%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild

@Honeta Honeta requested a review from st1page March 31, 2023 10:02
@codecov

codecov bot commented Mar 31, 2023

Codecov Report

Merging #8924 (293c864) into main (ee6d44f) will increase coverage by 0.01%.
The diff coverage is 66.32%.

@@            Coverage Diff             @@
##             main    #8924      +/-   ##
==========================================
+ Coverage   70.90%   70.91%   +0.01%     
==========================================
  Files        1196     1196              
  Lines      198786   198881      +95     
==========================================
+ Hits       140940   141028      +88     
- Misses      57846    57853       +7     
Flag Coverage Δ
rust 70.91% <66.32%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/common/src/array/data_chunk.rs 89.22% <38.77%> (-4.32%) ⬇️
src/common/src/util/value_encoding/mod.rs 92.75% <93.87%> (+0.17%) ⬆️

... and 6 files with indirect coverage changes

Contributor

@st1page st1page left a comment


It could be more cache-friendly if we used vectorized addition, but that requires allocating the sum array in advance, so there may be a trade-off between the two. Not urgent, though.
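The reviewer's idea above can be sketched like this: pre-allocate one per-row size array and add each column's per-row sizes into it with a tight element-wise loop the compiler can auto-vectorize. The flat `Vec<usize>` column layout is an assumption for illustration, not RisingWave's column API.

```rust
// Sketch of vectorized size accumulation: instead of summing one row's
// sizes at a time, allocate `sums` once (the up-front cost the review
// mentions) and sweep each column with a contiguous, cache-friendly loop.

fn row_sizes(columns: &[Vec<usize>], num_rows: usize) -> Vec<usize> {
    // The sum array must exist before any column is processed.
    let mut sums = vec![0usize; num_rows];
    for col in columns {
        // Element-wise addition over contiguous slices; a candidate
        // for auto-vectorization by the compiler.
        for (sum, &size) in sums.iter_mut().zip(col.iter()) {
            *sum += size;
        }
    }
    sums
}
```

The trade-off is the extra `num_rows * size_of::<usize>()` allocation versus better memory-access locality per column.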

Contributor

@github-actions github-actions bot left a comment


license-eye has checked 3176 files in total.

Valid Invalid Ignored Fixed
1477 2 1697 0
Invalid file list:
  • src/common/benches/bench_data_chunk_encoding.rs
  • src/common/src/test_utils/rand_chunk.rs

src/common/benches/bench_data_chunk_encoding.rs (outdated, resolved)
src/common/src/test_utils/rand_chunk.rs (outdated, resolved)
@st1page st1page added this pull request to the merge queue Apr 11, 2023
Merged via the queue into main with commit bcc00c0 Apr 11, 2023
@st1page st1page deleted the xinjing/perf_data_chunk_encoding branch April 11, 2023 06:12