-
Notifications
You must be signed in to change notification settings - Fork 825
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of set_bits by avoiding to set individual bits #6288
Conversation
|
@alamb @andygrove @viirya |
82cbff6
to
1e9de38
Compare
This reverts commit a15db14.
Here are the benchmark results I got from this branch:
🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it's like the 4th round of comments, but I feel we're getting super close to merging. I think this might be the final rounds. The code now also feels more compact and easier to follow. Thanks for being so patient.
} else { | ||
let len = std::cmp::min(len, 64 - std::cmp::max(read_shift, write_shift)); | ||
let bytes = ceil(len + read_shift, 8); | ||
let chunk = unsafe { read_bytes_to_u64(data, read_byte, bytes) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add some // SAFETY:
explanations to unsafe usages?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me now, thanks. Given the weight of unsafe code, I would like to see a 2nd approval though.
Thank you @crepererum @alamb @andygrove @viirya @Dandandan Any other comments? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @kazuyukitanimura -- I will review this PR later today
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @kazuyukitanimura -- I believe this code is correct, but I think it needs some more testing for me to be super confident it in, especially given its profuse use of unsafe
as mentioned by @crepererum
I have an idea for a fuzz testing -- I am going to try and code something up and report back here
fn test_set_upto_64bits() { | ||
// len >= 64 | ||
let write_data: &mut [u8] = &mut [0; 9]; | ||
let data: &[u8] = &[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please also add a test that is greater than 64 bits (not just = 64 bits)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I will add it. BTW there is an existing test https://github.com/apache/arrow-rs/blob/master/arrow-buffer/src/util/bit_mask.rs#L170
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am working on some more tests too. Stay tuned...
Thanks @alamb FYI it looks there is an existing fuzz test although this is limited to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote a fuzz tester for set_bits
here: #6394
I ran the fuzz tester against this branch and it passed
I also ran it under MIRI (which took quite a while) and it found no errors
andrewlamb@Andrews-MacBook-Pro-2:~/Software/arrow-rs$ cargo +nightly miri test -p arrow-buffer --profile test --lib util::bit_mask::tests
Finished `test` profile [unoptimized + debuginfo] target(s) in 0.04s
Running unittests src/lib.rs (target/miri/aarch64-apple-darwin/debug/deps/arrow_buffer-82bdebfeced20ddb)
running 6 tests
test util::bit_mask::tests::set_bits_fuz ... ok
test util::bit_mask::tests::test_set_bits_aligned ... ok
test util::bit_mask::tests::test_set_bits_unaligned ... ok
test util::bit_mask::tests::test_set_bits_unaligned_destination_end ... ok
test util::bit_mask::tests::test_set_bits_unaligned_destination_start ... ok
test util::bit_mask::tests::test_set_upto_64bits ... ok
test result: ok. 6 passed; 0 failed; 0 ignored; 0 measured; 112 filtered out; finished in 38.62s
Therefore, I think this is very well done @kazuyukitanimura 👏 🏆 🚀
Let's get it merged |
Thanks again everyone -- this is quite cool |
Thank you @alamb @crepererum @andygrove @viirya @Dandandan |
Which issue does this PR close?
Rationale for this change
This PR improves the performance of set_bits that is often the bottleneck for filter execs in DataFusion. In particular,
len < 64
is the majority for TPCDS and this PR helps for that scenario.For the existing bench
What changes are included in this PR?
Consolidating
BitChunks::new
and individual bit manipulations into one method.Are there any user-facing changes?
No