refactor(common): unify implementation of hash key #9671
license-eye has totally checked 3400 files.
| Valid | Invalid | Ignored | Fixed |
|---|---|---|---|
| 1548 | 1 | 1851 | 0 |

Invalid file list:
- src/common/src/hash/key_v2.rs
Signed-off-by: Bugen Zhao <[email protected]>
Force-pushed from 282f580 to 91a74d2.
@@ -77,6 +77,7 @@ strum = "0.24"
strum_macros = "0.24"
sysinfo = { version = "0.26", default-features = false }
thiserror = "1"
tinyvec = { version = "1", features = ["rustc_1_55", "grab_spare_slice"] }
Why do we need `rustc_1_55` and `grab_spare_slice`? Maybe we should leave some comments inline.
This is because some interfaces of `tinyvec` are only available under these features. Should be documented by this crate.
.expect("in-memory deserialize should never fail")
.expect("datum should never be NULL");

// TODO: extra unboxing from `ScalarRefImpl` is unnecessary here.
Not quite sure I understand. Why is there boxing and unboxing for `ScalarRefImpl`?
By "boxing" here we mean erasing the concrete type of scalar and putting it into some variant of `ScalarImpl`. In general, a "boxed" scalar cannot be used directly and requires another `match` to take the real scalar, so there's extra overhead.
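To illustrate the overhead described in this reply, here is a minimal sketch. The enum below is a simplified, hypothetical stand-in and not the real `ScalarRefImpl`; the function names `sum_boxed` and `sum_unboxed` are assumptions made up for this example.

```rust
/// Hypothetical, simplified stand-in for a type-erased ("boxed") scalar.
/// The real enum in the codebase has many more variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ScalarRefImpl<'a> {
    Int32(i32),
    Int64(i64),
    Utf8(&'a str),
}

/// "Boxed" path: the concrete type is erased, so every value needs a
/// `match` on the variant before the real scalar can be used.
fn sum_boxed(values: &[ScalarRefImpl<'_>]) -> i64 {
    values
        .iter()
        .map(|v| match v {
            ScalarRefImpl::Int32(i) => i64::from(*i),
            ScalarRefImpl::Int64(i) => *i,
            // Non-numeric values contribute nothing in this toy example.
            ScalarRefImpl::Utf8(_) => 0,
        })
        .sum()
}

/// "Unboxed" path: the concrete type is known statically, so there is
/// no per-value `match` at all.
fn sum_unboxed(values: &[i32]) -> i64 {
    values.iter().map(|i| i64::from(*i)).sum()
}
```

Both functions compute the same result for `Int32` inputs; the difference is only the per-value branch that the erased representation forces.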
}

/// The buffer for building a hash key on a fixed-size byte array on the stack.
pub struct StackBuffer<const N: usize>(ArrayVec<[u8; N]>);
Why not use an array directly?
Yep. An `ArrayVec` is exactly an array with a "current length" on which some methods are encapsulated. We use `ArrayVec` simply for clearer implementation of `BufMut`. 🤣
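The "array plus current length" idea can be sketched without any external crate. The type `StackBuf` below is hypothetical and hand-rolled for illustration; it is not the `tinyvec` implementation, and its method names only loosely mirror what a `BufMut`-style writer needs.

```rust
/// Hypothetical minimal stand-in for `ArrayVec<[u8; N]>`:
/// a fixed-size array plus a "current length".
struct StackBuf<const N: usize> {
    data: [u8; N],
    len: usize,
}

impl<const N: usize> StackBuf<N> {
    fn new() -> Self {
        Self { data: [0; N], len: 0 }
    }

    /// Number of bytes that can still be written.
    fn remaining_mut(&self) -> usize {
        N - self.len
    }

    /// Appends `src` at the current length. Panics if it does not fit.
    fn put_slice(&mut self, src: &[u8]) {
        assert!(src.len() <= self.remaining_mut(), "buffer overflow");
        let end = self.len + src.len();
        self.data[self.len..end].copy_from_slice(src);
        self.len = end;
    }

    /// The initialized prefix of the array.
    fn as_slice(&self) -> &[u8] {
        &self.data[..self.len]
    }
}
```

With a bare array, the length bookkeeping in `put_slice` would have to be repeated at every call site, which is the clarity argument made in the reply above.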
pub type Key256<B = HeapNullBitmap> = FixedSizeKey<32, B>;
pub type KeySerialized<B = HeapNullBitmap> = SerializedKey<B>;

pub type SerializedKey<B = HeapNullBitmap> = HashKeyImpl<HeapStorage, B>;
Wonder if it will be confusing to have `KeySerialized` and `SerializedKey` 😆. But I don't have a better idea now.
These are copied from the original implementation. This PR aims to keep all of the interfaces untouched, so the changes are all in the `common` crate. Let's refactor the code structure along with this in the next PRs.
Generally LGTM. I will make another pass later. Thanks for this!
Oops. There's really some difference in the decimal precision. This seems to be an expected behavior caused by this PR, but is it correct? 👀 cc @xiangjinwu
FYI, I think PG15's behavior does not give any guarantee.
dev=# create table t_d(v decimal);
dev=# insert into t_d values (3), (3.0), (3.00);
INSERT 0 3
dev=# select * from t_d;
v
------
3
3.0
3.00
(3 rows)
dev=# select * from t_d group by v;
v
---
3
(1 row)
dev=# delete from t_d;
DELETE 3
dev=# insert into t_d values (3.0), (3.00);
INSERT 0 2
dev=# select * from t_d group by v;
v
-----
3.0
(1 row)
dev=# delete from t_d;
DELETE 2
dev=# insert into t_d values (3), (3.00);
INSERT 0 2
dev=# select * from t_d group by v;
v
---
3
(1 row)
dev=# delete from t_d;
DELETE 2
dev=# insert into t_d values (3.00), (3);
INSERT 0 2
dev=# select * from t_d group by v;
v
------
3.00
(1 row)
LGTM. It seems that some other bugs are exposed after this PR 🤣
Looks like PostgreSQL always picks the value of the first row. For the sake of determinism in our system, I guess it's reasonable for us to always output the normalized form.
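A minimal sketch of what "always output the normalized form" could mean for a group key. The `Dec` type, its fields, and `group_count` are hypothetical names invented for this example and are not the decimal type used in the codebase; the sketch only assumes a decimal stored as a mantissa and a scale.

```rust
use std::collections::HashMap;

/// Hypothetical decimal: value = mantissa / 10^scale.
/// `3`, `3.0`, and `3.00` are (3, 0), (30, 1), and (300, 2).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Dec {
    mantissa: i128,
    scale: u32,
}

impl Dec {
    /// Strips trailing zeros so that numerically equal values share
    /// a single representation.
    fn normalize(self) -> Dec {
        let Dec { mut mantissa, mut scale } = self;
        if mantissa == 0 {
            return Dec { mantissa: 0, scale: 0 };
        }
        while scale > 0 && mantissa % 10 == 0 {
            mantissa /= 10;
            scale -= 1;
        }
        Dec { mantissa, scale }
    }
}

/// Groups rows by their normalized value and counts rows per group.
/// The output key does not depend on which row arrived first.
fn group_count(rows: &[Dec]) -> HashMap<Dec, usize> {
    let mut groups = HashMap::new();
    for row in rows {
        *groups.entry(row.normalize()).or_insert(0) += 1;
    }
    groups
}
```

Under this scheme the three inserts from the psql session above collapse into one group whose key is always `3`, regardless of insertion order.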
Force-pushed from 7de6c67 to 15fdcac.
Codecov Report
@@ Coverage Diff @@
## main #9671 +/- ##
==========================================
- Coverage 71.02% 71.01% -0.01%
==========================================
Files 1241 1242 +1
Lines 208154 208091 -63
==========================================
- Hits 147844 147785 -59
+ Misses 60310 60306 -4
... and 4 files with indirect coverage changes
LGTM!
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
See #9659 for the background of this PR. Almost rewrite the implementation of `HashKey` and unify the implementation into a single struct with generics.

The motivation for this large refactor is...
- `Eq` `Ord`
- `HashKeySerde` (but only used and implemented for `Copy` scalar types), let's extend it and make it the source of truth of hash key encoding
- `ScalarRef` won't compile for non-`Copy` scalar types; also for efficiency, do not return a `Vec` and copy it again for variant length types: let's directly write into the `BufMut`: so we need to redesign this trait
- `FixedSizeKey` and `SerializedKey`, and the only difference is the memory backend (stack array vs heap vector), let's extract it.

That's it. 😇
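The last two points (write directly into a buffer, and extract the memory backend) can be sketched as follows. This is only an illustration under stated assumptions: the trait names `KeyStorage` and `HashKeySer`, the `build` constructor, and the byte layout are all hypothetical, the null bitmap parameter `B` is omitted, and the real code writes through `BufMut` rather than a hand-written trait.

```rust
/// Hypothetical storage backend for the serialized key bytes.
trait KeyStorage {
    fn empty() -> Self;
    fn put_slice(&mut self, src: &[u8]);
    fn as_slice(&self) -> &[u8];
}

/// Stack backend: fixed-size array plus current length.
struct StackStorage<const N: usize> {
    data: [u8; N],
    len: usize,
}

impl<const N: usize> KeyStorage for StackStorage<N> {
    fn empty() -> Self {
        Self { data: [0; N], len: 0 }
    }
    fn put_slice(&mut self, src: &[u8]) {
        let end = self.len + src.len();
        // Slice indexing panics if the key does not fit into N bytes.
        self.data[self.len..end].copy_from_slice(src);
        self.len = end;
    }
    fn as_slice(&self) -> &[u8] {
        &self.data[..self.len]
    }
}

/// Heap backend: growable vector.
struct HeapStorage(Vec<u8>);

impl KeyStorage for HeapStorage {
    fn empty() -> Self {
        HeapStorage(Vec::new())
    }
    fn put_slice(&mut self, src: &[u8]) {
        self.0.extend_from_slice(src);
    }
    fn as_slice(&self) -> &[u8] {
        &self.0
    }
}

/// Hypothetical serialization trait: values write themselves straight
/// into the storage instead of returning an intermediate `Vec`.
trait HashKeySer {
    fn serialize_into<S: KeyStorage>(&self, buf: &mut S);
}

impl HashKeySer for i32 {
    fn serialize_into<S: KeyStorage>(&self, buf: &mut S) {
        buf.put_slice(&self.to_le_bytes());
    }
}

impl HashKeySer for &str {
    fn serialize_into<S: KeyStorage>(&self, buf: &mut S) {
        // Length prefix, then the raw bytes (layout chosen for this sketch).
        buf.put_slice(&(self.len() as u32).to_le_bytes());
        buf.put_slice(self.as_bytes());
    }
}

/// One key struct, generic over the memory backend.
struct HashKeyImpl<S: KeyStorage> {
    storage: S,
}

impl<S: KeyStorage> HashKeyImpl<S> {
    fn build<T: HashKeySer>(cols: &[T]) -> Self {
        let mut storage = S::empty();
        for col in cols {
            col.serialize_into(&mut storage);
        }
        Self { storage }
    }
    fn bytes(&self) -> &[u8] {
        self.storage.as_slice()
    }
}

type FixedSizeKey<const N: usize> = HashKeyImpl<StackStorage<N>>;
type SerializedKey = HashKeyImpl<HeapStorage>;
```

In this sketch the two aliases share all serialization logic and differ only in where the bytes live, which is the property the description is after.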

In the meantime, this PR tweaks the performance carefully to avoid unnecessary boxing or `match` as much as possible, while also fixing some bad patterns. So we get performance improvement on most micro-benchmark cases.

But it's still worth noting that #9305 needs to be totally redone since hash encoding is not value encoding. Let's do this in the next PRs.

Close #9659.
Checklist For Contributors
- `./risedev check` (or alias, `./risedev c`)

Checklist For Reviewers