-
Notifications
You must be signed in to change notification settings - Fork 411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support comparison semantics for batch serialize/deserialize of Column #9756
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
a311faa
to
eeb1ac3
Compare
Signed-off-by: guo-shaoge <[email protected]>
eeb1ac3
to
c07d13a
Compare
e19c851
to
7c725cb
Compare
Signed-off-by: guo-shaoge <[email protected]>
7c725cb
to
84ee65b
Compare
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
/retest |
Signed-off-by: guo-shaoge <[email protected]>
3878b4b
to
47cdf91
Compare
} | ||
else | ||
{ | ||
inline_memcpy(pos[i], &data[array_offsets[start + i - 1]], len * sizeof(T)); | ||
if (len <= 4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some comments here to explain why is 4 here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not my code. @gengliqi can you help to explain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just a simple optimization. If the length is very small, copying them one by one is faster than std::memcpy.
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: yibin87 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
38e03ed
to
8070321
Compare
06ff1d4
to
7572680
Compare
} | ||
else | ||
{ | ||
inline_memcpy(pos[i], &data[array_offsets[start + i - 1]], len * sizeof(T)); | ||
if (len <= 4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just a simple optimization. If the length is very small, copying them one by one is faster than std::memcpy.
dbms/src/Columns/ColumnString.cpp
Outdated
{ | ||
assert(sizeAt(i) >= 1); | ||
// Minus 1 because of terminating zero. | ||
byte_size[i] += sizeof(UInt32) + (sizeAt(i) - 1) * max_bytes_one_char; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size * max_bytes_one_char
may waste lots of memory. For example, a utf8 character is 3 bytes, max_bytes_one_char
is 4 bytes. 4 bytes is enough but here need 12 bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
dbms/src/Columns/ColumnString.cpp
Outdated
const void * src = &chars[offsetAt(start + i)]; | ||
if constexpr (has_collator) | ||
{ | ||
auto sort_key = collator->sortKey(reinterpret_cast<const char *>(src), str_size - 1, *sort_key_container); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sortKey
is a virtual function. How about adding a batch version to reduce the overhead of virtual functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add some marcos so we can static_cast collator to Derived class and call method in specific Derived class.
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
7572680
to
25406de
Compare
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
@guo-shaoge: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What problem does this PR solve?
Issue Number: close #9761
Problem Summary: #9553 add serialize/deserialize interface by column-wise, but it didn't handle collator for ColumnString and real copy format for ColumnDecimal([sign, limb_count, limb_data])
What is changed and how it works?
Check List
Tests
Side effects
Documentation
Release note