-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve aggregate performance with specialized groups accumulator for single string group by #7064
Comments
I think for single column group, we could "just"
anything I missed here? |
I think that would get most of the benefit. Another potential optimization would be to potentially use the 'small string optimization' so the hash table comparison could be done inline and have one level of indirection So in the hash table store not only the group_index but also another 12 bytes:
That way group key comparisons are faster because:
|
Here is a suggestion for a specific data structure for a similar idea from @Dandandan 👍 /// Contains hashes and offsets for given hash (+ potential collisions), use `RawTable` for extra speed
uniques: HashMap<u64, SmallVec<u64; 1>>,
/// actual string/byte data, can be emitted cheaply / free
values: BufferBuilder<u8>, |
If you use the raw_entry API on HashMap, or RawTable, you should be able to avoid needing the SmallVec to handle hash collisions - as they will be handled for you more efficiently by the hash probing setup. |
Yes, good addition 👍 |
This task seems interesting, if no one take on this, I would like to give it a try. Probably work on the follow up on #8721 first. Distinct Accumulator for Bytes |
I think that is a great idea |
I draft the idea, does this make sense? // Short String Optimizated HashSet for String
// Equivalent to HashSet<String> but with better memory usage (Speed unsure)
struct SSOStringHashSet {
// header: u128
// short string: length(4bytes) + data(12bytes)
// long string: length(4bytes) + prefix(4bytes) + offset(8bytes)
header_set: HashSet<u128, RandomState>,
// map<hash of long string w/o 4 bytes prefix, offset in buffer>
long_string_map: HashMap<u64, u64, RandomState>,
buffer: BufferBuilder<u8>,
}
impl SSOStringHashSet {
fn insert(&mut self, value: &str) {
let value_len = value.len();
if value_len <= 12 {
let mut short_string_header = 0u128;
short_string_header |= (value_len << 96) as u128;
short_string_header |= value
.as_bytes()
.iter()
.fold(0u128, |acc, &x| acc << 8 | x as u128);
self.header_set.insert(short_string_header);
} else {
// 1) hash the string w/o 4 bytes prefix
// 2) check if the hash exists in the map
// 3) if exists, insert the offset into the header
// 4) if not exists, insert the hash and offset into the map
let mut long_string_header = 0u128;
long_string_header |= (value_len << 96) as u128;
long_string_header |= (value
.as_bytes()
.iter()
.take(4)
.fold(0u128, |acc, &x| acc << 8 | x as u128)
<< 64) as u128;
let suffix = value
.as_bytes()
.iter()
.skip(4)
.collect::<Vec<_>>();
// NYI hash_bytes: hash &[u8] to u64, similar to hashbrown `make_hash` for &[u8]
let hashed_suffix = hash_bytes(suffix);
if let Some(offset) = self.long_string_map.get(&hashed_suffix) {
long_string_header |= *offset as u128;
} else {
let offset = self.buffer.len();
self.long_string_map.insert(hashed_suffix, offset as u64);
long_string_header |= offset as u128;
// convert suffix: Vec<&u8> to &[u8]
self.buffer.append_slice(suffix);
}
self.header_set.insert(long_string_header);
}
}
} |
I think that won't currently handle hash collisions, https://github.com/apache/arrow-rs/blob/master/arrow-array%2Fsrc%2Fbuilder%2Fgeneric_bytes_dictionary_builder.rs#L211 might provide some inspiration here |
If you want to try and use the short string optimization, it might make sense to create a struct to encapsulate the struct directly, perhaps something like enum StringKey {
// data length, in chars
u64: len,
// if the data length, in *bytes* is less than 8, stored as a u64 here,
// otherwise, this stores the offset into buffer
offset_or_inline: u64,
}
} Then you maybe your structure can look like struct StringKey {
// returns the data pointed at by this key (either inlined or in buffer, depending on self.length)
fn val(&self, buffer: &[u8]) -> &str {
...
}
} As well as various hashing, etc Then your struct could look something like // Short String Optimizated HashSet for String
// Equivalent to HashSet<String> but with better memory usage (Speed unsure)
struct SSOStringHashSet {
inner: HashSet<StringKey, RandomState>,
// map<hash of long string w/o 4 bytes prefix, offset in buffer>
long_string_map: HashMap<u64, u64, RandomState>,
buffer: BufferBuilder<u8>,
} |
I started hacking on a potential PR for this here: #8827 -- maybe we can collaborate @jayzhan211 . I need a little more to get it working in general, but then we'll have to implement the emit/clear functions too |
Is your feature request related to a problem or challenge?
DataFusion could be made faster for queries that have a
GROUP BY <string> column
For example, in ClickBench Q34
You can run this query from a datafusion checkout like this (using the code in #7060, which hopefully will be merged shortly):
Here is the profile from running 16 cores:
Describe the solution you'd like
I would like a special cased
GroupsValue
for this case of a single string (hopefully Utf8, LargeUTf8, Binary, and LargeBinary) column that:Other ideas that could make this faster:
"Small String optimization" refers to the format described in the umbra paper,
This would have to be adapted for Rust / safetly but the same general idea applies (inlining the first few bytes of the string into the hash table for quick "is it equal" comparisons, and then having an offset to an external area for larger strings)
Describe alternatives you've considered
No response
Additional context
@tustvold's changes in #6969 and #7043 should make it very easy to code this up as a different GroupValues implementation
The text was updated successfully, but these errors were encountered: