Manage group values and states by blocks in aggregation #11931
Comments
take
Is the plan to manage group values and states in two different kinds of blocks, or in a unified block? There are so many good optimizations in the aggregation code now that the implementation is a bit hard to understand. I was thinking that managing group values + states in a single block could also be a good cleanup.
Plan to make them two kinds of blocks, because The design is similar as #7065, but introduce it into The reason why doing it is according to the cpu flamegraph, the growing of the big single
The sketch's detailed design
1. When will the blocked method be triggered?
2. Introduce new emit modes used in the blocked method
It can support emitting multiple blocks in
To allow incremental development of the blocked method across the many detailed GroupValues and GroupAccumulator impls, this sketch PR does a lot of compatibility work, and the following combinations are allowed:
3. Introduce
I think emitting with "n" blocks is much more straightforward: with n = 3 and block size = 4, emit 3 * 4 = 12 elements.
It does indeed seem clearer! I have switched to this in the code.
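The "emit n blocks" arithmetic above can be illustrated with a minimal sketch. It assumes groups are stored as a `Vec<Vec<T>>` of full blocks; the function name `emit_first_n_blocks` is hypothetical and is not taken from the PR or from DataFusion's API.

```rust
/// Hypothetical helper (not DataFusion API): removes the first `n` blocks
/// and returns their values flattened in their original order.
/// If fewer than `n` blocks exist, all of them are emitted.
fn emit_first_n_blocks<T>(blocks: &mut Vec<Vec<T>>, n: usize) -> Vec<T> {
    // Clamp so that `drain` never goes out of bounds.
    let n = n.min(blocks.len());
    // Whole blocks are handed over; no element-by-element shifting inside a block.
    blocks.drain(..n).flatten().collect()
}
```

With five full blocks of size 4, calling this with n = 3 hands back 3 * 4 = 12 elements and leaves two blocks behind.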
@alamb The general design can be seen here: And the PR is here:
Is your feature request related to a problem or challenge?
Currently we manage the group values and the aggregation states in a single big vector that grows constantly.
This solution is simple to implement, but it leads to some extra CPU cost according to the CPU profile.
Maybe we should manage them by blocks, like DuckDB.
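To make the contrast with a single growing vector concrete, here is a minimal sketch of block-based storage. It is an illustration under assumptions, not the design from the PR: the type `Blocks`, its fields, and the flat-index scheme (`block_id * block_size + offset`) are all hypothetical.

```rust
/// Hypothetical blocked container (not DataFusion API). Values live in
/// fixed-capacity blocks, so growing allocates one new block instead of
/// reallocating and copying a single big vector.
struct Blocks<T> {
    block_size: usize,
    blocks: Vec<Vec<T>>,
}

impl<T> Blocks<T> {
    fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be positive");
        Self { block_size, blocks: Vec::new() }
    }

    /// Appends a value and returns its flat index.
    fn push(&mut self, value: T) -> usize {
        // Start a new block when there is none yet or the last one is full.
        let need_new = self
            .blocks
            .last()
            .map_or(true, |b| b.len() == self.block_size);
        if need_new {
            self.blocks.push(Vec::with_capacity(self.block_size));
        }
        let block_id = self.blocks.len() - 1;
        let block = &mut self.blocks[block_id];
        block.push(value);
        block_id * self.block_size + block.len() - 1
    }

    /// Looks a value up by flat index: block id is `index / block_size`,
    /// offset inside the block is `index % block_size`.
    fn get(&self, index: usize) -> Option<&T> {
        self.blocks
            .get(index / self.block_size)?
            .get(index % self.block_size)
    }
}
```

Only the small outer vector of block handles ever reallocates here; blocks that are already filled stay where they are.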
Describe the solution you'd like
It may be a lot of work, so I want to finish it through the following steps:
1. GroupValuesRows.
2. group values management in other GroupValues impls.
3. states management in different GroupAccumulator impls.
The general design is similar to #7065, but introduces it into GroupValues, not only GroupAccumulators.
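As a rough illustration of what blocked states management could look like on the accumulator side, the sketch below keeps per-group sums in fixed-size blocks and routes each group index to a block and an offset. Everything in it is an assumption made for illustration: `BlockedSums` and its methods are hypothetical and only loosely mirror the shape of an `update_batch`-style call; they are not the GroupAccumulator trait.

```rust
/// Hypothetical blocked state for a SUM-like accumulator (not DataFusion API).
struct BlockedSums {
    block_size: usize,
    blocks: Vec<Vec<i64>>,
}

impl BlockedSums {
    fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be positive");
        Self { block_size, blocks: Vec::new() }
    }

    /// Ensures zero-initialized slots exist for `total_groups` groups.
    /// New slots fill the last block first, then new blocks are appended;
    /// existing blocks are never copied.
    fn resize(&mut self, total_groups: usize) {
        let mut have: usize = self.blocks.iter().map(Vec::len).sum();
        while have < total_groups {
            let need_new = self
                .blocks
                .last()
                .map_or(true, |b| b.len() == self.block_size);
            if need_new {
                self.blocks.push(Vec::with_capacity(self.block_size));
            }
            let block_size = self.block_size;
            let last = self.blocks.last_mut().expect("a block was just ensured");
            let add = (block_size - last.len()).min(total_groups - have);
            last.resize(last.len() + add, 0);
            have += add;
        }
    }

    /// Adds each value to the sum of its group.
    /// Panics if a group index is not below `total_groups`.
    fn update_batch(&mut self, values: &[i64], group_indices: &[usize], total_groups: usize) {
        assert_eq!(values.len(), group_indices.len());
        self.resize(total_groups);
        for (&v, &g) in values.iter().zip(group_indices) {
            self.blocks[g / self.block_size][g % self.block_size] += v;
        }
    }

    fn get(&self, group: usize) -> Option<i64> {
        self.blocks
            .get(group / self.block_size)?
            .get(group % self.block_size)
            .copied()
    }
}
```

Because group values and states would share the same block size in this sketch, a group index resolves to the same block id and offset in both.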
Describe alternatives you've considered
No response
Additional context
The cpu cost flamegraph:
https://github.com/Rachelint/drawio-store/blob/main/cpucosts0811.png