fix: CompressionTrees diverge from the actual array children #1430

Merged
merged 9 commits into from
Nov 21, 2024

@lwwmanning lwwmanning commented Nov 21, 2024

Fixed the following misalignments:

  • Sparse compressor would try to compress both indices and values using like.child(0)
  • FoR compressor would not return an FoR array for constant-0
  • Chunked compressor had entirely the wrong children in its CompressionTree
  • FSST compressor would compress the varbin offsets but didn't reflect it in the tree
  • Dict compressor switched codes and values: we put codes first and values second everywhere except in the DictArray children themselves, where they were at indexes 1 and 0 respectively. I switched the DictArray children, which is a backcompat break, but figured we're better off with the "natural" ordering that people seemingly expect and that we use everywhere else in the code base
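To make the Dict ordering issue concrete, here is a minimal sketch, assuming a toy model rather than Vortex's real types. `Child`, `dict_encode`, and `plan_fits` are hypothetical names invented for illustration. It shows how a per-child plan written as "codes first, values second" stops fitting as soon as the children are stored the other way round.

```rust
// Hypothetical model for illustration only; not Vortex's DictArray.
#[derive(Debug, PartialEq)]
enum Child {
    Codes(Vec<u32>),
    Values(Vec<String>),
}

/// Toy dictionary encoding that returns its children as [codes, values].
fn dict_encode(input: &[&str]) -> Vec<Child> {
    let mut values: Vec<String> = Vec::new();
    let mut codes: Vec<u32> = Vec::with_capacity(input.len());
    for &s in input {
        // Reuse the code of a value we have already seen, else append it.
        let code = match values.iter().position(|v| v.as_str() == s) {
            Some(existing) => existing,
            None => {
                values.push(s.to_string());
                values.len() - 1
            }
        };
        codes.push(code as u32);
    }
    vec![Child::Codes(codes), Child::Values(values)]
}

/// Hypothetical per-child plan check: an "integer" compressor only fits
/// codes, and a "string" compressor only fits values.
fn plan_fits(plan: &[&str], children: &[Child]) -> bool {
    plan.len() == children.len()
        && plan.iter().zip(children).all(|(compressor, child)| {
            matches!(
                (*compressor, child),
                ("integer", Child::Codes(_)) | ("string", Child::Values(_))
            )
        })
}
```

With children in [codes, values] order, the plan `["integer", "string"]` fits; swapping the two children makes the same plan point each compressor at the wrong child.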

Also improved the CompressionTree display impl to use TreeDisplay, which is much nicer & more useful for debugging. And I removed the panic_in_result_fn lint, since we ban panic, but assertions are in fact quite nice sometimes.
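As a rough illustration of why an indented tree rendering helps when debugging, here is a small sketch of a `Display` impl that prints one node per line. `TreeSketch` is a hypothetical stand-in, and this is not the TreeDisplay API itself, only the general idea.

```rust
use std::fmt;

// Hypothetical stand-in for a compression tree; not Vortex's CompressionTree.
struct TreeSketch {
    compressor: &'static str,
    // `None` marks a child slot with no compression plan.
    children: Vec<Option<TreeSketch>>,
}

impl TreeSketch {
    fn fmt_indented(&self, f: &mut fmt::Formatter<'_>, depth: usize) -> fmt::Result {
        // Two spaces of indentation per level of nesting.
        writeln!(f, "{}{}", "  ".repeat(depth), self.compressor)?;
        for child in &self.children {
            match child {
                Some(tree) => tree.fmt_indented(f, depth + 1)?,
                None => writeln!(f, "{}<none>", "  ".repeat(depth + 1))?,
            }
        }
        Ok(())
    }
}

impl fmt::Display for TreeSketch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        self.fmt_indented(f, 0)
    }
}
```

Printing a tree this way makes the position of every child visible at a glance, which is what you need when checking it against an array's children.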

@lwwmanning lwwmanning enabled auto-merge (squash) November 21, 2024 15:19
@@ -1,8 +1,8 @@
mod tokio_runtime;

use core::cell::LazyCell;
Contributor

@AdamGS AdamGS Nov 21, 2024

getting ready for non_std Vortex?

}
EitherOrBoth::Left(Some(child_tree)) => {
vortex_panic!(
"Child tree without child array!!\nroot tree: {}\nroot array: {}\nlocal tree: {path}\nlocal array: {}\nproblematic child_tree: {child_tree}",
Member Author

I chose to panic rather than making this fallible because this is definitely programmer error (and also, we explore ~all possible permutations of encoding trees in the unit test suite)
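The check described here can be sketched roughly as follows, assuming simplified stand-in types. `ArraySketch`, `TreeSketch`, `find_orphan`, and `assert_aligned` are hypothetical names, and the real code uses `vortex_panic!` with a richer message. The walk pairs each tree child with the array child at the same index and treats a tree child with no array child as a programmer error.

```rust
// Hypothetical stand-ins for illustration; not Vortex's actual types.
struct ArraySketch {
    encoding: &'static str,
    children: Vec<ArraySketch>,
}

struct TreeSketch {
    compressor: &'static str,
    // `None` means there is no plan for the child at that index.
    children: Vec<Option<TreeSketch>>,
}

/// Returns the path of the first tree child that has no matching array child.
fn find_orphan(tree: &TreeSketch, array: &ArraySketch, path: &str) -> Option<String> {
    for (idx, child_tree) in tree.children.iter().enumerate() {
        // An empty slot can never diverge from the array.
        let Some(child_tree) = child_tree else { continue };
        let child_path = format!("{path}.{idx}:{}", child_tree.compressor);
        match array.children.get(idx) {
            None => return Some(child_path),
            Some(child_array) => {
                if let Some(found) = find_orphan(child_tree, child_array, &child_path) {
                    return Some(found);
                }
            }
        }
    }
    None
}

/// Panics instead of returning an error: a divergence is a bug in a
/// compressor, not a condition callers are expected to recover from.
fn assert_aligned(tree: &TreeSketch, array: &ArraySketch) {
    if let Some(path) = find_orphan(tree, array, tree.compressor) {
        panic!(
            "Child tree without child array at {path} (root array encoding: {})",
            array.encoding
        );
    }
}
```

Keeping the search (`find_orphan`) separate from the panic (`assert_aligned`) is a choice made for this sketch so the divergence path is easy to test; the PR itself panics directly inside the traversal.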

@lwwmanning
Member Author

this is split off from #1068

@robert3005
Member

@lwwmanning need to look deeper but the whole idea of compression tree was that you could return a different array than the compressor. The compression tree would send you back to that path

@lwwmanning
Member Author

> @lwwmanning need to look deeper but the whole idea of compression tree was that you could return a different array than the compressor. The compression tree would send you back to that path

we only did that in the FoR compressor for constant 0, and I'm not sure that's right (i.e., if it's constant 0, the Constant compressor should win over the FoR compressor with a Constant child in this new implementation, rather than saying that we should encode-like with the FoRCompressor)

@lwwmanning
Member Author

also, of the 5 things I changed, I'd argue all except the FoR one are clearly bugs

@robert3005
Member

If we got to the FoR compressor, then that means the array isn't constant. Either stats are missing or it's not constant. The not-constant case is likely sparse 0s

@lwwmanning
Member Author

> If we got to the FoR compressor, then that means the array isn't constant. Either stats are missing or it's not constant. The not-constant case is likely sparse 0s

Yep, and in that case we should do Frequency encoding, not FoR
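For readers unfamiliar with frame-of-reference (FoR) encoding, here is a minimal sketch of the general technique, not Vortex's FoR compressor. `for_encode` and `bits_needed` are hypothetical helper names. It illustrates why FoR gains nothing when the minimum is already 0, as with sparse zeros: the residuals are just the original values.

```rust
/// Frame-of-reference: store the minimum once and keep every value as an
/// unsigned offset from it. Returns `None` for an empty input.
fn for_encode(values: &[i64]) -> Option<(i64, Vec<u64>)> {
    let reference = *values.iter().min()?;
    // wrapping_sub followed by the cast gives the true non-negative offset
    // even when the value range exceeds i64::MAX.
    let residuals = values
        .iter()
        .map(|v| v.wrapping_sub(reference) as u64)
        .collect();
    Some((reference, residuals))
}

/// Bits per value needed to bit-pack the residuals.
fn bits_needed(residuals: &[u64]) -> u32 {
    residuals
        .iter()
        .max()
        .map_or(0, |max| u64::BITS - max.leading_zeros())
}
```

For values clustered far from zero, such as `[1000, 1001, 1003]`, the residuals `[0, 1, 3]` need only 2 bits each. For mostly-zero data such as `[0, 0, 0, 7, 0]` the reference is 0 and the residuals are unchanged, which is consistent with the suggestion that a different encoding suits that case better.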

@gatesn
Contributor

gatesn commented Nov 21, 2024

> but assertions are in fact quite nice sometimes.

😱

@lwwmanning lwwmanning merged commit f052dd7 into develop Nov 21, 2024
14 checks passed
@lwwmanning lwwmanning deleted the wm/compression-tree branch November 21, 2024 19:38