-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix bug in nested v4 format merger from refactoring #14053
fix bug in nested v4 format merger from refactoring #14053
Conversation
if (!(format instanceof NestedCommonFormatColumn.Format) | ||
&& !(format instanceof NestedDataComplexTypeSerde.NestedColumnFormatV4)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: !(format instanceof NestedCommonFormatColumn.Format || format instanceof NestedDataComplexTypeSerde.NestedColumnFormatV4)
is perhaps a bit more easier to understand the intentions of for this.
) | ||
) | ||
); | ||
// still merge it since that follows the normal path of persist then merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "still merge" implies that this comment is referring to a change. A new reader is not going to know what that change is... It looks like you are exercising the behavior of what happens when it reads back over the segment to persist a new one? Perhaps changing this to
Do a merge, which will do yet another persist and load again to validate that the behavior of writing and then
reading still does good things
This is gonna have performance implications for test run times too, I fear. But, if we only ever do this once for each data set that we are indexing, it shouldn't be good expensive...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is gonna have performance implications for test run times too, I fear. But, if we only ever do this once for each data set that we are indexing, it shouldn't be good expensive...
yeah, I was concerned about that, but it looks like it hasn't made the (already terrible) processing tests times any worse, so I think its worth it because of the extra coverage of ensuring both indexable adapters are flexed when building test data segments (and more closely matches current ingest task behavior)
Description
Fixes a regression when ingesting 'v4' nested format columns caused by shuffling around some stuff when refactoring during review of #14014. I realized that I forgot to switch some of the tests back to using the v4 format, so Ive swapped the 'tsv' format tests to go back to using 'json' instead of 'auto' to ingest the test data (there are no arrays in that data so there is no difference in functional behavior between v4 and the new common format).
This PR has: