Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix bug in nested v4 format merger from refactoring #14053

Merged
merged 3 commits into from
Apr 11, 2023

Conversation

clintropolis
Copy link
Member

Description

Fixes a regression when ingesting 'v4' nested format columns caused by shuffling around some stuff when refactoring during review of #14014. I realized that I forgot to switch some of the tests back to using the v4 format, so Ive swapped the 'tsv' format tests to go back to using 'json' instead of 'auto' to ingest the test data (there are no arrays in that data so there is no difference in functional behavior between v4 and the new common format).

This PR has:

  • been self-reviewed.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

Comment on lines 179 to 180
if (!(format instanceof NestedCommonFormatColumn.Format)
&& !(format instanceof NestedDataComplexTypeSerde.NestedColumnFormatV4)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: !(format instanceof NestedCommonFormatColumn.Format || format instanceof NestedDataComplexTypeSerde.NestedColumnFormatV4)

is perhaps a bit more easier to understand the intentions of for this.

)
)
);
// still merge it since that follows the normal path of persist then merge
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "still merge" implies that this comment is referring to a change. A new reader is not going to know what that change is... It looks like you are exercising the behavior of what happens when it reads back over the segment to persist a new one? Perhaps changing this to

Do a merge, which will do yet another persist and load again to validate that the behavior of writing and then
reading still does good things

This is gonna have performance implications for test run times too, I fear. But, if we only ever do this once for each data set that we are indexing, it shouldn't be good expensive...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is gonna have performance implications for test run times too, I fear. But, if we only ever do this once for each data set that we are indexing, it shouldn't be good expensive...

yeah, I was concerned about that, but it looks like it hasn't made the (already terrible) processing tests times any worse, so I think its worth it because of the extra coverage of ensuring both indexable adapters are flexed when building test data segments (and more closely matches current ingest task behavior)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants