fix bug in nested v4 format merger from refactoring #14053

clintropolis · 2023-04-10T03:19:54Z

Description

Fixes a regression when ingesting 'v4' nested format columns, introduced when some code was moved around while refactoring during review of #14014. I realized that I forgot to switch some of the tests back to using the v4 format, so I've swapped the 'tsv' format tests back to using 'json' instead of 'auto' to ingest the test data (there are no arrays in that data, so there is no functional difference between v4 and the new common format).

This PR has:

been self-reviewed.
added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
been tested in a test Druid cluster.

imply-cheddar · 2023-04-10T06:46:38Z

processing/src/main/java/org/apache/druid/segment/QueryableIndexIndexableAdapter.java

+    if (!(format instanceof NestedCommonFormatColumn.Format)
+        && !(format instanceof NestedDataComplexTypeSerde.NestedColumnFormatV4)) {


nit: !(format instanceof NestedCommonFormatColumn.Format || format instanceof NestedDataComplexTypeSerde.NestedColumnFormatV4)

is perhaps a bit easier to read when trying to understand the intent here.
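For readers comparing the two forms, here is a minimal sketch showing that they are logically equivalent by De Morgan's law. The classes `CommonFormat`, `V4Format`, and `OtherFormat` are hypothetical stand-ins for illustration only; they are not the actual Druid `NestedCommonFormatColumn.Format` or `NestedDataComplexTypeSerde.NestedColumnFormatV4` types.

```java
public class InstanceofCheckDemo {
  // Hypothetical stand-ins for the column format types named in the diff.
  interface ColumnFormat {}
  static final class CommonFormat implements ColumnFormat {}
  static final class V4Format implements ColumnFormat {}
  static final class OtherFormat implements ColumnFormat {}

  // Form used in the diff: two negated checks joined with &&.
  static boolean rejectsAsWritten(ColumnFormat format) {
    return !(format instanceof CommonFormat) && !(format instanceof V4Format);
  }

  // Form suggested in the review: one negation around an || of both checks.
  static boolean rejectsAsSuggested(ColumnFormat format) {
    return !(format instanceof CommonFormat || format instanceof V4Format);
  }

  public static void main(String[] args) {
    // null is included because instanceof evaluates to false for null.
    ColumnFormat[] samples = {new CommonFormat(), new V4Format(), new OtherFormat(), null};
    for (ColumnFormat f : samples) {
      String name = (f == null) ? "null" : f.getClass().getSimpleName();
      System.out.println(name + ": " + rejectsAsWritten(f) + " / " + rejectsAsSuggested(f));
    }
  }
}
```

Both methods return the same value for every input, so the choice between them is purely about readability.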

imply-cheddar · 2023-04-10T06:51:26Z

processing/src/test/java/org/apache/druid/segment/IndexBuilder.java

+              )
+          )
+      );
+      // still merge it since that follows the normal path of persist then merge


nit: "still merge" implies that this comment is referring to a change. A new reader is not going to know what that change is... It looks like you are exercising what happens when it reads back over the segment to persist a new one? Perhaps change this to:

Do a merge, which will do yet another persist and load again to validate that the behavior of writing and then
reading still does good things

This is gonna have performance implications for test run times too, I fear. But, if we only ever do this once for each data set that we are indexing, it shouldn't be too expensive...

This is gonna have performance implications for test run times too, I fear. But, if we only ever do this once for each data set that we are indexing, it shouldn't be too expensive...

yeah, I was concerned about that, but it looks like it hasn't made the (already terrible) processing test times any worse, so I think it's worth it for the extra coverage: it ensures both indexable adapters are exercised when building test data segments, and it more closely matches current ingest task behavior.

fix bug in nested v4 format merger from refactoring

99b702e

clintropolis added the Bug label Apr 10, 2023

clintropolis added this to the 26.0 milestone Apr 10, 2023

clintropolis added the Area - Segment Format and Ser/De label Apr 10, 2023

imply-cheddar approved these changes Apr 10, 2023

clintropolis added 2 commits April 10, 2023 14:18

Merge remote-tracking branch 'upstream/master' into fix-nested-merge-npe

8bbd2f2

adjust

9e4b931

clintropolis merged commit d61bd7f into apache:master Apr 11, 2023

clintropolis deleted the fix-nested-merge-npe branch April 11, 2023 03:39

clintropolis mentioned this pull request Apr 11, 2023

[Backport] fix bug in nested v4 format merger from refactoring #14060

Merged

clintropolis added a commit to clintropolis/druid that referenced this pull request Apr 11, 2023

fix bug in nested v4 format merger from refactoring (apache#14053)

ecf312a

clintropolis added a commit that referenced this pull request Apr 13, 2023

fix bug in nested v4 format merger from refactoring (#14053) (#14060)

fe7cb2e
