Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnest functionality for Druid #13268

Merged
merged 38 commits into from
Dec 3, 2022
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
Show all changes
38 commits
Select commit Hold shift + click to select a range
4e686a8
Moving all unnest cursor code atop refactored code for unnest
somu-imply Oct 27, 2022
f354dd7
Updating unnest cursor
somu-imply Oct 28, 2022
f9ac767
Removing dedup and fixing up some null checks
somu-imply Oct 28, 2022
9f98b12
AllowList changes
somu-imply Oct 31, 2022
31612fc
Fixing some NPEs
somu-imply Nov 1, 2022
035c423
Using bitset for allowlist
somu-imply Nov 1, 2022
6a9b0d7
Updating the initialization only when cursor is in non-done state
somu-imply Nov 2, 2022
ba55890
Updating code to skip rows not in allow list
somu-imply Nov 3, 2022
7333da5
Adding a flag for cases when first element is not in allowed list
somu-imply Nov 3, 2022
90073f6
Updating for a null in allowList
somu-imply Nov 4, 2022
f6fc1aa
Splitting unnest cursor into 2 subclasses
somu-imply Nov 5, 2022
bbc66f5
Intercepting some apis with columnName for new unnested column
somu-imply Nov 5, 2022
e146ebb
Adding test cases and renaming some stuff
somu-imply Nov 6, 2022
bb22a92
checkstyle fixes
somu-imply Nov 6, 2022
1908242
Moving to an interface for Unnest
somu-imply Nov 6, 2022
3de1161
handling null rows in a dimension
somu-imply Nov 6, 2022
d1a884a
Updating cursors after comments part-1
somu-imply Nov 16, 2022
401a9b2
Addressing comments and adding some more tests
somu-imply Nov 16, 2022
bce6ffe
Reverting a change to ScanQueryRunner and improving a comment
somu-imply Nov 17, 2022
240afe2
removing an unused function
somu-imply Nov 17, 2022
9821c53
Updating cursors after comments part 2
somu-imply Nov 17, 2022
b7ab781
Merge remote-tracking branch 'upstream/master' into unnest_v2
somu-imply Nov 17, 2022
5fd3dd7
One last fix for review comments
somu-imply Nov 18, 2022
576dbcc
Making some functions private, deleting some comments, adding a test …
somu-imply Nov 18, 2022
e56b0b2
Adding an exception for a case
somu-imply Nov 29, 2022
bb66e59
Closure for unnest data source
somu-imply Nov 30, 2022
8f71b81
Adding some javadocs
somu-imply Nov 30, 2022
e312f91
One minor change in makeDimSelector of columnarCursor
somu-imply Nov 30, 2022
0e3ede4
Updating an error message
somu-imply Nov 30, 2022
edff9cd
Update processing/src/main/java/org/apache/druid/segment/DimensionUnn…
somu-imply Nov 30, 2022
321536f
Unnesting on virtual columns was missing an object array, adding that…
somu-imply Dec 1, 2022
5e8c38a
Updating exceptions to use UOE
somu-imply Dec 1, 2022
2ecf23b
Merge remote-tracking branch 'upstream/master' into unnest_v2
somu-imply Dec 1, 2022
81b674a
Renamed files, added column capability test on adapter, return statem…
somu-imply Dec 2, 2022
bd467dc
Handling for null values in dim selector
somu-imply Dec 2, 2022
30c2897
Fixing a NPE for null row
somu-imply Dec 2, 2022
44c9955
Updating capabilities
somu-imply Dec 2, 2022
5659760
Updating capabilities
somu-imply Dec 2, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,15 @@
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

/**
* The data source for representing an unnest operation.
*
* An unnest data source has the following:
* a base data source which is to be unnested
* the column name of the MVD which will be unnested
* the name of the column that will hold the unnested values
* and an allowlist serving as a filter of which values in the MVD will be unnested.
*/
public class UnnestDataSource implements DataSource
{
private final DataSource base;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,31 @@
import java.util.LinkedHashSet;
import java.util.List;

/**
* The cursor to help unnest MVDs without dictionary encoding.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't specific to multi-value dimensions, since this also handles ARRAY typed selectors

* Consider a segment has 2 rows
* ['a', 'b', 'c']
* ['d', 'e']
*
* The baseCursor points to the row ['a', 'b', 'c']
* while the unnestCursor with each call of advance() moves over individual elements.
*
* unnestCursor.advance() -> 'a'
* unnestCursor.advance() -> 'b'
* unnestCursor.advance() -> 'c'
* unnestCursor.advance() -> 'd' (advances base cursor first)
* unnestCursor.advance() -> 'e'
*
*
* The allowSet if available helps skip over elements which are not in the allowList by moving the cursor to
* the next available match.
*
* The index reference points to the index of each row that the unnest cursor is accessing through currentVal
* The index ranges from 0 to the size of the list in each row which is held in the unnestListForCurrentRow
*
* The needInitialization flag sets up the initial values of unnestListForCurrentRow at the beginning of the segment
*
*/
public class ColumnarValueUnnestCursor implements Cursor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some javadocs here about this class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added javadocs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nitpick, but why not just call this thing what it is doing, e.g. UnnestColumnValueSelectorCursor? Same thing with the other one, UnnestDimensionSelectorCursor

{
private final Cursor baseCursor;
Expand Down Expand Up @@ -69,7 +94,11 @@ public ColumnSelectorFactory getColumnSelectorFactory()
@Override
public DimensionSelector makeDimensionSelector(DimensionSpec dimensionSpec)
{
throw new UnsupportedOperationException("Dimension selector not applicable for column value selector");
if (!outputName.equals(dimensionSpec.getDimension())) {
return baseColumSelectorFactory.makeDimensionSelector(dimensionSpec);
}
throw new UnsupportedOperationException(
"Dimension selector not applicable for column value selector for column " + outputName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More design nits:

  1. We have a UOE that bundles String.format into the building of the exception use that.
  2. We encase interpolated values in [] to help differentiate things like extra spaces.

}

@Override
Expand Down Expand Up @@ -203,6 +232,11 @@ public void reset()
baseCursor.reset();
}

/**
* This method populates the objects when the base cursor moves to the next row
*
* @param firstRun flag to populate one time object references to hold values for unnest cursor
*/
private void getNextRow(boolean firstRun)
{
currentVal = this.columnValueSelector.getObject();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,46 @@
import java.util.BitSet;
import java.util.LinkedHashSet;

/**
* The cursor to help unnest MVDs with dictionary encoding.
* Consider a segment has 2 rows
* ['a', 'b', 'c']
* ['d', 'c']
*
* Considering dictionary encoding, these are represented as
*
* 'a' -> 0
* 'b' -> 1
* 'c' -> 2
* 'd' -> 3
*
* The baseCursor points to the row of IndexedInts [0, 1, 2]
* while the unnestCursor with each call of advance() moves over individual elements.
*
* advance() -> 0 -> 'a'
* advance() -> 1 -> 'b'
* advance() -> 2 -> 'c'
* advance() -> 3 -> 'd' (advances base cursor first)
* advance() -> 2 -> 'c'
*
* Total 5 advance calls above
*
* The allowSet, if available, helps skip over elements that are not in the allowList by moving the cursor to
* the next available match. The hashSet is converted into a bitset (during initialization) for efficiency.
* If allowSet is ['c', 'd'] then the advance moves over to the next available match
*
* advance() -> 2 -> 'c'
* advance() -> 3 -> 'd' (advances base cursor first)
* advance() -> 2 -> 'c'
*
* Total 3 advance calls in this case
*
* The index reference points to the index of each row that the unnest cursor is accessing
* The indexedInts for each row are held in the indexedIntsForCurrentRow object
*
* The needInitialization flag sets up the initial values of indexedIntsForCurrentRow at the beginning of the segment
*
*/
public class DimensionUnnestCursor implements Cursor
{
private final Cursor baseCursor;
Expand Down Expand Up @@ -188,6 +228,9 @@ public IdLookup idLookup()
};
}

/*
This ideally should not be called. If called delegate using the makeDimensionSelector
*/
@Override
public ColumnValueSelector makeColumnValueSelector(String columnName)
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,10 @@
import java.util.LinkedHashSet;
import java.util.Optional;

/**
* The segment reference for the Unnest Data Source.
* The input column name, output name and the allowSet follow from {@link org.apache.druid.query.UnnestDataSource}
*/
public class UnnestSegmentReference implements SegmentReference
{
private static final Logger log = new Logger(UnnestSegmentReference.class);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,11 @@
import java.util.LinkedHashSet;
import java.util.Objects;

/**
* This class serves as the Storage Adapter for the Unnest Segment and is responsible for creating the cursors
* If the column is dictionary encoded it creates {@link DimensionUnnestCursor} else {@link ColumnarValueUnnestCursor}
* These cursors help navigate the segments for these cases
*/
public class UnnestStorageAdapter implements StorageAdapter
{
private final StorageAdapter baseAdapter;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,14 @@
import javax.annotation.Nullable;
import java.util.List;


/**
* A Cursor that iterates over a user created list.
* This is used to test the base cursor of an UnnestCursor.
* Usages can be found in tests of {@link ColumnarValueUnnestCursor} in {@link ColumnarValueUnnestCursorTest}
* However this cannot help with {@link DimensionUnnestCursor}.
* Tests for {@link DimensionUnnestCursor} are done alongside tests for {@link UnnestStorageAdapterTest}
*/
public class ListCursor implements Cursor
{
List<Object> baseList;
Expand Down