Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: handle loading of complex types into CometVector correctly in iceberg_compat scans #1279

Merged

Conversation

parthchandra
Copy link
Contributor

NativeColumnReader was reading all types into a CometPlainVector which is correct only for primitive types. The PR updates the code to use NativeUtil and follow the same logic as CometExecIterator.
This addresses 12 unit test failures involving ColumnarShuffle on struct types.

@parthchandra
Copy link
Contributor Author

The PR also replaces the use of ColumnDescriptor (again valid only for primitive types) with parquet.schema.Type in NativeBatchReader

// release the previous dictionary vector and create a new one.
Dictionary arrowDictionary = importer.getProvider().lookup(dictionaryEncoding.getId());
CometPlainVector dictionaryVector =
new CometPlainVector(arrowDictionary.getVector(), useDecimal128, isUuid);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is valid only for primitive types

@@ -266,7 +268,8 @@ public void init() throws URISyntaxException, IOException {

//// Create Column readers
List<ColumnDescriptor> columns = requestedSchema.getColumns();
int numColumns = columns.size();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This yields leaf fields which is not what we want.

@@ -469,19 +473,23 @@ private int loadNextBatch() throws Throwable {
importer = new CometSchemaImporter(ALLOCATOR);

List<ColumnDescriptor> columns = requestedSchema.getColumns();
for (int i = 0; i < columns.size(); i++) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also operates on leaf fields which, again, is not what we want.

@parthchandra
Copy link
Contributor Author

@andygrove @mbutrovich

@andygrove andygrove merged commit 6cafe28 into apache:comet-parquet-exec Jan 14, 2025
74 of 75 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants