-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix expression column capabilities to not report dictionary encoded unless input is string #16577
fix expression column capabilities to not report dictionary encoded unless input is string #16577
Conversation
…nless input is string
@@ -277,6 +277,9 @@ public ColumnCapabilities inferColumnCapabilities(@Nullable ColumnType outputTyp | |||
// until query time | |||
return ColumnCapabilitiesImpl.copyOf(underlyingCapabilities) | |||
.setType(ColumnType.STRING) | |||
// this is sad, but currently dictionary encodedness is tied to string | |||
// selectors and sad stuff happens if the input type isn't string | |||
.setDictionaryEncoded(underlyingCapabilities.is(ValueType.STRING)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be underlyingCapabilities.isTrue() && underlyingCapabilities.is(ValueType.STRING)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, imo copyOf
is not the right thing to use here, logically speaking. I don't think we actually want to carry through all caps from the underlying selector, just a few specific ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be underlyingCapabilities.isTrue() && underlyingCapabilities.is(ValueType.STRING)?
oops yes.
also, imo copyOf is not the right thing to use here, logically speaking. I don't think we actually want to carry through all caps from the underlying selector, just a few specific ones.
Ah yea that is reasonable i suppose, i think originally more were preserved than not so i went with the copy. I guess hasBitmapIndexes
isn't widely used on query side anymore since stuff can just ask ColumnIndexSupplier
for an index and it will return if it has it, though i suppose it should still be preserved just in case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, we were explicitly marking hasBitmapIndexes as false, though #15585 allows expressions to use them in many cases, so it probably should be allowed to pass through. hasBitmapIndexes
and hasSpatialIndexes
should probably be removed from ColumnCapabilities
, and the write side stuff that currently uses it be moved to ColumnFormat
, though not going to do that on this PR.
@@ -303,10 +303,6 @@ public void read(ByteBuffer buffer, ColumnBuilder builder, ColumnConfig columnCo | |||
bitmapSerdeFactory, | |||
byteOrder | |||
); | |||
ColumnCapabilitiesImpl capabilitiesBuilder = builder.getCapabilitiesBuilder(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is just removing unused code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, i had put in the description:
While here also noticed that COMPLEX was needlessly reporting itself as dictionary encoded, which isn't quite true. While the nested field columns are dictionary encoded, the json values themselves are not, so functions like TO_JSON_STRING could also run into this problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh, I missed that the capabilitiesBuilder
is from builder
itself. OK, this change makes sense then.
Description
Fixes an issue that occurs after #16366 where expressions with single input dictionary encoded columns that are not strings but have a string output incorrectly use the string dictionary encoded vector selector.
The fix for now is to make setting the capabilities of the virtual column to be dictionary encoded only if the input type is string. In the future if we make dictionary encoded selectors that are not coupled with handling strings then we could remove this constraint, but until then, this is the safest option to avoid incorrectly using string dimension selectors.
While here also noticed that
COMPLEX<json>
was needlessly reporting itself as dictionary encoded, which isn't quite true. While the nested field columns are dictionary encoded, the json values themselves are not, so functions likeTO_JSON_STRING
could also run into this problem.This PR has: