Add system fields to input sources. #15276
Main changes:

1. The `SystemField` enum defines system fields `__file_uri`, `__file_path`, and `__file_bucket`. They are associated with each input entity.
2. The `SystemFieldInputSource` interface can be added to any `InputSource` to make it system-field-capable. It sets up serialization of a list of configured `systemFields` in the JSON form of the input source, and provides a method `getSystemFieldValue` for computing the value of each system field. Cloud object, HDFS, HTTP, and Local now have this.
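To make the shape of these two pieces concrete, here is a minimal, self-contained sketch. It is not the code from this PR: the enum constant names, the `URI`-based entity argument, the `systemFieldValues` helper, and the `CloudObjectSketchSource` class are all assumptions for illustration. Only the three field names and the `getSystemFieldValue` method name come from the description above. The exact value format of `__file_path` is also an assumption.

```java
import java.net.URI;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for the enum; constant names are hypothetical.
enum SystemField {
  FILE_URI("__file_uri"),
  FILE_PATH("__file_path"),
  FILE_BUCKET("__file_bucket");

  private final String fieldName;

  SystemField(String fieldName) {
    this.fieldName = fieldName;
  }

  String getFieldName() {
    return fieldName;
  }
}

// Simplified stand-in for the interface. Here an entity is represented
// by a plain java.net.URI, which is an assumption of this sketch.
interface SystemFieldInputSource {
  // The system fields configured on this input source.
  Set<SystemField> getSystemFields();

  // Computes the value of one system field for one input entity.
  Object getSystemFieldValue(URI entityUri, SystemField field);

  // Hypothetical helper: collects values for all configured system fields.
  default Map<String, Object> systemFieldValues(URI entityUri) {
    Map<String, Object> values = new LinkedHashMap<>();
    for (SystemField field : getSystemFields()) {
      values.put(field.getFieldName(), getSystemFieldValue(entityUri, field));
    }
    return values;
  }
}

// Hypothetical cloud-object-style source, e.g. for URIs like s3://bucket/key.
class CloudObjectSketchSource implements SystemFieldInputSource {
  private final Set<SystemField> systemFields;

  CloudObjectSketchSource(Set<SystemField> systemFields) {
    this.systemFields = systemFields;
  }

  @Override
  public Set<SystemField> getSystemFields() {
    return systemFields;
  }

  @Override
  public Object getSystemFieldValue(URI entityUri, SystemField field) {
    switch (field) {
      case FILE_URI:
        return entityUri.toString();
      case FILE_BUCKET:
        return entityUri.getHost(); // the authority part of the URI
      case FILE_PATH:
        return entityUri.getPath(); // path format is an assumption
      default:
        return null;
    }
  }
}
```

An input source configured with only some of the system fields would then produce values for just those fields, which is the behavior the `systemFields` list is meant to control.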
The IT
I am not sure what's going on here, but I don't think this is related to this PR. It should be safe to merge without this test passing.
I pushed a commit that simply merges
LGTM!
🚀
* Add system fields to input sources.

  Main changes:

  1) The SystemField enum defines system fields "__file_uri", "__file_path", and "__file_bucket". They are associated with each input entity.
  2) The SystemFieldInputSource interface can be added to any InputSource to make it system-field-capable. It sets up serialization of a list of configured "systemFields" in the JSON form of the input source, and provides a method getSystemFieldValue for computing the value of each system field. Cloud object, HDFS, HTTP, and Local now have this.

* Fix various LocalInputSource calls.
* Fix style stuff.
* Fixups.
* Fix tests and coverage.
The `SystemFieldInputSource` interface isn't strictly necessary, since each input source could have implemented system fields internally in its own way. However, I think the interface is valuable because it helps ensure system fields are dealt with consistently, and because it provides a path to exposing system fields in SQL in a nice way. I think that ideally, they would be referenceable by name, but not participate in star expansion. AFAICT this would require a new Calcite feature. Relevant Calcite mailing list thread: https://lists.apache.org/thread/pnf3bx3jlrmv7q1q7jhwhsylrw4q5t20

Until then, system fields can be used in SQL without the planner's awareness: with `EXTERN`, add `systemFields` to the `inputSource` section, and add the system field names to the `signature` section.
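As a rough illustration of the `EXTERN` workaround, the sketch below assembles a `TABLE(EXTERN(...))` expression in which the requested system fields appear both in the `inputSource` JSON and in the `signature`. Everything specific here is an assumption for the example: the `ExternSketch` class and its methods are hypothetical helpers, the local input source with `baseDir` `/tmp/data` and the regular column `x` are placeholders, and typing the system fields as `STRING` is a guess rather than something stated in this PR.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Druid: builds the EXTERN expression as a string.
class ExternSketch {

  // Renders a list of names as a JSON array of strings, e.g. ["a","b"].
  static String jsonStringArray(List<String> items) {
    return items.stream()
        .map(s -> "\"" + s + "\"")
        .collect(Collectors.joining(",", "[", "]"));
  }

  static String externWithSystemFields(List<String> systemFields) {
    // 1) Add "systemFields" to the inputSource section.
    String inputSource =
        "{\"type\":\"local\",\"baseDir\":\"/tmp/data\",\"filter\":\"*.json\","
            + "\"systemFields\":" + jsonStringArray(systemFields) + "}";

    String inputFormat = "{\"type\":\"json\"}";

    // 2) Add the system field names to the signature section,
    //    next to the regular column "x" (placeholder).
    StringBuilder signature = new StringBuilder("[{\"name\":\"x\",\"type\":\"STRING\"}");
    for (String field : systemFields) {
      signature.append(",{\"name\":\"").append(field).append("\",\"type\":\"STRING\"}");
    }
    signature.append("]");

    return "TABLE(EXTERN('" + inputSource + "', '" + inputFormat + "', '" + signature + "'))";
  }
}
```

Because the planner treats these as ordinary signature columns, a query over this table can reference `"__file_uri"` by name like any other column.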