-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(substrait): modular substrait consumer #13803
base: main
Are you sure you want to change the base?
Conversation
/// Can be used to consume standard Substrait without user-defined extensions | ||
pub struct DefaultSubstraitConsumer { | ||
extensions: Arc<Extensions>, | ||
state: Arc<SessionState>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if Arc is actually needed here, I just got tired of fighting the borrow-checker. I'm open to suggestions, as all we really need is to be able to get references to Extensions and SessionState.
rels: &[Rel], | ||
state: &dyn SubstraitPlanningState, | ||
extensions: &Extensions, | ||
is_all: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Starting from here, a lot of the changes look like this. Replacing state
and extensions
with a single consumer
argument that subsumes them both, and then threading the consumer
into calls.
state: &dyn SubstraitPlanningState, | ||
rel: &Rel, | ||
extensions: &Extensions, | ||
pub async fn from_project_rel( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the from_*
are extractions of existing functionality. It is easier to see this if you hide whitespaces changes in the diff.
ad28bec
to
04491a6
Compare
@Blizzara I would appreciate if you could take a look and let me know if this approach seems reasonable, or if I've architecture astronauted myself. |
04491a6
to
b8e435f
Compare
Thanks for the ping - I took a look, and overall looks really good to me! I left couple thoughts (I see you already saw some), but nothing major. I already have usecase in mind for this internally, so looking forward to getting it merged 😄 One thing is we should consider splitting the consumer.rs into smaller files. It could be a separate PR to keep the diff smaller, but maybe moving all the static "consumer_X" functions into like |
let plan = self | ||
.state | ||
.serializer_registry() | ||
.deserialize_logical_plan(&ext_detail.type_url, &ext_detail.value)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One drawback of the current implementation is that it's abusing the type_url
field in proto::Any
(which is quite clearly specified). Working with Substrait objects directly would eliminate the need for hacks like encoding node names as Any::type_url.
As a side note, this may sound hypocritical, given that I made it even worse with my PR by setting it to table names as well. But I only did it for consistency. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a side note, this may sound hypocritical, given that I made it even worse #13772
I don't think it's hypocritical, you were aiming for consistency and I'm just being much more aggressive with my changes and breaking the API.
Include SerializerRegistry based handlers for Extension Relations in the DefaultSubstraitConsumer
I think it makes sense to do that, and I'm happy to own it, but I would prefer to do that as a followup. I structured the |
Yup, I totally agree! I looked through the latest changes, looks even better now, just some more comments re SubstraitPlanningState but otherwise this looks very good by me! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: When reviewing this PR I would recommend hiding whitespace changes.
Which issue does this PR close?
Initial work for #13318
Rationale for this change
Improves the reusability of the Substrait consumer for users that utilize user-defined extensions and types.
I should note that this PR represents an approach to the problem, but is not necessarily the only way we could handle it. I'm very open to feedback on the approach.
What changes are included in this PR?
from_substrait_*
functions (i.e.from_filter_rel
,from_scalar_function
) to aid re-use.SubstraitConsumer
trait has been introduced with default methods that handle relations and expression using the above functions.SubstraitConsumer
trait also includes methods for handling extension relations (i.econsume_extension_single
) along with user-defined types (consume_user_defined_type
,consume_user_defined_literal
).&impl SubstraitConsumer
.SubstraitPlanningState
has been subsumed into theSubstraitConsumer
.Are these changes tested?
These changes refactor existing code and leverage their tests.
A doc comment has been added to show how the
SubstraitConsumer
trait can be implemented by users.Are there any user-facing changes?
Yes.
The top-level
from_substrait_plan(state, plan)
function now takes a&SessionState
as its first argument, instead of&dyn SubstraitPlanningState
. The same is true forfrom_substrait_extended_expr
. These two functions are the primary external facing APIs for intiation a conversion.A number of functions like
from_substrait_rel
,from_substrait_rex
, etc have had their API changed to make the first argument&impl SubstraitConsumer
, and remove allstate: &dyn SubstraitPlanningState
andextensions: &Extensions
arguments.