-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide documentation of expose APIs to enable handling of type coercion at UNION plan construction. #12142
Provide documentation of expose APIs to enable handling of type coercion at UNION plan construction. #12142
Conversation
…ly pub interfaces
…nvolved in type coercion
…ess type coercion
Note to @alamb -- exposing the rewriter api (as suggested here), enables the union type-coercion with less of the schema shenangians and overhead (vs the full analyzer). Gives the user options. |
/// Union two [`LogicalPlan`]s. | ||
/// | ||
/// Constructs the UNION plan, but does not perform type-coercion. Therefore the | ||
/// subtree expressions will not be properly typed until the optimizer pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, does that mean if there is no optimizer phase the query will crash?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe this depends on how users construct their own logical plans? And if they are already ensuring type correctness in their own code?
Our logical plan construction relied upon the previous behavior of the union()
api, which was immediate type coercion. IMO - this was not an explicit api contract, rather an implementation detail. Regardless, we use the typed-coerced union schema & expressions when adding other logical nodes (e.g. limit, sort, group by). Once the behavior of the api changed, we started building incorrect plans -- which produced incorrect results and in one case errored.
I filed this ticket as a doc enhancement, not a bug, since I didn't think(?) type coercion was part of the api contract. My hope was to (a) make it clear when/how unions can be type coerced, and (b) make public the APIs to do so. In other words, make it easier for users to ensure type correctness before (or without) the optimizer pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I have phrased the docs differently? Or is there another approach we should take?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, to clarify. The "incorrect plans" is because the new union()
behavior doesn't type-coerce the expressions and it takes the left node's schema. When we built logical plans with union + gap filling (adding casted scalar nulls), we started having missing fields etc when inspecting the union before constructing the next node.
Maybe the latter should be a bug?
/// or alternatively, merge the union input schema using [`coerce_union_schema`] and | ||
/// apply the expression rewrite with [`coerce_plan_expr_for_schema`]. | ||
/// | ||
/// [`TypeCoercionRewriter::coerce_union`]: https://docs.rs/datafusion-optimizer/latest/datafusion_optimizer/analyzer/type_coercion/struct.TypeCoercionRewriter.html#method.coerce_union |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @wiedld
There are couple of projects using DF without optimizer, so I'm thinking should we add the coerced union recommendation to the error message....
@comphead -- I'm not sure which error message you are referring to? 😅 🙏🏼 |
Ignore it, for downstream projects not using the optimizer it would be difficult to understand why UNION crashed, but I believe the documentation is enough |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -56,6 +56,8 @@ use datafusion_expr::{ | |||
Projection, ScalarUDF, Union, WindowFrame, WindowFrameBound, WindowFrameUnits, | |||
}; | |||
|
|||
/// Performs type coercion by determining the schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
Which issue does this PR close?
Closes #12105
Rationale for this change
We construct our own logical plans for a SQL-derivative languages (e.g. InfluxQL). The construction of logic plans using UNION requires that upon construction of the union plan node, we type-coerce the expressions prior to building the rest of the plan. If we don't perform this expression rewrite, then our consuming LIMIT, GROUP BY, ORDER BY, and aggregates will not be built properly.
We do not think our use case is unique, and therefore we are (a) exposing the APIs need to perform this coercion and (b) providing docs.
What changes are included in this PR?
coerce_union_schema
TypeCoercion
TypeCoercionRewriter
coerce_union_schema
.union()
API, which does the logical plan construction , as to the how/when/why to use type coercion.Are these changes tested?
No code changes.
Are there any user-facing changes?
No. Only more exposed interfaces to use.