Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suport unparsing LogicalPlan::Window to SQL #10664

Closed
Tracked by #8661
alamb opened this issue May 25, 2024 · 4 comments · Fixed by #10767
Closed
Tracked by #8661

Suport unparsing LogicalPlan::Window to SQL #10664

alamb opened this issue May 25, 2024 · 4 comments · Fixed by #10767
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@alamb
Copy link
Contributor

alamb commented May 25, 2024

Is your feature request related to a problem or challenge?

Part of #9726 to complete the LogialPlan --> SQL conversion

Converting LogicalPlan back to SQL is valuable for several usecases such as using DataFusion to programatically create SQL -- see the plan_to_sql.rs example

Describe the solution you'd like

Support converting SQL with window functions like this:

SELECT first_value OVER (x ORDER BY y) FROM foo;

Describe alternatives you've considered

The basic pattern is this (see #10371 for an example):

  1. Implement the LogicalPlan --> AST reverse code in Unparser::plan_to_sql(source link)
  2. Add a test in roundtrip_statement in sql_integration.rs source link

Note you can run the tests like

cargo test -p datafusion-sql -- roundtrip_statement

Additional context

I think this is a good first issue as the pattern is well established and there are explicit instructions

@yyy1000
Copy link
Contributor

yyy1000 commented May 28, 2024

I'd like to take it if no one picks it today. :)

@yyy1000
Copy link
Contributor

yyy1000 commented May 29, 2024

I tried to implement for a while and I have a question,
for selecting a window function, there would be an projection.

> explain SELECT first_value(y) OVER (ORDER BY x) FROM foo;
| logical_plan  | Projection: FIRST_VALUE(foo.y) ORDER BY [foo.x ASC NULLS LAST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW                                                                                                                                                                                                                                                                                                                                                                 |
|               |   WindowAggr: windowExpr=[[FIRST_VALUE(foo.y) ORDER BY [foo.x ASC NULLS LAST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW]]                                                                                                                                                                                                                                                                                                                                                 |
|               |     TableScan: foo projection=[x, y]       |      

However, for an aggregate, there's not.

> explain select count(x) from foo;
+---------------+------------------------------------------------+
| plan_type     | plan                                           |
+---------------+------------------------------------------------+
| logical_plan  | Aggregate: groupBy=[[]], aggr=[[COUNT(foo.x)]] |
|               |   TableScan: foo projection=[x]                |
+---------------+------------------------------------------------+

In my understanding, window function is a special case of aggregate function, why would these two have different plan struct?

@alamb
Copy link
Contributor Author

alamb commented May 29, 2024

In my understanding, window function is a special case of aggregate function, why would these two have different plan struct?

I think window funtions and aggregate functions are different. Somewhat confusingly is that you can use aggregate functions as window functions

Window functions always have an OVER clause.

So for example, this is a normal aggregate

select count(x) from foo;

However, this is count being used as a window aggregate:

select count(x) OVER (PARTITION BY y) from foo;

@devinjdangelo
Copy link
Contributor

I took a stab at this in #10767. It can be handled similarly to how we currently handle LogicalPlan::Aggregate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants