Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

With Order Support for Memory Tables #1401

Closed
Closed
Show file tree
Hide file tree
Changes from 21 commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
1375a21
add with_order option
mertak-synnada Aug 6, 2024
a30ad56
format code
mertak-synnada Aug 6, 2024
21cd618
Merge branch 'sqlparser-rs:main' into feature/with-order-for-table
mertak-synnada Aug 8, 2024
320f7e1
change type into Vec<Vec<OrderExpr>>
mertak-synnada Aug 8, 2024
d3d12e9
Merge remote-tracking branch 'origin/feature/with-order-for-table' in…
mertak-synnada Aug 8, 2024
e85062f
skip with order type while creating options
mertak-synnada Aug 8, 2024
acbcd2c
add with_order option
mertak-synnada Aug 6, 2024
79e0e18
format code
mertak-synnada Aug 6, 2024
3fb59e9
change type into Vec<Vec<OrderExpr>>
mertak-synnada Aug 8, 2024
122f273
skip with order type while creating options
mertak-synnada Aug 8, 2024
cc3c4c7
Merge remote-tracking branch 'origin/feature/with-order-for-table' in…
Aug 23, 2024
10a7b90
add debug statements
Aug 23, 2024
c8a6f14
remove debug statements
Aug 26, 2024
ed705c9
format code
Aug 26, 2024
8fb1958
remove redundant counter op
Aug 26, 2024
1c2eb75
remove Option from with_order attribute
Aug 27, 2024
15d84d7
Merge branch 'refs/heads/main' into feature/with-order-for-table
Aug 27, 2024
0f31847
add Datafusion link to with_order clause
mertak-synnada Aug 29, 2024
7d222ff
format code
mertak-synnada Aug 29, 2024
1575534
Merge branch 'refs/heads/main' into feature/with-order-for-table
mertak-synnada Sep 2, 2024
e878149
fix merge conflict
mertak-synnada Sep 2, 2024
17ba408
add supports_with_order function to dialect
mertak-synnada Sep 3, 2024
565e637
move test to sqlparser_common.rs
mertak-synnada Sep 3, 2024
147cd9e
add fmt to with order statement
mertak-synnada Sep 3, 2024
5a97a46
remove redundant println!
mertak-synnada Sep 3, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions src/ast/dml.rs
Original file line number Diff line number Diff line change
Expand Up @@ -129,6 +129,9 @@ pub struct CreateTable {
pub default_charset: Option<String>,
pub collation: Option<String>,
pub on_commit: Option<OnCommit>,
/// Datafusion "WITH ORDER" clause
/// <https://datafusion.apache.org/user-guide/sql/ddl.html#create-external-table/>
pub with_order: Vec<Vec<OrderByExpr>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we include here a link to the docs where this syntax comes from? (e.g. similar to the on_cluster below)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was inspired by Datafusion's WITH ORDER statement, so I've added the link and tests, thank you!

Copy link
Contributor

@alamb alamb Sep 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I did some research and could not find any existing databases that use the WITH ORDER syntax.

ClickHouse does seem to have a way to specify order as part of a CREATE TABLE statement: https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree

@mertak-synnada rather than introducing special DataFusion only syntax support, what do you think about extending DataFusion to use the existing ClickHouse syntax?

CREATE TABLE ....
ORDER BY <...>

We might have to change the GenericDialect or add some feature to the Dialect trait to permit other dialects to parse such syntax, but it would be nice to align to some other existing syntax rather than creating something new

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see two possible ways forward:

  1. We can change DF to support ORDER BY after CREATE and deprecate WITH ORDER. This way we would have Clickhouse-like syntax for both memory and external tables, and WITH ORDER syntax only for external tables. The latter asymmetry is a little weird but we can explain it away by deprecating WITH ORDER.
  2. We can support both ORDER BY and WITH ORDER for both memory and external tables in DF (they would be synonyms). We'd need to add WITH ORDER for ordinary CREATEs here though.

I am OK with both, with a slight preference to 2 because DF has had WITH ORDER for a while now. What do you think?

Copy link
Contributor

@alamb alamb Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in this case we should distinguish between DataFusion's needs and sqlparser-rs's needs

If we are going to introduce WITH ORDER to sqlparser-rs I think it should follow the existing pattern of being connected to a specific dialect (in this case DatafusionDialect).

Thus I would personally vote for option 1 (change DF) so we avoid creating a new SQL dialect (both in this crate and in general) as much as possible

  1. We can change DF to support ORDER BY after CREATE and deprecate WITH ORDER. This way we would have Clickhouse-like syntax for both memory and external tables, and WITH ORDER syntax only for external tables. The latter asymmetry is a little weird but we can explain it away by deprecating WITH ORDER.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense - let's do it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we include test cases demonstrating the behavior introduced by the changes?

/// ClickHouse "ON CLUSTER" clause:
/// <https://clickhouse.com/docs/en/sql-reference/distributed-ddl/>
pub on_cluster: Option<Ident>,
Expand Down
14 changes: 12 additions & 2 deletions src/ast/helpers/stmt_create_table.rs
Original file line number Diff line number Diff line change
Expand Up @@ -10,8 +10,8 @@ use sqlparser_derive::{Visit, VisitMut};
use super::super::dml::CreateTable;
use crate::ast::{
ClusteredBy, ColumnDef, CommentDef, Expr, FileFormat, HiveDistributionStyle, HiveFormat, Ident,
ObjectName, OnCommit, OneOrManyWithParens, Query, RowAccessPolicy, SqlOption, Statement,
TableConstraint, TableEngine, Tag, WrappedCollection,
ObjectName, OnCommit, OneOrManyWithParens, OrderByExpr, Query, RowAccessPolicy, SqlOption,
Statement, TableConstraint, TableEngine, Tag, WrappedCollection,
};
use crate::parser::ParserError;

Expand Down Expand Up @@ -75,6 +75,7 @@ pub struct CreateTableBuilder {
pub on_commit: Option<OnCommit>,
pub on_cluster: Option<Ident>,
pub primary_key: Option<Box<Expr>>,
pub with_order: Vec<Vec<OrderByExpr>>,
pub order_by: Option<OneOrManyWithParens<Expr>>,
pub partition_by: Option<Box<Expr>>,
pub cluster_by: Option<WrappedCollection<Vec<Ident>>>,
Expand Down Expand Up @@ -136,6 +137,7 @@ impl CreateTableBuilder {
max_data_extension_time_in_days: None,
default_ddl_collation: None,
with_aggregation_policy: None,
with_order: vec![],
with_row_access_policy: None,
with_tags: None,
}
Expand Down Expand Up @@ -273,6 +275,11 @@ impl CreateTableBuilder {
self
}

pub fn with_order(mut self, with_order: Vec<Vec<OrderByExpr>>) -> Self {
self.with_order = with_order;
self
}

pub fn order_by(mut self, order_by: Option<OneOrManyWithParens<Expr>>) -> Self {
self.order_by = order_by;
self
Expand Down Expand Up @@ -397,6 +404,7 @@ impl CreateTableBuilder {
max_data_extension_time_in_days: self.max_data_extension_time_in_days,
default_ddl_collation: self.default_ddl_collation,
with_aggregation_policy: self.with_aggregation_policy,
with_order: self.with_order,
with_row_access_policy: self.with_row_access_policy,
with_tags: self.with_tags,
})
Expand Down Expand Up @@ -452,6 +460,7 @@ impl TryFrom<Statement> for CreateTableBuilder {
max_data_extension_time_in_days,
default_ddl_collation,
with_aggregation_policy,
with_order,
with_row_access_policy,
with_tags,
}) => Ok(Self {
Expand Down Expand Up @@ -496,6 +505,7 @@ impl TryFrom<Statement> for CreateTableBuilder {
default_ddl_collation,
with_aggregation_policy,
with_row_access_policy,
with_order,
with_tags,
volatile,
}),
Expand Down
68 changes: 63 additions & 5 deletions src/parser/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5589,7 +5589,23 @@ impl<'a> Parser<'a> {
let clustered_by = self.parse_optional_clustered_by()?;
let hive_formats = self.parse_hive_formats()?;
// PostgreSQL supports `WITH ( options )`, before `AS`
let with_options = self.parse_options(Keyword::WITH)?;
let mut with_options: Vec<SqlOption> = vec![];
let mut order_exprs: Vec<Vec<OrderByExpr>> = vec![];
if self.parse_keyword(Keyword::WITH) {
if self.parse_keyword(Keyword::ORDER) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if self.parse_keyword(Keyword::WITH) {
if self.parse_keyword(Keyword::ORDER) {
if self.parse_keywords(&[Keyword::WITH, Keyword::ORDER]) {

Also I think we can keep the postgres path separate since this feature doesn't apply to that dialect? - so that this part remains the same

        let with_options = self.parse_options(Keyword::WITH)?;

We can make the fuctionality conditional with e.g. if self.dialect.supports_with_order() or similar method

self.expect_token(&Token::LParen)?;
loop {
order_exprs.push(vec![self.parse_order_by_expr()?]);
if !self.consume_token(&Token::Comma) {
self.expect_token(&Token::RParen)?;
break;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the parse_comma_separated() method be reused for this functionality?

} else {
with_options = self.parse_options_in_parentheses()?;
}
}

let table_properties = self.parse_options(Keyword::TBLPROPERTIES)?;

let engine = if self.parse_keyword(Keyword::ENGINE) {
Expand Down Expand Up @@ -5738,6 +5754,7 @@ impl<'a> Parser<'a> {
.cluster_by(create_table_config.cluster_by)
.options(create_table_config.options)
.primary_key(primary_key)
.with_order(order_exprs)
.strict(strict)
.build())
}
Expand Down Expand Up @@ -6369,15 +6386,19 @@ impl<'a> Parser<'a> {

pub fn parse_options(&mut self, keyword: Keyword) -> Result<Vec<SqlOption>, ParserError> {
if self.parse_keyword(keyword) {
self.expect_token(&Token::LParen)?;
let options = self.parse_comma_separated(Parser::parse_sql_option)?;
self.expect_token(&Token::RParen)?;
Ok(options)
self.parse_options_in_parentheses()
} else {
Ok(vec![])
}
}

fn parse_options_in_parentheses(&mut self) -> Result<Vec<SqlOption>, ParserError> {
self.expect_token(&Token::LParen)?;
let options = self.parse_comma_separated(Parser::parse_sql_option)?;
self.expect_token(&Token::RParen)?;
Ok(options)
}

pub fn parse_options_with_keywords(
&mut self,
keywords: &[Keyword],
Expand Down Expand Up @@ -12364,6 +12385,43 @@ mod tests {
);
}

#[test]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I think this would probably not be the ideal place for the test to live - if the feature isn't bound to any dialect, or if it uses a dialect method to guard the feature, we can add the tests to common e.g. https://github.com/sqlparser-rs/sqlparser-rs/blob/main/tests/sqlparser_common.rs#L4405

fn parse_create_table_with_order() {
let sql = "CREATE TABLE test (foo INT, bar VARCHAR(256)) WITH ORDER (foo ASC)";
let ast = Parser::parse_sql(&GenericDialect {}, sql).unwrap();
match ast[0].clone() {
Statement::CreateTable(CreateTable { with_order, .. }) => {
assert_eq!(
with_order,
vec![vec![OrderByExpr {
expr: Expr::Identifier(Ident::from("foo")),
asc: Some(true),
nulls_first: None,
with_fill: None,
}]]
);
}
_ => unreachable!(),
}

let sql = "CREATE TABLE test (foo INT, bar VARCHAR(256)) WITH ORDER (bar DESC NULLS FIRST)";
let ast = Parser::parse_sql(&GenericDialect {}, sql).unwrap();
match ast[0].clone() {
Statement::CreateTable(CreateTable { with_order, .. }) => {
assert_eq!(
with_order,
vec![vec![OrderByExpr {
expr: Expr::Identifier(Ident::from("bar")),
asc: Some(false),
nulls_first: Some(true),
with_fill: None,
}]]
);
}
_ => unreachable!(),
}
}

#[test]
fn test_parse_multipart_identifier_positive() {
let dialect = TestedDialects {
Expand Down
1 change: 1 addition & 0 deletions tests/sqlparser_duckdb.rs
Original file line number Diff line number Diff line change
Expand Up @@ -751,6 +751,7 @@ fn test_duckdb_union_datatype() {
max_data_extension_time_in_days: Default::default(),
default_ddl_collation: Default::default(),
with_aggregation_policy: Default::default(),
with_order: Default::default(),
with_row_access_policy: Default::default(),
with_tags: Default::default()
}),
Expand Down
1 change: 1 addition & 0 deletions tests/sqlparser_postgres.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4864,6 +4864,7 @@ fn parse_trigger_related_functions() {
with_aggregation_policy: None,
with_row_access_policy: None,
with_tags: None,
with_order: vec![],
}
);

Expand Down