Enhance object name path segments #1539

Open. ayman-sigma wants to merge 9 commits into main from ayman/improveObjectNameParts

Contributor

@ayman-sigma ayman-sigma commented Nov 20, 2024

Right now ObjectName is just a list of identifiers. We parse each object name path segment as a string identifier. Some dialects have richer types for each path segment. This PR reworks the object name to allow different types for each path segment.

Examples this PR will make it easier to support:

  1. Databricks IDENTIFIER clause. Example: SELECT * FROM myschema.IDENTIFIER(:mytab). The (:mytab) is currently misparsed as TableFunctionArgs. More details: https://docs.databricks.com/en/sql/language-manual/sql-ref-names-identifier-clause.html
  2. Snowflake double-dot notation. Example: SELECT * FROM db..table_name. This indicates use of the default schema PUBLIC. With this PR, we can use a DefaultSchema variant for the path segment instead of an empty identifier. More details: https://docs.snowflake.com/en/sql-reference/name-resolution#resolution-when-schema-omitted-double-dot-notation

Most changes are mechanical, except for a couple of locations I commented on below, in addition to ast/mod.rs.
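
For readers skimming the thread, the idea described above can be sketched roughly as follows. This is a minimal illustration with simplified stand-in types, not the definitions in ast/mod.rs; in particular, the `DefaultSchema` variant and the `Display` behavior shown here are assumptions based on the Snowflake example in the description.

```rust
use std::fmt;

// Simplified stand-in for the crate's identifier type (assumption).
#[derive(Debug, Clone, PartialEq)]
struct Ident(String);

// One segment of an object name path.
#[derive(Debug, Clone, PartialEq)]
enum ObjectNamePart {
    Identifier(Ident),
    // Hypothetical variant for the omitted schema in `db..table_name`.
    DefaultSchema,
}

// An object name is a list of typed path segments rather than bare identifiers.
#[derive(Debug, Clone, PartialEq)]
struct ObjectName(Vec<ObjectNamePart>);

impl fmt::Display for ObjectNamePart {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ObjectNamePart::Identifier(ident) => write!(f, "{}", ident.0),
            // The omitted schema renders as an empty segment.
            ObjectNamePart::DefaultSchema => Ok(()),
        }
    }
}

impl fmt::Display for ObjectName {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let parts: Vec<String> = self.0.iter().map(|p| p.to_string()).collect();
        write!(f, "{}", parts.join("."))
    }
}

fn main() {
    let name = ObjectName(vec![
        ObjectNamePart::Identifier(Ident("db".to_string())),
        ObjectNamePart::DefaultSchema,
        ObjectNamePart::Identifier(Ident("table_name".to_string())),
    ]);
    println!("{}", name);
}
```

Modeling the omitted schema as its own variant, rather than as an empty identifier, lets downstream code match on it explicitly.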

@@ -4294,7 +4312,9 @@ impl<'a> Parser<'a> {
     let mut data_type = self.parse_data_type()?;
     if let DataType::Custom(n, _) = &data_type {
         // the first token is actually a name
-        name = Some(n.0[0].clone());
+        match n.0[0].clone() {
+            ObjectNamePart::Identifier(ident) => name = Some(ident),
Contributor Author

Once we start adding more variants to the ObjectNamePart enum, we will return a parsing error for the other variants here.
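
A rough sketch of what that could look like once other variants exist. The types, the `DefaultSchema` variant, the `leading_ident` helper, and the error messages are all hypothetical stand-ins for illustration, not code from this PR.

```rust
// Simplified stand-ins for illustration (assumptions, not the crate's types).
#[derive(Debug, Clone, PartialEq)]
struct Ident(String);

#[derive(Debug, Clone, PartialEq)]
enum ObjectNamePart {
    Identifier(Ident),
    // Hypothetical non-identifier variant.
    DefaultSchema,
}

#[derive(Debug, PartialEq)]
struct ParserError(String);

// Return the first path segment as an identifier, or a parse error if the
// segment is missing or is some other kind of part.
fn leading_ident(parts: &[ObjectNamePart]) -> Result<Ident, ParserError> {
    match parts.first() {
        Some(ObjectNamePart::Identifier(ident)) => Ok(ident.clone()),
        Some(other) => Err(ParserError(format!(
            "expected identifier, found {:?}",
            other
        ))),
        None => Err(ParserError("expected identifier, found empty name".to_string())),
    }
}

fn main() {
    let parts = vec![ObjectNamePart::Identifier(Ident("my_type".to_string()))];
    println!("{:?}", leading_ident(&parts));
    println!("{:?}", leading_ident(&[ObjectNamePart::DefaultSchema]));
}
```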

@@ -10778,7 +10798,7 @@ impl<'a> Parser<'a> {
     self.expect_token(&Token::LParen)?;
     let aggregate_functions = self.parse_comma_separated(Self::parse_aliased_function_call)?;
     self.expect_keyword(Keyword::FOR)?;
-    let value_column = self.parse_object_name(false)?.0;
+    let value_column = self.parse_period_separated_identifiers()?;
Contributor Author

Given that this is a column name, we should parse it as period-separated identifiers and not as an object name.
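
To illustrate the distinction, here is a minimal sketch of parsing a run of period-separated identifiers from a token slice. The `Token` and `Ident` types and the `period_separated_idents` function are simplified, hypothetical stand-ins; the parser's own method may behave differently.

```rust
// Simplified stand-ins for illustration (assumptions, not the crate's types).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Word(String),
    Period,
    Other(char),
}

#[derive(Debug, Clone, PartialEq)]
struct Ident(String);

// Parse `a.b.c` into plain identifiers. Returns the identifiers and the number
// of tokens consumed. Every segment must be a word, so there is no room for
// richer path-segment kinds here.
fn period_separated_idents(tokens: &[Token]) -> Result<(Vec<Ident>, usize), String> {
    let mut idents = Vec::new();
    let mut pos = 0;
    loop {
        match tokens.get(pos) {
            Some(Token::Word(w)) => {
                idents.push(Ident(w.clone()));
                pos += 1;
            }
            other => return Err(format!("expected identifier, found {:?}", other)),
        }
        // Continue only if a period follows the identifier.
        if tokens.get(pos) == Some(&Token::Period) {
            pos += 1;
        } else {
            return Ok((idents, pos));
        }
    }
}

fn main() {
    let tokens = vec![
        Token::Word("t".to_string()),
        Token::Period,
        Token::Word("amount".to_string()),
        Token::Other(')'),
    ];
    println!("{:?}", period_separated_idents(&tokens));
}
```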

@mvzink
Contributor

mvzink commented Nov 20, 2024

I think ObjectNamePart::Wildcard or something would be better than what I did in #1538, so this seems like a good idea to me.

Contributor

@iffyio iffyio left a comment

Thanks @ayman-sigma! Left some minor comments; this looks good to me overall.

@ayman-sigma ayman-sigma requested a review from iffyio November 24, 2024 20:33
Contributor

@iffyio iffyio left a comment

LGTM! cc @alamb

@ayman-sigma ayman-sigma force-pushed the ayman/improveObjectNameParts branch from e22e3d8 to 176cf13 Compare November 26, 2024 02:39
@alamb
Contributor

alamb commented Nov 30, 2024

Hi @ayman-sigma this PR appears to have some conflicts. Is there any chance you can resolve them so we can merge it in?

Thank you!

@ayman-sigma ayman-sigma force-pushed the ayman/improveObjectNameParts branch from 29f2610 to 6f05bcf Compare December 2, 2024 03:50
@ayman-sigma ayman-sigma force-pushed the ayman/improveObjectNameParts branch from 6f05bcf to 7791973 Compare December 2, 2024 03:52
@ayman-sigma
Contributor Author

> Hi @ayman-sigma this PR appears to have some conflicts. Is there any chance you can resolve them so we can merge it in?
>
> Thank you!

@alamb, Done.

Contributor

@alamb alamb left a comment

I started trying to update DataFusion to use this change -- it turns out to be fairly invasive.

You can try here: apache/datafusion#13546

(the issue is that we have a bunch of code that handles ObjectName --> Idents conversion).

I think we can make the DataFusion code better / easier to follow
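
One way a downstream project might centralize that conversion is a single helper that either yields plain identifiers or signals that the name contains something else. The sketch below uses simplified stand-in types, and `object_name_to_idents` is a hypothetical name, not an existing DataFusion or sqlparser function.

```rust
// Simplified stand-ins for illustration (assumptions, not the crate's types).
#[derive(Debug, Clone, PartialEq)]
struct Ident(String);

#[derive(Debug, Clone, PartialEq)]
enum ObjectNamePart {
    Identifier(Ident),
    // Hypothetical non-identifier variant.
    DefaultSchema,
}

#[derive(Debug, Clone, PartialEq)]
struct ObjectName(Vec<ObjectNamePart>);

// Borrow every segment as an identifier, or return None if any segment is
// not a plain identifier. Collecting into Option stops at the first None.
fn object_name_to_idents(name: &ObjectName) -> Option<Vec<&Ident>> {
    name.0
        .iter()
        .map(|part| match part {
            ObjectNamePart::Identifier(ident) => Some(ident),
            _ => None,
        })
        .collect()
}

fn main() {
    let name = ObjectName(vec![
        ObjectNamePart::Identifier(Ident("schema".to_string())),
        ObjectNamePart::Identifier(Ident("table".to_string())),
    ]);
    println!("{:?}", object_name_to_idents(&name));
}
```

Call sites that previously indexed into the identifier list directly would go through a helper like this and decide in one place how to treat non-identifier segments.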

@alamb
Contributor

alamb commented Dec 11, 2024

Given the potential for non-trivial downstream conflicts due to this change (look at the list of conflicts it has already collected), I would like to consider it for the next release.

@ayman-sigma
Contributor Author

> Given the potential for non-trivial downstream conflicts due to this change (look at the list of conflicts it has already collected), I would like to consider it for the next release.

Sounds good. Thanks @alamb!


4 participants