Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plumb location info through the parser #757

Open
Tracked by #161
ankrgyl opened this issue Dec 12, 2022 · 5 comments
Open
Tracked by #161

Plumb location info through the parser #757

ankrgyl opened this issue Dec 12, 2022 · 5 comments

Comments

@ankrgyl
Copy link
Contributor

ankrgyl commented Dec 12, 2022

This is a follow up to our discussion in #710 (comment) (and also referenced in #301, #179, and #524.

Now that location info is plumbed through the tokenizer, we can leverage these locations to:

  1. Produce better error messages in the parser (by referencing the offending token, and its location)
  2. Propagate location information outside of the parser, so that libraries that consume the parse tree can reference the original location of an AST node.

From my understanding, the first task should be easier, since we wouldn't need to update the parse tree data structure (perhaps outside of a few edge cases). The main change should be updating the parser to use TokenWithLocation and then improving the ParserError messages to reference these locations. I would suggest changing the type of TokenizerError and ParserError to be richer than a string, and pass along the location information directly, so that consumers of the error can access it directly.

The second task will be more difficult, however, since we'll need to either change all of the tree nodes to be structs instead of enums (to host the location information), or change each enum branch to be a struct with location information. I think the second approach will likely be easier to change incrementally, and for users to refactor against, and we can add a trait that homogenizes how users of the tree access the location info.

@alamb let me know your thoughts on the above.

@alamb
Copy link
Contributor

alamb commented Dec 13, 2022

Thank you @ankrgyl -- I really appreciate you pushing this project along.

I would suggest changing the type of TokenizerError and ParserError to be richer than a string, and pass along the location information directly, so that consumers of the error can access it directly.

I agree this sound like a good plan

I think the second approach will likely be easier to change incrementally

I am not quite sure what this would look like -- could you share a draft PR or rust pseudo code?

cc @AugustoFKL

@ankrgyl
Copy link
Contributor Author

ankrgyl commented Dec 13, 2022

I am not quite sure what this would look like -- could you share a draft PR or rust pseudo code?

Absolutely, here is a quick patch which shows what that would look like for Ident (note: this does not compile -- I did not change the parser yet). The basic idea is that each individual arm of the Expr enum is responsible for tracking its location, instead of changing Expr to be a struct.

ankur-m1:~/projects/sqlparser-rs ankur$ git diff
diff --git a/src/ast/mod.rs b/src/ast/mod.rs
index 7f3d154..14b2bc6 100644
--- a/src/ast/mod.rs
+++ b/src/ast/mod.rs
@@ -22,6 +22,8 @@ use core::fmt;
 #[cfg(feature = "serde")]
 use serde::{Deserialize, Serialize};

+use crate::tokenizer::Location;
+
 pub use self::data_type::{
     CharLengthUnits, CharacterLength, DataType, ExactNumberInfo, TimezoneInfo,
 };
@@ -91,6 +93,9 @@ pub struct Ident {
     /// The starting quote if any. Valid quote characters are the single quote,
     /// double quote, backtick, and opening square bracket.
     pub quote_style: Option<char>,
+
+    /// The location of the identifier in the original SQL query.
+    pub location: Option<Location>,
 }

 impl Ident {
@@ -102,12 +107,13 @@ impl Ident {
         Ident {
             value: value.into(),
             quote_style: None,
+            location: None,
         }
     }

     /// Create a new quoted identifier with the given quote and value. This function
     /// panics if the given quote is not a valid quote character.
-    pub fn with_quote<S>(quote: char, value: S) -> Self
+    pub fn with_quote<S>(quote: char, value: S, location: Option<Location>) -> Self
     where
         S: Into<String>,
     {
@@ -115,6 +121,7 @@ impl Ident {
         Ident {
             value: value.into(),
             quote_style: Some(quote),
+            location,
         }
     }
 }
@@ -124,6 +131,7 @@ impl From<&str> for Ident {
         Ident {
             value: value.to_string(),
             quote_style: None,
+            location: None,
         }
     }
 }
diff --git a/src/tokenizer.rs b/src/tokenizer.rs
index 68dd652..a6fd0b2 100644
--- a/src/tokenizer.rs
+++ b/src/tokenizer.rs
@@ -301,7 +301,7 @@ impl fmt::Display for Whitespace {
 }

 /// Location in input string
-#[derive(Debug, Eq, PartialEq, Clone)]
+#[derive(Debug, Eq, PartialEq, Clone, PartialOrd, Ord, Hash)]
 pub struct Location {
     /// Line number, starting from 1
     pub line: u64,

@alamb
Copy link
Contributor

alamb commented Dec 14, 2022

The basic idea is that each individual arm of the Expr enum is responsible for tracking its location, instead of changing Expr to be a struct.

Got it -- this approach makes sense to me

@ankrgyl
Copy link
Contributor Author

ankrgyl commented Jan 5, 2023

@alamb I've been working on this as part of a side project I'm building and wanted to put up a PR for your feedback: #790.

There are a few other things in the PR (e.g. flipping stuff to pub) that I'd be happy to clean up, but wanted to get your feedback on the overall approach. The PR as-is adds location info to Idents but not the rest of the operators yet.

@AugustoFKL
Copy link
Contributor

@alamb @ankrgyl sorry for the delay, the last month, I barely was able to work, not my best phase.

Either way, there are some part of the code that is really complex for me, and I can't spend that much time here, so I end up only getting the general idea, sorry.

LMK once there are more advances (I already approved the last PR), so I can take a look as well

jakeswenson added a commit to jakeswenson/sqlparser-rs that referenced this issue Mar 19, 2023
lustefaniak pushed a commit to getsynq/sqlparser-rs that referenced this issue Jun 20, 2023
lustefaniak pushed a commit to getsynq/sqlparser-rs that referenced this issue Sep 25, 2023
lustefaniak pushed a commit to getsynq/sqlparser-rs that referenced this issue Sep 25, 2023
lustefaniak pushed a commit to getsynq/sqlparser-rs that referenced this issue Sep 25, 2023
lustefaniak pushed a commit to getsynq/sqlparser-rs that referenced this issue Oct 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants