Implement Calcite native library that parses sql and returns a binary substrait representation #21

Closed
jacques-n opened this issue Sep 9, 2021 · 5 comments

@jacques-n
Contributor

The idea is to give users immediate access to Calcite's high-quality SQL parsing with minimal effort. We'd use GraalVM for ahead-of-time (AOT) compilation and start with a fairly simple function similar to substrait parse(string).

It would be nice if part of this effort exposed the library through a command-line tool whose output could be piped to other future tools. (For example, an additional CLI that takes a plan and returns the results using DataFusion.)

A big question is which catalog to expose in an example CLI. Some ideas:

  • Require a user to provide --table = for Parquet files (similar to how local paths are exposed in Docker)
  • Pass in table declarations such as --table t1=(int c1, int c2) and treat the read objects as opaque (to be bound later by an execution system)?
  • Other ideas?
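For the second option, turning a declaration such as t1=(int c1, int c2) into a table name plus typed columns could look like the minimal Python sketch below. The <name>=(<type> <column>, ...) grammar and the names parse_table_decl, TableDecl, and Column are assumptions for illustration, not part of any existing CLI.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    type: str


@dataclass(frozen=True)
class TableDecl:
    name: str
    columns: tuple  # tuple of Column, in declaration order


# Assumed grammar: "<name>=(<type> <column>, <type> <column>, ...)".
_DECL = re.compile(r"^\s*(\w+)\s*=\s*\((.*)\)\s*$")


def parse_table_decl(text: str) -> TableDecl:
    """Parse a hypothetical --table value such as 't1=(int c1, int c2)'."""
    m = _DECL.match(text)
    if m is None:
        raise ValueError(f"not a table declaration: {text!r}")
    name, body = m.group(1), m.group(2)
    columns = []
    for part in body.split(","):
        tokens = part.split()
        if len(tokens) != 2:
            raise ValueError(f"expected '<type> <column>', got {part.strip()!r}")
        col_type, col_name = tokens
        columns.append(Column(col_name, col_type.lower()))
    return TableDecl(name, tuple(columns))
```

Calling parse_table_decl("t1=(int c1, int c2)") yields a TableDecl named t1 with int columns c1 and c2; the read itself stays opaque until an execution system binds it to real data.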

A second fun addition would be a separate library that takes a plan in, applies a list of optimization rules using one of the existing Calcite optimizers, and returns the optimized plan. It is lower priority than the SQL parser initially, but it could be interesting for evaluating different optimization patterns and for starting to expose nice Calcite interfaces to languages such as Python and Rust.
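To make the plan-in, plan-out idea concrete, here is a toy Python sketch of a rule-driven rewriter that applies a list of rules bottom-up until the plan stops changing, loosely in the spirit of Calcite's heuristic HepPlanner. The plan node types (Scan, Filter, Project) and the two rules are hypothetical stand-ins; a real library would operate on Substrait plans and delegate to Calcite's own planners.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Sequence, Union


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    predicate: str
    input: "Plan"


@dataclass(frozen=True)
class Project:
    columns: tuple
    input: "Plan"


Plan = Union[Scan, Filter, Project]
# A rule returns a rewritten node, or None if it does not apply.
Rule = Callable[[Plan], Optional[Plan]]


def remove_true_filter(node: Plan) -> Optional[Plan]:
    """Filter(TRUE, x) -> x."""
    if isinstance(node, Filter) and node.predicate == "TRUE":
        return node.input
    return None


def merge_filters(node: Plan) -> Optional[Plan]:
    """Filter(p, Filter(q, x)) -> Filter((p) AND (q), x)."""
    if isinstance(node, Filter) and isinstance(node.input, Filter):
        inner = node.input
        return Filter(f"({node.predicate}) AND ({inner.predicate})", inner.input)
    return None


def rewrite_once(node: Plan, rules: Sequence[Rule]) -> Plan:
    """Rewrite children first, then apply the first matching rule at this node."""
    if isinstance(node, (Filter, Project)):
        node = replace(node, input=rewrite_once(node.input, rules))
    for rule in rules:
        result = rule(node)
        if result is not None:
            return result
    return node


def optimize(plan: Plan, rules: Sequence[Rule], max_passes: int = 100) -> Plan:
    """Repeat passes until a fixpoint; max_passes guards against non-converging rules."""
    for _ in range(max_passes):
        new_plan = rewrite_once(plan, rules)
        if new_plan == plan:
            return new_plan
        plan = new_plan
    return plan
```

With rules [remove_true_filter, merge_filters], Project(("a",), Filter("TRUE", Filter("a > 1", Scan("t")))) becomes Project(("a",), Filter("a > 1", Scan("t"))). Rule order matters here: if merge_filters ran first, it would fold TRUE into a conjunction instead of dropping it, which is the kind of optimization pattern such a library would make easy to compare.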

@andygrove

In the DataFusion CLI we chose to support the Hive CREATE EXTERNAL TABLE syntax for defining tables based on files.

CREATE EXTERNAL TABLE foo STORED AS PARQUET LOCATION '...';

CREATE EXTERNAL TABLE bar (a INT, b INT) STORED AS CSV LOCATION '...';
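A CLI following the DataFusion approach would need to recognize these statements. The Python sketch below extracts the table name, the optional column list, the file format, and the location with a regular expression. It is a hypothetical helper, not DataFusion's actual parser, and it only handles the simple forms shown above (for example, it does not support parameterized types such as DECIMAL(10, 2)).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalTable:
    name: str
    columns: tuple  # ((column_name, TYPE), ...); empty when no schema is given
    file_format: str
    location: str


# Simple forms only: no nested parentheses in the column list and
# no escaped quotes inside the LOCATION string.
_CREATE = re.compile(
    r"^\s*CREATE\s+EXTERNAL\s+TABLE\s+(?P<name>\w+)\s*"
    r"(?:\((?P<cols>[^)]*)\)\s*)?"
    r"STORED\s+AS\s+(?P<fmt>\w+)\s+"
    r"LOCATION\s+'(?P<loc>[^']*)'\s*;?\s*$",
    re.IGNORECASE,
)


def parse_create_external_table(sql: str) -> ExternalTable:
    """Extract name, optional columns, format, and location (hypothetical helper)."""
    m = _CREATE.match(sql)
    if m is None:
        raise ValueError(f"unsupported statement: {sql!r}")
    columns = []
    if m.group("cols"):
        for part in m.group("cols").split(","):
            tokens = part.split()
            if len(tokens) != 2:
                raise ValueError(f"expected '<column> <type>', got {part.strip()!r}")
            columns.append((tokens[0], tokens[1].upper()))
    return ExternalTable(
        name=m.group("name"),
        columns=tuple(columns),
        file_format=m.group("fmt").upper(),
        location=m.group("loc"),
    )
```

Both statements above parse: foo yields an empty column tuple with format PARQUET (its schema would presumably be inferred from the files), and bar yields the columns (a, INT) and (b, INT) with format CSV.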

@jacques-n jacques-n self-assigned this Sep 18, 2021
@jacques-n
Contributor Author

Some updates: I've made some progress and will (hopefully) post a WIP PR soon. Getting Calcite to compile within a GraalVM native image is proving non-trivial, so I've opened CALCITE-4786 to track work that should make that easier. The good news is that when I've gotten things to work, I've seen sub-millisecond SQL > rel > Substrait conversions for very simple plans.

@cpcloud
Contributor

cpcloud commented May 13, 2022

Can this be closed since we have substrait-java now?

@jacques-n
Contributor Author

Moved to substrait-io/substrait-java. Closing here.

@ingomueller-net
Contributor

@cpcloud, @jacques-n: Could you elaborate a bit on what substrait-java does and how it addresses the use cases defined in CALCITE-4786?

(I'd love to have a well-designed parser for standard SQL, like Calcite's, but one that does not require a JVM on the user's side, i.e., unlike Calcite unless it is compiled to a native image. I was very hopeful at the beginning of this issue but don't understand the connection to substrait-java.)

rkondakov pushed a commit to rkondakov/substrait that referenced this issue Nov 21, 2023
…ubstrait-io#21)

* Update substrait submodule to point to latest release
* Add AggregateFunctionInvocation to pojo model to track distinct
4 participants