Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Supporting SAMPLE parsing #1566

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

seve-martinez
Copy link
Contributor

Snowflake, Databricks, Postgres, and others all support some from of table sampling.

This commonly takes on the form

SELECT ...
FROM ... 
{ SAMPLE | TABLESAMPLE } [ samplingMethod ]

where

samplingMethod ::= { { BERNOULLI | ROW } ( { <probability> | <num> ROWS } ) |
                     { SYSTEM | BLOCK } ( <probability> ) [ { REPEATABLE | SEED } ( <seed> ) ] }

Different dialects support one, the other, or both sample keywords and only some of the methods. The current PR follows the snowflake paradigm.

Outstanding questions

  • This method is not currently called anywhere. Review of the Parser code would make me think it should be part of the parse_table_factor() logic since it always trails a relation. However this would then require each variant of TableFactor that supports SAMPLE to have a placeholder for it. Is that the right approach?

@iffyio
Copy link
Contributor

iffyio commented Dec 6, 2024

Hi @seve-martinez! It sounds like the parsing can be done while parsing the select body instead? Probably here before looking for a WHERE clause?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants