Add distinct_on to dataframe api #11012
Looks good to me -- thank you @Omega359
I left a suggestion to improve the example, perhaps with some comments. The semantics of `distinct_on` are a little mind-blowing.
I merged this branch up from main to resolve a merge conflict.
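To make the `DISTINCT ON` semantics more concrete, here is a minimal plain-Rust sketch over an in-memory slice. It is not DataFusion's implementation and does not use DataFusion's API. `distinct_on` here is a hypothetical helper that assumes PostgreSQL-style semantics: order the rows, keep the first row for each distinct ON key, then project the selected columns. The three closures loosely play the roles of the ON expressions, the sort order and the select list.

```rust
use std::cmp::Ordering;
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical helper sketching `SELECT DISTINCT ON (<on>) <select> ... ORDER BY <order>`
/// over an in-memory slice (assumes PostgreSQL-style semantics; not DataFusion's code).
///
/// * `on`     - computes the key that must be distinct in the output
/// * `order`  - decides which row wins within each key (the first one after sorting)
/// * `select` - projects the surviving row into the output shape
fn distinct_on<T, K, O, F, C, S>(rows: &[T], on: F, order: C, select: S) -> Vec<O>
where
    K: Eq + Hash,
    F: Fn(&T) -> K,
    C: Fn(&T, &T) -> Ordering,
    S: Fn(&T) -> O,
{
    // Sort references so the input slice is left untouched. `sort_by` is stable,
    // so rows that compare equal keep their input order; a SQL engine makes no
    // such promise when the ordering does not fully determine the winner.
    let mut sorted: Vec<&T> = rows.iter().collect();
    sorted.sort_by(|x, y| order(*x, *y));

    // Walk the sorted rows and keep only the first row seen for each ON key.
    let mut seen: HashSet<K> = HashSet::new();
    let mut out = Vec::new();
    for row in sorted {
        if seen.insert(on(row)) {
            out.push(select(row));
        }
    }
    out
}
```

For example, with rows `(1, "x", 3), (1, "y", 5), (2, "z", 1), (2, "w", 4), (3, "v", 2)`, keying on the first column, ordering by the first column ascending and the third descending, and selecting the first two columns returns `[(1, "y"), (2, "w"), (3, "v")]`: a single `(a, b)` row for each distinct `a`, namely the one with the largest third column. Dropping the descending tie-breaker means the ordering no longer picks a winner within each key, which is exactly the case where SQL leaves the surviving row unspecified.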
```rust
/// # async fn main() -> Result<()> {
/// let ctx = SessionContext::new();
/// let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?
/// // Return a single row (a, b) for each distinct value of a
```
I wonder whether DISTINCT supports all types, including binary, complex types, etc. If it does not, we should mention that in the docs and double-check that it returns an appropriate error instead of crashing or corrupting data.
It is a good idea -- I filed #11052 to track it.
However, I don't think it is required for this PR, so I merged it in.
* Add distinct_on to dataframe api apache#11011
* cargo fmt
* Update datafusion/core/src/dataframe/mod.rs as per reviewer feedback

Co-authored-by: Andrew Lamb
Which issue does this PR close?
Closes #11011
Rationale for this change
Add `distinct_on` to the DataFrame API.
What changes are included in this PR?
Code, tests, and documentation.
Are these changes tested?
Yes, via tests in `dataframe/mod.rs`.
Are there any user-facing changes?
Yes: the DataFrame API gains a new `distinct_on` method.