Tuple sketch SQL support #13887
Based on the workaround found here: mapstruct/mapstruct#1241
Includes tests that exercise the use case outlined in apache#13819.
...apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSqlAggregator.java
Fixed
Also removed unused constant.
@abhishekagarwal87, please take a look at this.
Hmm... I'm not sure whether anything can be done about this failing CI check:
https://github.com/apache/druid/actions/runs/4358587476/jobs/7648241472
@vtlim Hi, are there any further changes required before this PR can be accepted and merged? Is a review from additional maintainers required? Thanks!
@frankgrimes97 Can you add the two new sketch aggregator names to the .spelling file? The tests flag the new names as misspellings, so we just need to add them to the dictionary, for example after L735 in https://github.com/apache/druid/blob/master/website/.spelling#L735. Add one line for ARRAY_OF_DOUBLES_SKETCH and one for ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE. I only reviewed the docs side, so a reviewer will still be needed on the code side. cc @abhishekagarwal87 or @rohangarg
...g/apache/druid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchMergeAggregator.java
Outdated
.../query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSetBaseOperatorConversion.java
.../query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSetBaseOperatorConversion.java
Outdated
This avoids unnecessary changes to ArrayOfDoublesSketchMergeAggregator
...apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSqlAggregator.java
Fixed
...apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSqlAggregator.java
Fixed
Hi, can this now be merged or are more changes required? Thanks!
Cool functionality! Left a number of comments on how best to integrate the functionality into SQL, given the way that SQL normally works.
...apache/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSqlAggregator.java
Fixed
...he/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSqlAggregatorTest.java
Outdated
...he/druid/query/aggregation/datasketches/tuple/sql/ArrayOfDoublesSketchSqlAggregatorTest.java
…tch-sql-rebased-on-pr-13819
Some APIs were recently deprecated as part of apache#13904 and apache#13914. Some CI checks were warning about this, so it's better to address it now than in a subsequent PR.
- Add missing INTERSECT/NOT/UNION descriptions
- Re-order new functions alphabetically in documentation
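As a rough mental model of what the INTERSECT/NOT/UNION operations do, here is a toy Python sketch. It is not the Apache DataSketches or Druid implementation: the ToyTupleSketch class, the helper names, the hand-picked hash values, and the choice to sum values for keys present in both inputs are all assumptions made for illustration. Each sketch retains only entries whose key hash is below its threshold theta, so every set operation first takes the smaller of the two thetas.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyTupleSketch:
    theta: float                       # sampling threshold: every retained hash is < theta
    entries: Dict[float, List[float]]  # key hash in [0, 1) -> array of doubles


def _add(a: List[float], b: List[float]) -> List[float]:
    # Assumption: values for a key present in both inputs are summed element-wise.
    return [x + y for x, y in zip(a, b)]


def union(a: ToyTupleSketch, b: ToyTupleSketch, k: int) -> ToyTupleSketch:
    theta = min(a.theta, b.theta)
    merged: Dict[float, List[float]] = {}
    for sketch in (a, b):
        for h, values in sketch.entries.items():
            if h < theta:  # entries at or above the smaller theta cannot be kept
                merged[h] = _add(merged[h], values) if h in merged else list(values)
    hashes = sorted(merged)
    if len(hashes) > k:
        # More than k entries: keep the k smallest hashes and lower theta to the next one.
        theta = hashes[k]
        hashes = hashes[:k]
    return ToyTupleSketch(theta, {h: merged[h] for h in hashes})


def intersect(a: ToyTupleSketch, b: ToyTupleSketch) -> ToyTupleSketch:
    theta = min(a.theta, b.theta)
    common = {h: _add(v, b.entries[h]) for h, v in a.entries.items()
              if h in b.entries and h < theta}
    return ToyTupleSketch(theta, common)


def a_not_b(a: ToyTupleSketch, b: ToyTupleSketch) -> ToyTupleSketch:
    theta = min(a.theta, b.theta)
    kept = {h: list(v) for h, v in a.entries.items()
            if h not in b.entries and h < theta}
    return ToyTupleSketch(theta, kept)


# Hand-made inputs (hash values chosen by hand for readability).
a = ToyTupleSketch(1.0, {0.1: [1.0, 2.0], 0.4: [3.0, 4.0], 0.7: [7.0, 8.0]})
b = ToyTupleSketch(0.5, {0.1: [10.0, 20.0], 0.3: [5.0, 6.0]})

print(union(a, b, k=16).entries)  # {0.1: [11.0, 22.0], 0.3: [5.0, 6.0], 0.4: [3.0, 4.0]}
print(intersect(a, b).entries)    # {0.1: [11.0, 22.0]}
print(a_not_b(a, b).entries)      # {0.4: [3.0, 4.0]}
```

In this toy, the entry at hash 0.7 disappears from every result because it lies above b's theta of 0.5: b carries no information about keys in that range, so no set operation can say anything reliable about them.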
The changes generally look good to me. I'd consider changing the names to start with TUPLE_DOUBLES_ rather than ARRAY_OF_DOUBLES_SKETCH_ (like TUPLE_DOUBLES_SKETCH, TUPLE_DOUBLES_NOT, etc.). It's a bit less typing and, IMO, clearer.
Curious what people think.
One issue with changing the naming only for the SQL functions is that it would be inconsistent with the naming used by the native Druid functions and in the Apache DataSketches codebase and documentation, which might lead to confusion. It's also worth noting that the naming of the DataSketches SQL functions doesn't seem entirely consistent across the board.
We could perhaps consider the following:
That's a good point about
…tch-sql-rebased-on-pr-13819
ARRAY_OF_DOUBLES_SKETCH -> DS_TUPLE_DOUBLES
ARRAY_OF_DOUBLES_SKETCH_* -> DS_TUPLE_DOUBLES_*
LGTM for docs
LGTM after CI passes.
@gianm The failing checks seem unrelated to my changes.
Indeed. I am merging this PR since the failures are unrelated.
This PR is a follow-up to #13819 that makes the Tuple sketch functionality available in SQL, both for ingestion using Multi-Stage Queries (MSQ) and for analytic queries against Tuple sketch columns.
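To make "creating" and "estimating" concrete, here is a toy Python model of an array-of-doubles Tuple sketch: each distinct key carries an array of doubles, and DS_TUPLE_DOUBLES_METRICS_SUM_ESTIMATE (formerly ARRAY_OF_DOUBLES_SKETCH_METRICS_SUM_ESTIMATE) can presumably be thought of as summing each metric column over the retained entries and scaling by 1/theta. This is a simplified KMV-style illustration under stated assumptions, not the Druid or Apache DataSketches code: unit_hash, build, and the other names are hypothetical, and the real sketches use a different hash function and resizing scheme.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


def unit_hash(key: str) -> float:
    """Map a key to a pseudo-random float in [0, 1) (toy stand-in for the real hash)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53  # 53 bits -> exact float


@dataclass
class ToyTupleSketch:
    theta: float                       # sampling threshold: retained hashes are < theta
    entries: Dict[float, List[float]]  # key hash -> summed metric values
    num_values: int                    # length of every value array


def build(rows: Iterable[Tuple[str, List[float]]], k: int, num_values: int) -> ToyTupleSketch:
    """Build a toy tuple sketch from (key, metric values) rows.

    Values for repeated keys are summed. Only the k smallest hashes are kept;
    theta is the (k+1)-th smallest hash, or 1.0 if at most k distinct keys were seen.
    """
    summed: Dict[float, List[float]] = {}
    for key, values in rows:
        h = unit_hash(key)
        if h in summed:
            summed[h] = [x + y for x, y in zip(summed[h], values)]
        else:
            summed[h] = list(values)
    hashes = sorted(summed)
    theta = hashes[k] if len(hashes) > k else 1.0
    return ToyTupleSketch(theta, {h: summed[h] for h in hashes[:k]}, num_values)


def distinct_estimate(sketch: ToyTupleSketch) -> float:
    # Estimated number of distinct keys: retained entries scaled up by 1 / theta.
    return len(sketch.entries) / sketch.theta


def metrics_sum_estimate(sketch: ToyTupleSketch) -> List[float]:
    # Sum each metric column over the retained entries, then scale by 1 / theta.
    sums = [0.0] * sketch.num_values
    for values in sketch.entries.values():
        for i, v in enumerate(values):
            sums[i] += v
    return [s / sketch.theta for s in sums]


rows = [("user1", [1.0, 10.0]), ("user2", [2.0, 20.0]), ("user1", [3.0, 30.0])]
small = build(rows, k=16, num_values=2)
# Fewer than k distinct keys, so theta == 1.0 and both estimates are exact.
print(distinct_estimate(small))     # 2.0
print(metrics_sum_estimate(small))  # [6.0, 60.0]

# Many more distinct keys than k: theta drops below 1.0 and the results become estimates.
big = build(((f"user{i}", [1.0]) for i in range(10_000)), k=1024, num_values=1)
print(round(distinct_estimate(big)))  # close to 10,000, but not exact
```

Keeping per-key metric arrays next to the sampled keys is what lets a single sketch answer both "how many distinct keys?" and "what is the approximate total of each metric?" from the same retained sample.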
Release note
Add SQL functions for creating and operating on Tuple sketches
This PR has: