Added support for accurate distinct count on attributes through config #178
The existing config supports setting the distinct count aggregation function for a given scope, applied to all attributes. However, we might need an accurate distinct count on some attributes when the default is set to the approximate function. This PR defines a way to configure such attributes.
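As a rough illustration of the behavior described above, the lookup can be sketched as a default function plus per-column overrides. This is a minimal sketch, not the PR's actual implementation: the class name `DistinctCountFunctionLookup`, the map-based override structure, and the column names `user_id` and `trace_id` are all hypothetical. `DISTINCTCOUNT` (exact) and `DISTINCTCOUNTHLL` (approximate) are existing Pinot aggregation functions.

```java
import java.util.Map;

// Hypothetical sketch: a default distinct count function with per-column overrides.
public class DistinctCountFunctionLookup {
  private final String defaultFunction;
  private final Map<String, String> overridesByColumn;

  public DistinctCountFunctionLookup(
      String defaultFunction, Map<String, String> overridesByColumn) {
    this.defaultFunction = defaultFunction;
    // Defensive, immutable copy so later changes to the caller's map have no effect.
    this.overridesByColumn = Map.copyOf(overridesByColumn);
  }

  // Returns the override for the column if one is configured, else the default.
  public String getDistinctCountFunction(String columnName) {
    return overridesByColumn.getOrDefault(columnName, defaultFunction);
  }

  public static void main(String[] args) {
    // Assumed setup: approximate by default, exact for one hypothetical column.
    DistinctCountFunctionLookup lookup =
        new DistinctCountFunctionLookup(
            "DISTINCTCOUNTHLL", Map.of("user_id", "DISTINCTCOUNT"));
    System.out.println(lookup.getDistinctCountFunction("user_id"));
    System.out.println(lookup.getDistinctCountFunction("trace_id"));
  }
}
```

With this shape, only the columns listed in the overrides pay the cost of the exact function, and everything else keeps the scope-wide default.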
Codecov Report
@@ Coverage Diff @@
## main #178 +/- ##
============================================
- Coverage 78.53% 78.51% -0.02%
- Complexity 854 856 +2
============================================
Files 80 80
Lines 3597 3608 +11
Branches 406 407 +1
============================================
+ Hits 2825 2833 +8
- Misses 597 599 +2
- Partials 175 176 +1
I'd love to dive into the use case a bit if you want to set up a call. Remember, the reason this went in in the first place was that, with enough data, the performance of exact distinct count wasn't satisfactory. So will that be better for some columns than for others? If not, then we're just opening up a config that lets us shoot ourselves in the foot.
@aaron-steinfeld The context is that we see many mismatches in grouping queries that surface in the UI, even for small numbers. In certain cases, this might be critical.
...in/java/org/hypertrace/core/query/service/pinot/converters/PinotFunctionConverterConfig.java
if (config.hasPath(DISTINCT_COUNT_AGGREGATION_OVERRIDES)) {
  Config overridesConfig = config.getConfig(DISTINCT_COUNT_AGGREGATION_OVERRIDES);
  this.distinctCountAggOverrides =
      overridesConfig.entrySet().stream()
nit: keySet, not using the whole entry
This is a Config object: no keySet is available, only entrySet.
Ah, ConfigObject has keySet (it implements Map); Config does not. So .root().keySet() would work. FWIW, the two are equivalent for a one-level config, but I would say the keySet version better aligns with your intention: getting the immediate keys, rather than what entrySet does (recursing into all children and producing path expressions mapped to any descendant value).
}

public PinotFunctionConverterConfig() {
  this(ConfigFactory.empty());
}

public String getDistinctCountFunction(String arg) {
nit: rename arg to what the string is supposed to represent (columnName)