Added support for accurate distinct count on attributes through config #178

Merged
merged 2 commits into from
Jan 11, 2023

Conversation

singhalprerana
Contributor

The existing config supports configuring the distinct count aggregation function for all attributes in a given scope. However, some attributes may need an accurate distinct count even when the default is set to the approximate function. This change defines a way to configure such attributes.
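
As a rough illustration of the idea described above, here is a minimal sketch of a per-attribute override lookup. It is not the code in this PR: the class name `DistinctCountConfigSketch`, the column names, and the choice of a column-to-function map are all assumptions made for the example. The function names `DISTINCTCOUNT` and `DISTINCTCOUNTHLL` are used as examples of an exact and an approximate Pinot aggregation respectively.

```java
import java.util.Map;

/** Hypothetical stand-in for the converter config; not the PR's actual class. */
public class DistinctCountConfigSketch {
  // Function used for every attribute that has no explicit override.
  private final String defaultFunction;
  // Assumed shape: column name -> distinct count function to use for that column.
  private final Map<String, String> overridesByColumn;

  public DistinctCountConfigSketch(
      String defaultFunction, Map<String, String> overridesByColumn) {
    this.defaultFunction = defaultFunction;
    this.overridesByColumn = Map.copyOf(overridesByColumn);
  }

  /** Returns the override for the column if one is configured, else the default. */
  public String getDistinctCountFunction(String columnName) {
    return overridesByColumn.getOrDefault(columnName, defaultFunction);
  }

  public static void main(String[] args) {
    // Hypothetical setup: approximate by default, exact for one column.
    DistinctCountConfigSketch config =
        new DistinctCountConfigSketch(
            "DISTINCTCOUNTHLL", Map.of("user_id", "DISTINCTCOUNT"));
    System.out.println(config.getDistinctCountFunction("user_id"));
    System.out.println(config.getDistinctCountFunction("session_id"));
  }
}
```

With this shape, only the listed columns pay the cost of the exact function; every other column keeps the scope-wide default.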


codecov bot commented Dec 15, 2022

Codecov Report

Merging #178 (f51ccb7) into main (38dcb3f) will decrease coverage by 0.01%.
The diff coverage is 87.50%.

@@             Coverage Diff              @@
##               main     #178      +/-   ##
============================================
- Coverage     78.53%   78.51%   -0.02%     
- Complexity      854      856       +2     
============================================
  Files            80       80              
  Lines          3597     3608      +11     
  Branches        406      407       +1     
============================================
+ Hits           2825     2833       +8     
- Misses          597      599       +2     
- Partials        175      176       +1     
Flag Coverage Δ
integration 78.51% <87.50%> (-0.02%) ⬇️
unit 71.60% <87.50%> (+<0.01%) ⬆️


Impacted Files Coverage Δ
...pinot/converters/PinotFunctionConverterConfig.java 85.18% <84.61%> (-8.15%) ⬇️
...rvice/pinot/converters/PinotFunctionConverter.java 94.80% <100.00%> (-0.07%) ⬇️



@aaron-steinfeld
Contributor

I'd love to dive into the use case a bit if you want to set up a call - remember, the reason this went in in the first place was that, with enough data, the performance of exact distinct count wasn't satisfactory. So will that be better for some columns than for others? If not, then we're just opening up a config that lets us shoot ourselves in the foot.

@singhalprerana
Contributor Author

singhalprerana commented Dec 23, 2022


@aaron-steinfeld The context is that we see many mismatches in grouping queries that surface in the UI, even for small numbers. In certain cases, this can be critical.


if (config.hasPath(DISTINCT_COUNT_AGGREGATION_OVERRIDES)) {
  Config overridesConfig = config.getConfig(DISTINCT_COUNT_AGGREGATION_OVERRIDES);
  this.distinctCountAggOverrides =
      overridesConfig.entrySet().stream()
Contributor


nit: use keySet, since the whole entry isn't being used

Contributor Author


This is a Config object, so there is no keySet available, only entrySet.

Contributor


Ah - ConfigObject has keySet (it implements Map), Config does not. So .root().keySet() would work. FWIW, the two are equivalent for a one-level config, but I would say the keySet version better aligns with your intention: getting the immediate keys, rather than what entrySet does (recursing into all children and producing path expressions mapped to any descendant value).
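
To make the distinction above concrete without pulling in the config library, here is a small sketch that models it with plain nested `java.util.Map` values. It is only an analogy: `immediateKeys` plays the role of `.root().keySet()` and `leafPaths` plays the role of the keys produced by `entrySet()`. The class name, method names, and sample keys are all hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Plain-Map model of "immediate keys" versus "flattened leaf paths". */
public class ConfigKeysSketch {

  /** Immediate children only, analogous to root().keySet(). */
  public static Set<String> immediateKeys(Map<String, Object> root) {
    return new TreeSet<>(root.keySet());
  }

  /** Dotted paths to every leaf value, analogous to the keys of entrySet(). */
  public static Set<String> leafPaths(Map<String, Object> root) {
    Set<String> paths = new TreeSet<>();
    collect("", root, paths);
    return paths;
  }

  @SuppressWarnings("unchecked")
  private static void collect(String prefix, Map<String, Object> node, Set<String> out) {
    for (Map.Entry<String, Object> entry : node.entrySet()) {
      String path = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
      if (entry.getValue() instanceof Map) {
        // Nested object: recurse instead of reporting the intermediate key.
        collect(path, (Map<String, Object>) entry.getValue(), out);
      } else {
        out.add(path);
      }
    }
  }

  public static void main(String[] args) {
    // One-level config: both views return the same keys.
    Map<String, Object> flat = Map.of("user_id", "DISTINCTCOUNT", "api_name", "DISTINCTCOUNT");
    System.out.println(immediateKeys(flat).equals(leafPaths(flat)));

    // Nested config: the two views diverge.
    Map<String, Object> nested = Map.of("user", Map.of("id", "DISTINCTCOUNT"));
    System.out.println(immediateKeys(nested));
    System.out.println(leafPaths(nested));
  }
}
```

In this model the views only diverge once a value is itself an object, which matches the point that the two are interchangeable for a one-level overrides block but express different intent.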

}

public PinotFunctionConverterConfig() {
  this(ConfigFactory.empty());
}

public String getDistinctCountFunction(String arg) {
Contributor


nit: rename arg to what the string is supposed to represent (columnName)

singhalprerana merged commit 4525f0c into main on Jan 11, 2023
singhalprerana deleted the distinct-count/config branch on January 11, 2023 at 07:05
github-actions

Unit Test Results

  38 files  ±0    38 suites  ±0   11s ⏱️ -1s
257 tests ±0  257 ✔️ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit 4525f0c. ± Comparison against base commit 38dcb3f.


2 participants