Added support for accurate distinct count on attributes through config #178

singhalprerana · 2022-12-15T10:24:25Z

The existing config sets the distinct count aggregation function for all attributes in a given scope. When that default is the approximate function, some attributes may still need an accurate distinct count. This PR adds a way to configure overrides for those attributes.
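As a rough illustration of the behavior described above, the lookup can be thought of as a default function plus a per-attribute override map. This is a minimal sketch using plain Java collections, not the PR's actual implementation: the class name `DistinctCountConfigSketch` and the attribute names in the usage note are hypothetical, and only the method name `getDistinctCountFunction` is taken from the diff.

```java
import java.util.Map;

// Hypothetical stand-in for the converter config; not the real PinotFunctionConverterConfig.
final class DistinctCountConfigSketch {
  // Function applied to every attribute in scope unless overridden.
  private final String defaultFunction;
  // Attribute (column) name -> function to use instead of the default.
  private final Map<String, String> overridesByAttribute;

  DistinctCountConfigSketch(String defaultFunction, Map<String, String> overridesByAttribute) {
    this.defaultFunction = defaultFunction;
    this.overridesByAttribute = Map.copyOf(overridesByAttribute);
  }

  // Returns the override configured for the attribute, or the default when none is set.
  String getDistinctCountFunction(String columnName) {
    return overridesByAttribute.getOrDefault(columnName, defaultFunction);
  }
}
```

With a default of `DISTINCTCOUNTHLL` and an override mapping a hypothetical attribute `user_id` to `DISTINCTCOUNT`, the lookup would return the exact function for `user_id` and the approximate one for every other attribute.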

codecov · 2022-12-15T10:28:13Z

Codecov Report

Merging #178 (f51ccb7) into main (38dcb3f) will decrease coverage by 0.01%.
The diff coverage is 87.50%.

@@             Coverage Diff              @@
##               main     #178      +/-   ##
============================================
- Coverage     78.53%   78.51%   -0.02%     
- Complexity      854      856       +2     
============================================
  Files            80       80              
  Lines          3597     3608      +11     
  Branches        406      407       +1     
============================================
+ Hits           2825     2833       +8     
- Misses          597      599       +2     
- Partials        175      176       +1

| Flag | Coverage Δ | |
|---|---|---|
| integration | `78.51% <87.50%> (-0.02%)` | ⬇️ |
| unit | `71.60% <87.50%> (+<0.01%)` | ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...pinot/converters/PinotFunctionConverterConfig.java | `85.18% <84.61%> (-8.15%)` | ⬇️ |
| ...rvice/pinot/converters/PinotFunctionConverter.java | `94.80% <100.00%> (-0.07%)` | ⬇️ |

aaron-steinfeld · 2022-12-15T13:45:56Z

I'd love to dive into the use case a bit if you want to set up a call. Remember, the reason this went in in the first place was that, with enough data, the performance of exact distinct count wasn't satisfactory. So will that be better for some columns than for others? If not, then we're just opening up a config to let us shoot ourselves in the foot.

singhalprerana · 2022-12-23T06:11:54Z

> I'd love to dive into the use case a bit if you want to set up a call. Remember, the reason this went in in the first place was that, with enough data, the performance of exact distinct count wasn't satisfactory. So will that be better for some columns than for others? If not, then we're just opening up a config to let us shoot ourselves in the foot.

@aaron-steinfeld The context is that we see many mismatches in grouping queries that surface in the UI, even for small numbers. In certain cases, this might be critical.

aaron-steinfeld · 2023-01-10T21:08:37Z

...in/java/org/hypertrace/core/query/service/pinot/converters/PinotFunctionConverterConfig.java

+    if (config.hasPath(DISTINCT_COUNT_AGGREGATION_OVERRIDES)) {
+      Config overridesConfig = config.getConfig(DISTINCT_COUNT_AGGREGATION_OVERRIDES);
+      this.distinctCountAggOverrides =
+          overridesConfig.entrySet().stream()


nit: use `keySet`, since the whole entry isn't needed

singhalprerana (Jan 11, 2023): This is a `Config` object, so no `keySet` is available, only `entrySet`.

aaron-steinfeld (Jan 11, 2023): Ah, `ConfigObject` has `keySet` (it implements `Map`); `Config` does not. So `.root().keySet()` would work. FWIW, the two are equivalent for a one-level config, but I would say the `keySet` version better aligns with your intention: getting the immediate keys, rather than what `entrySet` does (recursing into all children and producing path expressions mapped to any descendant value).
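To make the difference discussed in this thread concrete, here is a small sketch that models a config tree with plain nested maps rather than the Typesafe Config classes, so it runs without the library. The class `ConfigKeysSketch` and its methods are hypothetical: `immediateKeys` stands in for `config.root().keySet()`, and `leafPaths` stands in for the keys of `config.entrySet()`, under the assumption that the latter flattens nested objects into dotted path expressions as described in the comment.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of a config tree: values are either leaves or nested maps.
final class ConfigKeysSketch {
  // Immediate keys only; analogue of config.root().keySet().
  static Set<String> immediateKeys(Map<String, Object> root) {
    return new TreeSet<>(root.keySet());
  }

  // Dotted paths to every leaf; analogue of the keys produced by config.entrySet().
  static Set<String> leafPaths(Map<String, Object> root) {
    Set<String> paths = new TreeSet<>();
    collect("", root, paths);
    return paths;
  }

  // Walks the tree depth-first, recording a path only when a leaf value is reached.
  private static void collect(String prefix, Map<String, Object> node, Set<String> out) {
    for (Map.Entry<String, Object> entry : node.entrySet()) {
      String path = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
      if (entry.getValue() instanceof Map) {
        @SuppressWarnings("unchecked")
        Map<String, Object> child = (Map<String, Object>) entry.getValue();
        collect(path, child, out);
      } else {
        out.add(path);
      }
    }
  }
}
```

For a one-level map the two methods return the same set of keys, which matches the "equivalent for a one-level config" remark. Once a value is itself a nested object, `immediateKeys` still returns only the top-level key, while `leafPaths` returns one dotted path per leaf.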

aaron-steinfeld · 2023-01-10T21:10:09Z

...in/java/org/hypertrace/core/query/service/pinot/converters/PinotFunctionConverterConfig.java

  }

  public PinotFunctionConverterConfig() {
    this(ConfigFactory.empty());
  }
+
+  public String getDistinctCountFunction(String arg) {


nit: rename `arg` to what the string is supposed to represent (`columnName`)

github-actions · 2023-01-11T07:12:45Z

Unit Test Results

  38 files ±0   38 suites ±0 11s ⏱️ -1s
257 tests ±0 257 ✔️ ±0 0 💤 ±0 0 ❌ ±0

Results for commit 4525f0c. ± Comparison against base commit 38dcb3f.

singhalprerana requested review from kotharironak, laxmanchekka and a team December 15, 2022 10:24

aaron-steinfeld reviewed Dec 29, 2022

Added generic overrides map

f51ccb7

singhalprerana requested review from aaron-steinfeld and avinashkolluru January 10, 2023 04:37

aaron-steinfeld approved these changes Jan 10, 2023

singhalprerana merged commit 4525f0c into main Jan 11, 2023

singhalprerana deleted the distinct-count/config branch January 11, 2023 07:05
