Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

topn with granularity regression fixes #17565

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

clintropolis
Copy link
Member

@clintropolis clintropolis commented Dec 13, 2024

Description

changes:

  • fix issue where topN with query granularity other than ALL would use the heap algorithm when it was actual able to use the pooled algorithm, and incorrectly used the pool algorithm in cases where it must use the heap algorithm, a regression from rework cursor creation #16533
  • fix issue where topN with query granularity other than ALL could incorrectly process values in the wrong time bucket, another regression from rework cursor creation #16533

This PR has:

  • been self-reviewed.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

changes:
* fix issue where topN with query granularity other than ALL would use the heap algorithm when it was actual able to use the pooled algorithm, and incorrectly used the pool algorithm in cases where it must use the heap algorithm, a regression from apache#16533
* fix issue where topN with query granularity other than ALL could incorrectly process values in the wrong time bucket, another regression from apache#16533
@clintropolis clintropolis added this to the 31.0.1 milestone Dec 13, 2024
@@ -275,7 +275,7 @@ private static boolean canUsePooledAlgorithm(
final int numBytesToWorkWith = resultsBuf.capacity();
final int numValuesPerPass = numBytesPerRecord > 0 ? numBytesToWorkWith / numBytesPerRecord : cardinality;

return numValuesPerPass <= cardinality;
return numValuesPerPass >= cardinality;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comparision change caught my eye...the new cmp seem to be more in line with the intention - but it seems to me that
depending on some conditions here ; this cardinality's value might be -1 in some cases
is that ok to answer true if cardinality is -1 ?

note: if cardinality == -1 ; then both the old and the new logic returns true - is that ok?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea this is fair, it is not really a problem in practice because currently the column capabilities should not report as dictionary encoded if the cardinality is -1, so we return false earlier in the method before we get here, but this seems worth explicitly checking for just in case.

@@ -245,6 +245,11 @@ private static boolean canUsePooledAlgorithm(
final int numBytesPerRecord
)
{
if (cardinality < 0) {
// unknown cardinality doesn't work with the pooled algorith which requires an exact count of dictionary ids
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

algorithm (spelling)

@@ -391,7 +391,7 @@ public boolean hasNext()
if (delegate != null && delegate.hasNext()) {
return true;
} else {
if (!cursor.isDone() && granularizer.currentOffsetWithinBucket()) {
if (granularizer.currentOffsetWithinBucket()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was the cursor.isDone() here removed simply because it's redundant?

Copy link
Member Author

@clintropolis clintropolis Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, i noticed granularizer.currentOffsetWithinBucket also checks isDone so it seemed nicer this way, this didn't cause any issues, just noticed while i was looking over stuff

aggregator.init(resultsBuffer, position);
aggregator.aggregate(resultsBuffer, position);
positionToAllocate += aggregatorSize;
if (granularizer.currentOffsetWithinBucket()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was the granularizer.currentOffsetWithinBucket() check added here (& the other prototypes) to enable skipping of empty granular buckets?

Copy link
Member Author

@clintropolis clintropolis Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, this was a bug, i did it as an if statement wrapping the loop so we wouldn't need to check currentOffsetWithinBucket() twice per loop since the granularizer advance methods also call currentOffsetWithinBucket

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants