
MSQ generates tombstones honoring granularity specified in a REPLACE query. #15243

Merged: 14 commits merged into apache:master from the uniform_msq_tombstones branch on Nov 15, 2023

Conversation

@abhishekrb19 (Contributor) commented Oct 24, 2023

Unlike native batch ingestion, MSQ is strict about segment allocation -- it will throw an InsertCannotAllocateSegment fault if the requested segment granularity doesn't align with the allocated segment granularity.

This patch modifies the segment generation logic in MSQ REPLACE to:

  • Honor the segment granularity specified in the query for all finite-interval tombstones, up to 5,000 segments (Limits.MAX_PARTITION_BUCKETS) - e.g., [2000, 2010). If this limit is exceeded, a TooManyBucketsFault is returned, as is done for data segments.
  • For infinite-interval tombstones, retain the current behavior of generating one tombstone each for positive and negative eternity - e.g., [-Inf, 2000) and [2010, Inf).

This behavior is consistent with how MSQ generally works with respect to honoring the segment granularity specified in the query. It's also similar to how native batch ingestion generates tombstones.
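The sketch below illustrates the bucketing behavior described above. It is hypothetical: it uses plain Joda-Time rather than Druid's actual TombstoneHelper, the class/method/constant names are made up for illustration, and it assumes the replace interval is already aligned to the segment granularity (the real logic aligns buckets to granularity boundaries).

```java
import org.joda.time.DateTime;
import org.joda.time.Interval;
import org.joda.time.Period;
import java.util.ArrayList;
import java.util.List;

public class ReplaceTombstoneSketch
{
  // Mirrors the 5,000 bucket cap (Limits.MAX_PARTITION_BUCKETS) mentioned above.
  private static final int MAX_BUCKETS = 5_000;

  /** Splits a finite replace interval into granularity-sized tombstone buckets, failing at the cap. */
  static List<Interval> tombstoneBuckets(Interval finiteInterval, Period segmentGranularity)
  {
    final List<Interval> buckets = new ArrayList<>();
    DateTime cursor = finiteInterval.getStart();
    while (cursor.isBefore(finiteInterval.getEnd())) {
      if (buckets.size() >= MAX_BUCKETS) {
        // In MSQ, exceeding the cap surfaces to the user as a TooManyBucketsFault.
        throw new IllegalStateException("Too many tombstone buckets (max = " + MAX_BUCKETS + ")");
      }
      final DateTime next = cursor.plus(segmentGranularity);
      buckets.add(new Interval(cursor, next));
      cursor = next;
    }
    return buckets;
  }

  public static void main(String[] args)
  {
    // A REPLACE over [2000, 2010) with YEAR segment granularity yields ten yearly tombstones.
    // Half-infinite intervals such as [-Inf, 2000) are not bucketized; each stays a single tombstone.
    final Interval replaceInterval = new Interval(
        DateTime.parse("2000-01-01T00:00:00Z"),
        DateTime.parse("2010-01-01T00:00:00Z")
    );
    System.out.println(tombstoneBuckets(replaceInterval, Period.years(1)).size()); // prints 10
  }
}
```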

Description

Previously, MSQ REPLACE would generate "irregular" tombstones as an optimization so that the controller wouldn't run out of memory. The new code treats only the infinite-interval tombstones specially; those will be cleaned up by the coordinator in a follow-up change.

This change semi-reverts #13893, reuses the tests added in that PR, and adds a new condition to account for the infinite-interval tombstones.

Release note

MSQ REPLACE queries now generate tombstone segments that honor the segment granularity specified in the query. If a query would generate more than 5,000 tombstones, MSQ returns a TooManyBucketsFault to the user, matching the behavior for data segments.


Key changed/added classes in this PR
  • TombstoneHelper.java
  • TombstoneHelperTest.java

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

This change tweaks the logic to only account for the infinite-interval tombstones.
For finite-interval tombstones, the MSQ query granularity will be used,
which is consistent with how MSQ works.
@github-actions github-actions bot added the Area - Batch Ingestion, Area - Ingestion, and Area - MSQ (for multi-stage queries - https://github.com/apache/druid/issues/12262) labels Oct 24, 2023
@abhishekrb19 abhishekrb19 marked this pull request as draft October 24, 2023 17:47
@abhishekrb19 abhishekrb19 marked this pull request as ready for review October 24, 2023 19:00
@abhishekrb19 abhishekrb19 changed the title from "MSQ generates tombstones honoring the query's granularity." to "MSQ generates tombstones honoring granularity specified in a REPLACE query." on Oct 25, 2023
@gianm (Contributor) left a comment

In general I like the idea of having the tombstone behavior be better aligned between native and MSQ. I'm not sure if there's a reason that it needs to be different.

A couple of comments:

  1. Looks like TombstoneHelper has computeTombstones used by native batch in replace mode, and computeTombstoneSegmentsForReplace used by MSQ in replace mode. Is there a practical difference in the logic? If so, is it a difference we need to preserve, or could both systems use the same method? It looks like the method used by MSQ was added in "Generate tombstones when running MSQ's replace" #13706. Perhaps that PR has some discussion that would shed light on these questions.
  2. Ideally MSQ would return a TooManyBuckets fault if there are more than 5000 tombstones (StageDefinition.PARTITION_STATS_MAX_BUCKETS). To support this, I suggest adding a limit parameter to TombstoneHelper, so there's some way for a caller to tell if the limit is exceeded. This prevents OOM in situations where there are too many time chunks. It also aligns with the behavior of an ingest that generates actual data as opposed to tombstones. A rough sketch of this shape follows below.
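A rough, hypothetical sketch of what the limit parameter suggested in (2) could look like. This is not the committed code: BucketLimitExceededException, computeTombstoneIntervals, and computeForMsq are illustrative names, and in the real controller the failure is reported as an MSQException carrying a TooManyBucketsFault.

```java
import org.joda.time.Interval;
import java.util.ArrayList;
import java.util.List;

public class LimitParameterSketch
{
  // Illustrative exception type; the committed change may use a different class.
  static class BucketLimitExceededException extends RuntimeException
  {
    BucketLimitExceededException(int maxBuckets)
    {
      super("Cannot add more tombstone buckets than [" + maxBuckets + "].");
    }
  }

  // Helper side: the caller passes the limit, so each engine can pick its own bound
  // and detect when it is exceeded.
  static List<Interval> computeTombstoneIntervals(Iterable<Interval> candidates, int maxBuckets)
  {
    final List<Interval> result = new ArrayList<>();
    for (Interval candidate : candidates) {
      if (result.size() >= maxBuckets) {
        throw new BucketLimitExceededException(maxBuckets);
      }
      result.add(candidate);
    }
    return result;
  }

  // MSQ side: translates the helper's failure into the user-facing TooManyBuckets error.
  static List<Interval> computeForMsq(Iterable<Interval> candidates)
  {
    try {
      return computeTombstoneIntervals(candidates, 5_000);
    }
    catch (BucketLimitExceededException e) {
      // Stand-in for wrapping into an MSQException with a TooManyBucketsFault.
      throw new RuntimeException("TooManyBuckets: Too many partition buckets (max = 5,000)", e);
    }
  }
}
```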

@abhishekrb19 (Contributor, Author)

Thank you for the review, @gianm!

  1. Yep, I agree. It looks like @LakshSingla did some analysis on reusing computeTombstones in "Generate tombstones when running MSQ's replace" #13706 (comment). I also looked into it, and I think with some refactoring, like moving IntervalUtils into core Druid and passing in specific arguments, this should be possible. I'd prefer to revisit the refactor as a separate PR from the functional change in this patch. For now, I've added javadocs to the two methods the native batch and MSQ engines use.
  2. Good idea! I updated the code per your suggestion and added unit tests in MSQReplaceTest.java and TombstoneHelperTest.java.

@LakshSingla (Contributor) left a comment

Since TooManyBucketsFault now also includes tombstones, does the user-facing message for the error still hold, or do we want it to be more explanatory?
Expressing "tombstones" in the message doesn't seem neat, so I think we can mention the number of buckets generated & the number of empty buckets in the message to help the user & admin debug faster, without saying what tombstones are.

@abhishekrb19 (Contributor, Author) commented Nov 3, 2023

@LakshSingla re:

Since TooManyBucketsFault now also includes tombstones, does the user-facing message for the error still hold, or do we want it to be more explanatory?
Expressing "tombstones" in the message doesn't seem neat, so I think we can mention the number of buckets generated & the number of empty buckets in the message to help the user & admin debug faster, without saying what tombstones are.

The user-facing error message would still be (also in MSQFaultsTest):

TooManyBuckets: Too many partition buckets (max = 5,000); try breaking your query up into smaller queries or using a wider PARTITIONED BY

Did you mean the cause in the stacktrace? If so, I think it's better to be direct and say "tombstone" in the cause since it's an existing concept explained in the Druid docs. The stacktrace for completeness:

org.apache.druid.msq.indexing.error.MSQException: TooManyBuckets: Too many partition buckets (max = 5,000); try breaking your query up into smaller queries or using a wider PARTITIONED BY
	at org.apache.druid.msq.exec.ControllerImpl.publishAllSegments(ControllerImpl.java:1437)
	at org.apache.druid.msq.exec.ControllerImpl.publishSegmentsIfNeeded(ControllerImpl.java:1667)
	at org.apache.druid.msq.exec.ControllerImpl.runTask(ControllerImpl.java:404)
	at org.apache.druid.msq.exec.ControllerImpl.run(ControllerImpl.java:338)
	at org.apache.druid.msq.indexing.MSQControllerTask.runTask(MSQControllerTask.java:248)
	at org.apache.druid.indexing.common.task.AbstractTask.run(AbstractTask.java:169)
	at org.apache.druid.indexing.overlord.ThreadingTaskRunner$1.call(ThreadingTaskRunner.java:229)
	at org.apache.druid.indexing.overlord.ThreadingTaskRunner$1.call(ThreadingTaskRunner.java:154)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: org.apache.druid.java.util.common.IAE: Cannot add more tombstone buckets than [5000].
	at org.apache.druid.indexing.common.task.batch.parallel.TombstoneHelper.computeTombstoneIntervalsForReplace(TombstoneHelper.java:220)
	at org.apache.druid.indexing.common.task.batch.parallel.TombstoneHelper.computeTombstoneSegmentsForReplace(TombstoneHelper.java:149)
	at org.apache.druid.msq.exec.ControllerImpl.publishAllSegments(ControllerImpl.java:1423)
	... 13 more

@LakshSingla (Contributor)

Did you mean the cause in the stacktrace?

I meant the user-facing message. We can enhance the message to include more information about what the user is doing and how many buckets they are generating.

I think it's better to be direct and say "tombstone" in the cause since it's an existing concept explained in the Druid docs.

I guess so, though tombstones aren't something that we expect a user to be familiar with.

@abhishekrb19 abhishekrb19 force-pushed the uniform_msq_tombstones branch from e8011aa to 46749ab Compare November 3, 2023 16:39
@abhishekrb19 abhishekrb19 force-pushed the uniform_msq_tombstones branch from 46749ab to 8bb2c38 Compare November 3, 2023 16:41
@abhishekrb19 (Contributor, Author)

@LakshSingla, thank you for the review! Regarding mentioning the total number of buckets a query would generate without the max-buckets limit: computing that number could lead to an OOM. The max = 5,000 in the exception message gives the user a safe upper bound, which I think suffices.

I believe I've addressed your comments. Can you please take another look?
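As an aside, here is a hypothetical sketch of that fail-fast idea (again, not the actual TombstoneHelper code; names are illustrative): candidate buckets are produced lazily and counted as the iterator advances, so the limit check trips at the cap instead of first materializing a potentially huge list of buckets in memory.

```java
import org.joda.time.DateTime;
import org.joda.time.Interval;
import org.joda.time.Period;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class LazyBucketCountSketch
{
  // Lazily yields granularity-sized buckets over an interval; nothing is materialized up front.
  static Iterator<Interval> buckets(Interval interval, Period granularity)
  {
    return new Iterator<Interval>()
    {
      private DateTime cursor = interval.getStart();

      @Override
      public boolean hasNext()
      {
        return cursor.isBefore(interval.getEnd());
      }

      @Override
      public Interval next()
      {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        final DateTime end = cursor.plus(granularity);
        final Interval bucket = new Interval(cursor, end);
        cursor = end;
        return bucket;
      }
    };
  }

  public static void main(String[] args)
  {
    // Thirty years of DAY-granularity buckets (~10,957) blows past the 5,000 cap quickly.
    final Interval interval = new Interval(
        DateTime.parse("1990-01-01T00:00:00Z"),
        DateTime.parse("2020-01-01T00:00:00Z")
    );
    final Iterator<Interval> it = buckets(interval, Period.days(1));
    final int maxBuckets = 5_000;
    int count = 0;
    while (it.hasNext()) {
      it.next();
      if (++count > maxBuckets) {
        // Fail fast: the limit is detected long before every bucket is built and held in memory.
        System.out.println("Would report TooManyBuckets after " + maxBuckets + " buckets");
        return;
      }
    }
    System.out.println("Bucket count: " + count);
  }
}
```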

@abhishekrb19 abhishekrb19 merged commit 2e79fd5 into apache:master Nov 15, 2023
83 checks passed
@abhishekrb19 abhishekrb19 deleted the uniform_msq_tombstones branch November 15, 2023 07:35
CaseyPan pushed a commit to CaseyPan/druid that referenced this pull request Nov 17, 2023
…` query. (apache#15243)

* MSQ generates tombstones honoring the query's granularity.

This change tweaks the logic to only account for the infinite-interval tombstones.
For finite-interval tombstones, the MSQ query granularity will be used,
which is consistent with how MSQ works.

* more tests and some cleanup.

* checkstyle

* comment edits

* Throw TooManyBuckets fault based on review; add more tests.

* Add javadocs for both methods on reconciling the methods.

* review: Move testReplaceTombstonesWithTooManyBucketsThrowsException to MsqFaultsTest

* remove unused imports.

* Move TooManyBucketsException to indexing package for shared exception handling.

* lower max bucket for tests and fixup count

* Advance and count the iterator.

* checkstyle
writer-jill pushed a commit to writer-jill/druid that referenced this pull request Nov 20, 2023
yashdeep97 pushed a commit to yashdeep97/druid that referenced this pull request Dec 1, 2023
AmatyaAvadhanula added a commit to AmatyaAvadhanula/druid that referenced this pull request Dec 12, 2023
@LakshSingla LakshSingla added this to the 29.0.0 milestone Jan 29, 2024
Labels: Area - Batch Ingestion, Area - Ingestion, Area - MSQ (for multi-stage queries - https://github.com/apache/druid/issues/12262), Release Notes