-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not allocate week granular segments unless requested #15589
Do not allocate week granular segments unless requested #15589
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me overall. Left some minor comments. The code-level behavior is in alignment with what we document here and in the MSQ docs:
Avoid WEEK granularity for data partitioning because weeks don't align neatly with months and years, making it difficult to change partitioning by coarser granularity. Instead, opt for other partitioning options such as DAY or MONTH, which offer more flexibility.
@@ -272,8 +273,12 @@ private SegmentIdWithShardSpec tryAllocateFirstSegment(TaskActionToolbox toolbox | |||
{ | |||
// No existing segments for this row, but there might still be nearby ones that conflict with our preferred | |||
// segment granularity. Try that first, and then progressively smaller ones if it fails. | |||
// Consider trying with WEEK granularity only if the preferred granularity is WEEK | |||
final List<Interval> tryIntervals = Granularity.granularitiesFinerThan(preferredSegmentGranularity) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have the logic inside Granularity.granularitiesFinerThan()
-- i.e., unless preferredSegmentGranularity == GranularityType.WEEK
, the function won't add GranularityType.WEEK
to the candidate list of granularities? (similar to the existing Granularity.ALL
condition)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, add a new method named Granularity.nonWeekgranularitiesFinerThan()
or something and put the explanation in the javadoc of that method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I was hesitant about changing the existing method because it seemed like it would make the utility unintuitive.
Although it's not being used anywhere else
@@ -439,6 +440,11 @@ private int allocateSegmentsForBatch(AllocateRequestBatch requestBatch, Set<Data | |||
final List<Granularity> candidateGranularities | |||
= Granularity.granularitiesFinerThan(requestBatch.key.preferredSegmentGranularity); | |||
for (Granularity granularity : candidateGranularities) { | |||
// Do not try to allocate with WEEK granularity unless the preferred granularity itself is WEEK | |||
if (!Granularities.WEEK.equals(requestBatch.key.preferredSegmentGranularity) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see my comment re moving this logic inside Granularity.granularitiesFinerThan
. If we do that, we can skip this filtering logic at every call site and avoid bugs.
@@ -272,8 +273,12 @@ private SegmentIdWithShardSpec tryAllocateFirstSegment(TaskActionToolbox toolbox | |||
{ | |||
// No existing segments for this row, but there might still be nearby ones that conflict with our preferred | |||
// segment granularity. Try that first, and then progressively smaller ones if it fails. | |||
// Consider trying with WEEK granularity only if the preferred granularity is WEEK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we also update the class-level javadoc to note that WEEK
granularity won't be used even if it's a possibility, unless it's explicitly requested for: https://github.com/apache/druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/common/actions/SegmentAllocateAction.java#L56?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Also, optionally, while we are at it, could we change this quickstart tutorial segment granularity from week
to day
: https://github.com/apache/druid/blob/master/examples/quickstart/tutorial/transform-index.json#L23
@@ -117,6 +119,9 @@ public static List<Granularity> granularitiesFinerThan(final Granularity gran0) | |||
if ((gran == GranularityType.ALL && !gran0.equals(Granularities.ALL)) || gran == GranularityType.NONE) { | |||
continue; | |||
} | |||
if (GranularityType.WEEK.equals(gran) && !Granularities.WEEK.equals(gran0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: similar to the above check, enum comparisons can use the ==
operator
@@ -1011,7 +1011,6 @@ public void testGranularitiesFinerThanAll() | |||
Granularities.YEAR, | |||
Granularities.QUARTER, | |||
Granularities.MONTH, | |||
Granularities.WEEK, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For completeness, could we also add a test testGranularitiesFinerThanWeek()
that does include the week granularity when it's requested?
@abhishekrb19 @kfaraz Thank you for the reviews |
When a segment of a requested segment granularity cannot be allocated due to conflicting segments, we try to exhaust all finer granularity segments before giving up.
However when the preferred granularity is month or coarser, and a week granular segment is allocated to adjust with the conflict, the cluster can end up in a bad state because a week granular segment doesn't necessarily align with the end of the month or higher granular interval.
To prevent this problem, we consider day granular segments after month in the sequence.
Weekly segments are skipped and are not allocated unless explicitly requested for.
Suppose an hour segment exists for 16th December, 2023 from 12am-1am, and a new segment needs to be allocated for the timestamp 18th December. After this patch:
A. A MONTH granular segment cannot be created as a segment already exists for the month which doesn't span the whole month.
B. A DAY segment (and not a WEEK) is chosen from [2023-12-18/2023-12-19] and created as it is the next valid granularity and has no conflicts.
This PR has: