Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not allocate week granular segments unless requested #15589

Merged

Conversation

AmatyaAvadhanula
Copy link
Contributor

@AmatyaAvadhanula AmatyaAvadhanula commented Dec 17, 2023

When a segment of a requested segment granularity cannot be allocated due to conflicting segments, we try to exhaust all finer granularity segments before giving up.
However when the preferred granularity is month or coarser, and a week granular segment is allocated to adjust with the conflict, the cluster can end up in a bad state because a week granular segment doesn't necessarily align with the end of the month or higher granular interval.
To prevent this problem, we consider day granular segments after month in the sequence.
Weekly segments are skipped and are not allocated unless explicitly requested for.

Suppose an hour segment exists for 16th December, 2023 from 12am-1am, and a new segment needs to be allocated for the timestamp 18th December. After this patch:

  1. If the requested granularity is WEEK, create a week granular segment with the interval [2023-12-18/2023-12-25].
  2. If the requested granularity is MONTH
    A. A MONTH granular segment cannot be created as a segment already exists for the month which doesn't span the whole month.
    B. A DAY segment (and not a WEEK) is chosen from [2023-12-18/2023-12-19] and created as it is the next valid granularity and has no conflicts.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@AmatyaAvadhanula AmatyaAvadhanula marked this pull request as ready for review December 17, 2023 13:55
Copy link
Contributor

@abhishekrb19 abhishekrb19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me overall. Left some minor comments. The code-level behavior is in alignment with what we document here and in the MSQ docs:

Avoid WEEK granularity for data partitioning because weeks don't align neatly with months and years, making it difficult to change partitioning by coarser granularity. Instead, opt for other partitioning options such as DAY or MONTH, which offer more flexibility.

@@ -272,8 +273,12 @@ private SegmentIdWithShardSpec tryAllocateFirstSegment(TaskActionToolbox toolbox
{
// No existing segments for this row, but there might still be nearby ones that conflict with our preferred
// segment granularity. Try that first, and then progressively smaller ones if it fails.
// Consider trying with WEEK granularity only if the preferred granularity is WEEK
final List<Interval> tryIntervals = Granularity.granularitiesFinerThan(preferredSegmentGranularity)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have the logic inside Granularity.granularitiesFinerThan() -- i.e., unless preferredSegmentGranularity == GranularityType.WEEK, the function won't add GranularityType.WEEK to the candidate list of granularities? (similar to the existing Granularity.ALL condition)

Copy link
Contributor

@kfaraz kfaraz Dec 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, add a new method named Granularity.nonWeekgranularitiesFinerThan() or something and put the explanation in the javadoc of that method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I was hesitant about changing the existing method because it seemed like it would make the utility unintuitive.
Although it's not being used anywhere else

@@ -439,6 +440,11 @@ private int allocateSegmentsForBatch(AllocateRequestBatch requestBatch, Set<Data
final List<Granularity> candidateGranularities
= Granularity.granularitiesFinerThan(requestBatch.key.preferredSegmentGranularity);
for (Granularity granularity : candidateGranularities) {
// Do not try to allocate with WEEK granularity unless the preferred granularity itself is WEEK
if (!Granularities.WEEK.equals(requestBatch.key.preferredSegmentGranularity)
Copy link
Contributor

@abhishekrb19 abhishekrb19 Dec 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my comment re moving this logic inside Granularity.granularitiesFinerThan. If we do that, we can skip this filtering logic at every call site and avoid bugs.

@@ -272,8 +273,12 @@ private SegmentIdWithShardSpec tryAllocateFirstSegment(TaskActionToolbox toolbox
{
// No existing segments for this row, but there might still be nearby ones that conflict with our preferred
// segment granularity. Try that first, and then progressively smaller ones if it fails.
// Consider trying with WEEK granularity only if the preferred granularity is WEEK
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also update the class-level javadoc to note that WEEK granularity won't be used even if it's a possibility, unless it's explicitly requested for: https://github.com/apache/druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/common/actions/SegmentAllocateAction.java#L56?

Copy link
Contributor

@abhishekrb19 abhishekrb19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Also, optionally, while we are at it, could we change this quickstart tutorial segment granularity from week to day: https://github.com/apache/druid/blob/master/examples/quickstart/tutorial/transform-index.json#L23

@@ -117,6 +119,9 @@ public static List<Granularity> granularitiesFinerThan(final Granularity gran0)
if ((gran == GranularityType.ALL && !gran0.equals(Granularities.ALL)) || gran == GranularityType.NONE) {
continue;
}
if (GranularityType.WEEK.equals(gran) && !Granularities.WEEK.equals(gran0)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: similar to the above check, enum comparisons can use the == operator

@@ -1011,7 +1011,6 @@ public void testGranularitiesFinerThanAll()
Granularities.YEAR,
Granularities.QUARTER,
Granularities.MONTH,
Granularities.WEEK,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For completeness, could we also add a test testGranularitiesFinerThanWeek() that does include the week granularity when it's requested?

@AmatyaAvadhanula AmatyaAvadhanula merged commit c41e99e into apache:master Jan 5, 2024
83 checks passed
@AmatyaAvadhanula
Copy link
Contributor Author

@abhishekrb19 @kfaraz Thank you for the reviews

@LakshSingla LakshSingla added this to the 29.0.0 milestone Jan 29, 2024
@kfaraz kfaraz deleted the skip_week_segment_granularity branch February 21, 2024 04:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants