Cache DataSegments and PendingSegments on the Overlord #17336
Conversation
Thanks for the PR, @AmatyaAvadhanula! There are a few things that might need to be done before we can start caching segments on the Overlord:
Without these being done first, the cache might serve stale information. cc: @cryptoe
@kfaraz, thanks for the feedback.
I do not think this is necessary, as the Coordinator can continue to perform those operations because DataSegment is immutable. The Coordinator currently operates on a snapshot and will continue to do so. If this is incorrect, I think we can begin by enabling this caching only when the Centralized segment metadata cache feature is disabled. I hope we can determine the smallest set of changes needed to get this working and also add a feature flag, if everything else seems good.
It would probably be simpler to write and easier to review if we do this part in a separate PR and merge it before the caching changes.
While this is true for the payload of the segment itself, the schema fingerprint may change (typically from null to something non-null). I don't think any other change is possible for this column.
The cache should be behind a feature flag. But to keep things simple, the new Overlord APIs probably don't need to be behind the feature flag. During a rolling upgrade, if the Coordinator gets upgraded first and starts calling new APIs on an old Overlord, we can simply give a nice error message.
Thanks. I agree that this would be better.
Segments would be added to the cache during a commit irrespective of whether the CDS feature is enabled.
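The rolling-upgrade handling suggested in the thread (an upgraded Coordinator calling a new API on an old Overlord) could look roughly like the sketch below. This is not code from the PR: the class name `OverlordApiCompat`, the method `describeFailure`, and the choice of treating HTTP 404/405 as "API not available on this Overlord" are all assumptions made for illustration.

```java
// Hypothetical Coordinator-side helper; names and status handling are assumptions,
// not part of the PR.
final class OverlordApiCompat {
  private OverlordApiCompat() {
  }

  // Turns a failed call to an Overlord API into a readable message.
  // Assumes that an Overlord which predates the API answers with 404 or 405.
  static String describeFailure(int httpStatus, String apiPath) {
    if (httpStatus == 404 || httpStatus == 405) {
      return "Overlord does not support API [" + apiPath + "]. "
             + "It may be running an older version; "
             + "upgrade the Overlord to complete the rolling upgrade.";
    }
    return "Request to Overlord API [" + apiPath + "] failed with status ["
           + httpStatus + "].";
  }
}
```

With such a helper, the new APIs themselves need no feature flag: the caller only has to surface the message when the Overlord has not been upgraded yet.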
Description
Polling the druid_segments and druid_pendingSegments tables can be a bottleneck for several operations in Druid, and the load on the metadata store increases with the number of segments in the cluster.
This PR aims to maintain a central cache of active segments and relevant pending segments on the Overlord and use this wherever possible.
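As a rough illustration of the idea, and not the implementation in this PR, the sketch below shows an in-memory cache that is written during a commit, refreshed when a schema fingerprint changes from null to non-null, and consulted for reads only when a feature flag is on. `CachedSegment`, `OverlordSegmentCacheSketch`, and the `metadataPoll` supplier (a stand-in for polling druid_segments) are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical stand-in for a used segment row. The payload is treated as immutable,
// but the schema fingerprint may later change from null to a non-null value.
final class CachedSegment {
  final String id;
  final String schemaFingerprint; // may be null

  CachedSegment(String id, String schemaFingerprint) {
    this.id = id;
    this.schemaFingerprint = schemaFingerprint;
  }
}

// Hypothetical Overlord-side cache; not the class introduced by the PR.
final class OverlordSegmentCacheSketch {
  private final boolean cacheEnabled;                        // feature flag for reads
  private final Supplier<List<CachedSegment>> metadataPoll;  // stand-in for a metadata store poll
  private final Map<String, CachedSegment> usedSegments = new ConcurrentHashMap<>();

  OverlordSegmentCacheSketch(boolean cacheEnabled, Supplier<List<CachedSegment>> metadataPoll) {
    this.cacheEnabled = cacheEnabled;
    this.metadataPoll = metadataPoll;
  }

  // Called as part of a segment commit so the cache sees every new segment.
  void onCommit(CachedSegment segment) {
    usedSegments.put(segment.id, segment);
  }

  // Replaces the cached entry when the fingerprint changes; otherwise reads go stale.
  void onSchemaFingerprintUpdate(String segmentId, String fingerprint) {
    usedSegments.computeIfPresent(segmentId, (id, old) -> new CachedSegment(id, fingerprint));
  }

  // Serves reads from memory when the flag is on, else falls back to polling.
  List<CachedSegment> getUsedSegments() {
    if (!cacheEnabled) {
      return metadataPoll.get();
    }
    return new ArrayList<>(usedSegments.values());
  }
}
```

The sketch leaves out pending segments, cache initialization, and leadership changes, all of which a real cache on the Overlord would have to handle.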
Advantages
(40x in SegmentAllocateActionTest#testBenchmark) -> Lower lag, especially in situations where data is being backfilled
Changes
Potential issues:
Implementation details
Cache initialization:
This PR has: