Cache DataSegments and PendingSegments on the Overlord #17336

Draft: wants to merge 1 commit into master
Conversation

@AmatyaAvadhanula (Contributor) commented Oct 13, 2024

Description

Polling the druid_segments and druid_pendingSegments tables can be a bottleneck for several operations in Druid and the load on the metadata store increases with the number of segments in the cluster.
This PR aims to maintain a central cache of active segments and relevant pending segments on the Overlord and use this wherever possible.

Advantages

  • Faster segment allocation (40x speedup in SegmentAllocateActionTest#testBenchmark), which lowers lag, especially when data is being backfilled
  • (TODO?) Faster segment syncing on the Coordinator, which speeds up ingestion handoff on clusters where polling millions of segments (with hundreds of columns) takes a few minutes

Changes

  • Cache segment timelines for used segments and pending segments for intervals with active locks on the overlord.
  • Use this info for segment allocation
  • Every segment allocation or segment commit transactionally updates the cache
  • (TODO) Ensure that marking segments as unused or cleaning up pending segments on the Overlord also clears the cache.
  • (TODO) Make all the calls from the Coordinator go through the Overlord before altering druid_segments or druid_pendingSegments in order to have a centralized cache
  • (TODO?) Use this cache and poll segments incrementally on the Coordinator
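As a sketch of the write path described above: allocation consults the cache instead of polling druid_pendingSegments, and the metadata-store write and the cache update happen under the same lock so the two cannot diverge. This is an illustrative Python model, not Druid code (Druid is Java); `SegmentCache`, `InMemoryStore`, and their methods are all hypothetical names.

```python
import threading

class InMemoryStore:
    """Hypothetical stand-in for the druid_pendingSegments table."""
    def __init__(self):
        self.rows = []

    def insert_pending(self, datasource, interval, segment_id):
        self.rows.append((datasource, interval, segment_id))

class SegmentCache:
    """Hypothetical Overlord-side cache of pending segments,
    keyed by (datasource, interval)."""
    def __init__(self, store):
        self._lock = threading.Lock()
        self._store = store
        self._pending = {}  # (datasource, interval) -> set of segment ids

    def allocate_pending_segment(self, datasource, interval, segment_id):
        # Allocation reads the cache instead of polling the DB; the store
        # write and the cache update share one lock ("transactionally").
        with self._lock:
            ids = self._pending.setdefault((datasource, interval), set())
            if segment_id in ids:
                return segment_id  # already allocated, no DB round-trip
            self._store.insert_pending(datasource, interval, segment_id)
            ids.add(segment_id)
            return segment_id

    def pending_for(self, datasource, interval):
        with self._lock:
            return set(self._pending.get((datasource, interval), set()))
```

A repeated allocation for the same id is answered from memory without touching the store, which is where the benchmark speedup would come from.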

Potential issues:

  • There can be an increase in memory usage, but an overhead of similar magnitude was already introduced with the CompactionSupervisor on the Overlord. This cache can be utilized by the compaction supervisor as well.
  • Leader election can take longer. (This may be mitigated by falling back to the DB while the cache is initializing.)
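The fallback mentioned above can be sketched as a read path that serves from the metadata store until the cache finishes its post-election sync, then flips over. Again illustrative Python with hypothetical names (`CacheState`, `SegmentReader`), not actual Druid classes:

```python
from enum import Enum

class CacheState(Enum):
    INITIALIZING = 1
    READY = 2

class SegmentReader:
    """Hypothetical read path: fall back to the DB while the cache is
    still syncing after leader election, so the new leader is usable
    immediately instead of blocking on cache initialization."""
    def __init__(self, cache, db):
        self.cache = cache
        self.db = db
        self.state = CacheState.INITIALIZING

    def mark_ready(self):
        self.state = CacheState.READY

    def used_segments(self, datasource):
        if self.state is CacheState.READY:
            return self.cache.get(datasource, [])
        return self.db.get(datasource, [])  # fallback while initializing
```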

Implementation details

Cache initialization:

  • When a TaskLockbox syncs for the first time upon leader election, obtain all the pending segments for active locks and cache only these. Any other pending segment must be irrelevant for segment allocation.
  • Also obtain all the used datasources and add all their segments to the cache.
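The two bootstrap steps above might look roughly like the following. Illustrative Python: the `fetch_*` callables stand in for metadata-store queries and `active_locks` for TaskLockbox state; all names are hypothetical.

```python
def initialize_cache(active_locks, fetch_pending,
                     fetch_used_datasources, fetch_used_segments):
    """Hypothetical cache bootstrap on leader election.

    Pending segments are loaded only for (datasource, interval) pairs
    holding active task locks, since any other pending segment cannot
    affect segment allocation. Used segments are loaded for every
    datasource that has any."""
    cache = {"pending": {}, "used": {}}
    for datasource, interval in active_locks:
        cache["pending"][(datasource, interval)] = fetch_pending(datasource, interval)
    for datasource in fetch_used_datasources():
        cache["used"][datasource] = fetch_used_segments(datasource)
    return cache
```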

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@kfaraz (Contributor) commented Oct 17, 2024

Thanks for the PR, @AmatyaAvadhanula !
This is a good idea and has been talked about for a while.

There are a few things that might need to be done before we can start caching segments on the Overlord:

  • Move all segment mark used / unused operations from Coordinator to the Overlord
  • Move schema fingerprint update operations from Coordinator to the Overlord

Without these being done first, the cache might serve stale information.

cc: @cryptoe

@AmatyaAvadhanula (Contributor, Author)
@kfaraz, thanks for the feedback.

  1. Segment metadata operations on the Coordinator needing to be routed through the Overlord is a TODO of this PR. I think it should be simple (but tedious) to move the APIs to the Overlord and have an Overlord client execute these on the Coordinator.
  2. Move schema fingerprint update operations from Coordinator to the Overlord.

I do not think this is necessary, as the Coordinator can continue to perform those operations because DataSegment is immutable. The Coordinator currently operates on a snapshot and will continue to do so. If this is incorrect, I think we can begin by having this caching enabled only when the Centralized segment metadata cache feature is disabled.

I hope we can determine the least set of changes to get this working and also add a feature flag, if everything else seems good.

@kfaraz (Contributor) commented Oct 17, 2024

> Segment metadata operations on the coordinator needing to be routed through the Overlord is a TODO of this PR. I think it should be simple (but tedious) to move the APIs to the Overlord and have an overlord client execute these on the coordinator.

It would probably be simpler to write and easier to review if we do this part in a separate PR and merge it before the caching changes.

> can continue to perform those operations as DataSegment is immutable.

While this is true for the payload of the segment itself, the schema fingerprint may change (typically from null to something non-null; I don't think any other change is possible for this column).

> I hope we can determine the least set of changes to get this working and also add a feature flag, if everything else seems good.

The cache should be behind a feature flag.

But to keep things simple, the new Overlord APIs probably don't need to be behind the feature flag. During a rolling upgrade, if the Coordinator gets upgraded first and starts calling new APIs on an old Overlord, we can simply give a nice error message.

@AmatyaAvadhanula (Contributor, Author)

> It would probably be simpler to write and easier to review if we do this part in a separate PR and merge it before the caching changes.

Thanks. I agree that this would be better.

> While this is true for the payload of the segment itself, the schema fingerprint may change (typically from null to something non-null. I don't think any other change is possible for this column.)

Segments would be added to the cache during a commit irrespective of whether the CDS feature is enabled.
The schema fingerprint backfill that happens on the Coordinator doesn't affect the cache; it operates on the snapshot that the Coordinator uses.
Is this understanding incorrect?
