Update compactor backlog doc for checking halt #6906

Merged
merged 1 commit into from
Nov 22, 2023
14 changes: 14 additions & 0 deletions docs/operating/compactor-backlog.md
@@ -4,6 +4,20 @@ The compactor is one of the most important components in Thanos. It is responsib

When your system contains many block producers (Sidecar, Rule, Receiver, etc.) or runs at a large scale, the compactor might not keep up with the rate at which data is produced and falls behind, leaving a backlog of work. This document helps you troubleshoot a compaction backlog and scale the compactor.

## Make sure compactors are `running`

Before checking whether your compactor has a backlog, make sure the compactors are `running`. `Running` here means that the compactors have not halted.

When a compactor halts, all of its compaction and downsampling work stops, so it is crucial to make sure that no compactor in your deployment has halted.

The `thanos_compact_halted` metric is set to 1 when a halt happens. The compactor also logs a message like the one below when it halts.

```
msg="critical error detected; halting" err="compaction failed...
```
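
As a minimal sketch of checking the metric by hand, the Python snippet below parses Prometheus text-format output and reports whether any `thanos_compact_halted` sample equals 1. The function names `compactor_is_halted` and `fetch_metrics` are hypothetical helpers written for this illustration, not part of Thanos, and the sketch assumes the compactor exposes its metrics over HTTP at a `/metrics` path.

```python
from urllib.request import urlopen


def compactor_is_halted(metrics_text: str) -> bool:
    """Return True if any thanos_compact_halted sample in the text equals 1.

    Hypothetical helper: expects Prometheus text exposition format.
    """
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample looks like: name{labels} value [timestamp]
        name = line.split("{", 1)[0].split()[0]
        if name != "thanos_compact_halted":
            continue
        # The value is the first field after the optional label set.
        rest = line.rsplit("}", 1)[1] if "}" in line else line[len(name):]
        fields = rest.split()
        if fields and float(fields[0]) == 1.0:
            return True
    return False


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Download the metrics page (the URL is supplied by the caller)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `compactor_is_halted(fetch_metrics("http://localhost:10902/metrics"))` would check a single local compactor. The host and port here are assumptions; adjust them to your deployment. In production, an alert on `thanos_compact_halted == 1` is a better fit than ad hoc polling.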

A compactor can halt for different reasons. A very common one is overlapping blocks. Please refer to our doc https://thanos.io/tip/operating/troubleshooting.md/#overlaps for more information.

## Detect the backlog

Self-monitoring for the monitoring system is important. We highly recommend you set up the Thanos Grafana dashboards and alerts to monitor the Thanos components. Without self-monitoring, it is hard to detect issues and fix them.