db: investigate elevated batch commit tail latencies #2646
Comments
(Potentially) related to:
Also, ingests can be disruptive to batch commit tail latencies, although it is definitely not the only factor because we've seen elevated batch commit tail latencies in the absence of ingests.
As far as I can tell, there are two primary sources of tail latency:
This seems to be driven by slow
I believe we're seeing contention on the block cache mutexes while reserving space for the memtables: #1997. These long memtable rotations keep
I tried to look at whether this might be solely due to the fsync by setting
I wonder if there's contention on the visible sequence number?
I played with a few tweaks looking to understand and reduce the tail latencies, including avoiding acquiring
It can be difficult to see past the noise, especially in the presence of write stalls, which can spike log commit latency dramatically, or on process start. The graph below uses a MIN aggregator to see past the spikes. It shows these changes do help in the very tail, but the effect is more muted at p99.
We've observed large allocations like the 64MB memtable allocation take 10ms+. This can contribute to batch commit tail latencies by slowing down WAL/memtable rotation during which the entire commit pipeline is stalled.

This commit adapts the memtable lifecycle to keep the most recent obsolete memtable around for the next memtable allocation.

```
goos: linux
goarch: amd64
pkg: github.com/cockroachdb/pebble
cpu: Intel(R) Xeon(R) CPU @ 2.30GHz
                   │   old.txt    │              new.txt               │
                   │    sec/op    │   sec/op     vs base               │
RotateMemtables-24   105.56µ ± 1%   88.11µ ± 2%  -16.53% (p=0.000 n=25)

                   │   old.txt    │            new.txt             │
                   │     B/op     │     B/op      vs base          │
RotateMemtables-24   124.3Ki ± 0%   124.3Ki ± 0%  ~ (p=0.418 n=25)

                   │  old.txt   │              new.txt              │
                   │ allocs/op  │ allocs/op   vs base               │
RotateMemtables-24   114.0 ± 0%   115.0 ± 0%  +0.88% (p=0.000 n=25)
```

Informs cockroachdb#2646.
We've observed large allocations like the 64MB memtable allocation take 10ms+. This can contribute to batch commit tail latencies by slowing down WAL/memtable rotation during which the entire commit pipeline is stalled.

This commit adapts the memtable lifecycle to keep the most recent obsolete memtable around for use as the next memtable. This reduces the commit latency hiccup during a memtable rotation, and it also reduces block cache mutex contention (cockroachdb#1997) by reducing the number of times we must reserve memory from the block cache.

```
goos: linux
goarch: amd64
pkg: github.com/cockroachdb/pebble
cpu: Intel(R) Xeon(R) CPU @ 2.30GHz
                   │   old.txt   │              new.txt               │
                   │   sec/op    │   sec/op     vs base               │
RotateMemtables-24   120.7µ ± 2%   102.8µ ± 4%  -14.85% (p=0.000 n=25)

                   │   old.txt    │              new.txt               │
                   │     B/op     │     B/op      vs base              │
RotateMemtables-24   124.3Ki ± 0%   124.0Ki ± 0%  -0.27% (p=0.000 n=25)

                   │  old.txt   │              new.txt              │
                   │ allocs/op  │ allocs/op   vs base               │
RotateMemtables-24   114.0 ± 0%   111.0 ± 0%  -2.63% (p=0.000 n=25)
```

Informs cockroachdb#2646.
We've observed large allocations like the 64MB memtable allocation take 10ms+. This can add latency to the WAL/memtable rotation critical section during which the entire commit pipeline is stalled, contributing to batch commit tail latencies.

This commit adapts the memtable lifecycle to keep the most recent obsolete memtable around for use as the next mutable memtable. This reduces the commit latency hiccup during a memtable rotation, and it also reduces block cache mutex contention (#1997) by reducing the number of times we must reserve memory from the block cache.

```
goos: linux
goarch: amd64
pkg: github.com/cockroachdb/pebble
cpu: Intel(R) Xeon(R) CPU @ 2.30GHz
                   │   old.txt   │              new.txt               │
                   │   sec/op    │   sec/op     vs base               │
RotateMemtables-24   120.7µ ± 2%   102.8µ ± 4%  -14.85% (p=0.000 n=25)

                   │   old.txt    │              new.txt               │
                   │     B/op     │     B/op      vs base              │
RotateMemtables-24   124.3Ki ± 0%   124.0Ki ± 0%  -0.27% (p=0.000 n=25)

                   │  old.txt   │              new.txt              │
                   │ allocs/op  │ allocs/op   vs base               │
RotateMemtables-24   114.0 ± 0%   111.0 ± 0%  -2.63% (p=0.000 n=25)
```

Informs #2646.
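The recycling idea in the commit message can be illustrated with a small standalone sketch. This is not Pebble's implementation: the types `memTable` and `rotator`, their methods, and the 1 MiB arena size are all hypothetical, and the real code also has to deal with block cache reservations, reference counting, and locking that are omitted here.

```go
package main

import "fmt"

// memTable is a hypothetical stand-in for a memtable backed by one large arena.
type memTable struct {
	arena []byte // large buffer; the expensive part to allocate
	used  int    // bytes of the arena currently in use
}

// reset makes a recycled memtable look empty without reallocating its arena.
func (m *memTable) reset() { m.used = 0 }

// rotator hands out mutable memtables and keeps at most one obsolete
// memtable around for reuse (assumption: single-threaded use, no locking).
type rotator struct {
	arenaSize int
	recycled  *memTable // most recent obsolete memtable, or nil
	allocs    int       // number of fresh arena allocations performed
}

// next returns the memtable to use as the new mutable memtable,
// preferring the recycled one over a fresh allocation.
func (r *rotator) next() *memTable {
	if m := r.recycled; m != nil {
		r.recycled = nil
		m.reset()
		return m
	}
	r.allocs++
	return &memTable{arena: make([]byte, r.arenaSize)}
}

// release is called once a flushed memtable is obsolete. Only the most
// recent one is kept; any previously recycled memtable is dropped.
func (r *rotator) release(m *memTable) { r.recycled = m }

func main() {
	r := &rotator{arenaSize: 1 << 20}
	mutable := r.next()
	for i := 0; i < 4; i++ {
		next := r.next()   // rotation: a new mutable memtable is needed
		r.release(mutable) // the old one is flushed later and becomes obsolete
		mutable = next
	}
	fmt.Println("arena allocations:", r.allocs)
}
```

In this toy sequence each rotation happens before the previous memtable is released, so the rotator settles on two arenas that alternate and never allocates a third. Keeping only one recycled memtable bounds the extra memory held to a single arena.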
The following graphs are from one particular instance of this, but we've seen it across several test clusters including the 23.1 test cluster.
Internal discussion: https://cockroachlabs.slack.com/archives/C057ULDSKC0/p1686947338066129?thread_ts=1686924156.296339&cid=C057ULDSKC0