metrics: CGo memory allocations should be per-store, or clarified as aggregated #4129

Open
itsbilal opened this issue Nov 1, 2024 · 0 comments
Labels
A-storage, C-enhancement (New feature or request), O-testcluster (Issues found as part of DRT testing), P-2 (Issues/test failures with a fix SLA of 3 months), T-storage


itsbilal commented Nov 1, 2024

Currently, if the Pebble block cache is shared across multiple Pebble instances, the CGo memory allocations line in db.Metrics() is aggregated across all the stores, while the counts of zombie memtables etc. are for that Pebble instance only. This makes it pretty difficult to correlate the two numbers at first glance. Here's an example string representation from a Pebble instance with very high memtable memory utilization (this Cockroach node had 8 stores, each with a high zombie memtable count):

      |                             |       |       |   ingested   |     moved    |    written   |       |    amp   |     multilevel
level | tables  size val-bl vtables | score |   in  | tables  size | tables  size | tables  size |  read |   r   w  |    top   in  read
------+-----------------------------+-------+-------+--------------+--------------+--------------+-------+----------+------------------
    0 |     0     0B     0B       0 |  0.00 | 188GB |  1.1K  970KB |     0     0B |   16K   19GB |    0B |   0  0.1 |    0B    0B    0B
    1 |     0     0B     0B       0 |  0.00 |    0B |     0     0B |     0     0B |     0     0B |    0B |   0  0.0 |    0B    0B    0B
    2 |    18   52MB  3.3MB       0 |  0.96 | 8.9GB |   125  111KB |  3.0K  9.2GB |   53K  162GB | 162GB |   1 18.2 |    0B    0B    0B
    3 |    53  261MB   34MB       0 |  0.92 |  11GB |   327  290KB |  1.9K  4.8GB |  7.8K   28GB |  29GB |   1  2.5 |    0B    0B    0B
    4 |   161  1.6GB  356MB       1 |  0.84 |  17GB |   550  498KB |   382 1023MB |   17K  160GB | 161GB |   1  9.6 | 2.0GB 6.0GB  21GB
    5 |   634  2.3GB  554MB       2 |  1.00 |  15GB |  3.8K  4.9MB |   158  368MB |  133K  497GB | 499GB |   1 34.0 | 356MB 2.7GB 8.5GB
    6 |  6.2K  167GB  2.5GB      57 |     - |  11GB |   20K  224GB |   294   44MB |   24K 1009GB | 1.0TB |   1 93.7 | 767KB  26MB  58MB
total |  7.0K  171GB  3.4GB      60 |     - | 412GB |   26K  224GB |  5.8K   15GB |  252K  2.2TB | 1.9TB |   5  5.6 | 2.3GB 8.7GB  30GB
---------------------------------------------------------------------------------------------------------------------------------------
WAL: 1 files (48MB)  in: 186GB  written: 188GB (1% overhead) failover: (switches: 6, primary: ‹22h19m21.694006978s›, secondary: ‹59.301058386s›)
Flushes: 5296
Compactions: 48680  estimated debt: 0B  in progress: 0 (0B)
             default: 17201  delete: 2590  elision: 6356  move: 5835  read: 45  tombstone-density: 16653  rewrite: 0  copy: 0  multi-level: 1044
MemTables: 1 (64MB)  zombie: 50 (3.1GB)
Zombie tables: 2410 (29GB, local: 29GB)
Backing tables: 38 (1.8GB)
Virtual tables: 60 (1.4GB)
Local tables size: 172GB
Compression types: snappy: 6975 unknown: 60
Block cache: 0 entries (0B)  hit rate: 77.1%
Table cache: 78K entries (60MB)  hit rate: 99.9%
Secondary cache: 0 entries (0B)  hit rate: 0.0%
Snapshots: 3  earliest seq num: 1019853557
Table iters: 621
Filter utility: 72.8%
Ingestions: 3605  as flushable: 928 (53GB in 6493 tables)
Cgo memory usage: 24GB  block cache: 541MB (data: 392MB, maps: 148MB, entries: 5.4KB)  memtables: 23GB

At the very least, the string representation of the metrics should clarify that the memory counters are aggregated across all Pebble instances in the same process (since they're stored in a global variable). Better yet, we should make the counters per-store and print them per-store.
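
For illustration only, here is a minimal Go sketch of the per-store direction. This is not Pebble's actual implementation; all names in it (cgoMemCounter, trackedAlloc, the two example stores) are hypothetical, and the real change would need to thread a store-local counter through Pebble's CGo allocation paths instead of the current process-global state.

    // Minimal sketch (hypothetical, not Pebble's implementation): today the
    // CGo allocation counters behave like a single process-global variable,
    // so every store's db.Metrics() reports the same aggregated number. A
    // per-store variant could charge each allocation to a counter owned by
    // the store that made it.
    package main

    import (
        "fmt"
        "sync/atomic"
    )

    // cgoMemCounter tracks CGo-backed allocations for one store.
    type cgoMemCounter struct {
        bytes atomic.Int64
    }

    func (c *cgoMemCounter) add(n int64) { c.bytes.Add(n) }
    func (c *cgoMemCounter) load() int64 { return c.bytes.Load() }

    // trackedAlloc stands in for a CGo allocation path; it charges the
    // allocation to the store-local counter rather than a global one.
    func trackedAlloc(c *cgoMemCounter, n int64) []byte {
        c.add(n)
        return make([]byte, n) // placeholder for the real C allocation
    }

    func main() {
        // Two "stores" sharing a block cache but owning their own counters.
        var store1, store2 cgoMemCounter
        _ = trackedAlloc(&store1, 64<<20) // e.g. a memtable
        _ = trackedAlloc(&store2, 16<<20)

        // Each store's metrics now report only its own CGo usage.
        fmt.Printf("store 1 cgo memory: %d MB\n", store1.load()>>20)
        fmt.Printf("store 2 cgo memory: %d MB\n", store2.load()>>20)
    }

With something along these lines, each store's Metrics() string would report only its own CGo usage, and an aggregated per-process line could still be derived by summing across stores.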

Jira issue: PEBBLE-293

Epic CRDB-41111
