
[6.11] Constant heavy reads when background_target is full #795

Open
nitinkmr333 opened this issue Dec 4, 2024 · 1 comment
nitinkmr333 commented Dec 4, 2024

On a multi-device filesystem, I have noticed that whenever the background_target becomes full, the rebalance thread does constant heavy reads.

Steps to reproduce:

Create two loop devices. One will be used as the foreground_target (disk0), the other as the background_target (disk1):

❯ mkdir -p ~/bcachefs
❯ cd ~/bcachefs
❯ dd if=/dev/zero of=disk0 bs=1G count=40 status=progress
42949672960 bytes (43 GB, 40 GiB) copied, 16 s, 2.7 GB/s
40+0 records in
40+0 records out
42949672960 bytes (43 GB, 40 GiB) copied, 16.1028 s, 2.7 GB/s
❯ dd if=/dev/zero of=disk1 bs=1G count=40 status=progress
41875931136 bytes (42 GB, 39 GiB) copied, 15 s, 2.7 GB/s
40+0 records in
40+0 records out
42949672960 bytes (43 GB, 40 GiB) copied, 15.7211 s, 2.7 GB/s

Both backing files are 40 GiB.
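As a side note, not part of the original reproduction: the backing files can also be created sparsely with truncate, which skips writing 80 GiB of zeroes up front (the /tmp/bcachefs-demo path here is illustrative):

```shell
# Hypothetical alternative to the dd commands above: create the 40 GiB
# backing files sparsely, so no zeroes are actually written.
mkdir -p /tmp/bcachefs-demo
truncate -s 40G /tmp/bcachefs-demo/disk0
truncate -s 40G /tmp/bcachefs-demo/disk1
# Apparent size is 40 GiB (42949672960 bytes); allocated blocks stay near zero.
stat --format='%n %s' /tmp/bcachefs-demo/disk0 /tmp/bcachefs-demo/disk1
```

On a btrfs host the sparse files will grow lazily as bcachefs writes into them, which is also true of the dd-created files once btrfs compression deduplicates the zeroes.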

Attach them as loop devices (for mounting):

❯ sudo losetup --find --show disk0
/dev/loop0
❯ sudo losetup --find --show disk1
/dev/loop1
❯ lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0         7:0    0    40G  0 loop
loop1         7:1    0    40G  0 loop

Format the loop devices as bcachefs, with disk0 labelled ssd (foreground_target) and disk1 labelled hdd (background_target):

❯ sudo bcachefs format --label ssd /dev/loop0 --label hdd /dev/loop1 --foreground_target=ssd --background_target=hdd
External UUID:                             99e865e6-ee40-480a-bd5d-c2fb1b805583
Internal UUID:                             80195311-407a-492f-a297-5d2e3e78892d
Magic number:                              c68573f6-66ce-90a9-d96a-60cf803df7ef
Device index:                              1
Label:                                     (none)
Version:                                   1.13: inode_has_child_snapshots
Version upgrade complete:                  0.0: (unknown version)
Oldest version on disk:                    1.13: inode_has_child_snapshots
Created:                                   Wed Dec  4 19:11:21 2024
Sequence number:                           0
Time of last write:                        Thu Jan  1 05:30:00 1970
Superblock size:                           1.25 KiB/1.00 MiB
Clean:                                     0
Devices:                                   2
Sections:                                  members_v1,disk_groups,members_v2
Features:                                  new_siphash,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:

Options:
  block_size:                              512 B
  btree_node_size:                         256 KiB
  errors:                                  continue [fix_safe] panic ro
  metadata_replicas:                       1
  data_replicas:                           1
  metadata_replicas_required:              1
  data_replicas_required:                  1
  encoded_extent_max:                      64.0 KiB
  metadata_checksum:                       none [crc32c] crc64 xxhash
  data_checksum:                           none [crc32c] crc64 xxhash
  compression:                             none
  background_compression:                  none
  str_hash:                                crc32c crc64 [siphash]
  metadata_target:                         none
  foreground_target:                       ssd
  background_target:                       hdd
  promote_target:                          none
  erasure_code:                            0
  inodes_32bit:                            1
  shard_inode_numbers:                     1
  inodes_use_key_cache:                    1
  gc_reserve_percent:                      8
  gc_reserve_bytes:                        0 B
  root_reserve_percent:                    0
  wide_macs:                               0
  promote_whole_extents:                   1
  acl:                                     1
  usrquota:                                0
  grpquota:                                0
  prjquota:                                0
  journal_flush_delay:                     1000
  journal_flush_disabled:                  0
  journal_reclaim_delay:                   100
  journal_transaction_names:               1
  allocator_stuck_timeout:                 30
  version_upgrade:                         [compatible] incompatible none
  nocow:                                   0

members_v2 (size 304):
Device:                                    0
  Label:                                   ssd (0)
  UUID:                                    5819c971-b9fe-448a-b1d0-d488591e61f6
  Size:                                    40.0 GiB
  read errors:                             0
  write errors:                            0
  checksum errors:                         0
  seqread iops:                            0
  seqwrite iops:                           0
  randread iops:                           0
  randwrite iops:                          0
  Bucket size:                             256 KiB
  First bucket:                            0
  Buckets:                                 163840
  Last mount:                              (never)
  Last superblock write:                   0
  State:                                   rw
  Data allowed:                            journal,btree,user
  Has data:                                (none)
  Btree allocated bitmap blocksize:        1.00 B
  Btree allocated bitmap:                  0000000000000000000000000000000000000000000000000000000000000000
  Durability:                              1
  Discard:                                 0
  Freespace initialized:                   0
Device:                                    1
  Label:                                   hdd (1)
  UUID:                                    18483694-8f70-454e-a5cd-719c2499ac11
  Size:                                    40.0 GiB
  read errors:                             0
  write errors:                            0
  checksum errors:                         0
  seqread iops:                            0
  seqwrite iops:                           0
  randread iops:                           0
  randwrite iops:                          0
  Bucket size:                             256 KiB
  First bucket:                            0
  Buckets:                                 163840
  Last mount:                              (never)
  Last superblock write:                   0
  State:                                   rw
  Data allowed:                            journal,btree,user
  Has data:                                (none)
  Btree allocated bitmap blocksize:        1.00 B
  Btree allocated bitmap:                  0000000000000000000000000000000000000000000000000000000000000000
  Durability:                              1
  Discard:                                 0
  Freespace initialized:                   0
starting version 1.13: inode_has_child_snapshots opts=foreground_target=ssd,background_target=hdd
initializing new filesystem
going read-write
initializing freespace
shutdown complete, journal seq 16
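The member geometry printed above is self-consistent; as an illustrative arithmetic check, 163840 buckets of 256 KiB each is exactly the 40 GiB reported per device:

```shell
# Illustrative check of the per-device geometry from the format output:
# 163840 buckets * 256 KiB per bucket.
awk 'BEGIN {
  bytes = 163840 * 256 * 1024
  printf "%d buckets * 256 KiB = %.1f GiB\n", 163840, bytes / (1024 ^ 3)
}'
# -> 163840 buckets * 256 KiB = 40.0 GiB
```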

Mount the filesystem and write a 60 GiB file (larger than the background_target):

❯ sudo bcachefs mount /dev/loop0:/dev/loop1 /mnt
❯ sudo dd if=/dev/zero of=/mnt/hugefile bs=1G count=60 status=progress
64424509440 bytes (64 GB, 60 GiB) copied, 75 s, 854 MB/s
60+0 records in
60+0 records out
64424509440 bytes (64 GB, 60 GiB) copied, 75.4496 s, 854 MB/s

bcachefs fs usage:

❯ sudo bcachefs fs usage /mnt -h
Filesystem: 99e865e6-ee40-480a-bd5d-c2fb1b805583
Size:                       73.6 GiB
Used:                       60.2 GiB
Online reserved:                 0 B

Data type       Required/total  Durability    Devices
btree:          1/1             1             [loop0]              228 MiB
user:           1/1             1             [loop0]             21.6 GiB
user:           1/1             1             [loop1]             38.4 GiB
cached:         1/1             1             [loop0]             16.3 GiB

Btree usage:
extents:            87.0 MiB
inodes:              256 KiB
dirents:             256 KiB
alloc:              42.3 MiB
subvolumes:          256 KiB
snapshots:           256 KiB
lru:                2.75 MiB
freespace:           256 KiB
need_discard:        256 KiB
backpointers:       80.0 MiB
bucket_gens:         256 KiB
snapshot_trees:      256 KiB
rebalance_work:     13.8 MiB
accounting:          256 KiB

Pending rebalance work:
21.6 GiB

hdd (device 1):                loop1              rw
                                data         buckets    fragmented
  free:                     1.26 GiB            5177
  sb:                       3.00 MiB              13       252 KiB
  journal:                   320 MiB            1280
  btree:                         0 B               0
  user:                     38.4 GiB          157370
  cached:                        0 B               0
  parity:                        0 B               0
  stripe:                        0 B               0
  need_gc_gens:                  0 B               0
  need_discard:                  0 B               0
  unstriped:                     0 B               0
  capacity:                 40.0 GiB          163840

ssd (device 0):                loop0              rw
                                data         buckets    fragmented
  free:                     1.27 GiB            5205
  sb:                       3.00 MiB              13       252 KiB
  journal:                   320 MiB            1280
  btree:                     228 MiB             912
  user:                     21.6 GiB           88390
  cached:                   16.3 GiB           66785       256 KiB
  parity:                        0 B               0
  stripe:                        0 B               0
  need_gc_gens:                  0 B               0
  need_discard:              314 MiB            1255
  unstriped:                     0 B               0
  capacity:                 40.0 GiB          163840
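Reading the two usage tables together (illustrative arithmetic, with the rounded values copied from the output above): the user data on the two devices sums to the 60 GiB written, and the hdd's free space is far smaller than the pending rebalance work, so rebalance can never complete:

```shell
# Illustrative arithmetic on the fs usage output above.
awk 'BEGIN {
  # user data: 21.6 GiB on loop0 (ssd) + 38.4 GiB on loop1 (hdd)
  printf "user data total: %.1f GiB\n", 21.6 + 38.4   # the 60 GiB file written
  # pending rebalance work (21.6 GiB) vs. free space on hdd (1.26 GiB)
  printf "hdd shortfall:   %.2f GiB\n", 21.6 - 1.26   # work that cannot fit
}'
```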

There is pending rebalance work, but background_target is full, so the data cannot be moved. I can see the rebalance thread doing constant reads even after the data has been written:

[Screenshot_20241204_193257: system monitor showing the sustained read activity]

I would expect some constant I/O from the filesystem to check whether background_target has free space, but 300+ MB/s seems excessive. I waited for more than an hour and it did not stop. It starts again if I remount the filesystem, and it only stops once I delete the file and free up space on the background_target.
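For anyone reproducing this without a graphical monitor, the read rate can be sampled directly from /proc/diskstats (field 3 of each line is the device name, field 6 is cumulative sectors read, 512 bytes per sector); a minimal sketch, with the helper names and the optional stats-file argument being my own additions for illustration:

```shell
# Print a block device's cumulative "sectors read" counter.
# Second argument (defaulting to /proc/diskstats) exists only so the
# parsing can be exercised against a saved copy of the file.
read_sectors() {
  awk -v dev="$1" '$3 == dev { print $6 }' "${2:-/proc/diskstats}"
}

# Sample the counter twice, 5 seconds apart, and print the read rate.
read_rate() {
  before=$(read_sectors "$1"); sleep 5; after=$(read_sectors "$1")
  awk -v a="$before" -v b="$after" \
      'BEGIN { printf "%.1f MB/s\n", (b - a) * 512 / 5 / 1e6 }'
}

# Usage on the reproduction machine: read_rate loop0
```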

The underlying filesystem (where the loop device backing files live) is btrfs (with compression=zstd:3).

Host: NixOS
❯ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 6.11.5-zen1, NixOS, 24.11 (Vicuna), 24.11.20241202.f9f0d5c`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.24.10`
 - nixpkgs: `/nix/store/45bzbkwnyb6nikgc7jkrn7vjibhy4xhk-source`

bcachefs-tools version: 6.13.0

I will do some more testing on actual hardware.


nitinkmr333 commented Dec 12, 2024

I can confirm this also happens on real hardware: there are heavy reads when the background target is full. Writes are unaffected.
