
[6.11] Constant heavy reads when background_target is full #795

Open
nitinkmr333 opened this issue Dec 4, 2024 · 1 comment
nitinkmr333 commented Dec 4, 2024

On a multi-device filesystem, I have noticed that whenever the background_target becomes full, the rebalance thread does constant heavy reads.

Steps to reproduce:

Create two loop devices. One will be used as the foreground_target (disk0), the other as the background_target (disk1):

❯ mkdir -p ~/bcachefs
❯ cd ~/bcachefs
❯ dd if=/dev/zero of=disk0 bs=1G count=40 status=progress
42949672960 bytes (43 GB, 40 GiB) copied, 16 s, 2.7 GB/s
40+0 records in
40+0 records out
42949672960 bytes (43 GB, 40 GiB) copied, 16.1028 s, 2.7 GB/s
❯ dd if=/dev/zero of=disk1 bs=1G count=40 status=progress
41875931136 bytes (42 GB, 39 GiB) copied, 15 s, 2.7 GB/s
40+0 records in
40+0 records out
42949672960 bytes (43 GB, 40 GiB) copied, 15.7211 s, 2.7 GB/s

Both backing files are 40 GiB.
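As a side note, not part of the original reproduction: the backing files can also be created sparsely with truncate, which skips writing 80 GiB of zeroes up front (the /tmp/bcachefs-demo path here is illustrative):

```shell
# Hypothetical alternative to the dd commands above: create the 40 GiB
# backing files sparsely, so no zeroes are actually written.
mkdir -p /tmp/bcachefs-demo
truncate -s 40G /tmp/bcachefs-demo/disk0
truncate -s 40G /tmp/bcachefs-demo/disk1
# Apparent size is 40 GiB (42949672960 bytes); allocated blocks stay near zero.
stat --format='%n %s' /tmp/bcachefs-demo/disk0 /tmp/bcachefs-demo/disk1
```

On a btrfs host the sparse files will grow lazily as bcachefs writes into them, which is also true of the dd-created files once btrfs compression deduplicates the zeroes.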

Attach them as loop devices (for mounting):

❯ sudo losetup --find --show disk0
/dev/loop0
❯ sudo losetup --find --show disk1
/dev/loop1
❯ lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0         7:0    0    40G  0 loop
loop1         7:1    0    40G  0 loop

Format the loop devices as bcachefs, with disk0 labelled ssd (foreground_target) and disk1 labelled hdd (background_target):

❯ sudo bcachefs format --label ssd /dev/loop0 --label hdd /dev/loop1 --foreground_target=ssd --background_target=hdd
External UUID:                             99e865e6-ee40-480a-bd5d-c2fb1b805583
Internal UUID:                             80195311-407a-492f-a297-5d2e3e78892d
Magic number:                              c68573f6-66ce-90a9-d96a-60cf803df7ef
Device index:                              1
Label:                                     (none)
Version:                                   1.13: inode_has_child_snapshots
Version upgrade complete:                  0.0: (unknown version)
Oldest version on disk:                    1.13: inode_has_child_snapshots
Created:                                   Wed Dec  4 19:11:21 2024
Sequence number:                           0
Time of last write:                        Thu Jan  1 05:30:00 1970
Superblock size:                           1.25 KiB/1.00 MiB
Clean:                                     0
Devices:                                   2
Sections:                                  members_v1,disk_groups,members_v2
Features:                                  new_siphash,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:

Options:
  block_size:                              512 B
  btree_node_size:                         256 KiB
  errors:                                  continue [fix_safe] panic ro
  metadata_replicas:                       1
  data_replicas:                           1
  metadata_replicas_required:              1
  data_replicas_required:                  1
  encoded_extent_max:                      64.0 KiB
  metadata_checksum:                       none [crc32c] crc64 xxhash
  data_checksum:                           none [crc32c] crc64 xxhash
  compression:                             none
  background_compression:                  none
  str_hash:                                crc32c crc64 [siphash]
  metadata_target:                         none
  foreground_target:                       ssd
  background_target:                       hdd
  promote_target:                          none
  erasure_code:                            0
  inodes_32bit:                            1
  shard_inode_numbers:                     1
  inodes_use_key_cache:                    1
  gc_reserve_percent:                      8
  gc_reserve_bytes:                        0 B
  root_reserve_percent:                    0
  wide_macs:                               0
  promote_whole_extents:                   1
  acl:                                     1
  usrquota:                                0
  grpquota:                                0
  prjquota:                                0
  journal_flush_delay:                     1000
  journal_flush_disabled:                  0
  journal_reclaim_delay:                   100
  journal_transaction_names:               1
  allocator_stuck_timeout:                 30
  version_upgrade:                         [compatible] incompatible none
  nocow:                                   0

members_v2 (size 304):
Device:                                    0
  Label:                                   ssd (0)
  UUID:                                    5819c971-b9fe-448a-b1d0-d488591e61f6
  Size:                                    40.0 GiB
  read errors:                             0
  write errors:                            0
  checksum errors:                         0
  seqread iops:                            0
  seqwrite iops:                           0
  randread iops:                           0
  randwrite iops:                          0
  Bucket size:                             256 KiB
  First bucket:                            0
  Buckets:                                 163840
  Last mount:                              (never)
  Last superblock write:                   0
  State:                                   rw
  Data allowed:                            journal,btree,user
  Has data:                                (none)
  Btree allocated bitmap blocksize:        1.00 B
  Btree allocated bitmap:                  0000000000000000000000000000000000000000000000000000000000000000
  Durability:                              1
  Discard:                                 0
  Freespace initialized:                   0
Device:                                    1
  Label:                                   hdd (1)
  UUID:                                    18483694-8f70-454e-a5cd-719c2499ac11
  Size:                                    40.0 GiB
  read errors:                             0
  write errors:                            0
  checksum errors:                         0
  seqread iops:                            0
  seqwrite iops:                           0
  randread iops:                           0
  randwrite iops:                          0
  Bucket size:                             256 KiB
  First bucket:                            0
  Buckets:                                 163840
  Last mount:                              (never)
  Last superblock write:                   0
  State:                                   rw
  Data allowed:                            journal,btree,user
  Has data:                                (none)
  Btree allocated bitmap blocksize:        1.00 B
  Btree allocated bitmap:                  0000000000000000000000000000000000000000000000000000000000000000
  Durability:                              1
  Discard:                                 0
  Freespace initialized:                   0
starting version 1.13: inode_has_child_snapshots opts=foreground_target=ssd,background_target=hdd
initializing new filesystem
going read-write
initializing freespace
shutdown complete, journal seq 16
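The member geometry printed above is self-consistent; as an illustrative arithmetic check, 163840 buckets of 256 KiB each is exactly the 40 GiB reported per device:

```shell
# Illustrative check of the per-device geometry from the format output:
# 163840 buckets * 256 KiB per bucket.
awk 'BEGIN {
  bytes = 163840 * 256 * 1024
  printf "%d buckets * 256 KiB = %.1f GiB\n", 163840, bytes / (1024 ^ 3)
}'
# -> 163840 buckets * 256 KiB = 40.0 GiB
```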

Mount the filesystem and write a 60 GiB file (larger than the background_target):

❯ sudo bcachefs mount /dev/loop0:/dev/loop1 /mnt
❯ sudo dd if=/dev/zero of=/mnt/hugefile bs=1G count=60 status=progress
64424509440 bytes (64 GB, 60 GiB) copied, 75 s, 854 MB/s
60+0 records in
60+0 records out
64424509440 bytes (64 GB, 60 GiB) copied, 75.4496 s, 854 MB/s

bcachefs fs usage:

❯ sudo bcachefs fs usage /mnt -h
Filesystem: 99e865e6-ee40-480a-bd5d-c2fb1b805583
Size:                       73.6 GiB
Used:                       60.2 GiB
Online reserved:                 0 B

Data type       Required/total  Durability    Devices
btree:          1/1             1             [loop0]              228 MiB
user:           1/1             1             [loop0]             21.6 GiB
user:           1/1             1             [loop1]             38.4 GiB
cached:         1/1             1             [loop0]             16.3 GiB

Btree usage:
extents:            87.0 MiB
inodes:              256 KiB
dirents:             256 KiB
alloc:              42.3 MiB
subvolumes:          256 KiB
snapshots:           256 KiB
lru:                2.75 MiB
freespace:           256 KiB
need_discard:        256 KiB
backpointers:       80.0 MiB
bucket_gens:         256 KiB
snapshot_trees:      256 KiB
rebalance_work:     13.8 MiB
accounting:          256 KiB

Pending rebalance work:
21.6 GiB

hdd (device 1):                loop1              rw
                                data         buckets    fragmented
  free:                     1.26 GiB            5177
  sb:                       3.00 MiB              13       252 KiB
  journal:                   320 MiB            1280
  btree:                         0 B               0
  user:                     38.4 GiB          157370
  cached:                        0 B               0
  parity:                        0 B               0
  stripe:                        0 B               0
  need_gc_gens:                  0 B               0
  need_discard:                  0 B               0
  unstriped:                     0 B               0
  capacity:                 40.0 GiB          163840

ssd (device 0):                loop0              rw
                                data         buckets    fragmented
  free:                     1.27 GiB            5205
  sb:                       3.00 MiB              13       252 KiB
  journal:                   320 MiB            1280
  btree:                     228 MiB             912
  user:                     21.6 GiB           88390
  cached:                   16.3 GiB           66785       256 KiB
  parity:                        0 B               0
  stripe:                        0 B               0
  need_gc_gens:                  0 B               0
  need_discard:              314 MiB            1255
  unstriped:                     0 B               0
  capacity:                 40.0 GiB          163840
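Reading the two usage tables together (illustrative arithmetic, with the rounded values copied from the output above): the user data on the two devices sums to the 60 GiB written, and the hdd's free space is far smaller than the pending rebalance work, so rebalance can never complete:

```shell
# Illustrative arithmetic on the fs usage output above.
awk 'BEGIN {
  # user data: 21.6 GiB on loop0 (ssd) + 38.4 GiB on loop1 (hdd)
  printf "user data total: %.1f GiB\n", 21.6 + 38.4   # the 60 GiB file written
  # pending rebalance work (21.6 GiB) vs. free space on hdd (1.26 GiB)
  printf "hdd shortfall:   %.2f GiB\n", 21.6 - 1.26   # work that cannot fit
}'
```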

There is pending rebalance work, but background_target is full, so the data cannot be moved. I can see the rebalance thread doing constant reads even after the data has been written:

[Screenshot_20241204_193257: system monitor showing the sustained read activity]

I would expect some constant I/O from the filesystem to check whether background_target has free space, but 300+ MB/s seems excessive. I waited for more than an hour and it did not stop. It starts again if I remount the filesystem, and it only stops once I delete the file and free up space on the background_target.
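For anyone reproducing this without a graphical monitor, the read rate can be sampled directly from /proc/diskstats (field 3 of each line is the device name, field 6 is cumulative sectors read, 512 bytes per sector); a minimal sketch, with the helper names and the optional stats-file argument being my own additions for illustration:

```shell
# Print a block device's cumulative "sectors read" counter.
# Second argument (defaulting to /proc/diskstats) exists only so the
# parsing can be exercised against a saved copy of the file.
read_sectors() {
  awk -v dev="$1" '$3 == dev { print $6 }' "${2:-/proc/diskstats}"
}

# Sample the counter twice, 5 seconds apart, and print the read rate.
read_rate() {
  before=$(read_sectors "$1"); sleep 5; after=$(read_sectors "$1")
  awk -v a="$before" -v b="$after" \
      'BEGIN { printf "%.1f MB/s\n", (b - a) * 512 / 5 / 1e6 }'
}

# Usage on the reproduction machine: read_rate loop0
```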

The underlying filesystem (where the loop device backing files live) is btrfs (with compression=zstd:3).

Host: NixOS
❯ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 6.11.5-zen1, NixOS, 24.11 (Vicuna), 24.11.20241202.f9f0d5c`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.24.10`
 - nixpkgs: `/nix/store/45bzbkwnyb6nikgc7jkrn7vjibhy4xhk-source`

bcachefs-tools version: 6.13.0

I will do some more testing on actual hardware.


nitinkmr333 commented Dec 12, 2024

I can confirm this also happens on real hardware: there are heavy reads when the background target is full. Writes are unaffected.
