Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bcachefs hangs on intense write operations #673

Closed
maartenvanmalland opened this issue Apr 18, 2024 · 9 comments
Closed

Bcachefs hangs on intense write operations #673

maartenvanmalland opened this issue Apr 18, 2024 · 9 comments

Comments

@maartenvanmalland
Copy link

maartenvanmalland commented Apr 18, 2024

On my NAS I have the following setup:

- Created a RAM drive: modprobe -s -q brd rd_nr=1 rd_size=54857544 max_part=1
- Created a local loop device as overflow target: losetup /dev/loop667 /opt/overflow
- Formatted it with bcachefs: bcachefs format  --label=ssd.ssd1 /dev/ram0 --label=hdd.hdd1 /dev/loop667 --foreground_target=ssd --promote_target=ssd --background_target=hdd
- Mounted it on my temp download folder: mount -t bcachefs /dev/ram0:/dev/loop667 /tmpfs/

I guess this is one of those setups that you might be thinking, oh dear, what kind of users am I attracting here :). Anyways, this setup performs like a madman, up until the point it crashes/hangs:

[Wed Apr 17 10:58:31 2024] ------------[ cut here ]------------
[Wed Apr 17 10:58:31 2024] btree trans held srcu lock (delaying memory reclaim) for 18 seconds
[Wed Apr 17 10:58:31 2024] WARNING: CPU: 21 PID: 3562983 at fs/bcachefs/btree_iter.c:2871 bch2_trans_srcu_unlock+0x118/0x130
[bcachefs]
[Wed Apr 17 10:58:31 2024] WARNING: CPU: 9 PID: 3042134 at fs/bcachefs/btree_iter.c:2871 bch2_trans_srcu_unlock+0x118/0x130 [
bcachefs]
[Wed Apr 17 10:58:31 2024] WARNING: CPU: 14 PID: 1602374 at fs/bcachefs/btree_iter.c:2871 bch2_trans_put+0x23b/0x260 [bcachef
s]
[Wed Apr 17 10:58:31 2024] Modules linked in: brd bcachefs lz4hc_compress veth xt_MASQUERADE nf_conntrack_netlink
[Wed Apr 17 10:58:31 2024] WARNING: CPU: 17 PID: 3618567 at fs/bcachefs/btree_iter.c:2871 bch2_trans_srcu_unlock+0x118/0x130
[bcachefs]
[Wed Apr 17 10:58:31 2024]  xt_addrtype br_netfilter tcp_diag udp_diag inet_diag vhost_net vhost vhost_iotlb tap tun snd_seq_
dummy snd_hrtimer cpufreq_conservative cpufreq_userspace cpufreq_powersave cpufreq_ondemand xt_conntrack xfrm_user xfrm_algo
rdma_ucm ib_uverbs rdma_cm vmw_vsock_vmci_transport vsock iw_cm vmw_vmci scsi_transport_iscsi
[Wed Apr 17 10:58:31 2024] Modules linked in:

<cut a lot of linked modules, can paste it later if needed> 

[Wed Apr 17 10:58:31 2024] CPU: 21 PID: 3562983 Comm: kworker/u113:2 Tainted: P        W  OE      6.9.0-rc4 #1
[Wed Apr 17 10:58:31 2024]  crc_t10dif
[Wed Apr 17 10:58:31 2024]  ahci
[Wed Apr 17 10:58:31 2024]  blake2b_generic
[Wed Apr 17 10:58:31 2024]  sfc_driverlink(OE)
[Wed Apr 17 10:58:31 2024] Hardware name: System manufacturer System Product Name/WS X299 PRO_SE, BIOS 4101 08/25/2023
[Wed Apr 17 10:58:31 2024]  drm
[Wed Apr 17 10:58:31 2024]  libahci
[Wed Apr 17 10:58:31 2024]  efivarfs
[Wed Apr 17 10:58:31 2024]  igb
[Wed Apr 17 10:58:31 2024]  xhci_pci
[Wed Apr 17 10:58:31 2024]  raid10
[Wed Apr 17 10:58:31 2024]  crc32c_intel
[Wed Apr 17 10:58:31 2024]  t10_pi
[Wed Apr 17 10:58:31 2024]  scsi_mod
[Wed Apr 17 10:58:31 2024]  z3fold
[Wed Apr 17 10:58:31 2024]  crct10dif_generic
[Wed Apr 17 10:58:31 2024]  lz4
[Wed Apr 17 10:58:31 2024]  usbcore
[Wed Apr 17 10:58:31 2024]  sfc(OE)
[Wed Apr 17 10:58:31 2024]  lz4_compress
[Wed Apr 17 10:58:31 2024]  mxm_wmi
[Wed Apr 17 10:58:31 2024]  sr_mod
[Wed Apr 17 10:58:31 2024]  cdrom
[Wed Apr 17 10:58:31 2024]  libata
[Wed Apr 17 10:58:31 2024]  xhci_hcd
[Wed Apr 17 10:58:31 2024]  i915
[Wed Apr 17 10:58:31 2024]  mtd
[Wed Apr 17 10:58:31 2024]  crc64_rocksoft
[Wed Apr 17 10:58:31 2024]  i2c_i801
[Wed Apr 17 10:58:31 2024]  drm_buddy
[Wed Apr 17 10:58:31 2024]  crc32_pclmul
[Wed Apr 17 10:58:31 2024]  crct10dif_pclmul
[Wed Apr 17 10:58:31 2024]  video
[Wed Apr 17 10:58:31 2024]  dca crc64
[Wed Apr 17 10:58:31 2024]  crc_t10dif
[Wed Apr 17 10:58:31 2024]  drm_display_helper
[Wed Apr 17 10:58:31 2024]  sfc_driverlink(OE)
[Wed Apr 17 10:58:31 2024]  cec
[Wed Apr 17 10:58:31 2024]  i2c_smbus i2c_algo_bit
[Wed Apr 17 10:58:31 2024]  drm
[Wed Apr 17 10:58:31 2024]  crct10dif_common
[Wed Apr 17 10:58:31 2024]  rc_core
[Wed Apr 17 10:58:31 2024]  igb
[Wed Apr 17 10:58:31 2024]  ttm
[Wed Apr 17 10:58:31 2024]  scsi_common
[Wed Apr 17 10:58:31 2024]  crc32c_intel
[Wed Apr 17 10:58:31 2024]  dm_snapshot dm_bufio
[Wed Apr 17 10:58:31 2024]  scsi_mod
[Wed Apr 17 10:58:31 2024]  usb_common
[Wed Apr 17 10:58:31 2024]  raid456
[Wed Apr 17 10:58:31 2024]  crct10dif_generic
[Wed Apr 17 10:58:31 2024]  wmi
[Wed Apr 17 10:58:31 2024]  button
[Wed Apr 17 10:58:31 2024]  usbcore
[Wed Apr 17 10:58:31 2024]  async_memcpy async_pq
[Wed Apr 17 10:58:31 2024]  crct10dif_pclmul
[Wed Apr 17 10:58:31 2024]  async_xor async_tx
[Wed Apr 17 10:58:31 2024]  dca
[Wed Apr 17 10:58:31 2024] CPU: 14 PID: 1602374 Comm: sabnzbdplus Tainted: P        W  OE      6.9.0-rc4 #1
[Wed Apr 17 10:58:31 2024]  xor hid_generic
[Wed Apr 17 10:58:31 2024]  crc64
[Wed Apr 17 10:58:31 2024] Hardware name: System manufacturer System Product Name/WS X299 PRO_SE, BIOS 4101 08/25/2023
[Wed Apr 17 10:58:31 2024]  i2c_smbus
[Wed Apr 17 10:58:31 2024]  usbhid
[Wed Apr 17 10:58:31 2024]  i2c_algo_bit
[Wed Apr 17 10:58:31 2024] Workqueue: bcachefs bch2_write_point_do_index_updates [bcachefs]
[Wed Apr 17 10:58:31 2024]  uas hid
[Wed Apr 17 10:58:31 2024]  crct10dif_common scsi_common
[Wed Apr 17 10:58:31 2024]  usb_storage raid6_pq
[Wed Apr 17 10:58:31 2024]  usb_common
[Wed Apr 17 10:58:31 2024]  libcrc32c
[Wed Apr 17 10:58:31 2024]  wmi
[Wed Apr 17 10:58:31 2024]  crc32c_generic
[Wed Apr 17 10:58:31 2024]  button
[Wed Apr 17 10:58:31 2024]  raid0 bcache
[Wed Apr 17 10:58:31 2024]  [last unloaded: brd]
[Wed Apr 17 10:58:31 2024]  dm_mod
[Wed Apr 17 10:58:31 2024]  sd_mod raid1 md_mod ast drm_shmem_helper
[Wed Apr 17 10:58:31 2024] CPU: 9 PID: 3042134 Comm: kworker/9:2H Tainted: P        W  OE      6.9.0-rc4 #1
[Wed Apr 17 10:58:31 2024]  nvme nvme_core drm_kms_helper ahci
[Wed Apr 17 10:58:31 2024] Hardware name: System manufacturer System Product Name/WS X299 PRO_SE, BIOS 4101 08/25/2023
[Wed Apr 17 10:58:31 2024]  libahci xhci_pci t10_pi sfc(OE) mxm_wmi libata xhci_hcd crc64_rocksoft crc32_pclmul crc_t10dif sf
c_driverlink(OE) drm igb crc32c_intel scsi_mod crct10dif_generic usbcore mtd i2c_i801
[Wed Apr 17 10:58:31 2024] RIP: 0010:bch2_trans_put+0x23b/0x260 [bcachefs]
[Wed Apr 17 10:58:31 2024]  crct10dif_pclmul dca crc64 i2c_smbus i2c_algo_bit crct10dif_common scsi_common usb_common wmi
[Wed Apr 17 10:58:31 2024]  button [last unloaded: brd]
[Wed Apr 17 10:58:31 2024] CPU: 17 PID: 3618567 Comm: kworker/17:2H Tainted: P        W  OE      6.9.0-rc4 #1
[Wed Apr 17 10:58:31 2024] Hardware name: System manufacturer System Product Name/WS X299 PRO_SE, BIOS 4101 08/25/2023
[Wed Apr 17 10:58:31 2024] Workqueue: bcachefs_btree_io btree_node_write_work [bcachefs]
[Wed Apr 17 10:58:31 2024] RIP: 0010:bch2_trans_srcu_unlock+0x118/0x130 [bcachefs]
[Wed Apr 17 10:58:31 2024] Code: 0b 95 76 d3 48 c7 c7 70 3d 7f c6 48 b8 cf f7 53 e3 a5 9b c4 20 48 29 ca 48 d1 ea 48 f7 e2 48
 89 d6 48 c1 ee 04 e8 65 1a c2 d1 <0f> 0b 8b b5 a8 00 00 00 49 8d be 68 36 00 00 83 fe 01 77 0a e8 ec
[Wed Apr 17 10:58:31 2024] RSP: 0018:ffff9dfbe32d79e8 EFLAGS: 00010286
[Wed Apr 17 10:58:31 2024] Workqueue: bcachefs_btree_io btree_node_write_work [bcachefs]
[Wed Apr 17 10:58:31 2024] RAX: 0000000000000000 RBX: ffff9dfbe32d7c20 RCX: 0000000000000000
[Wed Apr 17 10:58:31 2024] RDX: 0000000000000002 RSI: ffffffff9967d132 RDI: 00000000ffffffff
[Wed Apr 17 10:58:31 2024] RBP: ffff909cb22fc000 R08: 0000000000000000 R09: 0000000000000003
[Wed Apr 17 10:58:31 2024] R10: ffff9dfbe32d7870 R11: ffff90ae5ff6cc28 R12: ffff909cb22fc000
[Wed Apr 17 10:58:31 2024] R13: ffff90917ba3ca60 R14: ffff908ed4ec0000 R15: 0000000000000000
[Wed Apr 17 10:58:31 2024] FS:  00007fb7fd6006c0(0000) GS:ffff90addfb00000(0000) knlGS:0000000000000000
[Wed Apr 17 10:58:31 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Wed Apr 17 10:58:31 2024] CR2: 00000000f2c71000 CR3: 000000011ccf2002 CR4: 00000000003726f0
[Wed Apr 17 10:58:31 2024] Code: ee a1 76 d3 48 c7 c7 70 3d 7f c6 48 b8 cf f7 53 e3 a5 9b c4 20 48 29 ca 48 d1 ea 48 f7 e2 48
 89 d6 48 c1 ee 04 e8 48 27 c2 d1 <0f> 0b e9 64 ff ff ff 0f 0b e9 73 ff ff ff 66 2e 0f 1f 84 00 00 00
[Wed Apr 17 10:58:31 2024] Call Trace:
[Wed Apr 17 10:58:31 2024] RSP: 0018:ffff9dfbce32bc10 EFLAGS: 00010286
[Wed Apr 17 10:58:31 2024]  <TASK>
[Wed Apr 17 10:58:31 2024] RAX: 0000000000000000 RBX: ffff909b00734000 RCX: 0000000000000000
[Wed Apr 17 10:58:31 2024] RIP: 0010:bch2_trans_srcu_unlock+0x118/0x130 [bcachefs]
[Wed Apr 17 10:58:31 2024] RDX: 0000000000000002 RSI: ffffffff9967d132 RDI: 00000000ffffffff
[Wed Apr 17 10:58:31 2024] RBP: ffff908ed4ec0000 R08: 0000000000000000 R09: 0000000000000003
[Wed Apr 17 10:58:31 2024] R13: ffff909b00734000 R14: 0000000000000006 R15: ffff909b00734478
[Wed Apr 17 10:58:31 2024] FS:  0000000000000000(0000) GS:ffff90addfe80000(0000) knlGS:0000000000000000
[Wed Apr 17 10:58:31 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Wed Apr 17 10:58:31 2024] CR2: 0000000000f90f80 CR3: 000000156b620002 CR4: 00000000003726f0
[Wed Apr 17 10:58:31 2024] Call Trace:
[Wed Apr 17 10:58:31 2024]  <TASK>
[Wed Apr 17 10:58:31 2024]  ? __warn+0x80/0x120
[Wed Apr 17 10:58:31 2024] Code: ee a1 76 d3 48 c7 c7 70 3d 7f c6 48 b8 cf f7 53 e3 a5 9b c4 20 48 29 ca 48 d1 ea 48 f7 e2 48
 89 d6 48 c1 ee 04 e8 48 27 c2 d1 <0f> 0b e9 64 ff ff ff 0f 0b e9 73 ff ff ff 66 2e 0f 1f 84 00 00 00
[Wed Apr 17 10:58:31 2024] RIP: 0010:bch2_trans_srcu_unlock+0x118/0x130 [bcachefs]
[Wed Apr 17 10:58:31 2024] RSP: 0018:ffff9dfc10cafd70 EFLAGS: 00010286
[Wed Apr 17 10:58:31 2024] RAX: 0000000000000000 RBX: ffff90a5640bc000 RCX: 0000000000000000
[Wed Apr 17 10:58:31 2024] RDX: 0000000000000002 RSI: ffffffff9967d132 RDI: 00000000ffffffff
[Wed Apr 17 10:58:31 2024] RBP: ffff908ed4ec0000 R08: 0000000000000000 R09: 0000000000000003
[Wed Apr 17 10:58:31 2024] R10: ffff9dfc10cafbf8 R11: ffff90ae5ff6cc28 R12: ffff90a5640bc278
[Wed Apr 17 10:58:31 2024] R13: ffff90a5640bc000 R14: 0000000000000002 R15: ffff90a5640bc278
[Wed Apr 17 10:58:31 2024] FS:  0000000000000000(0000) GS:ffff90addf880000(0000) knlGS:0000000000000000
[Wed Apr 17 10:58:31 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Wed Apr 17 10:58:31 2024] CR2: 00007f5f38a49480 CR3: 000000156b620006 CR4: 00000000003726f0
[Wed Apr 17 10:58:31 2024]  ? bch2_trans_srcu_unlock+0x118/0x130 [bcachefs]
[Wed Apr 17 10:58:31 2024] Call Trace:
[Wed Apr 17 10:58:31 2024]  <TASK>
[Wed Apr 17 10:58:31 2024]  ? __warn+0x80/0x120
[Wed Apr 17 10:58:31 2024] Code: ee a1 76 d3 48 c7 c7 70 3d 7f c6 48 b8 cf f7 53 e3 a5 9b c4 20 48 29 ca 48 d1 ea 48 f7 e2 48
 89 d6 48 c1 ee 04 e8 48 27 c2 d1 <0f> 0b e9 64 ff ff ff 0f 0b e9 73 ff ff ff 66 2e 0f 1f 84 00 00 00
[Wed Apr 17 10:58:31 2024] RSP: 0018:ffff9dfbe0aa7d70 EFLAGS: 00010286
[Wed Apr 17 10:58:31 2024]  ? bch2_trans_put+0x23b/0x260 [bcachefs]
[Wed Apr 17 10:58:31 2024] RAX: 0000000000000000 RBX: ffff909b84df4000 RCX: 0000000000000000
[Wed Apr 17 10:58:31 2024] RDX: 0000000000000002 RSI: ffffffff9967d132 RDI: 00000000ffffffff
[Wed Apr 17 10:58:31 2024] RBP: ffff908ed4ec0000 R08: 0000000000000000 R09: 0000000000000003
[Wed Apr 17 10:58:31 2024] R10: ffff9dfbe0aa7bf8 R11: ffff90ae5ff6cc28 R12: ffff909b84df4278
[Wed Apr 17 10:58:31 2024] R13: ffff909b84df4000 R14: 0000000000000002 R15: ffff909b84df4278
[Wed Apr 17 10:58:31 2024] FS:  0000000000000000(0000) GS:ffff90addfc80000(0000) knlGS:0000000000000000
[Wed Apr 17 10:58:31 2024]  ? report_bug+0x191/0x1c0
[Wed Apr 17 10:58:31 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Wed Apr 17 10:58:31 2024] CR2: 00007feea662f480 CR3: 00000002ca3b4005 CR4: 00000000003726f0
[Wed Apr 17 10:58:31 2024]  ? console_unlock+0x78/0x120
[Wed Apr 17 10:58:31 2024]  ? bch2_trans_srcu_unlock+0x118/0x130 [bcachefs]
[Wed Apr 17 10:58:31 2024] Call Trace:
[Wed Apr 17 10:58:31 2024]  <TASK>
[Wed Apr 17 10:58:31 2024]  ? handle_bug+0x41/0x70
[Wed Apr 17 10:58:31 2024]  ? __warn+0x80/0x120
[Wed Apr 17 10:58:31 2024]  ? exc_invalid_op+0x17/0x70
[Wed Apr 17 10:58:31 2024]  ? asm_exc_invalid_op+0x1a/0x20
[Wed Apr 17 10:58:31 2024]  ? report_bug+0x191/0x1c0
[Wed Apr 17 10:58:31 2024]  ? console_unlock+0x78/0x120
[Wed Apr 17 10:58:31 2024]  ? bch2_trans_srcu_unlock+0x118/0x130 [bcachefs]
[Wed Apr 17 10:58:31 2024]  ? handle_bug+0x41/0x70
[Wed Apr 17 10:58:31 2024]  ? exc_invalid_op+0x17/0x70
[Wed Apr 17 10:58:31 2024]  ? bch2_trans_srcu_unlock+0x118/0x130 [bcachefs]
[Wed Apr 17 10:58:31 2024]  ? report_bug+0x191/0x1c0
[Wed Apr 17 10:58:31 2024]  ? asm_exc_invalid_op+0x1a/0x20
[Wed Apr 17 10:58:31 2024]  ? prb_read_valid+0x1b/0x30
[Wed Apr 17 10:58:31 2024]  ? handle_bug+0x41/0x70
[Wed Apr 17 10:58:31 2024]  ? exc_invalid_op+0x17/0x70
[Wed Apr 17 10:58:31 2024]  ? report_bug+0x191/0x1c0
[Wed Apr 17 10:58:31 2024]  ? asm_exc_invalid_op+0x1a/0x20
[Wed Apr 17 10:58:31 2024]  ? handle_bug+0x41/0x70
[Wed Apr 17 10:58:31 2024]  ? exc_invalid_op+0x17/0x70
[Wed Apr 17 10:58:31 2024]  ? bch2_trans_srcu_unlock+0x118/0x130 [bcachefs]
[Wed Apr 17 10:58:31 2024]  ? asm_exc_invalid_op+0x1a/0x20
[Wed Apr 17 10:58:31 2024]  ? bch2_trans_begin+0xf8/0x600 [bcachefs]
[Wed Apr 17 10:58:31 2024]  ? bch2_trans_put+0x23b/0x260 [bcachefs]
[Wed Apr 17 10:58:31 2024]  ? bch2_trans_srcu_unlock+0x118/0x130 [bcachefs]
[Wed Apr 17 10:58:31 2024]  bch2_trans_begin+0x5a6/0x600 [bcachefs]
[Wed Apr 17 10:58:31 2024]  ? bch2_trans_begin+0xf8/0x600 [bcachefs]
[Wed Apr 17 10:58:31 2024]  ? bch2_trans_put+0x23b/0x260 [bcachefs]
[Wed Apr 17 10:58:31 2024]  ? bch2_trans_begin+0xf8/0x600 [bcachefs]
[Wed Apr 17 10:58:31 2024]  bch2_trans_begin+0x5a6/0x600 [bcachefs]
[Wed Apr 17 10:58:31 2024]  bch2_write_index_default+0xce/0x4c0 [bcachefs]
[Wed Apr 17 10:58:31 2024]  bch2_trans_begin+0x5a6/0x600 [bcachefs]
[Wed Apr 17 10:58:31 2024]  bch2_setattr_nonsize+0x1a4/0x4a0 [bcachefs]
[Wed Apr 17 10:58:31 2024]  ? __mod_node_page_state+0x7a/0xb0
[Wed Apr 17 10:58:31 2024]  btree_node_write_work+0xe3/0x460 [bcachefs]
[Wed Apr 17 10:58:31 2024]  notify_change+0x22c/0x4d0
[Wed Apr 17 10:58:31 2024]  ? chmod_common+0xbb/0x170
[Wed Apr 17 10:58:31 2024]  chmod_common+0xbb/0x170
[Wed Apr 17 10:58:31 2024]  btree_node_write_work+0xe3/0x460 [bcachefs]
[Wed Apr 17 10:58:31 2024]  __x64_sys_chmod+0x6a/0xc0
[Wed Apr 17 10:58:31 2024]  do_syscall_64+0x86/0x1a0
[Wed Apr 17 10:58:31 2024]  ? __bch2_write_index+0x195/0x290 [bcachefs]
[Wed Apr 17 10:58:31 2024]  process_one_work+0x189/0x3b0
[Wed Apr 17 10:58:31 2024]  worker_thread+0x277/0x390
[Wed Apr 17 10:58:31 2024]  ? _raw_spin_lock_irqsave+0x27/0x60
[Wed Apr 17 10:58:31 2024]  ? __pfx_worker_thread+0x10/0x10
[Wed Apr 17 10:58:31 2024]  kthread+0xde/0x110
[Wed Apr 17 10:58:31 2024]  ? __pfx_kthread+0x10/0x10
[Wed Apr 17 10:58:31 2024]  ret_from_fork+0x31/0x50
[Wed Apr 17 10:58:31 2024]  process_one_work+0x189/0x3b0
[Wed Apr 17 10:58:31 2024]  ? __pfx_kthread+0x10/0x10
[Wed Apr 17 10:58:31 2024]  worker_thread+0x277/0x390
[Wed Apr 17 10:58:31 2024]  ret_from_fork_asm+0x1a/0x30
[Wed Apr 17 10:58:31 2024]  ? efx_poll+0x266/0x650 [sfc]
[Wed Apr 17 10:58:31 2024]  ? _raw_spin_lock_irqsave+0x27/0x60
[Wed Apr 17 10:58:31 2024]  </TASK>
[Wed Apr 17 10:58:31 2024]  ? __pfx_worker_thread+0x10/0x10
[Wed Apr 17 10:58:31 2024] ---[ end trace 0000000000000000 ]---
[Wed Apr 17 10:58:31 2024]  kthread+0xde/0x110
[Wed Apr 17 10:58:31 2024]  ? __pfx_kthread+0x10/0x10
[Wed Apr 17 10:58:31 2024]  ret_from_fork+0x31/0x50
[Wed Apr 17 10:58:31 2024]  ? __pfx_kthread+0x10/0x10
[Wed Apr 17 10:58:31 2024]  ret_from_fork_asm+0x1a/0x30
[Wed Apr 17 10:58:31 2024]  </TASK>
[Wed Apr 17 10:58:31 2024] ---[ end trace 0000000000000000 ]---
[Wed Apr 17 10:58:31 2024]  ? __napi_poll+0x28/0x1c0
[Wed Apr 17 10:58:31 2024]  __bch2_write_index+0x195/0x290 [bcachefs]
[Wed Apr 17 10:58:31 2024]  ? net_rx_action+0x2ab/0x390
[Wed Apr 17 10:58:31 2024]  ? __raise_softirq_irqoff+0x1a/0x80
[Wed Apr 17 10:58:31 2024]  ? __do_softirq+0x15d/0x2f8
[Wed Apr 17 10:58:31 2024]  ? fpregs_assert_state_consistent+0x25/0x50
[Wed Apr 17 10:58:31 2024]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Wed Apr 17 10:58:31 2024] RIP: 0033:0x7fb814b9fc47
[Wed Apr 17 10:58:31 2024] Code: eb d9 e8 9c 04 02 00 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 5f 00 00 00 0f 05 c3 0f 1f 84 00
 00 00 00 00 b8 5a 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 89 a1 0d 00 f7 d8 64 89 01 48
[Wed Apr 17 10:58:31 2024] RSP: 002b:00007fb7fd5ffba8 EFLAGS: 00000246 ORIG_RAX: 000000000000005a
[Wed Apr 17 10:58:31 2024] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb814b9fc47
[Wed Apr 17 10:58:31 2024] RDX: 0000000000000000 RSI: 00000000000001b6 RDI: 00007fb8116c9700
[Wed Apr 17 10:58:31 2024] RBP: 00000000000001b6 R08: 0000000000a5b8a8 R09: 0000000000a5b8c8
[Wed Apr 17 10:58:31 2024] R10: 0000000000000000 R11: 0000000000000246 R12: 000000003b059c40
[Wed Apr 17 10:58:31 2024] R13: 00000000ffffff9c R14: 00007fb7fd5ffbe0 R15: 000000003b37426e
[Wed Apr 17 10:58:31 2024]  bch2_write_point_do_index_updates+0xb1/0x160 [bcachefs]
[Wed Apr 17 10:58:31 2024]  </TASK>
[Wed Apr 17 10:58:31 2024] ---[ end trace 0000000000000000 ]---
[Wed Apr 17 10:58:31 2024]  process_one_work+0x189/0x3b0
[Wed Apr 17 10:58:31 2024]  worker_thread+0x277/0x390
[Wed Apr 17 10:58:31 2024]  ? __pfx_worker_thread+0x10/0x10
[Wed Apr 17 10:58:31 2024]  kthread+0xde/0x110
[Wed Apr 17 10:58:31 2024]  ? __pfx_kthread+0x10/0x10
[Wed Apr 17 10:58:31 2024]  ret_from_fork+0x31/0x50
[Wed Apr 17 10:58:31 2024]  ? __pfx_kthread+0x10/0x10
[Wed Apr 17 10:58:31 2024]  ret_from_fork_asm+0x1a/0x30
[Wed Apr 17 10:58:31 2024]  </TASK>
[Wed Apr 17 10:58:31 2024] ---[ end trace 0000000000000000 ]---

This only happens when there is intense writing (and I'm guessing when the data begins to overflow to the hdd target).

@maartenvanmalland
Copy link
Author

maartenvanmalland commented May 1, 2024

Inspired by #672, here some extra info as the effect of this crash is indeed a hanging filesystem as well:

# cat /sys/fs/bcachefs/*/internal/journal_debug
dirty journal entries:  763/32768
seq:                    6003
seq_ondisk:             5999
last_seq:               5241
last_seq_ondisk:        5241
flushed_seq_ondisk:     5890
watermark:              stripe
each entry reserved:    401
nr flush writes:        284
nr noflush writes:      5699
average write size:     246 KiB
nr direct reclaim:      0
nr background reclaim:  544622
reclaim kicked:         0
reclaim runs in:        12 ms
blocked:                0
current entry sectors:  512
current entry error:    ok
current entry:          closed
unwritten entries:
seq:                    6000
  refcount:             0
  size:                 136 KiB
  expires:              -135360 jiffies
  flags:                must_flush separate_flush write_started write allocated
seq:                    6001
  refcount:             0
  size:                 255 KiB
  expires:              -135109 jiffies
  flags:                noflush separate_flush write_started write allocated write done
seq:                    6002
  refcount:             0
  size:                 255 KiB
  expires:              -135109 jiffies
  flags:                noflush separate_flush write_started write allocated write done
seq:                    6003
  refcount:             0
  size:                 255 KiB
  expires:              -135109 jiffies
  flags:                noflush separate_flush write_started write allocated write done
last buf closed
replay done:            1
space:
        discarded       512:4193280
        clean ondisk    512:4193280
        clean           512:4193280
        total           512:4194304
dev 0:
        nr              1582
        bucket size     512
        available       818:0
        discard_idx     307
        dirty_ondisk    307 (seq 5241)
        dirty_idx       307 (seq 5241)
        cur_idx         1069 (seq 6003)
dev 1:
        nr              8192
        bucket size     512
        available       8190:0
        discard_idx     0
        dirty_ondisk    0 (seq 0)
        dirty_idx       0 (seq 0)
        cur_idx         0 (seq 0)

# cat /sys/fs/bcachefs/caa656c1-a03a-4fda-a11d-dd1f0ac8d896/dev-0/alloc_debug
                 buckets         sectors      fragmented
free              129877               0               0
sb                    13            6152             504
journal             1582          809984               0
btree                106           54272               0
user                5898         3013288            6488
cached               801          410912               0
parity                 0               0               0
stripe                 0               0               0
need_gc_gens           0               0               0
need_discard       64291               0               0

reserves:
stripe              6386
normal              3221
copygc                56
btree                 28
btree_copygc           0
reclaim                0
interior_updates       0

freelist_wait           empty
open buckets allocated  107
open buckets this dev   101
open buckets total      1024
open_buckets_wait       empty
open_buckets_btree      28
open_buckets_user       78
buckets_to_invalidate   0
btree reserve cache     28`


# cat /sys/kernel/debug/bcachefs/caa656c1-a03a-4fda-a11d-dd1f0ac8d896/btree_transactions
2075159 bch2_moving_ctxt_init
backtrace:
  [<0>] kthread+0xde/0x110
  [<0>] ret_from_fork+0x31/0x50
  [<0>] ret_from_fork_asm+0x1a/0x30

2075160 bch2_moving_ctxt_init
backtrace:
  [<0>] bch2_rebalance_thread+0x66/0xb0 [bcachefs]
  [<0>] kthread+0xde/0x110
  [<0>] ret_from_fork+0x31/0x50
  [<0>] ret_from_fork_asm+0x1a/0x30

2177571 bch2_do_discards_work
backtrace:
  [<0>] bch2_discard_one_bucket+0x88b/0x8c0 [bcachefs]
  [<0>] bch2_do_discards_work+0x1f5/0x3a0 [bcachefs]
  [<0>] process_one_work+0x189/0x3b0
  [<0>] worker_thread+0x277/0x390
  [<0>] kthread+0xde/0x110
  [<0>] ret_from_fork+0x31/0x50
  [<0>] ret_from_fork_asm+0x1a/0x30

2182976 btree_node_write_work
backtrace:
  [<0>] __bch2_trans_commit+0xd18/0x16f0 [bcachefs]
  [<0>] __bch2_btree_node_update_key+0x28c/0x570 [bcachefs]
  [<0>] bch2_btree_node_update_key+0x35f/0x3a0 [bcachefs]
  [<0>] bch2_btree_node_update_key_get_iter+0x1d5/0x330 [bcachefs]
  [<0>] btree_node_write_work+0x10a/0x460 [bcachefs]
  [<0>] process_one_work+0x189/0x3b0
  [<0>] worker_thread+0x277/0x390
  [<0>] kthread+0xde/0x110
  [<0>] ret_from_fork+0x31/0x50
  [<0>] ret_from_fork_asm+0x1a/0x30

3914811 btree_node_write_work
backtrace:
  [<0>] __bch2_trans_commit+0xd18/0x16f0 [bcachefs]
  [<0>] __bch2_btree_node_update_key+0x28c/0x570 [bcachefs]
  [<0>] bch2_btree_node_update_key+0x35f/0x3a0 [bcachefs]
  [<0>] bch2_btree_node_update_key_get_iter+0x1d5/0x330 [bcachefs]
  [<0>] btree_node_write_work+0x10a/0x460 [bcachefs]
  [<0>] process_one_work+0x189/0x3b0
  [<0>] worker_thread+0x277/0x390
  [<0>] kthread+0xde/0x110
  [<0>] ret_from_fork+0x31/0x50
  [<0>] ret_from_fork_asm+0x1a/0x30

Full dmesg output: http://vpaste.net/FineR

@koverstreet
Copy link
Owner

Ok, so the journal got stuck. I just started seeing a similar bug in one of the tests - let me see if I can get that one figured out and maybe add some more debug info.

@maartenvanmalland
Copy link
Author

That would be great. I can test things for you as well if needed; I can let it crash on command :)

@NHOrus
Copy link

NHOrus commented May 7, 2024

I think I had same problem? Nothing stuck, external hdd with bcachefs, deleting many small files

------------[ cut here ]------------
btree trans held srcu lock (delaying memory reclaim) for 18 seconds
WARNING: CPU: 1 PID: 7310 at fs/bcachefs/btree_iter.c:2827 bch2_trans_srcu_unlock+0x126/0x130 [bcachefs]
Modules linked in: wireguard ip6_udp_tunnel udp_tunnel libchacha20poly1305 poly1305_x86_64 chacha_x86_64 curve25519_x86_64 libcurve25519_generic libchacha bcachefs raid6_pq xor uas usb_storage uinput snd_seq_dummy snd_hrtimer snd_seq snd_seq_device rfcomm fuse uhid bnep ccm algif_aead crypto_null des3_ede_x86_64 cbc des_generic libdes algif_skcipher cmac md4 algif_hash af_alg xfs vfat fat hid_logitech_hidpp hid_logitech_dj usbhid uvcvideo uvc videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev videobuf2_common mc amdgpu joydev mousedev snd_hda_codec_realtek snd_hda_codec_generic drm_buddy snd_hda_codec_hdmi drm_exec gpu_sched btusb amdxcp crct10dif_pclmul btrtl drm_suballoc_helper crc32_pclmul rtw88_8822ce drm_ttm_helper crc32c_intel btintel rtw88_8822c ttm polyval_clmulni snd_hda_intel snd_pci_ps i2c_algo_bit rtw88_pci polyval_generic snd_intel_dspcfg snd_rpl_pci_acp6x gf128mul drm_display_helper bluetooth ghash_clmulni_intel snd_acp_pci sha512_ssse3 snd_acp_legacy_common snd_hda_codec
 hid_multitouch rtw88_core sha256_ssse3 r8169 cec snd_pci_acp6x sha1_ssse3 serio_raw snd_hwdep ecdh_generic mac80211 aesni_intel drm_kms_helper atkbd snd_hda_core snd_pci_acp5x realtek crypto_simd i2c_hid_acpi vivaldi_fmap libarc4 pkcs8_key_parser i2c_hid libps2 wmi_bmof cryptd cfg80211 snd_pcm mdio_devres drm sp5100_tco snd_timer snd_rn_pci_acp3x snd_acp_config ahci snd_soc_acpi libphy snd_pci_acp3x ideapad_laptop snd xhci_pci libahci platform_profile k10temp i2c_piix4 xhci_pci_renesas sparse_keymap rfkill i2c_designware_platform i8042 serio video tpm_crb ccp tpm_tis wmi tpm_tis_core rng_core tpm i2c_designware_core
CPU: 1 PID: 7310 Comm: bch-reclaim/sdb Not tainted 6.8.9-gentoo #1
Hardware name: LENOVO 82KD/LNVNB161216, BIOS GLCN50WW 07/04/2022
RIP: 0010:bch2_trans_srcu_unlock+0x126/0x130 [bcachefs]
Code: 80 97 b1 d1 48 c7 c7 55 6b be c1 48 29 c2 48 b8 cf f7 53 e3 a5 9b c4 20 48 c1 ea 03 c4 e2 cb f6 f0 48 c1 ee 04 e8 8a 2e 59 d0 <0f> 0b eb a0 00 00 00 00 00 00 66 0f 1f 00 53 48 c7 c0 40 a1 bb c1
RSP: 0018:ffffa42f42dc3b70 EFLAGS: 00010246
RAX: ac8defec8a9a4300 RBX: ffff8c5e4c204000 RCX: 0000000000000027
RDX: c0000000ffffdfff RSI: 0000000000000002 RDI: ffff8c5f3e65bb08
RBP: 0000000000000000 R08: 0000000000001fff R09: ffffffff9364aa70
R10: 0000000000005ffd R11: 0000000000000004 R12: ffff8c5e4c204000
R13: ffff8c5dfe4a6c40 R14: ffff8c5dfe480000 R15: ffff8c5dfe480000
FS:  0000000000000000(0000) GS:ffff8c5f3e640000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa15a7befe0 CR3: 0000000185610000 CR4: 0000000000350ef0
Call Trace:
 <TASK>
 ? __warn+0xca/0x1c0
 ? bch2_trans_srcu_unlock+0x126/0x130 [bcachefs]
 ? report_bug+0x16a/0x210
 ? handle_bug+0x3d/0x80
 ? exc_invalid_op+0x1a/0x50
 ? asm_exc_invalid_op+0x1a/0x20
 ? bch2_trans_srcu_unlock+0x126/0x130 [bcachefs]
 bch2_trans_begin+0x4aa/0x500 [bcachefs]
 bch2_btree_write_buffer_flush_locked+0xbd/0xb00 [bcachefs]
 ? __schedule+0x5a8/0x12e0
 ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
 ? schedule+0x6e/0xc0
 ? schedule_preempt_disabled+0x10/0x20
 ? __mutex_lock+0x1a8/0x650
 btree_write_buffer_flush_seq+0x249/0x2c0 [bcachefs]
 ? local_clock_noinstr+0x3a/0xb0
 ? local_clock+0x10/0x30
 bch2_btree_write_buffer_journal_flush+0x34/0x60 [bcachefs]
 journal_flush_pins+0x1f8/0x310 [bcachefs]
 ? bch2_journal_key_to_wb_slowpath+0x230/0x230 [bcachefs]
 __bch2_journal_reclaim+0x1f6/0x370 [bcachefs]
 bch2_journal_reclaim_thread+0x84/0x170 [bcachefs]
 ? bch2_journal_reclaim_start+0xe0/0xe0 [bcachefs]
 kthread+0xe3/0x100
 ? kthread_blkcg+0x30/0x30
 ret_from_fork+0x32/0x40
 ? kthread_blkcg+0x30/0x30
 ret_from_fork_asm+0x11/0x20
 </TASK>
---[ end trace 0000000000000000 ]---```

Gentoo, kernel 6.8.9

@russells-crockpot
Copy link

russells-crockpot commented May 7, 2024

I have had the same issue as well:

------------[ cut here ]------------
btree trans held srcu lock (delaying memory reclaim) for 10 seconds
WARNING: CPU: 14 PID: 70131 at fs/bcachefs/btree_iter.c:2825 bch2_trans_put+0x250/0x280>
Modules linked in: udf crc_itu_t isofs cdrom bcachefs xor raid6_pq libcrc32c crc32c_gen>
 ip_tables x_tables f2fs crc32_generic lz4hc_compress lz4_compress hid_generic usbhid a>
CPU: 14 PID: 70131 Comm: mk-screens Tainted: G           OE      6.8.9-arch1-1 #1 b1154>
Hardware name: Gigabyte Technology Co., Ltd. B550 AORUS ELITE/B550 AORUS ELITE, BIOS F1>
RIP: 0010:bch2_trans_put+0x250/0x280 [bcachefs]
Code: 48 8b 15 d3 88 42 c7 48 c7 c7 e8 84 12 c2 48 b8 07 3a 6d a0 d3 06 3a 6d 48 29 ca >
RSP: 0018:ffffbe69b14d7658 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff958661450000 RCX: 0000000000000027
RDX: ffff958d5eba19c8 RSI: 0000000000000001 RDI: ffff958d5eba19c0
RBP: ffff95878b580000 R08: 0000000000000000 R09: ffffbe69b14d74e8
R10: ffffbe69b14d74e0 R11: 0000000000000003 R12: ffff958abf1cbfc0
R13: ffff958661450000 R14: ffff95878b5a63a8 R15: ffff95878b580000
FS:  000071db176006c0(0000) GS:ffff958d5eb80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000078c0b91ed7f0 CR3: 00000004ceb36000 CR4: 0000000000350ef0
Call Trace:
 <TASK>
 ? bch2_trans_put+0x250/0x280 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
 ? __warn+0x81/0x130
 ? bch2_trans_put+0x250/0x280 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
 ? report_bug+0x171/0x1a0
 ? prb_read_valid+0x1b/0x30
 ? srso_return_thunk+0x5/0x5f
 ? handle_bug+0x3c/0x80
 ? exc_invalid_op+0x17/0x70
 ? asm_exc_invalid_op+0x1a/0x20
 ? bch2_trans_put+0x250/0x280 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
 ? bch2_trans_put+0x250/0x280 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
 __bch2_create+0x5f8/0x6f0 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
 ? srso_return_thunk+0x5/0x5f
 ? bch2_create+0x2a/0x60 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
 bch2_create+0x2a/0x60 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
 path_openat+0xe9f/0x1190
 do_filp_open+0xb3/0x160
 do_sys_openat2+0xab/0xe0
 __x64_sys_openat+0x57/0xa0
 do_syscall_64+0x83/0x170
 ? srso_return_thunk+0x5/0x5f
 ? get_page_from_freelist+0x18e5/0x1a80
 ? srso_return_thunk+0x5/0x5f
 ? file_tty_write.isra.0+0x219/0x330
 ? srso_return_thunk+0x5/0x5f
 ? srso_return_thunk+0x5/0x5f
 ? __alloc_pages+0x1a3/0x350
 ? alloc_pages_mpol+0x95/0x1f0
 ? srso_return_thunk+0x5/0x5f
 ? __pte_offset_map+0x1b/0x180
 ? srso_return_thunk+0x5/0x5f
 ? __mod_memcg_lruvec_state+0x97/0x110
 ? srso_return_thunk+0x5/0x5f
 ? srso_return_thunk+0x5/0x5f
 ? set_ptes.isra.0+0x28/0x90
 ? srso_return_thunk+0x5/0x5f
 ? do_anonymous_page+0x357/0x6e0
 ? pmdp_collapse_flush+0x10/0x60
 ? srso_return_thunk+0x5/0x5f
 ? __handle_mm_fault+0xb4a/0xe40
 ? do_syscall_64+0x90/0x170
 ? srso_return_thunk+0x5/0x5f
 ? __count_memcg_events+0x4d/0xc0
 ? srso_return_thunk+0x5/0x5f
 ? count_memcg_events.constprop.0+0x1a/0x30
 ? srso_return_thunk+0x5/0x5f
 ? handle_mm_fault+0xa2/0x360
 ? srso_return_thunk+0x5/0x5f
 ? do_user_addr_fault+0x304/0x670
 ? srso_return_thunk+0x5/0x5f
 ? srso_return_thunk+0x5/0x5f
 entry_SYSCALL_64_after_hwframe+0x78/0x80
RIP: 0033:0x71db3280e705
Code: 8b 55 d0 48 89 45 b0 75 a0 44 89 55 9c e8 93 8a f8 ff 44 8b 55 9c 89 da 4c 89 e6 >
RSP: 002b:000071db175fd3e0 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000000080241 RCX: 000071db3280e705
RDX: 0000000000080241 RSI: 000071db175fd4d8 RDI: 00000000ffffff9c
RBP: 000071db175fd450 R08: 0000000000000000 R09: 010f0f010f0f0f07
R10: 00000000000001b6 R11: 0000000000000293 R12: 000071db175fd4d8
R13: 000071db175fd880 R14: 000071db175fd4d8 R15: 00000000000001b6
 </TASK>
---[ end trace 0000000000000000 ]---

@kode54
Copy link
Contributor

kode54 commented Jul 26, 2024

I had a filesystem hang attempting to pacstrap a new filesystem from the july 2024 arch iso.

@koverstreet
Copy link
Owner

Is anyone seeing this on 6.10?

@kode54
Copy link
Contributor

kode54 commented Jul 31, 2024 via email

@koverstreet
Copy link
Owner

I'm closing this for now; if anyone is still seeing this on 6.10 go ahead and reopen

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants