Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

about to insert invalid key in data update path: fatal error - emergency read only #717

Closed
jpf91 opened this issue Jul 23, 2024 · 5 comments

Comments

@jpf91
Copy link

jpf91 commented Jul 23, 2024

Hi there,

I'm currently testing bcachefs mostly for its caching features combined with flexible storage layouts (i.e. using disks with different sizes). For this use case, it already seems to work better than btrfs and zfs 😀

My basic setup is like this:

  • bcachefs on Proxmox VM host
  • exported via NFS
  • NFS mounted in a Fedora CoreOS VM
  • NFS mount mapped into Docker containers in CoreOS

This worked fine for some basic containers. However, when configuring a Jellyfin container it caused an emergency read-only when starting the initial media library scan. The filesystem did also not want to unmount, but after a reboot it mounted cleanly again and in general works just fine. I didn't dare to test whether the issue is reproducible though 🙈 Below are dmesg logs, kernel version and filesystem setup:

FS setup

bcachefs format \
    --block_size=4096 \
    --compression=zstd \
    --encrypt \
    -L "NAS_DATA" \
    --label=hdd.hdd1 /dev/sda \
    --label=hdd.hdd2 /dev/sdb1 \
    --label=hdd.hdd3 /dev/sdc \
    --discard \
    --label=ssd.ssd1 /dev/nvme0n1p4 \
    --label=ssd.ssd2 /dev/nvme1n1p4 \
    --replicas=2 \
    --foreground_target=ssd \
    --promote_target=ssd \
    --background_target=hdd

dmesg log

[Wed Jul 17 13:30:03 2024] about to insert invalid key in data update path
[Wed Jul 17 13:30:03 2024] old: u64s 10 type extent 671283510:6392:U32_MAX len 16 ver 106595503: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:104 gen 7 ptr: 4:513244:48 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] k:   u64s 10 type extent 671283510:6400:U32_MAX len 16 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 ptr: 4:513244:56 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] new: u64s 14 type extent 671283510:6392:U32_MAX len 8 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 cached ptr: 4:513244:56 gen 6 cached rebalance: target hdd compression zstd crc: c_size 8 size 16 offset 8 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 1:10860085:32 gen 0 ptr: 0:17285918:408 gen 0
[Wed Jul 17 13:30:03 2024] bcachefs (cca5bc65-fe77-409d-a9fa-465a6e7f4eae): fatal error - emergency read only
[Wed Jul 17 13:30:03 2024] ------------[ cut here ]------------
[Wed Jul 17 13:30:03 2024] kernel BUG at fs/bcachefs/io_write.c:524!
[Wed Jul 17 13:30:03 2024] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[Wed Jul 17 13:30:03 2024] CPU: 0 PID: 3142582 Comm: kworker/u64:3 Tainted: P           O       6.8.8-2-pve #1
[Wed Jul 17 13:30:03 2024] Hardware name: GIGABYTE MC12-LE0-00/MC12-LE0-00, BIOS F13 09/27/2023
[Wed Jul 17 13:30:03 2024] Workqueue: bcachefs bch2_write_point_do_index_updates [bcachefs]
[Wed Jul 17 13:30:03 2024] RIP: 0010:__bch2_write_index+0x276/0x280 [bcachefs]
[Wed Jul 17 13:30:03 2024] Code: e0 09 e8 fd e0 03 00 eb 90 be 1e 00 00 00 44 89 ef e8 be 53 fc ff 84 c0 0f 85 7b ff ff ff 4d 8b be 38 01 00 00 e9 58 ff ff ff <0f> 0b 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90
[Wed Jul 17 13:30:03 2024] RSP: 0018:ffffa9541cfd7dd8 EFLAGS: 00010246
[Wed Jul 17 13:30:03 2024] RAX: ffff8d3a372f0c28 RBX: ffff8d334f400000 RCX: 0000000000000010
[Wed Jul 17 13:30:03 2024] RDX: ffff8d3a372f0c28 RSI: 0000000000000009 RDI: 0000000000000000
[Wed Jul 17 13:30:03 2024] RBP: ffffa9541cfd7e00 R08: 0000000000000000 R09: 0000000000000000
[Wed Jul 17 13:30:03 2024] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000010
[Wed Jul 17 13:30:03 2024] R13: 0000000000000000 R14: ffff8d3a372f0a98 R15: ffff8d3a372f0be0
[Wed Jul 17 13:30:03 2024] FS:  0000000000000000(0000) GS:ffff8d3e4d200000(0000) knlGS:0000000000000000
[Wed Jul 17 13:30:03 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Wed Jul 17 13:30:03 2024] CR2: 00007f41a0e09605 CR3: 0000000845bfc000 CR4: 0000000000f50ef0
[Wed Jul 17 13:30:03 2024] PKRU: 55555554
[Wed Jul 17 13:30:03 2024] Call Trace:
[Wed Jul 17 13:30:03 2024]  <TASK>
[Wed Jul 17 13:30:03 2024]  ? show_regs+0x6d/0x80
[Wed Jul 17 13:30:03 2024]  ? die+0x37/0xa0
[Wed Jul 17 13:30:03 2024]  ? do_trap+0xd4/0xf0
[Wed Jul 17 13:30:03 2024]  ? do_error_trap+0x71/0xb0
[Wed Jul 17 13:30:03 2024]  ? __bch2_write_index+0x276/0x280 [bcachefs]
[Wed Jul 17 13:30:03 2024]  ? exc_invalid_op+0x52/0x80
[Wed Jul 17 13:30:03 2024]  ? __bch2_write_index+0x276/0x280 [bcachefs]
[Wed Jul 17 13:30:03 2024]  ? asm_exc_invalid_op+0x1b/0x20
[Wed Jul 17 13:30:03 2024]  ? __bch2_write_index+0x276/0x280 [bcachefs]
[Wed Jul 17 13:30:03 2024]  ? __bch2_write_index+0x62/0x280 [bcachefs]
[Wed Jul 17 13:30:03 2024]  bch2_write_point_do_index_updates+0xa1/0x150 [bcachefs]
[Wed Jul 17 13:30:03 2024]  process_one_work+0x16d/0x350
[Wed Jul 17 13:30:03 2024]  worker_thread+0x306/0x440
[Wed Jul 17 13:30:03 2024]  ? __pfx_worker_thread+0x10/0x10
[Wed Jul 17 13:30:03 2024]  kthread+0xf2/0x120
[Wed Jul 17 13:30:03 2024]  ? __pfx_kthread+0x10/0x10
[Wed Jul 17 13:30:03 2024]  ret_from_fork+0x47/0x70
[Wed Jul 17 13:30:03 2024]  ? __pfx_kthread+0x10/0x10
[Wed Jul 17 13:30:03 2024]  ret_from_fork_asm+0x1b/0x30
[Wed Jul 17 13:30:03 2024]  </TASK>
[Wed Jul 17 13:30:03 2024] Modules linked in: tcp_diag inet_diag vhost_vsock vmw_vsock_virtio_transport_common vsock rpcsec_gss_krb5 act_police cls_basic sch_ingress sch_htb veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter nf_tables bonding tls softdog binfmt_misc nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common poly1305_generic libpoly1305 poly1305_x86_64 chacha_generic chacha_x86_64 libchacha edac_mce_amd snd_sof_amd_acp63 bcachefs snd_sof_amd_vangogh snd_sof_amd_rembrandt lz4hc_compress kvm_amd ipmi_ssif lz4_compress snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp kvm irqbypass snd_sof crct10dif_pclmul polyval_clmulni polyval_generic snd_sof_utils ghash_clmulni_intel sha256_ssse3 snd_soc_core sha1_ssse3 snd_hda_intel aesni_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_compress crypto_simd ac97_bus snd_pcm_dmaengine snd_hda_codec cryptd snd_pci_ps snd_rpl_pci_acp6x snd_acp_pci snd_hda_core snd_acp_legacy_common snd_hwdep snd_pci_acp6x snd_pcm
[Wed Jul 17 13:30:03 2024]  snd_pci_acp5x snd_rn_pci_acp3x snd_timer snd_acp_config wmi_bmof rapl acpi_ipmi snd ipmi_si snd_soc_acpi pcspkr soundcore ipmi_devintf snd_pci_acp3x ccp k10temp ast ipmi_msghandler mac_hid zfs(PO) nfsd auth_rpcgss spl(O) nfs_acl vhost_net lockd vhost grace vhost_iotlb tap efi_pstore sunrpc dmi_sysfs ip_tables x_tables autofs4 hid_generic usbhid hid btrfs blake2b_generic xor raid6_pq libcrc32c xhci_pci xhci_pci_renesas nvme crc32_pclmul igb nvme_core i2c_algo_bit ahci i2c_piix4 xhci_hcd dca libahci nvme_auth video wmi gpio_amdpt
[Wed Jul 17 13:30:03 2024] ---[ end trace 0000000000000000 ]---
[Wed Jul 17 13:30:04 2024] RIP: 0010:__bch2_write_index+0x276/0x280 [bcachefs]
[Wed Jul 17 13:30:04 2024] Code: e0 09 e8 fd e0 03 00 eb 90 be 1e 00 00 00 44 89 ef e8 be 53 fc ff 84 c0 0f 85 7b ff ff ff 4d 8b be 38 01 00 00 e9 58 ff ff ff <0f> 0b 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90
[Wed Jul 17 13:30:04 2024] RSP: 0018:ffffa9541cfd7dd8 EFLAGS: 00010246
[Wed Jul 17 13:30:04 2024] RAX: ffff8d3a372f0c28 RBX: ffff8d334f400000 RCX: 0000000000000010
[Wed Jul 17 13:30:04 2024] RDX: ffff8d3a372f0c28 RSI: 0000000000000009 RDI: 0000000000000000
[Wed Jul 17 13:30:04 2024] RBP: ffffa9541cfd7e00 R08: 0000000000000000 R09: 0000000000000000
[Wed Jul 17 13:30:04 2024] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000010
[Wed Jul 17 13:30:04 2024] R13: 0000000000000000 R14: ffff8d3a372f0a98 R15: ffff8d3a372f0be0
[Wed Jul 17 13:30:04 2024] FS:  0000000000000000(0000) GS:ffff8d3e4d200000(0000) knlGS:0000000000000000
[Wed Jul 17 13:30:04 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Wed Jul 17 13:30:04 2024] CR2: 00007f41a0e09605 CR3: 0000000845bfc000 CR4: 0000000000f50ef0
[Wed Jul 17 13:30:04 2024] PKRU: 55555554

(BTW, am I leaking any important key data here? If so please let me know 😉 )

I also sometimes see these, but I think they are unrelated:

[Wed Jul 17 13:29:08 2024] bcachefs (cca5bc65-fe77-409d-a9fa-465a6e7f4eae): bch2_inode_peek(): error looking up inum 1:402850194: ENOENT_inode
[Wed Jul 17 13:29:17 2024] bcachefs (cca5bc65-fe77-409d-a9fa-465a6e7f4eae): bch2_inode_peek(): error looking up inum 1:192547: ENOENT_inode
[Wed Jul 17 13:29:31 2024] bcachefs (cca5bc65-fe77-409d-a9fa-465a6e7f4eae): bch2_inode_peek(): error looking up inum 1:134422620: ENOENT_inode
[Wed Jul 17 13:29:45 2024] bcachefs (cca5bc65-fe77-409d-a9fa-465a6e7f4eae): bch2_inode_peek(): error looking up inum 1:537051072: ENOENT_inode

OS Version etc

  • Kernel 6.8.8-2-pve
  • bcachefs-tools 1.7.0
  • Proxmox 8.2.4, Debian GNU/Linux 12

I realize this is a bit old as bcachefs is rapidly changing. I'm also seeing some more bugs (tvheadend TV recording files being cut off, QEMU raw disk images placed on bcachefs not mounting inside linux VMs (disk timeouts)). But I think I'll only report those if I can reproduce with a newer kernel. Once the ubuntu mainline PPA has the 6.10 kernel release ready for amd64, I'll see if I can reproduce the issue there.

@Lykos153
Copy link

Lykos153 commented Jul 23, 2024

I see this error, too. First time happened yesterday as reported in #716 (comment). I did an fsck,fix_errors and it was working fine until now when it recurred, see log below.

Versions

  • Linux version 6.9.9 (EDIT: Problem persists with 6.10.0)
  • bcachefs 1.7.0-unstable-2024-05-09
dmesg log
[34164.217970] about to insert invalid key in data update path
[34164.217972] old: u64s 9 type extent 542933749:4416:4294967294 len 64 ver 160241587: durability: 1 crc: c_size 56 size 64 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 6:6524:632 gen 10 rebalance: target hdd compression zstd
[34164.217973] k:   u64s 9 type extent 542933749:4416:4294967294 len 8 ver 160241589: durability: 1 crc: c_size 8 size 8 offset 0 nonce 0 csum chacha20_poly1305_80 compress incompressible ptr: 6:6524:688 gen 10 rebalance: target hdd compression zstd
[34164.217974] new: u64s 12 type extent 542933749:4416:4294967294 len 8 ver 160241589: durability: 1 crc: c_size 8 size 8 offset 0 nonce 0 csum chacha20_poly1305_80 compress incompressible ptr: 6:6524:688 gen 10 cached rebalance: target hdd compression zstd crc: c_size 56 size 64 offset 56 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:299163:304 gen 6
[34164.217985] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): fatal error - emergency read only
[34164.218069] ------------[ cut here ]------------
[34164.218070] kernel BUG at fs/bcachefs/io_write.c:532!
[34164.218076] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[34164.218079] CPU: 2 PID: 345632 Comm: kworker/u17:0 Not tainted 6.9.9 #1-NixOS
[34164.218082] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z87 Extreme4, BIOS P3.50 03/11/2018
[34164.218083] Workqueue: bcachefs bch2_write_point_do_index_updates [bcachefs]
[34164.218194] RIP: 0010:__bch2_write_index+0x281/0x290 [bcachefs]
[34164.218258] Code: 03 00 5a e9 77 ff ff ff be 1e 00 00 00 44 89 e7 e8 44 42 fc ff 84 c0 0f 85 62 ff ff ff 4d 8b b5 38 01 00 00 e9 3f ff ff ff 90 <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90
[34164.218260] RSP: 0018:ffffa43fc107be10 EFLAGS: 00010246
[34164.218262] RAX: ffff917008845c20 RBX: ffff916c0f2c0000 RCX: 0000000000000040
[34164.218263] RDX: ffff917008845c20 RSI: 0000000000000008 RDI: 00081d896d91ffff
[34164.218275] RBP: 0000000000000040 R08: 0000000000000000 R09: 0000000000000003
[34164.218276] R10: ffffa43fc107bdf8 R11: ffffffffabb3a128 R12: 0000000000000000
[34164.218277] R13: ffff917008845a98 R14: ffff917008845be0 R15: dead000000000100
[34164.218278] FS:  0000000000000000(0000) GS:ffff917117700000(0000) knlGS:0000000000000000
[34164.218280] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[34164.218281] CR2: 00007fc1f1b17000 CR3: 00000003bde20001 CR4: 00000000001706f0
[34164.218283] Call Trace:
[34164.218285]  <TASK>
[34164.218287]  ? die+0x36/0x90
[34164.218292]  ? do_trap+0xdd/0x100
[34164.218294]  ? __bch2_write_index+0x281/0x290 [bcachefs]
[34164.218353]  ? do_error_trap+0x6a/0x90
[34164.218356]  ? __bch2_write_index+0x281/0x290 [bcachefs]
[34164.218413]  ? exc_invalid_op+0x51/0x70
[34164.218418]  ? __bch2_write_index+0x281/0x290 [bcachefs]
[34164.218487]  ? asm_exc_invalid_op+0x1a/0x20
[34164.218491]  ? __bch2_write_index+0x281/0x290 [bcachefs]
[34164.218575]  bch2_write_point_do_index_updates+0xb1/0x160 [bcachefs]
[34164.218635]  process_one_work+0x183/0x3a0
[34164.218640]  worker_thread+0x245/0x350
[34164.218643]  ? __pfx_worker_thread+0x10/0x10
[34164.218646]  kthread+0xd0/0x100
[34164.218650]  ? __pfx_kthread+0x10/0x10
[34164.218652]  ret_from_fork+0x34/0x50
[34164.218655]  ? __pfx_kthread+0x10/0x10
[34164.218657]  ret_from_fork_asm+0x1a/0x30
[34164.218662]  </TASK>
[34164.218663] Modules linked in: rfcomm ccm nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype overlay xt_CHECKSUM xt_MASQUERADE ipt_REJECT nf_reject_ipv4 nft_chain_nat uhid cmac algif_hash algif_skcipher af_alg bnep msr af_packet btrfs blake2b_generic dm_crypt cbc encrypted_keys trusted asn1_encoder tee tpm rng_core cmdlinepart spi_nor mtd intel_rapl_msr mei_hdcp at24 iTCO_wdt mei_pxp spi_intel_platform intel_pmc_bxt watchdog spi_intel intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp crc32_pclmul polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel sha512_ssse3 iwlmvm sha1_ssse3 xt_conntrack uvcvideo mac80211 aesni_intel nls_iso8859_1 nls_cp437 vfat fat libarc4 crypto_simd ip6t_rpfilter iwlwifi ipt_rpfilter uvc cryptd xt_pkttype rapl btusb xt_LOG btrtl gspca_vc032x intel_cstate nf_log_syslog snd_hda_codec_realtek snd_usb_audio btintel xt_tcpudp btbcm mxm_wmi snd_hda_codec_generic cfg80211 gspca_main btmtk nft_compat intel_uncore snd_hda_scodec_component snd_hda_codec_hdmi i2c_i801
[34164.218704]  snd_usbmidi_lib videobuf2_vmalloc videobuf2_memops i2c_smbus snd_ump snd_hda_intel ledtrig_audio nf_tables mei_me bluetooth snd_intel_dspcfg snd_rawmidi videobuf2_v4l2 sch_fq_codel hid_jabra lpc_ich mei uinput snd_intel_sdw_acpi atkbd snd_seq_device videodev snd_hda_codec ecdh_generic rfkill snd_hda_core libps2 videobuf2_common snd_hwdep snd_pcm ecc mc serio crc16 pl2303 snd_timer vivaldi_fmap snd soundcore tiny_power_button loop button cpufreq_ondemand xt_nat input_leds nf_nat led_class joydev evdev nf_conntrack mousedev nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter veth uas bridge mac_hid stp llc tun kvm_intel kvm vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd fuse efi_pstore configfs nfnetlink zram efivarfs dmi_sysfs ip_tables x_tables autofs4 poly1305_generic libpoly1305 poly1305_x86_64 chacha_generic chacha_x86_64 libchacha bcachefs libcrc32c crc32c_generic lz4_compress lz4hc_compress xor raid6_pq hid_generic usbhid hid i915 usb_storage sd_mod i2c_algo_bit drm_buddy ttm nvme ahci intel_gtt libahci
[34164.218749]  e1000e drm_display_helper nvme_core libata nvme_auth t10_pi crc64_rocksoft xhci_pci crc_t10dif xhci_pci_renesas crct10dif_generic scsi_mod xhci_hcd cec ptp crc64 crct10dif_pclmul crct10dif_common crc32c_intel sha256_ssse3 ehci_pci ehci_hcd pps_core firmware_class scsi_common video rtc_cmos wmi backlight dm_snapshot dm_bufio dm_mod dax
[34164.218769] ---[ end trace 0000000000000000 ]---
[34164.797231] RIP: 0010:__bch2_write_index+0x281/0x290 [bcachefs]
[34164.797319] Code: 03 00 5a e9 77 ff ff ff be 1e 00 00 00 44 89 e7 e8 44 42 fc ff 84 c0 0f 85 62 ff ff ff 4d 8b b5 38 01 00 00 e9 3f ff ff ff 90 <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90
[34164.797321] RSP: 0018:ffffa43fc107be10 EFLAGS: 00010246
[34164.797324] RAX: ffff917008845c20 RBX: ffff916c0f2c0000 RCX: 0000000000000040
[34164.797326] RDX: ffff917008845c20 RSI: 0000000000000008 RDI: 00081d896d91ffff
[34164.797327] RBP: 0000000000000040 R08: 0000000000000000 R09: 0000000000000003
[34164.797328] R10: ffffa43fc107bdf8 R11: ffffffffabb3a128 R12: 0000000000000000
[34164.797329] R13: ffff917008845a98 R14: ffff917008845be0 R15: dead000000000100
[34164.797330] FS:  0000000000000000(0000) GS:ffff917117700000(0000) knlGS:0000000000000000
[34164.797332] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[34164.797333] CR2: 00007fc1f1b17000 CR3: 00000003c57e8002 CR4: 00000000001706f0
bcachefs fs usage
Filesystem: 677cf0a7-1abe-4ce3-876c-2ca63301229d
Size:                       6.29 TiB
Used:                       6.01 TiB
Online reserved:            14.5 MiB

Data type       Required/total  Durability    Devices
reserved:       1/0                 [] 131 MiB
btree:          1/3             3             [sde1 nvme0n1p2 sdf1] 6.43 GiB
btree:          1/3             3             [sdc1 sde1 sdh1]    1.63 GiB
btree:          1/2             2             [sdc1 sdf1]         8.07 GiB
btree:          1/3             3             [sdc1 nvme0n1p2 sdf1] 6.86 GiB
btree:          1/2             2             [sdc1 nvme0n1p2]    8.27 GiB
btree:          1/3             3             [sde1 sdd1 nvme0n1p2] 1.63 GiB
btree:          1/3             3             [sdd1 nvme0n1p2 sdf1] 39.5 GiB
btree:          1/2             2             [sdc1 sde1]         5.06 GiB
btree:          1/2             2             [sde1 nvme0n1p2]    7.77 GiB
btree:          1/2             2             [sde1 sdf1]         7.62 GiB
btree:          1/3             3             [sdc1 sdd1 nvme0n1p2] 1.67 GiB
user:           1/1             1             [sdf1]               376 KiB
user:           1/1             1             [sdh1]              2.60 TiB
user:           1/1             1             [sdc1]              91.2 GiB
user:           1/1             1             [nvme0n1p2]          405 GiB
user:           1/1             1             [sde1]              85.2 GiB
user:           1/1             1             [sdd1]              88.2 GiB
user:           1/1             1             [sdb1]              2.63 TiB
cached:         1/1             1             [sdd1]               178 GiB
cached:         1/1             1             [sde1]              3.47 GiB
cached:         1/1             1             [sdc1]              3.56 GiB
cached:         1/1             1             [sdh1]              1.62 GiB
cached:         1/1             1             [nvme0n1p2]          268 GiB
cached:         1/1             1             [sdf1]               194 GiB

hdd.hdd1 (device 2):            sdh1              ro
                                data         buckets    fragmented
  free:                      125 GiB          513524
  sb:                       3.00 MiB              13       252 KiB
  journal:                  2.00 GiB            8192
  btree:                     555 MiB            2220
  user:                     2.60 TiB        10911306      11.6 MiB
  cached:                   1.62 GiB           11094
  parity:                        0 B               0
  stripe:                        0 B               0
  need_gc_gens:                  0 B               0
  need_discard:                  0 B               0
  capacity:                 2.73 TiB        11446349

hdd.hdd2 (device 5):            sdb1              rw
                                data         buckets    fragmented
  free:                     87.7 GiB           89846
  sb:                       3.00 MiB               4      1020 KiB
  journal:                  8.00 GiB            8192
  btree:                         0 B               0
  user:                     2.63 TiB         2763545       197 MiB
  cached:                        0 B               0
  parity:                        0 B               0
  stripe:                        0 B               0
  need_gc_gens:                  0 B               0
  need_discard:                  0 B               0
  capacity:                 2.73 TiB         2861587

hdd.hdd3 (device 3):            sdd1              rw
                                data         buckets    fragmented
  free:                     2.44 TiB         2564073
  sb:                       3.00 MiB               4      1020 KiB
  journal:                  8.00 GiB            8192
  btree:                    14.3 GiB           14778       157 MiB
  user:                     88.2 GiB           90872       519 MiB
  cached:                    178 GiB          183666
  parity:                        0 B               0
  stripe:                        0 B               0
  need_gc_gens:                  0 B               0
  need_discard:             2.00 MiB               2
  capacity:                 2.73 TiB         2861587

ssd.ssd1 (device 0):            sdc1              rw
                                data         buckets    fragmented
  free:                     9.31 GiB           38133
  sb:                       3.00 MiB              13       252 KiB
  journal:                   954 MiB            3815
  btree:                    14.1 GiB           57699
  user:                     91.2 GiB          373695       153 KiB
  cached:                   3.56 GiB           15058
  parity:                        0 B               0
  stripe:                        0 B               0
  need_gc_gens:                  0 B               0
  need_discard:                  0 B               0
  capacity:                  119 GiB          488413

ssd.ssd2 (device 1):            sde1              rw
                                data         buckets    fragmented
  free:                     8.73 GiB           35750
  sb:                       3.00 MiB              13       252 KiB
  journal:                   894 MiB            3577
  btree:                    13.5 GiB           55112
  user:                     85.2 GiB          348893       321 KiB
  cached:                   3.47 GiB           14544
  parity:                        0 B               0
  stripe:                        0 B               0
  need_gc_gens:                  0 B               0
  need_discard:                  0 B               0
  capacity:                  112 GiB          457889

ssd.ssd4 (device 4):       nvme0n1p2              rw
                                data         buckets    fragmented
  free:                      245 GiB          501681
  sb:                       3.00 MiB               7       508 KiB
  journal:                  3.91 GiB            8000
  btree:                    26.7 GiB           57045      1.13 GiB
  user:                      405 GiB          829658      1.17 MiB
  cached:                    268 GiB          556027
  parity:                        0 B               0
  stripe:                        0 B               0
  need_gc_gens:                  0 B               0
  need_discard:             2.00 MiB               4
  capacity:                  953 GiB         1952422

ssd.ssd5 (device 6):            sdf1              rw
                                data         buckets    fragmented
  free:                     9.36 GiB           19179
  sb:                       3.00 MiB               7       508 KiB
  journal:                  1.82 GiB            3726
  btree:                    25.5 GiB           54418      1.12 GiB
  user:                      376 KiB               2       648 KiB
  cached:                    194 GiB          399614
  parity:                        0 B               0
  stripe:                        0 B               0
  need_gc_gens:                  0 B               0
  need_discard:             1.00 MiB               2
  capacity:                  233 GiB          476948

@Lykos153
Copy link

I just did another fsck but interestingly it didn't find anything. So I guess the errors it found in #716 were related to the other error.

fsck dmesg log
[    8.731568] stage-1-init: [Tue Jul 23 18:40:37 UTC 2024] enter passphrase for /bcachefs: unlocking successful.
[    8.742807] stage-1-init: [Tue Jul 23 18:40:37 UTC 2024] mounting none on /...
[    8.756741] stage-1-init: [Tue Jul 23 18:40:37 UTC 2024] mounting UUID=677cf0a7-1abe-4ce3-876c-2ca63301229d on /bcachefs...
[    8.812294] raid6: avx2x4   gen() 34164 MB/s
[    8.829295] raid6: avx2x2   gen() 34133 MB/s
[    8.846295] raid6: avx2x1   gen() 26386 MB/s
[    8.846296] raid6: using algorithm avx2x4 gen() 34164 MB/s
[    8.863295] raid6: .... xor() 12281 MB/s, rmw enabled
[    8.863296] raid6: using avx2x2 recovery algorithm
[    8.864079] xor: automatically using best checksumming function   avx       
[    8.997813] bcachefs (UUID=677cf0a7-1abe-4ce3-876c-2ca63301229d): error reading superblock: error opening UUID=677cf0a7-1abe-4ce3-876c-2ca63301229d: ENOENT
[    9.043527] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): mounting version 1.7: mi_btree_bitmap opts=metadata_replicas=3,compression=zstd,foreground_target=ssd,background_target=hdd,promote_target=ssd,fsck,fix_errors=yes
[    9.043533] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): recovering from unclean shutdown
[   66.545187] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): journal read done, replaying entries 3334951-3336954
[   66.545193] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): dropped unflushed entries 3336955-3336955
[   76.086495] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): alloc_read... done
[   76.250702] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): stripes_read... done
[   76.250710] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): snapshots_read... done
[   76.250723] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_allocations... done
[  286.333132] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): going read-write
[  286.568557] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): journal_replay... done
[  290.370381] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_alloc_info... done
[  307.551404] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_lrus... done
[  308.894373] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_btree_backpointers... done
[  503.241502] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_backpointers_to_extents...
[  518.711628] ------------[ cut here ]------------
[  518.711632] btree trans held srcu lock (delaying memory reclaim) for 15 seconds
[  518.711647] WARNING: CPU: 3 PID: 259 at fs/bcachefs/btree_iter.c:2871 bch2_trans_srcu_unlock+0x11b/0x130 [bcachefs]
[  518.711707] Modules linked in: poly1305_generic libpoly1305 poly1305_x86_64 chacha_generic chacha_x86_64 libchacha bcachefs libcrc32c crc32c_generic lz4_compress lz4hc_compress xor raid6_pq hid_generic usbhid hid sd_mod usb_storage i915 ahci libahci i2c_algo_bit drm_buddy libata ttm nvme intel_gtt scsi_mod nvme_core drm_display_helper xhci_pci xhci_pci_renesas xhci_hcd ehci_pci e1000e ehci_hcd crc32c_intel nvme_auth sha256_ssse3 t10_pi scsi_common firmware_class crc64_rocksoft ptp crc_t10dif cec crct10dif_generic crct10dif_pclmul crc64 pps_core crct10dif_common video wmi rtc_cmos backlight dm_snapshot dm_bufio dm_mod dax
[  518.711736] CPU: 3 PID: 259 Comm: mount.bcachefs Not tainted 6.9.9 #1-NixOS
[  518.711737] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z87 Extreme4, BIOS P3.50 03/11/2018
[  518.711738] RIP: 0010:bch2_trans_srcu_unlock+0x11b/0x130 [bcachefs]
[  518.711781] Code: e9 c5 48 c7 c7 d0 85 ec c0 48 b8 cf f7 53 e3 a5 9b c4 20 48 29 ca 48 c1 ea 03 48 f7 e2 48 89 d6 48 c1 ee 04 e8 c6 3c 34 c4 90 <0f> 0b 90 90 e9 5f ff ff ff 90 0f 0b 90 e9 6c ff ff ff 0f 1f 00 90
[  518.711783] RSP: 0018:ffffabd640b33780 EFLAGS: 00010286
[  518.711784] RAX: 0000000000000000 RBX: ffff8cd2ce03c000 RCX: c0000000ffffdfff
[  518.711785] RDX: 0000000000000000 RSI: 00000000ffffdfff RDI: 0000000000000001
[  518.711786] RBP: ffff8cd2cf580000 R08: 0000000000000000 R09: 0000000000000003
[  518.711787] R10: ffffabd640b33628 R11: ffffffff86d3a128 R12: 0000000000000001
[  518.711788] R13: ffff8cd2ce03c000 R14: 0000000000000000 R15: ffff8cd2ce03c000
[  518.711789] FS:  00007ff36dbdf080(0000) GS:ffff8cd7d7780000(0000) knlGS:0000000000000000
[  518.711790] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  518.711791] CR2: 00005633ddc86000 CR3: 000000010a240006 CR4: 00000000001706f0
[  518.711792] Call Trace:
[  518.711794]  <TASK>
[  518.711796]  ? __warn+0x80/0x120
[  518.711799]  ? bch2_trans_srcu_unlock+0x11b/0x130 [bcachefs]
[  518.711839]  ? report_bug+0x164/0x190
[  518.711843]  ? handle_bug+0x3d/0x80
[  518.711846]  ? exc_invalid_op+0x17/0x70
[  518.711848]  ? asm_exc_invalid_op+0x1a/0x20
[  518.711851]  ? bch2_trans_srcu_unlock+0x11b/0x130 [bcachefs]
[  518.711891]  ? bch2_trans_begin+0xf8/0x600 [bcachefs]
[  518.711930]  bch2_trans_begin+0x5a5/0x600 [bcachefs]
[  518.711970]  bch2_btree_iter_peek_node_and_restart+0x44/0x50 [bcachefs]
[  518.712010]  bch2_get_btree_in_memory_pos+0x1d0/0x2e0 [bcachefs]
[  518.712046]  bch2_check_backpointers_to_extents+0xa8/0x5d0 [bcachefs]
[  518.712082]  ? __bch2_print+0x87/0xe0 [bcachefs]
[  518.712132]  bch2_run_recovery_pass+0x38/0xa0 [bcachefs]
[  518.712183]  bch2_run_recovery_passes+0xb6/0x180 [bcachefs]
[  518.712233]  bch2_fs_recovery+0xc00/0x1330 [bcachefs]
[  518.712285]  ? __bch2_print+0x87/0xe0 [bcachefs]
[  518.712354]  ? bch2_printbuf_exit+0x20/0x30 [bcachefs]
[  518.712404]  ? print_mount_opts+0x131/0x180 [bcachefs]
[  518.712449]  ? bch2_recalc_capacity+0x106/0x370 [bcachefs]
[  518.712484]  bch2_fs_start+0x2f6/0x460 [bcachefs]
[  518.712530]  bch2_fs_open+0x6b7/0x6d0 [bcachefs]
[  518.712575]  bch2_mount+0x5bd/0x790 [bcachefs]
[  518.712625]  legacy_get_tree+0x2b/0x50
[  518.712629]  vfs_get_tree+0x29/0xe0
[  518.712632]  path_mount+0x4ca/0xb10
[  518.712634]  __x64_sys_mount+0x11a/0x150
[  518.712636]  do_syscall_64+0xb8/0x200
[  518.712639]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  518.712641] RIP: 0033:0x7ff36dd0611e
[  518.712648] Code: 48 8b 0d fd 2c 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ca 2c 0d 00 f7 d8 64 89 01 48
[  518.712649] RSP: 002b:00007fff83a39518 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[  518.712651] RAX: ffffffffffffffda RBX: 00005633ddc1bca0 RCX: 00007ff36dd0611e
[  518.712652] RDX: 00005633ddc1bca0 RSI: 00005633ddc64ef0 RDI: 00005633ddc1ed10
[  518.712653] RBP: 8000000000000000 R08: 00005633ddc24960 R09: 0000000000000000
[  518.712653] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000020
[  518.712654] R13: 00005633ddc24960 R14: 00005633ddc64ef0 R15: 0000000000000009
[  518.712656]  </TASK>
[  518.712656] ---[ end trace 0000000000000000 ]---
[  753.154046] ------------[ cut here ]------------
[  753.154048] btree trans held srcu lock (delaying memory reclaim) for 11 seconds
[  753.154065] WARNING: CPU: 3 PID: 259 at fs/bcachefs/btree_iter.c:2871 bch2_trans_srcu_unlock+0x11b/0x130 [bcachefs]
[  753.154123] Modules linked in: poly1305_generic libpoly1305 poly1305_x86_64 chacha_generic chacha_x86_64 libchacha bcachefs libcrc32c crc32c_generic lz4_compress lz4hc_compress xor raid6_pq hid_generic usbhid hid sd_mod usb_storage i915 ahci libahci i2c_algo_bit drm_buddy libata ttm nvme intel_gtt scsi_mod nvme_core drm_display_helper xhci_pci xhci_pci_renesas xhci_hcd ehci_pci e1000e ehci_hcd crc32c_intel nvme_auth sha256_ssse3 t10_pi scsi_common firmware_class crc64_rocksoft ptp crc_t10dif cec crct10dif_generic crct10dif_pclmul crc64 pps_core crct10dif_common video wmi rtc_cmos backlight dm_snapshot dm_bufio dm_mod dax
[  753.154151] CPU: 3 PID: 259 Comm: mount.bcachefs Tainted: G        W          6.9.9 #1-NixOS
[  753.154153] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z87 Extreme4, BIOS P3.50 03/11/2018
[  753.154154] RIP: 0010:bch2_trans_srcu_unlock+0x11b/0x130 [bcachefs]
[  753.154196] Code: e9 c5 48 c7 c7 d0 85 ec c0 48 b8 cf f7 53 e3 a5 9b c4 20 48 29 ca 48 c1 ea 03 48 f7 e2 48 89 d6 48 c1 ee 04 e8 c6 3c 34 c4 90 <0f> 0b 90 90 e9 5f ff ff ff 90 0f 0b 90 e9 6c ff ff ff 0f 1f 00 90
[  753.154198] RSP: 0018:ffffabd640b33780 EFLAGS: 00010286
[  753.154199] RAX: 0000000000000000 RBX: ffff8cd2ce03c000 RCX: c0000000ffffdfff
[  753.154200] RDX: 0000000000000000 RSI: 00000000ffffdfff RDI: 0000000000000001
[  753.154201] RBP: ffff8cd2cf580000 R08: 0000000000000000 R09: 0000000000000003
[  753.154202] R10: ffffabd640b33628 R11: ffffffff86d3a128 R12: ffff8cd2ce03c4f8
[  753.154203] R13: ffff8cd2ce03c000 R14: 0000000000000007 R15: ffff8cd2ce03c4f8
[  753.154204] FS:  00007ff36dbdf080(0000) GS:ffff8cd7d7780000(0000) knlGS:0000000000000000
[  753.154205] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  753.154206] CR2: 00005633ddc86000 CR3: 000000010a240006 CR4: 00000000001706f0
[  753.154207] Call Trace:
[  753.154210]  <TASK>
[  753.154212]  ? __warn+0x80/0x120
[  753.154215]  ? bch2_trans_srcu_unlock+0x11b/0x130 [bcachefs]
[  753.154256]  ? report_bug+0x164/0x190
[  753.154258]  ? handle_bug+0x3d/0x80
[  753.154262]  ? exc_invalid_op+0x17/0x70
[  753.154267]  ? asm_exc_invalid_op+0x1a/0x20
[  753.154269]  ? bch2_trans_srcu_unlock+0x11b/0x130 [bcachefs]
[  753.154310]  ? bch2_trans_begin+0xf8/0x600 [bcachefs]
[  753.154349]  bch2_trans_begin+0x5a5/0x600 [bcachefs]
[  753.154390]  bch2_btree_iter_peek_node_and_restart+0x44/0x50 [bcachefs]
[  753.154429]  bch2_get_btree_in_memory_pos+0x1d0/0x2e0 [bcachefs]
[  753.154465]  bch2_check_backpointers_to_extents+0xa8/0x5d0 [bcachefs]
[  753.154501]  bch2_run_recovery_pass+0x38/0xa0 [bcachefs]
[  753.154555]  bch2_run_recovery_passes+0xb6/0x180 [bcachefs]
[  753.154605]  bch2_fs_recovery+0xc00/0x1330 [bcachefs]
[  753.154655]  ? __bch2_print+0x87/0xe0 [bcachefs]
[  753.154701]  ? bch2_printbuf_exit+0x20/0x30 [bcachefs]
[  753.154751]  ? print_mount_opts+0x131/0x180 [bcachefs]
[  753.154797]  ? bch2_recalc_capacity+0x106/0x370 [bcachefs]
[  753.154831]  bch2_fs_start+0x2f6/0x460 [bcachefs]
[  753.154878]  bch2_fs_open+0x6b7/0x6d0 [bcachefs]
[  753.154924]  bch2_mount+0x5bd/0x790 [bcachefs]
[  753.154975]  legacy_get_tree+0x2b/0x50
[  753.154979]  vfs_get_tree+0x29/0xe0
[  753.154981]  path_mount+0x4ca/0xb10
[  753.154984]  __x64_sys_mount+0x11a/0x150
[  753.154986]  do_syscall_64+0xb8/0x200
[  753.154989]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  753.154990] RIP: 0033:0x7ff36dd0611e
[  753.154998] Code: 48 8b 0d fd 2c 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ca 2c 0d 00 f7 d8 64 89 01 48
[  753.154999] RSP: 002b:00007fff83a39518 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[  753.155001] RAX: ffffffffffffffda RBX: 00005633ddc1bca0 RCX: 00007ff36dd0611e
[  753.155002] RDX: 00005633ddc1bca0 RSI: 00005633ddc64ef0 RDI: 00005633ddc1ed10
[  753.155003] RBP: 8000000000000000 R08: 00005633ddc24960 R09: 0000000000000000
[  753.155003] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000020
[  753.155004] R13: 00005633ddc24960 R14: 00005633ddc64ef0 R15: 0000000000000009
[  753.155006]  </TASK>
[  753.155006] ---[ end trace 0000000000000000 ]---
[  960.895426]  done
[  960.895431] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_extents_to_backpointers... done
[ 1125.005975] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_alloc_to_lru_refs... done
[ 1131.814925] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_snapshot_trees... done
[ 1131.814937] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_snapshots... done
[ 1131.814946] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_subvols... done
[ 1131.819369] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_subvol_children... done
[ 1131.819376] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): delete_dead_snapshots... done
[ 1131.819378] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_inodes... done
[ 1156.324947] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_extents... done
[ 1210.614355] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_indirect_extents... done
[ 1233.324384] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_dirents... done
[ 1284.263334] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_xattrs... done
[ 1284.263573] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_root... done
[ 1284.263579] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_subvolume_structure... done
[ 1284.263595] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_directory_structure... done
[ 1320.433753] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): check_nlinks... done
[ 1334.953506] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): resume_logged_ops... done
[ 1334.954660] bcachefs (677cf0a7-1abe-4ce3-876c-2ca63301229d): delete_dead_inodes... done
[ 1335.056021] stage-1-init: [Tue Jul 23 19:02:43 UTC 2024] mounting /mnt-root/bcachefs/nix on /nix...

@koverstreet
Copy link
Owner

I'm trying to fix the assertion pop, but I don't see an assert at line 524 in either 6.9 or 6.10; can you show me what you have there?

@jpf91
Copy link
Author

jpf91 commented Aug 14, 2024

Thank you for looking into this!

It's actually an 6.8 kernel, but based on the ubuntu kernel sources. I verified my journal logs again and double-checked that this error was indeed seen with the 6.8.8-2-pve kernel. It took some time to find the proper sources, but this kernel is based on Ubuntu-6.8.0-38.38 which can be found here: https://git.proxmox.com/?p=mirror_ubuntu-kernels.git;a=commit;h=ffe6f6578643a9ae802c3eb5a64096f51a0e0524 (Proxmox adds some more patches, but nothing bcachefs related)

Direct link to the source code with the assert:
https://git.proxmox.com/?p=mirror_ubuntu-kernels.git;a=blob;f=fs/bcachefs/io_write.c;h=2c098ac017b30b6a4b5d016e9f5dde93ee258f2f;hb=ffe6f6578643a9ae802c3eb5a64096f51a0e0524#l524

And here's an excerpt:

 503 static void __bch2_write_index(struct bch_write_op *op)
 504 {
 505         struct bch_fs *c = op->c;
 506         struct keylist *keys = &op->insert_keys;
 507         unsigned dev;
 508         int ret = 0;
 509 
 510         if (unlikely(op->flags & BCH_WRITE_IO_ERROR)) {
 511                 ret = bch2_write_drop_io_error_ptrs(op);
 512                 if (ret)
 513                         goto err;
 514         }
 515 
 516         if (!bch2_keylist_empty(keys)) {
 517                 u64 sectors_start = keylist_sectors(keys);
 518 
 519                 ret = !(op->flags & BCH_WRITE_MOVE)
 520                         ? bch2_write_index_default(op)
 521                         : bch2_data_update_index_update(op);
 522 
 523                 BUG_ON(bch2_err_matches(ret, BCH_ERR_transaction_restart));
 524                 BUG_ON(keylist_sectors(keys) && !ret);

Upstream 6.8.8 has the same assert here: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/bcachefs/io_write.c?h=v6.8.8#n524

koverstreet pushed a commit that referenced this issue Aug 26, 2024
This was caught as a very rare nonce inconsistency, on systems with
encryption and replication (and tiering, or some form of rebalance
operation running):

[Wed Jul 17 13:30:03 2024] about to insert invalid key in data update path
[Wed Jul 17 13:30:03 2024] old: u64s 10 type extent 671283510:6392:U32_MAX len 16 ver 106595503: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:104 gen 7 ptr: 4:513244:48 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] k:   u64s 10 type extent 671283510:6400:U32_MAX len 16 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 ptr: 4:513244:56 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] new: u64s 14 type extent 671283510:6392:U32_MAX len 8 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 cached ptr: 4:513244:56 gen 6 cached rebalance: target hdd compression zstd crc: c_size 8 size 16 offset 8 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 1:10860085:32 gen 0 ptr: 0:17285918:408 gen 0
[Wed Jul 17 13:30:03 2024] bcachefs (cca5bc65-fe77-409d-a9fa-465a6e7f4eae): fatal error - emergency read only

bch2_extents_match() was reporting true for extents that did not
actually point to the same data.

bch2_extent_match() iterates over pairs of pointers, looking for
pointers that point to the same location on disk (with matching
generation numbers). However one or both extents may have been trimmed
(or merged) and they might not have the same disk offset: it corrects
for this by subtracting the key offset and the checksum entry offset.

However, this failed when an extent was immediately partially
overwritten, and the new overwrite was allocated the next adjacent disk
space.

Normally, with compression off, this would never cause a bug, since the
new extent would have to be immediately after the old extent for the
pointer offsets to match, and the rebalance index update path is not
looking for an extent outside the range of the extent it moved.

However with compression enabled, extents take up less space on disk
than they do in the btree index space - and spuriously matching after
partial overwrite is possible.

To fix this, add a secondary check, that strictly checks that the
regions pointed to on disk overlap.

#717

Signed-off-by: Kent Overstreet <[email protected]>
koverstreet pushed a commit that referenced this issue Aug 27, 2024
This was caught as a very rare nonce inconsistency, on systems with
encryption and replication (and tiering, or some form of rebalance
operation running):

[Wed Jul 17 13:30:03 2024] about to insert invalid key in data update path
[Wed Jul 17 13:30:03 2024] old: u64s 10 type extent 671283510:6392:U32_MAX len 16 ver 106595503: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:104 gen 7 ptr: 4:513244:48 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] k:   u64s 10 type extent 671283510:6400:U32_MAX len 16 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 ptr: 4:513244:56 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] new: u64s 14 type extent 671283510:6392:U32_MAX len 8 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 cached ptr: 4:513244:56 gen 6 cached rebalance: target hdd compression zstd crc: c_size 8 size 16 offset 8 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 1:10860085:32 gen 0 ptr: 0:17285918:408 gen 0
[Wed Jul 17 13:30:03 2024] bcachefs (cca5bc65-fe77-409d-a9fa-465a6e7f4eae): fatal error - emergency read only

bch2_extents_match() was reporting true for extents that did not
actually point to the same data.

bch2_extent_match() iterates over pairs of pointers, looking for
pointers that point to the same location on disk (with matching
generation numbers). However one or both extents may have been trimmed
(or merged) and they might not have the same disk offset: it corrects
for this by subtracting the key offset and the checksum entry offset.

However, this failed when an extent was immediately partially
overwritten, and the new overwrite was allocated the next adjacent disk
space.

Normally, with compression off, this would never cause a bug, since the
new extent would have to be immediately after the old extent for the
pointer offsets to match, and the rebalance index update path is not
looking for an extent outside the range of the extent it moved.

However with compression enabled, extents take up less space on disk
than they do in the btree index space - and spuriously matching after
partial overwrite is possible.

To fix this, add a secondary check, that strictly checks that the
regions pointed to on disk overlap.

#717

Signed-off-by: Kent Overstreet <[email protected]>
@koverstreet
Copy link
Owner

Fixed by d269356 bcachefs: Fix bch2_extents_match() false positive

zhuyj pushed a commit to zhuyj/linux that referenced this issue Sep 3, 2024
This was caught as a very rare nonce inconsistency, on systems with
encryption and replication (and tiering, or some form of rebalance
operation running):

[Wed Jul 17 13:30:03 2024] about to insert invalid key in data update path
[Wed Jul 17 13:30:03 2024] old: u64s 10 type extent 671283510:6392:U32_MAX len 16 ver 106595503: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:104 gen 7 ptr: 4:513244:48 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] k:   u64s 10 type extent 671283510:6400:U32_MAX len 16 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 ptr: 4:513244:56 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] new: u64s 14 type extent 671283510:6392:U32_MAX len 8 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 cached ptr: 4:513244:56 gen 6 cached rebalance: target hdd compression zstd crc: c_size 8 size 16 offset 8 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 1:10860085:32 gen 0 ptr: 0:17285918:408 gen 0
[Wed Jul 17 13:30:03 2024] bcachefs (cca5bc65-fe77-409d-a9fa-465a6e7f4eae): fatal error - emergency read only

bch2_extents_match() was reporting true for extents that did not
actually point to the same data.

bch2_extent_match() iterates over pairs of pointers, looking for
pointers that point to the same location on disk (with matching
generation numbers). However one or both extents may have been trimmed
(or merged) and they might not have the same disk offset: it corrects
for this by subtracting the key offset and the checksum entry offset.

However, this failed when an extent was immediately partially
overwritten, and the new overwrite was allocated the next adjacent disk
space.

Normally, with compression off, this would never cause a bug, since the
new extent would have to be immediately after the old extent for the
pointer offsets to match, and the rebalance index update path is not
looking for an extent outside the range of the extent it moved.

However with compression enabled, extents take up less space on disk
than they do in the btree index space - and spuriously matching after
partial overwrite is possible.

To fix this, add a secondary check, that strictly checks that the
regions pointed to on disk overlap.

koverstreet/bcachefs#717

Signed-off-by: Kent Overstreet <[email protected]>
koverstreet pushed a commit that referenced this issue Sep 10, 2024
[ Upstrea commit  d269356 ]

This was caught as a very rare nonce inconsistency, on systems with
encryption and replication (and tiering, or some form of rebalance
operation running):

[Wed Jul 17 13:30:03 2024] about to insert invalid key in data update path
[Wed Jul 17 13:30:03 2024] old: u64s 10 type extent 671283510:6392:U32_MAX len 16 ver 106595503: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:104 gen 7 ptr: 4:513244:48 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] k:   u64s 10 type extent 671283510:6400:U32_MAX len 16 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 ptr: 4:513244:56 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] new: u64s 14 type extent 671283510:6392:U32_MAX len 8 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 cached ptr: 4:513244:56 gen 6 cached rebalance: target hdd compression zstd crc: c_size 8 size 16 offset 8 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 1:10860085:32 gen 0 ptr: 0:17285918:408 gen 0
[Wed Jul 17 13:30:03 2024] bcachefs (cca5bc65-fe77-409d-a9fa-465a6e7f4eae): fatal error - emergency read only

bch2_extents_match() was reporting true for extents that did not
actually point to the same data.

bch2_extent_match() iterates over pairs of pointers, looking for
pointers that point to the same location on disk (with matching
generation numbers). However one or both extents may have been trimmed
(or merged) and they might not have the same disk offset: it corrects
for this by subtracting the key offset and the checksum entry offset.

However, this failed when an extent was immediately partially
overwritten, and the new overwrite was allocated the next adjacent disk
space.

Normally, with compression off, this would never cause a bug, since the
new extent would have to be immediately after the old extent for the
pointer offsets to match, and the rebalance index update path is not
looking for an extent outside the range of the extent it moved.

However with compression enabled, extents take up less space on disk
than they do in the btree index space - and spuriously matching after
partial overwrite is possible.

To fix this, add a secondary check, that strictly checks that the
regions pointed to on disk overlap.

#717

Signed-off-by: Kent Overstreet <[email protected]>
johnny-mnemonic pushed a commit to linux-ia64/linux-stable-rc that referenced this issue Sep 14, 2024
[ Upstream commit  d269356 ]

This was caught as a very rare nonce inconsistency, on systems with
encryption and replication (and tiering, or some form of rebalance
operation running):

[Wed Jul 17 13:30:03 2024] about to insert invalid key in data update path
[Wed Jul 17 13:30:03 2024] old: u64s 10 type extent 671283510:6392:U32_MAX len 16 ver 106595503: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:104 gen 7 ptr: 4:513244:48 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] k:   u64s 10 type extent 671283510:6400:U32_MAX len 16 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 ptr: 4:513244:56 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] new: u64s 14 type extent 671283510:6392:U32_MAX len 8 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 cached ptr: 4:513244:56 gen 6 cached rebalance: target hdd compression zstd crc: c_size 8 size 16 offset 8 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 1:10860085:32 gen 0 ptr: 0:17285918:408 gen 0
[Wed Jul 17 13:30:03 2024] bcachefs (cca5bc65-fe77-409d-a9fa-465a6e7f4eae): fatal error - emergency read only

bch2_extents_match() was reporting true for extents that did not
actually point to the same data.

bch2_extent_match() iterates over pairs of pointers, looking for
pointers that point to the same location on disk (with matching
generation numbers). However one or both extents may have been trimmed
(or merged) and they might not have the same disk offset: it corrects
for this by subtracting the key offset and the checksum entry offset.

However, this failed when an extent was immediately partially
overwritten, and the new overwrite was allocated the next adjacent disk
space.

Normally, with compression off, this would never cause a bug, since the
new extent would have to be immediately after the old extent for the
pointer offsets to match, and the rebalance index update path is not
looking for an extent outside the range of the extent it moved.

However with compression enabled, extents take up less space on disk
than they do in the btree index space - and spuriously matching after
partial overwrite is possible.

To fix this, add a secondary check, that strictly checks that the
regions pointed to on disk overlap.

koverstreet/bcachefs#717

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
johnny-mnemonic pushed a commit to linux-ia64/linux-stable-rc that referenced this issue Sep 15, 2024
[ Upstream commit  d269356 ]

This was caught as a very rare nonce inconsistency, on systems with
encryption and replication (and tiering, or some form of rebalance
operation running):

[Wed Jul 17 13:30:03 2024] about to insert invalid key in data update path
[Wed Jul 17 13:30:03 2024] old: u64s 10 type extent 671283510:6392:U32_MAX len 16 ver 106595503: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:104 gen 7 ptr: 4:513244:48 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] k:   u64s 10 type extent 671283510:6400:U32_MAX len 16 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 ptr: 4:513244:56 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] new: u64s 14 type extent 671283510:6392:U32_MAX len 8 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 cached ptr: 4:513244:56 gen 6 cached rebalance: target hdd compression zstd crc: c_size 8 size 16 offset 8 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 1:10860085:32 gen 0 ptr: 0:17285918:408 gen 0
[Wed Jul 17 13:30:03 2024] bcachefs (cca5bc65-fe77-409d-a9fa-465a6e7f4eae): fatal error - emergency read only

bch2_extents_match() was reporting true for extents that did not
actually point to the same data.

bch2_extent_match() iterates over pairs of pointers, looking for
pointers that point to the same location on disk (with matching
generation numbers). However one or both extents may have been trimmed
(or merged) and they might not have the same disk offset: it corrects
for this by subtracting the key offset and the checksum entry offset.

However, this failed when an extent was immediately partially
overwritten, and the new overwrite was allocated the next adjacent disk
space.

Normally, with compression off, this would never cause a bug, since the
new extent would have to be immediately after the old extent for the
pointer offsets to match, and the rebalance index update path is not
looking for an extent outside the range of the extent it moved.

However with compression enabled, extents take up less space on disk
than they do in the btree index space - and spuriously matching after
partial overwrite is possible.

To fix this, add a secondary check, that strictly checks that the
regions pointed to on disk overlap.

koverstreet/bcachefs#717

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
johnny-mnemonic pushed a commit to linux-ia64/linux-stable-rc that referenced this issue Sep 16, 2024
[ Upstream commit  d269356 ]

This was caught as a very rare nonce inconsistency, on systems with
encryption and replication (and tiering, or some form of rebalance
operation running):

[Wed Jul 17 13:30:03 2024] about to insert invalid key in data update path
[Wed Jul 17 13:30:03 2024] old: u64s 10 type extent 671283510:6392:U32_MAX len 16 ver 106595503: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:104 gen 7 ptr: 4:513244:48 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] k:   u64s 10 type extent 671283510:6400:U32_MAX len 16 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 ptr: 4:513244:56 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] new: u64s 14 type extent 671283510:6392:U32_MAX len 8 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 cached ptr: 4:513244:56 gen 6 cached rebalance: target hdd compression zstd crc: c_size 8 size 16 offset 8 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 1:10860085:32 gen 0 ptr: 0:17285918:408 gen 0
[Wed Jul 17 13:30:03 2024] bcachefs (cca5bc65-fe77-409d-a9fa-465a6e7f4eae): fatal error - emergency read only

bch2_extents_match() was reporting true for extents that did not
actually point to the same data.

bch2_extent_match() iterates over pairs of pointers, looking for
pointers that point to the same location on disk (with matching
generation numbers). However one or both extents may have been trimmed
(or merged) and they might not have the same disk offset: it corrects
for this by subtracting the key offset and the checksum entry offset.

However, this failed when an extent was immediately partially
overwritten, and the new overwrite was allocated the next adjacent disk
space.

Normally, with compression off, this would never cause a bug, since the
new extent would have to be immediately after the old extent for the
pointer offsets to match, and the rebalance index update path is not
looking for an extent outside the range of the extent it moved.

However with compression enabled, extents take up less space on disk
than they do in the btree index space - and spuriously matching after
partial overwrite is possible.

To fix this, add a secondary check, that strictly checks that the
regions pointed to on disk overlap.

koverstreet/bcachefs#717

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Mr-Bossman pushed a commit to Mr-Bossman/linux that referenced this issue Sep 16, 2024
This was caught as a very rare nonce inconsistency, on systems with
encryption and replication (and tiering, or some form of rebalance
operation running):

[Wed Jul 17 13:30:03 2024] about to insert invalid key in data update path
[Wed Jul 17 13:30:03 2024] old: u64s 10 type extent 671283510:6392:U32_MAX len 16 ver 106595503: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:104 gen 7 ptr: 4:513244:48 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] k:   u64s 10 type extent 671283510:6400:U32_MAX len 16 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 ptr: 4:513244:56 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] new: u64s 14 type extent 671283510:6392:U32_MAX len 8 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 cached ptr: 4:513244:56 gen 6 cached rebalance: target hdd compression zstd crc: c_size 8 size 16 offset 8 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 1:10860085:32 gen 0 ptr: 0:17285918:408 gen 0
[Wed Jul 17 13:30:03 2024] bcachefs (cca5bc65-fe77-409d-a9fa-465a6e7f4eae): fatal error - emergency read only

bch2_extents_match() was reporting true for extents that did not
actually point to the same data.

bch2_extent_match() iterates over pairs of pointers, looking for
pointers that point to the same location on disk (with matching
generation numbers). However one or both extents may have been trimmed
(or merged) and they might not have the same disk offset: it corrects
for this by subtracting the key offset and the checksum entry offset.

However, this failed when an extent was immediately partially
overwritten, and the new overwrite was allocated the next adjacent disk
space.

Normally, with compression off, this would never cause a bug, since the
new extent would have to be immediately after the old extent for the
pointer offsets to match, and the rebalance index update path is not
looking for an extent outside the range of the extent it moved.

However with compression enabled, extents take up less space on disk
than they do in the btree index space - and spuriously matching after
partial overwrite is possible.

To fix this, add a secondary check, that strictly checks that the
regions pointed to on disk overlap.

koverstreet/bcachefs#717

Signed-off-by: Kent Overstreet <[email protected]>
johnny-mnemonic pushed a commit to linux-ia64/linux-stable-rc that referenced this issue Sep 17, 2024
[ Upstream commit  d269356 ]

This was caught as a very rare nonce inconsistency, on systems with
encryption and replication (and tiering, or some form of rebalance
operation running):

[Wed Jul 17 13:30:03 2024] about to insert invalid key in data update path
[Wed Jul 17 13:30:03 2024] old: u64s 10 type extent 671283510:6392:U32_MAX len 16 ver 106595503: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:104 gen 7 ptr: 4:513244:48 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] k:   u64s 10 type extent 671283510:6400:U32_MAX len 16 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 ptr: 4:513244:56 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] new: u64s 14 type extent 671283510:6392:U32_MAX len 8 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 cached ptr: 4:513244:56 gen 6 cached rebalance: target hdd compression zstd crc: c_size 8 size 16 offset 8 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 1:10860085:32 gen 0 ptr: 0:17285918:408 gen 0
[Wed Jul 17 13:30:03 2024] bcachefs (cca5bc65-fe77-409d-a9fa-465a6e7f4eae): fatal error - emergency read only

bch2_extents_match() was reporting true for extents that did not
actually point to the same data.

bch2_extent_match() iterates over pairs of pointers, looking for
pointers that point to the same location on disk (with matching
generation numbers). However one or both extents may have been trimmed
(or merged) and they might not have the same disk offset: it corrects
for this by subtracting the key offset and the checksum entry offset.

However, this failed when an extent was immediately partially
overwritten, and the new overwrite was allocated the next adjacent disk
space.

Normally, with compression off, this would never cause a bug, since the
new extent would have to be immediately after the old extent for the
pointer offsets to match, and the rebalance index update path is not
looking for an extent outside the range of the extent it moved.

However with compression enabled, extents take up less space on disk
than they do in the btree index space - and spuriously matching after
partial overwrite is possible.

To fix this, add a secondary check, that strictly checks that the
regions pointed to on disk overlap.

koverstreet/bcachefs#717

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
johnny-mnemonic pushed a commit to linux-ia64/linux-stable-rc that referenced this issue Sep 17, 2024
[ Upstream commit  d269356 ]

This was caught as a very rare nonce inconsistency, on systems with
encryption and replication (and tiering, or some form of rebalance
operation running):

[Wed Jul 17 13:30:03 2024] about to insert invalid key in data update path
[Wed Jul 17 13:30:03 2024] old: u64s 10 type extent 671283510:6392:U32_MAX len 16 ver 106595503: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:104 gen 7 ptr: 4:513244:48 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] k:   u64s 10 type extent 671283510:6400:U32_MAX len 16 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 ptr: 4:513244:56 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] new: u64s 14 type extent 671283510:6392:U32_MAX len 8 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 cached ptr: 4:513244:56 gen 6 cached rebalance: target hdd compression zstd crc: c_size 8 size 16 offset 8 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 1:10860085:32 gen 0 ptr: 0:17285918:408 gen 0
[Wed Jul 17 13:30:03 2024] bcachefs (cca5bc65-fe77-409d-a9fa-465a6e7f4eae): fatal error - emergency read only

bch2_extents_match() was reporting true for extents that did not
actually point to the same data.

bch2_extent_match() iterates over pairs of pointers, looking for
pointers that point to the same location on disk (with matching
generation numbers). However one or both extents may have been trimmed
(or merged) and they might not have the same disk offset: it corrects
for this by subtracting the key offset and the checksum entry offset.

However, this failed when an extent was immediately partially
overwritten, and the new overwrite was allocated the next adjacent disk
space.

Normally, with compression off, this would never cause a bug, since the
new extent would have to be immediately after the old extent for the
pointer offsets to match, and the rebalance index update path is not
looking for an extent outside the range of the extent it moved.

However with compression enabled, extents take up less space on disk
than they do in the btree index space - and spuriously matching after
partial overwrite is possible.

To fix this, add a secondary check, that strictly checks that the
regions pointed to on disk overlap.

koverstreet/bcachefs#717

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
gregkh pushed a commit to gregkh/linux that referenced this issue Sep 18, 2024
[ Upstream commit  d269356 ]

This was caught as a very rare nonce inconsistency, on systems with
encryption and replication (and tiering, or some form of rebalance
operation running):

[Wed Jul 17 13:30:03 2024] about to insert invalid key in data update path
[Wed Jul 17 13:30:03 2024] old: u64s 10 type extent 671283510:6392:U32_MAX len 16 ver 106595503: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:104 gen 7 ptr: 4:513244:48 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] k:   u64s 10 type extent 671283510:6400:U32_MAX len 16 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 ptr: 4:513244:56 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] new: u64s 14 type extent 671283510:6392:U32_MAX len 8 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 cached ptr: 4:513244:56 gen 6 cached rebalance: target hdd compression zstd crc: c_size 8 size 16 offset 8 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 1:10860085:32 gen 0 ptr: 0:17285918:408 gen 0
[Wed Jul 17 13:30:03 2024] bcachefs (cca5bc65-fe77-409d-a9fa-465a6e7f4eae): fatal error - emergency read only

bch2_extents_match() was reporting true for extents that did not
actually point to the same data.

bch2_extent_match() iterates over pairs of pointers, looking for
pointers that point to the same location on disk (with matching
generation numbers). However one or both extents may have been trimmed
(or merged) and they might not have the same disk offset: it corrects
for this by subtracting the key offset and the checksum entry offset.

However, this failed when an extent was immediately partially
overwritten, and the new overwrite was allocated the next adjacent disk
space.

Normally, with compression off, this would never cause a bug, since the
new extent would have to be immediately after the old extent for the
pointer offsets to match, and the rebalance index update path is not
looking for an extent outside the range of the extent it moved.

However with compression enabled, extents take up less space on disk
than they do in the btree index space - and spuriously matching after
partial overwrite is possible.

To fix this, add a secondary check, that strictly checks that the
regions pointed to on disk overlap.

koverstreet/bcachefs#717

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
intelfx pushed a commit to intelfx/linux that referenced this issue Oct 11, 2024
This was caught as a very rare nonce inconsistency, on systems with
encryption and replication (and tiering, or some form of rebalance
operation running):

[Wed Jul 17 13:30:03 2024] about to insert invalid key in data update path
[Wed Jul 17 13:30:03 2024] old: u64s 10 type extent 671283510:6392:U32_MAX len 16 ver 106595503: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:104 gen 7 ptr: 4:513244:48 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] k:   u64s 10 type extent 671283510:6400:U32_MAX len 16 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 ptr: 4:513244:56 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] new: u64s 14 type extent 671283510:6392:U32_MAX len 8 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 cached ptr: 4:513244:56 gen 6 cached rebalance: target hdd compression zstd crc: c_size 8 size 16 offset 8 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 1:10860085:32 gen 0 ptr: 0:17285918:408 gen 0
[Wed Jul 17 13:30:03 2024] bcachefs (cca5bc65-fe77-409d-a9fa-465a6e7f4eae): fatal error - emergency read only

bch2_extents_match() was reporting true for extents that did not
actually point to the same data.

bch2_extent_match() iterates over pairs of pointers, looking for
pointers that point to the same location on disk (with matching
generation numbers). However one or both extents may have been trimmed
(or merged) and they might not have the same disk offset: it corrects
for this by subtracting the key offset and the checksum entry offset.

However, this failed when an extent was immediately partially
overwritten, and the new overwrite was allocated the next adjacent disk
space.

Normally, with compression off, this would never cause a bug, since the
new extent would have to be immediately after the old extent for the
pointer offsets to match, and the rebalance index update path is not
looking for an extent outside the range of the extent it moved.

However with compression enabled, extents take up less space on disk
than they do in the btree index space - and spuriously matching after
partial overwrite is possible.

To fix this, add a secondary check, that strictly checks that the
regions pointed to on disk overlap.

koverstreet/bcachefs#717

Signed-off-by: Kent Overstreet <[email protected]>
yhamamachi pushed a commit to yhamamachi/linux-pcie-virtio-net that referenced this issue Oct 23, 2024
This was caught as a very rare nonce inconsistency, on systems with
encryption and replication (and tiering, or some form of rebalance
operation running):

[Wed Jul 17 13:30:03 2024] about to insert invalid key in data update path
[Wed Jul 17 13:30:03 2024] old: u64s 10 type extent 671283510:6392:U32_MAX len 16 ver 106595503: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:104 gen 7 ptr: 4:513244:48 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] k:   u64s 10 type extent 671283510:6400:U32_MAX len 16 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 ptr: 4:513244:56 gen 6 rebalance: target hdd compression zstd
[Wed Jul 17 13:30:03 2024] new: u64s 14 type extent 671283510:6392:U32_MAX len 8 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 cached ptr: 4:513244:56 gen 6 cached rebalance: target hdd compression zstd crc: c_size 8 size 16 offset 8 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 1:10860085:32 gen 0 ptr: 0:17285918:408 gen 0
[Wed Jul 17 13:30:03 2024] bcachefs (cca5bc65-fe77-409d-a9fa-465a6e7f4eae): fatal error - emergency read only

bch2_extents_match() was reporting true for extents that did not
actually point to the same data.

bch2_extent_match() iterates over pairs of pointers, looking for
pointers that point to the same location on disk (with matching
generation numbers). However one or both extents may have been trimmed
(or merged) and they might not have the same disk offset: it corrects
for this by subtracting the key offset and the checksum entry offset.

However, this failed when an extent was immediately partially
overwritten, and the new overwrite was allocated the next adjacent disk
space.

Normally, with compression off, this would never cause a bug, since the
new extent would have to be immediately after the old extent for the
pointer offsets to match, and the rebalance index update path is not
looking for an extent outside the range of the extent it moved.

However with compression enabled, extents take up less space on disk
than they do in the btree index space - and spuriously matching after
partial overwrite is possible.

To fix this, add a secondary check, that strictly checks that the
regions pointed to on disk overlap.

koverstreet/bcachefs#717

Signed-off-by: Kent Overstreet <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants