Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bcachefs crash (Oops) in LZ4HC_compress_generic+0x3b3/0x1b90 #753

Open
thememika opened this issue Sep 28, 2024 · 3 comments
Open

Bcachefs crash (Oops) in LZ4HC_compress_generic+0x3b3/0x1b90 #753

thememika opened this issue Sep 28, 2024 · 3 comments

Comments

@thememika
Copy link

🥝Kernel: 6.10.0
Hello, developers. Ive just decided to enable compression on a production filesystem because it was critically running out of space. I did it by:

echo "lz4:15" > /sys/fs/bcachefs/MY_UID/options/compression

The filesystem was under heavy use at that moment.
bch-rebalance thread became 100% CPU ,I think it means that compression has started.

But after less than a minute I got a crash of a kworker process in a workqueue.

[509621.528103] [T10653] BUG: unable to handle page fault for address: ffffc9002a03d000
[509621.528109] [T10653] #PF: supervisor write access in kernel mode
[509621.528110] [T10653] #PF: error_code(0x0002) - not-present page
[509621.528111] [T10653] PGD 100000067 P4D 100000067 PUD 1002f2067 PMD 9b9af8067 PTE 0
[509621.528116] [T10653] Oops: Oops: 0002 [#1] PREEMPT_RT SMP
[509621.528118] [T10653] CPU: 7 PID: 10653 Comm: kworker/u80:4 Kdump: loaded Tainted: G        W  OE      6.10.0-rc6-blahaj-rt+ #14
[509621.528121] [T10653] Hardware name: OEM X79G/X79G, BIOS 4.6.5 08/02/2022
[509621.528122] [T10653] Workqueue: writeback wb_workfn (flush-bcachefs-2)
[509621.528127] [T10653] RIP: 0010:LZ4HC_compress_generic+0x3b3/0x1b90
[509621.528131] [T10653] Code: ea ff 00 00 00 48 83 c0 01 c6 40 ff ff 81 fa fe 00 00 00 7f ea 88 10 48 83 c0 01 48 8d 14 30 49 8b 0b 48 83 c0 08 49 83 c3 08 <48> 89 48 f8 48 39 d0 72 ec 48 8b 9d 78 ff ff ff 48 8b 45 d0 44 8b
[509621.528133] [T10653] RSP: 0018:ffffc9002a4cb430 EFLAGS: 00010296
[509621.528135] [T10653] RAX: ffffc9002a03d001 RBX: ffffc9002a02d000 RCX: 2704138e8569e88d
[509621.528136] [T10653] RDX: ffffc9002a03cffa RSI: 000000000000fef9 RDI: ffffc90029f7cefd
[509621.528137] [T10653] RBP: ffffc9002a4cb540 R08: 0000000000000004 R09: 0000000000000001
[509621.528138] [T10653] R10: 0000000000000002 R11: ffffc90029f7cf00 R12: ffff888211da0000
[509621.528139] [T10653] R13: ffffc9002a02d000 R14: 0000000000000000 R15: 0000000000010000
[509621.528140] [T10653] FS:  0000000000000000(0000) GS:ffff888fff9c0000(0000) knlGS:0000000000000000
[509621.528141] [T10653] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[509621.528143] [T10653] CR2: ffffc9002a03d000 CR3: 00000007e58b7003 CR4: 00000000001706f0
[509621.528144] [T10653] Call Trace:
[509621.528146] [T10653]  <TASK>
[509621.528147] [T10653]  ? show_regs.part.0+0x22/0x24
[509621.528152] [T10653]  ? __die_body.cold+0x8/0x1c
[509621.528155] [T10653]  ? __die+0x2e/0x40
[509621.528157] [T10653]  ? page_fault_oops+0x102/0x270
[509621.528159] [T10653]  ? search_bpf_extables+0xdb/0xf0
[509621.528161] [T10653]  ? search_exception_tables+0x60/0x70
[509621.528164] [T10653]  ? fixup_exception+0x32/0x450
[509621.528166] [T10653]  ? kernelmode_fixup_or_oops.isra.0+0x45/0x50
[509621.528167] [T10653]  ? bad_area_nosemaphore+0x119/0x160
[509621.528169] [T10653]  ? do_kern_addr_fault+0x7b/0x90
[509621.528171] [T10653]  ? exc_page_fault+0x194/0x1a0
[509621.528175] [T10653]  ? asm_exc_page_fault+0x2b/0x30
[509621.528179] [T10653]  ? LZ4HC_compress_generic+0x3b3/0x1b90
[509621.528183] [T10653]  LZ4_compress_HC+0x94/0xb0
[509621.528185] [T10653]  attempt_compress+0x205/0x270
[509621.528190] [T10653]  ? kvmalloc_node_noprof+0x3f/0xe0
[509621.528193] [T10653]  ? mempool_kvmalloc+0x1e/0x20
[509621.528196] [T10653]  ? mempool_alloc_noprof+0x45/0x140
[509621.528199] [T10653]  bch2_bio_compress+0x21a/0x5a0
[509621.528201] [T10653]  __bch2_write+0x1791/0x1ba0
[509621.528205] [T10653]  bch2_write+0x19e/0x440
[509621.528208] [T10653]  ? bch2_write+0x19e/0x440
[509621.528211] [T10653]  __bch2_writepage+0x452/0x910
[509621.528213] [T10653]  ? bchfs_read.isra.0+0xd00/0xd00
[509621.528214] [T10653]  write_cache_pages+0x54/0x90
[509621.528218] [T10653]  bch2_writepages+0x59/0xe0
[509621.528219] [T10653]  ? rt_spin_unlock+0x1b/0x50
[509621.528222] [T10653]  do_writepages+0xd1/0x230
[509621.528223] [T10653]  ? psi_group_change+0x1c5/0x410
[509621.528227] [T10653]  __writeback_single_inode+0x44/0x340
[509621.528229] [T10653]  writeback_sb_inodes+0x222/0x5d0
[509621.528233] [T10653]  __writeback_inodes_wb+0x54/0x160
[509621.528234] [T10653]  wb_writeback+0x1ff/0x370
[509621.528236] [T10653]  ? get_nr_inodes+0x56/0xa0
[509621.528239] [T10653]  wb_workfn+0x1e6/0x480
[509621.528241] [T10653]  ? finish_task_switch.isra.0+0x8f/0x2b0
[509621.528244] [T10653]  process_one_work+0x188/0x360
[509621.528247] [T10653]  worker_thread+0x350/0x4e0
[509621.528249] [T10653]  ? max_active_store+0xe0/0xe0
[509621.528252] [T10653]  kthread+0xe5/0x110
[509621.528254] [T10653]  ? kthread_park+0x90/0x90
[509621.528256] [T10653]  ret_from_fork+0x3a/0x60
[509621.528258] [T10653]  ? kthread_park+0x90/0x90
[509621.528260] [T10653]  ret_from_fork_asm+0x11/0x20
[509621.528263] [T10653]  </TASK>
[509621.528264] [T10653] Modules linked in: quota_v2 nls_iso8859_1 tls 88x2bu(OE) option usb_wwan usbserial cdc_acm netlink_diag yellowfb(OE) whitefb(OE) redfb(OE) purplefb(OE) pinkfb(OE) peachfb(OE) orangefb(OE) magentafb(OE) indigofb(OE) greenfb(OE) cyanfb(OE) brownfb(OE) bluefb(OE) blackfb(OE) missmikatools(OE) mikakernelm cfg80211 8021q garp mrp stp llc nft_chain_nat xt_MASQUERADE xt_nat nf_nat nft_limit ipt_REJECT nf_reject_ipv4 xt_recent xt_limit xt_pkttype xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype xt_tcpudp nft_compat nf_tables x_tables nfnetlink intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel uvcvideo cdc_ether usbnet snd_hda_codec_realtek snd_hda_codec_generic videobuf2_vmalloc kvm snd_usb_audio snd_hda_codec_hdmi snd_hda_scodec_component uvc snd_hda_intel videobuf2_memops snd_intel_dspcfg videobuf2_v4l2 crct10dif_pclmul snd_intel_sdw_acpi videodev crc32_pclmul snd_hda_codec ghash_clmulni_intel sha512_ssse3 sha256_ssse3 iTCO_wdt intel_pmc_bxt
[509621.528306] [T10653]  missmikafs sha1_ssse3 iTCO_vendor_support aesni_intel snd_hda_core crypto_simd cryptd snd_usbmidi_lib rapl snd_rawmidi psmouse videobuf2_common r8152 intel_cstate snd_seq_device snd_hwdep mc snd_pcm r8169 mii pcspkr serio_raw snd_timer snd i2c_i801 ioatdma realtek i2c_mux lpc_ich i2c_smbus dca soundcore mac_hid
[509621.528321] [T10653] CR2: ffffc9002a03d000
[509621.528323] [T10653] ---[ end trace 0000000000000000 ]---
[509621.528410] [T10653] pstore: backend (erst) writing error (-28)
[509621.528410] [T10653] RIP: 0010:LZ4HC_compress_generic+0x3b3/0x1b90
[509621.528410] [T10653] Code: ea ff 00 00 00 48 83 c0 01 c6 40 ff ff 81 fa fe 00 00 00 7f ea 88 10 48 83 c0 01 48 8d 14 30 49 8b 0b 48 83 c0 08 49 83 c3 08 <48> 89 48 f8 48 39 d0 72 ec 48 8b 9d 78 ff ff ff 48 8b 45 d0 44 8b
[509621.528410] [T10653] RSP: 0018:ffffc9002a4cb430 EFLAGS: 00010296
[509621.528410] [T10653] RAX: ffffc9002a03d001 RBX: ffffc9002a02d000 RCX: 2704138e8569e88d
[509621.528410] [T10653] RDX: ffffc9002a03cffa RSI: 000000000000fef9 RDI: ffffc90029f7cefd
[509621.528410] [T10653] RBP: ffffc9002a4cb540 R08: 0000000000000004 R09: 0000000000000001
[509621.528410] [T10653] R10: 0000000000000002 R11: ffffc90029f7cf00 R12: ffff888211da0000
[509621.528410] [T10653] R13: ffffc9002a02d000 R14: 0000000000000000 R15: 0000000000010000
[509621.528410] [T10653] FS:  0000000000000000(0000) GS:ffff888fff9c0000(0000) knlGS:0000000000000000
[509621.528410] [T10653] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[509621.528410] [T10653] CR2: ffffc9002a03d000 CR3: 00000007e58b7003 CR4: 00000000001706f0
[509621.528410] [T10653] note: kworker/u80:4[10653] exited with irqs disabled

I believe that last your function was attempt_compress+0x205/0x270. Then crash in the lz4 module/code.
I don't know if it is your bug, or an issue with configuration. My configuration is this:

Device:                                     (unknown device)
External UUID:                             <uuid>
Internal UUID:                             <uuid>
Magic number:                              <uuid>
Device index:                              0
Label:                                     <name>
Version:                                   1.7: mi_btree_bitmap
Version upgrade complete:                  1.7: mi_btree_bitmap
Oldest version on disk:                    1.7: mi_btree_bitmap
Created:                                   <date>
Sequence number:                           34
Time of last write:                        Sat Sep 28 12:12:12 2024
Superblock size:                           4.66 KiB/1.00 MiB
Clean:                                     0
Devices:                                   1
Sections:                                  members_v1,replicas_v0,quota,clean,journal_v2,counters,members_v2,errors,ext,downgrade
Features:                                  lz4,zstd,reflink,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,reflink_inline_data,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:                           alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done

Options:
  block_size:                              4.00 KiB
  btree_node_size:                         256 KiB
  errors:                                  continue fix_safe panic [ro] 
  metadata_replicas:                       1
  data_replicas:                           1
  metadata_replicas_required:              1
  data_replicas_required:                  1
  encoded_extent_max:                      64.0 KiB
  metadata_checksum:                       none [crc32c] crc64 xxhash 
  data_checksum:                           none [crc32c] crc64 xxhash 
  compression:                             lz4:15
  background_compression:                  none
  str_hash:                                [crc32c] crc64 siphash 
  metadata_target:                         none
  foreground_target:                       none
  background_target:                       none
  promote_target:                          none
  erasure_code:                            0
  inodes_32bit:                            1
  shard_inode_numbers:                     1
  inodes_use_key_cache:                    1
  gc_reserve_percent:                      8
  gc_reserve_bytes:                        0 B
  root_reserve_percent:                    0
  wide_macs:                               0
  acl:                                     1
  usrquota:                                1
  grpquota:                                1
  prjquota:                                1
  journal_flush_delay:                     1000
  journal_flush_disabled:                  0
  journal_reclaim_delay:                   100
  journal_transaction_names:               1
  version_upgrade:                         [compatible] incompatible none 
  nocow:                                   0

members_v2 (size 160):
Device:                                    0
  Label:                                   (none)
  UUID:                                    <uuid>
  Size:                                    119 GiB
  read errors:                             0
  write errors:                            0
  checksum errors:                         0
  seqread iops:                            0
  seqwrite iops:                           0
  randread iops:                           0
  randwrite iops:                          0
  Bucket size:                             256 KiB
  First bucket:                            0
  Buckets:                                 488386
  Last mount:                              Sun Sep 22 14:53:58 2024
  Last superblock write:                   34
  State:                                   rw
  Data allowed:                            journal,btree,user
  Has data:                                journal,btree,user
  Btree allocated bitmap blocksize:        4.00 MiB
  Btree allocated bitmap:                  0000000000000000000000000000000100000000000000000000000100011011
  Durability:                              1
  Discard:                                 1
  Freespace initialized:                   1

errors (size 8):

Thank you very much!

The filesystem is still up and it accepts reads writes. bch-rebalance is also still running at 100%. I don't know if a reboot is required anytime soon. Just the manual sync() calls don't return.

@thememika
Copy link
Author

thememika commented Sep 28, 2024

UPD: bch-rebalance process has now just crashed in the same function (LZ4HC_compress_generic).
I decided to find out about this function in source code

# scripts/faddr2line ./vmlinux LZ4HC_compress_generic+0x3b3/0x1b90
LZ4HC_compress_generic+0x3b3/0x1b90:
LZ4_copy8 at lib/lz4/lz4defs.h:158
(inlined by) LZ4_wildCopy at lib/lz4/lz4defs.h:180
(inlined by) LZ4HC_encodeSequence at lib/lz4/lz4hc_compress.c:296                               (inlined by) LZ4HC_compress_generic at lib/lz4/lz4hc_compress.c:402

Also for attempt_compress:

attempt_compress+0x205/0x270:
attempt_compress at fs/bcachefs/compress.c:323 (discriminator 2)

I find you make call to the lz4 function here:

int ret = LZ4_compress_HC(
   src,  dst,   src_len, dst_len, 
   compression.level,workspace
);

And that function crashes on this code:

static FORCE_INLINE void LZ4_copy8(void *dst, const void *src) {
  #if LZ4_ARCH64  // this was True during kernel build
    U64 a = get_unaligned((const U64 *)src);
    /* THE CRASH IS HERE -> */      put_unaligned(a, (U64 *)dst);
  #else
    U32 a = get_unaligned((const U32 *)src);
    U32 b = get_unaligned((const U32 *)src + 1);

    ...

Respectively this is put_unaligned which is a macro in include/asm-generic/unaligned.h:

#define put_unaligned(val, ptr) __put_unaligned_t(typeof(*(ptr)), (val), (ptr))

Respectively, macro "__put_unaligned_t" is finally defined in the same file:

#define __put_unaligned_t(type, val, ptr) do {					\
	struct { type x; } __packed *__pptr = (typeof(__pptr))(ptr);		\
	__pptr->x = (val);							\
} while (0)

Here is the exact crashing line:

__pptr->x = (val);

I wasn't able to understand myself why this action randomly faults.
As I observed, the base chance of the fault is very low. But when the compression is happening at 100% CPU, eventually after some time it happens...

@thememika
Copy link
Author

Interesting detail: I still haven't rebooted because the filesystem continues to work! Despite that it's bch-rebalance kthread is crashed. fsync() calls return. This is hilarious. Surprising and reliable design, thanks. It made my day.

@thememika
Copy link
Author

@koverstreet not fixed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant