Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fsck kernel BUG at fs/bcachefs/btree_cache.c:280! #776

Closed
thememika opened this issue Nov 9, 2024 · 6 comments
Closed

fsck kernel BUG at fs/bcachefs/btree_cache.c:280! #776

thememika opened this issue Nov 9, 2024 · 6 comments

Comments

@thememika
Copy link

thememika commented Nov 9, 2024

Steps to reproduce:

  1. Create zeroed file
dd if=/dev/zero of=tmp bs=1M count=2000
  1. Losetup
losetup /dev/loop123 ./tmp
  1. Make FS
bcachefs format /dev/loop123
  1. Mount FS
mount -t bcachefs /dev/loop123 /mnt
  1. Copy any data to it (until "No space left on device", but not necessarily)
cp -xr   /usr   /mnt/
  1. Unmount filesystem
umount /mnt
  1. Corrupt the filesystem
dd  if=/dev/urandom  of=/dev/loop123 bs=1M oseek=1234 count=50

Run the command few times with oseek= set to various places of the Device (but not the superblock of course).

  1. Mount the filesystem with opts fsck,fix_errors
mount -t bcachefs -o fsck,fix_errors /dev/loop123 /mnt
  1. You will experience the crash
[210269.384000] [T26320] ------------[ cut here ]------------
[210269.384002] [T26320] kernel BUG at fs/bcachefs/btree_cache.c:280!
...
[210269.384022] [T26320] RIP: 0010:__bch2_btree_node_hash_insert+0x24f/0x290
...
[210269.384388] [T26320] ---[ end trace 0000000000000000 ]---

I'm using latest commit.

This must not happen: corrupt filesystem must not crash the computer.

Thanks!

@g2p
Copy link
Contributor

g2p commented Nov 9, 2024

Syzbot also found it:

https://lore.kernel.org/linux-bcachefs/[email protected]/T/
https://syzkaller.appspot.com/bug?extid=19c1a30221401d741bc2

@thememika
Copy link
Author

thememika commented Nov 9, 2024

@g2p I'm overall surprised by the overwhelming use of BUG* macros in Bcachefs code. It looks like if it was done to spite Linus, who specifically spoke against excessive BUG(). I can't understand why it's not possible to just do "emergency read only" or something like that and return to user with an error — but not a kernel crash...

@g2p
Copy link
Contributor

g2p commented Nov 9, 2024

A filesystem has a lot of invariants, not all of them should be handled at runtime. The various approaches are documented in Documentation/filesystems/bcachefs/CodingStyle.rst. It turns out this assert recently became reachable, and so the cause will be fixed.

But the realistic alternative to using BUG_ON in cases that are believed to be unreachable is that broken invariants are missed. Error paths for every "this should not happen" assert is not feasible, and in C, error paths, especially ones that can't be tested, are a large source of bugs.

@koverstreet
Copy link
Owner

Chasing down a BUG_ON() that pops is way easier than code that continues off into undefined behaviour la la land.

The bug that this patch was fixing was a use after free of btree node data buffers, which somehow lurked for years because it was so rarely hit - that's really, really bad.

This one's getting fixed right away.

@thememika
Copy link
Author

@koverstreet sorry but when is this "right away", besides the words?

@koverstreet
Copy link
Owner

This is fixed in my bcachefs-for-upstream branch and will be in the next pull request to Linus this week.

@thememika, I appreciate a more professional attitude here, or at least more coherent.

But if you want to test and confirm the fix, that would be lovely

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants