[6.12] FS fails to mount due to "filesystem marked as clean but have deleted inode" #786
I too faced this error today.
Link to files: https://drive.google.com/drive/folders/1MI330A27LLChW3hTr8EvELCfFGlX45RV?usp=sharing I don't have the full logs from running fsck, but here are the last few lines:
These seem to be the same as the ones posted by @dominikpaulus. show-super output (after running fsck):
(I am fairly certain deleted_inode_but_clean was not present before running fsck.) Filesystem usage (after running fsck):
Host device:
@nitinkmr333 Are you using snapshots? I can confirm that this problem is almost certainly related to snapshot usage. It's not just about read-only snapshots, but about any snapshots. It seems that this occurs if a snapshot is taken while there are files on the FS that have been deleted but are still held open.
Then, use this little C program:
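(The program itself isn't shown above; the following is a minimal hypothetical sketch of such a reproducer, under the assumption that it simply creates a file on the bcachefs mount, unlinks it while keeping the descriptor open, and waits for Enter before closing it. The program name and path argument are my own choices, not from the original comment.)

```c
/* Hypothetical reproducer sketch (not the original program):
 * open a file on the bcachefs mount, unlink it while keeping the
 * file descriptor open, then wait for Enter before closing it. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	if (argc != 2) {
		fprintf(stderr, "usage: %s <path-on-bcachefs>\n", argv[0]);
		return 1;
	}

	int fd = open(argv[1], O_CREAT | O_RDWR, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Delete the file while still holding it open. */
	if (unlink(argv[1]) < 0) {
		perror("unlink");
		return 1;
	}

	printf("file unlinked, fd still open; take a snapshot now, then press Enter\n");
	getchar();

	/* Closing the fd drops the last reference to the unlinked inode. */
	close(fd);
	return 0;
}
```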
Mount bcachefs, then run the above C program:
in another shell:
Now press Enter in the first shell to actually close the FD. Afterwards, fsck will fail due to broken inodes:
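(For reference, a hypothetical end-to-end sequence matching these steps; the device, mount point, program name, and the exact snapshot invocation are assumptions rather than the commands from the original comment.)

```sh
# Shell 1: mount the filesystem and start the (hypothetical) reproducer;
# it unlinks the test file, keeps the fd open, and waits for Enter.
mount /dev/vdb /mnt
./repro /mnt/testfile

# Shell 2: take a snapshot while the unlinked-but-open file exists
# (bcachefs-tools snapshot syntax assumed here).
bcachefs subvolume snapshot /mnt /mnt/snap

# Shell 1: press Enter so the reproducer closes the fd, then unmount.
umount /mnt

# A subsequent check then reports the deleted-inode errors.
bcachefs fsck /dev/vdb
```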
@dominikpaulus Great find! Yes, I have set up cron jobs to take a read-only snapshot of all my subvolumes at midnight every day. I tried testing with your script and got the same error on kernel 6.12.1 (Debian). I also tried this on another system with kernel 6.11.5 (NixOS): there the image mounts without any errors in dmesg, but I am getting errors while running fsck. Kernel 6.11:
2nd fsck run (running again after the previous fsck):
3rd fsck run (after the previous one):
Subsequent fsck runs give the same error as the 3rd one. I think in my case it is happening because of Docker: I recently set up Docker volumes (bind mounts) in one of the subvolumes in the filesystem, and the Docker containers have files open inside the FS when the snapshot is taken. I also saw some errors in the subvolume's
Unfortunately, I cannot show the exact error (I deleted those subvolumes), but running
I created another issue about it: #790. @dominikpaulus, can you check if
I think I have the same issue on my system (also NixOS, kernel 6.12, single-device bcachefs as root, daily snapshots for backups that are immediately deleted again). The system boots with kernel 6.11 and I can indeed see broken files in the lost+found folder:
And corresponding dmesg output from that ls:
I just reproduced this on 6.12; it turns out this is already fixed in the master branch by 4814218 ("bcachefs: Use separate rhltable for bch2_inode_or_descendents_is_open()"). 6.11 had more bugs with snapshots and unlinked files; you'll all definitely want to upgrade to my master branch.
@koverstreet: FYI, I tried the master branch (dd7d7f2), i.e. I applied the patches on top of 6.12.4, but it hangs indefinitely when trying to mount my encrypted bcachefs filesystem. Is it safe to just cherry-pick commit 4814218 on top of the latest 6.12 kernel?
Yes. FYI, it's probably not "hung", just taking a while to upgrade. I'm adding back a progress indicator and I have a bit more performance optimization to do on the upgrade, but it's going to be an expensive one (it should be the last expensive forced upgrade, though).
Thanks! After waiting 10 minutes, I shut down my computer, and the filesystem still works (with the stock kernel), so the upgrade seems power-cut safe :) EDIT: Tried again but gave up after 14 hours...
Since I upgraded to 6.12, my bcachefs (root) filesystem repeatedly fails to mount at system bootup, because of
filesystem marked as clean but have deleted inode
errors at mount time (see below for details). Note that this is extremely well correlated with the update to 6.12. I then reboot into a separate system (also running kernel 6.12), run fsck, unmount the FS again, and can then successfully boot into my root-on-bcachefs system. However, if I then use the system for some time and reboot, the same issue happens again and I see the same failures at mount time.
All of this is after a clean shutdown/umount.
The setup is rather simple, nothing too funky: it's a notebook machine with NixOS on bcachefs root (on LVM + dm-crypt), single-device. I enabled snapper for automatic hourly snapshots in the background (speculative: I think it's quite likely that this only occurs after snapper has created a new snapshot in the background; is it worth ruling out this potential root cause?).

Below is one instance of this happening. I used my bcachefs filesystem for some time, then rebooted into a different system. I then dumped the journal, tried to mount the FS (it indeed failed to mount), and then ran fsck:
bcachefs list_journal /dev/root/nixos:
mount /dev/root/nixos /mnt/nix:
mount -o fsck,fix_errors=yes /dev/root/nixos /mnt/nix/: