Use systemd Boot Assessment #28

Open
wants to merge 2 commits into
base: master

danyspin97

Automatic Boot Assessment allows systemd-boot and systemd to mark boot entries as either good or bad, depending on whether they boot successfully.

This PR turns health-checker into a service that takes part in the automatic boot assessment by using the special target boot-complete.target. When systemd-boot is in use and the value in /etc/kernel/tries is greater than 0, the current boot entry gets renamed to start the counting.

If the health-checker tests pass without errors, systemd-bless-boot marks the boot entry as good. If any test fails, health-checker decides whether to reboot or to start an emergency shell. If the current entry is the default one and still has tries left (i.e. it has not been marked as bad), health-checker reboots. If the current entry is the default one and has no tries left, health-checker starts an emergency shell. The default entry is picked from the entries that are known to work or have not been tested yet, so the emergency shell is only started once all entries have been tried (this could lead to many reboots). If the user chooses an entry instead of letting systemd-boot pick the default one, health-checker does not reboot by default (a reboot can be forced with the argument below).
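
The failure handling described above can be sketched as a small decision function. This is only an illustration, not the code in this PR: the function name `decide_action`, its arguments, and the `no-reboot` result are hypothetical, and it assumes that `disable` only suppresses a reboot, which the description does not spell out.

```shell
#!/bin/bash
# Hypothetical helper: decide what to do after the health-checker tests fail.
#   $1: "yes" if the booted entry is the default one, "no" otherwise
#   $2: number of tries left for the booted entry
#   $3: value of health-checker-reboot ("force", "disable", or empty)
decide_action() {
    local is_default=$1 tries_left=$2 reboot_arg=${3:-}
    local action

    if [ "$is_default" = "yes" ]; then
        if [ "$tries_left" -gt 0 ]; then
            action=reboot            # default entry, not yet marked as bad
        else
            action=emergency-shell   # default entry, no tries left
        fi
    elif [ "$reboot_arg" = "force" ]; then
        action=reboot                # user-chosen entry, reboot forced
    else
        action=no-reboot             # user-chosen entry, default behaviour
    fi

    # Assumption: "disable" only turns a reboot into "no-reboot".
    if [ "$action" = "reboot" ] && [ "$reboot_arg" = "disable" ]; then
        action=no-reboot
    fi
    echo "$action"
}
```

For example, `decide_action yes 2` prints `reboot`, while `decide_action no 2` prints `no-reboot`.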

I have also added two kernel cmdline arguments to fix #8:

  • health-checker-reboot:
    • force: always reboot when health-checker fails and the loaded boot entry is not the default one
    • disable: health-checker never reboots
  • health-checker=disabled: skip all tests and mark health-checker as successful. This breaks systemd Automatic Boot Counting but helps with debugging or some edge cases.
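
As a rough sketch of how such arguments could be read from the kernel command line: the helper name `cmdline_value` is hypothetical, and the `health-checker-reboot=force` spelling is an assumption based on the `health-checker=disabled` form above.

```shell
#!/bin/bash
# Hypothetical helper: print the value of a key=value kernel argument.
#   $1: argument name
#   $2: command line to search (defaults to the contents of /proc/cmdline)
# Returns 1 and prints nothing if the argument is absent.
# The body runs in a subshell so that "set -f" (no globbing while the
# command line is word-split) does not leak into the caller.
cmdline_value() (
    key=$1
    cmdline=${2-$(cat /proc/cmdline)}
    set -f
    for arg in $cmdline; do
        case $arg in
            "$key"=*) echo "${arg#*=}"; return 0 ;;
        esac
    done
    return 1
)
```

With the command line `root=/dev/sda2 health-checker-reboot=force quiet`, `cmdline_value health-checker-reboot` would print `force`, and `cmdline_value health-checker` would print nothing.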

Requirements

  • /etc/kernel/tries must contain a number greater than 0. Currently, I am shipping this file in the health-checker package.
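
For context, Automatic Boot Assessment keeps the counters in the entry file name, as `+LEFT` or `+LEFT-DONE` before the suffix. A minimal sketch of reading the tries left from such a name follows; the helper name `tries_left` and the entry names in the usage note are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the number of tries left encoded in a boot
# entry file name such as "name+3.conf" or "name+2-1.conf".
# Returns 1 and prints nothing if the name carries no counter.
tries_left() {
    local name=${1##*/}        # drop the directory part
    name=${name%.conf}         # drop the suffix
    case $name in
        *+*) ;;                # a counter is present
        *) return 1 ;;         # no counter: boot counting not active
    esac
    local counter=${name##*+}  # "LEFT" or "LEFT-DONE"
    echo "${counter%%-*}"      # keep only LEFT
}
```

For example, `tries_left snapshot-5+2-1.conf` prints `2`; an entry whose counter has reached 0 is the one considered bad.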

Current blockers

  • systemd-bless-boot cannot rename the boot entries because of the SELinux enforcing policy. Bug tracked here.

@Vogtinator
Member

What do we do on platforms without EFI vars? Just declare them unsupported by this mechanism?

@danyspin97
Author

What do we do on platforms without EFI vars? Just declare them unsupported by this mechanism?

This version uses EFI vars for simplicity, since it makes it easier to retrieve the current and default boot entries. This can work with any bootloader as long as we know this info and Automatic Boot Assessment is supported.

@danyspin97 danyspin97 marked this pull request as ready for review December 2, 2024 14:53
@danyspin97
Author

I replaced the EFI variables by reimplementing the Automatic Boot Assessment logic in health-checker. I calculate the default entry as the first one, in descending order by name, that has not been disabled by the boot counting. It is quite bare bones, but it works. One possible issue is entries with different kernel versions, in which case health-checker would need a more robust parser. To detect the current entry, I check the snapshot version of the currently mounted snapshot.
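
A bare-bones sketch of that selection, not the code from this PR: the function name `pick_default_entry` and the entry names are hypothetical, and it relies on GNU `sort -V` for the descending order, which is simpler than the sorting systemd-boot itself performs.

```shell
#!/bin/bash
# Hypothetical helper: print the first entry, in descending name order,
# that has not been disabled by boot counting (tries left is not 0).
#   $@: boot entry file names, e.g. "snapshot-9+3.conf"
# Returns 1 if every entry has run out of tries.
pick_default_entry() {
    [ $# -gt 0 ] || return 1
    local sorted entry counter
    # Assumption: reverse version sort stands in for "descending by name".
    sorted=$(printf '%s\n' "$@" | LC_ALL=C sort -rV)
    while IFS= read -r entry; do
        counter=${entry%.conf}
        case $counter in
            *+*)
                counter=${counter##*+}          # "LEFT" or "LEFT-DONE"
                if [ "${counter%%-*}" -eq 0 ]; then
                    continue                    # marked as bad, skip it
                fi
                ;;
        esac
        echo "$entry"
        return 0
    done <<EOF
$sorted
EOF
    return 1
}
```

Given `snapshot-8.conf`, `snapshot-9+3.conf` and `snapshot-10+0-3.conf`, the newest entry has no tries left, so the sketch picks `snapshot-9+3.conf`.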

@danyspin97
Author

I thought a little more about the approach I have taken in this PR. bootctl can set the default entry by changing the EFI variables, so I'd go back to reading the EFI variables first. I still think the bash implementation of the BLS logic for choosing the default is a good fallback for systems that don't support EFI vars.
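
One way the "EFI variable first, BLS logic as fallback" order could look, as a sketch under assumptions: the function name `default_entry_id` is hypothetical, the variable is read through efivarfs (4 attribute bytes followed by a UTF-16LE string), and the decoding only handles ASCII entry names.

```shell
#!/bin/bash
# Hypothetical helper: print the default entry from an EFI variable file,
# or run a fallback command if the variable is not available.
#   $1: path to the efivarfs file, typically
#       /sys/firmware/efi/efivars/LoaderEntryDefault-4a67b082-0a4c-41cf-b6c7-440b29bb8c4f
#   $2..: fallback command (e.g. a BLS-based selection function)
default_entry_id() {
    local efivar=$1
    shift
    if [ -r "$efivar" ]; then
        # Skip the 4 attribute bytes, then drop the NUL bytes of the
        # UTF-16LE encoding (assumption: ASCII-only entry names).
        tail -c +5 "$efivar" | tr -d '\000'
        echo
        return 0
    fi
    "$@"
}
```

On a system without EFI variables the file does not exist, so the fallback command decides the default entry.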

@aplanas

aplanas commented Dec 11, 2024

bootctl follows the BLS and also the BLI, which describes the set of EFI variables that the bootloader follows. Because grub2-bls does not really follow the BLI specification, sdbootutil had to re-implement set/get-default and set/get-timeout for BLI and non-BLI bootloaders.

For example, if we are using systemd-boot on an architecture that does not have EFI variables, sdbootutil sets the configuration in the loader.conf file in the ESP, and on a grub2-bls system it sets grubenv and the EFI variable (or loader.conf), so bootctl always reads the correct information for both bootloaders.

My recommendation is to follow this path, or use sdbootutil set-default and get-default to abstract this part.

Development

Successfully merging this pull request may close these issues.

Option to disable health-checker in Grub