Use systemd-nsresourced to allocate user namespaces and UID/GID ranges #24828
Comments
I only looked at this briefly when it was added, but I don't think it is possible to switch to it with the current podman storage design. We write the sub-UIDs to disk in your home directory as plain directories/files, so this cannot work when the UIDs are transient. And even systemd goes to great lengths to lock this UID "backdoor" down via BPF:
And if we go into the nspawn man page:
So my understanding is that it is impossible to use a normal directory layout.
Can |
I am not sure how the mounting is supposed to work, but I don't think that is the problem. We can natively mount overlayfs in a user namespace without privilege escalation. The issue, I think, is that we cannot write files with these extra UIDs to disk, which means all images would be limited to one UID, or would need something like fuse-overlayfs that can map UIDs dynamically in the extended attributes.
According to my understanding of the BPF-LSM code of nsresourced, this is not enforced. For a mount that is in the userns or in the allowlist, all operations in it are allowed. The "no extra UIDs" limit is enforced by mountfsd. As we can mount overlayfs without privilege, we do not need to use mountfsd and can therefore bypass the limit. These are all guesses on my part; I haven't confirmed them yet. Sorry.
Feature request description
Currently, podman uses `newuidmap` and `newgidmap` from `shadow-utils` to set up UID/GID mapping for user namespaces of rootless containers. This requires predefined UID/GID ranges in `/etc/subuid` and `/etc/subgid`. In some configurations, for example users managed by `systemd-homed` (#12590) and users managed by a network authentication system, users do not have records in `/etc/subuid` and `/etc/subgid`, preventing podman from creating rootless containers.
Systemd 256 has introduced a service `systemd-nsresourced` that exposes a Varlink interface `io.systemd.NamespaceResource`. Unprivileged clients may allocate a user namespace and then request a transient UID/GID range to be assigned to it via this service. Users do not need to have predefined sub-UID/GID ranges. I wonder whether podman can use `systemd-nsresourced` to allocate user namespaces and UID/GID ranges for rootless containers.
Suggest potential solution
Podman can add a code path that uses `systemd-nsresourced` to allocate user namespaces and UID/GID ranges. If it is not running, podman can fall back to `newuidmap` and `newgidmap`. This ensures backward compatibility.
Have you considered any alternatives?
No.
Additional context
systemd/systemd#26826
man:systemd-nsresourced.service(8)
Interface `io.systemd.NamespaceResource`