Instance: Fix deadlock during failed snapshot creation #13821
We need a non-locking delete on an `Instance`. I don't love the type assertion here. However, since CreateInternal returns Instance, we don't have a lot of options beyond enriching that interface or providing an additional one. A solution that doesn't use type assertions would be to define an internal interface for `delete` and possibly a few other functions in the drivers/ package and implement it for common/lxc/qemu. Alternatively, moving the `Interface` interface into the same package as the drivers would allow us to define some private methods on `Interface` instead. Signed-off-by: Wesley Hershberger <[email protected]>
The container does become I tried Other backends:
Didn't try Ceph as I don't have a cluster handy. Let me know if you'd like other changes; thanks!
I think we should log a warning but continue the onstop hook function if the error contains "disk quota exceeded" (or if there is a better way to detect that error).
This at least runs through the reverter that gets built in case snapshot creation fails. Signed-off-by: Wesley Hershberger <[email protected]>
... rootfs to stop. Signed-off-by: Wesley Hershberger <[email protected]>
Correct me if I'm wrong, but the failure of those chown/chmod would only leave the mount point owned by a uid that could potentially be reused for another container. All that does is allow the mount point to be traversed by a process running under that uid that has access to the parent dir. Agreed, not a concern. Thanks for your patience.
Yes, that's my understanding, and either way that doesn't succeed whether we return an error or not; it's just that by returning an error, further bad things happen because the instance's devices are not cleaned up on the host side.
Fixes #13466
We need a non-locking delete on an `Instance`; otherwise a failure while mounting the instance/updating the backup file will deadlock.

I don't love the type assertion here. However, since `CreateInternal` returns `Instance`, we don't have a lot of options beyond enriching that interface or providing an additional one.

In order to avoid using type assertions we'd need to define an internal interface `instance` for `delete` and possibly a few other functions in the drivers/ package and implement it for `lxc`/`qemu`. Alternatively, moving the `Interface` interface into the same package as the drivers would allow us to define some private methods on `Interface` instead.

(Updating the backup file on the full volume was very likely the failure in #13466; this is after applying the fix):
I will throw a snapshot failure test on here this afternoon.
LXD-1117