lxd/instance/drivers/qemu: Pick a random vsock Context ID #11896
Force-pushed from 1ba9452 to 43dfdab.
Force-pushed from 43dfdab to 0bf2040.
Thanks for this, I like where you're going with it.
There are a few improvements I think we can benefit from (see the sketch after this list):
- Just use the volatile vsock ID setting in `getAgentClient`.
- Check if the volatile vsock ID is free in `vsockID()` and, if so, use it.
- If not (or if it isn't set), use the existing stable random generator function with the full UUID.
- Then iterate using the random number generator to find a free ID.
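A minimal Go sketch of that selection order; all helper names here are hypothetical stand-ins, not LXD's actual functions, and the UUID-seeded search itself is sketched further down in this thread:

```go
package main

import "fmt"

// Hypothetical stand-ins for the pieces the list above refers to.
func volatileVsockID() (uint32, bool) { return 0, false } // stored volatile.vsock_id, if any
func vsockIDFree(cid uint32) bool     { return cid > 2 }  // placeholder probe
func nextVsockID(uuid string) uint32  { return 42 }       // stable random search

// vsockID reuses the stored volatile ID when it is still free, and
// otherwise falls back to the stable random search over the full UUID.
func vsockID(instanceUUID string) uint32 {
	if cid, ok := volatileVsockID(); ok && vsockIDFree(cid) {
		return cid
	}

	return nextVsockID(instanceUUID)
}

func main() {
	fmt.Println(vsockID("00000000-0000-0000-0000-000000000000")) // placeholder UUID
}
```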
Force-pushed from 23bd9af to 04ba9dc.
Force-pushed from 04ba9dc to 409403f.
When creating a new HTTPClient from lxd/vsock, use an actual uint32 for the Context ID. Signed-off-by: Julian Pelizäus <[email protected]>
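As a rough illustration of what that looks like, assuming the github.com/mdlayher/vsock package for the dial and a made-up port (LXD's real wiring may differ):

```go
package main

import (
	"context"
	"net"
	"net/http"

	"github.com/mdlayher/vsock"
)

// newVsockHTTPClient builds an HTTP client whose transport dials the
// guest over vsock. The Context ID stays a uint32 end to end, so no
// lossy integer conversions happen on the way to the socket address.
func newVsockHTTPClient(cid uint32, port uint32) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
				return vsock.Dial(cid, port, nil)
			},
		},
	}
}

func main() {
	_ = newVsockHTTPClient(42, 8443) // placeholder Context ID and port
}
```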
Force-pushed from 409403f to 2ae01e2.
Force-pushed from 2ae01e2 to 26e7888.
This is looking good.
Please can you test 2-level nesting with sideloaded LXD binaries to check this works that deep?
The last time we modified the vsock logic, we introduced a regression for that:
Force-pushed from 26e7888 to 56247ce.
Test successful when following the reproducer steps from the original issue (with side-loaded binary in both hv1 & hv2):

```
julian@thinkpad:~$ lxc ls
+------+---------+-----------------------+-----------------------------------------------+-----------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------+---------+-----------------------+-----------------------------------------------+-----------+-----------+
| hv1 | RUNNING | 10.92.91.1 (lxdbr0) | fd42:6bc0:c99e:baad:216:3eff:fe05:8187 (eth0) | CONTAINER | 0 |
| | | 10.159.171.40 (eth0) | fd42:3bf9:4e1d:4704::1 (lxdbr0) | | |
+------+---------+-----------------------+-----------------------------------------------+-----------+-----------+
| hv2 | RUNNING | 10.226.110.1 (lxdbr0) | fd42:f207:c4b9:82d1::1 (lxdbr0) | CONTAINER | 0 |
| | | 10.159.171.140 (eth0) | fd42:6bc0:c99e:baad:216:3eff:fe4a:959 (eth0) | | |
+------+---------+-----------------------+-----------------------------------------------+-----------+-----------+
julian@thinkpad:~$ lxc shell hv1
root@hv1:~# lxc launch images:ubuntu/bionic/cloud bionic --vm
Creating bionic
Starting bionic
root@hv1:~#
logout
julian@thinkpad:~$ lxc shell hv2
root@hv2:~# lxc launch images:ubuntu/bionic/cloud bionic --vm
Creating bionic
Starting bionic
root@hv2:~#
logout
julian@thinkpad:~$ lxc exec hv1 -- lxc config show bionic | grep vsock_id
volatile.vsock_id: "2748304871"
julian@thinkpad:~$ lxc exec hv2 -- lxc config show bionic | grep vsock_id
volatile.vsock_id: "969274292"
```
That looks like a single VM inside a container (so not nested VMs).
Yes, of course. The example above just mounts in the same vsock dev from the host; I missed that. But fortunately the results for nested VMs also look quite promising. Interestingly, just by patching the host LXD (sideloaded), the issue can already be fixed "temporarily", as long as the nested LXDs don't pick an already acquired vsock Context ID by mistake:

```
julian@thinkpad:~$ lxc ls
+------+---------+-------------------------+-------------------------------------------------+-----------------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------+---------+-------------------------+-------------------------------------------------+-----------------+-----------+
| hv1 | RUNNING | 10.82.68.1 (lxdbr0) | fd42:a349:d83a:22cb::1 (lxdbr0) | VIRTUAL-MACHINE | 0 |
| | | 10.159.171.108 (enp5s0) | fd42:6bc0:c99e:baad:216:3eff:fed9:c0d5 (enp5s0) | | |
+------+---------+-------------------------+-------------------------------------------------+-----------------+-----------+
| hv2 | RUNNING | 10.65.72.1 (lxdbr0) | fd42:6bc0:c99e:baad:216:3eff:fe22:c96f (enp5s0) | VIRTUAL-MACHINE | 0 |
| | | 10.159.171.122 (enp5s0) | fd42:5a05:bd97:d7b0::1 (lxdbr0) | | |
+------+---------+-------------------------+-------------------------------------------------+-----------------+-----------+
julian@thinkpad:~$ lxc shell hv1
root@hv1:~# lxc launch images:ubuntu/bionic/cloud bionic --vm
Creating bionic
Starting bionic
root@hv1:~#
logout
julian@thinkpad:~$ lxc shell hv2
root@hv2:~# lxc launch images:ubuntu/bionic/cloud bionic --vm
Creating bionic
Starting bionic
root@hv2:~#
logout
julian@thinkpad:~$ lxc exec hv1 -- lxc config show bionic | grep vsock_id
volatile.vsock_id: "1946733570"
julian@thinkpad:~$ lxc exec hv2 -- lxc config show bionic | grep vsock_id
volatile.vsock_id: "1156093895"
julian@thinkpad:~$ lxc config show hv1 | grep vsock_id
volatile.vsock_id: "1946733567"
julian@thinkpad:~$ lxc config show hv2 | grep vsock_id
volatile.vsock_id: "1156093892"
```

When comparing both, the host and nested VM Context IDs are still nearly identical. After patching LXD in the VMs as well, the picked Context IDs differ completely:

```
julian@thinkpad:~$ lxc ls
+------+---------+-------------------------+-------------------------------------------------+-----------------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------+---------+-------------------------+-------------------------------------------------+-----------------+-----------+
| hv1 | RUNNING | 10.82.68.1 (lxdbr0) | fd42:a349:d83a:22cb::1 (lxdbr0) | VIRTUAL-MACHINE | 0 |
| | | 10.159.171.108 (enp5s0) | fd42:6bc0:c99e:baad:216:3eff:fed9:c0d5 (enp5s0) | | |
+------+---------+-------------------------+-------------------------------------------------+-----------------+-----------+
| hv2 | RUNNING | 10.65.72.1 (lxdbr0) | fd42:6bc0:c99e:baad:216:3eff:fe22:c96f (enp5s0) | VIRTUAL-MACHINE | 0 |
| | | 10.159.171.122 (enp5s0) | fd42:5a05:bd97:d7b0::1 (lxdbr0) | | |
+------+---------+-------------------------+-------------------------------------------------+-----------------+-----------+
julian@thinkpad:~$ lxc shell hv1
root@hv1:~# lxc launch images:ubuntu/bionic/cloud bionic --vm
Creating bionic
Starting bionic
root@hv1:~#
logout
julian@thinkpad:~$ lxc shell hv2
root@hv2:~# lxc launch images:ubuntu/bionic/cloud bionic --vm
Creating bionic
Starting bionic
root@hv2:~#
logout
julian@thinkpad:~$ lxc exec hv1 -- lxc config show bionic | grep vsock_id
volatile.vsock_id: "2443293773"
julian@thinkpad:~$ lxc exec hv2 -- lxc config show bionic | grep vsock_id
volatile.vsock_id: "3647469267"
julian@thinkpad:~$ lxc config show hv1 | grep vsock_id
volatile.vsock_id: "1946733567"
julian@thinkpad:~$ lxc config show hv2 | grep vsock_id
volatile.vsock_id: "1156093892"
```
Retrieving the already existing vsock Context ID for a VM as a client is now handled by a call to getVsockID(). Signed-off-by: Julian Pelizäus <[email protected]>
When acquiring a new Context ID for the communication via vsock, use the UUID of the instance as a seed for generating random uint32 candidates. The loop is kept open until a free Context ID is found or the timeout of 5s is reached. The syscall to the vsock returns ENODEV in case the Context ID is not yet assigned. In case the Context ID of a stopped VM was already acquired again, a new one gets picked. Removes the `vhost_vsock` feature since the value is no longer accessed. Fixes lxc#11508. Signed-off-by: Julian Pelizäus <[email protected]>
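A sketch of that free-check under the same assumptions as above (mdlayher/vsock for the connect, an arbitrary probe port): connecting to an unassigned Context ID fails with ENODEV, which the probe treats as "free".

```go
package main

import (
	"errors"
	"fmt"

	"github.com/mdlayher/vsock"
	"golang.org/x/sys/unix"
)

// vsockIDFree probes a candidate Context ID by connecting to it.
// ENODEV from the connect syscall means no guest currently holds the
// CID; a successful connection (or any other error) counts as in use.
func vsockIDFree(cid uint32, port uint32) bool {
	conn, err := vsock.Dial(cid, port, nil)
	if err != nil {
		return errors.Is(err, unix.ENODEV)
	}

	conn.Close()
	return false
}

func main() {
	fmt.Println(vsockIDFree(42, 8443)) // placeholder Context ID and port
}
```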
Force-pushed from 56247ce to 90ea2ce.
LGTM, we just need to ensure that if the volatile vsock ID key is set to 0-2, we detect this in `nextVsockID` and switch to finding the next free ID.
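For context, the vsock address family reserves Context IDs 0-2, so a guard along these lines (names hypothetical) would force a fresh search for any stored ID in that range:

```go
package main

import "fmt"

// usableCID rejects the reserved vsock Context IDs: 0 is the
// hypervisor, 1 is the local loopback, and 2 is the host itself.
func usableCID(cid uint32) bool {
	return cid > 2
}

func main() {
	for _, cid := range []uint32{0, 1, 2, 3} {
		fmt.Printf("CID %d usable: %v\n", cid, usableCID(cid))
	}
}
```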
@roosterfish I updated the PR description to represent what the PR ended up doing for posterity.
When acquiring a new Context ID for communication via vsock, use a stable random uint32 derived from a random generator seeded with the instance's UUID. If there is a collision, try further random numbers in the stable sequence for up to 5s.
The syscall to the vsock returns ENODEV in case the Context ID is not yet assigned.
Fixes https://github.com/lxc/lxd/issues/11508
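A sketch of that search loop, assuming an FNV hash to turn the UUID into a seed and a stubbed-out probe (the real check is the ENODEV connect sketched earlier):

```go
package main

import (
	"errors"
	"fmt"
	"hash/fnv"
	"math/rand"
	"time"
)

// probe stands in for the real ENODEV-based connect check.
func probe(cid uint32) bool { return cid > 2 }

// nextVsockID walks the stable random sequence seeded from the
// instance UUID until a free Context ID turns up or 5s elapse. The
// same UUID always yields the same candidate sequence, so a VM tends
// to get its old ID back unless another instance grabbed it.
func nextVsockID(instanceUUID string) (uint32, error) {
	h := fnv.New64a()
	h.Write([]byte(instanceUUID))
	r := rand.New(rand.NewSource(int64(h.Sum64())))

	deadline := time.Now().Add(5 * time.Second)
	for time.Now().Before(deadline) {
		if cid := r.Uint32(); probe(cid) {
			return cid, nil
		}
	}

	return 0, errors.New("timed out finding a free vsock Context ID")
}

func main() {
	cid, err := nextVsockID("00000000-0000-0000-0000-000000000000") // placeholder UUID
	fmt.Println(cid, err)
}
```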