Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lxd/instance/exec: Only use keepalives on TCP sockets #13731

Merged
merged 5 commits into from
Jul 10, 2024

Conversation

tomponline
Copy link
Member

Also restore websocket package to latest non-pre release version.

This is to try workaround recently started "Error: websocket: close 1006 (abnormal closure): unexpected EOF" errors on riscv64 builders with LXD 6.1.

cjwatson and others added 2 commits July 10, 2024 08:06
Keepalives aren't particularly useful on Unix sockets, and it seems that
there are some issues with them sometimes being written but not read:
this means that processes launched via `lxc exec` can sometimes
eventually hang because a websocket buffer fills up, causing an attempt
to send a keepalive to return `EAGAIN`, causing LXD to give up on
mirroring output from the corresponding file descriptor, and thus
eventually causing subprocesses to hang when the buffer for their
standard output/error fills up in turn.

I've observed this particularly on slower architectures such as riscv64
and arm64 in the context of non-interactive `lxc exec` via
`launchpad-buildd`, though I don't quite know the exact set of triggers
and so don't have a minimal test case - the best I have is building a
snap from `https://git.launchpad.net/~canonical-lxd/lxd
latest-candidate` in `launchpad-buildd` on riscv64, which reliably hangs
prior to this change.

Signed-off-by: Colin Watson <[email protected]>
(cherry picked from commit 781905a)
As newer versions are currently pre-released versions.

Renovate didn't mention this.

Signed-off-by: Thomas Parrott <[email protected]>
@tomponline tomponline self-assigned this Jul 10, 2024
Copy link
Contributor

@markylaing markylaing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MusicDin
Copy link
Member

One of the tests has failed:

time="2024-07-10T07:48:35Z" level=error msg="Failed to distribute new image" err="close /home/runner/work/lxd/lxd/test/tmp.kYU/W5F/images/bf140ba42a4619aa0e31f6b2e27e8b6d8cd2ef6e4ee2ac36daf1740eb5fdbb34: file already closed" fingerprint=bf140ba42a4619aa0e31f6b2e27e8b6d8cd2ef6e4ee2ac36daf1740eb5fdbb34
time="2024-07-10T07:48:35Z" level=error msg="Error deleting old image from database" ID=1 err="database is locked" fingerprint=b9c8a0319f8cf212938721cb0bfc3db63d0a0bb1330ca594de94c103ff200c08
time="2024-07-10T07:48:35Z" level=error msg="Error deleting old image from database" ID=3 err="database is locked" fingerprint=b9c8a0319f8cf212938721cb0bfc3db63d0a0bb1330ca594de94c103ff200c08

Also, I think go mod tidy can be run, as there is checksum left for gorilla websocket 1.5.3

@tomponline tomponline merged commit b40e87d into canonical:main Jul 10, 2024
27 of 28 checks passed
@tomponline
Copy link
Member Author

Thanks we can tidy up later once we've proved this works time isnt on our side

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants