Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: reduce deadlock potential in shell->client #180

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ethanpailes
Copy link
Contributor

This patch attempts to reduce the potential to deadlock in the shell->client thread by removing the mutex lock around the shell->client unix socket stream. While recent changes to reduce deadlocks have successfully prevented a single locked session from gumming up all the other sessions, the core issue seems to be persisting. I don't see a smoking gun in the logs that a reporter sent me, but reducing the number of locks in the shell->client thread, which seems to be the one that gets locked up, might help. This should also be slight better for performance (not that it really matters here).

In addition to that main change, this patch contains a few little bugfixes:

  • Previously, every time the client probed for daemon presence, the daemon would put a distracting stack trace in its log. This wasn't a real issue, but it could be confusing for people not familiar with how shpool works. I've had a few people point to it in bug reports. I suppressed the stack trace and replaced it with a more civilized log line. Stack traces in the logs should be indicative of an issue, not business as usual.

  • I fixed the child_watcher thread to directly call waitpid rather than using the shpool_pty crate to do the work. This isn't ideal, but is required to avoid a use-after-close issue that was causing trouble. I think this bug was already present, but it just wasn't presenting a problem during tests for whatever reason (likely just timing). I had to fix this to get the tests passing.

This patch attempts to reduce the potential to deadlock
in the shell->client thread by removing the mutex lock
around the shell->client unix socket stream. While recent
changes to reduce deadlocks have successfully prevented
a single locked session from gumming up all the other sessions,
the core issue seems to be persisting. I don't see a smoking
gun in the logs that a reporter sent me, but reducing the
number of locks in the shell->client thread, which seems to
be the one that gets locked up, might help. This should also
be slight better for performance (not that it really matters here).

In addition to that main change, this patch contains a few
little bugfixes:

- Previously, every time the client probed for daemon presence,
  the daemon would put a distracting stack trace in its log. This
  wasn't a real issue, but it could be confusing for people not
  familiar with how shpool works. I've had a few people point to
  it in bug reports. I suppressed the stack trace and replaced it
  with a more civilized log line. Stack traces in the logs should
  be indicative of an issue, not business as usual.

- I fixed the child_watcher thread to directly call waitpid rather
  than using the shpool_pty crate to do the work. This isn't ideal,
  but is required to avoid a use-after-close issue that was causing
  trouble. I think this bug was already present, but it just wasn't
  presenting a problem during tests for whatever reason (likely just
  timing). I had to fix this to get the tests passing.
@ethanpailes ethanpailes requested a review from Aetf December 3, 2024 18:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant