-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
std::process::exit
is not thread-safe in combination with C code calling exit
#126600
Comments
This has been discussed previously in #83994. There the decision was to not do anything about the concerns with Given that I just spend the better part of a weekend debugging a segfault that was caused by concurrent |
On that thread joshtriplett notes:
We do now mark |
Linux libc documentation does claim that |
Doesn't solve the problem with exit getting called from non-rust code. And no, an atexit handler won't help since it leaves a race-window and if you're in a situation where threads concurrently call exit you're already racing. https://github.com/MaterializeInc/database-issues/issues/6528 sounds like it's either a libc bug or the assessment from the previous thread that libc implementations provide the desired behavior - even if the standard language doesn't - needs to be revised. |
I'd guess the safe thing to do is kill all other threads then call |
Not sure if serious... Killing threads would make anything that accesses their stacks UB. Freezing them would work but that's difficult to implement reliably. Libc would be in a position to do it since they control pthreads. But then they might as well fix their locking. |
Adding 32 |
I'd say it makes sense to introduce a lock on the Rust side, since pure-Rust code shouldn't be able to cause UB by calling libc's This also means that Rust cdylibs mustn't call |
Based on how the glibc code is set up it very much looks like I think the previous assumption that glibc's |
That's explained by the fact that one block in the list of exit handlers contains 32 entries, and the last block is never freed. So to trigger the use-after-free you need at least two blocks in the list. glibc installs one exit handler on its own, so 32 more need to be installed to add a free-able block to the list. |
Triage: marking this as |
This comment from the main author of musl was pointed out in the libs-api meeting:
I looked up the spec to confirm and it seems to be correct. https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_01
|
According to the C11 standard, calling C11 7.22.4.4p2
Calling it from two threads at the same time is calling it twice. It is thus explicitly thread-unsafe in C11. |
The POSIX description also contains that sentence. https://pubs.opengroup.org/onlinepubs/009695299/functions/exit.html
So the POSIX spec is kind of in conflict with itself? |
@tbu- See the latter half of #83994 (comment) |
It makes sense to fix this on the C side, too. Currently, we have |
This should probably be reported as a bug to glibc, along with a C program that reproduces the issue. |
Here is a mostly-one-to-one C11 translation of the program in the OP that reproduces a segfault on my machine (Ubuntu 24.04, gcc 13.2.0-23ubuntu4, glibc 2.39-0ubuntu8.2), if someone wants to use it in a bug report. code#include <stdlib.h>
#include <threads.h>
void exit_handler(void) {
thrd_sleep(&(struct timespec){.tv_sec = 1}, NULL);
}
int thread_function(void *arg) {
(void)arg;
exit(0);
}
int main(void) {
for (int i = 0; i < 32; ++i) {
atexit(exit_handler);
}
for (int i = 0; i < 2; ++i) {
thrd_t thread;
thrd_create(&thread, thread_function, NULL);
}
} |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
I don't think it's exactly a bug in glibc, though, and reporting it that way might get the issue bounced. The Linux manual page for
So glibc is at least conforming to its own documentation. As @richfelker said in #83994 (comment), I do think it's worth trying to get this fixed over in the POSIX/C world, but I think that looks like engaging the folks who work on the POSIX and C standards to get this updated in the next version of those standards, and looping the glibc maintainers into those conversations for feedback. |
I do not enjoy feeling compelled to silence discussion, even if it runs hot. I invite any of my peers to reverse this lock if they feel a sufficient cooling-off period to have elapsed, or perhaps if the conversation in question has been moved to https://internals.rust-lang.org/ or somewhere else likely. But this issue is not about debating the deeper semantics of Rust, its abstract goals, or whether modern programs have too many dependencies. It is not even about implementing thread-safe programs despite various forms of destructors existing. It is about a concern about It was previously hypothesized that due to implementation details, this function was actually "thread-safe" (in a meaningful sense) on common implementations. Now that we know this to be false, it does not matter whether the second call leads to undefined behavior because of various cleanup functions being called or because the second call is followed, in the implementing libc, by control-flow arriving at a C expression like |
That's what started the discussion, yes. But that part has been mostly resolved, I think:
But it is still a valid question, and related to thread safety of
From a Rust perspective, I don't think there is any credible alternative to saying that this is a library bug. We cannot say that it is a bug in the code that calls No matter whose bug it is, it will be a bunch of extra work for someone to deal with this and fix the bug, but at least this choice means we get a credible compositional safety story in return, making it better than the alternative. Maybe this issue is the wrong place to discuss that question, but it is a question worth discussing. Or rather, since I think we already know the answer we want to give, we should "just" document that, to make it clear that this is an expectation Rust imposes on libraries: #129581. |
We are also re-introducing I'm pretty much entirely in agreewith with @RalfJung's above comment. Destroying objects that could still be in use from an I think we do need better resources for programmers to understand these issues so that they don't keep happening. Aside from documenting that destroying global objects as part of a dtor/atexit handler is unsafe as far as Rust is concerned, I think most further work on this is outside the scope of the current issue report, which should be considered resolved by adding the Rust-side lock and getting libc implementors and standards on-track for doing locking on our side too. |
As a C library author, I understand the arguments why Rust cannot reasonably prevent this situation, but I also don't think there's anything I can do, given the state of the ecosystem. If platforms could ensure that library finalizers run after the end of library usage, then it would be safe to destroy global resources. But platforms don't, and there's no obvious fix for that. If platforms did not implement library unloading, then there would be no need to release global memory resources. But platforms do, and people would (correctly) report bugs against my libraries if they leaked resources on each library unload. I saw a lot of discussion in this PR which ignored library unloading, or dismissed it as a niche thing that most libraries don't support. I don't think that is correct. I would certainly prefer it if (at least by default) libraries remained mapped after all handles to them are closed and their global values persisted for a subsequent open, but that isn't generally the case, and frameworks like PAM can repeatedly open and close libraries. If portable threads APIs didn't require dynamic mutex initialization and destruction, then the singleton mutexes used to protect global resources could persist after library finalizers. But that isn't the case. (I think the story gets better if you decide to rely on common properties of real-world pthreads implementations. I don't think that's the case for Windows.) One possible platform improvement would be to refrain from unloading libraries during process exit, and somehow communicate to library finalizers whether the process is exiting. Then only finalizations affecting resources beyond the process (like flushing I/O buffers or resetting tty state) would need to be run. |
You can refrain from having global state in your library at all, and instead have an explicit state object that any user of the library is responsible for requesting and releasing. Parts of this can even then be shared between multiple consumers of the library within the same program; having explicit release ensures that you free things only when the last user is done - which might be never, but which will be before library unloading if the library is going to be unloaded. |
That's great if you can manage it. C libraries frequently have history which precludes it, or are written to an API specification which precludes it. |
Then they can be marked non-unloadable so that |
If I can mark my libraries as non-unloadable in an even vaguely portable fashion, then I would like to sign up immediately. I don't think that's a common platform facility. |
For ELF-based platforms,
Libtool may have its own abstraction for how to mark libraries non-unloadable; I'm not sure. None of this is perfectly portable but I think it meets your requirement for "even vaguely portable". At worst you may need to add special cases for Windows and Mac. |
On Windows I’m not sure if making a library non-unloadable is an always an option, as it leaks the DLL’s handles. Fortunately, |
On macOS using thread local storage effectively blocks dlclose. (There are several issues on the rust issue tracker about rust dylibs not getting unloaded when using dlclose. This due to rust's standard library using TLS internally and it by default getting statically linked into dylibs.) |
Indeed. But on the other hand there doesn't seem to be any explicit "never unload" flag for macOS. No nice solution. |
So it seems what you need is a function that is called on Or alternatively, if supporting I am not sure what is the best platform to request features like that -- it'd have to come from libcs, right? |
Yes. My main concern is that the best use of this issue is, I think, continuing to coordinate these efforts. A concern that goes far enough afield, like advocacy for Rust adding an explicit API for exit handlers, is better taken to the actual decision-makers. For solving the issue of dynamic libraries having problems with dlclose, it sounds like we would want either something like "if this magic symbol is included in your dylib, it shouldn't be actually-unloaded at |
For ELF based platforms there is already |
Doesn't that require the person driving the linker to do it? It's not something the author can assure. And ELF is not under the purview of the Austin Group, I think. |
Fair. I feel like coordinating those efforts is out of scope of Rust, but it'd make sense for us to track progress here, and maybe remove our own Discussing how to write C libraries that can tolerate their exit handlers to be invoked at any time while also not leaking resources on |
exit: explain our expectations for the exit handlers registered in a Rust program This documents the position of `@Amanieu` and others in rust-lang#126600: a library with an atexit handler that destroys state that other threads could still be working on is buggy. We do not consider it acceptable for a library to say "you must call the following cleanup function before exiting from `main` or calling `exit`". I don't know if this is established `@rust-lang/libs-api` consensus so I presume this will have to go through FCP. Given that Rust supports concurrency, I don't think there is any way to write a sound Rust wrapper around a library that has such a required cleanup function: even if we made `exit` unsafe, and the Rust wrapper used the scope-with-callback approach to ensure it can run cleanup code before returning from the wrapper (like `thread::scope`), one could still call this wrapper in a second thread and then return from `main` while the wrapper runs. Making this sound would require `std` to provide a way to "block" returning from `main`, so that while the wrapper runs returning from `main` waits until the wrapper is done... that just doesn't seem feasible. The `exit` docs do not seem like the best place to document this, but I also couldn't think of a better one.
exit: explain our expectations for the exit handlers registered in a Rust program This documents the position of ``@Amanieu`` and others in rust-lang#126600: a library with an atexit handler that destroys state that other threads could still be working on is buggy. We do not consider it acceptable for a library to say "you must call the following cleanup function before exiting from `main` or calling `exit`". I don't know if this is established ``@rust-lang/libs-api`` consensus so I presume this will have to go through FCP. Given that Rust supports concurrency, I don't think there is any way to write a sound Rust wrapper around a library that has such a required cleanup function: even if we made `exit` unsafe, and the Rust wrapper used the scope-with-callback approach to ensure it can run cleanup code before returning from the wrapper (like `thread::scope`), one could still call this wrapper in a second thread and then return from `main` while the wrapper runs. Making this sound would require `std` to provide a way to "block" returning from `main`, so that while the wrapper runs returning from `main` waits until the wrapper is done... that just doesn't seem feasible. The `exit` docs do not seem like the best place to document this, but I also couldn't think of a better one.
Rollup merge of rust-lang#129581 - RalfJung:exit, r=joshtriplett exit: explain our expectations for the exit handlers registered in a Rust program This documents the position of ``@Amanieu`` and others in rust-lang#126600: a library with an atexit handler that destroys state that other threads could still be working on is buggy. We do not consider it acceptable for a library to say "you must call the following cleanup function before exiting from `main` or calling `exit`". I don't know if this is established ``@rust-lang/libs-api`` consensus so I presume this will have to go through FCP. Given that Rust supports concurrency, I don't think there is any way to write a sound Rust wrapper around a library that has such a required cleanup function: even if we made `exit` unsafe, and the Rust wrapper used the scope-with-callback approach to ensure it can run cleanup code before returning from the wrapper (like `thread::scope`), one could still call this wrapper in a second thread and then return from `main` while the wrapper runs. Making this sound would require `std` to provide a way to "block" returning from `main`, so that while the wrapper runs returning from `main` waits until the wrapper is done... that just doesn't seem feasible. The `exit` docs do not seem like the best place to document this, but I also couldn't think of a better one.
Even if C/POSIX standard states that exit is not formally thread-unsafe, calling it more than once is UB. The glibc already supports it for the single-thread, and both elf/nodelete2.c and tst-rseq-disable.c call exit from a DSO destructor (which is called by _dl_fini, registered at program startup with __cxa_atexit). However, there are still race issues when it is called more than once concurrently by multiple threads. A recent Rust PR triggered this issue [1], which resulted in an Austin Group ask for clarification [2]. Besides it, there is a discussion to make concurrent calling not UB [3], wtih a defined semantic where any remaining callers block until the first call to exit has finished (reentrant calls, leaving through longjmp, and exceptions are still undefined). For glibc, at least reentrant calls are required to be supported to avoid changing the current behaviour. This requires locking using a recursive lock, where any exit called by atexit() handlers resumes at the point of the current handler (thus avoiding calling the current handle multiple times). Checked on x86_64-linux-gnu and aarch64-linux-gnu. [1] rust-lang/rust#126600 [2] https://austingroupbugs.net/view.php?id=1845 [3] https://www.openwall.com/lists/libc-coord/2024/07/24/4 Reviewed-by: Carlos O'Donell <[email protected]>
This is a rather long thread, can we get a summary of the current status added to the original issue body? What are actionable steps that can be taken to resolve this unsound issue? AFAICT the status appears to be "wait until the rest of the world is fixed", which IMO is tantamount to closing this issue as WONTFIX, given the prospects of that ever happening. And I'm not entirely satisfied by the dismissal of ameliorations on the Rust side (either adding a lock or marking as |
The pure Rust fix would be to have some Rust API for adding on exit functions. This could side-step the issue entirely so long as you ignore C. Note that we already have a lock to mitigate this issue. However it's by no means a fix. |
Things are looking rosier than that. Posix, musl and glibc agreed to improve things and if I read the comments on the posix issue correctly they'll also bring it to the appropriate C committee. |
I have gathered a short summary and a few links from other comments into the top comment. Edit away if there's anything to correct. |
std::process::exit
is not thread-safestd::process::exit
is not thread-safe in combination with C code calling exit
The current implementation of
std::process::exit
is unsound on Linux and possibly related platforms where it defers to libc'sexit()
function.exit()
is documented to be not thread-safe, and hencestd::process::exit
is not thread-safe either, despite being a non-unsafe
function.To show that this isn't just a theoretical problem, here is a minimal example that segfaults on my machine (Ubuntu with glibc 2.37):
The example contains
unsafe
code, but only to install exit handlers. AFAICT nothing about thelibc::atexit
call is unsafe. The UB is introduced by callingstd::process::exit
concurrently afterwards.If you are curious, https://github.com/MaterializeInc/database-issues/issues/6528 lays out what causes the segfault (it's a use-after-free). That's not terribly relevant though, given that glibc has no obligation to ensure
exit()
is thread safe when it's clearly documented not to be. Instead, Rust should either markstd::process::exit
asunsafe
(which is something it could do for the 2024 edition), or introduce locking to ensure only a single thread gets to callexit()
at a time.Current status
Mitigation on our end (only covers pure Rust programs): #126606
The current status is that the specification language stems from a time when C did not specify multithreading and the intent was to forbid reentrancy. External discussion indicates that there's intent to fix it both in libc implementations and the specs:
The text was updated successfully, but these errors were encountered: