Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std::process::exit is not thread-safe in combination with C code calling exit #126600

Open
teskje opened this issue Jun 17, 2024 · 153 comments
Open
Labels
C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@teskje
Copy link

teskje commented Jun 17, 2024

The current implementation of std::process::exit is unsound on Linux and possibly related platforms where it defers to libc's exit() function. exit() is documented to be not thread-safe, and hence std::process::exit is not thread-safe either, despite being a non-unsafe function.

To show that this isn't just a theoretical problem, here is a minimal example that segfaults on my machine (Ubuntu with glibc 2.37):

use std::thread;

fn main() {
    for _ in 0..32 {
        unsafe { libc::atexit(exit_handler) };
    }
    for _ in 0..2 {
        thread::spawn(|| std::process::exit(0));
    }
}

extern "C" fn exit_handler() {
    thread::sleep_ms(1000);
}

The example contains unsafe code, but only to install exit handlers. AFAICT nothing about the libc::atexit call is unsafe. The UB is introduced by calling std::process::exit concurrently afterwards.

If you are curious, https://github.com/MaterializeInc/database-issues/issues/6528 lays out what causes the segfault (it's a use-after-free). That's not terribly relevant though, given that glibc has no obligation to ensure exit() is thread safe when it's clearly documented not to be. Instead, Rust should either mark std::process::exit as unsafe (which is something it could do for the 2024 edition), or introduce locking to ensure only a single thread gets to call exit() at a time.


Current status

Mitigation on our end (only covers pure Rust programs): #126606

The current status is that the specification language stems from a time when C did not specify multithreading and the intent was to forbid reentrancy. External discussion indicates that there's intent to fix it both in libc implementations and the specs:

@teskje teskje added the C-bug Category: This is a bug. label Jun 17, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jun 17, 2024
@teskje
Copy link
Author

teskje commented Jun 17, 2024

This has been discussed previously in #83994. There the decision was to not do anything about the concerns with std::process::exit, mostly based on the reasoning that exit() is thread safe. The above example shows it's clearly not, neither according to the documentation nor in the implementation.

Given that I just spend the better part of a weekend debugging a segfault that was caused by concurrent std::process::exit invocations I feel quite strongly that this issue deserves reconsideration.

@ChrisDenton
Copy link
Member

On that thread joshtriplett notes:

The same arguments apply here as they do with environment handling: some types of unsynchronized changes to the environment from C code could race with an otherwise-synchronized one from Rust on some platforms, but we don't mark the corresponding functions in std::env as unsafe.

We do now mark std::env::set_var as unsafe so maybe thinking on this has changed. Though I'm not sure how you would mark returning from main as being unsafe?

@conradludgate
Copy link
Contributor

conradludgate commented Jun 17, 2024

Linux libc documentation does claim that exit() is thread-unsafe: "MT-Unsafe race:exit", although the original thread suggests that all libc implementations introduce appropriate locks. Perhaps introducing a Once before calling libc::exit is sufficient and not a performance blocker?

@the8472
Copy link
Member

the8472 commented Jun 17, 2024

Doesn't solve the problem with exit getting called from non-rust code. And no, an atexit handler won't help since it leaves a race-window and if you're in a situation where threads concurrently call exit you're already racing.

https://github.com/MaterializeInc/database-issues/issues/6528 sounds like it's either a libc bug or the assessment from the previous thread that libc implementations provide the desired behavior - even if the standard language doesn't - needs to be revised.

@ChrisDenton
Copy link
Member

I'd guess the safe thing to do is kill all other threads then call exit.

@the8472
Copy link
Member

the8472 commented Jun 17, 2024

Not sure if serious...

Killing threads would make anything that accesses their stacks UB. Freezing them would work but that's difficult to implement reliably. Libc would be in a position to do it since they control pthreads. But then they might as well fix their locking.

@tbu-
Copy link
Contributor

tbu- commented Jun 17, 2024

Adding 32 atexit handlers seems to be essential to make this segfault on my glibc 2.39+r52+gf8e4623421-1 on Arch Linux. With 31 atexit handlers, it doesn't segfault.

@tbu-
Copy link
Contributor

tbu- commented Jun 17, 2024

I'd say it makes sense to introduce a lock on the Rust side, since pure-Rust code shouldn't be able to cause UB by calling libc's exit on different threads.

This also means that Rust cdylibs mustn't call std::process::exit since they might not be the only language running, and applications must ensure that no foreign functions call exit. The latter should be a bug anyway due to the C standard saying that multiple calls to exit are UB.

@teskje
Copy link
Author

teskje commented Jun 17, 2024

https://github.com/MaterializeInc/database-issues/issues/6528 sounds like it's either a libc bug or the assessment from the previous thread that libc implementations provide the desired behavior - even if the standard language doesn't - needs to be revised.

Based on how the glibc code is set up it very much looks like exit isn't intended to be thread safe. It uses a lock, but releases it every time it calls one of the exit handlers. That'd make no sense if it wanted to prevent other threads from modifying the list of exit handlers concurrently. The lock is useful for preventing corruption in the face of other threads calling atexit in parallel. There is a comment that says that this is its intended purpose (though admittedly it could be clearer).

I think the previous assumption that glibc's exit is thread-safe was a misconception based on the existence of this lock alone.

@jieyouxu jieyouxu added T-libs Relevant to the library team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jun 17, 2024
@teskje
Copy link
Author

teskje commented Jun 17, 2024

Adding 32 atexit handlers seems to be essential to make this segfault on my glibc 2.39+r52+gf8e4623421-1 on Arch Linux. With 31 atexit handlers, it doesn't segfault.

That's explained by the fact that one block in the list of exit handlers contains 32 entries, and the last block is never freed. So to trigger the use-after-free you need at least two blocks in the list. glibc installs one exit handler on its own, so 32 more need to be installed to add a free-able block to the list.

@jieyouxu jieyouxu added the I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness label Jun 17, 2024
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jun 17, 2024
@jieyouxu
Copy link
Member

jieyouxu commented Jun 17, 2024

Triage: marking this as I-unsound, but please readjust the label accordingly.

@ChrisDenton ChrisDenton added the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Jun 18, 2024
@ChrisDenton
Copy link
Member

This comment from the main author of musl was pointed out in the libs-api meeting:

POSIX requires exit be thread-safe; it is not one of the functions listed in the exceptions, and all functions are required to be thread-safe by default. My understanding is that C11 also requires this.

I looked up the spec to confirm and it seems to be correct. https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_01

All functions defined by this volume of POSIX.1-2017 shall be thread-safe, except that the following functions1 need not be thread-safe.

exit is not on the list.

@tbu-
Copy link
Contributor

tbu- commented Jun 18, 2024

According to the C11 standard, calling exit more than once is undefined behavior.

C11 7.22.4.4p2

If a program calls the exit function more than once, or calls the quick_exit function in addition to the exit function, the behavior is undefined.

Calling it from two threads at the same time is calling it twice. It is thus explicitly thread-unsafe in C11.

@tbu-
Copy link
Contributor

tbu- commented Jun 18, 2024

The POSIX description also contains that sentence.

https://pubs.opengroup.org/onlinepubs/009695299/functions/exit.html

If exit() is called more than once, the behavior is undefined.

So the POSIX spec is kind of in conflict with itself?

@ericlagergren
Copy link

@tbu- See the latter half of #83994 (comment)

@tbu-
Copy link
Contributor

tbu- commented Jun 18, 2024

If there's any proper action to be had here, it's on the C side. The text about calling exit more than once being UB was written in the absence of threads and was obviously intended to make recursive exit (via atexit handlers) undefined, not to be a constraint on MT programs outside the scope of C. This should just be fixed (at least in POSIX and other more specific implementation specs) to make it so only the recursive case is undefined.

It makes sense to fix this on the C side, too.

Currently, we have process::exit causing practical UB on a tier 1 platform, it's not just a hypothetical issue, so I think something like #126606 also makes sense. Basically making sure exit() is only called once, and other threads calling exit simply going to sleep forever (until they're cleaned up by the OS).

@Amanieu
Copy link
Member

Amanieu commented Jun 19, 2024

This should probably be reported as a bug to glibc, along with a C program that reproduces the issue.

@zachs18
Copy link
Contributor

zachs18 commented Jun 21, 2024

Here is a mostly-one-to-one C11 translation of the program in the OP that reproduces a segfault on my machine (Ubuntu 24.04, gcc 13.2.0-23ubuntu4, glibc 2.39-0ubuntu8.2), if someone wants to use it in a bug report.

code
#include <stdlib.h>
#include <threads.h>

void exit_handler(void) {
    thrd_sleep(&(struct timespec){.tv_sec = 1}, NULL);
}

int thread_function(void *arg) {
    (void)arg;
    exit(0);
}

int main(void) {
    for (int i = 0; i < 32; ++i) {
        atexit(exit_handler);
    }

    for (int i = 0; i < 2; ++i) {
        thrd_t thread;
        thrd_create(&thread, thread_function, NULL);
    }
}

@carbotaniuman

This comment was marked as outdated.

@tbu-

This comment was marked as outdated.

@benesch
Copy link
Contributor

benesch commented Jul 1, 2024

This should probably be reported as a bug to glibc, along with a C program that reproduces the issue.

I don't think it's exactly a bug in glibc, though, and reporting it that way might get the issue bounced. The Linux manual page for exit(3) says clearly:

The exit() function uses a global variable that is not protected, so it is not thread-safe.

So glibc is at least conforming to its own documentation.

As @richfelker said in #83994 (comment), I do think it's worth trying to get this fixed over in the POSIX/C world, but I think that looks like engaging the folks who work on the POSIX and C standards to get this updated in the next version of those standards, and looping the glibc maintainers into those conversations for feedback.

@rust-lang rust-lang locked as too heated and limited conversation to collaborators Aug 23, 2024
@workingjubilee
Copy link
Member

workingjubilee commented Aug 23, 2024

I do not enjoy feeling compelled to silence discussion, even if it runs hot. I invite any of my peers to reverse this lock if they feel a sufficient cooling-off period to have elapsed, or perhaps if the conversation in question has been moved to https://internals.rust-lang.org/ or somewhere else likely. But this issue is not about debating the deeper semantics of Rust, its abstract goals, or whether modern programs have too many dependencies. It is not even about implementing thread-safe programs despite various forms of destructors existing.

It is about a concern about std::process::exit that makes any multithreaded program that calls std::process::exit "simultaneously" on two threads unsound. This fact remains even if we remove atexit, finalizers, and the like from the picture. The C Standard and POSIX, that I can see, say that two calls to exit within the lifetime of the same program causes the following behavior to not be defined. There is no additional caveat or requirement. No "if atexit was ever called". Merely a second exit will do.

It was previously hypothesized that due to implementation details, this function was actually "thread-safe" (in a meaningful sense) on common implementations. Now that we know this to be false, it does not matter whether the second call leads to undefined behavior because of various cleanup functions being called or because the second call is followed, in the implementing libc, by control-flow arriving at a C expression like *NULL.

@RalfJung
Copy link
Member

RalfJung commented Aug 25, 2024

It is about a concern about std::process::exit that makes any multithreaded program that calls std::process::exit "simultaneously" on two threads unsound. This fact remains even if we remove atexit, finalizers, and the like from the picture. The C Standard and POSIX, that I can see, say that two calls to exit within the lifetime of the same program causes the following behavior to not be defined. There is no additional caveat or requirement. No "if atexit was ever called". Merely a second exit will do.

That's what started the discussion, yes. But that part has been mostly resolved, I think:

  • We added a lock on the Rust side to ensure that the 2nd exit will not happen. This is a work-around, not good enough for when exit also gets called by non-Rust code, but it helps.
  • We started the slow process towards having libc's commit to, effectively, carry such a lock themselves. According to information here, glibc 2.41 will have such a lock. FreeBSD libc received a similar patch.

But it is still a valid question, and related to thread safety of exit, to ask "whose bug it is" when we have a situation like

  • some library's exit handler (via atexit or fini sections or whatever) destroys a global resource (such as a mutex, in the case mentioned above)
  • some code calls exit, or returns from main
  • another thread is still running library code that accesses the mutex that is destroyed by the atexit handler

From a Rust perspective, I don't think there is any credible alternative to saying that this is a library bug. We cannot say that it is a bug in the code that calls exit, because we cannot make "return from main" an unsafe operation. (The only possibility I was able to come up with here is for the standard library to provide a "do not leave main" lock that libraries can acquire to ensure return-from-main blocks until library cleanup was done. That seems terrible.)

No matter whose bug it is, it will be a bunch of extra work for someone to deal with this and fix the bug, but at least this choice means we get a credible compositional safety story in return, making it better than the alternative.

Maybe this issue is the wrong place to discuss that question, but it is a question worth discussing. Or rather, since I think we already know the answer we want to give, we should "just" document that, to make it clear that this is an expectation Rust imposes on libraries: #129581.

@richfelker
Copy link

We are also re-introducing exit locking in musl libc, to be included in version 1.2.6.

I'm pretty much entirely in agreewith with @RalfJung's above comment. Destroying objects that could still be in use from an atexit handler is a library bug. Unfortunately this also means the very existence of non-PoD global C++ objects in a library is usually a library bug. There are probably general patterns that can be used to mitigate it, however.

I think we do need better resources for programmers to understand these issues so that they don't keep happening.

Aside from documenting that destroying global objects as part of a dtor/atexit handler is unsafe as far as Rust is concerned, I think most further work on this is outside the scope of the current issue report, which should be considered resolved by adding the Rust-side lock and getting libc implementors and standards on-track for doing locking on our side too.

@greghudson
Copy link

From a Rust perspective, I don't think there is any credible alternative to saying that this is a library bug

As a C library author, I understand the arguments why Rust cannot reasonably prevent this situation, but I also don't think there's anything I can do, given the state of the ecosystem.

If platforms could ensure that library finalizers run after the end of library usage, then it would be safe to destroy global resources. But platforms don't, and there's no obvious fix for that.

If platforms did not implement library unloading, then there would be no need to release global memory resources. But platforms do, and people would (correctly) report bugs against my libraries if they leaked resources on each library unload. I saw a lot of discussion in this PR which ignored library unloading, or dismissed it as a niche thing that most libraries don't support. I don't think that is correct. I would certainly prefer it if (at least by default) libraries remained mapped after all handles to them are closed and their global values persisted for a subsequent open, but that isn't generally the case, and frameworks like PAM can repeatedly open and close libraries.

If portable threads APIs didn't require dynamic mutex initialization and destruction, then the singleton mutexes used to protect global resources could persist after library finalizers. But that isn't the case. (I think the story gets better if you decide to rely on common properties of real-world pthreads implementations. I don't think that's the case for Windows.)

One possible platform improvement would be to refrain from unloading libraries during process exit, and somehow communicate to library finalizers whether the process is exiting. Then only finalizations affecting resources beyond the process (like flushing I/O buffers or resetting tty state) would need to be run.

@richfelker
Copy link

As a C library author, I understand the arguments why Rust cannot reasonably prevent this situation, but I also don't think there's anything I can do, given the state of the ecosystem.

You can refrain from having global state in your library at all, and instead have an explicit state object that any user of the library is responsible for requesting and releasing. Parts of this can even then be shared between multiple consumers of the library within the same program; having explicit release ensures that you free things only when the last user is done - which might be never, but which will be before library unloading if the library is going to be unloaded.

@greghudson
Copy link

You can refrain from having global state in your library at all

That's great if you can manage it. C libraries frequently have history which precludes it, or are written to an API specification which precludes it.

@richfelker
Copy link

That's great if you can manage it. C libraries frequently have history which precludes it, or are written to an API specification which precludes it.

Then they can be marked non-unloadable so that dlopen/dlclose N times does not leak O(N) resources but leaves O(1) resources live for the entire program lifetime. Perfectly acceptable behavior.

@greghudson
Copy link

Then they can be marked non-unloadable

If I can mark my libraries as non-unloadable in an even vaguely portable fashion, then I would like to sign up immediately. I don't think that's a common platform facility.

@richfelker
Copy link

If I can mark my libraries as non-unloadable in an even vaguely portable fashion, then I would like to sign up immediately. I don't think that's a common platform facility.

For ELF-based platforms, -Wl,-z,nodelete does it. If the library knows its own name, dlopen(myname,RTLD_NOW|RTLD_LOCAL) and throwing away the result does it at runtime on any platform with dlopen. If you want to write code that's compatible with static linking into an environment that might not have dlopen, declaring a weak reference to dlopen and testing it before calling should work.

#include <dlfcn.h>
__typeof__(dlopen) dlopen __attribute__((__weak__));
...
if (dlopen) ...

Libtool may have its own abstraction for how to mark libraries non-unloadable; I'm not sure.

None of this is perfectly portable but I think it meets your requirement for "even vaguely portable". At worst you may need to add special cases for Windows and Mac.

@DemiMarie
Copy link
Contributor

On Windows I’m not sure if making a library non-unloadable is an always an option, as it leaks the DLL’s handles. Fortunately, DllMain on Windows takes an argument indicating whether it is called due to process termination (in which case cleaning up resources is not safe or needed) or due to DLL unloading (in which case cleaning up is both safe and needed).

@bjorn3
Copy link
Member

bjorn3 commented Aug 25, 2024

On macOS using thread local storage effectively blocks dlclose. (There are several issues on the rust issue tracker about rust dylibs not getting unloaded when using dlclose. This due to rust's standard library using TLS internally and it by default getting statically linked into dylibs.)

@comex
Copy link
Contributor

comex commented Aug 25, 2024

On macOS using thread local storage effectively blocks dlclose. (There are several issues on the rust issue tracker about rust dylibs not getting unloaded when using dlclose. This due to rust's standard library using TLS internally and it by default getting statically linked into dylibs.)

Indeed. But on the other hand there doesn't seem to be any explicit "never unload" flag for macOS. No nice solution.

@RalfJung
Copy link
Member

As a C library author, I understand the arguments why Rust cannot reasonably prevent this situation, but I also don't think there's anything I can do, given the state of the ecosystem.

So it seems what you need is a function that is called on dlclose but not on exit? And then for dlclose, it is the caller's requirement to ensure that no thread runs library code any more (that's anyway required) so there can't be any conflict with concurrently running code?

Or alternatively, if supporting dlclose is not even your intent -- some way to tell libc about that so that it doesn't end up being incorrectly half-unloaded or so?

I am not sure what is the best platform to request features like that -- it'd have to come from libcs, right?

@workingjubilee
Copy link
Member

We #126600 (comment) the slow process towards having libc's commit to, effectively, carry such a lock themselves. According to information here, glibc 2.41 will have such a lock. FreeBSD libc received a similar patch.

Yes. My main concern is that the best use of this issue is, I think, continuing to coordinate these efforts. A concern that goes far enough afield, like advocacy for Rust adding an explicit API for exit handlers, is better taken to the actual decision-makers.

For solving the issue of dynamic libraries having problems with dlclose, it sounds like we would want either something like "if this magic symbol is included in your dylib, it shouldn't be actually-unloaded at dlclose time" going on? The appropriate venue for that is probably POSIX, as dlopen is not a Standard C function.

@bjorn3
Copy link
Member

bjorn3 commented Aug 26, 2024

For solving the issue of dynamic libraries having problems with dlclose, it sounds like we would want either something like "if this magic symbol is included in your dylib, it shouldn't be actually-unloaded at dlclose time" going on? The appropriate venue for that is probably POSIX, as dlopen is not a Standard C function.

For ELF based platforms there is already -Wl,-z,nodelete as noted a couple of comments back.

@workingjubilee
Copy link
Member

workingjubilee commented Aug 26, 2024

Doesn't that require the person driving the linker to do it? It's not something the author can assure. And ELF is not under the purview of the Austin Group, I think.

@RalfJung
Copy link
Member

Yes. My main concern is that the best use of this issue is, I think, continuing to coordinate these efforts.

Fair. I feel like coordinating those efforts is out of scope of Rust, but it'd make sense for us to track progress here, and maybe remove our own exit lock once all libc guarantee thread-safe exit (but with the rate at which we increment our min glibc version, that will take many years).

Discussing how to write C libraries that can tolerate their exit handlers to be invoked at any time while also not leaking resources on dlclose (or somehow declaring they don't support being unloaded on dlclose) is also out of scope of Rust, but people are inevitably going to discuss this anyway, so I've created a forum thread for this.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 27, 2024
exit: explain our expectations for the exit handlers registered in a Rust program

This documents the position of `@Amanieu` and others in rust-lang#126600: a library with an atexit handler that destroys state that other threads could still be working on is buggy. We do not consider it acceptable for a library to say "you must call the following cleanup function before exiting from `main` or calling `exit`". I don't know if this is established `@rust-lang/libs-api`  consensus so I presume this will have to go through FCP.

Given that Rust supports concurrency, I don't think there is any way to write a sound Rust wrapper around a library that has such a required cleanup function: even if we made `exit` unsafe, and the Rust wrapper used the scope-with-callback approach to ensure it can run cleanup code before returning from the wrapper (like `thread::scope`), one could still call this wrapper in a second thread and then return from `main` while the wrapper runs. Making this sound would require `std` to provide a way to "block" returning from `main`, so that while the wrapper runs returning from `main` waits until the wrapper is done... that just doesn't seem feasible.

The `exit` docs do not seem like the best place to document this, but I also couldn't think of a better one.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 27, 2024
exit: explain our expectations for the exit handlers registered in a Rust program

This documents the position of ``@Amanieu`` and others in rust-lang#126600: a library with an atexit handler that destroys state that other threads could still be working on is buggy. We do not consider it acceptable for a library to say "you must call the following cleanup function before exiting from `main` or calling `exit`". I don't know if this is established ``@rust-lang/libs-api``  consensus so I presume this will have to go through FCP.

Given that Rust supports concurrency, I don't think there is any way to write a sound Rust wrapper around a library that has such a required cleanup function: even if we made `exit` unsafe, and the Rust wrapper used the scope-with-callback approach to ensure it can run cleanup code before returning from the wrapper (like `thread::scope`), one could still call this wrapper in a second thread and then return from `main` while the wrapper runs. Making this sound would require `std` to provide a way to "block" returning from `main`, so that while the wrapper runs returning from `main` waits until the wrapper is done... that just doesn't seem feasible.

The `exit` docs do not seem like the best place to document this, but I also couldn't think of a better one.
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Aug 27, 2024
Rollup merge of rust-lang#129581 - RalfJung:exit, r=joshtriplett

exit: explain our expectations for the exit handlers registered in a Rust program

This documents the position of ``@Amanieu`` and others in rust-lang#126600: a library with an atexit handler that destroys state that other threads could still be working on is buggy. We do not consider it acceptable for a library to say "you must call the following cleanup function before exiting from `main` or calling `exit`". I don't know if this is established ``@rust-lang/libs-api``  consensus so I presume this will have to go through FCP.

Given that Rust supports concurrency, I don't think there is any way to write a sound Rust wrapper around a library that has such a required cleanup function: even if we made `exit` unsafe, and the Rust wrapper used the scope-with-callback approach to ensure it can run cleanup code before returning from the wrapper (like `thread::scope`), one could still call this wrapper in a second thread and then return from `main` while the wrapper runs. Making this sound would require `std` to provide a way to "block" returning from `main`, so that while the wrapper runs returning from `main` waits until the wrapper is done... that just doesn't seem feasible.

The `exit` docs do not seem like the best place to document this, but I also couldn't think of a better one.
Chiichen pushed a commit to DragonOS-Community/glibc-mirror that referenced this issue Oct 18, 2024
Even if C/POSIX standard states that exit is not formally thread-unsafe,
calling it more than once is UB.  The glibc already supports
it for the single-thread, and both elf/nodelete2.c and tst-rseq-disable.c
call exit from a DSO destructor (which is called by _dl_fini, registered
at program startup with __cxa_atexit).

However, there are still race issues when it is called more than once
concurrently by multiple threads.  A recent Rust PR triggered this
issue [1], which resulted in an Austin Group ask for clarification [2].
Besides it, there is a discussion to make concurrent calling not UB [3],
wtih a defined semantic where any remaining callers block until the first
call to exit has finished (reentrant calls, leaving through longjmp, and
exceptions are still undefined).

For glibc, at least reentrant calls are required to be supported to avoid
changing the current behaviour.  This requires locking using a recursive
lock, where any exit called by atexit() handlers resumes at the point of
the current handler (thus avoiding calling the current handle multiple
times).

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

[1] rust-lang/rust#126600
[2] https://austingroupbugs.net/view.php?id=1845
[3] https://www.openwall.com/lists/libc-coord/2024/07/24/4
Reviewed-by: Carlos O'Donell <[email protected]>
@bstrie
Copy link
Contributor

bstrie commented Oct 23, 2024

This is a rather long thread, can we get a summary of the current status added to the original issue body? What are actionable steps that can be taken to resolve this unsound issue? AFAICT the status appears to be "wait until the rest of the world is fixed", which IMO is tantamount to closing this issue as WONTFIX, given the prospects of that ever happening. And I'm not entirely satisfied by the dismissal of ameliorations on the Rust side (either adding a lock or marking as unsafe). If we can do better, then let's do better, even if preventing people from linking to broken C code is out of our power.

@ChrisDenton
Copy link
Member

The pure Rust fix would be to have some Rust API for adding on exit functions. This could side-step the issue entirely so long as you ignore C.

Note that we already have a lock to mitigate this issue. However it's by no means a fix.

@the8472
Copy link
Member

the8472 commented Oct 23, 2024

AFAICT the status appears to be "wait until the rest of the world is fixed", which IMO is tantamount to closing this issue as WONTFIX, given the prospects of that ever happening.

Things are looking rosier than that. Posix, musl and glibc agreed to improve things and if I read the comments on the posix issue correctly they'll also bring it to the appropriate C committee.

@the8472
Copy link
Member

the8472 commented Oct 23, 2024

I have gathered a short summary and a few links from other comments into the top comment. Edit away if there's anything to correct.

@RalfJung RalfJung changed the title std::process::exit is not thread-safe std::process::exit is not thread-safe in combination with C code calling exit Oct 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests