Thread-local storage doesn't work in PIE #352
https://bugs.freedesktop.org/show_bug.cgi?id=35268 contains an interesting discussion of how the "mesa" shared library uses the initial-exec TLS model, and it's still possible to dlopen() it because glibc deliberately allocates for the main program (for us - the kernel) a surplus size of TLS, exactly for this purpose. The Musl author points out in this thread that he hates this solution because it wastes additional space per-thread. Also, I think this solution will only work for the "initial exec" model but not for the even more optimized "local exec" model? (see http://dev.gentoo.org/~dberkholz/articles/toolchain/tls.pdf for the long description of all these models). |
The problem with the "local exec" model is that the executable assumes that it is the first loaded object and therefore that the offset to its TLS block (relative to fs:0) is known at compile time. The kernel assumes the same, and as a result the TLS of the executable and the TLS of the kernel overlap. "Initial exec" is easier, as OSv can decide what the offset (again, relative to fs:0) of the object's TLS is. For threads created after loading the executable (i.e. when the kernel is already aware of the static TLS defined by the executable) it is quite straightforward to implement without the need for any hacks. The reclaimer thread and other kernel threads that may end up executing user code (are there any others?) may be a problem though. |
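To make the difference between the two models a bit more concrete, here is a minimal sketch, assuming x86-64, GCC or Clang, and a build as an ordinary executable (PIE or not) on Linux. The variable and function names are made up for illustration and are not from OSv. Both variables live in the static TLS block below the thread pointer; the difference is that the local-exec offset is fixed at link time, while the initial-exec offset is read from the GOT at run time, which is what lets the loader choose it.

```c
#include <stdint.h>

/* Hypothetical variables, for illustration only (not OSv code). */
__thread int le_counter __attribute__((tls_model("local-exec"))) = 7;
__thread int ie_counter __attribute__((tls_model("initial-exec"))) = 9;

/* On x86-64 the word at %fs:0 holds the thread pointer itself. */
static inline uintptr_t thread_pointer(void)
{
    uintptr_t tp;
    __asm__ volatile("mov %%fs:0, %0" : "=r"(tp));
    return tp;
}

/* Signed byte offset of each variable from the thread pointer. */
long le_offset(void)
{
    return (long)((uintptr_t)&le_counter - thread_pointer());
}

long ie_offset(void)
{
    return (long)((uintptr_t)&ie_counter - thread_pointer());
}

/* Plain reads, so the initial values can be checked too. */
int le_value(void) { return le_counter; }
int ie_value(void) { return ie_counter; }
```

Both offsets come out negative on x86-64, since the static TLS block sits just below the thread pointer. If the kernel and the executable each assume their local-exec block ends at fs:0, these are exactly the ranges that collide.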
As explained in issue #352, a PIE (position-independent executable) using TLS (thread-local storage) does not currently work correctly. Fixing this case is not easy, but until we do, it is dangerous to allow this case to be ignored, as it causes reading and writing to wrong positions in memory whenever TLS is used in the PIE. So this patch adds a warning printout (to stdout) whenever one attempts to load a PIE which has any TLS variables. This is, of course, just a temporary measure. The real solution is to fix object load and not just warn. Signed-off-by: Nadav Har'El <[email protected]> Signed-off-by: Avi Kivity <[email protected]>
In addition to breaking PIEs, as described above, another problem we have by not supporting initial-exec is that shared-libraries that were deliberately (for performance reasons) compiled with the initial-exec TLS model cannot run on OSv. One such library that cannot run on OSv today is gcc's OpenMP library (libgomp), which is deliberately compiled with initial-exec TLS (see https://gcc.gnu.org/ml/gcc-patches/2012-04/msg01347.html). |
In commit 8e99d15, @pdziepak already added support for the initial-exec model. But for reasons explained above by @pdziepak local-exec model is still not supported even today. I wonder how much performance it would cost us to switch the kernel to the initial-exec model, i.e., it will need to use an offset to find its part of TLS area (and can't assume the kernel's offset is zero), so we can allow the application (just one of them...) to use local-exec. Or, I wonder if there's another way for us to support local-exec. |
Well, there might be another solution, but it is extremely messy. The idea is to switch It is not easy since OSv isn't separated from the applications as much as traditional kernels are, and the dynamic linker needs to be hijacked to insert some trampoline code. There are also exceptions to be dealt with... An early, incomplete and possibly broken attempt at doing something like this is available here. |
I have just realized that since version 1.6 Golang supports "--buildmode=pie", but unfortunately the generated PIE uses local-exec (as you can see, the app executes but then blows up at the end):
|
I have just added two new Golang examples built with --buildmode=pie and they work just fine (please see cloudius-systems/osv-apps@4485cc1 for details). It turns out that the crash from my older comment was really caused by a missing sys_exit_group call, which has since been added (see a52e73c). One will see 'WARNING: /httpserver.so is a PIE using TLS. This is currently unsupported (see issue #352). Link with '-shared' instead of '-pie'.' when running golang-pie-example or golang-pie-httpserver. However, both examples function just fine, with httpserver behaving normally under stress tests. It might be that the Golang runtime does not use thread-local variables. In either case it is good news, as it seems we have another, simpler way to run Golang apps without a wrapper. The only caveat I saw was this stack trace when pressing Ctrl-C on the running httpserver example:
|
As I was trying to understand how local-exec works I came across yet another article that describes all TLS scenarios with good examples - https://chao-tic.github.io/blog/2018/12/25/tls. Just to confirm how local-exec works: there are two forms of code the compiler will generate for local-exec, as in those examples:
    mov $0xfffffffffffffff8,%rcx
    mov %fs:(%rcx),%rcx
or
    movq $0xa,%fs:0xffffffffffffffb0
Is that correct? So here is an idea on how we might support local-exec for PIEs given these assumptions:
My point is that the TLS used by PIEs would in most cases be pretty small as far as memory goes. So maybe a potential solution would be to reserve just enough space for the local-exec TLS of the PIE, and have the "real" thread-local variables used by the OSv kernel in local-exec mode live outside of this reserved part of the TLS area. In other words if we declared a TLS array buffer like so (assuming 256 bytes would be enough in most cases): __thread char __pie_tls_reservation[256]; and somehow made @nyh @pdziepak Is it possible to pull it off with some magical gcc settings, to somehow force it to position specific TLS local-exec variables at specific offsets when we build loader.elf? I have also come across these gcc flags:
and some discussion around it here - https://bugs.llvm.org/show_bug.cgi?id=16145. Could this help us somehow? Finally, how is it possible that Golang PIEs that use TLS work on OSv? Are we lucky because it has a single TLS variable (see size of 8) -
and it somehow falls into 8 bytes of TLS not used by the OSv kernel? Or is it because the TLS for new app threads is located on the stack (read this paragraph - https://chao-tic.github.io/blog/2018/12/25/tls#the-initialisation-of-tcb-or-tls-in-non-main-threads) (well, maybe not in OSv), and when we switch stacks during SYSCALL calls the TLS is in different areas of memory? |
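As a sanity check on the two instruction forms quoted above, here is a small sketch of the arithmetic, assuming the x86-64 TLS layout in which the executable's block ends right at the thread pointer. The helper names are hypothetical. The large constants are just small negative displacements from fs:0, and the local-exec offset of a variable is its offset within the TLS segment minus the aligned segment size.

```c
#include <stdint.h>

/* Round size up to a multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t size, uint64_t align)
{
    return (size + align - 1) & ~(align - 1);
}

/* Reinterpret a raw 64-bit displacement as a signed offset from fs:0. */
int64_t signed_disp(uint64_t raw)
{
    return (int64_t)raw;
}

/*
 * Offset from the thread pointer of a variable located var_offset bytes
 * into the executable's TLS segment (hypothetical helper; assumes the
 * executable is the first module, as local-exec does).
 */
int64_t local_exec_tpoff(uint64_t seg_size, uint64_t seg_align,
                         uint64_t var_offset)
{
    return (int64_t)var_offset - (int64_t)align_up(seg_size, seg_align);
}
```

So 0xfffffffffffffff8 is fs:-8 and 0xffffffffffffffb0 is fs:-80. A PIE with a single 8-byte TLS variable would touch only the last 8 bytes below the thread pointer.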
I think this might be a solution (minus any gaps I have missed):
The symbols in TLS section of loader.elf look like this:
As you can see thanks to a trick in loader.ld (will it always put this in the beginning?) __pie_local_exec_tls_reservation is the very first 256 bytes where local exec of PIE can live without any collision with "used" part of kernel TLS. I have tested the example from the issue (see above) and seems to work:
|
Nice trick with that __pie_local_exec_tls_reservation thing. Looks to me like a reasonable approach, but @pdziepak is the real expert here and might have other thoughts (?). It will be easier to review the patch on the mailing list, but I'll start here:
Wow, took me a while to understand why this is correct, but I think it is. Maybe this deserves a comment, e.g., saying that the first 256 bytes of OSv's TLS (which is the 256 bytes at the end of the allocation, since the TLS is addressed with negative offsets) is reserved for local-exec TLS in executables which don't know about OSv - so if we have such an executable, we need to copy its TLS initialization. Or something like that. Can you please create a unit test for testing this, which also includes several local-exec variables, some initialized and some not (so they should be zero-initialized), and check that we also get the right initialization (e.g., if we initialize a to 1 and b to 2, we don't want to see this order reversed ;-)). The obvious downside of this technique is that it adds yet another 256 bytes to every thread (as well as the work to copy these bytes). I wonder if we shouldn't start with a much lower number and grow it later if we want? Can you please change the "256" number repeated three times in your code to a constant set only once? (sadly, it must be a constant, it can't be a boot-time configuration, because we need it during compilation). |
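For what it's worth, the kind of check requested above could look roughly like the sketch below. This is only an illustration with made-up names, not the test that was eventually added to OSv, and it assumes GCC or Clang and a build as an executable (local-exec cannot be linked into a shared object).

```c
/* Shorthand for forcing the local-exec model (GCC/Clang attribute). */
#define LOCAL_EXEC __attribute__((tls_model("local-exec")))

/* Initialized variables end up in .tdata ... */
__thread int a LOCAL_EXEC = 1;
__thread int b LOCAL_EXEC = 2;
/* ... and uninitialized ones in .tbss, so they must read as zero. */
__thread int c LOCAL_EXEC;
__thread char buf[16] LOCAL_EXEC;

/* Returns 1 if the calling thread sees the expected initial values. */
int tls_init_ok(void)
{
    if (a != 1 || b != 2 || c != 0) {
        return 0;
    }
    for (unsigned i = 0; i < sizeof(buf); i++) {
        if (buf[i] != 0) {
            return 0;
        }
    }
    return 1;
}

/* Writes must land in the same slots that later reads use. */
int tls_roundtrip_ok(void)
{
    a = 10;
    c = 30;
    return a == 10 && b == 2 && c == 30;
}
```

A fuller test would presumably call tls_init_ok() from a freshly created thread as well, since that is where the loader has to copy the initialization image into a new TLS block.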
I hope to send a patch later once I do more testing. As for the reservation size, I agree we could go with 64 bytes instead. As I noted, I think thread-local variables are used sparingly, so most apps should be OK with that limit. As for wasting space, I think we would not waste any, given that the existing kernel TLS size is around 1.5K, which means that our malloc uses a whole 4K page in this case, so adding 64 bytes or even more will not change anything for now. See #1000. |
As you may have seen in the patch I have just sent, the reservation area actually needs to be at the end of the kernel TLS. This also explains why Golang PIEs worked even before the patch. Golang PIEs only have an 8-byte TLS, and the kernel TLS already had an unintended unused area of 40 bytes because of the 64-byte round-up (see loader.ld). |
This patch enhances OSv dynamic loader to support PIEs and position-dependent executables that use TLS (Thread Local Storage) in local-exec mode. It does so by reserving an extra slot in kernel static TLS block at its end and designating it as user static TLS for the executable ELF. Any dependent ELF objects are still placed in the area before the kernel TLS. For the specifics please read comments added to arch-elf.cc and arch-switch.hh. Please note that this solution limits the size of the application TLS block to 64 bytes plus extra gap due to 64-byte alignment of the kernel TLS. This should be sufficient for most applications, which use a tiny TLS block (Golang's is 8 bytes long) if any at all. Rust ELFs tend to rely on quite large TLS in which case the limit in loader.ld needs to be increased accordingly and loader.elf relinked. Fixes cloudius-systems#352 Signed-off-by: Waldemar Kozaczuk <[email protected]> Message-Id: <[email protected]>
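The size limit described in the commit message can be sketched as simple arithmetic. The constant and function names below are hypothetical, and the model (reservation plus whatever padding the 64-byte round-up leaves) is my reading of the description above rather than the actual code in arch-elf.cc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the commit message; the names are made up. */
enum { USER_TLS_RESERVATION = 64, KERNEL_TLS_ALIGN = 64 };

/* Round size up to a multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t size, uint64_t align)
{
    return (size + align - 1) & ~(align - 1);
}

/* Padding left unused when the kernel TLS size is rounded up. */
uint64_t alignment_gap(uint64_t kernel_tls_used)
{
    return align_up(kernel_tls_used, KERNEL_TLS_ALIGN) - kernel_tls_used;
}

/* Bytes available to the executable's local-exec TLS block. */
uint64_t user_tls_capacity(uint64_t kernel_tls_used)
{
    return USER_TLS_RESERVATION + alignment_gap(kernel_tls_used);
}

/* True if an application TLS block of this size fits in the capacity. */
bool app_tls_fits(uint64_t app_tls_size, uint64_t capacity)
{
    return app_tls_size <= capacity;
}
```

For example, a kernel TLS whose used size is 24 bytes past a 64-byte boundary leaves a 40-byte gap, which is enough for Golang's 8-byte TLS even without any reservation. With the 64-byte reservation the capacity in that case would be 104 bytes, and anything larger needs the limit in loader.ld raised.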
As Pawel Dziepak pointed out in issue #213, when an application uses thread-local variables, it will work correctly when linked as a shared object ("-shared") but not work correctly when linked as a PIE ("-pie"). For example, this program:
When compiling it as a PIE, i.e., the compile line is
The resulting a.out runs correctly on Linux (prints 123), but on OSv, it prints 0. Things don't change if "-fpic" is used instead of "-fpie", by the way.
When linking the same objects as a shared object ("-shared") instead of a PIE ("-pie"), the resulting object runs correctly on OSv, and prints 123.