Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lld doesn't support x86-32 -fno-plt GD->LD, GD->LE, LD->LE optimizations #59769

Closed
rakslice opened this issue Dec 31, 2022 · 8 comments
Closed
Assignees
Labels

Comments

@rakslice
Copy link

rakslice commented Dec 31, 2022

$ clang++ --version
Alpine clang version 15.0.6
Target: i586-alpine-linux-musl
Thread model: posix
InstalledDir: /usr/bin
$ ld.lld --version
LLD 15.0.6 (compatible with GNU linkers)

I'm in the process of chasing the segfault in firefox in alpine 3.17 x86.

I've found that an object file mozglue/baseprofiler/Unified_cpp_mozglue_baseprofiler0.o built with clang++
(build command line)

/usr/bin/clang++ -Qunused-arguments -std=gnu++17 -o Unified_cpp_mozglue_baseprofiler0.o -c  -I/home/rakslice/src/aports/community/firefox/src/firefox-108.0.1/obj/dist/stl_wrappers -I/home/rakslice/src/aports/community/firefox/src/firefox-108.0.1/obj/dist/system_wrappers -include /home/rakslice/src/aports/community/firefox/src/firefox-108.0.1/config/gcc_hidden.h -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -fstack-clash-protection -DNDEBUG=1 -DTRIMMED=1 -DIMPL_MFBT -DIMPL_MFBT -DMOZ_HAS_MOZGLUE -I/home/rakslice/src/aports/community/firefox/src/firefox-108.0.1/mozglue/baseprofiler -I/home/rakslice/src/aports/community/firefox/src/firefox-108.0.1/obj/mozglue/baseprofiler -I/home/rakslice/src/aports/community/firefox/src/firefox-108.0.1/mozglue/baseprofiler/core -I/home/rakslice/src/aports/community/firefox/src/firefox-108.0.1/mozglue/linker -I/home/rakslice/src/aports/community/firefox/src/firefox-108.0.1/obj/dist/include -I/usr/include/nspr -I/home/rakslice/src/aports/community/firefox/src/firefox-108.0.1/obj/dist/include/nss -I/usr/include/libpng16 -I/usr/include/pixman-1 -DMOZILLA_CLIENT -include /home/rakslice/src/aports/community/firefox/src/firefox-108.0.1/obj/mozilla-config.h -fno-sized-deallocation -fno-aligned-new -fno-exceptions -fPIC -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -Os -fomit-frame-pointer -O2 -fno-plt -fomit-frame-pointer -funwind-tables -Wall -Wbitfield-enum-conversion -Wdeprecated-this-capture -Wempty-body -Wformat-type-confusion -Wignored-qualifiers -Wpointer-arith -Wshadow-field-in-constructor-modified -Wsign-compare -Wtype-limits -Wno-error=tautological-type-limit-compare -Wunreachable-code -Wunreachable-code-return -Wunused-but-set-parameter -Wno-invalid-offsetof -Wclass-varargs -Wempty-init-stmt -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wloop-analysis -Wno-range-loop-analysis -Wc++2a-compat -Wenum-compare-conditional -Wenum-float-conversion -Wno-ambiguous-reversed-operator -Wno-error=deprecated -Wno-error=depecated-anon-enum-enum-conversion -Wno-error=deprecated-enum-enum-conversion -Wno-error=deprecated-pragma -Wno-error=deprecated-this-capture -Wno-error=deprecated-volatile -Wcomma -Wimplicit-fallthrough -Wstring-conversion -Wno-inline-new-delete -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=free-nonheap-object -Wno-error=atomic-alignment -Wno-error=deprecated-builtins -Wformat -Wformat-security -Wno-psabi -Wthread-safety -Wno-unknown-warning-option -Wno-ignored-qualifiers -fno-strict-aliasing -ffp-contract=off  -MD -MP -MF .deps/Unified_cpp_mozglue_baseprofiler0.o.pp   Unified_cpp_mozglue_baseprofiler0.cpp

has TLS lea-call sequences like

 175:   f0 ff 46 28             lock incl 0x28(%esi)
 179:   8d 83 00 00 00 00       lea    0x0(%ebx),%eax
 17f:   ff 93 00 00 00 00       call   *0x0(%ebx)
 185:   89 c1                   mov    %eax,%ecx
 187: 89 a8 00 00 00 00 mov %ebp,0x0(%eax)

with relevant relocs

OFFSET   TYPE              VALUE
0000017b R_386_TLS_LDM     _ZN7mozilla12baseprofiler19TLSRegisteredThread17sRegisteredThreadE
00000181 R_386_GOT32       ___tls_get_addr

when this is linked with lld
(build command line)

/usr/bin/clang++ -Qunused-arguments -std=gnu++17 -o ../../dist/bin/firefox -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -fstack-clash-protection -fno-sized-deallocation -fno-aligned-new -fno-exceptions -fPIC -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -Os -fomit-frame-pointer -O2 -fno-plt -fomit-frame-pointer -funwind-tables /home/rakslice/src/aports/community/firefox/src/firefox-108.0.1/obj/browser/app/firefox.list -lpthread -fuse-ld=lld -Wl,-z,noexecstack -Wl,-z,text -Wl,-z,relro -Wl,-z,nocopyreloc -Wl,-Bsymbolic-functions -Wl,--build-id=sha1 -fstack-protector-strong -fstack-clash-protection -rdynamic -Wl,-rpath-link,/home/rakslice/src/aports/community/firefox/src/firefox-108.0.1/obj/dist/bin -Wl,-rpath-link,/usr/lib -pie -latomic

(resulting lld command line)

`"/usr/bin/ld.lld" -pie -export-dynamic -z now -z relro --hash-style=gnu --build-id --eh-frame-hdr -m elf_i386 -export-dynamic -dynamic-linker /lib/ld-musl-i386.so.1 -o ../../dist/bin/firefox /usr/lib/Scrt1.o /usr/lib/crti.o /usr/bin/../lib/gcc/i586-alpine-linux-musl/12.2.1/crtbeginS.o -L/usr/bin/../lib/gcc/i586-alpine-linux-musl/12.2.1 -L/usr/bin/../lib/gcc/i586-alpine-linux-musl/12.2.1/../../../../i586-alpine-linux-musl/lib -L/lib -L/usr/lib /home/rakslice/src/aports/community/firefox/src/firefox-108.0.1/obj/browser/app/firefox.list -lpthread -z noexecstack -z text -z relro -Bsymbolic-functions --build-id=sha1 -rpath-link /home/rakslice/src/aports/community/firefox/src/firefox-108.0.1/obj/dist/bin -rpath-link /usr/lib -latomic -lssp_nonshared -lstdc++ -lm -lgcc_s -lgcc -lpthread -lc -lgcc_s -lgcc /usr/bin/../lib/gcc/i586-alpine-linux-musl/12.2.1/crtendS.o /usr/lib/crtn.o`

it generates broken output like

   444d5:       f0 ff 46 28             lock incl 0x28(%esi)
   444d9:       65 a1 00 00 00 00       mov    %gs:0x0,%eax
   444df:       90                      nop
   444e0:       8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi
   444e4:       00 89 c1 89 a8 f4       add    %cl,-0xb57763f(%ecx)
   444ea:       ff                      (bad)
   444eb:       ff                      (bad)
   444ec:       ff 89 a8 f8 ff ff       decl   -0x758(%ecx)

this mov-nop-lea-no-op is the canned sequence from X86::relaxTlsLdToLe() in lld/ELF/Arch/X86.cpp https://github.com/llvm/llvm-project/blob/llvmorg-15.0.6/lld/ELF/Arch/X86.cpp#L462,

  // Convert
  //   leal foo(%reg),%eax
  //   call ___tls_get_addr
  // to
  //   movl %gs:0,%eax
  //   nop
  //   leal 0(%esi,1),%esi
  const uint8_t inst[] = {
      0x65, 0xa1, 0x00, 0x00, 0x00, 0x00, // movl %gs:0,%eax
      0x90,                               // nop
      0x8d, 0x74, 0x26, 0x00,             // leal 0(%esi,1),%esi
  };

which is basically the LD->LE transition right out of "ELF Handling for Thread-Local Storage" v0.21 (2013-08-22) starting on p.54 (https://lrita.github.io/images/posts/cplusplus/ELF_Handling_For_Thread-Local_Storage.pdf#page=54), edit: actually it's the second one, "The GNU variant" on the next page.

But that particular rewrite is clearly inappropriate; the call here in the GOT ___tls_get_addr case is 6 bytes rather than presumably 5 the canned sequence is expecting so it is a byte short, and leaves us with garbage output.

How lld got to this:

  1. lld/ELF/Arch/X86.cpp:92: use expr R_TLSLD_GOTPLT for R_386_TLS_LDM
    https://github.com/llvm/llvm-project/blob/llvmorg-15.0.6/lld/ELF/Arch/X86.cpp#L92
  2. lld/ELF/Relocations.cpp:1218: handling of the R_TLSLD_GOTPLT relocation figures that this LD->LE "relaxation" applies https://github.com/llvm/llvm-project/blob/llvmorg-15.0.6/lld/ELF/Relocations.cpp#L1218
  3. lld/ELF/Arch/X86.cpp:462: as noted X86::relaxTlsLdToLe() applies canned code block presumably designed for a different call than the one we have https://github.com/llvm/llvm-project/blob/llvmorg-15.0.6/lld/ELF/Arch/X86.cpp#L462

Anyway, there you go. I don't know if this .o is ABI compliant, I don't know if this is a valid candidate for some other LD->LE transition, I don't know what a minimal example of TLS-using code that produces the lea-call with these relocs would look like, I'm just an end user chasing after a broken binary I got shipped.

@rakslice
Copy link
Author

rakslice commented Dec 31, 2022

Perhaps I should mention just skipping the if (toExecRelax) { block at Relocations.cpp:1221 https://github.com/llvm/llvm-project/blob/llvmorg-15.0.6/lld/ELF/Relocations.cpp#L1221, forcing it not to attempt this LD->LE transition, gives me a working binary, but I don't know if it's correct to still have LD (or whatever this actually is) in the output executable.

@rakslice
Copy link
Author

rakslice commented Jan 1, 2023

FYI the downstream issue is: https://gitlab.alpinelinux.org/alpine/aports/-/issues/14395

@rakslice rakslice changed the title lld 15.0.6 mislink of x86 TLS GOT call lld 15.0.6 mislink of x86 TLS GOT call with LD->LE transition Jan 1, 2023
@MaskRay
Copy link
Member

MaskRay commented Jan 1, 2023

Your build uses gcc -fno-plt and the TLS GD/LD model uses call *___tls_get_addr@GOT(%edx). Clang doesn't do this.
I wonder why you use -fno-plt. It's usually a pessimization instead of an optimization (https://maskray.me/blog/2021-09-19-all-about-procedure-linkage-table#fno-plt).

I can add the support to lld x86-32. The x86-64 port does support GOTPCRELX __tls_get_addr calls.

@nekopsykose
Copy link

nekopsykose commented Jan 1, 2023

Your build uses gcc -fno-plt and the TLS GD/LD model uses call *___tls_get_addr@GOT(%edx). Clang doesn't do this.

the above build is with clang, no? (clang++ -fno-plt -fuse-ld=lld ...)

I wonder why you use -fno-plt. It's usually a pessimization instead of an optimization

i had added it to shave a subpercent off binary size- i didn't think it was actually broken on 32-bit x86. fooled myself again :)

@MaskRay
Copy link
Member

MaskRay commented Jan 1, 2023

@rakslice
Copy link
Author

rakslice commented Jan 1, 2023

@rakslice Try https://reviews.llvm.org/D140813

Looks good. The new output for this example is

   44495:       f0 ff 46 28             lock incl 0x28(%esi)
   44499:       65 a1 00 00 00 00       mov    %gs:0x0,%eax
   4449f:       8d b6 00 00 00 00       lea    0x0(%esi),%esi
   444a5:       89 c1                   mov    %eax,%ecx
   444a7:       89 a8 f4 ff ff ff       mov    %ebp,-0xc(%eax)

The resulting binary works.

@llvmbot
Copy link
Member

llvmbot commented Jan 1, 2023

@llvm/issue-subscribers-lld-elf

@MaskRay MaskRay closed this as completed in 8dc7366 Jan 1, 2023
@MaskRay MaskRay self-assigned this Jan 1, 2023
@MaskRay MaskRay changed the title lld 15.0.6 mislink of x86 TLS GOT call with LD->LE transition lld doesn't support x86-32 -fno-plt GD->LD, GD->LE, LD->LE optimizations Jan 1, 2023
@MaskRay
Copy link
Member

MaskRay commented Jan 1, 2023

Your build uses gcc -fno-plt and the TLS GD/LD model uses call *___tls_get_addr@GOT(%edx). Clang doesn't do this.

the above build is with clang, no? (clang++ -fno-plt -fuse-ld=lld ...)

OK, I actually added -fno-plt for __tls_get_addr in 2019 but disabled it when x86 relax relocations were disabled to work around a GNU ld issue (fixed in binutils in 2016-06).
There was a difference between clang -fno-plt -S and clang -fno-plt -c. Fixed by 83cec14

I wonder why you use -fno-plt. It's usually a pessimization instead of an optimization

i had added it to shave a subpercent off binary size- i didn't think it was actually broken on 32-bit x86. fooled myself again :)

-fno-plt will usually increase the binary size :)

CarlosAlbertoEnciso pushed a commit to SNSystems/llvm-debuginfo-analyzer that referenced this issue Jan 4, 2023
For x86-32, {clang,gcc} -fno-plt uses `call *___tls_get_addr@GOT(%reg)` instead
of `call ___tls_get_addr@PLT`. GD to IE/LE relaxations need to shift the offset
by one while LD to LE relaxation needs to use a different code sequence.

While here, fix some comments.

Fix llvm/llvm-project#59769

Differential Revision: https://reviews.llvm.org/D140813
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants