Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ABI for __m256 and __m512 is wrong when avx/avx512 is disabled globally, or enabled per-function #64706

Open
chorman0773 opened this issue Aug 15, 2023 · 4 comments
Labels
ABI Application Binary Interface backend:X86

Comments

@chorman0773
Copy link

Based on this discussion: https://groups.google.com/g/x86-64-abi/c/FMhl2vDl1D8

Currently, llvm passes __m256 and __m512 parameters/return values when it cannot use ymm/zmm registers as follows:

  • Parameters are passed on the stack
  • Return values are spanned accross 2-4 xmm registers.

Further, when the avx/avx512f features are enabled at the function level (not globally, using __attribute__((target))), it passes parameters/return values:

  • Paramaters are passed on the stack
  • Return values are placed in a single ymm/zmm register.

In contrast the behaviour of gcc (which is apparantly the correct behaviour in both cases) is:

When ymm/zmm registers are unavailable:

  • Parameters are passed on the stack
  • Return values in memory (return pointer in rdi)

When ymm/zmm registers are available at the function level (using __attribute__((target))), it passes and returns values as it does when the feature is available globally via a -m flag.

The difference in behaviour can be demonstrated by https://godbolt.org/z/8sYcn6654.

Based on a short discussion on the x86-64 psABI mailing list, this appears to be entirely incorrect on behalf of llvm: When returning w/o the registers available, it must return in memory as the ABI requires it to place the 2nd SSEUP eightbyte in the 3rd eightbyte of xmm0, which fails, and sends the entire value to memory. In the locally-enabled case, the registers are available, so it should be passing fully in ymm1 and returning fully in ymm0 (llvm seems to think that it is available given that it does return in ymm0).

@EugeneZelenko EugeneZelenko added backend:X86 ABI Application Binary Interface and removed new issue labels Aug 15, 2023
@llvmbot
Copy link
Member

llvmbot commented Aug 15, 2023

@llvm/issue-subscribers-backend-x86

@phoebewang
Copy link
Contributor

It's unfortunate LLVM has gap with GCC. But I'd argue passing __m256 / __m512 on targets don't support YMM/ZMM register is problematic according to current ABI. I proposed to improve the ABI to always generate YMM/ZMM registers. https://groups.google.com/g/x86-64-abi/c/vQcfj--osKs

@RalfJung
Copy link
Contributor

Further, when the avx/avx512f features are enabled at the function level (not globally, using attribute((target))), it passes parameters/return values:

So the ABI is different depending on whether the target feature is enabled globally vs in the current function? Ugh, that's a pretty critical bug, isn't it?

The case where the target features are missing is somewhat odd and maybe shouldn't even be accepted, but the case where the target feature is present for the current function should obviously work.

@brianosman
Copy link

Yes, the varying ABI is really troublesome. I dug into this with some coworkers earlier this year (for 256-bit vectors), and just recently shared our findings/thoughts here: https://www.reddit.com/r/cpp/comments/17qowl2/comment/k8j2odi/?utm_source=share&utm_medium=web2x&context=3

TL;DR: llvm and gcc behave differently, and llvm's handling of the target attribute is only useful today if you write all of your vector code inside a giant loop with locals. The ABI issue means that it's dramatically better to continue compiling multiple versions with separate TUs (and separate flags).

aarongable pushed a commit to chromium/chromium that referenced this issue Nov 17, 2023
This will allow us to have separate -march= flags for Haswell
and Skylake pixel-conversion loops, which is important to dodge
a Clang bug (llvm/llvm-project#64706)

Bug: b/310927123
Change-Id: I828f0cf591a1c4d0a6260fef786c80e95b38ead7
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5038351
Auto-Submit: John Stiles <[email protected]>
Reviewed-by: Brian Osman <[email protected]>
Commit-Queue: Leon Scroggins <[email protected]>
Reviewed-by: Leon Scroggins <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1226103}
DanShaders added a commit to DanShaders/serenity that referenced this issue Jul 5, 2024
This warning is triggered when one accepts or returns vectors from a
function (that is not marked with [[gnu::target(...)]]) which would have
been otherwise passed in register if the current translation unit had
been compiled with more permissive flags wrt instruction selection (i.
e. if one adds -mavx2 to cmdline). This will never be a problem for us
since we (a) never use different instruction selection options across
ABI boundaries; (b) most of the affected functions are actually
TU-local.

Moreover, even if we somehow properly annotated all of the SIMD helpers,
calling them across ABI (or target) boundaries would still be very
dangerous because of inconsistent and bogus handling of
[[gnu::target(...)]] across compilers. See
llvm/llvm-project#64706 and
https://www.reddit.com/r/cpp/comments/17qowl2/comment/k8j2odi .
DanShaders added a commit to DanShaders/serenity that referenced this issue Jul 11, 2024
This warning is triggered when one accepts or returns vectors from a
function (that is not marked with [[gnu::target(...)]]) which would have
been otherwise passed in register if the current translation unit had
been compiled with more permissive flags wrt instruction selection (i.
e. if one adds -mavx2 to cmdline). This will never be a problem for us
since we (a) never use different instruction selection options across
ABI boundaries; (b) most of the affected functions are actually
TU-local.

Moreover, even if we somehow properly annotated all of the SIMD helpers,
calling them across ABI (or target) boundaries would still be very
dangerous because of inconsistent and bogus handling of
[[gnu::target(...)]] across compilers. See
llvm/llvm-project#64706 and
https://www.reddit.com/r/cpp/comments/17qowl2/comment/k8j2odi .
nico pushed a commit to SerenityOS/serenity that referenced this issue Jul 12, 2024
This warning is triggered when one accepts or returns vectors from a
function (that is not marked with [[gnu::target(...)]]) which would have
been otherwise passed in register if the current translation unit had
been compiled with more permissive flags wrt instruction selection (i.
e. if one adds -mavx2 to cmdline). This will never be a problem for us
since we (a) never use different instruction selection options across
ABI boundaries; (b) most of the affected functions are actually
TU-local.

Moreover, even if we somehow properly annotated all of the SIMD helpers,
calling them across ABI (or target) boundaries would still be very
dangerous because of inconsistent and bogus handling of
[[gnu::target(...)]] across compilers. See
llvm/llvm-project#64706 and
https://www.reddit.com/r/cpp/comments/17qowl2/comment/k8j2odi .
alimpfard pushed a commit to alimpfard/serenity that referenced this issue Jul 15, 2024
This warning is triggered when one accepts or returns vectors from a
function (that is not marked with [[gnu::target(...)]]) which would have
been otherwise passed in register if the current translation unit had
been compiled with more permissive flags wrt instruction selection (i.
e. if one adds -mavx2 to cmdline). This will never be a problem for us
since we (a) never use different instruction selection options across
ABI boundaries; (b) most of the affected functions are actually
TU-local.

Moreover, even if we somehow properly annotated all of the SIMD helpers,
calling them across ABI (or target) boundaries would still be very
dangerous because of inconsistent and bogus handling of
[[gnu::target(...)]] across compilers. See
llvm/llvm-project#64706 and
https://www.reddit.com/r/cpp/comments/17qowl2/comment/k8j2odi .
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ABI Application Binary Interface backend:X86
Projects
None yet
Development

No branches or pull requests

6 participants