SIMDMask<>.any() and .all() dramatically slower than scalar boolean operations #78203
Labels
bug
A deviation from expected or documented behavior. Also: expected but undesirable behavior.
triage needed
This issue needs more specific labels
Description
Converting arrays in an existing app to SIMD4 created noticeable slowdowns. The profile shows the bottlenecks are calls to any(SIMDMask<SIMD4<Float>>) or all(...).
Calculating the same values with alternate simple methods resulted in a dramatic speedup.
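The workaround listing itself is not reproduced in this text. As a minimal sketch of what a lane-by-lane alternative might look like, assuming the mask comes from a less-than comparison of two `SIMD4<Float>` values (the helper names `anyBelow`, `anyBelowScalar`, `allBelow`, and `allBelowScalar` are hypothetical and not taken from the app):

```swift
// Hypothetical helpers for illustration; not the code from the affected app.

/// Mask-based form: builds a SIMDMask and reduces it with the standard any(_:).
func anyBelow(_ v: SIMD4<Float>, _ limit: SIMD4<Float>) -> Bool {
    return any(v .< limit)
}

/// Scalar form: compares each lane individually and combines with ||.
func anyBelowScalar(_ v: SIMD4<Float>, _ limit: SIMD4<Float>) -> Bool {
    return v.x < limit.x || v.y < limit.y || v.z < limit.z || v.w < limit.w
}

/// Mask-based form using the standard all(_:).
func allBelow(_ v: SIMD4<Float>, _ limit: SIMD4<Float>) -> Bool {
    return all(v .< limit)
}

/// Scalar form: compares each lane individually and combines with &&.
func allBelowScalar(_ v: SIMD4<Float>, _ limit: SIMD4<Float>) -> Bool {
    return v.x < limit.x && v.y < limit.y && v.z < limit.z && v.w < limit.w
}
```

Both forms should return the same Boolean for the same inputs; only the generated code differs.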
Reproduction
Code listing with CPU profiler data. Note the bottleneck: 82% of function time is attributed to any().
The bottleneck is particularly apparent given the heavy computation in the rest of the function (calculating multi-body gravitational attractions), which is dwarfed by checking 4 bits? : )
Switching to the workaround code drops the boolean check from 82% to <1%, and the other values bubble up appropriately. Real-world speed shows a large increase. Functionality is identical aside from the performance change.
Measuring execution time using OSSignposter.
These durations are measured outside a loop that calls the function 6250x in the test runs:
Average duration x 6250, using any(): 31.85ms
Average duration x 6250, using workaround: 9.98ms
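For anyone trying to reproduce the comparison without Instruments, a rough timing harness might look like the sketch below. It swaps in `ContinuousClock` (Swift 5.7+, macOS 13+) rather than the OSSignposter setup used for the numbers above, and `measureMilliseconds` is a hypothetical helper name, so treat the results as indicative only.

```swift
// Hypothetical harness: times `iterations` calls of `body` in one interval,
// mirroring the "duration x 6250" measurement taken outside the loop.
func measureMilliseconds(
    iterations: Int,
    _ body: () -> Bool
) -> (milliseconds: Double, hits: Int) {
    var hits = 0
    let clock = ContinuousClock()
    let elapsed = clock.measure {
        // Count true results so each call's return value is actually used.
        for _ in 0..<iterations where body() {
            hits += 1
        }
    }
    // Duration exposes whole seconds plus attoseconds (1 ms = 1e15 as).
    let parts = elapsed.components
    let ms = Double(parts.seconds) * 1_000 + Double(parts.attoseconds) / 1e15
    return (ms, hits)
}
```

A call such as `measureMilliseconds(iterations: 6250) { any(a .< b) }` could then be compared against the same call with the scalar check. A real benchmark should vary the inputs between iterations and build in release mode, since the optimizer may otherwise hoist a check on constant operands out of the loop.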
Expected behavior
Expected SIMDMask.any() and .all() to execute at least as efficiently as individual boolean comparisons.
Environment
swift-driver version: 1.115 Apple Swift version 6.0.2 (swiftlang-6.0.2.1.2 clang-1600.0.26.4)
Target: arm64-apple-macosx14.0
Additional information
Related bug (yet to be filed): similar bottleneck in SIMD4.max(); manual comparison is an order of magnitude faster.
This is my first compiler bug report. If I'm doing it wrong, tell me how to do it right.