
SIMDMask<>.any() and .all() dramatically slower than scalar boolean operations #78203

Open
S1D1T1 opened this issue Dec 15, 2024 · 1 comment
Labels: bug (A deviation from expected or documented behavior. Also: expected but undesirable behavior.), triage needed (This issue needs more specific labels)

Comments

@S1D1T1

S1D1T1 commented Dec 15, 2024

Description

Converting arrays in an existing app to SIMD4 caused noticeable slowdowns. Profiling shows that the bottlenecks are calls to any(_:) or all(_:) on the SIMDMask produced by comparing SIMD4<Float> values (a SIMDMask<SIMD4<Int32>>). Computing the same values with simple alternative methods gave a dramatic speedup:

Reproduction

Code listing with CPU profiler data (share of the function's time per line, shown as trailing comments). Note the bottleneck: 82% of the function's time is attributed to any().

    // Sum the acceleration vectors acting on a given point, by a set of masses
    func gravAccelerationSum(_ posX: SIMD4<Float>, _ posY: SIMD4<Float>) -> (SIMD4<Float>, SIMD4<Float>) {
        var accelerationX: SIMD4<Float> = .zero
        var accelerationY: SIMD4<Float> = .zero

        for grav in self.gravs {                                              // 5.5%
            let xDiffs = posX - grav.position.x                               // 2.3%
            let yDiffs = posY - grav.position.y                               // 2.3%

            let distanceSquared = xDiffs * xDiffs + yDiffs * yDiffs           // 3.5%
            // Compare to range threshold; ignore if ALL 4 locations are beyond a certain range.
            let inRange = distanceSquared .< grav.influenceRadius4            // creates SIMDMask

            if any(inRange) {                                                 // 82%  ** Bottleneck
            // also tried: if !all(inRange), with comparable results
            // if inRange[0] || inRange[1] || inRange[2] || inRange[3] {      // ** WORKAROUND
                let n = distanceSquared + grav.softeningSquared               // 0.8%
                let gravDenominator = n * n.squareRoot()                      // 0.7%
                accelerationX += grav.attractionTimesTimeTimesBaseG * xDiffs / gravDenominator  // 1.5%
                accelerationY += grav.attractionTimesTimeTimesBaseG * yDiffs / gravDenominator  // 1.5%
            }
        }
        return (accelerationX, accelerationY)
    }

The bottleneck is especially striking given how heavy the rest of the function is: the multi-body gravitational attraction math is dwarfed by checking 4 bits. :)
Switching to the workaround drops the boolean check from 82% to under 1% of the function's time, and the other lines' shares rise accordingly. Real-world speed increases substantially. Behavior is identical apart from the performance change.
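The workaround can be packaged as small reusable helpers. This is a minimal sketch, assuming the mask comes from comparing two SIMD4<Float> values (so its type is SIMDMask<SIMD4<Int32>>, since Float's mask scalar is Int32); anyLane and allLanes are hypothetical names, not standard-library API:

```swift
// Hypothetical helpers (not standard-library API) that package the
// lane-by-lane workaround from the listing as reusable functions.
// Comparing two SIMD4<Float> values yields SIMDMask<SIMD4<Int32>>,
// because Float's mask scalar type is Int32.

/// Same result as any(mask): true if at least one lane is set.
@inline(__always)
func anyLane(_ mask: SIMDMask<SIMD4<Int32>>) -> Bool {
    mask[0] || mask[1] || mask[2] || mask[3]
}

/// Same result as all(mask): true only if every lane is set.
@inline(__always)
func allLanes(_ mask: SIMDMask<SIMD4<Int32>>) -> Bool {
    mask[0] && mask[1] && mask[2] && mask[3]
}

let distanceSquared = SIMD4<Float>(1, 9, 16, 25)
let threshold = SIMD4<Float>(repeating: 4)
let inRange = distanceSquared .< threshold   // lanes: true, false, false, false

print(anyLane(inRange), any(inRange))    // true true
print(allLanes(inRange), all(inRange))   // false false
```

This only demonstrates that the lane-wise checks compute the same answers as any(_:) and all(_:); whether they are faster depends on the code the optimizer emits and still has to be measured.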

Measuring execution time using OSSignposter:

      let signpost = physicsSP.beginInterval("gravAccelerationSum",id:physicsSP.makeSignpostID())

      for i in 0..<SIMD4Count { // SIMD4Count = 6250 in this test case
        (gravInfluences.x[i], gravInfluences.y[i]) = gravAccelerationSum(particleLocations.x[i],particleLocations.y[i])
      }
      physicsSP.endInterval("gravAccelerationSum", signpost)

These durations cover the entire loop, which calls the function 6,250 times per test run.

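Average duration of the loop (6,250 calls), using any(): 31.85 ms
Average duration of the loop (6,250 calls), using the workaround: 9.98 ms

That is roughly 5.1 µs vs. 1.6 µs per call, about 3.2x faster with the workaround.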
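For anyone reproducing the comparison without Instruments, here is a minimal sketch that times both variants with ContinuousClock (Swift 5.7+). The data set, the radius of 5, and the names hitsUsingAny and hitsUsingLanes are hypothetical stand-ins for the reporter's particle data, not code from the issue:

```swift
// A minimal, portable timing sketch (assumes Swift 5.7+ for ContinuousClock).
// The input data is synthetic and hypothetical, not the reporter's particle set.

// Counts how many SIMD4 groups have at least one lane below the limit, via any(_:).
func hitsUsingAny(_ values: [SIMD4<Float>], _ limit: SIMD4<Float>) -> Int {
    var hits = 0
    for d in values where any(d .< limit) { hits += 1 }
    return hits
}

// Same count, using the lane-by-lane workaround from the listing.
func hitsUsingLanes(_ values: [SIMD4<Float>], _ limit: SIMD4<Float>) -> Int {
    var hits = 0
    for d in values {
        let m = d .< limit
        if m[0] || m[1] || m[2] || m[3] { hits += 1 }
    }
    return hits
}

let groupCount = 6_250   // matches SIMD4Count in the report
// Lane values cycle through 0...9, so half of the groups fall below 5.
let groups = (0..<groupCount).map { i in SIMD4<Float>(repeating: Float(i % 10)) }
let radius = SIMD4<Float>(repeating: 5)

let clock = ContinuousClock()
var anyHits = 0
var laneHits = 0
let anyTime = clock.measure { anyHits = hitsUsingAny(groups, radius) }
let laneTime = clock.measure { laneHits = hitsUsingLanes(groups, radius) }
print("any(): \(anyTime), lanes: \(laneTime), same result: \(anyHits == laneHits)")
```

Build with optimizations (for example swiftc -O); timings of SIMD code from debug builds are not representative.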

Expected behavior

I expected any(_:) and all(_:) on a SIMDMask to execute at least as efficiently as the equivalent individual boolean comparisons.

Environment

swift-driver version: 1.115 Apple Swift version 6.0.2 (swiftlang-6.0.2.1.2 clang-1600.0.26.4)
Target: arm64-apple-macosx14.0

  • Xcode 16.1 and 16.2 (tested with both)
  • macOS 14.7.1
  • Apple M2 Max CPU

Additional information

Related bug (yet to be filed): a similar bottleneck in SIMD4.max(); a manual comparison is an order of magnitude faster.
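The related max() case concerns the horizontal maximum of a SIMD4. A minimal sketch of what a "manual comparison" might look like; manualMax is a hypothetical helper, since the reporter's actual manual code is not shown in the issue:

```swift
// Hypothetical helper illustrating a manual horizontal maximum for SIMD4<Float>.
// The reporter's exact manual comparison is not shown in the issue.
func manualMax(_ v: SIMD4<Float>) -> Float {
    // Pairwise tree: compare lanes 0/1 and 2/3, then compare the two winners.
    Swift.max(Swift.max(v[0], v[1]), Swift.max(v[2], v[3]))
}

let sample = SIMD4<Float>(3, -1, 7.5, 2)
print(manualMax(sample), sample.max())   // 7.5 7.5
```

Both return the same value for ordinary (non-NaN) inputs; whether the manual version is faster on a given toolchain is what the related report would need to measure.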

This is my first compiler bug report. If I'm doing it wrong, please tell me how to do it right.

S1D1T1 added the bug and triage needed labels Dec 15, 2024
S1D1T1 changed the title from "SIMDMask<>.any() dramatically slower than scalar boolean operations" to "SIMDMask<>.any() and .all() dramatically slower than scalar boolean operations" Dec 15, 2024
@S1D1T1
Author

S1D1T1 commented Dec 18, 2024

This is related to, if not a duplicate of, #72413.
