Build option for show vectorization info #4861

Merged (2 commits) on Aug 25, 2022


daschuer
Member

With -DINFO_VECTORIZE=ON, all target compilers list the vectorized loops in the terminal output.
Unfortunately, enabling this rebuilds all files!

I have added this to compare the vectorization capabilities of the compilers using the GitHub CI.

In this PR the option is disabled; a run with it enabled can be found here:
https://github.com/daschuer/mixxx/actions/runs/2702452852
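
For context, here is a minimal sketch of the kind of loop such reports cover. The function `applyGainSketch` is hypothetical and not taken from Mixxx. The flags in the comments are the compilers' own vectorization-report switches; whether `INFO_VECTORIZE` maps to exactly these flags is an assumption, not something stated in this PR.

```cpp
#include <cstddef>

// Hypothetical example loop, not taken from Mixxx's sample.cpp.
// Vectorization reports can be requested with, for example:
//   GCC:   -O3 -fopt-info-vec-optimized -fopt-info-vec-missed
//   Clang: -O3 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize
//   MSVC:  /O2 /Qvec-report:2
void applyGainSketch(float* dest, const float* src, float gain, std::size_t n) {
    // A simple element-wise loop; each compiler reports whether it
    // was vectorized ("optimized") or not ("missed").
    for (std::size_t i = 0; i < n; ++i) {
        dest[i] = src[i] * gain;
    }
}
```

Compiling this file with one of the flag sets above should print one report line per loop, keyed by source line number.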

@github-actions github-actions bot added the build label Jul 20, 2022
@daschuer
Member Author

daschuer commented Jul 21, 2022

This is a comparison of the loops in sample.cpp.
The result of MSVC is disappointing: 13/39 loops vectorized

| Line | GCC 9.4 | MSVC | Clang | GCC 12.1 |
|------|---------|------|-------|----------|
| 124 | optimized | optimized | optimized | optimized |
| 145 | optimized | missed | optimized | optimized |
| 153 | optimized | optimized | optimized | missed |
| 169 | optimized | missed | optimized | optimized |
| 189 | optimized | missed | missed | optimized |
| 195 | optimized | missed | missed | optimized |
| 205 | optimized | missed | missed | optimized |
| 211 | optimized | missed | missed | optimized |
| 222 | optimized | optimized | optimized | optimized |
| 236 | optimized | missed | optimized | optimized |
| 254 | optimized | missed | optimized | optimized |
| 261 | optimized | optimized | optimized | optimized |
| 236 | optimized | optimized | optimized | optimized |
| 281 | optimized | optimized | optimized | optimized |
| 304 | optimized | optimized | optimized | optimized |
| 323 | optimized | optimized | optimized | optimized |
| 352 | optimized | missed | optimized | optimized |
| 359 | optimized | optimized | optimized | optimized |
| 378 | optimized | optimized | optimized | optimized |
| 391 | optimized | missed | optimized | optimized |
| 407 | optimized | missed | optimized | optimized |
| 433 | optimized | missed | optimized | optimized |
| 444 | optimized | missed | optimized | optimized |
| 456 | optimized | missed | optimized | optimized |
| 477 | optimized | missed | missed | optimized |
| 471 | optimized | missed | missed | optimized |
| 492 | optimized | missed | missed | optimized |
| 498 | optimized | missed | missed | optimized |
| 512 | optimized | missed | optimized | optimized |
| 522 | optimized | missed | optimized | optimized |
| 533 | optimized | missed | missed | optimized |
| 545 | optimized | missed | optimized | optimized |
| 557 | optimized | missed | optimized | optimized |
| 571 | optimized | missed | missed | optimized |
| 585 | optimized | missed | missed | optimized |
| 594 | missed | missed | missed | optimized |
| 608 | missed | missed | missed | optimized |

@Swiftb0y
Member

> The result of MSVC is disappointing: 13/39 loops vectorized

Yet another reason to use xsimd in favor of auto-vectorization.

@daschuer
Member Author

> Yet another reason to use xsimd in favor of auto-vectorization

Yes, I agree.

@Swiftb0y
Member

Great. The problem is that this requires large-scale changes to our audio processing. Essentially all xsimd-related code needs to be templated on the vector instruction set (unless we want to dispatch dynamically everywhere). Binary size and build time will also increase, because we would be generating the same code for many architectures. The advantage would be that portable builds could include AVX2, SSE4, etc. code paths, so the official portable binaries could run almost as fast as native builds.
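
The trade-off described above can be illustrated with a minimal sketch in plain C++ that does not use xsimd. The tag types `Sse2Tag` and `Avx2Tag`, the function `applyGainFor`, and the selector `selectApplyGain` are hypothetical names chosen for illustration; real code would use the library's architecture types and a real CPU feature check instead of the boolean parameter.

```cpp
#include <cstddef>

// Hypothetical tags standing in for instruction sets (not xsimd types).
struct Sse2Tag { static constexpr std::size_t lanes = 4; };
struct Avx2Tag { static constexpr std::size_t lanes = 8; };

// One template, instantiated once per architecture. Every instantiation
// adds its own copy of the code to the binary.
template <typename Arch>
void applyGainFor(float* buf, std::size_t n, float gain) {
    constexpr std::size_t w = Arch::lanes;
    std::size_t i = 0;
    // Main loop in blocks of `w`, standing in for one vector operation.
    for (; i + w <= n; i += w) {
        for (std::size_t k = 0; k < w; ++k) {
            buf[i + k] *= gain;
        }
    }
    // Scalar tail for the remaining elements.
    for (; i < n; ++i) {
        buf[i] *= gain;
    }
}

using GainFn = void (*)(float*, std::size_t, float);

// Dynamic dispatch: pick the instantiation once at startup. The flag
// `hasAvx2` is assumed to come from a CPU feature query done elsewhere.
GainFn selectApplyGain(bool hasAvx2) {
    return hasAvx2 ? &applyGainFor<Avx2Tag> : &applyGainFor<Sse2Tag>;
}
```

Selecting the function pointer once and reusing it keeps the dispatch cost out of the inner loop, at the price of compiling each kernel for every supported instruction set.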

@JoergAtGithub
Member

JoergAtGithub commented Jul 21, 2022

> This is a comparison of the loops in sample.cpp. The result of MSVC is disappointing: 13/39 loops vectorized

Did you invoke all these compilers with the same target instruction set extensions?

@daschuer
Member Author

This is the output from our default build settings on GitHub. These also go into our release builds. Maybe our MSVC build flags are bad?

Do you get better results in your local build?

@JoergAtGithub
Member

@Swiftb0y
Member

> Could you rerun your benchmark with /arch:AVX512 for MSVC (default is to use only SSE2 instruction set)

To provide portable builds, we can only distribute with SSE2. GCC also builds with only SSE2 and still does a good job at auto-vectorization, so specifying a different target instruction set does not make sense. Especially not AVX512, which is only present on a few modern high-end (usually desktop) CPUs.
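
To check which baseline a given build actually targets, one option is to inspect the compiler's predefined macros. The following is a minimal sketch; the function name `targetSimdBaseline` is hypothetical, and the macro list covers only the common x86 cases (GCC and Clang define `__SSE2__`, while MSVC signals SSE2 through `_M_X64` or `_M_IX86_FP`).

```cpp
// Reports the widest SIMD baseline the compiler was told to target.
// This reflects compile-time flags only, not the CPU the binary runs on.
const char* targetSimdBaseline() {
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return "SSE2";
#else
    return "none detected";
#endif
}
```

Printing this value in each CI job would show whether the compared compilers were really given the same target.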

@daschuer
Member Author

I have tested the /arch:AVX512 flag here: https://github.com/daschuer/mixxx/runs/7462994413?check_suite_focus=true
No improvement.

@JoergAtGithub
Member

Thanks for testing! It seems that MSVC's auto-vectorizer really is that poor: http://0x80.pl/notesen/2021-02-17-autovectorization-msvc.html

@daschuer
Member Author

@ferranpujolcamins Is this ready for merge?

@ferranpujolcamins ferranpujolcamins merged commit f3ad153 into mixxxdj:main Aug 25, 2022
@daschuer daschuer deleted the info-vec-optimized-main branch November 16, 2022 08:34

4 participants