daxpy 10x slower on macOS (Haswell) #1470

Closed
simonbyrne opened this issue Feb 22, 2018 · 19 comments

@simonbyrne

simonbyrne commented Feb 22, 2018

Calls to daxpy seem to be about 10x slower on macOS than when run inside a Linux VM on the same machine (both give the config as "USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell").

This affects a lot of code which depends on it (e.g. dgebrd).

downstream: JuliaLang/LinearAlgebra.jl#501

@martin-frbg
Collaborator

Nobody here currently has a Mac, I fear. From anecdotal evidence in previous issues (#730, JuliaLang/julia#901), you could try removing the .align 16 in kernel/x86_64/daxpy_microk_haswell-2.c. Alternatively, try the current "develop" tree, which happens to have that microkernel replaced with the non-AVX2 Sandybridge one for unrelated reasons (the respective microkernel is #included at the top of kernel/x86_64/daxpy.c).

@brada4
Contributor

brada4 commented Feb 23, 2018

Can you check whether the virtual machine exposes AVX2 in /proc/cpuinfo (i.e., whether the Haswell or the Sandybridge kernel is used)?

@simonbyrne
Author

> Can you check whether the virtual machine exposes AVX2 in /proc/cpuinfo (i.e., whether the Haswell or the Sandybridge kernel is used)?

Yes, it does.

@simonbyrne
Author

Using the current develop branch (e3a80e6) doesn't fix it either.

@martin-frbg
Collaborator

Does removing every .align 16 in the daxpy microkernel file change anything?

@simonbyrne
Author

Yes, commenting that out in daxpy_microk_sandy-2.c seems to fix it.

@ViralBShah
Contributor

ViralBShah commented Feb 23, 2018

Is it possible to perhaps have a new release once we fix this, given that we haven't had one for a long time? We're happy (in the Julia community) to try out an RC and give feedback before release.

@simonbyrne
Author

Ah, does it have anything to do with this commit comment?

According to the Mac developer docs:

> align_expression is a power of 2 between 0 and 15 (for example, the argument of .align 3 means 2 ^ 3 (8)–byte alignment)

As I understand it, .p2align should be consistent across platforms.
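
To make the difference concrete, here is a small C sketch of the two readings of the directive's operand. It assumes the byte-count reading that GNU as uses for x86 ELF targets and the power-of-two reading quoted from the Mac docs; the helper names are made up for illustration and are not part of OpenBLAS.

```c
#include <assert.h>
#include <stdint.h>

/* Byte-count reading (assumed: GNU as on x86 ELF): ".align 16" asks for 16 bytes. */
static uint64_t align_bytes_as_count(unsigned operand) {
    return operand;
}

/* Power-of-two reading (Mach-O ".align", and ".p2align" in general):
 * ".align 4" asks for 2^4 = 16 bytes. */
static uint64_t align_bytes_as_power(unsigned operand) {
    assert(operand < 64);
    return (uint64_t)1 << operand;
}

/* Padding bytes needed to move addr up to the next multiple of align_bytes,
 * where align_bytes must be a power of two. */
static uint64_t padding_needed(uint64_t addr, uint64_t align_bytes) {
    assert(align_bytes != 0 && (align_bytes & (align_bytes - 1)) == 0);
    return (align_bytes - (addr & (align_bytes - 1))) & (align_bytes - 1);
}
```

Under these assumptions, an operand of 16 means 16 bytes in the first reading and 65536 bytes in the second, so `padding_needed(0x1004, ...)` goes from 12 to 61436 bytes. Whether that extra padding is what actually causes the slowdown is not established in this thread. `.p2align 4` requests 16 bytes under either reading.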

@martin-frbg
Collaborator

@simonbyrne yes, sorry for not digging down to the original source earlier. Thanks for the pointer to p2align; it certainly looks cleaner than adding ifdefs around each and every .align (though the underlying issue seems to be an Apple flaw, if their align is identical to p2align).

@ViralBShah I hope so, but xianyi has not been seen here for quite some time, and I still believe only he can do releases.

@simonbyrne
Author

No worries. Thanks for all your effort maintaining OpenBLAS.

@ViralBShah
Contributor

Thank you. We might just apply the patch for now in that case.

@ViralBShah
Contributor

I have emailed @xianyi about adding more people as admin and perhaps even moving the project to an OpenBLAS organization to help maintain it better.

@martin-frbg
Collaborator

PR merged for Haswell and Sandybridge. From Wikipedia, I suspect Nehalem may also be needed for the 2010-12 models of Mac Pro and iMac?

@simonbyrne
Author

I believe so: I think we build macOS Julia for Nehalem, perhaps @staticfloat can confirm?

@staticfloat
Contributor

We build with DYNAMIC_ARCH=1 on OSX, so we build everything and architecture is chosen at runtime.
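
As a rough illustration of what "chosen at runtime" means here, the sketch below keeps one kernel slot per core type in a table and picks from it by core. This is not OpenBLAS's actual dispatch code: the `core_t` enum, the table, and `select_daxpy` are hypothetical names, and a plain C loop stands in for every tuned kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical core identifiers; a real build would detect the CPU itself. */
typedef enum { CORE_GENERIC, CORE_NEHALEM, CORE_SANDYBRIDGE, CORE_HASWELL } core_t;

typedef void (*daxpy_fn)(size_t n, double alpha, const double *x, double *y);

/* Plain C reference for y := alpha * x + y; stands in for each tuned kernel. */
static void daxpy_reference(size_t n, double alpha, const double *x, double *y) {
    for (size_t i = 0; i < n; i++) {
        y[i] += alpha * x[i];
    }
}

/* One slot per core type, all compiled into the same library. */
static const daxpy_fn daxpy_table[] = {
    [CORE_GENERIC]     = daxpy_reference,
    [CORE_NEHALEM]     = daxpy_reference,
    [CORE_SANDYBRIDGE] = daxpy_reference,
    [CORE_HASWELL]     = daxpy_reference,
};

/* Picks the kernel for the given core when the program runs. */
static daxpy_fn select_daxpy(core_t core) {
    assert((unsigned)core <= CORE_HASWELL);
    return daxpy_table[core];
}
```

Since every slot ships in the same binary, a kernel with a misread alignment directive affects whichever machines select that slot, which is why the Nehalem kernel matters as well as the Haswell and Sandybridge ones.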

@martin-frbg
Collaborator

PR incoming. I now understand that the early assembly kernels from libGoto already catered for this issue by using the ALIGN_ macros from common_x86_64.h.
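
A minimal sketch of the two approaches discussed in this thread, for x86_64 targets only: a per-platform macro in the spirit of the ALIGN_ macros, and a single `.p2align` spelling. The `SKETCH_` macro names and `emit_aligned_label` are hypothetical; the real definitions in common_x86_64.h may look different.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-platform macro (x86_64 only). */
#if defined(__APPLE__)
/* Mach-O assemblers read the .align operand as a power of two. */
#define SKETCH_ALIGN_16 ".align 4"
#else
/* Assumed: GNU as on x86 ELF reads the .align operand as a byte count. */
#define SKETCH_ALIGN_16 ".align 16"
#endif

/* Single spelling: .p2align always takes a power of two. */
#define SKETCH_P2ALIGN_16 ".p2align 4"

/* Writes "<directive>\n<label>:\n" into out, i.e. the text a kernel
 * would place in front of its loop label. */
static void emit_aligned_label(char *out, size_t out_size,
                               const char *directive, const char *label) {
    assert(strlen(directive) + strlen(label) + 3 < out_size);
    strcpy(out, directive);
    strcat(out, "\n");
    strcat(out, label);
    strcat(out, ":\n");
}
```

The macro version needs one conditional in a shared header, while the `.p2align` version needs none, which is the trade-off mentioned earlier in the thread.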

@martin-frbg
Collaborator

Actually, according to the GitHub documentation I should have sufficient rights to create a release. As that was never discussed as a possible "duty", I would still prefer to do this only if contact with xianyi cannot be established. Also, I do not have access to the SourceForge repository used for providing the precompiled Windows version, or to the openblas.net project page that links to it.

@simonbyrne
Author

The current website seems to be built from @xianyi's personal GitHub page:
https://github.com/xianyi/xianyi.github.com/tree/master/OpenBLAS

@martin-frbg
Collaborator

Tagging for 0.3.0, as xianyi created that milestone recently; hopefully the release will happen soon.
