Test fails on macOS Sequoia (ARM M1) #1073
Comments
git bisecting locates the failing COMPLEX16 test to 2a87758 (as seen on my hardware/OS combo).
I believe the four failing double precision calculations are also seen in CI; bisecting on the output of […]. From examining CI, it doesn't look like these numerical outputs are used to pass/fail the test suite, though.
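(For anyone repeating the bisection, a minimal sketch of such a session; the known-good commit and the exact test command are placeholders, not the ones actually used above.)

```
# Hypothetical bisect session; substitute a commit known to pass and whatever
# command reproduces the COMPLEX16 failure on your machine.
git bisect start
git bisect bad HEAD                  # tree that shows the failure
git bisect good <known-good-commit>  # placeholder, e.g. an older release tag
# at each step: rebuild, rerun the eigenvalue testers, then mark the result
#   git bisect good     or     git bisect bad
git bisect reset                     # done; bisect reports the first bad commit
```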
Argh. Thanks for the report. Shoo. That's going to be a tough one to debug. We'll have a look.
In make.inc (line 77 in 9128e20) you should be able to change BLASLIB to whatever you want. So, for example, you should be able to link with macOS Accelerate for BLAS. I assume cmake has a similar option.
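As a sketch of that make.inc change (the default value shown is approximate and depends on the LAPACK version you start from):

```
# make.inc: swap the reference BLAS for Accelerate when linking the testers.
# The shipped default is along the lines of  BLASLIB = $(TOPSRCDIR)/librefblas.a
BLASLIB = -framework Accelerate
```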
Curiously, when I reproduced this on the M1 in the GCC Compile Farm I got an additional numerical error plus another error in REAL (an SGS failure with INFO=9 from SGGES3), and a single numerical error in COMPLEX instead of the one in COMPLEX16. (Using the homebrew gfortran 14.2.0_1 as well; only the Xcode 14.2 command-line tools are installed, so the system clang is at 14.0.0.) Our friendly neighborhood "minor testing failures" at work again?
I also note that the "rough bisect" pointed to the new NRM2 routines, which are also implicated in the SVD (dgesdd) divergence errors discussed in #672.
The reason I started running these tests is that SciPy uses Accelerate as its underlying BLAS. We've just noticed that our test suite has failures on macOS 15 that weren't there on macOS 14. With the macOS 14 --> 15 update Apple refreshed Accelerate; part of that change was a bump in its LAPACK from 3.9.1 to 3.11. I've given Apple feedback that we're experiencing problems, but the reproducer currently only shows how the SciPy test suite fails. I was trying to find another way of demonstrating that there is an issue, short of coming up with a specific Fortran program. I therefore thought to build this project against Accelerate, run the test suite, and, if there were failures, point the Apple engineers at something they may find easier to digest. I haven't succeeded in building and testing this project against Accelerate, but I did get as far as running the suite as-is. This highlighted the issues, so I thought I'd report them.
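For the cmake route, a hedged sketch of a configuration that should pick up Accelerate instead of the reference BLAS; USE_OPTIMIZED_BLAS and BLAS_LIBRARIES are the usual cache variables, but verify them against the CMakeLists.txt of your checkout:

```
# Configure, build, and test against an external BLAS (Accelerate).
# Option names are assumptions based on the usual LAPACK CMake cache variables.
cmake -B build -DBUILD_TESTING=ON \
      -DUSE_OPTIMIZED_BLAS=ON \
      -DBLAS_LIBRARIES="-framework Accelerate"
cmake --build build
ctest --test-dir build
```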
Hey @andyfaff, sorry for the late response! I gave this a quick look and I am unable to reproduce the errors you are experiencing. I tried on my AMD Linux machine with a Ryzen 5 7640U processor, and in addition in a macOS VM built from a recovery image on an Intel CPU. I have attached a screenshot showing the same versions of gcc and gfortran installed with brew and no failures. Looking at your log, it seems that the errors are in the routines […]. Based on https://netlib.org/lapack/installation.hints I am unsure what these failures could mean; is this what @martin-frbg was talking about? I appreciate any extra insight!
They're visible on macOS Sequoia (os=15.1), with Xcode 16.1. One way for you to visualise the errors would be in CI, but it'll require […]
The errors are not reproducible on x86_64 or in emulation, but they (and/or similar ones) are on actual M1 hardware. That's why I wrote that grumpy response about known accuracy issues in the test suite.
@jprhyne see also https://netlib.org/lapack/faq#_how_do_i_interpret_lapack_testing_failures (and I think one of the older issues about failed tests has a detailed explanation of why some tests will always report a huge absolute error when they fail).
Restoring the "old" dnrm2.f does fix the 4+4 errors in double precision, and restoring scnrm2.f removes the single one I got in COMPLEX (rather than COMPLEX16 as reported above). I'm still seeing a single error in SHS with |
Thank you so much, @martin-frbg, for having narrowed down the bug. This is fantastic work. In the dgesdd bug, the matrix is so big that it is hard to understand what is going on. This case, however, is much smaller, and we may have a chance of understanding the convergence failure.
https://gist.github.com/andyfaff/ff02543e7ec9561b28d8a7c6702d43a0#file-testing_results-txt-L1404
Do you think you could just print the matrices? I hope that getting every step in […] Also, is there a difference if you change the compiler optimization level? I assume that you used the default […]
Edit: […]
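A sketch of that optimization-level experiment (variable and target names follow the usual make-based layout, where the default Fortran flags are roughly -O2 -frecursive; treat the exact names as assumptions):

```
# Rebuild everything with optimization disabled and rerun the suite.
# FFLAGS is the flag variable in recent make.inc files (older ones use OPTS);
# a command-line assignment overrides the value set in make.inc.
make clean
make FFLAGS="-O0 -frecursive"
# Variant: recompile only the new nrm2 sources at -O0, then relink and rerun.
```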
@Developer-Ecosystem-Engineering you may be interested in this parallel issue that I experienced after opening the one in SciPy.
Thanks @andyfaff. We don't pull in BLAS at the moment, so this is unlikely to be an issue on our end.
@angsch sorry, forgot to add that: errors reduce to a single failure in DOUBLE at or below […]
@martin-frbg Sorry for asking one more time: did you reduce all files to […]?
Ah, sorry again: that is with […]
So, same reduction in error count (to the one in DDRGES3) with only nrm2 compiled at […]
The decisive difference between […]
Description
I cloned the repository to my machine (macOS Sequoia, M1, Xcode 16.1), […]
The summary of the test suite (full log here): […]
On a related note, are there guidelines on how to direct the makefile to use macOS Accelerate for BLAS instead of the inbuilt librefblas?
Checklist