Calling zpotrs_ introduces nans in unrelated results of cpotrf_ with Nehalem kernels #695
Comments
Can confirm for single- and multithreaded builds on Nehalem as well. Using separate variables for n and info in the zpotrs call (in case one of them were inadvertently IN+OUT) did not help. valgrind does not complain about uninitialized arrays in calls to either function. I have not tried with completely stock LAPACK in place of the optimized potr functions.
In JuliaLang/LinearAlgebra.jl#260 it was reported that this happens on every core type except Sandy Bridge, Haswell, and the recent AMD families. Comparing Cygwin 64's build (dynamic arch, but only BLAS from OpenBLAS, LAPACK uses netlib) vs Julia's mingw-w64 build (dynamic arch including OpenBLAS' LAPACK, also ILP64 and renamed symbols, but that doesn't appear to matter here):
Dumping the first few members of the sa array in potrf_U_single.c right before the call to POTF2_U:
Adding "feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);" to the test case, I get the first invalid operation trap here:
Sorry if this is all just a red herring.
Barking up a few more trees, I now see that the call to cdotc_k() at line 77 of zpotf2_U.c suddenly returns a NaN on the second invocation of cpotrf_ when loop variable j==2. Up to this point, the ajj[0] produced by this function matched the values seen during the first invocation of cpotrf_, and array "a" appears to be undamaged on entry. This seems to implicate zdot_sse.S, which was last changed to fix spurious NaNs in relation to JuliaLang/julia#189. (Un)fortunately my ignorance of x86 assembly language prevents me from venturing further.
Sorry again, zdot is just another victim. The culprit now seems to be GEMV_U (i.e. cgemv_t.S), which starts introducing NaNs into a[22],a[23],a[42],a[43],a[62],a[63],a[82],a[83] during the j==1 iteration of the aforementioned loop in zpotf2_U.c. All other values in this instance of the a matrix, and all values in "a" before and after invocations of GEMV_U up to that point, are identical to those occurring during the first cpotrf_() call.
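The kind of before/after comparison described above can be sketched with a small helper. This is only an illustration, not code from OpenBLAS or from the test case: the names first_nan_index and diff_indices are hypothetical, and the sketch assumes the matrix is viewed as a flat array of floats (interleaved real and imaginary parts), which may or may not be how the a[...] indices quoted above were counted.

```c
#include <math.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: index of the first NaN among n floats, or -1 if none.
   A single-precision complex matrix can be passed as 2*rows*cols floats. */
long first_nan_index(const float *a, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (isnan(a[i]))
            return (long)i;
    }
    return -1;
}

/* Hypothetical helper: record the indices where `got` differs bit-for-bit
   from the snapshot `ref`. At most max_out indices are stored in `out`;
   the return value is the total number of differing entries. */
size_t diff_indices(const float *ref, const float *got, size_t n,
                    size_t *out, size_t max_out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        /* memcmp compares the raw bits, so a NaN never equals a finite value
           and two identical bit patterns always compare equal. */
        if (memcmp(&ref[i], &got[i], sizeof(float)) != 0) {
            if (count < max_out)
                out[count] = i;
            count++;
        }
    }
    return count;
}
```

Taking a snapshot of the matrix during the first cpotrf_ call and diffing against it after each kernel invocation in the second call would narrow down which invocation first corrupts the data.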
BTW matzeri's fix for JuliaLang/julia#697 has no effect on this problem, although it also touches gemv.
Replacing kernel/x86_64/cgemv_t.S with the pre-23965f1 version did not fix anything either, while forcing the use of the generic C implementation (setting CGEMVTKERNEL = ../arm/zgemv_t.c in KERNEL.NEHALEM, as seen in KERNEL.generic) did lead to correct results.
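For cross-checking an optimized kernel against something simple, a plain C transposed matrix-vector product can serve as a reference. The sketch below is an assumption-laden illustration: ref_cgemv_t is a hypothetical name, its signature is not the OpenBLAS kernel interface, and it only covers the unit-stride, non-conjugated case y += alpha * A^T * x for a column-major matrix.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical reference routine (not the OpenBLAS kernel signature):
   y[j] += alpha * sum_i A[i + j*lda] * x[i]   for j = 0 .. n-1
   A is m-by-n, column-major, with leading dimension lda >= m.
   x has m entries, y has n entries, both with unit stride. */
void ref_cgemv_t(size_t m, size_t n, float complex alpha,
                 const float complex *a, size_t lda,
                 const float complex *x, float complex *y)
{
    for (size_t j = 0; j < n; j++) {
        float complex acc = 0.0f;
        /* Dot product of column j of A with x (no conjugation). */
        for (size_t i = 0; i < m; i++)
            acc += a[i + j * lda] * x[i];
        y[j] += alpha * acc;
    }
}
```

Feeding the same inputs to this routine and to the kernel under suspicion, then comparing the outputs, would show whether NaNs originate in the kernel itself or are already present in its inputs.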
I do reproduce this bug with
Perhaps it depends on what gets left in the buffer. Nehalem seems to have optimized trsm kernels (that appear to be referenced by the potrf functions), while Sandybridge and Haswell use the generic C implementations. Not sure when I will get around to testing this hypothesis (and it would not explain why zeroing the buffer on entry to cgem as in JuliaLang/julia#746 does not help).
Turns out this has nothing to do with trsm kernel variants. I can actually take the entire KERNEL.SANDYBRIDGE setup file and substitute it for KERNEL.NEHALEM (most of the definitions therein being obviously unrelated to potrf/gemv), and the test case still fails in the same way unless I also make it use the generic CGEMVTKERNEL. I hope that some of my ramblings prove to be useful in debugging.
@martin-frbg, I plan to set CGEMVTKERNEL to the C kernel by default. After that, we can improve the performance for some CPU architectures.
This was running on a Haswell with Ubuntu 14.04. I have 4 separate arrays here, A1, A2, A3, and B. The values in A1 and A3 are exactly equal, but running the same call to cpotrf_ before vs after an unrelated zpotrs_ call on A2 and B produces different results. The second call has somehow introduced some unexpected NaNs. If I comment out the zpotrs_ call then there are no NaNs.
Found in, and reduced from, JuliaLang/LinearAlgebra.jl#260