Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scalapack queried LWORK for PXGESVD not always correct for large matrices #105

Open
john-young opened this issue Dec 10, 2024 · 3 comments

Comments

@john-young
Copy link

We are running into an issue where the returned value of LWORK by PXGESVD when queried is not always correct.

This issue is caused by the fact that the queried value is returned as a floating-point number but when the check is made for the actual computational call, the calculated LWORK used in the comparison is an integer that is never converted to a floating-point value. For large LWORK, not all integers can be represented exactly as an integer. In some cases, when the integer is converted to a floating-point value, the floating-point representation is smaller than the actual integer. So, when you pass it back in for the actual computation, the comparison check fails (and the work array is not sufficiently long).

For example, in PCGESVD, the code is
LWMIN = 1 + 2SIZEB + MAX(WATOBD,WBDTOSVD)
WORK(1) = CMPLX(LWMIN,0D+00) ! WHY DOUBLE PRECISION HERE ANYWAY?
RWORK(1) = REAL(1+4
SIZEB)
IF (WANTU.NE.1 .AND. .NOT. (LSAME(JOBU,'N'))) THEN
INFO = -1
ELSE IF (WANTVT.NE.1 .AND. .NOT. (LSAME(JOBVT,'N'))) THEN
INFO = -2
ELSE IF (LWORK.LT.LWMIN .AND. LWORK.NE.-1) THEN
INFO = -19
END IF

It is possible that the conversion to WORK(1) is actually less than LWMIN. Then, when LWORK is passed in for the actual computation, the check LWORK .LT.LWMIN will fail.

The failure is observed in single precision (PSGESVD/PCGESVD) but not, so far, in double precision (PDGESVD/PZGESVD). However, the issue should technically be there in double precision as well if somehow the work array could be big enough. Single precision can only represent all integers exactly up to 16777217 (2^24+1)and double precision can only represent all integers exactly up to (2^53+1).

As an example, for PCGESVD we had a matrix where LWMIN=20308525 but WORK(1)=20308524. The actual computational call produced an error.

Thanks,
John

@nchampag
Copy link

The attached contains a Fortran program that demonstrates the problem being encountered. The output of the program with LWORK = 20308525 is below:

LWORK = 20308525
REAL(LWORK) = 0.203085240000000E+08
DBLE(LWORK) = 0.203085250000000E+08
CMPLX(LWORK) = 0.203085240000000E+08 0.000000000000000E+00
DMPLX(LWORK) = 0.203085250000000E+08 0.000000000000000E+00

Note that the single-precision values are one less than the actual value. Furthermore, the difference may be worse for larger values. For LWORK = 203085253, the output is

LWORK = 203085253
REAL(LWORK) = 0.203085248000000E+09
DBLE(LWORK) = 0.203085253000000E+09
CMPLX(LWORK) = 0.203085248000000E+09 0.000000000000000E+00
DMPLX(LWORK) = 0.203085253000000E+09 0.000000000000000E+00

where the single-precision values are now five less than the actual value.

main.zip

@nchampag
Copy link

Just noticed that this is similar to issue #99.

@nchampag
Copy link

As mentioned here, there are routines that roundup LWORK that should be called in ScaLAPACK routines where LWORK is computed when LWORK is -1 (e.g., SROUNDUP_LWORK.

A workaround for this issue that has worked, so far, is to use the statement nested in the conditional present in SROUNDUP_LWORK to roundup LWORK immediately after LWORK is computed in a ScaLAPACK routine when LWORK is -1 when called. The conditional isn't used since we don't know the actual value of LWORK computed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants