-
Notifications
You must be signed in to change notification settings - Fork 445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding a recursive xLARFT #1080
Conversation
…moved the previous version into VARIANTS
Merge branch 'Reference-LAPACK:master' into larft
SRC/dormqr.f
Outdated
@@ -309,8 +309,7 @@ SUBROUTINE DORMQR( SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, | |||
* Form the triangular factor of the block reflector | |||
* H = H(i) H(i+1) . . . H(i+ib-1) | |||
* | |||
CALL DLARFT( 'Forward', 'Columnwise', NQ-I+1, IB, A( I, | |||
$ I ), | |||
CALL DLARFT( 'Forward', 'Columnwise', NQ-I+1, IB, A( I, I ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that works, the last comma of the line (the one after A(I,I)) seems to be the 72nd character of the line, so I think that changes work. I am more worried than anything about these types of changes. See #1079 where @hjjvandam proposes a fix to get some lines below 72 characters when we add _64 here and there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is best to leave the PR focused on LARFT. so please do not commit changes on GELQF, GERQF, etc. Only [S,D,,C,Z]LARFT. If you want to do a separate commit to improve some formatting in some routines here and there, sure. But please separate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I didn't think about the extensions. So in the failings that I have elsewhere, I'll keep the line length at or below 68 and hopefully that'll resolve those issues.
Thanks for the point to the PR!
SRC/la_xisnan.mod
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think we want the .mod file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something that could potentially be added to a gitignore file as I notice the file is created after I run 'make' so it seems to be a product of some compilation process. Not sure if its because of some machine tests that are failing (things like testing NaNs and Max behavior)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Is this something that could potentially be added to a gitignore file." Good idea. Feel welcome to submit a separate pull request with this feature.
SRC/la_constants.mod
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think we want the .mod file
*> \author Univ. of Colorado Denver | ||
*> \author NAG Ltd. | ||
* | ||
*> \ingroup larft |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the old version shall live in VARIANTS, should the group be updated for the documentation? The other files in this folder carry "variant" in the name. Also, could you please add the variants to the build system? SRC/VARIANTS/Makefile
and SRC/VARIANTS/README
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the build system catch! It should be in there for the most recent commit I've made. I'm also hoping that I ironed out all the line length related issues (mentioned above)
Hey Y'all,
I've been working on a recursive implementation of the xLARFT routines. I have some figures below that show some increased performance for the DLARFT routine on its own as well as some performance graphs for when we use the DLARFT subroutine within DGEQRF.
One can argue that instead of using DGEQR2+DLARFT in DGEQRF one should use DGEQRT3 which is true, but DLARFT is useful in many other settings where a routine akin to DGEQRT3 does not exist.
Long story short: when we increase the block size of our routine calls (K in xLARFT), we see some drastic improvement in the performance of DLARFT. The performance improvement for DGEQRF are less impressive since DLARFT (with a reasonably small block size) represents a small portion of DGEQRF.
For the experiments, I used the HPC at my university, and to give context to the actual performance values, I have
included a figure that highlights the performance for square matrix-matrix multiplication.
flopMMM.pdf
newBlocksizePerf.pdf
Note: I previously attached an incorrect file for the above. Please disregard. Sorry for any confusion!
varyBlocksizeDlarftPerf.pdf
And if there is an interest in the methodology for how these graphs were generated or anything along
those lines, my development of these routines can be found in my xor-xungr repository
The main algorithm for the recursive xLARFT recurses down to the tree node (the case of N=1 or K=1), and
not much investigation was done about if an earlier bailout (to the old method or another one) could
increase performance, so this is something that could be investigated but through some brief toying around on
my machine, I did not see any real difference when using optimized BLAS (AOCL)
Since this method will be changing how xLARFT is being done, I also moved the existing version as a variant titled
'LL-LVL2'.
I appreciate any feedback on this approach or anything else done!
Checklist