Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding a recursive xLARFT #1080

Merged
merged 21 commits into from
Dec 3, 2024
Merged

Adding a recursive xLARFT #1080

merged 21 commits into from
Dec 3, 2024

Conversation

jprhyne
Copy link
Contributor

@jprhyne jprhyne commented Nov 30, 2024

Hey Y'all,

I've been working on a recursive implementation of the xLARFT routines. I have some figures below that show some increased performance for the DLARFT routine on its own as well as some performance graphs for when we use the DLARFT subroutine within DGEQRF.

One can argue that instead of using DGEQR2+DLARFT in DGEQRF one should use DGEQRT3 which is true, but DLARFT is useful in many other settings where a routine akin to DGEQRT3 does not exist.

Long story short: when we increase the block size of our routine calls (K in xLARFT), we see some drastic improvement in the performance of DLARFT. The performance improvement for DGEQRF are less impressive since DLARFT (with a reasonably small block size) represents a small portion of DGEQRF.

For the experiments, I used the HPC at my university, and to give context to the actual performance values, I have
included a figure that highlights the performance for square matrix-matrix multiplication.

flopMMM.pdf

newBlocksizePerf.pdf
Note: I previously attached an incorrect file for the above. Please disregard. Sorry for any confusion!

varyBlocksizeDlarftPerf.pdf

And if there is an interest in the methodology for how these graphs were generated or anything along
those lines, my development of these routines can be found in my xor-xungr repository

The main algorithm for the recursive xLARFT recurses down to the tree node (the case of N=1 or K=1), and
not much investigation was done about if an earlier bailout (to the old method or another one) could
increase performance, so this is something that could be investigated but through some brief toying around on
my machine, I did not see any real difference when using optimized BLAS (AOCL)

Since this method will be changing how xLARFT is being done, I also moved the existing version as a variant titled
'LL-LVL2'.

I appreciate any feedback on this approach or anything else done!
Checklist

  • The documentation has been updated.
  • If the PR solves a specific issue, it is set to be closed on merge.

SRC/dormqr.f Outdated
@@ -309,8 +309,7 @@ SUBROUTINE DORMQR( SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC,
* Form the triangular factor of the block reflector
* H = H(i) H(i+1) . . . H(i+ib-1)
*
CALL DLARFT( 'Forward', 'Columnwise', NQ-I+1, IB, A( I,
$ I ),
CALL DLARFT( 'Forward', 'Columnwise', NQ-I+1, IB, A( I, I ),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that works, the last comma of the line (the one after A(I,I)) seems to be the 72nd character of the line, so I think that changes work. I am more worried than anything about these types of changes. See #1079 where @hjjvandam proposes a fix to get some lines below 72 characters when we add _64 here and there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is best to leave the PR focused on LARFT. so please do not commit changes on GELQF, GERQF, etc. Only [S,D,,C,Z]LARFT. If you want to do a separate commit to improve some formatting in some routines here and there, sure. But please separate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I didn't think about the extensions. So in the failings that I have elsewhere, I'll keep the line length at or below 68 and hopefully that'll resolve those issues.

Thanks for the point to the PR!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we want the .mod file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something that could potentially be added to a gitignore file as I notice the file is created after I run 'make' so it seems to be a product of some compilation process. Not sure if its because of some machine tests that are failing (things like testing NaNs and Max behavior)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Is this something that could potentially be added to a gitignore file." Good idea. Feel welcome to submit a separate pull request with this feature.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we want the .mod file

*> \author Univ. of Colorado Denver
*> \author NAG Ltd.
*
*> \ingroup larft
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the old version shall live in VARIANTS, should the group be updated for the documentation? The other files in this folder carry "variant" in the name. Also, could you please add the variants to the build system? SRC/VARIANTS/Makefile and SRC/VARIANTS/README.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the build system catch! It should be in there for the most recent commit I've made. I'm also hoping that I ironed out all the line length related issues (mentioned above)

@langou langou merged commit 6ec7f2b into Reference-LAPACK:master Dec 3, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants