Refactor custom gemm heuristics #56
Mostly looks good assuming performance testing shows no regressions.
```python
                    weights.shape[0],
                    dtype=inp_view.dtype,
                    device='cuda')
_custom_C.wvSpltK(weights, inp_view, out, n, self.cu_count)
```
Not something that needs to change right now, but we probably want to refactor this eventually so that the multiprocessor (CU) count is obtained at the C++ level. IMO it's not good decomposition to have it here.
Looks good!
Moving the custom skinny GEMM heuristic ahead of the hipBLAS/rocBLAS solutions.
Disabling the now-obsolete LLMM1 path, which is fully covered by the new kernel.
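A minimal Python sketch of the dispatch order described above: the custom skinny kernel is checked first, then a tuned hipBLAS/rocBLAS solution, then a default, with no separate LLMM1 branch. Everything here is hypothetical: `GemmShape`, `select_gemm_backend`, the `tuned` lookup keyed by `(n, k, m)`, and the thresholds `SKINNY_MAX_N` and `SKINNY_K_MULTIPLE` are illustrative assumptions. The PR does not state the actual conditions under which the skinny kernel is selected.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the real heuristic's cut-offs are not given in this PR.
SKINNY_MAX_N = 4        # assumed: "skinny" means at most a few input rows
SKINNY_K_MULTIPLE = 8   # assumed alignment requirement on the reduction dim


@dataclass(frozen=True)
class GemmShape:
    n: int  # rows of the input (e.g. tokens in a decode step)
    k: int  # reduction dimension
    m: int  # rows of the weight matrix, i.e. output features


def select_gemm_backend(shape: GemmShape, tuned: dict) -> str:
    """Return a backend name for a GEMM of the given shape (illustrative only)."""
    # 1. The custom skinny-GEMM kernel is tried first, before any library solution.
    if shape.n <= SKINNY_MAX_N and shape.k % SKINNY_K_MULTIPLE == 0:
        return "custom_skinny"
    # 2. Otherwise use a previously tuned library solution for this exact shape.
    key = (shape.n, shape.k, shape.m)
    if key in tuned:
        return tuned[key]  # e.g. "hipblas" or "rocblas"
    # 3. Fall back to a plain library GEMM.
    return "default"


if __name__ == "__main__":
    tuned = {(1, 4096, 4096): "rocblas", (32, 4096, 4096): "hipblas"}
    print(select_gemm_backend(GemmShape(1, 4096, 4096), tuned))   # custom_skinny
    print(select_gemm_backend(GemmShape(32, 4096, 4096), tuned))  # hipblas
    print(select_gemm_backend(GemmShape(64, 4096, 4096), tuned))  # default
```

Note that the shape `(1, 4096, 4096)` has a tuned rocBLAS entry but still routes to the skinny kernel, which is the effect of checking the custom heuristic first.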