Qiang navi4x fp8 llama #9674

Closed
qli88 wants to merge 454 commits from qiang-navi4x-fp8-llama

Conversation


@qli88 qli88 commented Oct 24, 2024

[Misc] Add FP8 support for Llama model family on Navi4x
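For context, FP8 serving in vLLM is driven by the quantization engine argument. A minimal sketch of how this support is typically exercised, assuming a ROCm build with the FP8 path enabled for Navi4x; the checkpoint name is an illustrative placeholder, not one tested in this PR:

```python
# Minimal sketch of FP8 inference in vLLM, assuming a ROCm build where the
# fp8 quantization path is enabled for this hardware. The checkpoint name is
# a placeholder, not one tested in this PR.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # any Llama-family checkpoint
    quantization="fp8",                        # on-the-fly FP8 weight quantization
)

outputs = llm.generate(
    ["The key advantage of FP8 inference is"],
    SamplingParams(temperature=0.0, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```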

amd-hhashemi and others added 30 commits June 18, 2024 18:47
* adds wvSpltK optimization for skinny gemm.


---------

Co-authored-by: Hashem Hashemi <[email protected]>
Fix 8K decode latency jump issue.
* add quantization_weights_path for fp8 weights

* fix lint

* fix lint

* change to quantized_weights_path

* fix lint
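The option introduced above lets pre-quantized FP8 weights be loaded from disk. A hedged sketch of how it might be passed, assuming the fork exposes quantized_weights_path as an engine argument alongside quantization="fp8" (the exact surface may differ by build):

```python
# Hedged sketch: loading pre-quantized FP8 weights via the
# quantized_weights_path option named in the commits above. Whether it is an
# LLM kwarg or a CLI flag in a given build is an assumption here.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",       # placeholder checkpoint
    quantization="fp8",
    quantized_weights_path="/path/to/fp8_weights",  # placeholder path
)
```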
* Moving custom skinny gemm heuristic before hipblas or rocblas solutions. Disabling the now obsolete LLMM1 path (dispatch idea sketched after this commit group)

* Simplified the decision logic

* Added back one case when LLMM1 can be used. Defaulting to adding bias separately

* Moved bias addition inside tgemm
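The commits above describe a shape-based dispatch: try the custom skinny-GEMM kernel first, keep LLMM1 for one remaining case, and otherwise fall back to the BLAS library. An illustrative Python sketch of that kind of decision logic; the thresholds and the exact conditions are assumptions, not the actual vLLM/ROCm code:

```python
# Illustrative shape-based GEMM dispatch in the spirit of the commits above.
# Thresholds are invented for demonstration; the real heuristic lives in the
# ROCm kernels and differs in detail.
def choose_gemm_backend(m: int, n: int, k: int) -> str:
    if n <= 4:                    # "skinny" GEMM: very few output columns
        return "wvSpltK"          # custom skinny-GEMM kernel
    if n <= 8 and k % 8 == 0:     # the one case where LLMM1 is kept
        return "LLMM1"
    return "hipblaslt"            # otherwise defer to the BLAS library

for shape in [(4096, 1, 4096), (4096, 8, 4096), (4096, 256, 4096)]:
    print(shape, "->", choose_gemm_backend(*shape))
```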
* [Kernel] Enable custom AR on ROCm

* Install amdsmi in Docker in preparation for custom all reduce

(cherry picked from commit f6cfb9bf31e9feeefbdedecf2165f80dd0564b75)

* Fix for yapf

* Linting and small fixes to vLLM syntax

(cherry picked from commit 2cf8103bfb0afce59b28a06c5bbe905983c42728)

---------

Co-authored-by: Matthew Wong <[email protected]>
* Fix 1-hop XGMI detection

* Fix numpy versioning
* adding input type

* merge gradlib_fp8 to gradlib

* using fp8

* fix lint

* fix lint
* Workaround for SWDEV-470361: calling the version of setProblem that does not cause integer overflow on large GEMM shapes

* clang-format
This reverts commit 2a3cbf9, reversing
changes made to 367aa5a.
* Enabling some basic tests for ROCm 6.2

Use strict xfail for ROCm 6.2 test repairs

* Use lenient xfail instead

---------

Co-authored-by: Alexei V. Ivanov <[email protected]>
….2 metrics test (#73)

* Dockerfile updates: base image; preemptive uninstalls

* Remove ROCm 6.2 xfails from metrics test
Let's hope float64 internal to pandas dataframe is good enough.
[Build/CI] tests for rocm/vllm:main as of 2024-06-28
* fix gradlib fp8 output

* add condition check for existing tune result

* fix linter

* fix import order

* fix lint
* Initializing hipblaslt workspace for fp8 gemms

* make workspace size configurable

* assign default value for workspace pointer

* fix clang-format

* fix clang-format

---------

Co-authored-by: Gregory Shtrasberg <[email protected]>
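Pre-allocating a persistent workspace buffer for hipBLASLt GEMMs avoids per-call allocation. A hedged Python-side sketch of the configurable-size idea; the environment variable name and the 32 MiB default are illustrative assumptions:

```python
# Hedged sketch of a configurable GEMM workspace, in the spirit of the commits
# above. The env-var name and 32 MiB default are assumptions for illustration.
import os
import torch

def init_fp8_gemm_workspace(device: str = "cuda") -> torch.Tensor:
    mb = int(os.environ.get("FP8_GEMM_WORKSPACE_MB", "32"))
    # One persistent byte buffer, handed to the GEMM library as scratch space.
    return torch.empty(mb * 1024 * 1024, dtype=torch.uint8, device=device)

workspace = init_fp8_gemm_workspace()  # on ROCm, torch still labels GPUs "cuda"
print(f"workspace: {workspace.numel() / 2**20:.0f} MiB on {workspace.device}")
```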
* update tuning script to match new api

* add mi308 configs for TP=8,4,2

* nit: ruff isort and argparse fix
* nit: make yapf happy
* nit: yapf happy-2
* remove elementwise kernel

* fix lint
dhonnappa-amd and others added 17 commits October 14, 2024 10:39
* cuda graph + num-scheduler-steps bug fix

* cuda graph + num-scheduler-steps bug fix

* linting
* fix code path logic to load mllama model

* fix lint error

* fix lint error

---------

Co-authored-by: tjtanaa <[email protected]>
* prefix-enabled FA perf issue

* split ENC, DEC/ENC_DEC

* lint
* add option to adjust partition size

* changed CPA partition size to 256 in rocm attention backend

* support context length 128K with partition size 256
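For scale: paged attention splits each sequence's context into fixed-size partitions, so a 128K-token context at partition size 256 means 512 partitions per sequence (versus 256 partitions at size 512). A minimal arithmetic sketch:

```python
# Worked partition arithmetic implied by the commits above: halving the
# partition size doubles the partition count for a given context length.
import math

def num_partitions(context_len: int, partition_size: int) -> int:
    return math.ceil(context_len / partition_size)

for ps in (512, 256):
    print(f"partition_size={ps}: 128K tokens -> "
          f"{num_partitions(128 * 1024, ps)} partitions")
```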
* Trivial variation to create a dummy PR for CI testing.

* SciPy & Numba updates in a timely manner.

* fixing environment variables

* Changing the installation to use requirements-rocm.txt

---------

Co-authored-by: Alexei Ivanov <[email protected]>
Co-authored-by: Gregory Shtrasberg <[email protected]>
…llama3.2 (#241)

* improved output handling so results match the previous behavior

* after merge correction

---------

Co-authored-by: Aleksandr Malyshev <[email protected]>

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run the remaining CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of the following:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@qli88 qli88 closed this Oct 24, 2024
@qli88 qli88 deleted the qiang-navi4x-fp8-llama branch October 25, 2024 00:00
@qli88 qli88 restored the qiang-navi4x-fp8-llama branch October 25, 2024 00:01
@qli88 qli88 deleted the qiang-navi4x-fp8-llama branch October 25, 2024 00:09