Qiang navi4x fp8 llama #9674

Closed
wants to merge 454 commits into from
131b217
adds wvSpltK optimization for skinny gemm. (#54)
amd-hhashemi Jun 18, 2024
3c86a03
fix 8k issue by changing max-context/seq len to 32k
lcskrishna Jun 19, 2024
719bf9d
Merge pull request #55 from ROCm/cl/fix-8k-issue
lcskrishna Jun 19, 2024
93aab3c
Adding quantized_weights_path arg for fp8 weights (#57)
charlifu Jun 19, 2024
4460294
Refactor custom gemm heuristics (#56)
gshtras Jun 20, 2024
b02fcb2
wvSpltK fix for 10GB+ output tensors
Jun 21, 2024
3e9dac6
Use uint64_t instead of unsigned long for clarity (#62)
mawong-amd Jun 21, 2024
c455e9c
fix for oob LDS fill in wvSpltK slm version (#63)
amd-hhashemi Jun 21, 2024
fa78403
[Kernel] Enable custom AR on ROCm (#27)
wenkaidu Jun 24, 2024
17e6307
fix error (#65)
charlifu Jun 24, 2024
3e7b0b6
Fix numpy and XGMI 1-hop detection (#67)
mawong-amd Jun 25, 2024
3200953
Fix linting (#68)
mawong-amd Jun 25, 2024
367aa5a
Merging fp8_gemm_tuner.py to gemm_tuner.py (#66)
charlifu Jun 25, 2024
014a9fc
Enabling some basic tests for ROCm 6.2
Alexei-V-Ivanov-AMD Jun 27, 2024
2a3cbf9
Merge branch 'main' of github.com:ROCm/vllm
Alexei-V-Ivanov-AMD Jun 27, 2024
616baa9
Workaround for SWDEV-470361 (#69)
gshtras Jun 27, 2024
596d58c
Revert "Merge branch 'main' of github.com:ROCm/vllm" (#72)
mawong-amd Jun 28, 2024
cce6281
[2/2] Using xfail instead of skip for ROCm 6.2 tests (#70)
mawong-amd Jun 28, 2024
e162af9
Dockerfile updates: base image, preemptive uninstalls; restore ROCm 6…
mawong-amd Jun 28, 2024
1ee620e
return int64 type for solidx in tuning results (#74)
charlifu Jun 28, 2024
98105d5
CI tests for rocm/vllm:main as of 2024-06-28
Alexei-V-Ivanov-AMD Jun 28, 2024
270de2d
.
Alexei-V-Ivanov-AMD Jun 28, 2024
d6e7862
Merge pull request #77 from ROCm/qa_rocm_vllm_tests
Alexei-V-Ivanov-AMD Jun 28, 2024
52df169
Fix gradlib fp8 output (#76)
charlifu Jul 1, 2024
e45129d
Allocate workspace for hipblaslt fp8 gemm. (#78)
charlifu Jul 2, 2024
ec9e784
Mixtral moe tuning for mi308 (#80)
divakar-amd Jul 8, 2024
15d6f77
remove elementwise kernel
charlifu Jul 9, 2024
1584c3b
fix lint
charlifu Jul 9, 2024
c3e8349
Remove elementwise kernel before each fp8 gemm (#81)
charlifu Jul 9, 2024
9635554
Fix the Parameter flag
HaiShaw Jul 9, 2024
18902de
Merge branch 'main' into charlifu/avoid_tensor_creation_before_each_gemm
HaiShaw Jul 9, 2024
4a64124
Merge pull request #82 from ROCm/charlifu/avoid_tensor_creation_befor…
HaiShaw Jul 9, 2024
cddc83f
add TP=1 moe tuning for mixtral-8x7B (#84)
divakar-amd Jul 10, 2024
5fedcf5
Mixtral-8x22B tuning mi308x (#85)
divakar-amd Jul 11, 2024
5630555
larger input lens tuning 8x7B-TP=1,2,4 and 8x22B-TP=2,4 (#86)
divakar-amd Jul 12, 2024
cf39ee7
internal ci steps
adityagoel14 Jul 16, 2024
51a2b7d
internal ci run script
adityagoel14 Jul 16, 2024
5ceb541
changing default directory
adityagoel14 Jul 16, 2024
647655f
removed rocminfo for testing
adityagoel14 Jul 16, 2024
ec622aa
fixed root directory~
adityagoel14 Jul 16, 2024
59b1049
cloning rocm/vllm in each test
adityagoel14 Jul 17, 2024
743fc74
fixing directory bug
adityagoel14 Jul 17, 2024
86da58a
fixing directory bug
adityagoel14 Jul 17, 2024
bf30bae
fixing run script
adityagoel14 Jul 17, 2024
82ac4cb
Squashed revert of internal CI changes
mawong-amd Jul 17, 2024
0cc3e5f
changed dockerfile and default directories
adityagoel14 Jul 17, 2024
5389e91
forwarded tests/ examples/ benchmarks/ .buildkite/test-pipeline.yaml …
adityagoel14 Jul 18, 2024
1257c4f
fixed working_dir
adityagoel14 Jul 18, 2024
57916d8
forwarding does not work
adityagoel14 Jul 18, 2024
37c5acc
Reduce csv writes (#92)
charlifu Jul 18, 2024
6eb4d90
Initial Cython compiler / perf opt support (#98)
bensander Jul 22, 2024
6182b7d
ruff
gshtras Jul 22, 2024
ee6b689
PoC WiP using sync engine in server mode
gshtras Jul 23, 2024
e26dc14
Sampling params from request
gshtras Jul 23, 2024
def6171
TP>1 with MP hack
gshtras Jul 24, 2024
6f13165
Covering a weird case of an empty output
gshtras Jul 24, 2024
bb8bc74
Running a few engine iterations in between yields
gshtras Jul 24, 2024
bc4e239
Added command line args to the endpoint
gshtras Jul 25, 2024
f238fee
Bringing back tgemm. Configurable port/host
gshtras Jul 25, 2024
40c436d
Using lifespan instead of event. Cleanup
gshtras Jul 25, 2024
a438692
Type fix
gshtras Jul 25, 2024
5d4f7d8
update the env for no tuning mode for TunnableOps
hongxiayang Jul 25, 2024
68b6696
Returning the first token for each request immediately
gshtras Jul 25, 2024
dc5e1d1
Returning immediately also when a request is completed
gshtras Jul 25, 2024
eb8af02
Configurable yielding after first token and completed request
gshtras Jul 25, 2024
295b568
Logic fix
gshtras Jul 25, 2024
02679fe
fix the type error due to the miss-use of the logging module (#105)
liligwu Jul 26, 2024
db65e58
Refactor and cleanup
gshtras Jul 26, 2024
36b8194
Remove comment that aged like milk
gshtras Jul 26, 2024
fcab51c
Update Dockerfile.rocm
shajrawi Jul 26, 2024
8b91fea
Update a couple more comments
gshtras Jul 26, 2024
c23efe7
Update a couple more comments
gshtras Jul 26, 2024
edde7cc
Merge pull request #107 from ROCm/Triton-workaround
shajrawi Jul 26, 2024
b6a17a7
Merge pull request #106 from ROCm/greg/fast_server
shajrawi Jul 26, 2024
942ea24
converts wvSpltK reduce to pure dpp for further perf uplift. (#64)
amd-hhashemi Jul 26, 2024
a203871
Internal CI
adityagoel14 Jul 26, 2024
c73c75d
Small CI Bug
adityagoel14 Jul 26, 2024
304efb3
Temporarily revert "Fix 8K decode latency jump issue. " (#108)
mawong-amd Jul 26, 2024
9c69c19
adding a simple model invocation involving fp8 calculation/storage
Alexei-V-Ivanov-AMD Jul 27, 2024
fe5828d
Adding pytest wrapper.
Alexei-V-Ivanov-AMD Jul 27, 2024
f344924
.
Alexei-V-Ivanov-AMD Jul 27, 2024
9a1e2b5
.
Alexei-V-Ivanov-AMD Jul 27, 2024
6be15ab
.
Alexei-V-Ivanov-AMD Jul 27, 2024
94dd71d
.
Alexei-V-Ivanov-AMD Jul 27, 2024
c329175
.
Alexei-V-Ivanov-AMD Jul 27, 2024
b174e58
.
Alexei-V-Ivanov-AMD Jul 27, 2024
6a2d00e
.
Alexei-V-Ivanov-AMD Jul 27, 2024
0aa3ef8
.
Alexei-V-Ivanov-AMD Jul 27, 2024
98c2e72
Update test-pipeline.yaml
Alexei-V-Ivanov-AMD Jul 27, 2024
200fbea
Merge pull request #109 from ROCm/simple_fp8_inference_example
Alexei-V-Ivanov-AMD Jul 27, 2024
904b5b8
Adding bf16 output dtype for fp8 gemm (#111)
charlifu Jul 29, 2024
a6414b8
Running server and LLM in different processes (#110)
gshtras Jul 29, 2024
3e480e9
Fixed single GPU issue without setting up mp. Added toggles for serve…
gshtras Aug 2, 2024
42b1b9a
Add distributed executor backend to benchmark scripts (#118)
mawong-amd Aug 2, 2024
5fac73f
Add weight padding for moe (#119)
charlifu Aug 2, 2024
c034d5d
[BugFix] Fix navi build after many custom for MI kernels added (#116)
maleksan85 Aug 6, 2024
98f31cd
add empty_cache() after each padding (#120)
charlifu Aug 6, 2024
30f12f0
[FIX] Gradlib OOM on Navi and sometimes on MI (#124)
maleksan85 Aug 8, 2024
8608888
save shape when fp8 solution not found (#123)
charlifu Aug 8, 2024
f49dff3
Fix unit test for moe by adding padding (#128)
charlifu Aug 12, 2024
dd1a208
Llama3.1 (#129)
gshtras Aug 12, 2024
674da1d
chat/completions endpoint (#121)
gshtras Aug 13, 2024
636ff01
Optimize custom all reduce (#130)
iotamudelta Aug 14, 2024
d5bf9bc
Add BF16 support to custom PA (#133)
sanyalington Aug 14, 2024
4132cbe
Making check for output match in original types. It saves some memory…
maleksan85 Aug 14, 2024
4d2dda6
Make CAR ROCm 6.1 compatible. (#137)
iotamudelta Aug 14, 2024
e7c3a5c
Car revert (#140)
gshtras Aug 15, 2024
5945822
Using the correct datatypes for streaming non-chat completions (#134)
gshtras Aug 15, 2024
6a8793d
Adding UNREACHABLE_CODE macro for non MI300 and MI250 cards (#138)
maleksan85 Aug 15, 2024
7382dd5
gfx90a typo fix (#142)
maleksan85 Aug 16, 2024
cfab178
wvsplitk templatized and better tuned for MI300 (#132)
amd-hhashemi Aug 16, 2024
c1860d6
[Bugfix] Dockerfile.rocm (#141)
zstreet87 Aug 16, 2024
7c5fd50
Update test-template.j2 (#145)
okakarpa Aug 19, 2024
aa36718
Adding Triton implementations awq_dequantize and awq_gemm to ROCm (#136)
rasmith Aug 20, 2024
280db50
Adding fp8 padding (#144)
charlifu Aug 21, 2024
4e9830e
[hegeman/AWQ] Torch Int-4 AWQ Dequantization and Configuration Option…
hegemanjw4amd Aug 21, 2024
c7a3a47
Add buildkit requirement for building docker images (#149)
hongxiayang Aug 22, 2024
56217ad
cupy build fix for SWDEV-475036 (#147)
hongxiayang Aug 22, 2024
906be8e
fix outdated env for turning off triton flash attention (#151)
hongxiayang Aug 23, 2024
68db66a
Nccl env for performance (#152)
hongxiayang Aug 26, 2024
07b6b14
Render experiments (#159)
okakarpa Aug 28, 2024
9f7e830
Merge remote-tracking branch 'upstream/main'
gshtras Aug 28, 2024
5fe12cf
Merge remote-tracking branch 'upstream/main' into v5.5_upstream_merge_rc
gshtras Aug 28, 2024
056baed
Workaround PyTorch IPC handle issue (#161)
wenkaidu Aug 28, 2024
f5bfb03
Merge remote-tracking branch 'origin/main' into v5.5_upstream_merge_rc
gshtras Aug 28, 2024
b6ae399
moe exports required for test_moe_rocm. Type fix in sync_llm. Linting
gshtras Aug 29, 2024
384c141
Merge remote-tracking branch 'upstream/main' into v5.5_upstream_merge_rc
gshtras Aug 29, 2024
23aa669
Post merge regression fix
gshtras Aug 29, 2024
c24a495
rocm6.3 fix for docker build and debug option for gpu code (#157)
maleksan85 Aug 29, 2024
d28eaef
Merge remote-tracking branch 'upstream/main' into v5.5_upstream_merge_rc
gshtras Aug 29, 2024
216cfb1
Merge remote-tracking branch 'origin/main' into v5.5_upstream_merge_rc
gshtras Aug 29, 2024
a50159e
fp8 bulk convert is no longer experimental
gshtras Aug 29, 2024
65d921d
Temporary fix for fp8 tp>1 and scaled_mm for different torch versions
gshtras Aug 29, 2024
4e36cd9
Removed redundant checks for awq dequantize as in hip it always uses …
gshtras Aug 29, 2024
8295ea0
linter and unused import
gshtras Aug 29, 2024
be9f84e
Initial support for compressed-tensors quantization
gshtras Sep 3, 2024
05e67ab
Picking fixes from https://github.com/ROCm/vllm/pull/163/files by @ma…
gshtras Sep 3, 2024
7fd46eb
Merge remote-tracking branch 'upstream/main' into v5.5_upstream_merge_rc
gshtras Sep 3, 2024
7edb2fd
Update Dockerfile to 6.2, update ROCm components, remove Cython (#166)
mawong-amd Sep 4, 2024
46c5fed
Linters and adapting the sync server to upstream API changes
gshtras Sep 4, 2024
abcdce9
More linting
gshtras Sep 4, 2024
6d33657
Merge pull request #167 from ROCm/v5.5_upstream_merge_rc
gshtras Sep 4, 2024
c0a41fd
fnuz support for fbgemm fp8 (#169)
gshtras Sep 4, 2024
598314e
Merge remote-tracking branch 'upstream/main'
gshtras Sep 5, 2024
8032519
Fixing mypy after a rushed merge (#171)
gshtras Sep 5, 2024
b3fc9f4
[fix] moe padding for reading correct tuned config (#172)
divakar-amd Sep 6, 2024
dc1d65a
Merge remote-tracking branch 'upstream/main'
gshtras Sep 9, 2024
4a7f8d6
Merge pull request #174 from ROCm/upstream_merge_24_9_9
gshtras Sep 9, 2024
963f312
Restoring deleted .buildkite/test-template.j2 (#177)
Alexei-V-Ivanov-AMD Sep 10, 2024
5cf1c75
Support commandr on ROCm (#180)
shajrawi Sep 11, 2024
dc948ab
Correct type hint (#173)
gshtras Sep 11, 2024
78e6e0f
update custom PA kernel with support for fp8 kv cache dtype (#87)
sanyalington Sep 12, 2024
b1c3273
Support Grok-1 (#181)
kkHuang-amd Sep 12, 2024
b53c35d
Adding MLPerf optimization to 0.6.0 (#182)
charlifu Sep 12, 2024
164ce38
6.2 dockerfile (#176)
gshtras Sep 13, 2024
72d0cfb
[Grok1] fix the name of input scale factor for autofp8 run (#183)
kkHuang-amd Sep 13, 2024
daddc14
[bugfix] add multi-step advance_step to ROCmFlashAttentionMetadata
SolitaryThinker Sep 13, 2024
306f21f
add rocm to MULTI_STEP_ATTENTION_BACKENDS
SolitaryThinker Sep 13, 2024
0958045
[Grok-1] fix the run-time error "Can't pickle <class 'transformers_mo…
kkHuang-amd Sep 16, 2024
0f397c3
Merge remote-tracking branch 'upstream/main'
gshtras Sep 16, 2024
b0a39a4
New llm_engine output format
gshtras Sep 16, 2024
30a9875
Merge remote-tracking branch 'st/ms-rocm-advance-step' into upstream_…
gshtras Sep 16, 2024
c27753d
Fix tests - disable marlin_fiest_moe; fix rocm_paged attention
gshtras Sep 16, 2024
ad9026c
Merge pull request #187 from ROCm/upstream_merge_24_09_16
gshtras Sep 16, 2024
c68c242
remove redundant slice; match decode PA partition size with csrc (#188)
sanyalington Sep 17, 2024
6bd99d2
refactor dbrx experts to use FusedMoe layer (#186)
divakar-amd Sep 17, 2024
507c005
disable moe padding by default and enable fp8 padding by default (#190)
charlifu Sep 17, 2024
7d3690c
Enabling Splitting HW by Buildkite Agents (#191)
Alexei-V-Ivanov-AMD Sep 17, 2024
54e0441
Revert "remove redundant slice; match decode PA partition size with c…
gshtras Sep 18, 2024
40581f4
[Grok-1] 1. upload moe configuration file for moe kernel optimization…
kkHuang-amd Sep 18, 2024
d21cf99
Removing the original text in reminder_comment.yml (#195)
Alexei-V-Ivanov-AMD Sep 18, 2024
a67b65b
Fix PA custom and PA v2 tests and partition sizes (#196)
mawong-amd Sep 18, 2024
7094103
Adding P3L measurement to the benchmarks collection tools. (#197)
Alexei-V-Ivanov-AMD Sep 19, 2024
9d8035b
Swapping the order of sampling operations in the conditional selector…
Alexei-V-Ivanov-AMD Sep 19, 2024
0e80e85
remove redundant slice when chunked prefill feature is disabled (#201)
sanyalington Sep 20, 2024
bae9170
Fixing P3L incompatibility with cython. (#200)
Alexei-V-Ivanov-AMD Sep 20, 2024
87acddd
Merge remote-tracking branch 'upstream/main' into upstream_merge_24_9_23
gshtras Sep 23, 2024
7e2ac48
isort
gshtras Sep 23, 2024
1f0d319
Bias and more metadata in gradlib and tuned gemm (#202)
gshtras Sep 23, 2024
6e370fc
Bias and more metadata in gradlib and tuned gemm (#202)
gshtras Sep 23, 2024
cebe70c
Merge remote-tracking branch 'origin/main' into upstream_merge_24_9_23
gshtras Sep 23, 2024
57ea101
Merge pull request #203 from ROCm/upstream_merge_24_9_23
gshtras Sep 23, 2024
48c0cb4
With chunked prefill, for large prompts, the sampler can encounter a z…
gshtras Sep 23, 2024
cc2039c
Revert "[Kernel] changing fused moe kernel chunk size default to 32k …
gshtras Sep 25, 2024
a5d87a1
re-enable avoid torch slice fix when chunked prefill is disabled (#209)
sanyalington Sep 26, 2024
5c50fca
add block_manager_v2.py into setup_cython: block_manager_v2 is used w…
sanyalington Sep 26, 2024
9858710
extend moe padding to DUMMY weights (#211)
divakar-amd Sep 26, 2024
c5b1012
Merge remote-tracking branch 'upstream/main' into main
gshtras Sep 27, 2024
1adaa9a
Add setuptools-scm requirement to requirements-rocm since we don't us…
gshtras Sep 27, 2024
b79f9f4
[Int4-AWQ] Fix AWQ Marlin check for ROCm (#206)
hegemanjw4amd Sep 27, 2024
aac2e0b
Merge branch 'main' into upstream_merge_24_09_27_0.6.2
gshtras Sep 27, 2024
a87da2b
RPD Profiling (#208)
dllehr-amd Sep 27, 2024
8850323
Merge remote-tracking branch 'origin/main' into upstream_merge_24_09_…
gshtras Sep 27, 2024
0a5881d
Cythonize vllm build (#214)
maleksan85 Sep 27, 2024
3d2bd9b
Merge remote-tracking branch 'origin/main' into upstream_merge_24_09_…
gshtras Sep 27, 2024
956b831
Fix Dockerfile.rocm (#215)
gshtras Sep 27, 2024
4f57e44
Merge remote-tracking branch 'origin/main' into upstream_merge_24_09_…
gshtras Sep 27, 2024
2d7ab9e
fix dbrx weight loader (#212)
divakar-amd Oct 1, 2024
f49394a
Merge remote-tracking branch 'origin/main' into upstream_merge_24_09_…
gshtras Oct 2, 2024
030374b
Merge pull request #213 from ROCm/upstream_merge_24_09_27_0.6.2
gshtras Oct 2, 2024
47d6392
Make rpdtracer import only when required (#216)
Rohan138 Oct 3, 2024
4cb422f
Improve profiling setup and documentation, sync benchmarks with main …
AdrianAbeyta Oct 3, 2024
4075b35
Installing the requirements before invoking setup.py since it now imp…
gshtras Oct 3, 2024
2550f14
llama3.2 + cross attn test (#220)
maleksan85 Oct 4, 2024
1992aa8
Factor out common weight loading code
DarkLight1337 Oct 8, 2024
e81645d
Fix EAGLE model loading
DarkLight1337 Oct 8, 2024
4ef043b
Improve efficiency
DarkLight1337 Oct 8, 2024
e723680
Rename
DarkLight1337 Oct 8, 2024
c60e921
Update LLaVA-NeXT-Video
DarkLight1337 Oct 8, 2024
89bde53
Optimize CAR for ROCm (#225)
iotamudelta Oct 8, 2024
9f12890
Automatic loading and save memory
DarkLight1337 Oct 8, 2024
10b5b0e
Rename
DarkLight1337 Oct 8, 2024
ce08df5
Update docstring
DarkLight1337 Oct 8, 2024
df687ac
Simplify
DarkLight1337 Oct 8, 2024
98bf417
Cleanup
DarkLight1337 Oct 8, 2024
decc7a4
Fully enable recursive loading
DarkLight1337 Oct 8, 2024
e59201a
Clarify
DarkLight1337 Oct 8, 2024
b51fe69
Custom PA perf improvements (#222)
sanyalington Oct 8, 2024
f538ab9
Fix incorrect semantics
DarkLight1337 Oct 8, 2024
f077865
Move function
DarkLight1337 Oct 8, 2024
56e4a33
Update error message
DarkLight1337 Oct 8, 2024
85c63c8
Fix Ultravox loading
DarkLight1337 Oct 8, 2024
42a3253
spacing
DarkLight1337 Oct 8, 2024
b21ccdf
Merge remote-tracking branch 'upstream/main'
gshtras Oct 8, 2024
e5a7def
Merge remote-tracking branch 'upstream/main' into main
gshtras Oct 8, 2024
3e72cae
Merge remote-tracking branch 'upstream/fix-weight-loading' into main
gshtras Oct 8, 2024
674b2a5
Merge remote-tracking branch 'origin/main' into upstream_merge_24_10_08
gshtras Oct 8, 2024
390efcb
Fix server
gshtras Oct 8, 2024
8fa419f
Merge remote-tracking branch 'upstream/main' into upstream_merge_24_1…
gshtras Oct 8, 2024
a466f09
Upstream merge 24 10 08 (#226)
gshtras Oct 9, 2024
968345a
customPA write fp8 small ctx fix; enable customPA write fp8 by defaul…
sanyalington Oct 9, 2024
1ec8aaf
Added sccache timeout for vllm build (#230)
maleksan85 Oct 11, 2024
0e0e968
Add fp8 for dbrx (#231)
charlifu Oct 14, 2024
35e2c54
Update Buildkite env variable (#232)
dhonnappa-amd Oct 14, 2024
82cfa5a
cuda graph + num-scheduler-steps bug fix (#236)
seungrokj Oct 16, 2024
1658370
[Model] [BUG] Fix code path logic to load mllama model (#234)
tjtanaa Oct 16, 2024
6e79dcf
Merge remote-tracking branch 'origin/main' into upstream_merge_24_10_21
gshtras Oct 21, 2024
b10dad1
Merge remote-tracking branch 'upstream/main' into upstream_merge_24_1…
gshtras Oct 21, 2024
634d9b0
yapf
gshtras Oct 21, 2024
e0b6bb4
prefix-enabled FA perf issue (#239)
seungrokj Oct 22, 2024
af76c9d
Merge branch 'main' into upstream_merge_24_10_21
gshtras Oct 22, 2024
1eefd1e
Custom PA Partition size 256 to improve performance (#238)
sanyalington Oct 22, 2024
a594c0c
Merge branch 'main' into upstream_merge_24_10_21
gshtras Oct 22, 2024
16cedce
[Build/CI] Minor changes to fix internal CI process. (#235)
Alexei-V-Ivanov-AMD Oct 22, 2024
87e3970
Merge branch 'main' into upstream_merge_24_10_21
gshtras Oct 22, 2024
69d5e1d
[BUGFIX] Restored handling of ROCM FA output as before adaptation of …
maleksan85 Oct 23, 2024
be448fb
Merge branch 'main' into upstream_merge_24_10_21
gshtras Oct 23, 2024
2a3f461
Merge pull request #240 from ROCm/upstream_merge_24_10_21
gshtras Oct 23, 2024
46aa3d2
Using the correct datatype on prefix prefill for fp8 kv cache (#242)
gshtras Oct 23, 2024
fe6f613
Add fp8 support for Llama model family on Navi4x
qli88 Oct 24, 2024
9 changes: 5 additions & 4 deletions .buildkite/run-amd-test.sh
@@ -5,7 +5,7 @@ set -o pipefail
 echo "--- Confirming Clean Initial State"
 while true; do
   sleep 3
-  if grep -q clean /opt/amdgpu/etc/gpu_state; then
+  if grep -q clean ${BUILDKITE_AGENT_META_DATA_RESET_TARGET}; then
     echo "GPUs state is \"clean\""
     break
   fi
@@ -44,11 +44,11 @@ cleanup_docker

 echo "--- Resetting GPUs"

-echo "reset" > /opt/amdgpu/etc/gpu_state
+echo "reset" > ${BUILDKITE_AGENT_META_DATA_RESET_TARGET}

 while true; do
   sleep 3
-  if grep -q clean /opt/amdgpu/etc/gpu_state; then
+  if grep -q clean ${BUILDKITE_AGENT_META_DATA_RESET_TARGET}; then
     echo "GPUs state is \"clean\""
     break
   fi
@@ -139,8 +139,9 @@ if [[ $commands == *"--shard-id="* ]]; then
     fi
   done
 else
+  echo "Render devices: $BUILDKITE_AGENT_META_DATA_RENDER_DEVICES"
   docker run \
-    --device /dev/kfd --device /dev/dri \
+    --device /dev/kfd $BUILDKITE_AGENT_META_DATA_RENDER_DEVICES \
     --network host \
     --shm-size=16gb \
     --rm \
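
The change to run-amd-test.sh above replaces the hard-coded state file with the path taken from BUILDKITE_AGENT_META_DATA_RESET_TARGET. Because the expansion is unquoted, an unset variable leaves grep without a file argument, so it would read standard input instead of failing. A minimal sketch of a more defensive polling helper follows. The function name wait_for_clean, the timeout, and the fallback to the old path are assumptions for illustration and are not part of this PR.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the PR.
# Polls a state file until it contains "clean" or a timeout expires.
#   $1: path to the state file
#   $2: timeout in seconds (assumed default: 300)
#   $3: polling interval in seconds (default: 3, as in the script)
wait_for_clean() {
  local state_file="$1"
  local timeout_s="${2:-300}"
  local interval_s="${3:-3}"
  local waited=0
  while ! grep -q clean "$state_file" 2>/dev/null; do
    if (( waited >= timeout_s )); then
      echo "Timed out waiting for clean state in $state_file" >&2
      return 1
    fi
    sleep "$interval_s"
    (( waited += interval_s ))
  done
  echo "GPUs state is \"clean\""
}

# Assumed fallback: use the previously hard-coded path when the agent
# metadata variable is unset or empty.
RESET_TARGET="${BUILDKITE_AGENT_META_DATA_RESET_TARGET:-/opt/amdgpu/etc/gpu_state}"
```

With this in place, each polling loop in the script could be written as wait_for_clean "$RESET_TARGET", and the reset step as echo "reset" > "$RESET_TARGET".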
38 changes: 38 additions & 0 deletions .buildkite/test-template.j2
@@ -0,0 +1,38 @@
+{% set docker_image = "public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT" %}
+{% set docker_image_amd = "rocm/vllm-ci:$BUILDKITE_COMMIT" %}
+{% set default_working_dir = "vllm/tests" %}
+{% set hf_home = "/root/.cache/huggingface" %}
+
+steps:
+  - label: ":docker: build image"
+    depends_on: ~
+    commands:
+      - "docker build --build-arg max_jobs=16 --tag {{ docker_image_amd }} -f Dockerfile.rocm --progress plain ."
+      - "docker push {{ docker_image_amd }}"
+    key: "amd-build"
+    env:
+      DOCKER_BUILDKIT: "1"
+    retry:
+      automatic:
+        - exit_status: -1  # Agent was lost
+          limit: 5
+        - exit_status: -10  # Agent was lost
+          limit: 5
+    agents:
+      queue: amd
+
+{% for step in steps %}
+{% if step.mirror_hardwares and "amd" in step.mirror_hardwares %}
+  - label: "AMD: {{ step.label }}"
+    depends_on:
+      - "amd-build"
+    agents:
+      queue: amd
+    commands:
+      - bash .buildkite/run-amd-test.sh "cd {{ (step.working_dir or default_working_dir) | safe }} ; {{ step.command or (step.commands | join(" && ")) | safe }}"
+    env:
+      DOCKER_BUILDKIT: "1"
+    priority: 100
+    soft_fail: true
+{% endif %}
+{% endfor %}
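
For each mirrored step, the template above builds a single string for run-amd-test.sh: a cd into the step's working directory (or the default vllm/tests), followed by either the single command or all commands joined with " && ". The bash sketch below mirrors that composition so the resulting string is easy to inspect. The function name build_amd_command is hypothetical and exists only for illustration; the template itself does this in Jinja.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of the string the template renders.
#   $1: working directory; empty means "use the default" (vllm/tests,
#       taken from default_working_dir in the template)
#   $2..: one or more commands, joined with " && "
build_amd_command() {
  local working_dir="${1:-vllm/tests}"
  shift || true
  local joined="" cmd
  for cmd in "$@"; do
    # Prefix the separator only when something has already been appended.
    joined+="${joined:+ && }$cmd"
  done
  printf 'cd %s ; %s\n' "$working_dir" "$joined"
}
```

For example, build_amd_command "" "pytest -v -s a" "pytest -v -s b" prints the same shape of argument that the rendered pipeline passes to .buildkite/run-amd-test.sh.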
21 changes: 0 additions & 21 deletions .github/workflows/reminder_comment.yml

This file was deleted.

41 changes: 38 additions & 3 deletions CMakeLists.txt
@@ -37,7 +37,7 @@ set(PYTHON_SUPPORTED_VERSIONS "3.8" "3.9" "3.10" "3.11" "3.12")
 set(CUDA_SUPPORTED_ARCHS "7.0;7.5;8.0;8.6;8.9;9.0")

 # Supported AMD GPU architectures.
-set(HIP_SUPPORTED_ARCHS "gfx906;gfx908;gfx90a;gfx940;gfx941;gfx942;gfx1030;gfx1100")
+set(HIP_SUPPORTED_ARCHS "gfx906;gfx908;gfx90a;gfx940;gfx941;gfx942;gfx1030;gfx1100;gfx1101;gfx1200")

 #
 # Supported/expected torch versions for CUDA/ROCm.
@@ -152,13 +152,33 @@ else()
     "${${VLLM_GPU_LANG}_SUPPORTED_ARCHS}")
 endif()

+#
+# Setting up debug flags for pleasant debug experience.
+#
+set(CMAKE_${VLLM_GPU_LANG}_FLAGS_DEBUG "${CMAKE_${VLLM_GPU_LANG}_FLAGS_DEBUG} -O0 -ggdb3")
+set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -O0 -ggdb3")
+
 #
 # Query torch for additional GPU compilation flags for the given
 # `VLLM_GPU_LANG`.
 # The final set of arches is stored in `VLLM_GPU_FLAGS`.
 #
 get_torch_gpu_compiler_flags(VLLM_GPU_FLAGS ${VLLM_GPU_LANG})

+#
+# Get supported FP8 format based on GPU arches
+#
+get_supported_fp8_format(FP8_FORMAT ${VLLM_GPU_LANG} "${VLLM_GPU_ARCHES}")
+if(${FP8_FORMAT} STREQUAL "E4M3FN")
+  message(STATUS "FP8 format: E4M3FN")
+  list(APPEND VLLM_GPU_FLAGS "-DUSE_CUDA_FP8_FORMAT")
+elseif(${FP8_FORMAT} STREQUAL "E4M3FNUZ")
+  message(STATUS "FP8 format: E4M3FNUZ")
+  list(APPEND VLLM_GPU_FLAGS "-DUSE_HIP_FP8_FORMAT")
+elseif(${FP8_FORMAT} STREQUAL "CONFLICT")
+  message(FATAL_ERROR "Target architectures support different types of FP8 formats!")
+endif()
+
 #
 # Set nvcc parallelism.
 #
@@ -178,7 +198,14 @@ set(FETCHCONTENT_BASE_DIR "${PROJECT_ROOT_DIR}/.deps")
 message(STATUS "FetchContent base directory: ${FETCHCONTENT_BASE_DIR}")

 #
-# Define other extension targets
+# Set rocm version dev int.
+#
+if(VLLM_GPU_LANG STREQUAL "HIP")
+  list(APPEND VLLM_GPU_FLAGS "-DROCM_VERSION=${ROCM_VERSION_DEV_INT}")
+endif()
+
+#
+# Define extension targets
 #

 #
@@ -381,6 +408,11 @@ if(VLLM_GPU_LANG STREQUAL "CUDA")
   # if CUDA endif
 endif()

+if(VLLM_GPU_LANG STREQUAL "HIP")
+  list(APPEND VLLM_EXT_SRC
+    "csrc/custom_all_reduce.cu")
+endif()
+
 message(STATUS "Enabling C extension.")
 define_gpu_extension_target(
   _C
@@ -453,7 +485,10 @@ if(VLLM_GPU_LANG STREQUAL "HIP")
   #
   set(VLLM_ROCM_EXT_SRC
     "csrc/rocm/torch_bindings.cpp"
-    "csrc/rocm/attention.cu")
+    "csrc/rocm/attention.cu"
+    "csrc/rocm/custom_kernels.cu"
+    "csrc/rocm/fused_kernels.cu"
+    "csrc/rocm/custom.cu")

   define_gpu_extension_target(
     _rocm_C
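
The CMakeLists.txt change above calls get_supported_fp8_format and then branches on E4M3FN, E4M3FNUZ, or CONFLICT. The helper's implementation is not shown on this page, so the bash sketch below only illustrates the kind of decision it appears to make for HIP targets: map each architecture to an FP8 format and report a conflict when a build targets architectures with different formats. The arch-to-format table (gfx940/gfx941/gfx942 as E4M3FNUZ, gfx1200 as E4M3FN), the NONE result, and both function names are assumptions, not taken from the PR.

```bash
#!/usr/bin/env bash
# Illustrative only: the real logic lives in the PR's CMake utilities.

# Assumed mapping from a single GPU arch to its FP8 format.
fp8_format_for_arch() {
  case "$1" in
    gfx940|gfx941|gfx942) echo "E4M3FNUZ" ;;
    gfx1200)              echo "E4M3FN" ;;
    *)                    echo "NONE" ;;   # arch without FP8 support (assumed)
  esac
}

# Takes a semicolon-separated arch list (CMake list syntax) and prints
# E4M3FN, E4M3FNUZ, CONFLICT, or NONE.
supported_fp8_format() {
  local result="NONE" arch fmt
  local IFS=';'
  for arch in $1; do
    fmt="$(fp8_format_for_arch "$arch")"
    [[ "$fmt" == "NONE" ]] && continue
    if [[ "$result" == "NONE" ]]; then
      result="$fmt"
    elif [[ "$result" != "$fmt" ]]; then
      # Two target archs disagree; CMake treats this as a fatal error.
      result="CONFLICT"
      break
    fi
  done
  echo "$result"
}
```

Under these assumptions, building for gfx942 and gfx1200 in the same invocation would hit the FATAL_ERROR branch, which suggests that MI300 and Navi4x builds need separate arch lists.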