
[Feature] vLLM ARM Enablement for AARCH64 CPUs #9228

Merged
merged 17 commits into vllm-project:main
Nov 26, 2024

Conversation

sanketkaleoss
Contributor

@sanketkaleoss sanketkaleoss commented Oct 10, 2024

Description
This PR enables vLLM support for the AARCH64 architecture. Motivated by the requests in #176, #5741, and related issues, it adds an ARM path for CPU inference.

ARM Compatibility: Modified the build scripts and configuration files to ensure compatibility with ARM processors. It currently supports the float32, fp16, and bfloat16 data types.

Motivation
Enabling vLLM on the ARM architecture broadens its usability, allowing it to run on a much wider range of devices, such as AWS Graviton instances and other ARM-based servers. This expands the reach and applicability of vLLM across more use cases.

Checklist

  • Code changes have been tested on ARM devices (Graviton3).

Modifications

  1. Added NEON intrinsics for enabling vLLM on ARM (a small illustrative sketch follows this list)
  2. Updated requirements-cpu.txt
  3. Added ARM path in cpu_extension.cmake
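
For illustration only, here is a minimal sketch of the style of NEON wrapper used on the FP32 path. The type and member names below are simplified stand-ins, not the exact code in this PR:

#include <arm_neon.h>

// Illustrative sketch: a simplified 8-lane FP32 vector wrapper in the spirit
// of the vec_op types added for the ARM path (not the exact PR code).
struct FP32Vec8 {
  float32x4x2_t reg;

  explicit FP32Vec8(const float *ptr) {
    reg.val[0] = vld1q_f32(ptr);      // load lanes 0..3
    reg.val[1] = vld1q_f32(ptr + 4);  // load lanes 4..7
  }

  FP32Vec8 operator*(const FP32Vec8 &b) const {
    FP32Vec8 out = *this;
    out.reg.val[0] = vmulq_f32(reg.val[0], b.reg.val[0]);
    out.reg.val[1] = vmulq_f32(reg.val[1], b.reg.val[1]);
    return out;
  }

  void save(float *ptr) const {
    vst1q_f32(ptr, reg.val[0]);
    vst1q_f32(ptr + 4, reg.val[1]);
  }
};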

Performance Results

Model: facebook/opt-125m
Datatype: float32

[performance results screenshot]


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@sanketkaleoss sanketkaleoss changed the title from "vLLM ARM Enablement for AARCH64 CPUs" to "[Feature] vLLM ARM Enablement for AARCH64 CPUs" on Oct 22, 2024
@sanketkaleoss
Contributor Author

sanketkaleoss commented Oct 22, 2024

@youkaichao @ChipKerchner @mgoin @pathorn @manojnkumar I have added a new feature that allows vLLM to run on the ARM CPU backend. I have tested it on AWS Graviton3E. Please have a look at this PR.

Member

@mgoin mgoin left a comment

Nice work @sanketkaleoss! This seems reasonable to me as a base implementation. It seems only compute in fp32 is supported, is that right?
It would be good to update the cpu installation documentation with how to build and to also add a new Dockerfile.arm

message(STATUS "ARMv8 architecture detected")
list(APPEND CXX_COMPILE_FLAGS
"-mcpu=native"
"-march=armv8.6-a"
Member

Why was this specific arch chosen?

Contributor Author

armv8.6-a is the first architecture version that guarantees Advanced SIMD instructions, BF16, and SVE support. That's why I chose this specific arch.

Member

This is fine for now, but for instance I think Apple's M1 CPU uses ARMv8.4-A, so we should consider supporting older versions.

Contributor Author

I see, noted.

Collaborator

According to WikiChip, Graviton3 is also ARMv8.4-A -- is the Graviton3E different? I haven't found any documentation on it.

Contributor Author

@tlrmchlsmth You're right, Graviton3 has armv8.4. The code needs 8.6 only for the bf16 dependencies, but if I just use "-mcpu=native" or "-march=armv8.4-a+bf16", the code works fine on Graviton3 instances. What do you suggest here?

Collaborator

If -march=armv8.4-a+bf16 works, then I suggest using that for this PR.

In general, I think it's best to go with a fairly minimal ISA version and then explicitly specify what ISA extensions we want to use, especially because ARM has many extensions that are optional for several ISA versions before they become mandatory.
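
For example, the BF16 kernels can additionally be guarded by the ACLE feature macro that the +bf16 extension defines, so a build targeting a lower base ISA still compiles. A minimal sketch, assuming the standard __ARM_FEATURE_BF16_VECTOR_ARITHMETIC macro (the helper name here is illustrative):

#include <arm_neon.h>

// Sketch only: when built with e.g. -march=armv8.4-a+bf16, the compiler
// defines __ARM_FEATURE_BF16_VECTOR_ARITHMETIC and the BF16 kernels are
// compiled in; otherwise only the FP32 path is built.
#ifdef __ARM_FEATURE_BF16_VECTOR_ARITHMETIC
inline float32x4_t bf16_dot(float32x4_t acc, bfloat16x8_t a, bfloat16x8_t b) {
  return vbfdotq_f32(acc, a, b);  // BFDOT: bf16 pairwise dot-products into f32
}
#endif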

Collaborator

BTW, the M1 Mac (and I think M2) does not support BF16 NEON so unfortunately this wouldn't help there.

Contributor Author

Hi @mgoin, @tlrmchlsmth, I've added the Dockerfile and documentation and changed the ARM arch to native. Please have a look.

@sanketkaleoss
Contributor Author

Nice work @sanketkaleoss! This seems reasonable to me as a base implementation. It seems only compute in fp32 is supported, is that right? It would be good to update the cpu installation documentation with how to build and to also add a new Dockerfile.arm

Thanks @mgoin for the review. It supports fp32 and bf16 as of now. Sure, I'll work on updating the documentation and adding a Dockerfile too.

@youkaichao
Member

This is great; I will hand it over to @mgoin for a detailed review.

It would be better if we could support macOS M-series chips as well, since they are also ARM chips.

@sanketkaleoss
Contributor Author

This is great; I will hand it over to @mgoin for a detailed review.

It would be better if we could support macOS M-series chips as well, since they are also ARM chips.

Thanks @youkaichao. I'll add that to future work.


namespace vec_op {

// FIXME: FP16 is not fully supported in Torch-CPU
Collaborator

@sanketkaleoss do you have plans to support FP16 in a future PR? I see that it's partially implemented.

Do you know what the problem with FP16 in Torch-CPU is?

Contributor Author

I'm not sure what the problem is there. I can add the support in a future PR if Torch-CPU supports FP16 in the future.
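
For reference, a partial FP16 path on NEON usually loads and stores in fp16 but widens to fp32 for the arithmetic, which sidesteps most of the gaps in fp16 support. A minimal sketch, illustrative only and not the exact PR code (the helper name is made up):

#include <arm_neon.h>

// Sketch only: load 8 fp16 values, widen to two fp32 vectors for the math,
// then narrow back to fp16 on store.
inline void scale_fp16(const __fp16 *in, __fp16 *out, float scale) {
  float16x8_t v = vld1q_f16(in);
  float32x4_t lo = vcvt_f32_f16(vget_low_f16(v));
  float32x4_t hi = vcvt_f32_f16(vget_high_f16(v));
  lo = vmulq_n_f32(lo, scale);
  hi = vmulq_n_f32(hi, scale);
  vst1q_f16(out, vcombine_f16(vcvt_f16_f32(lo), vcvt_f16_f32(hi)));
}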

@caijijuhe

Hello, I followed your Dockerfile.arm and encountered this error during the build process:
ERROR: ld.so: object '/usr/lib/aarch64-linux-gnu/libtcmalloc_minimal.so.4' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
Can you share the built image?

@mergify mergify bot added the ci/build and documentation labels Nov 4, 2024
elseif (POWER9_FOUND OR POWER10_FOUND)
message(STATUS "PowerPC detected")
# Check for PowerPC VSX support
list(APPEND CXX_COMPILE_FLAGS
"-mvsx"
"-mcpu=native"
"-mtune=native")

elseif (ASIMD_FOUND)
message(STATUS "ARMv8 architecture detected")
Collaborator

Since it could be ARMv9:

Suggested change:
-message(STATUS "ARMv8 architecture detected")
+message(STATUS "ARMv8 or later architecture detected")

or maybe explicitly call out NEON instead

Contributor Author

Okay, sure

Comment on lines 88 to 89
"-mcpu=native"
"-mtune=native"
Collaborator

Native is OK for local development; however, the binaries won't be portable at all.

I actually thought you were on the right track before this change. We really ought to explicitly specify a fairly minimal base ARM architecture (maybe ARMv8.4) and then explicitly list the set of extensions that we need to build with (BF16, and maybe FP16 and DotProd?).

Contributor Author

I see. I'll try setting armv8.4 as the base implementation; then, depending on the flags, it will run bf16 or not.

Collaborator

Could you try using "-march=armv8.4-a+bf16+dotprod+fp16"?


Just for reference: on arm64, -mcpu= specifies both the appropriate architecture and the tuning, and it's generally better to use it rather than -march if you're building for a specific CPU. You can find more details here if you are running any tests on Graviton: https://github.com/aws/aws-graviton-getting-started/blob/main/c-c++.md

Contributor Author

@sanketkaleoss sanketkaleoss Nov 7, 2024

Could you try using "-march=armv8.4-a+bf16+dotprod+fp16"?

Yes, I tried it and it works. It even works with "-march=armv8.2-a+bf16+dotprod+fp16", making it compatible with Graviton2, as suggested by @ddynwzh1992.


mergify bot commented Nov 6, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @sanketkaleoss please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 6, 2024
@sanketkaleoss
Contributor Author

@mgoin @tlrmchlsmth I have created separate paths for the FP32 and BF16 data types. This code now works on any ARM machine from armv8.2-a onwards. Even if BF16 support is not present, it will choose the FP32 path. Please have a look.


mergify bot commented Nov 17, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @sanketkaleoss.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@sanketkaleoss
Contributor Author

Hi @mgoin @tlrmchlsmth @ddynwzh1992, I've implemented the suggested changes, and the ARM CPU backend path now supports the FP32, FP16, and BF16 data types. Support for Mac devices has been added. I've tested with -march=armv8.2-a onwards, and it runs fine even without the added extensions. Please review the changes.

@ShawnD200
Contributor

Even if BF16 support is not present, it will choose the FP32 path.

Maybe I missed something, but how will it choose the FP32 path if BF16 is not supported? Thank you.

@sanketkaleoss
Contributor Author

Even if BF16 support is not present, it will choose the FP32 path.

Maybe I missed something, but how will it choose the FP32 path if BF16 is not supported? Thank you.

Earlier, it used to give a compilation error even if we only wanted to use FP32 and not BF16, because there were no separate paths for FP32 and BF16 (as in the x86 backend). Now it will compile successfully even if the system doesn't have the BF16 extension, and the user can run with the FP32 dtype in that case.
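
As an illustration of how that fallback could also be surfaced at runtime, here is a minimal Linux-only sketch using the standard aarch64 hwcap bits; the function name is made up for this example and this is not necessarily how the PR implements the check:

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#include <asm/hwcap.h>
#endif

// Sketch only: report whether the CPU exposes the BF16 extension, so a build
// that compiled in the BF16 kernels can still reject --dtype bfloat16 on CPUs
// without it (e.g. Graviton2) and let the user fall back to float32.
inline bool cpu_has_bf16() {
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP2_BF16)
  return (getauxval(AT_HWCAP2) & HWCAP2_BF16) != 0;
#else
  return false;
#endif
}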

@sanketkaleoss
Contributor Author

@mgoin @tlrmchlsmth @WoosukKwon @youkaichao @DarkLight1337 Your attention is required on this PR. Thanks.

@ShawnD200
Contributor

ShawnD200 commented Nov 19, 2024

Even if BF16 support is not present, it will choose the FP32 path.

Maybe I missed something, but how will it choose the FP32 path if BF16 is not supported? Thank you.

Earlier, it used to give a compilation error even if we only wanted to use FP32 and not BF16, because there were no separate paths for FP32 and BF16 (as in the x86 backend). Now it will compile successfully even if the system doesn't have the BF16 extension, and the user can run with the FP32 dtype in that case.

I thought you meant BF16 is supported without the hardware feature.

Edited: starting with the Apple M2, BF16 has been supported.

if (remainder > 0) {
float16x8_t temp = reg.val[full_blocks];
for (int i = 0; i < remainder; ++i) {
reinterpret_cast<__fp16*>(ptr)[full_blocks * 8 + i] = vgetq_lane_f16(temp, i);
Contributor

Hello, are you sure this line can compile? The index must be a compile-time constant, no?

Contributor Author

Thanks for the review. It compiles on my system as of now.

Contributor

Thanks for the reply. I found that with gcc v12.3 (Ubuntu), optimization level O1 and above does not require a constant (while O0 does); with clang v15.0 (macOS), I haven't been able to compile it.

Contributor Author

Thanks for the reply. I found that with gcc v12.3 (Ubuntu), optimization level O1 and above does not require a constant (while O0 does); with clang v15.0 (macOS), I haven't been able to compile it.

I see, thanks for the information.
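
For reference, one portable way to handle the tail without a runtime lane index (vgetq_lane_f16 wants a compile-time constant) is to spill the whole register to a small buffer and copy only the remainder. A sketch under the same assumptions as the snippet above; the helper name is illustrative:

#include <arm_neon.h>
#include <cstring>

// Sketch only: store the full 8-lane fp16 register to a stack buffer, then
// copy just the `remainder` tail elements. This avoids vgetq_lane_f16 with a
// non-constant index, which stricter compilers (e.g. clang) reject.
inline void store_tail_f16(float16x8_t reg, __fp16 *ptr, int remainder) {
  __fp16 tmp[8];
  vst1q_f16(tmp, reg);
  std::memcpy(ptr, tmp, remainder * sizeof(__fp16));
}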


mergify bot commented Nov 19, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @sanketkaleoss.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@sanketkaleoss
Contributor Author

This essentially looks good to me, left a few comments. Please resolve the merge conflict and I can set the ready tag for CI to run.

I was able to test and run this on my M1 Macbook - nice work!

Server:

docker build -f Dockerfile.arm -t vllm-cpu-env --shm-size=4g . 
docker run -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --ipc=host vllm-cpu-env --model Qwen/Qwen2.5-0.5B-Instruct --dtype float16

Client:

curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer DUMMY" \
  -d '{
    "model": "Qwen/Qwen2.5-0.5B-Instruct",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is the best pie?"
      }
    ]
  }'

Output:

{"id":"chatcmpl-210cbdc345ce48d9988a9aced2f83300","object":"chat.completion","created":1732058836,"model":"Qwen/Qwen2.5-0.5B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"As an AI language model, I don't have personal preferences or tastes, but I can provide information about different types of pies based on popular opinion.\n\nThe best pie for everyone depends on personal taste, preferences, and dietary needs. Here are some popular types of pies:\n\n1. Pumpkin pie: It is a classic pie that uses pumpkins as a main ingredient. It is often served with crème fraîche, whipped cream, or chocolate ganache.\n\n2. Apple pie: Apple pie is a sweet and fruity pie that is popular in the autumn season. It uses apples as a main ingredient and is often served with whipped cream or chocolate ganache.\n\n3. Blueberry pie: Blueberry pie is a classic pie that uses blueberries as a main ingredient. It is often served with whipped cream or chocolate ganache.\n\n4. Chocolate pie: Chocolate pie is a sweet and rich pie that is popular in the winter season. It uses chocolate as a main ingredient and is often served with whipped cream or chocolate ganache.\n\n5. Lemon pie: Lemon pie is a fresh and zesty pie that is popular in the summer season. It uses lemons as a main ingredient and is often served with whipped cream or chocolate ganache.\n\nUltimately, the best pie for everyone depends on their personal preferences and dietary needs.","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":25,"total_tokens":290,"completion_tokens":265,"prompt_tokens_details":null},"prompt_logprobs":null}

Thanks for the detailed review @mgoin ! I have rebased and implemented the suggested changes. Please proceed with the CI checks.

@mgoin mgoin added the ready label Nov 20, 2024
Member

@mgoin mgoin left a comment

Sorry for losing track, merged with main to see if green

@youkaichao youkaichao merged commit a6760f6 into vllm-project:main Nov 26, 2024
71 of 73 checks passed
@sanketkaleoss
Contributor Author

Sorry for losing track, merged with main to see if green

No worries, looking forward to the future work on this PR.

afeldman-nm pushed a commit to neuralmagic/vllm that referenced this pull request Dec 2, 2024
Signed-off-by: Sanket Kale <[email protected]>
Co-authored-by: Sanket Kale <[email protected]>
Co-authored-by: mgoin <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
@animalnots

@sanketkaleoss Hi, thank you for the PR. Does additional work need to be done to support Qwen/Qwen2-VL-2B-Instruct?

@sidharthrajaram

@animalnots Did you figure that out? There's an issue related to it: #11154

sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024
Signed-off-by: Sanket Kale <[email protected]>
Co-authored-by: Sanket Kale <[email protected]>
Co-authored-by: mgoin <[email protected]>
@animalnots

@animalnots Did you figure that out? There's an issue related to it: #11154

I ended up using this:

satvikahuja/Easy-qwen2vlm2b-4macbook#1

vLLM never worked and instead would output "!!!!!!!!!..."

@sanketkaleoss
Contributor Author

@sanketkaleoss Hi, thank you for the PR. Does additional work need to be done to support Qwen/Qwen2-VL-2B-Instruct?

Hi, sorry for the late reply. Does the model work with the x86 CPU path?

Labels
ci/build, documentation, ready