
[Usage]: vLLM model service crashes when adding OpenAI-API-compatible model in dify, model id: Qwen/Qwen2-VL-7B-Instruct #11154

Closed
jackathere opened this issue Dec 13, 2024 · 9 comments · Fixed by #11434

@jackathere

Your current environment

vLLM API server version 0.6.4.post2
docker vllm-cpu-env
model "Qwen/Qwen2-VL-7B-Instruct"

How would you like to use vllm

The vLLM model service crashes when adding an OpenAI-API-compatible model in Dify.
Model id: Qwen/Qwen2-VL-7B-Instruct


Error message:

INFO 12-13 01:12:19 engine.py:267] Added request chatcmpl-093873b3235846bfb6500cd5807b39be.
ERROR 12-13 01:12:19 engine.py:135] RuntimeError("shape '[0, -1, 128]' is invalid for input of size 71680")
ERROR 12-13 01:12:19 engine.py:135] Traceback (most recent call last):
ERROR 12-13 01:12:19 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 133, in start
ERROR 12-13 01:12:19 engine.py:135]     self.run_engine_loop()
......

ERROR 12-13 01:12:19 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 12-13 01:12:19 engine.py:135]     return forward_call(*args, **kwargs)
ERROR 12-13 01:12:19 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/rotary_embedding.py", line 825, in forward
ERROR 12-13 01:12:19 engine.py:135]     query = query.view(num_tokens, -1, self.head_size)
ERROR 12-13 01:12:19 engine.py:135] RuntimeError: shape '[0, -1, 128]' is invalid for input of size 71680
CRITICAL 12-13 01:12:19 launcher.py:99] MQLLMEngine is already dead, terminating server process
INFO: 192.168.0.200:37164 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: Shutting down
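
For reference, the failing call boils down to a plain tensor reshape with a zero leading dimension: num_tokens comes out as 0 while the flattened query tensor is non-empty. A minimal PyTorch sketch (hypothetical sizes, picked so the numbers line up with the log: 20 tokens × 28 query heads × 128 head dim = 71680 for the 7B model):

import torch

head_size = 128
query = torch.randn(71680)  # flattened query states for the batch

num_tokens = 0  # what the engine ends up computing for this request
# Raises: RuntimeError: shape '[0, -1, 128]' is invalid for input of size 71680
query.view(num_tokens, -1, head_size)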

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
jackathere added the usage label on Dec 13, 2024
@sidharthrajaram

Encountering a similar issue when running the vLLM ARM container with Qwen2-VL-2B-Instruct:

ERROR 12-13 01:47:18 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/qwen2.py", line 175, in forward
ERROR 12-13 01:47:18 engine.py:135]     q, k = self.rotary_emb(positions, q, k)
ERROR 12-13 01:47:18 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 12-13 01:47:18 engine.py:135]     return self._call_impl(*args, **kwargs)
ERROR 12-13 01:47:18 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 12-13 01:47:18 engine.py:135]     return forward_call(*args, **kwargs)
ERROR 12-13 01:47:18 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/rotary_embedding.py", line 825, in forward
ERROR 12-13 01:47:18 engine.py:135]     query = query.view(num_tokens, -1, self.head_size)
ERROR 12-13 01:47:18 engine.py:135] RuntimeError: shape '[0, -1, 128]' is invalid for input of size 38400
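
Same failure mode, different size: 38400 is plausibly 25 tokens × 12 query heads × 128 head dim for the 2B model, with the leading dimension again computed as 0.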

@axel7083

I also tried with the https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct model and am facing the same error on an x86 CPU inside a Podman container:

INFO 12-13 17:31:41 chat_utils.py:331] Detected the chat template content format to be 'openai'. You can set `--chat-template-content-format` to override this.
INFO 12-13 17:31:41 logger.py:37] Received request chatcmpl-5b79409428c54c9083f984162792c2f1: prompt: '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhat is the capital of France?<|im_end|>\n<|im_start|>assistant\n', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=1.0, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=32742, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None), prompt_token_ids: None, lora_request: None, prompt_adapter_request: None.
INFO 12-13 17:31:41 engine.py:267] Added request chatcmpl-5b79409428c54c9083f984162792c2f1.
CRITICAL 12-13 17:31:42 launcher.py:99] MQLLMEngine is already dead, terminating server process
INFO:     10.88.0.1:43240 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR 12-13 17:31:42 engine.py:135] RuntimeError("shape '[0, -1, 128]' is invalid for input of size 39936")
ERROR 12-13 17:31:42 engine.py:135] Traceback (most recent call last):
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 133, in start
ERROR 12-13 17:31:42 engine.py:135]     self.run_engine_loop()
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 196, in run_engine_loop
ERROR 12-13 17:31:42 engine.py:135]     request_outputs = self.engine_step()
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 214, in engine_step
ERROR 12-13 17:31:42 engine.py:135]     raise e
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 205, in engine_step
ERROR 12-13 17:31:42 engine.py:135]     return self.engine.step()
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 1405, in step
ERROR 12-13 17:31:42 engine.py:135]     outputs = self.model_executor.execute_model(
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/cpu_executor.py", line 201, in execute_model
ERROR 12-13 17:31:42 engine.py:135]     output = self.driver_method_invoker(self.driver_worker,
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/cpu_executor.py", line 298, in _driver_method_invoker
ERROR 12-13 17:31:42 engine.py:135]     return getattr(driver, method)(*args, **kwargs)
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 343, in execute_model
ERROR 12-13 17:31:42 engine.py:135]     output = self.model_runner.execute_model(
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 12-13 17:31:42 engine.py:135]     return func(*args, **kwargs)
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/cpu_model_runner.py", line 532, in execute_model
ERROR 12-13 17:31:42 engine.py:135]     hidden_states = model_executable(
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 12-13 17:31:42 engine.py:135]     return self._call_impl(*args, **kwargs)
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 12-13 17:31:42 engine.py:135]     return forward_call(*args, **kwargs)
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/qwen2_vl.py", line 1372, in forward
ERROR 12-13 17:31:42 engine.py:135]     hidden_states = self.language_model.model(
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/compilation/decorators.py", line 168, in __call__
ERROR 12-13 17:31:42 engine.py:135]     return self.forward(*args, **kwargs)
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/qwen2.py", line 340, in forward
ERROR 12-13 17:31:42 engine.py:135]     hidden_states, residual = layer(
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 12-13 17:31:42 engine.py:135]     return self._call_impl(*args, **kwargs)
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 12-13 17:31:42 engine.py:135]     return forward_call(*args, **kwargs)
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/qwen2.py", line 247, in forward
ERROR 12-13 17:31:42 engine.py:135]     hidden_states = self.self_attn(
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 12-13 17:31:42 engine.py:135]     return self._call_impl(*args, **kwargs)
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 12-13 17:31:42 engine.py:135]     return forward_call(*args, **kwargs)
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/qwen2.py", line 175, in forward
ERROR 12-13 17:31:42 engine.py:135]     q, k = self.rotary_emb(positions, q, k)
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 12-13 17:31:42 engine.py:135]     return self._call_impl(*args, **kwargs)
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 12-13 17:31:42 engine.py:135]     return forward_call(*args, **kwargs)
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/rotary_embedding.py", line 825, in forward
ERROR 12-13 17:31:42 engine.py:135]     query = query.view(num_tokens, -1, self.head_size)
ERROR 12-13 17:31:42 engine.py:135] RuntimeError: shape '[0, -1, 128]' is invalid for input of size 39936
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [1]

@DarkLight1337
Member

DarkLight1337 commented Dec 23, 2024

Sorry for missing this, can you post an example image that results in this error?

cc @Isotr0py

@axel7083

Sorry for missing this, can you post an example image that results in this error?

quay.io/rh-ee-astefani/vllm:cpu-1734105797

$: podman run \
 -v $HF_HUB_CACHE/models--Qwen--Qwen2-VL-2B-Instruct:/cache/models--Qwen--Qwen2-VL-2B-Instruct \
 quay.io/rh-ee-astefani/vllm:cpu-1734105797 \
 --model=/cache/models--Qwen--Qwen2-VL-2B-Instruct/snapshots/47592516d3e709cd9c194715bc76902241c5edea

@DarkLight1337
Member

DarkLight1337 commented Dec 23, 2024

I don't have podman, can you just upload the image here?

Edit: by image I mean the image that's being passed into the model, not the image of the container.

@axel7083

axel7083 commented Dec 23, 2024

Edit: by image I mean the image that's being passed into the model, not the image of the container.

Oh yeah, I tried without an image first, and that was where the error was happening:

curl --location 'http://localhost:46717/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "/cache/models--Qwen--Qwen2-VL-2B-Instruct/snapshots/47592516d3e709cd9c194715bc76902241c5edea",
  "messages": [
    {
      "content": "You are a helpful assistant.",
      "role": "system"
    },
    {
      "content": "What is the capital of France?",
      "role": "user"
    }
  ]
}'

ℹ️ Using an image, it works fine.
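
For completeness, here are the two requests as a Python sketch using the openai client (assumptions: the server started above is listening on port 46717, and the image URL is just a placeholder):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:46717/v1", api_key="EMPTY")
model = "/cache/models--Qwen--Qwen2-VL-2B-Instruct/snapshots/47592516d3e709cd9c194715bc76902241c5edea"

# Text-only request: triggers the shape error and kills the CPU engine (HTTP 500)
client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
)

# Request with an image attached: works fine
client.chat.completions.create(
    model=model,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/some-image.jpg"}},
        ],
    }],
)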

@DarkLight1337
Member

I think this should be solved in #11396, please try it out.

@Isotr0py
Collaborator

Seems that this is related to the input_position in cpu_model_runner, because it still uses mrope_position for text-only inputs.
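
If that is the cause, the empty leading dimension would come from the rotary embedding being handed a position buffer that the text-only path never populated. A hypothetical reconstruction of the mismatch (illustration only, not the actual vLLM code):

import torch

head_size = 128
query = torch.randn(3 * 1536)  # 3 tokens of query states at the 2B model's hidden size

# Hypothetical: the (3, num_tokens) M-RoPE buffer is consulted even for
# text-only input, but nothing filled it on that path.
mrope_positions = torch.empty((3, 0), dtype=torch.long)

num_tokens = mrope_positions.shape[-1]  # 0, because the buffer is empty
# Raises: RuntimeError: shape '[0, -1, 128]' is invalid for input of size 4608
query.view(num_tokens, -1, head_size)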

@Isotr0py
Collaborator

This should be fixed by #11434, please have a try :)
