
[Usage]: vLLM model service crashes when adding OpenAI-API-compatible model in dify, model id: Qwen/Qwen2-VL-7B-Instruct #11154

Closed
jackathere opened this issue Dec 13, 2024 · 9 comments · Fixed by #11434

@jackathere

Your current environment

vLLM API server version 0.6.4.post2
docker vllm-cpu-env
model "Qwen/Qwen2-VL-7B-Instruct"

How would you like to use vllm

The vLLM model service crashes when adding an OpenAI-API-compatible model in Dify.
Model id: Qwen/Qwen2-VL-7B-Instruct


Error message:

INFO 12-13 01:12:19 engine.py:267] Added request chatcmpl-093873b3235846bfb6500cd5807b39be.
ERROR 12-13 01:12:19 engine.py:135] RuntimeError("shape '[0, -1, 128]' is invalid for input of size 71680")
ERROR 12-13 01:12:19 engine.py:135] Traceback (most recent call last):
ERROR 12-13 01:12:19 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 133, in start
ERROR 12-13 01:12:19 engine.py:135]     self.run_engine_loop()
......

ERROR 12-13 01:12:19 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 12-13 01:12:19 engine.py:135]     return forward_call(*args, **kwargs)
ERROR 12-13 01:12:19 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/rotary_embedding.py", line 825, in forward
ERROR 12-13 01:12:19 engine.py:135]     query = query.view(num_tokens, -1, self.head_size)
ERROR 12-13 01:12:19 engine.py:135] RuntimeError: shape '[0, -1, 128]' is invalid for input of size 71680
CRITICAL 12-13 01:12:19 launcher.py:99] MQLLMEngine is already dead, terminating server process
INFO: 192.168.0.200:37164 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: Shutting down
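
For reference, the failing call boils down to a plain tensor reshape with a zero leading dimension: num_tokens comes out as 0 while the flattened query tensor is non-empty. A minimal PyTorch sketch (hypothetical sizes, picked so the numbers line up with the log: 20 tokens × 28 query heads × 128 head dim = 71680 for the 7B model):

import torch

head_size = 128
query = torch.randn(71680)  # flattened query states for the batch

num_tokens = 0  # what the engine ends up computing for this request
# Raises: RuntimeError: shape '[0, -1, 128]' is invalid for input of size 71680
query.view(num_tokens, -1, head_size)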

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
jackathere added the usage label on Dec 13, 2024
@sidharthrajaram

Encountering a similar issue when running the vLLM ARM container with Qwen2-VL-2B-Instruct:

ERROR 12-13 01:47:18 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/qwen2.py", line 175, in forward
ERROR 12-13 01:47:18 engine.py:135]     q, k = self.rotary_emb(positions, q, k)
ERROR 12-13 01:47:18 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 12-13 01:47:18 engine.py:135]     return self._call_impl(*args, **kwargs)
ERROR 12-13 01:47:18 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 12-13 01:47:18 engine.py:135]     return forward_call(*args, **kwargs)
ERROR 12-13 01:47:18 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/rotary_embedding.py", line 825, in forward
ERROR 12-13 01:47:18 engine.py:135]     query = query.view(num_tokens, -1, self.head_size)
ERROR 12-13 01:47:18 engine.py:135] RuntimeError: shape '[0, -1, 128]' is invalid for input of size 38400
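
Same failure mode, different size: 38400 is plausibly 25 tokens × 12 query heads × 128 head dim for the 2B model, with the leading dimension again computed as 0.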

@axel7083

I also tried with the https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct model and am facing the same error on an x86 CPU inside a Podman container:

INFO 12-13 17:31:41 chat_utils.py:331] Detected the chat template content format to be 'openai'. You can set `--chat-template-content-format` to override this.
INFO 12-13 17:31:41 logger.py:37] Received request chatcmpl-5b79409428c54c9083f984162792c2f1: prompt: '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhat is the capital of France?<|im_end|>\n<|im_start|>assistant\n', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=1.0, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=32742, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None), prompt_token_ids: None, lora_request: None, prompt_adapter_request: None.
INFO 12-13 17:31:41 engine.py:267] Added request chatcmpl-5b79409428c54c9083f984162792c2f1.
CRITICAL 12-13 17:31:42 launcher.py:99] MQLLMEngine is already dead, terminating server process
INFO:     10.88.0.1:43240 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR 12-13 17:31:42 engine.py:135] RuntimeError("shape '[0, -1, 128]' is invalid for input of size 39936")
ERROR 12-13 17:31:42 engine.py:135] Traceback (most recent call last):
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 133, in start
ERROR 12-13 17:31:42 engine.py:135]     self.run_engine_loop()
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 196, in run_engine_loop
ERROR 12-13 17:31:42 engine.py:135]     request_outputs = self.engine_step()
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 214, in engine_step
ERROR 12-13 17:31:42 engine.py:135]     raise e
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 205, in engine_step
ERROR 12-13 17:31:42 engine.py:135]     return self.engine.step()
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 1405, in step
ERROR 12-13 17:31:42 engine.py:135]     outputs = self.model_executor.execute_model(
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/cpu_executor.py", line 201, in execute_model
ERROR 12-13 17:31:42 engine.py:135]     output = self.driver_method_invoker(self.driver_worker,
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/cpu_executor.py", line 298, in _driver_method_invoker
ERROR 12-13 17:31:42 engine.py:135]     return getattr(driver, method)(*args, **kwargs)
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 343, in execute_model
ERROR 12-13 17:31:42 engine.py:135]     output = self.model_runner.execute_model(
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 12-13 17:31:42 engine.py:135]     return func(*args, **kwargs)
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/cpu_model_runner.py", line 532, in execute_model
ERROR 12-13 17:31:42 engine.py:135]     hidden_states = model_executable(
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 12-13 17:31:42 engine.py:135]     return self._call_impl(*args, **kwargs)
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 12-13 17:31:42 engine.py:135]     return forward_call(*args, **kwargs)
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/qwen2_vl.py", line 1372, in forward
ERROR 12-13 17:31:42 engine.py:135]     hidden_states = self.language_model.model(
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/compilation/decorators.py", line 168, in __call__
ERROR 12-13 17:31:42 engine.py:135]     return self.forward(*args, **kwargs)
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/qwen2.py", line 340, in forward
ERROR 12-13 17:31:42 engine.py:135]     hidden_states, residual = layer(
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 12-13 17:31:42 engine.py:135]     return self._call_impl(*args, **kwargs)
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 12-13 17:31:42 engine.py:135]     return forward_call(*args, **kwargs)
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/qwen2.py", line 247, in forward
ERROR 12-13 17:31:42 engine.py:135]     hidden_states = self.self_attn(
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 12-13 17:31:42 engine.py:135]     return self._call_impl(*args, **kwargs)
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 12-13 17:31:42 engine.py:135]     return forward_call(*args, **kwargs)
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/qwen2.py", line 175, in forward
ERROR 12-13 17:31:42 engine.py:135]     q, k = self.rotary_emb(positions, q, k)
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 12-13 17:31:42 engine.py:135]     return self._call_impl(*args, **kwargs)
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 12-13 17:31:42 engine.py:135]     return forward_call(*args, **kwargs)
ERROR 12-13 17:31:42 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/rotary_embedding.py", line 825, in forward
ERROR 12-13 17:31:42 engine.py:135]     query = query.view(num_tokens, -1, self.head_size)
ERROR 12-13 17:31:42 engine.py:135] RuntimeError: shape '[0, -1, 128]' is invalid for input of size 39936
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [1]

@DarkLight1337
Member

DarkLight1337 commented Dec 23, 2024

Sorry for missing this, can you post an example image that results in this error?

cc @Isotr0py

@axel7083

Sorry for missing this, can you post an example image that results in this error?

quay.io/rh-ee-astefani/vllm:cpu-1734105797

$: podman run \
 -v $HF_HUB_CACHE/models--Qwen--Qwen2-VL-2B-Instruct:/cache/models--Qwen--Qwen2-VL-2B-Instruct \
 quay.io/rh-ee-astefani/vllm:cpu-1734105797 \
 --model=/cache/models--Qwen--Qwen2-VL-2B-Instruct/snapshots/47592516d3e709cd9c194715bc76902241c5edea

@DarkLight1337
Member

DarkLight1337 commented Dec 23, 2024

I don't have podman, can you just upload the image here?

Edit: by image I mean the image that's being passed into the model, not the image of the container.

@axel7083

axel7083 commented Dec 23, 2024

Edit: by image I mean the image that's being passed into the model, not the image of the container.

Oh yeah, I tried without an image first, and that was where the error was happening:

curl --location 'http://localhost:46717/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "/cache/models--Qwen--Qwen2-VL-2B-Instruct/snapshots/47592516d3e709cd9c194715bc76902241c5edea",
  "messages": [
    {
      "content": "You are a helpful assistant.",
      "role": "system"
    },
    {
      "content": "What is the capital of France?",
      "role": "user"
    }
  ]
}'

ℹ️ Using an image, it works fine.
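
For completeness, here are the two requests as a Python sketch using the openai client (assumptions: the server started above is listening on port 46717, and the image URL is just a placeholder):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:46717/v1", api_key="EMPTY")
model = "/cache/models--Qwen--Qwen2-VL-2B-Instruct/snapshots/47592516d3e709cd9c194715bc76902241c5edea"

# Text-only request: triggers the shape error and kills the CPU engine (HTTP 500)
client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
)

# Request with an image attached: works fine
client.chat.completions.create(
    model=model,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/some-image.jpg"}},
        ],
    }],
)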

@DarkLight1337
Member

I think this should be solved in #11396, please try it out.

@Isotr0py
Collaborator

Seems that this is related to the input_position in cpu_model_runner, because it still uses mrope_position for text-only inputs.
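
If that is the cause, the empty leading dimension would come from the rotary embedding being handed a position buffer that the text-only path never populated. A hypothetical reconstruction of the mismatch (illustration only, not the actual vLLM code):

import torch

head_size = 128
query = torch.randn(3 * 1536)  # 3 tokens of query states at the 2B model's hidden size

# Hypothetical: the (3, num_tokens) M-RoPE buffer is consulted even for
# text-only input, but nothing filled it on that path.
mrope_positions = torch.empty((3, 0), dtype=torch.long)

num_tokens = mrope_positions.shape[-1]  # 0, because the buffer is empty
# Raises: RuntimeError: shape '[0, -1, 128]' is invalid for input of size 4608
query.view(num_tokens, -1, head_size)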

@Isotr0py
Collaborator

This should be fixed by #11434, please have a try :)
