[Feature]: LoRA support for qwen2-vl Models #11255

Open · 1 task done
xlg-go opened this issue Dec 17, 2024 · 12 comments

Comments

@xlg-go

xlg-go commented Dec 17, 2024

🚀 The feature, motivation and pitch

I fine-tuned a Qwen2-VL-7B model using LLaMA-Factory, deployed it with AsyncLLMEngine, and loaded the LoRA adapter using lora_request. However, the inference results are significantly worse than with the merged model.

[image attachment]
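For reference, this is roughly what per-request LoRA loading looks like in vLLM — a minimal offline sketch (the deployment described above uses AsyncLLMEngine, the adapter path and prompt are placeholders, and the multimodal image input is omitted for brevity):

```python
# Minimal sketch of attaching a LoRA adapter per request in vLLM.
# The issue uses AsyncLLMEngine, but the LoRA plumbing is the same;
# the adapter path and prompt are placeholders, image input omitted.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="Qwen/Qwen2-VL-7B-Instruct",
    enable_lora=True,      # allow adapters to be attached at request time
    max_lora_rank=32,      # must be >= the adapter's r (32 in this thread)
)

lora = LoRARequest("ocr_lora", 1, "/path/to/lora_adapter")
outputs = llm.generate(
    ["Recognize the characters in the image."],
    SamplingParams(temperature=0, max_tokens=256),
    lora_request=lora,
)
print(outputs[0].outputs[0].text)
```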

It would be great if we could have support for LoRA on multimodal models, as our team wants to use multiple LoRAs and merging the LoRA adapters into the original model weights is not feasible for us. We are short on time for this project, and as far as I can tell no other framework supports LoRA in this way. We also need Outlines for structured generation, so vLLM (being the most user-friendly, stable, and mature framework) is our best bet right now. Can we get a timeline for when this will be supported? Also, are there any workarounds possible until this feature is officially supported?

Thank you for your work on this adaptation.

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@jeejeelee
Collaborator

vLLM supports LoRA for multimodal models, but it only supports adding the LoRA adapter to the language backbone. The quickest approach would be to add LoRA only to the language backbone and retrain.
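For anyone finding this later, "LoRA only on the language backbone" in PEFT terms looks roughly like the sketch below; the regex-style target_modules mirrors the adapter_config.json shared further down in this thread.

```python
# Sketch of a PEFT LoRA config that keeps the adapter off the vision tower:
# the (?!.*visual) negative lookahead skips every module whose name contains
# "visual", so only language-backbone projections receive LoRA weights.
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.15,
    task_type="CAUSAL_LM",
    target_modules=r"^(?!.*visual).*(?:o_proj|up_proj|v_proj|down_proj|k_proj|q_proj|gate_proj).*",
)
```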

@xlg-go
Author

xlg-go commented Dec 17, 2024

> vLLM supports LoRA for multimodal models, but it only supports adding the LoRA adapter to the language backbone. The quickest approach would be to add LoRA only to the language backbone and retrain.

The inference results with the LoRA adapter are poor. Is it because the LoRA wasn't added to the vision backbone?

@jeejeelee
Collaborator

Yep, could you share your LoRA configuration? I want to double-check it.

@xlg-go
Author

xlg-go commented Dec 17, 2024

> Yep, could you share your LoRA configuration? I want to double-check it.

Do you mean adapter_config.json?
Engine args: max_lora_rank=32, enable_lora=True

```json
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "./ms_cache/hub/Qwen/Qwen2-VL-7B-Instruct",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 32,
  "lora_dropout": 0.15,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 32,
  "rank_pattern": {},
  "revision": null,
  "target_modules": "^(?!.*visual).*(?:o_proj|up_proj|v_proj|down_proj|k_proj|q_proj|gate_proj).*",
  "task_type": "CAUSAL_LM",
  "use_dora": false,
  "use_rslora": false
}
```
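The negative lookahead in that target_modules pattern is what keeps the adapter off the vision tower. A quick way to see which modules it selects (the module names below are illustrative Qwen2-VL paths, not read from the model):

```python
# Check which module names the adapter's target_modules regex selects.
# PEFT matches a string pattern against the full module name, so the
# (?!.*visual) lookahead excludes the vision tower entirely.
import re

pattern = r"^(?!.*visual).*(?:o_proj|up_proj|v_proj|down_proj|k_proj|q_proj|gate_proj).*"

examples = [
    "model.layers.0.self_attn.q_proj",  # language backbone
    "model.layers.0.mlp.down_proj",     # language backbone
    "visual.blocks.0.attn.qkv",         # vision tower
    "visual.merger.mlp.0",              # vision tower
]

for name in examples:
    hit = re.fullmatch(pattern, name) is not None
    print(f"{name:35s} -> {'LoRA applied' if hit else 'skipped'}")
```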

@jeejeelee
Collaborator

It looks like your LoRA was indeed only added to the LLM. Is there a big difference in your results?

@xlg-go
Author

xlg-go commented Dec 17, 2024

> It looks like your LoRA was indeed only added to the LLM. Is there a big difference in your results?

Yep, the inference results differ significantly! I want to use Qwen2-VL to perform OCR on images and recognize the characters.

So I guess it's because the LoRA wasn't added to the vision backbone. I might not need the language backbone that much; the vision backbone is the key for my case.

@jeejeelee
Collaborator

So your LoRA was actually added to the visual backbone, and when using vLLM for inference you found that the results differ significantly from the merged model?

@xlg-go
Author

xlg-go commented Dec 17, 2024

> So your LoRA was actually added to the visual backbone, and when using vLLM for inference you found that the results differ significantly from the merged model?

Yes!
Wait a moment!
Doesn't vLLM currently only support adding LoRA to the language backbone?

@xlg-go
Author

xlg-go commented Dec 18, 2024

@jeejeelee hi~ Do you have any thoughts on adapting the vision backbone as well?

@jeejeelee
Collaborator

> Doesn't vLLM currently only support adding LoRA to the language backbone?

Yes, currently vLLM only supports adding LoRA to the language backbone.
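One workaround until vision-backbone LoRA is supported: if an adapter does touch the vision tower, merge it into the base weights offline and serve the merged checkpoint as a plain model. A rough sketch with placeholder paths, assuming a PEFT-format adapter:

```python
# Workaround sketch: merge the LoRA adapter (including any vision-tower weights)
# into the base model offline, then serve the merged checkpoint with vLLM
# without enable_lora. Paths are placeholders.
import torch
from transformers import Qwen2VLForConditionalGeneration
from peft import PeftModel

base = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "/path/to/lora_adapter")
merged = model.merge_and_unload()          # folds the LoRA deltas into the base weights
merged.save_pretrained("/path/to/merged_model")
# Also save/copy the tokenizer and processor files next to the merged weights.
```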

@jeejeelee
Collaborator

> @jeejeelee hi~ Do you have any thoughts on adapting the vision backbone as well?

No, we haven't. We did some LoRA experiments before and found that for VL models, adapting the vision backbone didn't show significant benefits, though this might be due to our limited experiments.

@xlg-go
Author

xlg-go commented Dec 18, 2024

I understand. Thanks for sharing your findings and the insights from your LoRA experiments. While it's disappointing that vision-backbone adaptation didn't yield significant benefits in your tests, it's valuable data nonetheless. Perhaps with further research and more extensive experiments, different approaches might prove fruitful in the future.
