[Feature]: LoRA support for qwen2-vl Models #11255
Comments
vLLM supports LoRA for multimodal models, but it only supports adding the LoRA adapter to the language backbone. The quickest approach would be to add LoRA only to the language backbone and retrain it.
The inference quality with the LoRA adapter is poor. Is it because the LoRA wasn't added to the vision backbone?
Yep, could you share your LoRA configuration? I want to double-check it.
Is it adapter_config.json?
It looks like your LoRA was indeed only added to the LLM. Is there a big difference in your results?
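For reference, a quick way to check which modules an adapter targets is to inspect its adapter_config.json — a minimal sketch assuming a standard PEFT-style adapter layout, with a placeholder path:

```python
import json

# List the modules the LoRA adapter was trained on. For Qwen2-VL, entries under
# the vision tower (module names containing "visual" in the HF implementation)
# indicate the vision backbone was adapted; vLLM currently applies LoRA only to
# the language backbone.
with open("/path/to/lora_adapter/adapter_config.json") as f:  # placeholder path
    config = json.load(f)

targets = config.get("target_modules", [])
print("target_modules:", targets)
vision_targets = [m for m in targets if "visual" in m]
print("vision-backbone modules:", vision_targets or "none")
```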
Yep, the inference results differ significantly! I want to use qwen2-vl to perform OCR on images and recognize the characters, so I guess it's because the LoRA wasn't added to the vision backbone. I might not need the language backbone that much; the vision backbone is the key.
So your LoRA was actually added to the visual backbone, and when using vLLM for inference you found that the results differed significantly from the merged model?
Yes!
Hi @jeejeelee, do you have any thoughts on adapting the vision backbone as well?
Yes, currently vLLM only supports adding LoRA to the language backbone.
No, we haven't. We did some LoRA experiments before and found that for VL models, adapting the vision backbone didn't show significant benefits, though this might be due to our limited experiments.
I understand. Thanks for sharing your findings and the insights from your LoRA experiments. While it's disappointing that vision-backbone adaptation didn't yield significant benefits in your tests, it's valuable data nonetheless. Perhaps with further research and more extensive experiments, different approaches might prove fruitful in the future.
🚀 The feature, motivation and pitch
I fine-tuned a qwen2-vl-7b model using LLaMA-Factory, deployed it with AsyncLLMEngine, and loaded the LoRA adapter via lora_request. However, the inference results are significantly worse than with the merged model.
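For context, loading an adapter at request time in vLLM looks roughly like the following — a minimal sketch using the offline `LLM` entrypoint rather than `AsyncLLMEngine`, with a placeholder adapter name and path:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Only the language-backbone LoRA weights are applied at inference time; any
# vision-backbone weights in the adapter are ignored, which is what this issue
# is about.
llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct", enable_lora=True, max_lora_rank=64)

outputs = llm.generate(
    ["Describe the text in the image."],  # multimodal inputs would use the usual prompt format
    SamplingParams(max_tokens=256),
    lora_request=LoRARequest("ocr_adapter", 1, "/path/to/lora_adapter"),  # placeholder path
)
print(outputs[0].outputs[0].text)
```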
It would be great if we could have full LoRA support for multimodal models. Our team wants to use multiple LoRAs, and merging the LoRA adapters into the original model weights is not feasible for us. We are short on time for this project, and as far as I can tell no other framework supports LoRA in this way. We also need Outlines for structured generation, so vLLM (being the most user-friendly, stable, and mature framework) is our best bet right now. Can we get a timeline for when this will be supported? Also, are there any workarounds possible until this feature is officially supported?
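Regarding workarounds: until vision-backbone LoRA lands, the usual fallback is to merge each adapter into a full copy of the base weights and serve the merged checkpoint as a plain model — which, as noted above, doesn't scale well when many adapters are needed. A minimal sketch using PEFT's merge_and_unload, with placeholder paths:

```python
from peft import PeftModel
from transformers import Qwen2VLForConditionalGeneration

# Merge the LoRA deltas (including any vision-backbone weights) into the base
# model, then serve the merged checkpoint with vLLM without LoRA.
base = Qwen2VLForConditionalGeneration.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
merged = PeftModel.from_pretrained(base, "/path/to/lora_adapter").merge_and_unload()
merged.save_pretrained("/path/to/merged_model")  # remember to copy the processor/tokenizer files too
```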
Thank you for your work on this.
Alternatives
No response
Additional context
No response