
[VLM] Support caching in merged multi-modal processor #11396

Open
wants to merge 71 commits into main
Conversation

Member

DarkLight1337 commented on Dec 21, 2024

Compared to #11341, this PR moves the caching-related logic (e.g. checking whether an image is already in the cache, and merging cached items with the processed outputs) out of the main loop that applies the HF processor.

To enable this, the merged multi-modal processor for each model needs to define the "schema" of the output MultiModalKwargs (in BaseMultiModalProcessor._get_mm_fields_config). The schema specifies how to obtain the kwargs corresponding to each item from the HF outputs (MultiModalField.build_items), and how to merge the kwargs of newly processed items with the cached results (MultiModalField.reduce).
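Below is a minimal sketch of what such a schema definition could look like for a model whose HF processor emits one `pixel_values` tensor per image. The model class and the `MultiModalFieldConfig.batched` helper are illustrative assumptions based on this PR's description, not necessarily the merged API:

```python
# Hypothetical sketch; names follow the PR description and may differ
# from the final merged code.
from vllm.multimodal.processing import (BaseMultiModalProcessor,
                                        MultiModalFieldConfig)


class MyModelMultiModalProcessor(BaseMultiModalProcessor):

    def _get_mm_fields_config(self, hf_inputs, hf_processor_mm_kwargs):
        # Declare how each key in the HF processor output maps onto
        # individual multi-modal items. Given this schema, the caching
        # layer can split the batched HF outputs into per-item kwargs
        # (MultiModalField.build_items) and later merge newly processed
        # items with cached ones (MultiModalField.reduce) outside the
        # main processing loop.
        return dict(
            # One pixel_values entry per image in the batch.
            pixel_values=MultiModalFieldConfig.batched("image"),
        )
```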


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default; only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀


mergify bot commented Dec 24, 2024

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @DarkLight1337.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label on Dec 24, 2024
mergify bot removed the needs-rebase label on Dec 24, 2024
@@ -748,8 +748,7 @@ vLLM currently only supports adding LoRA to the language backbone of multimodal
```

```{note}
-To use {code}`TIGER-Lab/Mantis-8B-siglip-llama3`, you have to install their GitHub repo ({code}`pip install git+https://github.com/TIGER-AI-Lab/Mantis.git`)
-and pass {code}`--hf_overrides '{"architectures": ["MantisForConditionalGeneration"]}'` when running vLLM.
+To use {code}`TIGER-Lab/Mantis-8B-siglip-llama3`, you have to pass {code}`--hf_overrides '{"architectures": ["MantisForConditionalGeneration"]}'` when running vLLM.
Member Author

DarkLight1337 commented on Dec 24, 2024
We now use HF's LlavaProcessor plus our own prompt replacements to replicate the logic of MLlavaProcessor, so users no longer need to install the Mantis GitHub repo.
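For context, here is a hypothetical offline-inference snippet applying the override from the updated note; the `hf_overrides` engine argument is assumed to mirror the documented `--hf_overrides` CLI flag:

```python
# Hypothetical usage; hf_overrides mirrors the --hf_overrides CLI flag
# shown in the updated docs. No Mantis repo installation is required.
from vllm import LLM

llm = LLM(
    model="TIGER-Lab/Mantis-8B-siglip-llama3",
    hf_overrides={"architectures": ["MantisForConditionalGeneration"]},
)
```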

Labels
ci/build, documentation