
add support for 4bit inference #47

Merged
merged 7 commits into from
Aug 2, 2023

Conversation

iMountTai
Collaborator

@iMountTai iMountTai commented Aug 2, 2023

Description

This PR adds support for 4-bit inference, which effectively reduces VRAM usage. VRAM usage after loading the model is as follows (measured on a Tesla P40):

| load_in_4bit | load_in_8bit | fp16 |
|--------------|--------------|--------|
| 4999M        | 7989M        | 13759M |

Usage
Just add `--load_in_4bit` to the launch command. For example:

python scripts/inference/inference_hf.py \
    --base_model path_to_merged_llama2_or_alpaca2_hf_dir \
    --with_prompt \
    --load_in_4bit \
    --interactive
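
As a rough illustration of how such a flag typically feeds into model loading, here is a minimal sketch of mapping a `--load_in_4bit` / `--load_in_8bit` command-line switch to `from_pretrained`-style keyword arguments. The helper name `build_load_kwargs` and the exact kwargs are illustrative assumptions, not taken from this PR's script:

```python
# Hypothetical sketch: translate quantization flags into model-loading
# kwargs. Only one of 4-bit / 8-bit should be active at a time.
import argparse


def build_load_kwargs(argv):
    """Parse quantization flags and return kwargs for model loading."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--load_in_4bit", action="store_true")
    parser.add_argument("--load_in_8bit", action="store_true")
    args = parser.parse_args(argv)

    # fp16 is the baseline; quantization flags narrow it further.
    kwargs = {"torch_dtype": "float16"}
    if args.load_in_4bit:
        kwargs["load_in_4bit"] = True
    elif args.load_in_8bit:
        kwargs["load_in_8bit"] = True
    return kwargs
```

With `--load_in_4bit` the returned kwargs would request 4-bit weights, matching the roughly 4999M footprint reported in the table above.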

Related Issue

None.

@iMountTai iMountTai requested a review from airaria August 2, 2023 07:26
@airaria
Contributor

airaria commented Aug 2, 2023

  1. Update requirements.txt.
  2. Since this is a breaking change, please make sure all occurrences of `load_in_8bit` in the Wiki and scripts are replaced with `load_in_kbit 8`.
  3. Can `load_in_kbit` be added to openai_api_server.py and openai_api_server_vllm.py?
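
The suggested rename generalizes the boolean `load_in_8bit` flag into a single k-valued option. A minimal sketch of how such a `load_in_kbit` parameter might dispatch to quantization settings (the helper name and accepted values are assumptions for illustration):

```python
# Hypothetical sketch: one load_in_kbit parameter replacing separate
# boolean flags; 16 means no quantization (plain fp16).
def quantization_kwargs(load_in_kbit=16):
    """Map a bit-width choice to model-loading quantization kwargs."""
    if load_in_kbit == 4:
        return {"load_in_4bit": True}
    if load_in_kbit == 8:
        return {"load_in_8bit": True}
    if load_in_kbit == 16:
        return {}
    raise ValueError("load_in_kbit must be 4, 8, or 16")
```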

@iMountTai
Collaborator Author

iMountTai commented Aug 2, 2023

@airaria

  • The requirements and wiki have been updated, and the `load_in_8bit` setting has been retained.
  • `load_in_kbit` has been added to openai_api_server.py; openai_api_server_vllm.py does not support `load_in_kbit`.

@airaria
Contributor

airaria commented Aug 2, 2023

TODO: Update wiki w.r.t. this PR. @iMountTai

@ymcui ymcui merged commit 7b3e5ab into ymcui:main Aug 2, 2023