Adding quantization_weights_path for fp8 weights #57

charlifu · 2024-06-19T20:42:39Z

The current fp8 GEMM computation uses quantization_param_path to load the quantized fp8 weights, but this arg is also used by the fp8 kv cache. This PR adds a separate arg, quantization_weights_path, which points to the quantized safetensors file that the fp8 weights are loaded from.
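
To illustrate the split described above, here is a minimal sketch of two independent arguments, one for the kv cache parameters and one for the fp8 weights file. It is not the code from this PR: the `build_parser` helper, the dashed flag spellings, the help strings, and the example file names are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser showing the two paths as separate args."""
    parser = argparse.ArgumentParser(description="fp8 path arguments (sketch)")
    # Assumed spelling: existing arg, left to the fp8 kv cache.
    parser.add_argument(
        "--quantization-param-path",
        type=str,
        default=None,
        help="Path to the fp8 kv cache quantization parameters.",
    )
    # Assumed spelling: new arg for the quantized safetensors weights file.
    parser.add_argument(
        "--quantization-weights-path",
        type=str,
        default=None,
        help="Path to the quantized safetensors file with fp8 weights.",
    )
    return parser


# Example invocation with placeholder file names.
example_args = build_parser().parse_args(
    [
        "--quantization-param-path", "kv_cache_scales.json",
        "--quantization-weights-path", "fp8_weights.safetensors",
    ]
)
print(example_args.quantization_param_path)
print(example_args.quantization_weights_path)
```

With separate arguments, either path can be set without affecting the other, so enabling the fp8 kv cache no longer implies loading fp8 weights from the same file.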

gshtras · 2024-06-19T20:53:09Z

Looks good, just rename it to quantized_weights_path please.

charlifu added 3 commits June 19, 2024 20:22

add quantization_weights_path for fp8 weights

a9be7c9

fix lint

a7971a4

fix lint

f088187

charlifu requested a review from gshtras June 19, 2024 20:48

charlifu added 2 commits June 19, 2024 20:55

change to quantized_weights_path

3857ba2

fix lint

c745722

charlifu merged commit 93aab3c into main Jun 19, 2024
13 checks passed

charlifu deleted the charlifu/fix_arg_for_quant_param branch August 5, 2024 19:24
