llama.cpp: update GGUF models (with imatrix) #510

ymcui · 2024-01-23T05:47:36Z

Description

Recently, llama.cpp introduced importance-matrix-aware (imatrix) quantization, which yields further improvements in PPL (perplexity).
Before quantization, the importance matrix is computed with the imatrix tool. We use the PKU Chinese segmentation training data and iterate over 100 batches to obtain the imatrix.

During quantization, pass the generated imatrix file via --imatrix to enable im-aware quantization. Note that quantization takes longer with an imatrix than without one.
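The two steps above can be sketched as a small Python helper that only builds and prints the command lines. This is a minimal sketch under assumptions, not the exact commands used for this PR: the binary names `./imatrix` and `./quantize`, the `-m`/`-f`/`-o`/`--chunks` flags, the mapping of "100 batches" to `--chunks 100`, and all file names are hypothetical placeholders. Only `--imatrix` is taken from the description above, so check each tool's `--help` output for your llama.cpp build before running anything.

```python
import shlex


def imatrix_cmd(model_f16, calib_text, out_file, chunks=100, binary="./imatrix"):
    """Build the command that computes an importance matrix.

    Assumption: the tool takes -m (model), -f (calibration text),
    -o (output file) and --chunks (number of batches to process).
    """
    return [binary, "-m", model_f16, "-f", calib_text,
            "-o", out_file, "--chunks", str(chunks)]


def quantize_cmd(imatrix_file, model_f16, out_model, quant_type, binary="./quantize"):
    """Build the im-aware quantization command using --imatrix."""
    return [binary, "--imatrix", imatrix_file, model_f16, out_model, quant_type]


if __name__ == "__main__":
    # All paths below are hypothetical placeholders.
    steps = [
        imatrix_cmd("model-f16.gguf", "pku_training.txt", "imatrix.dat"),
        quantize_cmd("imatrix.dat", "model-f16.gguf", "model-q4_k-im.gguf", "Q4_K"),
    ]
    for step in steps:
        # Print instead of executing so the commands can be reviewed first.
        print(shlex.join(step))
```

Printing the commands rather than running them keeps the sketch safe to try without a llama.cpp build. To execute a step, pass the same list to `subprocess.run(step, check=True)`.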

We have now converted all available models (K-quants only). You can download them directly from our Hugging Face model hub. Model names with the -im suffix denote the newly converted im-aware models. These models can be used directly, with no further steps.

The following are several PPL benchmarks (lower is better). Generally speaking, im-quantized models are better, but not always: Q2_K is noticeably worse with imatrix for both models.

Chinese-Alpaca-2-7B-RLHF-GGUF

| Quant | original | imatrix (`-im`) |
| --- | --- | --- |
| Q2_K | 10.5211 +/- 0.14139 | 11.9331 +/- 0.16168 |
| Q3_K | 8.9748 +/- 0.12043 | 8.8238 +/- 0.11850 |
| Q4_0 | 8.7843 +/- 0.11854 | - |
| Q4_K | 8.4643 +/- 0.11341 | 8.4226 +/- 0.11302 |
| Q5_0 | 8.4563 +/- 0.11353 | - |
| Q5_K | 8.3722 +/- 0.11236 | 8.3336 +/- 0.11192 |
| Q6_K | 8.3207 +/- 0.11184 | 8.3047 +/- 0.11159 |
| Q8_0 | 8.3100 +/- 0.11173 | - |

Chinese-LLaMA-2-13B-GGUF

| Quant | original | imatrix (`-im`) |
| --- | --- | --- |
| Q2_K | 14.4701 +/- 0.26107 | 17.4275 +/- 0.31909 |
| Q3_K | 10.1620 +/- 0.18277 | 9.7486 +/- 0.17744 |
| Q4_0 | 9.8633 +/- 0.17792 | - |
| Q4_K | 9.2735 +/- 0.16793 | 9.2734 +/- 0.16792 |
| Q5_0 | 9.3553 +/- 0.16945 | - |
| Q5_K | 9.1767 +/- 0.16634 | 9.1594 +/- 0.16590 |
| Q6_K | 9.1326 +/- 0.16546 | 9.1478 +/- 0.16583 |
| Q8_0 | 9.1394 +/- 0.16574 | - |
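To make the "better but not always" observation easier to check, the sketch below recomputes the PPL change for each K-quant from the mean values in the two tables above. The helper names `relative_change` and `improved_quants` are illustrative, and the error bars are ignored, so very small differences (such as Q4_K on the 13B model) should not be over-interpreted.

```python
# Mean PPL values copied from the tables above: (original, imatrix).
# Only K-quants are listed, since only those have an imatrix variant.
PPL = {
    "Chinese-Alpaca-2-7B-RLHF-GGUF": {
        "Q2_K": (10.5211, 11.9331),
        "Q3_K": (8.9748, 8.8238),
        "Q4_K": (8.4643, 8.4226),
        "Q5_K": (8.3722, 8.3336),
        "Q6_K": (8.3207, 8.3047),
    },
    "Chinese-LLaMA-2-13B-GGUF": {
        "Q2_K": (14.4701, 17.4275),
        "Q3_K": (10.1620, 9.7486),
        "Q4_K": (9.2735, 9.2734),
        "Q5_K": (9.1767, 9.1594),
        "Q6_K": (9.1326, 9.1478),
    },
}


def relative_change(original, imatrix):
    """Percent change in PPL; negative means imatrix is better."""
    return (imatrix - original) / original * 100.0


def improved_quants(rows):
    """Return the quant types whose imatrix PPL is lower than the original."""
    return [q for q, (orig, im) in rows.items() if im < orig]


if __name__ == "__main__":
    for model, rows in PPL.items():
        print(model)
        for quant, (orig, im) in rows.items():
            print(f"  {quant}: {relative_change(orig, im):+.2f}%")
        print("  improved:", ", ".join(improved_quants(rows)))
```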

Related Issue

None.

update gguf models (with imatrix)

44fcfc8

ymcui merged commit ae46bcd into main Jan 23, 2024
1 check passed

ymcui deleted the imatrix branch January 23, 2024 05:50
