
ggml : do not use ARM features not included in the build #10457

Merged 1 commit into master on Nov 23, 2024

Conversation

slaren (Collaborator) commented Nov 22, 2024

Fixes #10435

slaren merged commit 55ed008 into master on Nov 23, 2024 (55 checks passed).
slaren deleted the sl/fix-arm-features branch on Nov 23, 2024.
gustrd (Contributor) commented Dec 3, 2024

Excuse me @slaren, is it possible to do the same for the Android ARM build? I observed the same regression with q4_0_4_4 on the Snapdragon 8 Gen 1.

slaren (Collaborator, Author) commented Dec 4, 2024

I am not aware of any issues with Android; this should work the same way on all ARM platforms.
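
For anyone curious how this works across platforms: below is a minimal sketch of the pattern the fix relies on, assuming a Linux/Android aarch64 target. The function name is hypothetical and this is not the actual ggml code; it only illustrates gating a feature on both compile-time support and runtime detection (`getauxval` is how the kernel exposes CPU capabilities on both Linux and Android, which is why the same approach covers both).

```c
// Minimal sketch, NOT the actual ggml implementation: a feature is used only
// when it was both compiled into the binary and detected on the running CPU.
#include <stdbool.h>
#include <sys/auxv.h>   // getauxval, AT_HWCAP
#include <asm/hwcap.h>  // HWCAP_ASIMDDP (aarch64 Linux/Android)

static bool cpu_can_use_dotprod(void) {
#if defined(__ARM_FEATURE_DOTPROD)
    // The dotprod kernels were compiled in; use them only if the CPU we are
    // actually running on supports the instructions.
    return (getauxval(AT_HWCAP) & HWCAP_ASIMDDP) != 0;
#else
    // The binary was built without dotprod support, so the kernels do not
    // exist: never report the feature, even if the CPU has it.
    return false;
#endif
}
```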

gustrd (Contributor) commented Dec 4, 2024

I recently built the latest version of llama.cpp on Android and noticed a significant slowdown when using the q4_0_4_4 quantization format. It seems this slowdown occurs because the new version automatically converts q4_0_4_4 to the q4_0_4_8 format.

However, when I use the new IQ4_NL format, performance remains fast. From what I can tell, the automatic conversion in this case produces a layout similar to q4_0_4_4.

slaren (Collaborator, Author) commented Dec 4, 2024

This is not correct: only Q4_0 (and IQ4_NL) is converted to other types.
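
To illustrate the conversion being discussed: a hedged sketch of the load-time repacking decision, with hypothetical names (the enum and function are not the actual ggml code; only the mapping from CPU features to interleaved layouts follows the discussion above).

```c
// Hedged sketch of the load-time repacking decision, NOT the ggml code.
#include <stdbool.h>

enum q4_0_repack {
    REPACK_NONE,  // keep plain Q4_0
    REPACK_4X4,   // q4_0_4_4-style interleaving (plain NEON)
    REPACK_4X8,   // q4_0_4_8-style interleaving (i8mm)
    REPACK_8X8,   // q4_0_8_8-style interleaving (SVE)
};

// Only Q4_0 (and IQ4_NL) tensors are repacked; every other quant type is
// loaded as-is. The widest layout the detected CPU features support wins.
static enum q4_0_repack choose_repack(bool has_neon, bool has_i8mm, bool has_sve) {
    if (has_sve)  return REPACK_8X8;
    if (has_i8mm) return REPACK_4X8;
    if (has_neon) return REPACK_4X4;
    return REPACK_NONE;
}
```

The practical upshot is that a file stored as plain Q4_0 stays portable: the interleaved layout is chosen at load time from what the running CPU supports, instead of being baked into the file.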

gustrd (Contributor) commented Dec 4, 2024

You're absolutely right! When I run with q4_0, it gets repacked into the correct format, and the token generation speed is consistent.

I'm not sure why q4_0_4_4 has become slower, but it doesn't seem like a major issue since the format is no longer necessary. Perhaps adding a deprecation warning for q4_0_4_4 could help clarify this for users.

That said, I did notice some performance loss during prompt processing compared to the old version using q4_0_4_4. I plan to benchmark this further and open a separate issue with more details. Thanks again for your hard work and patience; it's greatly appreciated!
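
For the follow-up benchmark, llama.cpp's bundled `llama-bench` tool reports prompt processing and token generation separately, so something like `./llama-bench -m model-q4_0.gguf -p 512 -n 128` (the model path and token counts here are placeholders) should make the comparison against the old q4_0_4_4 numbers straightforward.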
