ggml : add option for controlling work distribution across threads #291

ggerganov · 2023-06-25T09:06:48Z

And comment: ggerganov/llama.cpp#1507 (comment)

I guess we can extend ggml to let it choose the work chunk distribution method, either at compile time or via a context parameter. We can also factor the range selection out of the ggml forward implementations to make them more concise and easier to extend in the future.
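As an illustration of the proposed refactoring: many ggml forward functions compute each thread's slice inline, roughly as `dr = (nr + nth - 1)/nth`, `ir0 = dr*ith`, `ir1 = MIN(ir0 + dr, nr)`. The sketch below pulls that range selection into one helper that can also return an interleaved split. `enum work_split`, `select_rows` and `rows_for_thread` are hypothetical names, not existing ggml APIs, and how the strategy would be chosen (a compile-time define or a context field) is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical selector for how rows are split across threads (not a ggml API). */
enum work_split {
    WORK_SPLIT_CONTIGUOUS,  /* thread ith gets one contiguous block of rows */
    WORK_SPLIT_INTERLEAVED  /* thread ith gets rows ith, ith + nth, ith + 2*nth, ... */
};

/* Rows assigned to one thread: [start, end) visited with stride `step`. */
struct row_range {
    int64_t start;
    int64_t end;
    int64_t step;
};

/* Factored-out range selection for thread ith of nth over nr rows. */
static struct row_range select_rows(enum work_split split, int64_t nr, int ith, int nth) {
    assert(nth > 0 && ith >= 0 && ith < nth);
    struct row_range r;
    if (split == WORK_SPLIT_CONTIGUOUS) {
        const int64_t dr  = (nr + nth - 1) / nth;  /* rows per thread, rounded up */
        const int64_t ir0 = dr * ith;
        r.start = ir0 < nr ? ir0 : nr;             /* trailing threads may get nothing */
        r.end   = ir0 + dr < nr ? ir0 + dr : nr;   /* MIN(ir0 + dr, nr) */
        r.step  = 1;
    } else {
        r.start = ith;
        r.end   = nr;
        r.step  = nth;
    }
    return r;
}

/* Walk the assigned rows the way a forward op's inner loop would; returns the count. */
static int64_t rows_for_thread(enum work_split split, int64_t nr, int ith, int nth) {
    const struct row_range r = select_rows(split, nr, ith, nth);
    int64_t count = 0;
    for (int64_t i = r.start; i < r.end; i += r.step) {
        count++; /* a real op would process row i here */
    }
    return count;
}
```

With nr = 10 and nth = 4, the contiguous split gives the threads 3, 3, 3 and 1 rows, while the interleaved split gives 3, 3, 2 and 2. Neither is uniformly better (contiguous blocks are friendlier to caches, interleaving balances the tail), which is one reason to make the choice selectable.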

Another thing to be investigated is the usage of sched_yield() and potentially making it user configurable:

ggerganov/whisper.cpp@09a6325
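One way to make sched_yield() usage user configurable is to pass a wait policy into the loop where worker threads wait for new work. A minimal sketch, assuming workers poll an atomic flag that the main thread sets once work is published; `enum wait_policy` and `wait_until_ready` are hypothetical names, not part of ggml.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical wait policy a user could select (not a ggml API). */
enum wait_policy {
    WAIT_SPIN,  /* busy-loop: lowest wake-up latency, keeps the core fully busy */
    WAIT_YIELD  /* call sched_yield() on every poll so other runnable threads can use the core */
};

/* Wait until *ready becomes nonzero using the selected policy.
 * Returns the number of polling iterations performed. */
static long wait_until_ready(atomic_int *ready, enum wait_policy policy) {
    long iters = 0;
    while (atomic_load_explicit(ready, memory_order_acquire) == 0) {
        if (policy == WAIT_YIELD) {
            sched_yield(); /* POSIX: relinquish the CPU to another runnable thread */
        }
        iters++;
    }
    return iters;
}
```

Note that on Linux sched_yield() typically returns immediately when no other thread is runnable on that CPU, so the yielding variant mainly helps when other work competes for the core; on its own it does not let the core go idle.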


IsaacDynamo · 2023-09-16T16:10:54Z

Making this configurable would also be nice for the cuBLAS backend. When the whole model fits on the GPU, increasing the number of threads doesn't improve eval speed (tokens/sec).

But it does increase the CPU load on the system due to the busy loop. Even with n_thread = 1, I suspect that a lot of CPU cycles are wasted in the busy loop.

So a yield flag would be a great addition to give the user control.

A busy-loop with a fallback to a yield might also be a good 'automatic' solution, that could be used as default.
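The busy-loop-with-fallback idea can be sketched as spinning for a fixed number of polls and only then yielding, so short waits keep the low latency of spinning while long waits (for example, while the GPU evaluates the whole model) hand the core to other runnable threads. `SPIN_BUDGET`, `should_yield` and `wait_spin_then_yield` are hypothetical, and the budget of 1024 polls is an arbitrary placeholder that would need tuning.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spin budget: polls to busy-wait before falling back to yielding. */
#define SPIN_BUDGET 1024

/* Pure policy decision: spin for the first `budget` polls, then yield. */
static bool should_yield(long iter, long budget) {
    return iter >= budget;
}

/* Busy-loop with fallback to sched_yield(): short waits stay on the spinning
 * path, long waits let other runnable threads use the core. */
static long wait_spin_then_yield(atomic_int *ready, long budget) {
    long iters = 0;
    while (atomic_load_explicit(ready, memory_order_acquire) == 0) {
        if (should_yield(iters, budget)) {
            sched_yield();
        }
        iters++;
    }
    return iters;
}
```

Yielding still polls. To let the core go fully idle during long waits, a further stage could block on a condition variable (pthread_cond_wait) until the main thread signals that new work is available.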

ggerganov added the refactoring (Refactoring) and performance (Speed related topics) labels Jun 25, 2023

ggerganov added this to ggml : roadmap Jun 25, 2023

ggerganov moved this to Todo in ggml : roadmap Jun 25, 2023

ggerganov mentioned this issue Jun 26, 2023

ggml : add macros for accessing tensors to reduce code duplication #292 (Closed)

mqy mentioned this issue Jun 27, 2023

Example work stealing chunked task allocator for issue #291 ggerganov/llama.cpp#2026 (Draft)

ggerganov mentioned this issue Jul 9, 2023

Fine tune MUL_MAT, new threading (spin+wait/notify), speedup q_f32 BLAS by splitting COMPUTE stage ggerganov/llama.cpp#1632 (Closed)

ggerganov mentioned this issue Sep 28, 2023

sync : ggml (Metal F32 support + reduce ggml-alloc size) ggerganov/llama.cpp#3192 (Merged)

ggerganov mentioned this issue Jan 3, 2024

ggml : do not sched_yield when calling BLAS ggerganov/llama.cpp#4761 (Merged)

ggerganov closed this as completed in ggerganov/llama.cpp#4761 Jan 5, 2024

ggerganov moved this from Todo to Done in ggml : roadmap Jan 5, 2024
