llama : add check for KV cache shifts #10401

ggerganov · 2024-11-19T10:02:05Z

Disallow context shifts for models that do not support them (such as DeepSeek V2). This adds a new API function:

bool llama_kv_cache_can_shift(struct llama_context * ctx);
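
As a rough illustration of how a caller might use this check, the sketch below decides what to do once the context is full. It does not link against llama.cpp: `mock_context`, `mock_kv_cache_can_shift` and `on_decode_step` are hypothetical stand-ins, and the real function takes a `struct llama_context *`.

```cpp
#include <cassert>

// Hypothetical stand-in for llama_context; it only models whether the
// underlying model supports KV cache shifts.
struct mock_context {
    bool supports_shift;
};

// Hypothetical stand-in for llama_kv_cache_can_shift(ctx).
static bool mock_kv_cache_can_shift(const mock_context * ctx) {
    assert(ctx != nullptr);
    return ctx->supports_shift;
}

enum class overflow_action { none, shift, stop };

// Decide what to do after a decode step, given n_past tokens already in a
// context of size n_ctx.
static overflow_action on_decode_step(const mock_context * ctx, int n_past, int n_ctx) {
    if (n_past < n_ctx) {
        return overflow_action::none;  // still room, nothing to do
    }
    // Context is full: shift only if the model supports it, otherwise stop
    // instead of silently corrupting the cache.
    return mock_kv_cache_can_shift(ctx) ? overflow_action::shift
                                        : overflow_action::stop;
}
```

With a context that reports `supports_shift == false`, a full context yields `overflow_action::stop`, which is the behavior this PR aims for with models such as DeepSeek V2.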

ggml-ci

slaren · 2024-11-19T11:52:58Z

src/llama.cpp

+bool llama_kv_cache_can_shift(struct llama_context * ctx) {
+    return ctx->model.arch != LLM_ARCH_DEEPSEEK2; // not supported due to MLA
+}


Should this return false for recurrent models as well? Not sure what's the logic there, but llama_kv_cache_update_internal silently ignores models with LLAMA_ROPE_TYPE_NONE.

Yes, it's likely needed to return false for recurrent models.

The reason llama_kv_cache_update_internal does nothing when the rope type is none is that applying a shift to the KV cache with functions like llama_kv_cache_seq_add() involves 2 steps:

1. Update the positions of the KV cells, i.e. just modify the metadata in llama_kv_cell

2. Re-rope the data in the KV cells

The latter step is necessary only if the data is roped. For ALiBi models, for example, we should not apply this second step, but in theory we still support "shifting" the KV cache for those models, since the positional information is in the KQ mask.
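
The two steps can be sketched with a toy model. This is an assumption-laden simplification and not the llama.cpp implementation: `toy_cell`, `rotate` and `toy_shift` are hypothetical names, each cell holds a single 2-D key pair, and a single frequency `theta` is used, whereas real RoPE rotates many pairs with different frequencies.

```cpp
#include <cmath>
#include <vector>

// Hypothetical, simplified KV cell: a position plus one 2-D key pair.
struct toy_cell {
    int   pos;
    float k[2];
};

// Rotate a 2-D pair by `angle` radians (what RoPE does to one pair).
static void rotate(float k[2], float angle) {
    const float c = std::cos(angle);
    const float s = std::sin(angle);
    const float x = k[0];
    const float y = k[1];
    k[0] = x*c - y*s;
    k[1] = x*s + y*c;
}

// Shift every cell by `delta` positions.
// Step 1 always runs; step 2 runs only when the cached keys are roped.
static void toy_shift(std::vector<toy_cell> & cells, int delta, bool roped, float theta) {
    for (auto & cell : cells) {
        cell.pos += delta;                  // 1. update the position metadata
        if (roped) {
            rotate(cell.k, delta * theta);  // 2. re-rope the cached data
        }
    }
}
```

Because rotations compose additively, a key roped at position 5 and then shifted by -2 ends up with the same values as a key roped directly at position 3. With `roped == false` (the ALiBi-like case) only `pos` changes and the cached data is left untouched.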

ggml-ci

llama : add check for KV cache shifts

c0f1bb3

ggml-ci

danbev approved these changes Nov 19, 2024

llama : restore comment [no ci]

029e609

ggerganov merged commit 8e752a7 into master Nov 19, 2024
1 check passed

ggerganov deleted the gg/llama-can-shift branch November 19, 2024 11:29

slaren reviewed Nov 19, 2024

ggerganov mentioned this pull request Nov 19, 2024

llama : handle KV shift for recurrent models #10402

Merged

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024

llama : add check for KV cache shifts (ggerganov#10401)

a9069ec

ggml-ci
