Demo video: Llama.Assistant.RAG.Demo.-.1080p.mp4
🔧 Changes
- Utilize llama.cpp's KV cache mechanism for faster inference. See (1).
- Summarize the chat history when it is about to exceed the context length (sketched below).
- Recursively check for and fill in missing settings from the DEFAULT CONFIG (sketched below).
- Add validators (type, min, and max value) for input fields in the settings dialog (sketched below).
(1) llama.cpp's KV cache checks the prefix of your chat history so it can reuse previously computed key/value vectors. For example:
- Generated sequence so far = "ABCDEF"
- If the chat history is modified to "ABCDXT", the matching prefix "ABCD" reuses the cache, only the key and value vectors for "XT" are newly computed, and then the new response is generated.
-> So we should make the most of this mechanism by keeping the history prefix as fixed as possible.
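
To make the prefix-reuse rule concrete, here is a toy illustration (tokens shown as single characters; the function name is made up for this example and is not from llama.cpp or this PR):

```python
# Illustration only: the number of tokens whose K/V vectors can be kept equals
# the length of the longest common prefix between the cached sequence and the
# new prompt. Names here are illustrative, not the actual implementation.

def common_prefix_len(cached: str, prompt: str) -> int:
    """Length of the longest shared prefix between two token sequences."""
    n = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        n += 1
    return n

cached_sequence = "ABCDEF"   # tokens already evaluated (K/V vectors cached)
new_prompt      = "ABCDXT"   # modified chat history

reused = common_prefix_len(cached_sequence, new_prompt)
recomputed = len(new_prompt) - reused
print(f"reuse K/V for {reused} tokens ({new_prompt[:reused]!r}), "
      f"recompute {recomputed} tokens ({new_prompt[reused:]!r})")
# -> reuse K/V for 4 tokens ('ABCD'), recompute 2 tokens ('XT')
```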
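
A minimal sketch of the summarization idea, assuming a token budget and helper callables (`count_tokens`, `summarize`) that are illustrative placeholders, not the actual names in this PR:

```python
# Hypothetical sketch of "summarize when the history nears the context limit".
# CONTEXT_LEN, SUMMARY_TRIGGER, count_tokens and summarize are placeholders.

CONTEXT_LEN = 4096          # model context window (tokens), assumed value
SUMMARY_TRIGGER = 0.8       # summarize when history uses >80% of the window

def maybe_summarize(history: list[dict], count_tokens, summarize) -> list[dict]:
    """Replace older messages with a summary once the history nears the limit."""
    if len(history) <= 5:
        return history  # too short to fold anything into a summary
    used = sum(count_tokens(m["content"]) for m in history)
    if used < SUMMARY_TRIGGER * CONTEXT_LEN:
        return history  # still enough room, keep the prefix untouched

    # Keep the system prompt and the most recent turns; fold the middle of the
    # conversation into a single summary message.
    head, recent = history[:1], history[-4:]
    summary = summarize(history[1:-4])
    return head + [{"role": "system", "content": f"Summary so far: {summary}"}] + recent
```

Note that summarizing rewrites the history prefix, so the next request presumably rebuilds most of the KV cache; since it should only trigger occasionally, the prefix stays fixed between summarizations.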
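
A sketch of the recursive default-config merge; the DEFAULT_CONFIG layout shown here is made up for illustration and is not the project's actual schema:

```python
import copy

# Illustrative defaults, not the real settings of this project.
DEFAULT_CONFIG = {
    "model": {"path": "", "n_ctx": 4096},
    "ui": {"theme": "dark", "font_size": 12},
}

def fill_missing(config: dict, defaults: dict) -> dict:
    """Recursively add every key that exists in `defaults` but is missing in `config`."""
    for key, default_value in defaults.items():
        if key not in config:
            config[key] = copy.deepcopy(default_value)
        elif isinstance(default_value, dict) and isinstance(config[key], dict):
            fill_missing(config[key], default_value)
    return config

user_config = {"ui": {"theme": "light"}}
print(fill_missing(user_config, DEFAULT_CONFIG))
# {'ui': {'theme': 'light', 'font_size': 12}, 'model': {'path': '', 'n_ctx': 4096}}
```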
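
And a toolkit-agnostic sketch of the type/min/max validation; the field specs are invented for the example (a Qt-based dialog would more likely use the framework's own validators):

```python
# Hypothetical field specs for illustration only.
FIELD_SPECS = {
    "font_size": {"type": int, "min": 8, "max": 32},
    "temperature": {"type": float, "min": 0.0, "max": 2.0},
}

def validate(name: str, raw: str):
    """Parse `raw` with the field's type and check it against the min/max bounds."""
    spec = FIELD_SPECS[name]
    try:
        value = spec["type"](raw)
    except ValueError:
        raise ValueError(f"{name}: expected {spec['type'].__name__}, got {raw!r}")
    if not (spec["min"] <= value <= spec["max"]):
        raise ValueError(f"{name}: must be between {spec['min']} and {spec['max']}")
    return value

print(validate("font_size", "14"))      # 14
print(validate("temperature", "0.7"))   # 0.7
```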