AnyLearning v0.1.41 - Context handling improvements

@vietanhdev released this 08 Dec 05:07
Demo video: Llama.Assistant.RAG.Demo.-.1080p.mp4

🔧 Changes

  • Utilize llama.cpp's KV cache mechanism for faster inference. See (1) below
  • Summarize the chat history when it is about to exceed the context length (see the first sketch after this list)
  • Recursively check for and fill in missing settings from the DEFAULT CONFIG (second sketch below)
  • Add validators (type, min/max value) for input fields in the settings dialog (third sketch below)
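
A minimal sketch of how the summarization trigger could work. The threshold, `count_tokens()`, and `summarize()` are illustrative stand-ins; the release does not name the actual helpers.

```python
SUMMARIZE_THRESHOLD = 0.8  # assumed fraction of the context window

def count_tokens(text: str) -> int:
    # Crude stand-in for the model's real tokenizer.
    return len(text.split())

def summarize(text: str) -> str:
    # Placeholder: the app would ask the LLM for a summary here.
    return "Summary of earlier conversation: " + text[:200]

def maybe_summarize(history: list[str], context_length: int) -> list[str]:
    """Collapse older messages into a summary when the history nears the limit."""
    used = sum(count_tokens(m) for m in history)
    if used < SUMMARIZE_THRESHOLD * context_length:
        return history  # enough room left; keep the prefix untouched for KV reuse
    # Keep the most recent messages verbatim; summarize everything older.
    recent, older = history[-4:], history[:-4]  # 4 is an arbitrary choice
    return [summarize("\n".join(older))] + recent
```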
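A sketch of the recursive settings fill-in; DEFAULT_CONFIG and its keys are made up for illustration:

```python
DEFAULT_CONFIG = {
    "model": {"context_length": 4096, "temperature": 0.7},  # hypothetical keys
    "ui": {"theme": "dark"},
}

def fill_missing(user_cfg: dict, defaults: dict) -> dict:
    """Recursively add any key that exists in defaults but is missing in user_cfg."""
    for key, default_value in defaults.items():
        if key not in user_cfg:
            user_cfg[key] = default_value
        elif isinstance(default_value, dict) and isinstance(user_cfg[key], dict):
            fill_missing(user_cfg[key], default_value)
    return user_cfg

# A config saved by an older version gains the newly added keys:
old_cfg = {"model": {"temperature": 0.5}}
print(fill_missing(old_cfg, DEFAULT_CONFIG))
# {'model': {'temperature': 0.5, 'context_length': 4096}, 'ui': {'theme': 'dark'}}
```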
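And a sketch of a type/min/max validator; the function name and signature are assumptions, not the app's actual API:

```python
def validate_field(raw: str, expected_type: type, min_value=None, max_value=None):
    """Parse a settings-dialog input, raising ValueError with a readable message."""
    try:
        value = expected_type(raw)
    except (TypeError, ValueError):
        raise ValueError(f"Expected a {expected_type.__name__}, got {raw!r}")
    if min_value is not None and value < min_value:
        raise ValueError(f"Value must be >= {min_value}")
    if max_value is not None and value > max_value:
        raise ValueError(f"Value must be <= {max_value}")
    return value

# validate_field("4096", int, min_value=512, max_value=32768)  -> 4096
# validate_field("-1", int, min_value=512)                     -> ValueError
```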

(1) llama.cpp's KV cache checks the prefix of your chat history to reuse previously computed Key and Value vectors. For example:

  • Generated sequence so far = "ABCDEF"
  • If we modify the chat history to something like "ABCDXT", the prefix "ABCD" matches, so its cache is reused and Key and Value vectors are newly computed only for "XT" before generating new responses.
    -> So we should make the most of this mechanism by keeping the history prefix as fixed as possible (illustrated in the sketch below)
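
To make the prefix rule concrete, here is a small sketch of the reuse computation (token ids are made up; llama.cpp performs this matching internally):

```python
def common_prefix_len(cached: list[int], new: list[int]) -> int:
    """Number of leading tokens shared by the cached and the new sequence."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n

cached_tokens = [1, 2, 3, 4, 5, 6]  # "ABCDEF"
new_tokens    = [1, 2, 3, 4, 9, 8]  # "ABCDXT"
reused = common_prefix_len(cached_tokens, new_tokens)
print(f"Reuse KV cache for {reused} tokens; recompute {len(new_tokens) - reused}.")
# Reuse KV cache for 4 tokens; recompute 2.
```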