3.0.0-beta.45 (2024-09-19)

Bug Fixes

improve performance of parallel evaluation from multiple contexts (#309) (4b3ad61)
Llama 3.1 chat wrapper standard chat history (#309) (4b3ad61)
adapt to llama.cpp sampling refactor (#309) (4b3ad61)
Llama 3 Instruct function calling (#309) (4b3ad61)
don't preload prompt in the chat command when using --printTimings or --meter (#309) (4b3ad61)
more stable Jinja template matching (#309) (4b3ad61)

Features

inspect estimate command (#309) (4b3ad61)
move seed option to the prompt level (#309) (4b3ad61)
Functionary v3 support (#309) (4b3ad61)
Mistral chat wrapper (#309) (4b3ad61)
improve Llama 3.1 chat template detection (#309) (4b3ad61)
change autoDisposeSequence default to false (#309) (4b3ad61)
move download, build and clear commands to be subcommands of a source command (#309) (4b3ad61)
simplify TokenBias (#309) (4b3ad61)
better threads default value (#309) (4b3ad61)
make LlamaEmbedding an object (#309) (4b3ad61)
HF_TOKEN env var support for reading GGUF file metadata (#309) (4b3ad61)
TemplateChatWrapper: custom history template for each message role (#309) (4b3ad61)
more helpful inspect gpu command (#309) (4b3ad61)
all tokenizer tokens iterator (#309) (4b3ad61)
failed context creation automatic remedy (#309) (4b3ad61)
abort generation support in CLI commands (#309) (4b3ad61)
--gpuLayers max and --contextSize max flag support for inspect estimate command (#309) (4b3ad61)
extract all prebuilt binaries to external modules (#309) (4b3ad61)
updated docs (#309) (4b3ad61)
combine model downloaders (#309) (4b3ad61)
electron example template: update badge, scroll anchoring, table support (#309) (4b3ad61)

Shipped with llama.cpp release b3785

To use the latest llama.cpp release available, run npx -n node-llama-cpp source download --release latest.
