v3.0.0
✨ node-llama-cpp
3.0 is here! ✨
Read about the release in the blog post
3.0.0 (2024-09-24)
Features
- function calling (#139) (5fcdf9b)
- get embedding for text (#144) (4cf1fba)
- async model and context loading (#178) (315a3eb)
- token biases (#196) (3ad4494)
- automatic batching (#104) (4757af8)
- prompt completion engine (#225) (95f4645)
- model compatibility warnings (#225) (95f4645)
- Vulkan support (#171) (d161bcd)
- Windows on Arm prebuilt binary (#181) (f3b7f81)
- change the default log level to warn (#191) (b542b53)
pull
command (#214) (453c162)inspect gpu
command (#175) (5a70576)inspect gguf
command (#182) (35e6f50)inspect estimate
command (#309) (4b3ad61)inspect measure
command (#182) (35e6f50)init
command to scaffold a new project from a template (withnode-typescript
andelectron-typescript-react
templates) (#217) (d6a0f43)- move
download
,build
andclear
commands to be subcommands of asource
command (#309) (4b3ad61) - move
seed
option to the prompt level (#309) (4b3ad61) TemplateChatWrapper
: custom history template for each message role (#309) (4b3ad61)- Llama 3.1 support (#273) (e3e0994)
- Mistral chat wrapper (#309) (4b3ad61)
- Functionary v3 support (#309) (4b3ad61)
- Phi-3 support (#273) (e3e0994)
- extract all prebuilt binaries to external modules (#309) (4b3ad61)
- parallel function calling (#225) (95f4645)
- preload prompt (#225) (95f4645)
onTextChunk
option (#273) (e3e0994)- flash attention (#264) (c2e322c)
- debug mode (#217) (d6a0f43)
- load LoRA adapters (#217) (d6a0f43)
- split gguf files support (#214) (453c162)
stopOnAbortSignal
andcustomStopTriggers
onLlamaChat
andLlamaChatSession
(#214) (453c162)- Llama 3 support (#205) (ef501f9)
--gpu
flag in generation CLI commands (#205) (ef501f9)specialTokens
parameter onmodel.detokenize
(#205) (ef501f9)- interactively select a model from CLI commands (#191) (b542b53)
- automatically adapt to current free VRAM state (#182) (35e6f50)
- GGUF file metadata info on
LlamaModel
(#182) (35e6f50) - use the
tokenizer.chat_template
header from thegguf
file when available - use it to find a better specialized chat wrapper or useJinjaTemplateChatWrapper
with it as a fallback (#182) (35e6f50) - simplify generation CLI commands:
chat
,complete
,infill
(#182) (35e6f50) - gguf parser (#168) (bcaab4f)
- use the best compute layer available by default (#175) (5a70576)
- more guardrails to prevent loading an incompatible prebuilt binary (#175) (5a70576)
- completion and infill (#164) (ede69c1)
- support configuring more options for
getLlama
when using"lastBuild"
(#164) (ede69c1) - get VRAM state (#161) (46235a2)
chatWrapper
getter on aLlamaChatSession
(#161) (46235a2)- minP support (#162) (47b476f)
- chat syntax aware context shifting (#139) (5fcdf9b)
- stateless
LlamaChat
(#139) (5fcdf9b) LlamaText
util (#139) (5fcdf9b)- show
llama.cpp
release in GitHub releases (#142) (36c779d) - model metadata overrides (#273) (e3e0994)
Shipped with llama.cpp
release b3808
To use the latest
llama.cpp
release available, runnpx -n node-llama-cpp source download --release latest
. (learn more)