Releases · airockchip/rknn-llm
release-v1.1.2
- Fixed an inference error in the ChatGLM3 model.
- Fixed an inference issue with embedding input.
- Added support for exporting the LLM model in MiniCPM-V (see the sketch below).
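For reference, converting the MiniCPM-V LLM model is expected to follow the usual rkllm-toolkit flow (load_huggingface / build / export_rkllm). A minimal sketch, assuming that flow applies; the model path, quantization dtype, and target platform below are placeholder values, not settings taken from this release:

```python
from rkllm.api import RKLLM

# Placeholder path to the LLM component of MiniCPM-V (assumption).
MODEL_PATH = "./MiniCPM-V/llm"

llm = RKLLM()

# Load the Hugging Face checkpoint of the LLM component.
ret = llm.load_huggingface(model=MODEL_PATH)
assert ret == 0, "model loading failed"

# Quantize and build for the target NPU; dtype and platform are example values.
ret = llm.build(do_quantization=True,
                quantized_dtype="w8a8",
                target_platform="rk3588")
assert ret == 0, "build failed"

# Export to the .rkllm format consumed by the runtime.
ret = llm.export_rkllm("./minicpmv_llm.rkllm")
assert ret == 0, "export failed"
```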
release-v1.1.1
- Fixed the inference error in the MiniCPM3 model.
- Fixed the runtime error in rkllm_server_demo.
- Added the rkllm-toolkit installation package for Python 3.10.
- Supported GGUF model conversion when tie_word_embeddings is set to true (see the sketch below).
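A minimal sketch of the GGUF conversion path, assuming the load_gguf entry point added in v1.1.0; the file names are placeholders, and tie_word_embeddings is read from the GGUF metadata rather than passed explicitly (assumption):

```python
from rkllm.api import RKLLM

llm = RKLLM()

# Load a GGUF checkpoint (q4_0 or fp16, per the v1.1.0 notes); placeholder path.
ret = llm.load_gguf(model="./model-q4_0.gguf")
assert ret == 0, "GGUF loading failed"

# q4_0 weights are already quantized, so skip re-quantization here (assumption).
ret = llm.build(do_quantization=False, target_platform="rk3588")
assert ret == 0, "build failed"

ret = llm.export_rkllm("./model-q4_0.rkllm")
assert ret == 0, "export failed"
```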
release-v1.1.0
- Added support for grouped quantization (w4a16 group sizes of 32/64/128, w8a8 group sizes of 128/256/512).
- Added the GDQ algorithm to improve 4-bit quantization accuracy.
- Added a hybrid quantization algorithm, supporting a combination of grouped and non-grouped quantization based on specified ratios (see the sketch after this list).
- Added support for Llama3, Gemma2, and MiniCPM3 models.
- Added support for GGUF model conversion (currently q4_0 and fp16 only).
- Added support for LoRA models.
- Added storage and loading of the prompt cache.
- Added PC-side emulation accuracy testing and inference interface support for rkllm-toolkit.
- Fixed catastrophic forgetting issue when the token count exceeds max_context.
- Optimized prefill speed.
- Optimized generation speed.
- Optimized model initialization time.
- Added support for four input interfaces: prompt, embedding, token, and multimodal.
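To make the grouped and hybrid quantization options concrete, here is a minimal sketch of a toolkit build call. The "w4a16_g64" dtype spelling is inferred from the listed group sizes, and hybrid_rate is an assumed parameter name for the grouped/non-grouped ratio; both are assumptions about the exact API, not confirmed signatures:

```python
from rkllm.api import RKLLM

llm = RKLLM()

# Placeholder Hugging Face model path.
ret = llm.load_huggingface(model="./Llama-3-8B-Instruct")
assert ret == 0, "model loading failed"

# Grouped w4a16 quantization with group size 64 (dtype string inferred from the
# 32/64/128 group sizes above). hybrid_rate (assumed name) sets the ratio of
# layers quantized with grouped vs. non-grouped quantization.
ret = llm.build(do_quantization=True,
                quantized_dtype="w4a16_g64",
                hybrid_rate=0.2,
                target_platform="rk3588")
assert ret == 0, "build failed"

ret = llm.export_rkllm("./llama3_w4a16_g64.rkllm")
assert ret == 0, "export failed"
```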
release-v1.0.1
- Optimized memory usage during model conversion.
- Optimized memory usage during inference.
- Increased prefill speed.
- Reduced initialization time.
- Improved quantization accuracy.
- Added support for Gemma, ChatGLM3, MiniCPM, InternLM2, and Phi-3 models.
- Added server invocation (see the sketch after this list).
- Added an inference interruption interface.
- Added logprob and token_id to the return value.
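For the server invocation, a minimal client-side sketch against the rkllm_server_demo; the address, endpoint path, and JSON schema are assumptions modeled on an OpenAI-style chat API and must be adjusted to match the demo as shipped:

```python
import requests

# Placeholder address and endpoint for the running server demo (assumptions).
URL = "http://192.168.0.100:8080/rkllm_chat"

payload = {
    "model": "rkllm_model",  # placeholder model name
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": False,
}

resp = requests.post(URL, json=payload, timeout=60)
resp.raise_for_status()
# The response body is expected to carry the generated text; per the v1.0.1
# notes, the underlying runtime also returns logprob and token_id per token.
print(resp.json())
```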