Releases: airockchip/rknn-llm

release-v1.1.2

05 Nov 07:41
  • Fixed an inference error in the ChatGLM3 model.
  • Fixed an inference issue with embedding input.
  • Supported exporting the LLM model in MiniCPM-V (see the conversion sketch below).
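For context, model export in rkllm-toolkit goes through a load/build/export flow. Below is a minimal sketch assuming the `load_huggingface`/`build`/`export_rkllm` pattern from the repository's examples; the model path, `quantized_dtype`, and `target_platform` values are placeholder assumptions to adapt for your checkpoint and toolkit version, not a verified MiniCPM-V recipe.

```python
# Minimal conversion sketch following the rkllm-toolkit flow from the repo
# examples. Paths and parameters below are placeholder assumptions; check the
# examples shipped with your toolkit version for the exact arguments.
from rkllm.api import RKLLM

llm = RKLLM()

# Load the Hugging Face checkpoint (for MiniCPM-V, the LLM part is what gets exported).
ret = llm.load_huggingface(model='./MiniCPM-V')  # hypothetical local path
assert ret == 0, 'load failed'

# Quantize and build for the target NPU (dtype/platform are illustrative).
ret = llm.build(do_quantization=True, quantized_dtype='w8a8', target_platform='rk3588')
assert ret == 0, 'build failed'

# Write the deployable .rkllm artifact.
ret = llm.export_rkllm('./minicpmv.rkllm')
assert ret == 0, 'export failed'
```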

release-v1.1.1

18 Oct 10:17
  • Fixed the inference error in the minicpm3 model.
  • Fixed the runtime error in rkllm_server_demo.
  • Added the rkllm-toolkit installation package for Python 3.10.
  • Supported gguf model conversion when tie_word_embeddings is set to true (see the weight-tying sketch below).
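For readers unfamiliar with the tie_word_embeddings flag: when it is true, the checkpoint stores a single matrix that serves as both the input embedding and the output (lm_head) projection, so a converter must map one tensor to both roles. The NumPy sketch below illustrates the tying itself; the sizes and names are illustrative, not rkllm-toolkit internals.

```python
# Illustration of weight tying (tie_word_embeddings=true): one matrix acts as
# both the input embedding and the output projection. Conceptual only.
import numpy as np

vocab, hidden = 32000, 4096
embed_tokens = np.random.randn(vocab, hidden).astype(np.float32)

token_ids = np.array([1, 42, 7])
h = embed_tokens[token_ids]      # input side: token ids -> hidden states

logits = h @ embed_tokens.T      # output side: the same matrix reused as lm_head
print(logits.shape)              # (3, 32000)
```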

release-v1.1.0

11 Oct 08:53
  • Added support for grouped quantization (w4a16 group sizes of 32/64/128, w8a8 group sizes of 128/256/512); a conceptual sketch follows this list.
  • Added the GDQ algorithm to improve 4-bit quantization accuracy.
  • Added a hybrid quantization algorithm, supporting a combination of grouped and non-grouped quantization based on specified ratios.
  • Added support for the Llama3, Gemma2, and MiniCPM3 models.
  • Added support for gguf model conversion (currently q4_0 and fp16 only).
  • Added support for LoRA models.
  • Added storage and loading of the prompt cache.
  • Added PC-side emulation accuracy testing and inference interface support for rkllm-toolkit.
  • Fixed a catastrophic forgetting issue when the token count exceeds max_context.
  • Optimized prefill speed.
  • Optimized generation speed.
  • Optimized model initialization time.
  • Added support for four input interfaces: prompt, embedding, token, and multimodal.
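As a rough illustration of the grouped quantization in the first item: instead of one scale per tensor or per channel, each contiguous group of weights (32/64/128 for w4a16) gets its own scale, which bounds the rounding error within each group. The NumPy sketch below shows symmetric 4-bit grouping conceptually; it is not the toolkit's GDQ algorithm.

```python
# Conceptual sketch of w4a16 grouped quantization (group size 32): weights are
# quantized to 4-bit integers with one scale per group; activations stay fp16.
# This illustrates the idea, not the toolkit's implementation.
import numpy as np

def quantize_grouped_w4(w, group_size=32):
    flat = w.reshape(-1, group_size)
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0   # int4 range: [-8, 7]
    q = np.clip(np.round(flat / scale), -8, 7).astype(np.int8)
    return q, scale.astype(np.float16)

def dequantize(q, scale):
    return (q.astype(np.float32) * scale).reshape(-1)

w = np.random.randn(4096).astype(np.float32)
q, s = quantize_grouped_w4(w)
err = np.abs(w - dequantize(q, s)).mean()
print(f'mean abs error: {err:.4f}')  # smaller groups -> lower error, more scales to store
```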

release-v1.0.1

09 May 09:37
  • Optimized memory usage during model conversion.
  • Optimized memory usage during inference.
  • Increased prefill speed.
  • Reduced initialization time.
  • Improved quantization accuracy.
  • Added support for the Gemma, ChatGLM3, MiniCPM, InternLM2, and Phi-3 models.
  • Added server invocation.
  • Added an inference interruption interface.
  • Added logprob and token_id to the return value (see the sketch below).
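On the last item: a token's logprob is the log of the softmax probability the model assigned to the returned token_id at that decoding step. The sketch below shows how the two values relate to raw logits; it is an illustration of the math, not the runtime's actual code path.

```python
# How logprob relates to token_id: log of the softmax probability of the
# sampled token. Conceptual illustration only.
import numpy as np

def log_softmax(logits):
    shifted = logits - logits.max()          # subtract max for numerical stability
    return shifted - np.log(np.exp(shifted).sum())

logits = np.random.randn(32000).astype(np.float32)  # logits for one decoding step
token_id = int(logits.argmax())                     # greedy sampling for simplicity
logprob = float(log_softmax(logits)[token_id])
print(token_id, logprob)
```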