Releases: ggerganov/llama.cpp

b2948

20 May 13:44
6bf9b66
[SYCL] Update SYCL upscale operation (#7321)

* Update SYCL upscale operation

* Formatting

* Remove messages

b2946

20 May 11:08
213e90e
ggml-opencl, llama : use reserve() when the count is already known (#7272)
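
A minimal C++ sketch of the pattern this change applies: when the final element count is known before the loop, a single reserve() call avoids the repeated reallocations that push_back would otherwise trigger. The container and count here are illustrative, not the actual llama.cpp buffers.

```cpp
#include <cstdio>
#include <vector>

int main() {
    const size_t n_items = 1024;   // count known before the loop starts
    std::vector<float> scales;
    scales.reserve(n_items);       // one allocation up front, no regrowth below
    for (size_t i = 0; i < n_items; ++i) {
        scales.push_back(1.0f / float(i + 1));   // never reallocates here
    }
    std::printf("size=%zu capacity=%zu\n", scales.size(), scales.capacity());
    return 0;
}
```

reserve() never shrinks capacity and is a no-op when the capacity already suffices, which is why the change is safe to apply wherever the count is known.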

b2945

20 May 10:34
65c5820
ggml : add LoongArch LSX and LASX support (#6454)

* Add LoongArch LSX and LASX optimized code (sketched after this entry)

* Add LoongArch compilation support to the Makefile

* Revert stb_image.h

* Optimize bytes_from_nibbles_32 and sum_i16_pairs_float

* Fix undeclared identifiers

* Format code

* Update

* Update 2

---------

Co-authored-by: Jinyang He <[email protected]>
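
A hedged sketch of the compile-time gating such SIMD ports typically use: GCC defines __loongarch_sx when compiling with -mlsx and ships the 128-bit LSX intrinsics in lsxintrin.h. The vec_add_f32 helper below is illustrative only; ggml's actual LoongArch kernels cover the quantization and dot-product paths, not a plain vector add.

```cpp
#include <cstddef>

#if defined(__loongarch_sx)
#include <lsxintrin.h>   // GCC's 128-bit LSX intrinsics
#endif

// Add two float arrays; uses LSX when compiled with -mlsx, scalar otherwise.
static void vec_add_f32(float * dst, const float * a, const float * b, size_t n) {
    size_t i = 0;
#if defined(__loongarch_sx)
    // 4 floats per iteration in a 128-bit LSX register.
    for (; i + 4 <= n; i += 4) {
        __m128 va = (__m128)__lsx_vld(a + i, 0);
        __m128 vb = (__m128)__lsx_vld(b + i, 0);
        __lsx_vst((__m128i)__lsx_vfadd_s(va, vb), dst + i, 0);
    }
#endif
    for (; i < n; ++i) {   // scalar tail, and the fallback on other targets
        dst[i] = a[i] + b[i];
    }
}
```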

b2943

20 May 07:08
e932094
server : return an error when the embedding input is too large (#7389)
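
A hedged sketch of the kind of guard this change adds: rather than crashing or silently truncating, the server can reject an embedding request whose token count exceeds the batch capacity and report why. All names below (check_embedding_input, n_batch, Result) are illustrative, not the server's actual API.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct Result {
    bool ok;
    std::string error;   // filled when ok == false
};

// Reject inputs that cannot fit into a single physical batch.
static Result check_embedding_input(const std::vector<int32_t> & tokens, uint32_t n_batch) {
    if (tokens.size() > n_batch) {
        return { false, "input is too large to process: increase the physical batch size" };
    }
    return { true, "" };
}
```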

b2941

20 May 04:33
33c8d50
Add provisions for Windows support for BF16 code including CMake prov…

b2940

20 May 00:29
d359f30
llama : remove MPI backend (#7395)

b2939

19 May 23:34
1ea2a00
quantize : fix --keep-split check (#7374)

b2938

19 May 23:15
f030ec1
Vulkan Embedding Fix (#7360)

* Fix empty Vulkan host buffers

* Add fp32/fp16 matmul shader

* Fix matmul shader alignment

* Remove deprecated tensor->backend uses

* Fix Vulkan validation errors on embedding models with no offloaded layers

* Fix Vulkan llava segfault when not offloading layers

b2937

19 May 22:19
e4e6f67
ggml : fix another case of NaNs in quants (#7387)
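
For context, a hedged sketch of how block quantization can emit NaNs and how that is typically guarded: if a block's maximum magnitude is zero (or non-finite), the reciprocal scale becomes inf, and 0 * inf yields NaN, so the reciprocal must be forced to a safe value. The block layout below is illustrative, not a real ggml quant format.

```cpp
#include <cmath>
#include <cstdint>

// Quantize one block of n floats to int8 with a shared scale, without NaNs.
static void quantize_block_q8(const float * x, int8_t * q, float * scale, int n) {
    float amax = 0.0f;   // max |x[i]| over the block
    for (int i = 0; i < n; ++i) {
        amax = std::fmax(amax, std::fabs(x[i]));
    }
    const float d  = amax / 127.0f;
    // Guard: if d is 0 (all-zero block) or non-finite, 1/d would be inf/NaN
    // and x[i] * id would poison the output with NaNs.
    const float id = (d != 0.0f && std::isfinite(d)) ? 1.0f / d : 0.0f;
    *scale = d;
    for (int i = 0; i < n; ++i) {
        q[i] = (int8_t) std::round(x[i] * id);
    }
}
```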

b2936

19 May 22:19
5ca49cb
ggml : implement quantized KV cache for FlashAttention (#7372)
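
A hedged sketch of the idea: instead of holding K/V entries in fp16/fp32, the cache can store each block as int8 values plus one scale, and dequantize when the attention kernel reads them, shrinking KV memory roughly 3.5x versus fp32 for this layout. The block size and structs below are illustrative; the actual change wires ggml's existing quant types into the FlashAttention path.

```cpp
#include <cmath>
#include <cstdint>

// One 32-value block: 36 bytes here vs 128 bytes as fp32.
struct BlockQ8 {
    float  d;        // per-block scale
    int8_t qs[32];   // quantized values
};

// Quantize 32 floats into one block when they enter the cache.
static void kv_store(const float * v, BlockQ8 * blk) {
    float amax = 0.0f;
    for (int i = 0; i < 32; ++i) amax = std::fmax(amax, std::fabs(v[i]));
    blk->d = amax / 127.0f;
    const float id = blk->d != 0.0f ? 1.0f / blk->d : 0.0f;
    for (int i = 0; i < 32; ++i) blk->qs[i] = (int8_t) std::round(v[i] * id);
}

// Dequantize one block when the attention kernel reads it back.
static void kv_load(const BlockQ8 * blk, float * v) {
    for (int i = 0; i < 32; ++i) v[i] = blk->qs[i] * blk->d;
}
```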