Releases · MarcusDunn/llama.cpp
b2582
b2293
llama : remove deprecated API (#5770)

ggml-ci
b1978
scripts : move run-with-preset.py from root to scripts folder
b1680
ggml : change ggml_scale to take a float instead of tensor (#4573)

* ggml : change ggml_scale to take a float instead of tensor
* ggml : fix CPU implementation
* tests : fix test-grad0

ggml-ci
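For context, this build changes the public `ggml_scale` API so the scale factor is passed as a plain `float` rather than a 1-element tensor. A minimal sketch of a call site after the change; the function and variable names here are illustrative, not taken from the commit:

```c
#include "ggml.h"

// Sketch of a call site after #4573: the scale factor is now a plain
// float, so no 1-element scale tensor has to be allocated in the graph.
// `scale_activations`, `ctx`, and `x` are hypothetical names.
struct ggml_tensor * scale_activations(struct ggml_context * ctx,
                                       struct ggml_tensor * x) {
    // pre-#4573 the factor was a tensor, roughly:
    //   ggml_scale(ctx, x, ggml_new_f32(ctx, 0.125f));
    return ggml_scale(ctx, x, 0.125f);
}
```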
b1663
CUDA: Faster Mixtral prompt processing (#4538)

* CUDA: make MoE tensors contiguous for batch size > 1
* Update ggml-cuda.cu

Co-authored-by: slaren <[email protected]>
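The speedup comes from materializing the per-expert (MoE) weight views contiguously before the batched matrix multiply, which the CUDA kernels handle much faster. A hedged sketch of the general idea using the public ggml API; the actual change lives inside `ggml-cuda.cu`, and `moe_matmul` is a hypothetical helper:

```c
#include "ggml.h"

// Illustrative only: batched CUDA matmul paths prefer contiguous
// operands, so a non-contiguous view (e.g. a permuted expert slice)
// is copied into contiguous memory with ggml_cont before the matmul.
struct ggml_tensor * moe_matmul(struct ggml_context * ctx,
                                struct ggml_tensor * expert_w, // possibly a view
                                struct ggml_tensor * cur) {
    if (!ggml_is_contiguous(expert_w)) {
        expert_w = ggml_cont(ctx, expert_w); // contiguous copy
    }
    return ggml_mul_mat(ctx, expert_w, cur);
}
```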
b1662
ggml : fixed check for _MSC_VER (#4535)

Co-authored-by: Eric Sommerlade <[email protected]>
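The commit only adjusts a preprocessor guard. For reference, the conventional way to detect MSVC is to test whether the macro is defined rather than evaluating it bare; this is a sketch of the standard pattern, not the literal diff from #4535:

```c
// An undefined macro evaluates to 0 inside #if, but a bare `#if _MSC_VER`
// triggers warnings under -Wundef on other compilers; testing
// defined(_MSC_VER) is the conventional, warning-free form.
#if defined(_MSC_VER)
    // MSVC-specific path, e.g. intrinsics from <intrin.h>
#else
    // GCC/Clang path
#endif
```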