Releases · ggerganov/llama.cpp

22 Jan 17:45

12c2bdf

b4529 Latest


server : fix draft context not being released (#11354)

Assets 23

cudart-llama-bin-win-cu11.7-x64.zip       303 MB  2025-01-22T17:45:35Z
cudart-llama-bin-win-cu12.4-x64.zip       373 MB  2025-01-22T17:45:43Z
llama-b4529-bin-macos-arm64.zip          19.8 MB  2025-01-22T17:45:52Z
llama-b4529-bin-macos-x64.zip            21.3 MB  2025-01-22T17:45:53Z
llama-b4529-bin-ubuntu-x64.zip           23.2 MB  2025-01-22T17:45:54Z
llama-b4529-bin-win-avx-x64.zip          13.8 MB  2025-01-22T17:45:55Z
llama-b4529-bin-win-avx2-x64.zip         13.8 MB  2025-01-22T17:45:56Z
llama-b4529-bin-win-avx512-x64.zip       13.9 MB  2025-01-22T17:45:57Z
llama-b4529-bin-win-cuda-cu11.7-x64.zip   152 MB  2025-01-22T17:45:58Z
llama-b4529-bin-win-cuda-cu12.4-x64.zip   151 MB  2025-01-22T17:46:02Z
Source code (zip)                                 2025-01-22T16:44:40Z
Source code (tar.gz)                              2025-01-22T16:44:40Z
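
The asset names above follow a visible pattern, `llama-<tag>-bin-<platform>.zip`. As a minimal sketch, the helper below rebuilds a name from a tag and a platform pair. The function name and the mapping are hypothetical; the suffixes are copied from the b4529 list only, and the Windows builds are left out because they come in several variants (avx, avx2, avx512, cuda).

```python
# Platform suffixes copied from the b4529 asset list above.
# Assumption: other releases use the same naming pattern.
KNOWN_SUFFIXES = {
    ("Darwin", "arm64"): "macos-arm64",
    ("Darwin", "x86_64"): "macos-x64",
    ("Linux", "x86_64"): "ubuntu-x64",
}


def asset_name(tag: str, system: str, machine: str) -> str:
    """Return the zip file name for a release tag and platform pair.

    `system` and `machine` use the values reported by Python's
    platform.system() and platform.machine().
    """
    suffix = KNOWN_SUFFIXES.get((system, machine))
    if suffix is None:
        # Unknown or ambiguous platform (for example Windows, which has
        # several CPU and CUDA variants): refuse to guess.
        raise ValueError(f"no single asset known for {system}/{machine}")
    return f"llama-{tag}-bin-{suffix}.zip"
```

Calling `asset_name("b4529", platform.system(), platform.machine())` after `import platform` would select the name for the current machine, where one is listed.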

22 Jan 17:14

github-actions

b4528

c64d2be

`minja`: sync at https://github.com/google/minja/commit/0f5f7f2b3770e…

Assets 23

22 Jan 12:34

github-actions

b4527

96f4053

Adding logprobs to /v1/completions (#11344)

Signed-off-by: Jiri Podivin <[email protected]>
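
A request that asks for log probabilities might look like the sketch below. It only builds the request and does not send it. The field names (`prompt`, `max_tokens`, `logprobs`) follow the OpenAI-style completions convention and are an assumption here, since the release note states only that logprobs were added to `/v1/completions`. The helper name and the base URL are placeholders.

```python
import json
import urllib.request


def build_completion_request(base_url: str, prompt: str, logprobs: int,
                             max_tokens: int = 16) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/completions."""
    # Assumed OpenAI-style body; the server may accept other fields too.
    payload = {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "logprobs": logprobs,  # number of top log probabilities requested per token
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it to a running server. The shape of the response is not described in the release note, so it is left out here.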

Assets 23

22 Jan 10:40

github-actions

b4526

a94f3b2

`common`: utils to split / join / repeat strings (from json converter…

Assets 23

22 Jan 08:15

github-actions

b4525

3e3357f

llava : support Minicpm-omni (#11289)

* init

* add readme

* update readme

* no use make

* update readme

* update fix code

* fix editorconfig-checker

* no change convert py

* use clip_image_u8_free

Assets 23

21 Jan 15:00

github-actions

b4524

6171c9d

Add Jinja template support (#11016)

* Copy minja from https://github.com/google/minja/commit/58f0ca6dd74bcbfbd4e71229736640322b31c7f9

* Add --jinja and --chat-template-file flags

* Add missing <optional> include

* Avoid print in get_hf_chat_template.py

* No designated initializers yet

* Try and work around msvc++ non-macro max resolution quirk

* Update test_chat_completion.py

* Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template

* Refactor test-chat-template

* Test templates w/ minja

* Fix deprecation

* Add --jinja to llama-run

* Update common_chat_format_example to use minja template wrapper

* Test chat_template in e2e test

* Update utils.py

* Update test_chat_completion.py

* Update run.cpp

* Update arg.cpp

* Refactor common_chat_* functions to accept minja template + use_jinja option

* Attempt to fix linkage of LLAMA_CHATML_TEMPLATE

* Revert LLAMA_CHATML_TEMPLATE refactor

* Normalize newlines in test-chat-templates for windows tests

* Forward decl minja::chat_template to avoid eager json dep

* Flush stdout in chat template before potential crash

* Fix copy elision warning

* Rm unused optional include

* Add missing optional include to server.cpp

* Disable jinja test that has a cryptic windows failure

* minja: fix vigogne (https://github.com/google/minja/pull/22)

* Apply suggestions from code review

Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>

* Finish suggested renamings

* Move chat_templates inside server_context + remove mutex

* Update --chat-template-file w/ recent change to --chat-template

* Refactor chat template validation

* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)

* Warn against missing eos / bos tokens when jinja template references them

* rename: common_chat_template[s]

* reinstate assert on chat_templates.template_default

* Update minja to https://github.com/google/minja/commit/b8437df626ac6cd0ce3b333b3c74ed1129c19f25

* Update minja to https://github.com/google/minja/pull/25

* Update minja from https://github.com/google/minja/pull/27

* rm unused optional header

---------

Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
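
The two flags named in this change, `--jinja` and `--chat-template-file`, could be combined on a command line as in the sketch below. The flags come from the commit list above. The binary name `llama-server`, the `-m` model flag, and the helper itself are assumptions for illustration, and the file paths are placeholders.

```python
import shlex
from typing import List, Optional


def server_command(model_path: str,
                   template_file: Optional[str] = None,
                   binary: str = "llama-server") -> List[str]:
    """Assemble an argument list that enables Jinja chat templates."""
    # Assumed binary name and -m flag; only --jinja and
    # --chat-template-file are taken from the release notes.
    argv = [binary, "-m", model_path, "--jinja"]
    if template_file is not None:
        # Override the template with one read from a file.
        argv += ["--chat-template-file", template_file]
    return argv


if __name__ == "__main__":
    # Print a shell-quoted command line; nothing is executed.
    print(shlex.join(server_command("model.gguf", "template.jinja")))
```

The list form can be passed directly to `subprocess.run` without shell quoting concerns.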

Assets 23

21 Jan 14:59

github-actions

b4523

e28245f

export-lora : fix tok_embd tensor (#11330)

Assets 23

21 Jan 14:49

github-actions

b4522

6da5bec

rpc : better caching of the base buffer pointer (#11331)

There is no need to use a map; just store the base pointer in the buffer context.

Assets 23

21 Jan 10:04

github-actions

b4521

2e2f8f0

linenoise.cpp refactoring (#11301)

More RAII mainly

Signed-off-by: Eric Curtin <[email protected]>

Assets 23

21 Jan 07:29

github-actions

b4520

2139667

metal : fix out-of-bounds write (#11314)

ggml-ci

Assets 23
