Releases · MarcusDunn/llama.cpp
b2582
b2293
llama : remove deprecated API (#5770)

ggml-ci
b1978
scripts : move run-with-preset.py from root to scripts folder
b1680
ggml : change ggml_scale to take a float instead of tensor (#4573)

* ggml : change ggml_scale to take a float instead of tensor
* ggml : fix CPU implementation
* tests : fix test-grad0

ggml-ci
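For context, this build changes the public `ggml_scale` API so the scale factor is passed as a plain `float` rather than a 1-element tensor. A minimal sketch of a call site after the change; the function and variable names here are illustrative, not taken from the commit:

```c
#include "ggml.h"

// Sketch of a call site after #4573: the scale factor is now a plain
// float, so no 1-element scale tensor has to be allocated in the graph.
// `scale_activations`, `ctx`, and `x` are hypothetical names.
struct ggml_tensor * scale_activations(struct ggml_context * ctx,
                                       struct ggml_tensor * x) {
    // pre-#4573 the factor was a tensor, roughly:
    //   ggml_scale(ctx, x, ggml_new_f32(ctx, 0.125f));
    return ggml_scale(ctx, x, 0.125f);
}
```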
b1663
CUDA: Faster Mixtral prompt processing (#4538)

* CUDA: make MoE tensors contiguous for batch size > 1
* Update ggml-cuda.cu

Co-authored-by: slaren <[email protected]>
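The speedup comes from materializing the per-expert (MoE) weight views contiguously before the batched matrix multiply, which the CUDA kernels handle much faster. A hedged sketch of the general idea using the public ggml API; the actual change lives inside `ggml-cuda.cu`, and `moe_matmul` is a hypothetical helper:

```c
#include "ggml.h"

// Illustrative only: batched CUDA matmul paths prefer contiguous
// operands, so a non-contiguous view (e.g. a permuted expert slice)
// is copied into contiguous memory with ggml_cont before the matmul.
struct ggml_tensor * moe_matmul(struct ggml_context * ctx,
                                struct ggml_tensor * expert_w, // possibly a view
                                struct ggml_tensor * cur) {
    if (!ggml_is_contiguous(expert_w)) {
        expert_w = ggml_cont(ctx, expert_w); // contiguous copy
    }
    return ggml_mul_mat(ctx, expert_w, cur);
}
```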
b1662
ggml : fixed check for _MSC_VER (#4535)

Co-authored-by: Eric Sommerlade <[email protected]>
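The commit only adjusts a preprocessor guard. For reference, the conventional way to detect MSVC is to test whether the macro is defined rather than evaluating it bare; this is a sketch of the standard pattern, not the literal diff from #4535:

```c
// An undefined macro evaluates to 0 inside #if, but a bare `#if _MSC_VER`
// triggers warnings under -Wundef on other compilers; testing
// defined(_MSC_VER) is the conventional, warning-free form.
#if defined(_MSC_VER)
    // MSVC-specific path, e.g. intrinsics from <intrin.h>
#else
    // GCC/Clang path
#endif
```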