- Added support for gemma-2 and mistral-nemo.
- Added multi-GPU support. Leave these three parameters unset if you have only one GPU.
--main-gpu 0
- sets the main GPU id (the one that holds the KV cache): 0, 1, ...
--split-mode none
- either none or layer; split-mode tensor is not supported
--tensor-split 0.5,0.5
- how to split layers or tensors across the GPUs, an array of floats
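As a sketch, a two-GPU launch line could combine the three flags like this (the executable name below is a placeholder; substitute your actual bat or command):

```
rem hypothetical launch line for two GPUs, splitting layers 50/50
rem GPU 0 is the main GPU and holds the KV cache
llama_bot.exe --main-gpu 0 --split-mode layer --tensor-split 0.5,0.5
```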
- Added instruct mode with presets. It is optional and experimental; there are still some bugs.
--instruct-preset gemma
where gemma is the name of the file \instruct_presets\gemma.json
Instruct mode helps make responses longer and smarter. You can find the correct instruct preset for each model on its Hugging Face model card, or in SillyTavern under Formatting - Instruct Mode Sequences.
The example dialogue in assisttant.txt should also be formatted with instruct-mode tags. I added gemma and mistral instruct presets, plus some bats to run gemma and nemo in instruct mode.
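To illustrate what instruct-mode tags do to the context, here is a minimal Python sketch that wraps a dialogue in gemma-2's published turn markers (the helper names are made up for illustration; the actual preset JSON may store these strings under different keys):

```python
# Gemma-2's turn markers, as documented for the model family.
# Other models (e.g. mistral-nemo) use different tag strings,
# which is why each model needs its own instruct preset.
def format_gemma_turn(role: str, text: str) -> str:
    """Wrap one message in gemma-2 start/end-of-turn tags."""
    return f"<start_of_turn>{role}\n{text}<end_of_turn>\n"

def build_prompt(history: list) -> str:
    """Join the formatted turns, then open a new model turn for the reply."""
    prompt = "".join(format_gemma_turn(role, text) for role, text in history)
    return prompt + "<start_of_turn>model\n"

# Example: what the LLM actually sees after formatting.
print(build_prompt([
    ("user", "Hello!"),
    ("model", "Hi, how can I help?"),
    ("user", "Tell me a joke."),
]))
```

An example dialogue file formatted this way gives the model a clear signal of where each turn begins and ends, which is why mismatched tags tend to produce short or confused replies.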
- Added
-debug
to print the whole context dialogue after each LLM response. Useful for seeing whether something is wrong with the formatting.