- Added support for gemma-2 and mistral-nemo.
- Added multi-GPU support. Leave these three parameters unset if you have only one GPU.
--main-gpu 0
- sets the main GPU id (the one that holds the KV cache): 0, 1, ...
--split-mode none
- either none or layer; split-mode tensor is not supported
--tensor-split 0.5,0.5
- how to split layers or tensors across the GPUs, an array of floats
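As a sketch, a two-GPU launch line could combine the three flags like this (the executable name below is a placeholder; substitute your actual bat or command):

```
rem hypothetical launch line for two GPUs, splitting layers 50/50
rem GPU 0 is the main GPU and holds the KV cache
llama_bot.exe --main-gpu 0 --split-mode layer --tensor-split 0.5,0.5
```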
- Added instruct mode with presets. It is optional and experimental; there are still some bugs.
--instruct-preset gemma
where gemma is the name of the file \instruct_presets\gemma.json
Instruct mode helps make responses longer and smarter. You can find the correct instruct preset for each model on its Hugging Face model card, or in SillyTavern under Formatting - Instruct Mode Sequences.
The example dialogue in assisttant.txt should also be formatted with instruct-mode tags. I added gemma and mistral instruct presets, plus some bats to run gemma and nemo in instruct mode.
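To illustrate what instruct-mode tags do to the context, here is a minimal Python sketch that wraps a dialogue in gemma-2's published turn markers (the helper names are made up for illustration; the actual preset JSON may store these strings under different keys):

```python
# Gemma-2's turn markers, as documented for the model family.
# Other models (e.g. mistral-nemo) use different tag strings,
# which is why each model needs its own instruct preset.
def format_gemma_turn(role: str, text: str) -> str:
    """Wrap one message in gemma-2 start/end-of-turn tags."""
    return f"<start_of_turn>{role}\n{text}<end_of_turn>\n"

def build_prompt(history: list) -> str:
    """Join the formatted turns, then open a new model turn for the reply."""
    prompt = "".join(format_gemma_turn(role, text) for role, text in history)
    return prompt + "<start_of_turn>model\n"

# Example: what the LLM actually sees after formatting.
print(build_prompt([
    ("user", "Hello!"),
    ("model", "Hi, how can I help?"),
    ("user", "Tell me a joke."),
]))
```

An example dialogue file formatted this way gives the model a clear signal of where each turn begins and ends, which is why mismatched tags tend to produce short or confused replies.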
- Added
-debug
to print the whole context dialogue after each LLM response. Useful for seeing whether something is wrong with the formatting.