
0.2.0

@Mozer released this 21 Jul 18:44
  1. Added support for gemma-2 and mistral-nemo.

  2. Added multi-GPU support. Don't set these three params if you have just one GPU (see the example command after the list below).

--main-gpu 0 - sets the main GPU id that holds the KV cache: 0, 1, ...
--split-mode none - none or layer; split-mode tensor is not supported.
--tensor-split 0.5,0.5 - how to split layers or tensors across GPUs, an array of floats.
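
A minimal sketch of a two-GPU launch. Only --main-gpu, --split-mode and --tensor-split come from this release; the executable name, the -m model flag and the model path are illustrative placeholders, not the exact command line:

```bat
rem Hypothetical two-GPU launch: the exe name and model argument are placeholders.
talk-llama.exe -m models\gemma-2-9b-it-Q4_K_M.gguf ^
  --main-gpu 0 ^
  --split-mode layer ^
  --tensor-split 0.6,0.4
```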

  3. Added instruct mode with presets. It is optional and experimental. There are still some bugs.

--instruct-preset gemma - where gemma refers to the preset file \instruct_presets\gemma.json

Instruct mode helps make responses longer and smarter. You can find the correct instruct preset for each model on its Hugging Face model card or in SillyTavern under Formatting - Instruct Mode Sequences.

The example dialogue in assistant.txt should also be formatted with the instruct-mode tags. I added gemma and mistral instruct presets, and added some .bat files to run gemma and nemo in instruct mode.
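
For reference, a hypothetical \instruct_presets\gemma.json written in a SillyTavern-style sequence layout. The key names here are an assumption about the preset format; only the turn tags themselves come from Gemma's chat template:

```json
{
  "_comment": "Sketch only: key names are assumed, tags are Gemma's chat-template turn markers.",
  "input_prefix": "<start_of_turn>user\n",
  "input_suffix": "<end_of_turn>\n<start_of_turn>model\n",
  "output_suffix": "<end_of_turn>\n"
}
```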

  4. Added -debug to print the whole context dialogue after each LLM response. Useful to see if there is something wrong with the formatting.
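
Putting the new options together, a sketch of an instruct-mode run with debug output. Only --instruct-preset and -debug are documented above; the rest of the command line is an illustrative assumption:

```bat
rem Hypothetical instruct-mode launch with context dumping enabled.
talk-llama.exe -m models\gemma-2-9b-it-Q4_K_M.gguf ^
  --instruct-preset gemma ^
  -debug
```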