Add the Command R chat format #1382

Open
wants to merge 1 commit into main

Conversation


euxoa commented Apr 25, 2024

No description provided.

Contributor

CISC commented Apr 25, 2024

This should not strictly be necessary, as recent GGUFs have the chat format embedded (which will be automatically applied through Jinja2ChatFormatter). I've submitted requests on older repos on HF to be updated (and many of them have already done so).

If you have an outdated GGUF and don't wish to redownload it you can update your local file using the gguf-new-metadata.py script in llama.cpp/gguf-py/scripts and the latest Command R tokenizer_config.json from HF:

python gguf-new-metadata.py input.gguf output.gguf --chat-template-config tokenizer_config.json
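
For anyone wondering what "automatically applied" looks like in code, here is a minimal sketch of the mechanism, assuming the Jinja2ChatFormatter class and constructor arguments in llama_cpp.llama_chat_format; the template and tokens below are placeholders, not Command R's real ones:

# Minimal sketch: the tokenizer.chat_template string stored in the GGUF is rendered
# with Jinja2ChatFormatter to build the final prompt.
from llama_cpp.llama_chat_format import Jinja2ChatFormatter

template = (
    "{% for message in messages %}"
    "{{ message['role'] }}: {{ message['content'] }}\n"
    "{% endfor %}"
)  # placeholder template
formatter = Jinja2ChatFormatter(template=template, eos_token="</s>", bos_token="<s>")  # placeholder tokens
response = formatter(messages=[{"role": "user", "content": "Hello"}])
print(response.prompt)  # the rendered prompt that gets sent to the model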

@uncodecomplexsystems

@CISC
There are some arguments for merging it, however:

  • As you said yourself, there are a lot of GGUFs (the vast majority, to be honest) that don't have this yet
  • llama-cpp-python already offers a lot of chat formats, and llama.cpp also introduced the command-r chat format. As Command-R (Plus) is currently among the most capable open models (or tied with Llama 3), I think it makes a lot of sense to merge this.
  • It's just a minor merge to an existing function (see the rough sketch at the end of this comment).
  • Would really help a lot of people.

As soon as more GGUFs have the formats embedded the situation changes. But right now this merge would just be super helpful. The model is a powerhouse for the open weights community.

Merge would be <3 <3 <3
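
For reference, a rough sketch of the shape of the change being requested, not necessarily the actual diff in this PR; the register_chat_format decorator and ChatFormatterResponse exist in llama_cpp.llama_chat_format, but the Command R special tokens below are taken from the model's tokenizer_config.json on HF and should be double-checked:

from llama_cpp.llama_chat_format import ChatFormatterResponse, register_chat_format

@register_chat_format("command-r")
def format_command_r(messages, **kwargs):
    # Command R wraps each turn in START_OF_TURN/END_OF_TURN tokens with a role token;
    # the exact special tokens are copied from the HF tokenizer_config.json (assumption).
    role_tokens = {
        "system": "<|SYSTEM_TOKEN|>",
        "user": "<|USER_TOKEN|>",
        "assistant": "<|CHATBOT_TOKEN|>",
    }
    prompt = "<BOS_TOKEN>"
    for message in messages:
        prompt += (
            "<|START_OF_TURN_TOKEN|>"
            + role_tokens.get(message["role"], "<|USER_TOKEN|>")
            + message["content"]
            + "<|END_OF_TURN_TOKEN|>"
        )
    # Open an assistant turn for the model to complete.
    prompt += "<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"
    return ChatFormatterResponse(prompt=prompt, stop="<|END_OF_TURN_TOKEN|>")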

Contributor

CISC commented Apr 25, 2024

@uncodecomplexsystems As you say, it's just a minor merge, I'm not opposed to it, I'm just saying it's not strictly necessary. :)

Author

euxoa commented Apr 25, 2024

If you have an outdated GGUF and don't wish to redownload it you can update your local file [...]

Thanks, I didn't know that!

I have various GGUFs for Qwen-1.5, Command R, and Llama 3's, and the automatic setup of the chat format looks like this:

>>> for mname in model_names:
...    llm = Llama(f"llms/{mname}", n_gpu_layers=-1, logits_all=False, n_ctx=4096, verbose=False)
...    print(mname, llm.chat_format)
... 
c4ai-command-r-v01-Q5_K_M.gguf llama-2
Meta-Llama-3-8B-Instruct.Q5_K_M.gguf None
Meta-Llama-3-70B-Instruct.Q3_K_M.gguf llama-3
qwen1_5-14b-chat-q4_k_m.gguf chatml
qwen1_5-32b-chat-q4_k_m.gguf None
qwen1_5-72b-chat-q3_k_m.gguf chatml
mixtral-instruct-8x7b-q4k-medium.gguf mistral-instruct

I thought those with None were fails, but do they actually get their chat format correctly from the template?

And confusingly, Command R kind of works with the chatml format and probably even with the default llama-2 format, but then in tests suffers from poorer prompt following, and oddly sometimes outputs tags in place of named entities.
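
As a stopgap until the format is merged (or the GGUF is updated), the template from the HF tokenizer_config.json can also be passed explicitly when constructing the model; a minimal sketch, assuming Jinja2ChatFormatter.to_chat_handler() and the chat_handler parameter of Llama, and that the config carries a "chat_template" entry with plain-string bos/eos tokens:

import json

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Jinja2ChatFormatter

# tokenizer_config.json downloaded from the Command R repo on HF (assumption about its layout).
with open("tokenizer_config.json") as f:
    config = json.load(f)

chat_template = config["chat_template"]
if isinstance(chat_template, list):  # some configs ship several named templates
    chat_template = next(t["template"] for t in chat_template if t["name"] == "default")

formatter = Jinja2ChatFormatter(
    template=chat_template,
    bos_token=config["bos_token"],
    eos_token=config["eos_token"],
)

llm = Llama(
    "llms/c4ai-command-r-v01-Q5_K_M.gguf",
    n_gpu_layers=-1,
    n_ctx=4096,
    chat_handler=formatter.to_chat_handler(),  # overrides the (wrongly) guessed llama-2 format
)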

Contributor

CISC commented Apr 25, 2024

I thought those with None were fails, but do they actually get their chat format correctly from the template?

Yes, None means it found an embedded template that is not recognized as any specific format (enable verbose and it will output the full template); if no template can be guessed or found it will fall back to llama-2, see llama.py.
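
A quick way to check what was picked up from the GGUF, as a small sketch, assuming the Llama.metadata dict exposed by recent llama-cpp-python versions:

from llama_cpp import Llama

llm = Llama("llms/c4ai-command-r-v01-Q5_K_M.gguf", n_ctx=4096, verbose=False)
print(llm.chat_format)                              # None if an embedded template was found
print(llm.metadata.get("tokenizer.chat_template"))  # the raw Jinja template, if embedded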

@uncodecomplexsystems

Based on the inactivity in both this PR and the phi3 one, I suppose your stance, @abetlen, is to not merge any more new chat templates into llama-cpp-python, right?
I think it's important to know. Thx!

Contributor

CISC commented Apr 29, 2024

@uncodecomplexsystems Patience, I'm sure there's just a lot going on (here or elsewhere) right now.

Contributor

khimaros commented Apr 29, 2024

It's worth noting that llama.cpp/examples/server now has an OpenAI API compatible endpoint with its own chat template handling, which I believe is based on the llama_chat_apply_template() API in llama.cpp. There are a few PRs and issues seeking a more general solution:

ggerganov/llama.cpp#6822
ggerganov/llama.cpp#6834
ggerganov/llama.cpp#4216
ggerganov/llama.cpp#6726
ggerganov/llama.cpp#5922
ggerganov/llama.cpp#6391
