Add the Command R chat format #1382

Open
wants to merge 1 commit into main

Conversation


euxoa commented Apr 25, 2024

No description provided.

Contributor

CISC commented Apr 25, 2024

This should not strictly be necessary, as recent GGUFs have the chat format embedded (which will be automatically applied through Jinja2ChatFormatter). I've submitted requests on older repos on HF to be updated (and many of them have already done so).

If you have an outdated GGUF and don't wish to redownload it you can update your local file using the gguf-new-metadata.py script in llama.cpp/gguf-py/scripts and the latest Command R tokenizer_config.json from HF:

python gguf-new-metadata.py input.gguf output.gguf --chat-template-config tokenizer_config.json
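
For anyone wondering what "automatically applied" looks like in code, here is a minimal sketch of the mechanism, assuming the Jinja2ChatFormatter class and constructor arguments in llama_cpp.llama_chat_format; the template and tokens below are placeholders, not Command R's real ones:

# Minimal sketch: the tokenizer.chat_template string stored in the GGUF is rendered
# with Jinja2ChatFormatter to build the final prompt.
from llama_cpp.llama_chat_format import Jinja2ChatFormatter

template = (
    "{% for message in messages %}"
    "{{ message['role'] }}: {{ message['content'] }}\n"
    "{% endfor %}"
)  # placeholder template
formatter = Jinja2ChatFormatter(template=template, eos_token="</s>", bos_token="<s>")  # placeholder tokens
response = formatter(messages=[{"role": "user", "content": "Hello"}])
print(response.prompt)  # the rendered prompt that gets sent to the model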

@uncodecomplexsystems

@CISC
There are some arguments for merging it, however:

  • As you said yourself, there are a lot of GGUFs (the vast majority, to be honest) that don't have this yet
  • llama-cpp-python already offers a lot of chat formats, and llama.cpp also introduced the command-r chat format. As Command-R (Plus) is currently among the most capable open models (or tied with Llama 3), I think it makes a lot of sense to merge this.
  • It's just a minor merge to an existing function (see the rough sketch at the end of this comment).
  • Would really help a lot of people.

As soon as more GGUFs have the formats embedded the situation changes. But right now this merge would just be super helpful. The model is a powerhouse for the open weights community.

Merge would be <3 <3 <3
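
For reference, a rough sketch of the shape of the change being requested, not necessarily the actual diff in this PR; the register_chat_format decorator and ChatFormatterResponse exist in llama_cpp.llama_chat_format, but the Command R special tokens below are taken from the model's tokenizer_config.json on HF and should be double-checked:

from llama_cpp.llama_chat_format import ChatFormatterResponse, register_chat_format

@register_chat_format("command-r")
def format_command_r(messages, **kwargs):
    # Command R wraps each turn in START_OF_TURN/END_OF_TURN tokens with a role token;
    # the exact special tokens are copied from the HF tokenizer_config.json (assumption).
    role_tokens = {
        "system": "<|SYSTEM_TOKEN|>",
        "user": "<|USER_TOKEN|>",
        "assistant": "<|CHATBOT_TOKEN|>",
    }
    prompt = "<BOS_TOKEN>"
    for message in messages:
        prompt += (
            "<|START_OF_TURN_TOKEN|>"
            + role_tokens.get(message["role"], "<|USER_TOKEN|>")
            + message["content"]
            + "<|END_OF_TURN_TOKEN|>"
        )
    # Open an assistant turn for the model to complete.
    prompt += "<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"
    return ChatFormatterResponse(prompt=prompt, stop="<|END_OF_TURN_TOKEN|>")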

Contributor

CISC commented Apr 25, 2024

@uncodecomplexsystems As you say, it's just a minor merge, I'm not opposed to it, I'm just saying it's not strictly necessary. :)

Author

euxoa commented Apr 25, 2024

If you have an outdated GGUF and don't wish to redownload it you can update your local file [...]

Thanks, I didn't know that!

I have various GGUFs for Qwen-1.5, Command R, and Llama 3's, and the automatic setup of the chat format looks like this:

>>> for mname in model_names:
...    llm = Llama(f"llms/{mname}", n_gpu_layers=-1, logits_all=False, n_ctx=4096, verbose=False)
...    print(mname, llm.chat_format)
... 
c4ai-command-r-v01-Q5_K_M.gguf llama-2
Meta-Llama-3-8B-Instruct.Q5_K_M.gguf None
Meta-Llama-3-70B-Instruct.Q3_K_M.gguf llama-3
qwen1_5-14b-chat-q4_k_m.gguf chatml
qwen1_5-32b-chat-q4_k_m.gguf None
qwen1_5-72b-chat-q3_k_m.gguf chatml
mixtral-instruct-8x7b-q4k-medium.gguf mistral-instruct

I thought those with None were fails, but do they actually get their chat format correctly from the template?

And confusingly, Command R kind of works with the chatml format and probably even with the default llama-2 format, but then in tests suffers from poorer prompt following, and oddly sometimes outputs tags in place of named entities.
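
As a stopgap until the format is merged (or the GGUF is updated), the template from the HF tokenizer_config.json can also be passed explicitly when constructing the model; a minimal sketch, assuming Jinja2ChatFormatter.to_chat_handler() and the chat_handler parameter of Llama, and that the config carries a "chat_template" entry with plain-string bos/eos tokens:

import json

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Jinja2ChatFormatter

# tokenizer_config.json downloaded from the Command R repo on HF (assumption about its layout).
with open("tokenizer_config.json") as f:
    config = json.load(f)

chat_template = config["chat_template"]
if isinstance(chat_template, list):  # some configs ship several named templates
    chat_template = next(t["template"] for t in chat_template if t["name"] == "default")

formatter = Jinja2ChatFormatter(
    template=chat_template,
    bos_token=config["bos_token"],
    eos_token=config["eos_token"],
)

llm = Llama(
    "llms/c4ai-command-r-v01-Q5_K_M.gguf",
    n_gpu_layers=-1,
    n_ctx=4096,
    chat_handler=formatter.to_chat_handler(),  # overrides the (wrongly) guessed llama-2 format
)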

Contributor

CISC commented Apr 25, 2024

I thought those with None were fails, but do they actually get their chat format correctly from the template?

Yes, None means it found an embedded template that is not recognized as any specific format (enable verbose and it will output the full template); if no template can be guessed or found it will fall back to llama-2, see llama.py.
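
A quick way to check what was picked up from the GGUF, as a small sketch, assuming the Llama.metadata dict exposed by recent llama-cpp-python versions:

from llama_cpp import Llama

llm = Llama("llms/c4ai-command-r-v01-Q5_K_M.gguf", n_ctx=4096, verbose=False)
print(llm.chat_format)                              # None if an embedded template was found
print(llm.metadata.get("tokenizer.chat_template"))  # the raw Jinja template, if embedded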

@uncodecomplexsystems

Based on the inactivity in both this PR and the phi3 one, I suppose your stance, @abetlen, is to not merge any more new chat templates into llama-cpp-python, right?
I think it's important to know. Thx!

Contributor

CISC commented Apr 29, 2024

@uncodecomplexsystems Patience, I'm sure there's just a lot going on (here or elsewhere) right now.

Contributor

khimaros commented Apr 29, 2024

It's worth noting that llama.cpp/examples/server now has an OpenAI API compatible endpoint with its own chat template handling, which I believe is based on the llama_chat_apply_template() API in llama.cpp. There are a few PRs and issues seeking a more general solution:

ggerganov/llama.cpp#6822
ggerganov/llama.cpp#6834
ggerganov/llama.cpp#4216
ggerganov/llama.cpp#6726
ggerganov/llama.cpp#5922
ggerganov/llama.cpp#6391
