Add Jinja template support #11016
Conversation
Feel free to add the option to llama-run for basic testing also, @ochafik.
```cpp
namespace minja {

class chat_template {
```
One idea to be able to `#include "chat-template.hpp"` in main is to forward declare json here without `#include <json.hpp>`, and only define the prototype of class chat_template here. Then we would need a new file chat-template.cpp that holds the actual implementation, including `#include <json.hpp>`.
(Not sure if this even works, but we can do in another PR, just noting my idea here)
I was hoping to keep minja header-only for now, but happy to explore options as a follow-up :-)
Impressive work, thanks! Let's wait for @ggerganov to do another pass, then I think it's good to go!
```diff
@@ -4,22 +4,26 @@
 server = ServerPreset.tinyllama2()

-@pytest.fixture(scope="module", autouse=True)
+@pytest.fixture(autouse=True)
```
I'm not exceptionally good at pytest, so maybe I'm missing something. Could you explain why `scope="module"` is removed here?
`scope="module"` was making the ServerProcess `server` instance shared between all the tests in the module (file). Even though it's stopped in `stop_server_after_each_test`, it carried previous settings over to subsequent tests, spilling server flags over (this became more important with the jinja & chat_template params).
Ok, thanks for the explanation. Seems like `scope="module"` is not what I wanted. I want the fixture to only affect a single file, since the idea is that one test unit uses one model.
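For illustration, a minimal sketch of the scoping difference (hypothetical `ServerProcess` stand-in, not the actual test harness):

```python
import pytest

class ServerProcess:
    """Hypothetical stand-in for the real server wrapper used by the tests."""
    def __init__(self):
        self.jinja = False
        self.chat_template = None

server = None

# With scope="module", the fixture body runs once per file: every test shares
# the same `server` object, so flags set by one test (e.g. server.jinja = True)
# leak into the next one even if the process itself is stopped in between.
# With the default (function) scope, the body runs before each test, so every
# test starts from a fresh ServerProcess with default flags.
@pytest.fixture(autouse=True)
def create_server():
    global server
    server = ServerProcess()
    yield
    # stopping/cleaning up the server process would go here
```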
```diff
     ]
 )
-def test_chat_completion(model, system_prompt, user_prompt, max_tokens, re_content, n_prompt, n_predicted, finish_reason):
+def test_chat_completion(model, system_prompt, user_prompt, max_tokens, re_content, n_prompt, n_predicted, finish_reason, jinja, chat_template):
```
TODO: we can also add a "slow" test that tests tool calls with a big model like Hermes or Qwen (see an example in test_infill.py). I'll have a look in the next few days.
hehe absolutely, this is coming in #9639 or another subpart of it (tool call parsing + conditional grammars)
btw, I wonder if there's any reason to override LLAMA_CACHE to tmp in the server tests? I've been struggling with disk space on my MBP 😅
It's mostly to provide a way to isolate tests if the user has multiple clones of the llama.cpp source code on the machine. Maybe you can symlink that tmp directory to external storage?
common/minja.hpp (outdated):
```cpp
static std::string normalize_newlines(const std::string & s) {
#ifdef _WIN32
    static const std::regex nl_regex("\r\n");
    return std::regex_replace(s, nl_regex, "\n");
#else
    return s;
#endif
}
```
Not sure what the original purpose of this was, but I think it can be removed, as well as the definition of `ENDL` to `\r\n` on win32. It shouldn't make a difference with `stringstream`.
Dropped `ENDL` and one usage of this function (at the end of rendering; one is still needed to shield the parser from CRLFs), thanks!
Small thing to note: some Jinja templates are not "linear", meaning each conversation turn is not self-contained but can modify the content before it. For example, the new deepseek-r1 distilled has […]. The consequence is that it will break […]. A solution is to also track the cached tokens at the token level (not the conversation level), which I introduced here: #11203. @ericcurtin feel free to port this to […].
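For illustration only, a minimal sketch of what tracking the cache at the token level could look like (hypothetical helper, not the actual code from #11203):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Instead of assuming the newly rendered prompt extends the previous one
// verbatim, compare it token by token against what is already in the KV cache
// and reuse only the longest common prefix; everything after it is re-evaluated.
// int32_t stands in for llama_token here.
static size_t common_token_prefix(const std::vector<int32_t> & cached,
                                  const std::vector<int32_t> & fresh) {
    const size_t lim = std::min(cached.size(), fresh.size());
    size_t n = 0;
    while (n < lim && cached[n] == fresh[n]) {
        n++;
    }
    return n; // tokens [0, n) stay cached, tokens [n, fresh.size()) get re-evaluated
}
```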
Thanks everyone for the insightful reviews! More from #9639 to come soon :-)
Not sure if this is a special case or the template is broken, but when I load minimax-text-01 (my work-in-progress) with the following template:

[…]

with this PR llama.cpp crashes during model loading:

[…]
Hey @fairydreaming, thanks for testing & reporting! Your template contains an exotic […].

I could certainly make the error more informative though; feel free to file something on https://github.com/google/minja to that end (and/or any feature request). Looking forward to testing your model, good luck with it!
@ochafik I did some research and it seems to be a custom keyword introduced in HF transformers: huggingface/transformers#30650. Fortunately, among all the models I currently have on disk, only MiniMax-Text-01 uses this.
@fairydreaming thanks for researching that, will track support in google/minja#28
Subset of #9639 with just the Jinja templating support.
Proper tool support (grammar constraints, lazy grammar triggering, tool call parsing & stop reason) will come in a follow up PR.
- `--jinja` flag to llama-server, llama-cli, llama-run
- `--chat-template-file` flag to llama-server, llama-cli (related: Added chat template support to llama-run #11215)
- `tokenizer.chat_template` (or `tokenizer.chat_template.tool_use` if defined, only when the request has tools)
- (`trim_blocks = true, lstrip_blocks = true`)

Example usage: […]
TODO: