Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto-apply chat template in SequenceGenerator and SequenceGeneratorAdapter, if available #1019

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

leloykun
Copy link
Contributor

@leloykun leloykun commented Jul 5, 2024

This PR auto-applies chat templates by default when using instruct/chat models. Doesn't support LlamaCPP for now tho.


Why?

Instruct/Chat models tend to be annoyingly template dependent (i.e. they perform worse if the prompts don't follow the chat template used in the finetuning step). And the more and longer they are finetuned, the worse the problem gets. Hence this PR.

Also see this issue #987 raised by @lapp0


Interface changes

This PR has minimal impact on the interface. It just changes the default behavior of the generation step.

However, this feature can be disabled either on the creation of the generator:

generator = outlines.generate.choice(model, ["Positive", "Negative"], apply_chat_template=False)

Or when calling the generator

answer = generator(prompt, apply_chat_template=False)

Copy link
Contributor

@lapp0 lapp0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for implementing this!

There's some things I'd like to see before this is ready for merge, could you please address the following

  • Test cases which ensure application of the chat template works properly for a standard case, for a missing chat template
  • Consider that some users of Outlines are relying on the current behavior and make apply_chat_template=False the default. Perhaps we can warn the user as well if they're not setting the parameter.
  • Add apply_chat_template explanation to the docs
  • Use apply_chat_template=True in all models.transformers generator examples

from outlines.models.transformers import TransformerTokenizer

if isinstance(prompts, str):
prompts = [prompts]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the signature should be List[str] -> List[str] and raise an error if a list isn't passed. In transformers.py, this function is called after the prompts are normalized to 2D anyways.

)
return prompts
tokenizer: "TransformerTokenizer" = model.tokenizer
if getattr(tokenizer.tokenizer, "chat_template", None) is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

**model_specific_params,
):
"""Return a text generator from a prompt or a list of prompts."""
if apply_chat_template is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make this pythonic, we should have one obvious way of applying a chat template. IMO the argument should only be accepted in the constructor.

)

def apply_chat_template(self, prompt: str) -> str:
messages = self.get_messages(prompt)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably have this be part of class Tokenizer and raise NotImplementedError by default.

@rlouf
Copy link
Member

rlouf commented Jul 5, 2024

I don't think this design is coherent with the rest of the library; we want to avoid kwargs as much as we possibly can. Here I would simply add a apply_chat_template to the model instance.

@leloykun
Copy link
Contributor Author

leloykun commented Jul 7, 2024

@lapp0 @rlouf

How do you guys think should the interface look like?


Here, I mirrored Huggingface's Pipeline interface where we can specify configs/args in the constructor and (optionally) override them in the model call. I like it cuz it's more flexible. But yah, it does make things a bit more complicated and less pythonic.

I did a quick look at other libraries, and it seems that they either (1) don't auto-apply the chat templates at all or (2) have a separate code path for base & instruct/chat models (e.g. TGI & MixEval). I think there are two reasons why:

  1. It's hard to know which models are base models and which are instruct/chat models. I thought checking whether chat_template is None or not would suffice. But some chat models apparently just leave them out (especially the older ones & third-party finetunes). Additionally, transformers' PreTrainedTokenizerBase base class has a default_chat_template property--so, if I'm not mistaken, we can run tokenizer.apply_chat_template on all tokenizers without erroring out. And

  2. Some models don't support system prompts and it's hard to know which ones do and which ones don't.

So yah, for now, we need a way to somehow let the library know that whether we're dealing with a base model or an instruct/chat model. Worst case is we might also need to ask the user to specify the system prompt. But if we're gonna force them to go all that trouble anyway, we might as well not do this by default.

I think a good compromise is to (1) not apply the chat templates by default but (2) warn the user if the chat template is specified in the tokenizer config but is not being used.

@rlouf
Copy link
Member

rlouf commented Jul 7, 2024

answer = generator(
   model.apply_chat_template(prompt)
)

It would substantially simplify the code in this PR as well.

@alonsosilvaallende
Copy link
Contributor

alonsosilvaallende commented Jul 13, 2024

I like this pull request that adds the possibility to use the tokenizer's apply_chat_template, but I wonder if it's a good idea to make it the default behavior. I have had very bad experiences with apply_chat_template where it add spaces or remove spaces when it shouldn't or even worse in function calling cases where it completely ignores the functions given without even raising errors (see this example). Many people might complain about something which is outside the control of Outlines.

@rlouf
Copy link
Member

rlouf commented Jul 13, 2024

Indeed, we are not going to make it the default behavior. Users should be able to inspect the modified prompt before they pass it to the generator.

@rlouf
Copy link
Member

rlouf commented Jul 18, 2024

@leloykun are you still working on this?

@leloykun
Copy link
Contributor Author

@rlouf yup! I just deprioritized this in favor of other stuff; I'll get back to this soon

btw, thanks for the feedback, @alonsosilvaallende!

@Bit0r
Copy link

Bit0r commented Jul 27, 2024

So can this feature still be added? A chat template is really needed in many applications.

@lapp0
Copy link
Contributor

lapp0 commented Jul 27, 2024

So can this feature still be added? A chat template is really needed in many applications.

Yes, if you'd like to introduce it in a PR, I'll review and support its inclusion provided it follows the behavior discussed in this thread! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants