Merge pull request #13 from huggingface/mllama-converter-updates
Converter: model_max_length, avoid null chat_template
pcuenca authored Sep 19, 2024
2 parents ca94ea0 + 5d98a72 commit 717d579
Showing 1 changed file with 3 additions and 1 deletion.
```diff
@@ -463,13 +463,15 @@ def __init__(self, vocab_file, num_reserved_special_tokens=256, chat_template=No
         self.additional_special_tokens = special_tokens
         tokenizer = self.converted()
 
+        instruct_kwargs = {"chat_template": chat_template} if instruct else {}
         self.tokenizer = PreTrainedTokenizerFast(
             tokenizer_object=tokenizer,
             bos_token="<|begin_of_text|>",
             eos_token="<|end_of_text|>" if not instruct else "<|eot_id|>",
             pad_token="<|finetune_right_pad_id|>",
-            chat_template=chat_template if instruct else None,
             model_input_names=["input_ids", "attention_mask"],
+            model_max_length=131072,
+            **instruct_kwargs,
         )
```
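The core of this change is the conditional-kwargs pattern: instead of passing `chat_template=None` for non-instruct models (which can leave a null `chat_template` entry in the saved tokenizer config), the keyword is only included when a template actually exists. A minimal sketch of that pattern, using a hypothetical stand-in function rather than the real `PreTrainedTokenizerFast` constructor:

```python
# Sketch of the conditional-kwargs pattern from the diff.
# make_tokenizer_config is a hypothetical stand-in for the
# PreTrainedTokenizerFast(...) call; the keys mirror the diff.

def make_tokenizer_config(instruct, chat_template=None, **base_kwargs):
    # Only include the chat_template key for instruct models,
    # rather than passing chat_template=None explicitly.
    instruct_kwargs = {"chat_template": chat_template} if instruct else {}
    return {**base_kwargs, **instruct_kwargs}

base = {"bos_token": "<|begin_of_text|>", "model_max_length": 131072}

# Instruct model: the template key is present.
cfg_instruct = make_tokenizer_config(True, chat_template="{{ messages }}", **base)
assert cfg_instruct["chat_template"] == "{{ messages }}"

# Base model: the key is absent entirely, not set to None.
cfg_base = make_tokenizer_config(False, chat_template="{{ messages }}", **base)
assert "chat_template" not in cfg_base
```

The difference matters because a key that is absent falls back to the class default, whereas an explicit `None` is stored and serialized as a null value.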
