AutoTokenizer vs. BertTokenizer #17809
Hi, the … Here you're comparing it to a … So the behaviour is expected, and the error message is pretty self-explanatory if you ask me.
The docs for AutoTokenizer say, …

I do not pass a config, so I would assume that AutoTokenizer would instantiate …
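For context, when no config is passed, AutoTokenizer loads the model's config and resolves the tokenizer class from the model type, preferring the fast variant when one exists. The selection logic can be pictured as a registry lookup; this is a minimal sketch with a made-up registry and function name, not the real transformers implementation:

```python
# Minimal sketch of AutoTokenizer-style dispatch: the model's config supplies
# a model_type, which is looked up in a registry mapping model types to
# (slow, fast) tokenizer classes. Class and registry names are hypothetical.

class BertTokenizer: ...
class BertTokenizerFast: ...
class CanineTokenizer: ...

TOKENIZER_REGISTRY = {
    "bert": (BertTokenizer, BertTokenizerFast),
    "canine": (CanineTokenizer, None),  # CANINE has no fast implementation
}

def auto_tokenizer_class(model_type: str, use_fast: bool = True):
    """Return the tokenizer class an AutoTokenizer-style factory would pick."""
    slow_cls, fast_cls = TOKENIZER_REGISTRY[model_type]
    if use_fast and fast_cls is not None:
        return fast_cls
    return slow_cls

print(auto_tokenizer_class("bert").__name__)    # BertTokenizerFast
print(auto_tokenizer_class("canine").__name__)  # CanineTokenizer
```

This is why, even with `use_fast=True` (the default), a model without a fast tokenizer silently falls back to the slow class.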
Hi @macleginn, thanks for letting us know that this behavior isn't intuitive for you! Regarding the fact that …
Do you think we should do something differently to make it clearer? Regarding the error message that you're getting, do you think it would have been clearer to have:
Hi @SaulLu,
Yes, sure. Given this message, I would realise, first, that I need to use …
Perhaps mention this in the preamble to the model list? Something along the lines of
But if you assume that the user should familiarise themselves with the params, it's okay as it is, as long as the error message points to something that can be found in the docs.
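To make the suggestion above concrete, here is a sketch of what such a pointer-to-the-docs error message could look like. This is a hypothetical helper, not transformers code; the function name, signature, and wording are made up for illustration:

```python
# Hypothetical sketch of a clearer error: when a model type has no fast
# (Rust-based) tokenizer, fail with an explicit pointer to use_fast=False.

def load_fast_tokenizer(model_type: str, has_fast: bool) -> str:
    """Pretend to load a fast tokenizer; raise a descriptive error if none exists."""
    if not has_fast:
        raise ValueError(
            f"'{model_type}' does not have a fast tokenizer implementation. "
            "Pass use_fast=False to load the Python-based tokenizer instead."
        )
    # Stand-in for actually constructing the fast tokenizer.
    return f"{model_type}-fast"

try:
    load_fast_tokenizer("canine", has_fast=False)
except ValueError as err:
    print(err)
```

The key point is that the message names the exact parameter (`use_fast`) that the user can look up in the docs.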
Hi, CANINE is a bit of a special model: it doesn't have a fast implementation since it's character-based (Rust implementations exist only for the fancy tokenization algorithms like WordPiece, BPE, etc.). I'd recommend just using …
Hello, using …
System Info
Who can help?
With transformers-4.20.1 and tokenizers-0.12.1, I get the following behaviour:
Regardless of whether this is expected or not, it is unintuitive and confusing. For example, am I even getting correct tokenisation by using a more general tokeniser class?
@SaulLu @LysandreJik
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
See above.
Expected behavior