Rust: How to handle models with precompiled_charsmap = null
#1627
Comments
The `tokenizers` Rust crate fails when loading a `tokenizer.json` whose normalizer is `Precompiled` with `precompiled_charsmap = null`. How should models with `precompiled_charsmap = null` be handled?
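For illustration, a trimmed-down sketch of the offending fragment in such a `tokenizer.json` (field names as discussed in this thread; the real file contains many more keys):

```json
{
  "normalizer": {
    "type": "Precompiled",
    "precompiled_charsmap": null
  }
}
```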
I'm seeing the same error with Python when trying to read the tokenizer from Xenova/speecht5_tts:

```shell
wget https://huggingface.co/Xenova/speecht5_tts/resolve/main/tokenizer.json
```

```python
from tokenizers import Tokenizer
Tokenizer.from_file("tokenizer.json")
```

With Tokenizers 0.19.0, this raised an error which could be handled rather than a panic. It looks like this may be related to #1604.
I'm also facing the same issue (#1645) with speecht5_tts.
I think passing a …
The Xenova implementation doesn't access the value directly; it applies iterators over the config normalizers, so I think it ignores the null values. I agree with you about adding support for it.

I've implemented spm_precompiled with null support at vicantwin/spm_precompiled, which includes a test for the null case, and all tests pass. But I need some help with changing this repository: I'm not entirely familiar with this codebase and unsure how to implement the necessary changes. Any help would be greatly appreciated.
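The pattern described above — iterating over the configured normalizers and skipping entries whose payload is null — can be sketched in plain Python. The JSON below is a hypothetical, trimmed-down `tokenizer.json`, not the real file from any of these models:

```python
import json

# Hypothetical, trimmed-down tokenizer.json content: a Sequence normalizer
# whose "Precompiled" entry carries a null charsmap, as described in the thread.
config_text = """
{
  "normalizer": {
    "type": "Sequence",
    "normalizers": [
      {"type": "Precompiled", "precompiled_charsmap": null},
      {"type": "Replace", "pattern": {"String": " "}, "content": "\\u2581"}
    ]
  }
}
"""

def usable_normalizers(config: dict) -> list:
    """Collect normalizer entries, skipping Precompiled ones with a null charsmap."""
    normalizer = config.get("normalizer") or {}
    entries = (normalizer.get("normalizers", [normalizer])
               if normalizer.get("type") == "Sequence" else [normalizer])
    return [n for n in entries
            if not (n.get("type") == "Precompiled"
                    and n.get("precompiled_charsmap") is None)]

config = json.loads(config_text)
kept = usable_normalizers(config)
print([n["type"] for n in kept])  # → ['Replace']
```

This only shows the filtering idea; the actual fix would live in the crate's serde deserialization for the `Precompiled` normalizer.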
Hi guys,

I'm currently working on supabase/edge-runtime#368, which intends to add a Rust implementation of `pipeline()`. While I was coding the `translation` task, I figured out that I can't load the `Tokenizer` instance for the Xenova/opus-mt-en-fr onnx model and the other `opus-mt-*` variants. I got the following:

I know that it occurs because their `tokenizer.json` file looks like the following:

opus-mt-en-fr:
While the expected content should be something like this:

nllb-200-distilled-600M:
Looking at the original Helsinki-NLP/opus-mt-en-fr version, I notice that there is no `tokenizer.json` file for it.

I would like to know: is `precompiled_charsmap` expected to be non-null? Is there some workaround to execute these models without changing the internal model files? How can I handle an exported `onnx` model that doesn't have a `tokenizer.json` file?
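One possible workaround for the second question is to preprocess the exported `tokenizer.json` before loading it, dropping the `Precompiled` entry whose charsmap is null. The helper below is a hypothetical sketch using only the standard library; note that removing a normalizer may change normalization behavior for some inputs, so it should be validated against the model's expected tokenization:

```python
import json
from pathlib import Path

def strip_null_precompiled(path: str) -> None:
    """Rewrite tokenizer.json in place, removing Precompiled normalizers
    whose precompiled_charsmap is null, so the file can be loaded.
    Hypothetical workaround sketch, not part of the tokenizers library."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    norm = data.get("normalizer")

    def is_null_precompiled(n: dict) -> bool:
        return (n.get("type") == "Precompiled"
                and n.get("precompiled_charsmap") is None)

    if isinstance(norm, dict):
        if norm.get("type") == "Sequence":
            # Drop only the offending entries, keep the rest of the pipeline.
            norm["normalizers"] = [n for n in norm.get("normalizers", [])
                                   if not is_null_precompiled(n)]
        elif is_null_precompiled(norm):
            # The whole normalizer is the null Precompiled entry.
            data["normalizer"] = None
    Path(path).write_text(json.dumps(data, ensure_ascii=False),
                          encoding="utf-8")
```

After rewriting the file, `Tokenizer.from_file(...)` can be tried again on the cleaned copy.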