Wav2Vec2Bert ASR Inference Support #1778
Conversation
Looking forward to this if it's implemented.
This PR requires transformers>=4.41.0, which is not yet available here, so the test is skipped via a version check. TestWav2Vec2Bert will run once the version requirement is met. @minhthuc2502, I wonder if I can upgrade transformers to 4.41.0 in the Python test requirements. Could you please let me know?
Thank you for your PR. It looks good to me.
@minhthuc2502 it looks like transformers 4.41.0 causes a conflict in test_transformers_translation, even though embed_scale is available. Any ideas?
I think it is because of some changes in transformers described in PR #1760. I see you have already applied the patch. Thanks! I'll merge this.
@minhthuc2502 Could you check when the next release is scheduled? We are looking forward to having this feature in an official release so that many other groups can benefit from it.
This PR adds Wav2Vec2Bert ASR inference to the CTranslate2 framework, improving both speed and memory usage. For inference, a Sigmoid activation function is added to compute the GLU activation, and asymmetric relative positional embedding logic is added to the Attention class. Compared to the HuggingFace implementation, the int8-quantized model shows a 12% increase in speed, a 61% reduction in GPU memory usage, and a 72% reduction in CPU memory usage when processing 300 audio files. Additionally, using an N-gram language model with pyctcdecode can further improve speech recognition accuracy. My environment is an NVIDIA GeForce RTX 2080 11GB with CUDA 12.4, torch==2.12+cu12.1, and transformers==4.42.0.
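To illustrate why a Sigmoid op is needed for GLU support, here is a minimal NumPy sketch of the gated linear unit used in Conformer-style blocks. This is an illustrative assumption, not CTranslate2's actual implementation: GLU splits the input features in half and uses the sigmoid of one half to gate the other, so the feature dimension is halved on output.

```python
import numpy as np

def sigmoid(x):
    # Sigmoid gate used inside GLU
    return 1.0 / (1.0 + np.exp(-x))

def glu(x, axis=-1):
    """GLU(a, b) = a * sigmoid(b), where a and b are the two halves of x
    along the given axis. Illustrative sketch, not CTranslate2 code."""
    a, b = np.split(x, 2, axis=axis)
    return a * sigmoid(b)

# A (batch=1, features=4) input: halves are [1, -2] and [0, 3]
x = np.array([[1.0, -2.0, 0.0, 3.0]])
y = glu(x)
print(y.shape)  # (1, 2) — the feature dimension is halved
```

Because the gate is a sigmoid rather than a hard mask, the output stays differentiable and smoothly attenuates features, which is why the inference graph needs an explicit Sigmoid activation.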