feat: grouped conv1d #1749
Conversation
@ebraraktas Thanks for your PR. Great job! I built it locally and tested it. I now observe the improved computation in the wav2vec2 model and will make a PR soon. But I found one thing to share here. For GPU with int8 quantization, whisper works as it did before, but wav2vec2 with groups greater than 1 throws the exception you anticipated in your previous PR. The fix you suggested did indeed work.
I would prefer int8 quantization in Conv1D, but an unquantized Conv1D would be OK for wav2vec2. I am still curious why the whisper model doesn't need such an unquantized Conv1D. I think both models use the same Conv1D except for groups (greater than 1) in wav2vec2. Any comments?
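To be concrete, the difference I mean is roughly this (illustrative PyTorch shapes, not the exact model configs):

```python
import torch

# Illustrative only: whisper's convolutions use the default groups=1,
# while wav2vec2's positional conv embedding uses groups > 1, so only
# wav2vec2 exercises the grouped (and, on GPU, unquantized) code path.
whisper_like = torch.nn.Conv1d(80, 384, kernel_size=3, padding=1)  # groups=1
w2v2_pos_like = torch.nn.Conv1d(768, 768, kernel_size=128, padding=64, groups=16)
```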
I don't expect any conversion errors on this branch. Can you share your conversion script? @homink
@ebraraktas I didn't hit any conversion errors with int8 quantization for either ASR model. Both conversions completed without errors, and both converted models recognized speech as expected on CPU. But wav2vec2 hit that exception during recognition on GPU. With an unquantized Conv1D, wav2vec2 worked for both conversion and recognition. I have been using a command like the following for the conversion.
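The command was along these lines (a sketch using the public converter API; the model ID and output directory are placeholders, not the exact ones I used):

```python
import ctranslate2

# Placeholder model ID and output dir; quantization="int8" matches the
# setting discussed above.
converter = ctranslate2.converters.TransformersConverter(
    "facebook/wav2vec2-large-960h-lv60-self"
)
converter.convert("wav2vec2_ct2", quantization="int8", force=True)
```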
@homink I get it. This commit must have fixed that case, too. It converts conv weights to float if the device is CUDA (see these lines). If you have a chance to debug, it would be great if you could check whether this happens. Otherwise, can you check whether the conv weights have float type?
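In Python terms, the fix-up behaves roughly like this (a paraphrase under my assumptions, not the actual C++ loader code):

```python
import numpy as np

# Rough paraphrase (assumption) of the loader behavior described above:
# on CUDA, int8 conv weights are turned back into float because no
# quantized grouped-conv kernel is available there.
def maybe_dequantize_conv(name: str, weight: np.ndarray,
                          scale: float, device: str) -> np.ndarray:
    if device == "cuda" and "conv" in name and weight.dtype == np.int8:
        return weight.astype(np.float32) * scale  # simple per-tensor scale
    return weight
```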
I printed out the variable names by adding std::cout.
The wav2vec2 model:
The whisper model:
The wav2vec2 output doesn't show anything containing 'conv'. I am not sure what keeps feat_layer or pos_conv_embed from being printed. It is also unclear why whisper prints only up to layer_3 rather than layer_23, when 0-23 are the transformer blocks in the model architecture. @ebraraktas any comments? The spec class variables used for the wav2vec2 conversion in Python look roughly like the sketch below.
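(An illustrative, trimmed sketch; only feat_layer and pos_conv_embed are the attribute names in question, everything else is assumed.)

```python
from ctranslate2.specs import common_spec, model_spec

# Illustrative sketch, not the actual spec file: the point is that
# feat_layer and pos_conv_embed are Conv1D variables of the encoder.
class Wav2Vec2EncoderSketch(model_spec.LayerSpec):
    def __init__(self, num_feat_layers: int):
        self.feat_layer = [common_spec.Conv1DSpec() for _ in range(num_feat_layers)]
        self.pos_conv_embed = common_spec.Conv1DSpec()
```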
@homink I think it is related to this condition; try removing it and checking again.
Removing that condition indeed worked. Thanks! @ebraraktas
@minhthuc2502 @vince62s @nguyendc-systran Could you review this PR and merge it? The current conv1d doesn't support groups while PyTorch's does. I guess this was not considered because the Whisper model doesn't use them. However, groups are a key component of depthwise convolution, and other speech recognition models (including the Wav2Vec2 family) use them. This PR will make more speech recognition models available in the CTranslate2 framework. I am going to make a PR so that Wav2Vec2 works more efficiently with regard to both speed and memory. Wav2Vec2-BERT is under review on my end.
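To make the depthwise case concrete, here is a minimal PyTorch illustration (assumed sizes):

```python
import torch

# Depthwise Conv1d: groups == in_channels, so each channel is filtered
# independently. This is the case the current conv1d cannot express.
depthwise = torch.nn.Conv1d(in_channels=256, out_channels=256,
                            kernel_size=3, groups=256, padding=1)
x = torch.randn(1, 256, 50)   # (batch, channels, time)
print(depthwise(x).shape)     # torch.Size([1, 256, 50])
```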
Thank you for your PR @ebraraktas. It looks good to me.
Implements grouped conv1d, as requested by @homink.
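For reference, the intended semantics (a minimal sketch using PyTorch only as a reference; sizes are arbitrary): a grouped conv1d is equivalent to convolving each channel group independently and concatenating the results.

```python
import torch

# Reference semantics: split the channels into `groups` chunks, convolve
# each chunk with its own slice of the weights, then concatenate along
# the channel dimension.
x = torch.randn(1, 6, 10)
conv = torch.nn.Conv1d(6, 6, kernel_size=3, groups=3, bias=False)
per_group = [
    torch.nn.functional.conv1d(xc, wc)
    for xc, wc in zip(x.chunk(3, dim=1), conv.weight.chunk(3, dim=0))
]
assert torch.allclose(conv(x), torch.cat(per_group, dim=1), atol=1e-6)
```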