-
There are plenty of free translation models on Hugging Face. Also, there is no single best setting for Whisper; if you are a beginner, the default settings work fine. To improve audio quality, you can try SileroVAD to remove silence and Demucs to extract the vocals; you can take inspiration from https://github.com/EtienneAb3d/WhisperHallu. As for timestamps, for now you should fix them manually.
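For intuition, here is a toy sketch of what "removing silence" means. This is NOT SileroVAD itself (that is a neural model, typically loaded via `torch.hub` from `snakers4/silero-vad`); it is just a naive energy-threshold version, with the frame length and threshold values chosen arbitrarily for illustration:

```python
# Toy illustration of silence removal (NOT SileroVAD, which is a neural
# VAD model). We simply drop fixed-size frames whose RMS energy is below
# a threshold; frame_len and threshold are arbitrary example values.

def rms(frame):
    """Root-mean-square energy of a list of float samples."""
    return (sum(s * s for s in frame) / len(frame)) ** 0.5

def drop_silent_frames(samples, frame_len=400, threshold=0.01):
    """Keep only frames whose RMS energy exceeds the threshold."""
    kept = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        if rms(frame) > threshold:
            kept.extend(frame)
    return kept
```

A real VAD makes much smarter decisions at speech boundaries, which is why a trained model like SileroVAD is worth using instead of a threshold.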
-
Thank you for your answer. I did a test and it worked fine. To add an SRT output, I added:
But I'm not sure this is the right way to do it, because it didn't output an .srt file; I copied and pasted the output instead. I would also like to know what you think about the model. From what I understand, the script uses the large model. To transcribe American English, isn't it better to use a .en model? And which one? Since I'm using Google Colab, I have no RAM or GPU worries. Can you tell me where I have to make the changes: in the 30-line script or in the file transcribeHallu.py? And is it possible to translate the .srt file directly into French? How do I translate an .srt file without losing the timecodes?
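On getting a real .srt file: Whisper's `transcribe()` returns a result dict whose `"segments"` list carries `start`, `end` (in seconds) and `text`, so one common approach is to format those segments yourself. A minimal sketch (the output filename is just an example):

```python
# Minimal sketch: turn Whisper-style segments into SRT text.
# Each segment dict has "start" and "end" in seconds, plus "text".

def srt_timestamp(seconds):
    """Format seconds as the SRT timecode HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Build the full SRT document from a list of segment dicts."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Usage, assuming `result` came from model.transcribe(...):
# with open("output.srt", "w", encoding="utf-8") as f:
#     f.write(segments_to_srt(result["segments"]))
```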
-
Hi, I'm a user and tinkerer, but not particularly experienced.
Thank you for reading this.
I started using Whisper because I was following a training course on YouTube in American English, and I'm French with a simple background. So I first tried Whisper on my PC, but it was long, very long... each video lasts an hour on average.
That's when I discovered Google Colab, with the GPU.
(Tell me if that's a good choice, or whether I should look into Jupyter Notebook or something else...)
I then found a ready-made notebook (whisper_youtube.ipynb from ArthurFDLR on GitHub). I started using it, then tried to modify it by adding yt-dlp, then ffmpeg, etc., but my skills are limited, as I said.
My project is to translate each video and then add the new subtitles in French. I guess .SRT is the right format? Any other ideas?
The second part would be to use a TTS engine to replace the soundtrack. What would be great is if I could train a voice of my choice.
I know that one of the longest parts of the project is the translation; I don't want to use the translation provided by Google or DeepL without proofreading it.
So I figured that to get the best possible translation, the video's soundtrack must be of the best quality and the Whisper settings must be as good as possible. I tried with FP16 and without, but saw no difference; when I increased "Number of beams [...] is zero", differences appeared, not in the transcription but in the timestamps.
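For reference, the settings mentioned above (FP16, beam size) are passed as keyword arguments to `model.transcribe()` in openai-whisper. The values below are illustrative assumptions, not recommendations; the point is only to show where each knob lives:

```python
# Sketch of where Whisper decoding settings go (values are illustrative,
# not tuned recommendations). In openai-whisper, model.transcribe()
# forwards keyword arguments such as fp16 and beam_size to the decoder.

options = {
    "language": "en",     # source language; skips auto-detection
    "task": "transcribe", # use "translate" for Whisper's built-in to-English mode
    "fp16": True,         # half precision; mainly a speed/memory choice on GPU
    "beam_size": 5,       # beam search width; values > 1 enable beam search
}

# import whisper                      # pip install openai-whisper
# model = whisper.load_model("large") # or e.g. "medium.en" for English-only audio
# result = model.transcribe("audio.mp3", **options)
```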
So I would like to ask where you would advise me to read to understand, in simple language, how each setting works.
If you have any work or ideas to share, I'm interested.
I have other questions about timestamps, silences, etc., which may not be the same in each language. If this project interests you, don't hesitate to contact me privately; I have a semi-private Discord where I try to store as much info as possible and work collaboratively.
I also tried using DeepL's API, but it costs me €1 per document translation, which isn't cheap. That's a lot of questions, but I'm motivated.
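One way around per-document pricing, and around losing timecodes, is to translate only the text lines of the .srt yourself and leave the index and timecode lines untouched. In the sketch below, `translate` is a hypothetical placeholder for whatever text-translation call you choose (DeepL's text endpoint, for example, is billed per character rather than per document):

```python
# Sketch: translate an .srt while preserving block numbers and timecodes.
# `translate` is a placeholder callable (str -> str); plug in any
# text-translation API of your choice.

def translate_srt(srt_text, translate):
    out_lines = []
    for line in srt_text.splitlines():
        stripped = line.strip()
        # Keep block numbers, timecode lines, and blank lines as-is.
        if not stripped or stripped.isdigit() or "-->" in stripped:
            out_lines.append(line)
        else:
            out_lines.append(translate(line))
    return "\n".join(out_lines)
```

Because the timecode lines pass through untouched, the subtitles stay synchronized regardless of what the translation does to the text.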