Process with a VTT or SRT in realtime or not #140
Comments
https://github.com/KoljaB/TurnVoice/blob/main/turnvoice%2Fcore%2Fsynthesis.py#L272 This does something very similar.
Well, even if it's not realtime, it will already help a lot ;). I'm working on it now, but my biggest issue is getting a dummy audio device working, since my computer does not have a sound card.
I'd parse the file's timestamps to get each segment's duration and pass that as the desired_duration parameter to the synthesize_duration method, so the text is spoken in the correct amount of time. Fill the parts where nothing is spoken with silence and you're good, I guess.
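A minimal sketch of the parsing step described above, using only the Python standard library. The names `Cue`, `parse_cues` and `build_timeline` are hypothetical and not part of RealtimeTTS or TurnVoice. It assumes timestamps always include the hours field (`HH:MM:SS,mmm` or `HH:MM:SS.mmm`) and that cues do not overlap. Each "speech" entry carries the duration that would be handed to the synthesis step as the desired duration, and each "silence" entry is a gap to pad.

```python
import re
from dataclasses import dataclass

# Matches "HH:MM:SS,mmm --> HH:MM:SS,mmm" (SRT) and "HH:MM:SS.mmm --> ..." (VTT).
# Assumption: the hours field is always present.
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds from the start of the file
    end: float    # seconds from the start of the file
    text: str

    @property
    def duration(self) -> float:
        return self.end - self.start


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_cues(subtitle_text: str) -> list:
    """Extract cues from SRT/VTT text; blocks without a timing line are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", subtitle_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is the spoken text.
                text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues


def build_timeline(cues: list) -> list:
    """Return (kind, duration_in_seconds, text) tuples, kind is 'silence' or 'speech'."""
    timeline = []
    cursor = 0.0
    for cue in cues:
        gap = cue.start - cursor
        if gap > 0:
            timeline.append(("silence", gap, ""))
        timeline.append(("speech", cue.duration, cue.text))
        cursor = max(cursor, cue.end)
    return timeline
```

Walking the resulting timeline in order, one would synthesize each "speech" entry to its target duration and emit the given amount of silence for each "silence" entry.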
It's hard to make this realtime. Because the final duration of the synthesis is unknown beforehand (especially with neural TTS engines, whose output is nondeterministic), we run a test synthesis, measure the duration of the result, and apply a speed correction factor afterwards. So we stretch the audio in place. But we need the full audio generated to do this, which is far from realtime.
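The arithmetic behind such a speed correction can be sketched roughly as follows. This is not the TurnVoice implementation: the function names are hypothetical, and the clamping range (0.5 to 2.0) is an assumption added here so that extreme stretching is avoided. A factor above 1 means the audio must be played faster to fit the slot.

```python
def speed_factor(measured: float, desired: float,
                 min_factor: float = 0.5, max_factor: float = 2.0) -> float:
    """Playback-speed multiplier that maps `measured` seconds onto `desired` seconds.

    The clamp limits are an illustrative assumption, not values from any library.
    """
    if measured <= 0 or desired <= 0:
        raise ValueError("durations must be positive")
    return min(max(measured / desired, min_factor), max_factor)


def plan_fit(measured: float, desired: float) -> tuple:
    """Return (factor, stretched_duration, silence_padding, overflow) in seconds."""
    factor = speed_factor(measured, desired)
    stretched = measured / factor
    # If the clamp stopped us short, pad the rest of the slot with silence.
    padding = max(desired - stretched, 0.0)
    # If the clamp left the audio too long, report how far it runs over.
    overflow = max(stretched - desired, 0.0)
    return factor, stretched, padding, overflow
```

For example, a test synthesis measuring 3.0 s for a 2.0 s slot gives a factor of 1.5. The measured duration is only known once the whole clip has been generated, which is why this step cannot run in realtime.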
Oh my! Sorry, I just realized the link you sent is to another repo. TurnVoice is already a very good start indeed!
@KoljaB I opened a new discussion on the TurnVoice repo about VTT/SRT import, as I think it's a better place for an option to import SRT/VTT rather than video/audio, then bypass STT and translation and keep TTS as the only process.
It would be fantastic to use RealtimeTTS with a VTT or SRT file (or other subtitle formats) so that the engine respects the start time of each segment. That way we could have a direct audio translation, either as realtime audio or recorded to an audio file (AAC, WAV or MP3, for example).
Unless it's already possible to do it?