.tflite files support #41
After the Mozilla layoffs, the DeepSpeech team forked the DeepSpeech repo and founded the company Coqui AI (https://github.com/coqui-ai/STT), where they continue development. As far as I know, they now only export models as .tflite files. In theory these should work with the old code, but for me they didn't.
When I try to run it like this:
```
python3 autosub/main.py --file /Users/sgrotz/Downloads/kp193-hejma-auxtomatigo.mp3 --split-duration 8
```
with a .tflite file in the main folder and no language model, I get:
AutoSub
Have I done something wrong here, or does AutoSub not support .tflite files?
I tested it on macOS and installed ffmpeg via Homebrew.
Comments
Update: adding
Same behavior on Colab, so it is not a macOS issue.
Hi
Thanks for the answer :) But it still doesn't work. This is how I installed everything on Colab:
It works now; the trick was not to use a venv and to remove
Thanks for your help with my beginner problem :)
Hi @stefangrotz, thanks for documenting your experiences! Here's an updated recipe for those wanting to use Coqui models with Docker:
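(A minimal sketch of what such a setup could look like, assuming the pip-installable `stt` package and a model pulled from the Coqui model zoo at https://coqui.ai/models; the exact recipe may differ:)
```python
# Inside, e.g., a python:3.9 container or a Colab runtime:
#   pip install stt
#   (plus ffmpeg for AutoSub itself)
# with a .tflite model from the Coqui model zoo in the working directory.
import stt

model = stt.Model("model.tflite")  # Coqui STT 1.x loads .tflite directly
print(model.sampleRate())          # models expect 16 kHz mono 16-bit PCM
```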
One question for @stefangrotz @abhirooptalasila - the instructions above are just using the newer Coqui models with the existing DeepSpeech application, right? Would there be an advantage to using the STT toolkit instead of DeepSpeech? If so, any thoughts on what updating AutoSub to use it would look like?
Hi
I do not have statistics, but I would assume Coqui is better than DeepSpeech, and Coqui comes with a wide variety of language models. Coqui would be the simplest way to expand the functionality of AutoSub. Edit: wav2vec-U, wav2vec 2.0, and NeMo look good too. It would be great if the AutoSub user could pick from any of these backends.
I would add Vosk to the list; it works very well and has an SRT creation script out of the box. But to keep things simple, I would say switching to Coqui might be a good first step, since it is actively supported by a company while DeepSpeech has been abandoned by Mozilla.
I gave Coqui STT a try.
It seems to work as a drop-in replacement.
There is more to do, of course, to make the switch, but it works conceptually. Edit: It completed, and I was able to compare the transcripts between the default DeepSpeech 0.9.3 and Coqui STT 1.0.0. Coqui STT was more accurate with complex words, and they were about the same with one-syllable words. Overall worth upgrading.
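For anyone curious, the swap is essentially a one-line import change; a minimal sketch, assuming the `stt` pip package and placeholder file names:
```python
import wave

import numpy as np
# Before the switch it was: from deepspeech import Model
from stt import Model  # Coqui STT exposes the same Python API

model = Model("model.tflite")               # DeepSpeech took a .pbmm here
model.enableExternalScorer("kenlm.scorer")  # optional external scorer

# Both engines expect 16 kHz, mono, 16-bit PCM audio.
with wave.open("audio_16khz_mono.wav", "rb") as wav:
    audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

print(model.stt(audio))  # plain-text transcript
```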
Vosk includes an example Python script to generate an SRT file. I got that to work too.
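A hedged sketch along the lines of Vosk's bundled SRT example, assuming `pip install vosk srt` and a model directory downloaded from https://alphacephei.com/vosk/models (file names are placeholders):
```python
import datetime
import json
import wave

import srt
from vosk import KaldiRecognizer, Model

wf = wave.open("audio_16khz_mono.wav", "rb")
rec = KaldiRecognizer(Model("vosk-model-small-en-us-0.15"), wf.getframerate())
rec.SetWords(True)  # ask for per-word timestamps

# Feed the audio in chunks and collect the recognizer's JSON results.
results = []
while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        results.append(json.loads(rec.Result()))
results.append(json.loads(rec.FinalResult()))

# Turn each result's word timings into one subtitle entry.
subs = []
for res in results:
    words = res.get("result", [])
    if not words:
        continue
    subs.append(srt.Subtitle(
        index=len(subs) + 1,
        start=datetime.timedelta(seconds=words[0]["start"]),
        end=datetime.timedelta(seconds=words[-1]["end"]),
        content=res["text"],
    ))

print(srt.compose(subs))
```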
Have some free time right now, so I will add Coqui support as a starter.
- By default, Coqui will be used for inference, with an option to switch to DeepSpeech
- Coqui supports .tflite models out-of-the-box, whereas DeepSpeech needs a different package. Refer #41
- English models will be automatically downloaded if run without the model argument
- Updated README and requirements.txt to reflect changes
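Roughly, the backend switch might look like the following; a hypothetical sketch, where the flag value and function name are illustrative, not AutoSub's actual API:
```python
# Hypothetical sketch of the engine switch; names are illustrative.
def load_model(engine, model_path):
    """Load an STT model for the chosen backend."""
    if engine == "ds":
        from deepspeech import Model  # needs the deepspeech package
    else:
        from stt import Model         # Coqui STT, the new default
    return Model(model_path)
```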
@stefangrotz @TechnologyClassroom Can you check the changes I pushed?
@abhirooptalasila It mostly looks good to me.
Another little thing in the doc: DeepSpeech can also use .tflite instead of .pbmm, depending on how it is configured, and this is how I tested DeepSpeech.
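If memory serves, that configuration relies on the alternative PyPI build; a small sketch, assuming deepspeech-tflite 0.9.3 and its released model file:
```python
# pip install deepspeech-tflite   # instead of: pip install deepspeech
# The module name is unchanged, but this build loads .tflite models.
from deepspeech import Model

model = Model("deepspeech-0.9.3-models.tflite")
```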
I could be wrong, but I think the imports need
Shouldn't happen, as I updated the requirements file as well.
That's true, but users would typically only need one or the other. Their requirements are likely to diverge in the future as Coqui STT continues to develop.