.tflite files support #41
After the Mozilla layoffs, the DeepSpeech team forked the DeepSpeech repo and founded the company Coqui AI (https://github.com/coqui-ai/STT), where they continue development. As far as I know, they now only export models as .tflite files. In theory these should work with the old code, but for me they didn't.
When I try to run it like this:
```
python3 autosub/main.py --file /Users/sgrotz/Downloads/kp193-hejma-auxtomatigo.mp3 --split-duration 8
```
with a .tflite file in the main folder and no language model, I get:
AutoSub
Have I done something wrong here, or does AutoSub not support .tflite files?
I tested it on macOS and installed ffmpeg via Homebrew.
Comments
Update: adding
Same behavior on Colab, so it is not a macOS issue.
Hi
Thanks for the answer :) But it still doesn't work. This is how I installed everything on Colab:
It works now; the trick was not to use a venv and to remove
Thanks for your help with my beginner problem :)
Hi @stefangrotz, thanks for documenting your experiences! Here's an updated recipe for those wanting to use Coqui models with Docker:
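(A minimal sketch of what such a setup could look like, assuming the pip-installable `stt` package and a model pulled from the Coqui model zoo at https://coqui.ai/models; the exact recipe may differ:)
```python
# Inside, e.g., a python:3.9 container or a Colab runtime:
#   pip install stt
#   (plus ffmpeg for AutoSub itself)
# with a .tflite model from the Coqui model zoo in the working directory.
import stt

model = stt.Model("model.tflite")  # Coqui STT 1.x loads .tflite directly
print(model.sampleRate())          # models expect 16 kHz mono 16-bit PCM
```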
One question for @stefangrotz @abhirooptalasila - the instructions above are just using the newer Coqui models with the existing DeepSpeech application, right? Would there be an advantage to using the STT toolkit instead of DeepSpeech? If so, any thoughts on what updating AutoSub to use it would look like?
Hi
I do not have statistics, but I would assume Coqui is better than DeepSpeech, and Coqui comes with a wide variety of language models. Coqui would be the simplest way to expand the functionality of AutoSub. Edit: wav2vec-U, wav2vec 2.0, and NeMo look good too. It would be great if the AutoSub user could pick from any of these backends.
I would add Vosk to the list; it works very well and has an SRT creation script out of the box. But to keep things simple, I would say switching to Coqui might be a good first step, since it is actively supported by a company while DeepSpeech has been abandoned by Mozilla.
I gave Coqui STT a try.
It seems to work as a drop-in replacement.
There is more to do, of course, to make the switch, but it works conceptually. Edit: It completed, and I was able to compare the transcripts between the default DeepSpeech 0.9.3 and Coqui STT 1.0.0. Coqui STT was more accurate with complex words, and they were about the same with one-syllable words. Overall worth upgrading.
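For anyone curious, the swap is essentially a one-line import change; a minimal sketch, assuming the `stt` pip package and placeholder file names:
```python
import wave

import numpy as np
# Before the switch it was: from deepspeech import Model
from stt import Model  # Coqui STT exposes the same Python API

model = Model("model.tflite")               # DeepSpeech took a .pbmm here
model.enableExternalScorer("kenlm.scorer")  # optional external scorer

# Both engines expect 16 kHz, mono, 16-bit PCM audio.
with wave.open("audio_16khz_mono.wav", "rb") as wav:
    audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

print(model.stt(audio))  # plain-text transcript
```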
Vosk includes an example Python script to generate an SRT file. I got that to work too.
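A hedged sketch along the lines of Vosk's bundled SRT example, assuming `pip install vosk srt` and a model directory downloaded from https://alphacephei.com/vosk/models (file names are placeholders):
```python
import datetime
import json
import wave

import srt
from vosk import KaldiRecognizer, Model

wf = wave.open("audio_16khz_mono.wav", "rb")
rec = KaldiRecognizer(Model("vosk-model-small-en-us-0.15"), wf.getframerate())
rec.SetWords(True)  # ask for per-word timestamps

# Feed the audio in chunks and collect the recognizer's JSON results.
results = []
while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        results.append(json.loads(rec.Result()))
results.append(json.loads(rec.FinalResult()))

# Turn each result's word timings into one subtitle entry.
subs = []
for res in results:
    words = res.get("result", [])
    if not words:
        continue
    subs.append(srt.Subtitle(
        index=len(subs) + 1,
        start=datetime.timedelta(seconds=words[0]["start"]),
        end=datetime.timedelta(seconds=words[-1]["end"]),
        content=res["text"],
    ))

print(srt.compose(subs))
```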
Have some free time right now, so I will add Coqui support as a starter.
- By default, Coqui will be used for inference, with an option to switch to DeepSpeech
- Coqui supports .tflite models out-of-the-box, whereas DeepSpeech needs a different package. Refer #41
- English models will be automatically downloaded if run without the model argument
- Updated README and requirements.txt to reflect changes
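Roughly, the backend switch might look like the following; a hypothetical sketch, where the flag value and function name are illustrative, not AutoSub's actual API:
```python
# Hypothetical sketch of the engine switch; names are illustrative.
def load_model(engine, model_path):
    """Load an STT model for the chosen backend."""
    if engine == "ds":
        from deepspeech import Model  # needs the deepspeech package
    else:
        from stt import Model         # Coqui STT, the new default
    return Model(model_path)
```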
@stefangrotz @TechnologyClassroom Can you check the changes I pushed?
@abhirooptalasila It mostly looks good to me.
Another little thing in the doc: DeepSpeech can also use .tflite instead of .pbmm, depending on how it is configured, and this is how I tested DeepSpeech.
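If memory serves, that configuration relies on the alternative PyPI build; a small sketch, assuming deepspeech-tflite 0.9.3 and its released model file:
```python
# pip install deepspeech-tflite   # instead of: pip install deepspeech
# The module name is unchanged, but this build loads .tflite models.
from deepspeech import Model

model = Model("deepspeech-0.9.3-models.tflite")
```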
I could be wrong, but I think the imports need
Shouldn't happen, as I updated the requirements file as well.
That's true, but users would typically only need one or the other. Their requirements are likely to diverge in the future as Coqui STT continues to develop.