
.tflite files support #41

Closed · stefangrotz opened this issue Nov 14, 2021 · 20 comments

@stefangrotz

stefangrotz commented Nov 14, 2021

After the Mozilla layoffs, the DeepSpeech team forked the DeepSpeech repo and founded the company Coqui AI (https://github.com/coqui-ai/STT), where they continue development. AFAIK they now only export models as .tflite files. In theory it should work with the old code, but for me it didn't.

When I try to run it like this:

python3 autosub/main.py --file /Users/sgrotz/Downloads/kp193-hejma-auxtomatigo.mp3 --split-duration 8

with a .tflite file in the main folder and NO language model.

Then I get:

AutoSub

['autosub/main.py', '--file', '/Users/sgrotz/Downloads/kp193-hejma-auxtomatigo.mp3', '--split-duration', '8']
ARGS: Namespace(dry_run=False, file='/Users/sgrotz/Downloads/kp193-hejma-auxtomatigo.mp3', format=['srt', 'vtt', 'txt'], model=None, scorer=None, split_duration=8.0)
Warning no models specified via --model and none found in local directory. Please run getmodel.sh convenience script from autosub repo to get some.
Error: Must have pbmm model. Exiting

Have I done something wrong here, or does AutoSub not support .tflite files?

I tested this on macOS, with ffmpeg installed via Homebrew.

@stefangrotz changed the title from "Support .tflite files" to ".tflite files support" on Nov 14, 2021
@stefangrotz
Author

stefangrotz commented Nov 14, 2021

Update: adding --model output_graph.tflite works at first, but then this happens:

AutoSub

['autosub/main.py', '--file', '/Users/sgrotz/Downloads/kp193-hejma-auxtomatigo.mp3', '--model', 'output_graph.tflite']
ARGS: Namespace(dry_run=False, file='/Users/sgrotz/Downloads/kp193-hejma-auxtomatigo.mp3', format=['srt', 'vtt', 'txt'], model='output_graph.tflite', scorer=None, split_duration=5)
model: /Users/sgrotz/Downloads/AutoSub/output_graph.tflite
Warning no scorers specified via --scorer and none found in local directory. Please run getmodel.sh convenience script from autosub repo to get some.
scorer:

Input file: /Users/sgrotz/Downloads/kp193-hejma-auxtomatigo.mp3
Creating file: /Users/sgrotz/Downloads/AutoSub/output/kp193-hejma-auxtomatigo.srt
Creating file: /Users/sgrotz/Downloads/AutoSub/output/kp193-hejma-auxtomatigo.vtt
Creating file: /Users/sgrotz/Downloads/AutoSub/output/kp193-hejma-auxtomatigo.txt
Extracted audio to audio/kp193-hejma-auxtomatigo.wav
Splitting on silent parts in audio file

Running inference:
TensorFlow: v2.3.0-6-g23ad988fcd
DeepSpeech: v0.9.3-0-gf2e9c858
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2021-11-14 19:04:06.298928: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Data loss: Can't parse /Users/sgrotz/Downloads/AutoSub/output_graph.tflite as binary proto
Invalid model file. Exiting

@stefangrotz
Author

Same behavior on Colab, so it is not a macOS issue.

@abhirooptalasila
Owner

Hi
There is a separate DeepSpeech package for .tflite models. You can install it via:
$ pip install --user --upgrade deepspeech-tflite
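
A minimal sketch to verify the package can load and run a .tflite model (file names here are placeholders; assumes a 16 kHz mono WAV, which is what DeepSpeech expects):

# quick smoke test for deepspeech-tflite; "sample.wav" is a hypothetical test clip
import wave
import numpy as np
from deepspeech import Model  # same import name; deepspeech-tflite replaces the wheel

ds = Model("output_graph.tflite")
with wave.open("sample.wav", "rb") as w:
    audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
print(ds.stt(audio))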

@stefangrotz
Author

Thanks for the answer :) But it still doesn't work. This is how I installed everything on Colab:


%cd /content/AutoSub
!python3 -m venv sub
!source sub/bin/activate
!pip3 install -r requirements.txt
!pip3 install --user --upgrade deepspeech-tflite

@stefangrotz
Author

stefangrotz commented Nov 15, 2021

It works now. The trick was not to use a venv and to drop --user --upgrade (in Colab each ! line runs in its own shell, so the venv activation doesn't persist anyway):


%cd /content/AutoSub
#!python3 -m venv sub
#!source sub/bin/activate
!pip3 install -r requirements.txt
!pip install deepspeech-tflite 

Thanks for your help with my beginner problem :)

@mattdsteele

Hi @stefangrotz, thanks for documenting your experiences! Here's an updated recipe for those wanting to use Coqui models with Docker:

  • Download the .pbmm and .tflite files from Coqui (e.g. https://coqui.ai/english/coqui/v0.9.3)
  • Add RUN pip3 install --user --upgrade deepspeech-tflite to the Dockerfile
  • Add COPY *.tflite ./ to the Dockerfile
  • Rebuild the container, and add --model model.tflite when starting

@mattdsteele

One question for @stefangrotz @abhirooptalasila: the instructions above just use the newer Coqui models with the existing DeepSpeech application, right? Would there be an advantage to using the STT toolkit instead of DeepSpeech? If so, any thoughts on what updating AutoSub to use it would look like?

@TechnologyClassroom

Since Coqui STT is the continuation of DeepSpeech, it seems very similar to implement. I believe the process is to install stt==1.0.0, rename the deepspeech import to stt in the Python bindings, and point to the new model and scorer.
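
For illustration, the binding swap in Python is essentially just the import (model and scorer file names below are placeholders):

# Coqui STT keeps the DeepSpeech Python API; only the package name changes
from stt import Model  # was: from deepspeech import Model

model = Model("model.tflite")
model.enableExternalScorer("huge-vocabulary.scorer")  # optional external scorer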

@abhirooptalasila
Owner

Hi
I was planning on implementing either Wav2Vec or NeMo as an addition to DeepSpeech (#44).
I'm not sure of the performance differences between Coqui, DeepSpeech, and those two models. If you know, please let me know. The last time I tried Wav2Vec, its accuracy was much better than DeepSpeech's.
Coqui is not hard to implement, as the codebase is similar to DeepSpeech.

@TechnologyClassroom

TechnologyClassroom commented Feb 1, 2022

I do not have statistics, but I would assume Coqui is better than DeepSpeech, and Coqui comes with models for a wide variety of languages. Coqui would be the simplest way to expand the functionality of AutoSub.

Edit: wav2vec-U, wav2vec 2.0, and NeMo look good too. It would be great if AutoSub users could pick from any of these backends.

@stefangrotz
Author

I would add Vosk to the list: it works very well and has an SRT creation script out of the box.

But to keep things simple, I would say switching to Coqui might be a good first step, since it is actively supported by a company while DeepSpeech has been abandoned by Mozilla.

@TechnologyClassroom

TechnologyClassroom commented Feb 1, 2022

I gave Coqui STT a try.

sed -i 's/deepspeech/stt/g' autosub/utils.py
python3 autosub/main.py --model ~/coquistt/models/model.tflite --scorer ~/coquistt/models/huge-vocabulary.scorer --file ~/coquistt/example.webm

It seems to work as a drop-in replacement.

Splitting on silent parts in audio file

Running inference:
TensorFlow: v2.3.0-14-g4bdd3955115
 Coqui STT: v1.0.0-0-g27584037
  7%|███████▎                                                                                              | 20/280 [01:43<12:41,  2.93s/it]

There is more to do, of course, to make the switch, but it looks like it works conceptually.

Edit: It completed, and I was able to compare transcripts between default DeepSpeech 0.9.3 and Coqui STT 1.0.0. Coqui STT was more accurate with complex words, and they were about the same with one-syllable words. Overall, worth upgrading.

@stefangrotz reopened this on Feb 2, 2022
@TechnologyClassroom

Vosk includes an example Python script to generate an SRT file. I got that to work too.

pip3 install vosk
git clone https://github.com/alphacep/vosk-api
cd vosk-api/python/example
wget https://alphacephei.com/kaldi/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
mv vosk-model-small-en-us-0.15 model
python3 test_srt.py test.webm > test.srt
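
For reference, the core recognition loop that script builds on looks roughly like this (a sketch; assumes a 16 kHz mono WAV instead of the webm above):

# minimal Vosk sketch; "model" is the unzipped model directory from above
import json
import wave
from vosk import Model, KaldiRecognizer

model = Model("model")
with wave.open("test.wav", "rb") as w:
    rec = KaldiRecognizer(model, w.getframerate())
    while True:
        data = w.readframes(4000)
        if not data:
            break
        rec.AcceptWaveform(data)
print(json.loads(rec.FinalResult())["text"])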

@abhirooptalasila
Owner

Have some free time right now, so I will add Coqui support as a starter.

abhirooptalasila added a commit that referenced this issue Feb 2, 2022
- By default, Coqui will be used for inference, with an option to switch to DeepSpeech
- Coqui supports .tflite models out-of-the-box, whereas DeepSpeech needs a different package. Refer #41
- English models will be automatically downloaded if run without the model argument
- Updated README and requirements.txt to reflect changes
@abhirooptalasila
Owner

@stefangrotz @TechnologyClassroom

Can you check the changes I pushed?
Cleaned up some stuff.

@TechnologyClassroom

@abhirooptalasila It mostly looks good to me.

autosub/utils.py in 40bb833#diff-3a061d9e61ec5b9193e9d3b28ac973b27de1957273274863128833dfb99d923b might have some issues.

  • Lines 10-11 import both deepspeech and stt so both would be required.
  • Lines 20 and 21 have a version mismatch. Line 20 should be "model": "https://github.com/coqui-ai/STT-models/releases/download/english/coqui/v1.0.0/model.tflite",

Another small thing in the docs: DeepSpeech can also use .tflite instead of .pbmm, depending on how it is configured; this is how I tested DeepSpeech.

@abhirooptalasila
Owner

I'm importing both packages. By default, Coqui will be used, and the user can switch to DeepSpeech if needed. Check here.
Will update that link!

I've added a line at the end of this section which points to this thread.

@TechnologyClassroom

I could be wrong, but I think the imports need a try/except as well, or it will error.
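
Something like this guarded import would let either backend be absent (a sketch, not necessarily how the repo structures it):

# import whichever backend is installed; Coqui is the default
try:
    from stt import Model  # Coqui STT
except ImportError:
    from deepspeech import Model  # fall back to DeepSpeech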

@abhirooptalasila
Owner

Shouldn't happen, as I updated the requirements file as well.
Can I close this thread?

@TechnologyClassroom

That's true, but users would typically only need one or the other, and the two packages' requirements are likely to diverge as Coqui STT continues to develop.
