RNN-Transducer Speech Recognition

End-to-end speech recognition using an RNN-Transducer in TensorFlow 2

Overview

This speech recognition model is based on Google's research paper "Streaming End-to-end Speech Recognition For Mobile Devices" and is implemented in Python 3 using TensorFlow 2.

Set Up Your Environment

To set up your environment, run the following commands:

git clone --recurse-submodules https://github.com/noahchalifour/rnnt-speech-recognition.git
cd rnnt-speech-recognition
pip install tensorflow==2.2.0 # or tensorflow-gpu==2.2.0 for GPU support
pip install -r requirements.txt
./scripts/build_rnnt.sh # builds the RNN-T loss
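
After running the setup commands, a quick check can confirm that the pieces this README relies on are actually available. The sketch below is not part of the repository; the function name find_missing is made up for illustration, and the lists passed to it (tensorflow, ffmpeg) are simply the dependencies named in this README.

```python
import importlib.util
import shutil


def find_missing(modules, tools):
    """Return (missing Python modules, missing executables).

    Only top-level module names are supported here.
    """
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    missing_tools = [t for t in tools if shutil.which(t) is None]
    return missing_modules, missing_tools


if __name__ == "__main__":
    # tensorflow is installed by the pip command; ffmpeg is needed for MP3 conversion.
    modules, tools = find_missing(["tensorflow"], ["ffmpeg"])
    print("missing Python modules:", modules or "none")
    print("missing executables:", tools or "none")
```

An empty result for both lists suggests the environment is ready for the dataset steps that follow.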

Common Voice

You can download the Common Voice dataset from the Mozilla Common Voice website.

Convert all MP3s to WAVs

Before you can train a model on the Common Voice dataset, you must convert all of its MP3 audio files to WAV. Do so by running the following commands:

NOTE: Make sure ffmpeg is installed on your computer, because the conversion script uses it to convert MP3 to WAV.

./scripts/common_voice_convert.sh <data_dir> <# of threads>
python scripts/remove_missing_samples.py \
    --data_dir <data_dir> \
    --replace_old
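
For readers who want to see the idea behind these two steps, here is a rough Python analogue. It is a sketch under assumptions, not the logic of the repository's scripts: the 16 kHz mono output format, the function names, and the use of a "path" column in the Common Voice TSV files are all assumptions made for illustration.

```python
import csv
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

SAMPLE_RATE = 16000  # assumption: not taken from the repository's script


def ffmpeg_command(mp3_path, sample_rate=SAMPLE_RATE):
    """Build the ffmpeg call that writes a mono WAV next to the MP3."""
    mp3_path = Path(mp3_path)
    wav_path = mp3_path.with_suffix(".wav")
    return ["ffmpeg", "-y", "-i", str(mp3_path),
            "-ar", str(sample_rate), "-ac", "1", str(wav_path)]


def convert_all(data_dir, threads=4):
    """Convert every MP3 under data_dir in parallel; return the inputs that failed."""
    commands = [ffmpeg_command(p) for p in sorted(Path(data_dir).rglob("*.mp3"))]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(
            lambda cmd: subprocess.run(cmd, capture_output=True), commands))
    # Index 3 of each command is the input MP3 path.
    return [cmd[3] for cmd, res in zip(commands, results) if res.returncode != 0]


def rows_with_audio(tsv_path, clips_dir):
    """Keep only the TSV rows whose converted WAV file exists in clips_dir."""
    with open(tsv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE))
    return [r for r in rows
            if (Path(clips_dir) / r["path"]).with_suffix(".wav").exists()]
```

Calling convert_all("<data_dir>", threads=8) followed by rows_with_audio on each TSV file mirrors the order of the two commands above: convert first, then drop the samples whose audio is missing.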

Preprocessing dataset

After converting the MP3s to WAVs, preprocess the dataset by running the following command:

python preprocess_common_voice.py \
    --data_dir <data_dir> \
    --output_dir <preprocessed_dir>
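
Before running the preprocessing command, it can help to confirm that the converted WAV files are readable. The sketch below uses only Python's standard wave module; it is not part of the repository, and the function names are made up for illustration. It reports the sample rate, channel count, and duration of each file, or the error raised while opening it.

```python
import wave
from pathlib import Path


def wav_info(path):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as w:
        rate = w.getframerate()
        return rate, w.getnchannels(), w.getnframes() / rate


def summarize(data_dir):
    """Map each WAV under data_dir to its info, or to the error raised while reading it."""
    report = {}
    for path in sorted(Path(data_dir).rglob("*.wav")):
        try:
            report[str(path)] = wav_info(path)
        except (wave.Error, EOFError) as exc:
            # Truncated or non-WAV files end up here instead of stopping the scan.
            report[str(path)] = exc
    return report
```

Files that map to an exception are good candidates for reconversion before preprocessing.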

Training a model

To train a simple model, run the following command:

python run_rnnt.py \
    --mode train \
    --data_dir <path to data directory>
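
Training minimizes the RNN-T loss that the setup step builds. To show what that loss measures, here is a small pure-Python sketch of the standard RNN-Transducer forward recursion. It is an illustration only, not the implementation this repository uses; the function names and the choice of index 0 for the blank symbol are assumptions.

```python
import math


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def rnnt_neg_log_likelihood(log_probs, labels, blank=0):
    """Negative log-likelihood of labels under an RNN-T output lattice.

    log_probs[t][u][k] is the log-probability of emitting symbol k at
    audio frame t after u labels have been emitted (shape T x (U+1) x V).
    """
    T, U = len(log_probs), len(labels)
    alpha = [[-math.inf] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t > 0:  # reached by emitting blank at frame t-1
                alpha[t][u] = logaddexp(
                    alpha[t][u], alpha[t - 1][u] + log_probs[t - 1][u][blank])
            if u > 0:  # reached by emitting label u-1 at frame t
                alpha[t][u] = logaddexp(
                    alpha[t][u], alpha[t][u - 1] + log_probs[t][u - 1][labels[u - 1]])
    # Every alignment ends with a final blank from the last lattice node.
    return -(alpha[T - 1][U] + log_probs[T - 1][U][blank])
```

The recursion sums the probability of every alignment between audio frames and output labels, which is why the loss needs no frame-level labels during training.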

Pretrained Model

I don't have the funds to train a high-quality model. If you are willing to train one, send it to me and I will publish it here and give you credit. ([email protected])
