A PyTorch implementation of automatic speech recognition, inspired by the papers Listen, Attend and Spell and Attention Is All You Need.
- Trained on LibriSpeech
- Encoder-decoder architecture with attention
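The attention linking decoder to encoder is plain scaled dot-product attention. A minimal sketch (function name and shapes are illustrative, not taken from the repo):

```python
import torch
import torch.nn.functional as F

def dot_product_attention(queries, keys, values):
    """Scaled dot-product attention: queries attend over keys/values.

    queries: (batch, q_len, d); keys, values: (batch, k_len, d)
    """
    d = queries.size(-1)
    scores = queries @ keys.transpose(-2, -1) / d ** 0.5  # (batch, q_len, k_len)
    weights = F.softmax(scores, dim=-1)                   # rows sum to 1
    return weights @ values                               # (batch, q_len, d)

q = torch.randn(2, 5, 64)     # e.g. decoder states
kv = torch.randn(2, 100, 64)  # e.g. encoder outputs
out = dot_product_attention(q, kv, kv)
print(out.shape)  # torch.Size([2, 5, 64])
```

In the decoder below, the queries come from decoder states and the keys/values from encoder outputs.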
- Encoders:
  - 2D conv network over the log-mel spectrogram,
  - followed by several GRU layers
  - or followed by several self-attention layers
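A sketch of the conv-plus-GRU encoder variant; layer sizes and strides here are made-up placeholders, not the repo's actual configuration:

```python
import torch
import torch.nn as nn

class ConvGRUEncoder(nn.Module):
    """Illustrative encoder: 2D convs over a log-mel spectrogram, then GRUs."""

    def __init__(self, n_mels=80, hidden=256, gru_layers=3):
        super().__init__()
        # Treat the spectrogram as a 1-channel image: (batch, 1, time, n_mels).
        # Each stride-2 conv halves both the time and mel axes.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        conv_out = 32 * (n_mels // 4)  # channels * downsampled mel bins
        self.gru = nn.GRU(conv_out, hidden, num_layers=gru_layers,
                          batch_first=True, bidirectional=True)

    def forward(self, log_mel):                    # (batch, time, n_mels)
        x = self.conv(log_mel.unsqueeze(1))        # (batch, 32, time/4, n_mels/4)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)  # (batch, time', feat)
        out, _ = self.gru(x)                       # (batch, time', 2*hidden)
        return out

enc = ConvGRUEncoder()
feats = torch.randn(4, 200, 80)  # a batch of log-mel spectrograms
print(enc(feats).shape)  # torch.Size([4, 50, 512])
```

The convolutions downsample the time axis 4x before the recurrent layers, which shortens the sequence the decoder must attend over.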
- Decoders:
  - GRU layers with dot-product attention over the encoder outputs
  - Self-attention layers with dot-product attention over the encoder outputs
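A sketch of the GRU decoder variant, teacher-forced one step at a time with dot-product attention over the encoder outputs; the single-layer setup and all dimensions are placeholders, not the repo's config:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnGRUDecoder(nn.Module):
    """Illustrative decoder: GRU with dot-product attention over the encoder."""

    def __init__(self, vocab_size=32, emb=128, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        # Input at each step: previous token embedding + attention context.
        self.gru = nn.GRU(emb + hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens, enc_out):  # tokens: (batch, T), enc_out: (batch, S, hidden)
        emb = self.embed(tokens)         # (batch, T, emb)
        b, T, _ = emb.shape
        h = None
        context = enc_out.new_zeros(b, 1, enc_out.size(-1))
        logits = []
        for t in range(T):               # teacher-forced decoding loop
            step_in = torch.cat([emb[:, t:t + 1], context], dim=-1)
            dec_out, h = self.gru(step_in, h)              # (batch, 1, hidden)
            scores = dec_out @ enc_out.transpose(1, 2)     # dot-product attention
            context = F.softmax(scores, dim=-1) @ enc_out  # (batch, 1, hidden)
            logits.append(self.out(dec_out + context))
        return torch.cat(logits, dim=1)  # (batch, T, vocab_size)

dec = AttnGRUDecoder()
enc_out = torch.randn(4, 50, 512)           # encoder outputs
tokens = torch.randint(0, 32, (4, 10))      # shifted target transcripts
print(dec(tokens, enc_out).shape)  # torch.Size([4, 10, 32])
```

The self-attention decoder variant replaces the recurrence with masked self-attention layers but keeps the same dot-product cross-attention into the encoder outputs.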