PyTorch implementation of automatic speech recognition, inspired by the papers "Listen, Attend and Spell" and "Attention Is All You Need"
Trained on LibriSpeech
Encoder-decoder architecture with attention
- Encoder: 2D conv network over the log-mel spectrogram
- Encoder: followed by several GRU layers
- Encoder: or followed by several self-attention layers
- Decoder: GRU layers with dot-product attention over the encoder outputs
- Decoder: self-attention layers with dot-product attention over the encoder outputs
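
The architecture listed above can be illustrated with a minimal sketch of one encoder variant (2D conv front end followed by GRU layers) and of dot-product attention over the encoder outputs. This is not the repository's actual code: the class name `ConvGRUEncoder`, the function `dot_product_attention`, and all hyperparameters (80 mel bins, 32 channels, hidden size 256, 3 bidirectional layers, two stride-2 convolutions) are assumptions chosen for illustration. The `1/sqrt(d)` scaling follows "Attention Is All You Need" and may differ from what this implementation uses.

```python
import math

import torch
import torch.nn as nn


class ConvGRUEncoder(nn.Module):
    """Hypothetical encoder: 2D conv over log-mel features, then GRU layers."""

    def __init__(self, n_mels=80, channels=32, hidden=256, num_layers=3):
        super().__init__()
        # Two stride-2 convolutions shrink both time and frequency about 4x.
        self.conv = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        # Frequency size after each conv: floor((n + 2*pad - kernel) / stride) + 1.
        freq = n_mels
        for _ in range(2):
            freq = (freq + 2 * 1 - 3) // 2 + 1
        self.gru = nn.GRU(channels * freq, hidden, num_layers=num_layers,
                          batch_first=True, bidirectional=True)
        # Project the concatenated forward/backward states back to `hidden`.
        self.proj = nn.Linear(2 * hidden, hidden)

    def forward(self, log_mel):
        # log_mel: (batch, time, n_mels); add a channel axis for Conv2d.
        x = self.conv(log_mel.unsqueeze(1))      # (batch, channels, time', freq')
        b, c, t, f = x.shape
        # Merge channels and frequency into one feature vector per time step.
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        out, _ = self.gru(x)                     # (batch, time', 2 * hidden)
        return self.proj(out)                    # (batch, time', hidden)


def dot_product_attention(query, keys, values):
    """Scaled dot-product attention of decoder states over encoder outputs."""
    # query: (batch, tgt_len, dim); keys and values: (batch, src_len, dim)
    scores = torch.bmm(query, keys.transpose(1, 2)) / math.sqrt(query.size(-1))
    weights = torch.softmax(scores, dim=-1)      # each row sums to 1 over src_len
    return torch.bmm(weights, values), weights


encoder = ConvGRUEncoder()
log_mel = torch.randn(2, 100, 80)                # 2 utterances, 100 frames each
memory = encoder(log_mel)                        # shape (2, 25, 256)
decoder_state = torch.randn(2, 5, 256)           # 5 decoder steps (random stand-in)
context, weights = dot_product_attention(decoder_state, memory, memory)
```

Here `context` has shape `(2, 5, 256)` and `weights` has shape `(2, 5, 25)`, one distribution over the 25 downsampled encoder frames per decoder step. A self-attention encoder or decoder would reuse the same attention function with queries, keys, and values drawn from the same sequence.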