List of TTS papers with audio samples provided by the authors. The last row under each paper shows the spectrogram inversion method (vocoder) used.
For a more comprehensive list of important TTS papers, I recommend reading xcmyz/speech-synthesis-paper, written by Zhengxi Liu.
- FastPitch - FastPitch: Parallel Text-to-speech with Pitch Prediction
  - https://fastpitch.github.io/
  - WaveGlow
- EATS - End-to-End Adversarial Text-to-Speech
- Glow-TTS - Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
- Flowtron - Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
  - https://nv-adlr.github.io/Flowtron
  - WaveGlow
- Tacotron2+DCA - Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis
- GAN-TTS - High Fidelity Speech Synthesis with Adversarial Networks
  - https://storage.googleapis.com/deepmind-media/research/abstract.wav
  - End-to-end model (Built on top of 200Hz linguistic & log pitch features)
- Multi-lingual Tacotron2 - Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning
- MelNet - MelNet: A Generative Model for Audio in the Frequency Domain
- FastSpeech - FastSpeech: Fast, Robust and Controllable Text to Speech
- ParaNet - Parallel Neural Text-to-Speech
  - https://parallel-neural-tts-demo.github.io
  - WaveVAE, ClariNet, WaveNet
- Transformer-TTS - Neural Speech Synthesis with Transformer Network
- Multi-speaker Tacotron2 - Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
- Tacotron2+GST - Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
- Tacotron2 - Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
- Tacotron - Tacotron: Towards End-to-End Speech Synthesis
TODO