F5-TTS is a web application that lets users clone a voice from a short reference recording and generate text-to-speech audio with the F5-TTS and E2-TTS models.
- Upload and process reference audio
- Automatic transcription of reference audio
- Text-to-speech generation using F5-TTS or E2-TTS models
- Custom prompt input for generated speech
- Audio playback and download
The F5-TTS interface provides an intuitive way to upload reference audio, visualize the waveform, and generate new speech based on the cloned voice.
- Backend: Python, Flask
- Frontend: HTML, JavaScript, Tailwind CSS
- AI Models: F5-TTS, E2-TTS
- Audio Processing: librosa, soundfile, pydub
- Transcription: faster-whisper
The application is tuned for reference audio clips between 1 and 25 seconds long, the range in which the F5-TTS and E2-TTS models perform best. Longer files can still be uploaded and processed, but exceeding 25 seconds may reduce the quality and consistency of the cloned voice and the generated speech.
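To check a clip against the 1-25 second range before uploading it, something like the following can help. This is a minimal sketch that uses only Python's standard `wave` module, so it handles uncompressed PCM WAV files only; it is not the application's own processing code (which relies on librosa, soundfile, and pydub), and the function names `wav_duration`, `in_recommended_range`, and `trim_wav` are hypothetical.

```python
import wave

MIN_SECONDS = 1.0   # lower bound of the recommended range
MAX_SECONDS = 25.0  # upper bound of the recommended range


def wav_duration(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def in_recommended_range(seconds):
    """Return True if a clip length falls inside the 1-25 second window."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS


def trim_wav(src, dst, max_seconds=MAX_SECONDS):
    """Copy at most the first max_seconds of src into a new file dst."""
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        max_frames = int(max_seconds * reader.getframerate())
        frames = reader.readframes(min(reader.getnframes(), max_frames))
    with wave.open(dst, "wb") as writer:
        # Channel count, sample width, and rate are copied from the source;
        # the frame count in the header is corrected when the file is closed.
        writer.setparams(params)
        writer.writeframes(frames)
```

A typical use would be to call `wav_duration("reference.wav")`, and if `in_recommended_range` returns `False` because the clip is too long, write a shortened copy with `trim_wav("reference.wav", "reference_25s.wav")`. Keeping the first 25 seconds is an arbitrary choice; picking the cleanest passage by hand will usually give a better reference.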
- Clone the repository:
    git clone https://github.com/ThisModernDay/f5-tts.git
    cd f5-tts
- Create and activate a new Conda environment with Python 3.10:
    conda create -n f5-tts python=3.10
    conda activate f5-tts
- Install the required packages:
    pip install -r requirements.txt
- Set up the environment variables (if necessary).
- Run the Flask application:
    python app.py
- Open a web browser and navigate to http://localhost:5000.
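If the page does not load, it can be useful to confirm that the Flask server is actually listening before debugging anything else. The sketch below is one way to do that with the standard library; the helper name `server_is_up` is hypothetical, and it only checks that something answers HTTP at the given address, not that the models have loaded.

```python
import urllib.error
import urllib.request


def server_is_up(url="http://localhost:5000", timeout=3.0):
    """Return True if an HTTP server answers at url, False if it cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just with an error status code.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or another network-level failure.
        return False


if __name__ == "__main__":
    print("server reachable" if server_is_up() else "server not reachable")
```

If this reports that the server is not reachable, check the terminal running `python app.py` for startup errors or a different port number.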
- Upload a reference audio file (WAV or MP3, ideally between 1 and 25 seconds long).
- The application will automatically transcribe the audio.
- Enter your desired prompt text.
- Choose between F5-TTS and E2-TTS models.
- Click "Generate Audio" to create the cloned voice audio.
- Play the generated audio or download it.
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License. See the LICENSE file for details.