main.py contains a function that takes two strings (the actual "spoken" text, and the full text that wraps it to give it emotion) and does the following (sketched below):
- Generates audio.mp3 containing the full text via elevenlabslib.
- Uses whisperX to obtain word-level timestamps.
- Identifies the start and end of the spoken text.
- Uses pydub to create cut_audio.mp3, containing only the audio for the spoken text.
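
For reference, here is a rough sketch of how those steps fit together. This is not the actual main.py: the elevenlabslib and whisperX calls, the voice name, and the function name are assumptions and may differ between library versions.

```python
import re

import whisperx
from elevenlabslib import ElevenLabsUser  # class/method names are assumptions; check your elevenlabslib version
from pydub import AudioSegment  # pydub needs ffmpeg available for mp3 I/O

api_key = "YOUR_ELEVENLABS_API_KEY"


def _normalize(text: str) -> str:
    # Lowercase and drop punctuation so transcription quirks don't break matching.
    return re.sub(r"[^\w\s]", "", text).lower().strip()


def generate_and_cut(spoken_text: str, emotion_text: str) -> None:
    # 1. Generate audio.mp3 for the full emotion-bearing text via elevenlabslib.
    #    (Assumed API: ElevenLabsUser / get_voices_by_name / generate_audio_bytes.)
    user = ElevenLabsUser(api_key)
    voice = user.get_voices_by_name("Rachel")[0]
    with open("audio.mp3", "wb") as f:
        f.write(voice.generate_audio_bytes(emotion_text))

    # 2. Word-level timestamps via whisperX: transcribe, then align.
    device = "cpu"
    audio = whisperx.load_audio("audio.mp3")
    model = whisperx.load_model("base", device)
    result = model.transcribe(audio)
    align_model, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
    aligned = whisperx.align(result["segments"], align_model, metadata, audio, device)
    words = aligned["word_segments"]  # per-word dicts with "start"/"end" (key names vary by whisperX version)

    # 3. Locate the spoken text inside the full text by sliding a window over the words.
    target = _normalize(spoken_text).split()
    n = len(target)
    start_s = end_s = None
    for i in range(len(words) - n + 1):
        window = [_normalize(w.get("word", w.get("text", ""))) for w in words[i:i + n]]
        if window == target:
            start_s, end_s = words[i]["start"], words[i + n - 1]["end"]
            break
    if start_s is None:
        raise ValueError("Spoken text not found in the transcription")

    # 4. Cut that span out of audio.mp3 with pydub (times are seconds, pydub slices in ms).
    full = AudioSegment.from_mp3("audio.mp3")
    full[int(start_s * 1000):int(end_s * 1000)].export("cut_audio.mp3", format="mp3")
```

The key idea is that whisperX gives a start and end time for every word, so isolating the spoken text is just a matter of matching a window of words against it and cutting between the first word's start and the last word's end.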
To try it out (I recommend doing this in a venv):
- Run pip install -r pinnedrequirements.txt --no-deps
- Edit api_key in main.py to contain your ElevenLabs API key
- Run main.py (or call the function from your own code, as sketched below)
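
If you call it from your own script, the call looks roughly like this (the function name is the same placeholder used in the sketch above; check main.py for the real one):

```python
# Hypothetical names; see main.py for the actual function.
from main import generate_and_cut

# First argument: the line you want to keep.
# Second argument: the full text that wraps it to steer the emotional delivery.
generate_and_cut(
    "I can't believe you did that.",
    'She slammed the door and shouted: "I can\'t believe you did that."',
)
```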
There is a dependency conflict: pyannote-audio requires an older version of soundfile than the minimum needed for mp3 support, which elevenlabslib requires. Everything works fine if you just force the newer version, hence the requirements file with pinned versions and the --no-deps flag.