# Version 3.3.0
## TL;DR
`pyannote.audio` does speech separation: multi-speaker audio in, one audio channel per speaker out!

`pip install pyannote.audio[separation]==3.3.0`
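As a quick orientation, here is a minimal sketch of running the new speech separation pipeline end to end. The checkpoint name `pyannote/speech-separation-ami-1.0`, the token placeholder, and the `(diarization, sources)` return signature are assumptions based on the pretrained pipeline published around this release, not part of these notes.

```python
from pyannote.audio import Pipeline
import scipy.io.wavfile

# assumed checkpoint name for the pretrained speech separation pipeline
pipeline = Pipeline.from_pretrained(
    "pyannote/speech-separation-ami-1.0",
    use_auth_token="HUGGINGFACE_ACCESS_TOKEN",  # replace with your own token
)

# multi-speaker audio in...
diarization, sources = pipeline("audio.wav")

# ...one audio channel per speaker out
for s, speaker in enumerate(diarization.labels()):
    scipy.io.wavfile.write(f"{speaker}.wav", 16000, sources.data[:, s])
```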
## New features
- feat(task): add `PixIT` joint speaker diarization and speech separation task (with @joonaskalda)
- feat(model): add `ToTaToNet` joint speaker diarization and speech separation model (with @joonaskalda)
- feat(pipeline): add `SpeechSeparation` pipeline (with @joonaskalda)
- feat(io): add option to select torchaudio `backend` (see the sketch after this list)
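The sketch below illustrates what backend selection might look like at the `Audio` level. The exact parameter name (`backend`) and the accepted values are assumptions; check `pyannote.audio.core.io` for the actual signature.

```python
from pyannote.audio.core.io import Audio

# hypothetical usage: `backend` is assumed to accept torchaudio backend
# names such as "soundfile" or "ffmpeg"
audio = Audio(sample_rate=16000, mono="downmix", backend="soundfile")

# load (and resample/downmix) a file with the selected backend
waveform, sample_rate = audio("audio.wav")
```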
## Fixes
- fix(task): fix wrong train/development split when training with (some) meta-protocols (#1709)
- fix(task): fix metadata preparation with missing validation subset (@clement-pages)
## Improvements
- improve(io): when available, default to using `soundfile` backend
- improve(pipeline): do not extract embeddings when `max_speakers` is set to 1 (see the sketch after this list)
- improve(pipeline): optimize memory usage of most pipelines (#1713 by @benniekiss)
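As a usage note for the `max_speakers` improvement, the sketch below passes the call-time parameter to a pretrained diarization pipeline; the checkpoint name is an assumption, and any pipeline exposing the standard `min_speakers`/`max_speakers` parameters is expected to behave the same way.

```python
from pyannote.audio import Pipeline

# assumed checkpoint name; replace with the pretrained pipeline you use
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HUGGINGFACE_ACCESS_TOKEN",
)

# with max_speakers=1, the pipeline can skip speaker embedding extraction
# entirely, since there is nothing to cluster
diarization = pipeline("audio.wav", max_speakers=1)
```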