Skip to content

Version 3.1.0

Compare
Choose a tag to compare
@hbredin hbredin released this 16 Nov 12:37
· 111 commits to develop since this release

TL;DR

pyannote/speaker-diarization-3.1 no longer requires unpopular ONNX runtime

Full changelog

New features

  • feat(model): add WeSpeaker embedding wrapper based on PyTorch
  • feat(model): add support for multi-speaker statistics pooling
  • feat(pipeline): add TimingHook for profiling processing time
  • feat(pipeline): add ArtifactHook for saving internal steps
  • feat(pipeline): add support for list of hooks with Hooks
  • feat(utils): add "soft" option to Powerset.to_multilabel

Fixes

  • fix(pipeline): add missing "embedding" hook call in SpeakerDiarization
  • fix(pipeline): fix AgglomerativeClustering to honor num_clusters when provided
  • fix(pipeline): fix frame-wise speaker count exceeding max_speakers or detected num_speakers in SpeakerDiarization pipeline

Improvements

  • improve(pipeline): compute fbank on GPU when requested

Breaking changes

  • BREAKING(pipeline): rename WeSpeakerPretrainedSpeakerEmbedding to ONNXWeSpeakerPretrainedSpeakerEmbedding
  • BREAKING(setup): remove onnxruntime dependency.
    You can still use ONNX hbredin/wespeaker-voxceleb-resnet34-LM but you will have to install onnxruntime yourself.
  • BREAKING(pipeline): remove logging_hook (use ArtifactHook instead)
  • BREAKING(pipeline): remove onset and offset parameter in SpeakerDiarizationMixin.speaker_count
    You should now binarize segmentations before passing them to speaker_count