D-vector trained on VoxCeleb1

yistLin released this on 25 Jan 06:05

This release fixes the module-loading issue that appears after upgrading torchaudio to 0.8.0.

Pretrained models

The model was trained on the VoxCeleb1 dataset.

Model details:

  • 40-dim log-mel spectrogram as input
  • 3-layer LSTM with a hidden dimension of 256
  • 256-dim speaker embedding produced by attentive pooling
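The release notes do not spell out the pooling layer. Below is a minimal sketch of one common form of attentive pooling, assuming a single learned scoring vector `w`: each of the T frame-level LSTM outputs h_t (256-dim here) gets a scalar score w · h_t, the scores are softmax-normalized over time, and the embedding is the weighted sum of the frames. The function name `attentive_pool` is hypothetical, and the repository's actual layer may apply a projection or nonlinearity before scoring.

```python
import math
from typing import List


def attentive_pool(frames: List[List[float]], w: List[float]) -> List[float]:
    """Collapse T frame-level vectors (each D-dim) into one D-dim embedding.

    Hypothetical sketch: score_t = w . h_t, softmax over t, then a
    weighted sum of the frames using those softmax weights.
    """
    # One scalar relevance score per frame.
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in frames]
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # non-negative, sums to 1 over time
    dim = len(frames[0])
    # Weighted average of the frames, dimension by dimension.
    return [sum(a * h[d] for a, h in zip(weights, frames)) for d in range(dim)]


# Toy usage with 5 frames of 256-dim "LSTM outputs" (made-up values).
frames = [[0.01 * (t + d) for d in range(256)] for t in range(5)]
embedding = attentive_pool(frames, [0.001] * 256)
print(len(embedding))
```

With an all-zero `w` every frame gets the same weight, so the result reduces to plain mean pooling; a trained `w` instead lets the model emphasize the frames it finds most speaker-discriminative.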

Training details:

  • Each batch holds 64 speakers with 10 utterances per speaker (640 utterances)
  • 250K training steps
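The notes do not say how batches were drawn. A minimal sketch of the stated layout, assuming every speaker in a batch contributes exactly 10 distinct utterances sampled without replacement and that speakers with fewer than 10 utterances are skipped. The names `sample_batch` and `utts_by_speaker` and the toy speaker IDs are hypothetical, not taken from the repository.

```python
import random
from typing import Dict, List, Optional, Tuple


def sample_batch(
    utts_by_speaker: Dict[str, List[str]],
    n_speakers: int = 64,
    n_utts: int = 10,
    rng: Optional[random.Random] = None,
) -> List[Tuple[str, str]]:
    """Return n_speakers * n_utts (speaker, utterance) pairs, grouped by speaker."""
    rng = rng or random.Random()
    # Only speakers with enough utterances can fill their slot in the batch.
    eligible = [s for s, utts in utts_by_speaker.items() if len(utts) >= n_utts]
    if len(eligible) < n_speakers:
        raise ValueError("not enough speakers with enough utterances")
    speakers = rng.sample(eligible, n_speakers)
    # Utterances are drawn without replacement within each speaker.
    return [
        (spk, utt)
        for spk in speakers
        for utt in rng.sample(utts_by_speaker[spk], n_utts)
    ]


# Toy corpus: 100 made-up speakers with 20 utterance IDs each.
corpus = {
    f"spk{i:03d}": [f"spk{i:03d}_utt{j:02d}" for j in range(20)]
    for i in range(100)
}
batch = sample_batch(corpus, rng=random.Random(0))
print(len(batch))
```

Drawing the same number of utterances from each speaker keeps every speaker equally represented in a batch, which is what losses defined over a speakers × utterances grid (such as GE2E) expect; the notes do not state which loss was used.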